featurePSSM {BioSeqClass}    R Documentation
A set of functions for extracting features from biological sequences and encoding those features as numeric vectors.
featurePSSM(seq, start.pos, stop.pos, psiblast.path, database.path)
seq: a character vector of protein sequences (blastpgp runs protein PSI-BLAST, and the resulting PSSM has one column per amino acid).
start.pos: an integer vector giving the start position of the fragment window in each sequence.
stop.pos: an integer vector giving the stop position of the fragment window in each sequence.
psiblast.path: a character string giving the path to the blastpgp program. blastpgp is used to run PSI-BLAST and obtain the Position-Specific Scoring Matrix (PSSM).
database.path: a character string giving the path to a formatted reference database. The database can be formatted with the "formatdb" program.
featurePSSM returns a matrix with 20*N+N columns, where N is the length of the
fragment window (stop.pos - start.pos + 1). Each row represents the features of
one sequence, encoded as a numeric vector of length 20*N+N derived from PSI-BLAST
output. The vector contains two kinds of features: the normalized
position-specific scores of the PSSM (Position-Specific Scoring Matrix), 20 per
position, and the Shannon entropy of the WOP (weighted observed percentages),
one per position. The PSI-BLAST program and a formatted NCBI non-redundant
protein database are required.
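To make the 20*N+N layout concrete, the sketch below assembles one feature vector from synthetic per-residue PSSM scores and WOP percentages instead of real blastpgp output. It is not the BioSeqClass implementation: the helper name pssmWindowFeatures is hypothetical, and the logistic normalization 1/(1+exp(-x)), the natural-log entropy, and the column order (all normalized scores first, then the N entropies) are assumptions chosen for illustration.

```r
# Hypothetical helper, NOT part of BioSeqClass: builds one 20*N+N feature
# vector from the per-residue PSSM scores and WOP percentages of one sequence.
pssmWindowFeatures <- function(scores, wop, start.pos, stop.pos) {
  # scores, wop: L x 20 numeric matrices (rows = residues, columns = amino acids)
  win <- start.pos:stop.pos                 # N = stop.pos - start.pos + 1 positions
  s <- scores[win, , drop = FALSE]
  w <- wop[win, , drop = FALSE]
  # Assumed normalization: logistic map of raw PSSM scores into (0, 1)
  s.norm <- 1 / (1 + exp(-s))
  # Shannon entropy per position; WOP percentages rescaled to proportions,
  # zero proportions skipped (0 * log(0) is taken as 0)
  p <- w / 100
  ent <- apply(p, 1, function(x) { x <- x[x > 0]; -sum(x * log(x)) })
  # Assumed layout: 20 normalized scores per position, position by position,
  # followed by the N entropies
  c(as.vector(t(s.norm)), ent)
}

set.seed(1)
L <- 15
scores <- matrix(sample(-5:5, L * 20, replace = TRUE), nrow = L)  # synthetic scores
wop <- matrix(0, nrow = L, ncol = 20)
wop[, 1] <- 100     # every position fully conserved in column 1: entropy 0
wop[2, ] <- 5       # position 2 uniform over 20 residues: entropy log(20)
feat <- pssmWindowFeatures(scores, wop, start.pos = 1, stop.pos = 10)
length(feat)        # 20*10 + 10 = 210
```

A conserved position yields an entropy of 0 and a uniform one yields log(20), which is why the entropy block complements the normalized scores.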
Hong Li
if (interactive()) {
  library(BioSeqClass)
  library(Biostrings)  # provides readAAStringSet()
  file <- file.path(path.package("BioSeqClass"), "example", "acetylation_K.fasta")
  tmp <- readAAStringSet(file)
  proteinSeq <- as.character(tmp)
  ## Needs the "blastpgp" program and a formatted database.
  ## The database can be formatted with the "formatdb" program.
  PSSM1 <- featurePSSM(proteinSeq[1:2], start.pos = rep(1, 2), stop.pos = rep(10, 2),
                       psiblast.path = "blastpgp", database.path = "./result1.fasta")
}