subdivideData {transite}    R Documentation
Preprocessing function for SPMA that divides transcript sequences into n bins.
subdivideData(background.set, n.bins = 40)
background.set
character vector of named sequences (names are usually RefSeq identifiers and sequence region labels, e.g., "NM_1_DUMMY|3UTR"). The sequences must already be sorted by fold change, signal-to-noise ratio, or another meaningful measure.
n.bins
number of bins into which the sequences are divided; valid values are between 7 and 100.
An array of length n.bins, containing the binned sequences.
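The binning step can be illustrated with a minimal base R sketch. This is not the transite implementation: sketchSubdivide is a hypothetical helper, and it assumes that bins are contiguous, near-equal in size, and returned as a list. It only shows why the input order matters, since each bin is a consecutive slice of the sorted vector.

```r
# Hypothetical helper (not part of transite): split an already sorted, named
# character vector into n.bins contiguous bins of near-equal size.
sketchSubdivide <- function(sorted.set, n.bins = 40) {
  stopifnot(is.character(sorted.set), length(sorted.set) > 0)
  stopifnot(n.bins >= 7, n.bins <= 100)
  # Map position i to a bin index in 1..n.bins, preserving the input order
  bin.index <- ceiling(seq_along(sorted.set) * n.bins / length(sorted.set))
  # Fixing the factor levels keeps empty bins when there are fewer
  # sequences than bins
  bins <- split(sorted.set, factor(bin.index, levels = seq_len(n.bins)))
  unname(bins)
}

# Toy input: 14 sequences, assumed to be sorted already
toy.set <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU", "UCAUUUUAUUAAA",
  "AAUUGGUGUCUGGAUACUUCCCUGUACAU", "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA", "AUAGAC", "AGUUC", "CCAGUAA"
)
names(toy.set) <- paste0("NM_", seq_along(toy.set), "_DUMMY|3UTR")

toy.bins <- sketchSubdivide(toy.set, n.bins = 7)
lengths(toy.bins)  # number of sequences in each bin
```

Because the bin index depends only on position, shuffling the input changes which sequences share a bin; sorting by the measure of interest beforehand is what gives the bins their meaning.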
Other SPMA functions: runKmerSPMA, runMatrixSPMA, scoreSpectrum, spectrumClassifier
# toy example
toy.background.set <- c(
  "CAACAGCCUUAAUU", "CAGUCAAGACUCC", "CUUUGGGGAAU", "UCAUUUUAUUAAA",
  "AAUUGGUGUCUGGAUACUUCCCUGUACAU", "AUCAAAUUA", "AGAU", "GACACUUAAAGAUCCU",
  "UAGCAUUAACUUAAUG", "AUGGA", "GAAGAGUGCUCA", "AUAGAC", "AGUUC", "CCAGUAA"
)
# names are used as keys in the hash table (cached version only)
# ideally sequence identifiers (e.g., RefSeq ids) and
# sequence region labels (e.g., 3UTR for 3'-UTR)
names(toy.background.set) <- c(
  "NM_1_DUMMY|3UTR", "NM_2_DUMMY|3UTR", "NM_3_DUMMY|3UTR", "NM_4_DUMMY|3UTR",
  "NM_5_DUMMY|3UTR", "NM_6_DUMMY|3UTR", "NM_7_DUMMY|3UTR", "NM_8_DUMMY|3UTR",
  "NM_9_DUMMY|3UTR", "NM_10_DUMMY|3UTR", "NM_11_DUMMY|3UTR", "NM_12_DUMMY|3UTR",
  "NM_13_DUMMY|3UTR", "NM_14_DUMMY|3UTR"
)

foreground.sets <- subdivideData(toy.background.set, n.bins = 7)

# example data set
background.df <- transite:::ge$background
# sort sequences by signal-to-noise ratio
background.df <- dplyr::arrange(background.df, value)
# character vector of named sequences
background.set <- background.df$seq
names(background.set) <- paste0(background.df$refseq, "|", background.df$seq.type)

foreground.sets <- subdivideData(background.set)