token_vector {ttgsea}                                R Documentation
Description

Vectorization of words or tokens of text is necessary for machine
learning. This function converts tokens to sequences of integer
indices; the vectorized sequences are then padded or truncated to a
fixed length.
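To illustrate the padding/truncation step, here is a minimal sketch
using keras::pad_sequences (listed under See Also). The integer
sequences below are made-up stand-ins for tokenized text, not output
of this package.

library(keras)
# two toy integer sequences: one shorter and one longer than the target length
seqs <- list(c(5, 12, 7),
             c(3, 8, 2, 9, 1, 4, 6, 11, 10, 13, 14, 15))
# shorter sequences are padded with 0 and longer ones truncated,
# yielding a matrix with maxlen columns (padding/truncating is "pre" by default)
pad_sequences(seqs, maxlen = 10)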
Usage

token_vector(text, token, length_seq)
Arguments

text        text data
token       result of tokenization (output of "text_token")
length_seq  length of input sequences
Value

sequences of integers
Author(s)

Dongmin Jung
See Also

tm::removeWords, stopwords::stopwords, textstem::lemmatize_strings,
tokenizers::tokenize_ngrams, keras::pad_sequences
Examples

library(ttgsea)
library(reticulate)
if (keras::is_keras_available() & reticulate::py_available()) {
  library(fgsea)
  data(examplePathways)
  data(exampleRanks)
  names(examplePathways) <- gsub("_", " ",
                                 substr(names(examplePathways), 9, 1000))
  set.seed(1)
  fgseaRes <- fgsea(examplePathways, exampleRanks)
  tokens <- text_token(data.frame(fgseaRes)[, "pathway"],
                       num_tokens = 1000)
  sequences <- token_vector("Cell Cycle", tokens, 10)
}
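As a follow-up (a sketch assuming the example above has run): since
the sequences are padded or truncated to length_seq, the returned
object should be an integer matrix with length_seq columns, which can
be inspected directly.

if (keras::is_keras_available() & reticulate::py_available()) {
  dim(sequences)  # expected: one row per input text, length_seq (10) columns
  sequences       # integer token indices; 0 marks padding
}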