text_token {ttgsea}                                        R Documentation
Description

Text is tokenized into n-grams. The function can also be used to limit the total number of tokens kept.
Usage

text_token(text, ngram_min = 1, ngram_max = 1, num_tokens)
Arguments

text          text data
ngram_min     minimum size of an n-gram (default: 1)
ngram_max     maximum size of an n-gram (default: 1)
num_tokens    maximum number of tokens
Value

token         result of tokenizing text
ngram_min     minimum size of an n-gram
ngram_max     maximum size of an n-gram
Author(s)

Dongmin Jung
See Also

tm::removeWords, stopwords::stopwords, textstem::lemmatize_strings, text2vec::create_vocabulary, text2vec::prune_vocabulary
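The cross-references above suggest that n-gram vocabulary construction and pruning of the kind text_token performs can be done with text2vec directly. The sketch below is a minimal, hypothetical illustration of such a pipeline, not the package's actual implementation; the toy corpus and the vocab_term_max cap (standing in for num_tokens) are illustrative assumptions.

library(text2vec)

# Toy corpus standing in for pathway names (illustrative assumption)
docs <- c("cell cycle checkpoint", "dna repair pathway")

# Stream word tokens, then build a vocabulary of unigrams and bigrams
it <- itoken(docs, tokenizer = word_tokenizer, progressbar = FALSE)
vocab <- create_vocabulary(it, ngram = c(ngram_min = 1L, ngram_max = 2L))

# Cap the vocabulary size, analogous to num_tokens above
vocab <- prune_vocabulary(vocab, vocab_term_max = 10)
vocab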
Examples

library(fgsea)
data(examplePathways)
data(exampleRanks)
# Strip the numeric prefix and replace underscores in pathway names
names(examplePathways) <- gsub("_", " ", substr(names(examplePathways), 9, 1000))
set.seed(1)
fgseaRes <- fgsea(examplePathways, exampleRanks)
# Tokenize the pathway names, keeping at most 1000 tokens
tokens <- text_token(data.frame(fgseaRes)[,"pathway"], num_tokens = 1000)
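# A further sketch: requesting unigrams and bigrams from the same pathway
# names. This assumes the return value carries the components listed under
# Value (token, ngram_min, ngram_max) as list elements.
tokens2 <- text_token(data.frame(fgseaRes)[,"pathway"],
                      ngram_min = 1, ngram_max = 2,
                      num_tokens = 1000)
tokens2$ngram_min
tokens2$ngram_max
head(tokens2$token)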