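peco uses SingleCellExperiment class objects. The package is distributed through Bioconductor and can be installed with BiocManager::install("peco"). To load it together with the other packages used here, run: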
library(peco)
library(SingleCellExperiment)
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     Filter, Find, Map, Position, Reduce, anyDuplicated, aperm, append,
#>     as.data.frame, basename, cbind, colnames, dirname, do.call,
#>     duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
#>     lapply, mapply, match, mget, order, paste, pmax, pmax.int, pmin,
#>     pmin.int, rank, rbind, rownames, sapply, setdiff, sort, table,
#>     tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     I, expand.grid, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(doParallel)
#> Loading required package: foreach
#> Loading required package: iterators
#> Loading required package: parallel
library(foreach)
peco is a supervised approach for PrEdicting cell cycle phase in a COntinuum using single-cell RNA sequencing data. The R package provides functions to build a training dataset, as well as functions that use existing training data to predict cell cycle phase on a continuum.
Our work demonstrated that peco can predict continuous cell cycle phase using a small set of cyclic genes: CDK1, UBE2C, TOP2A, HISTH1E, and HISTH1C (identified as cell cycle marker genes in studies of yeast (Spellman et al., 1998) and HeLa cells (Whitfield et al., 2002)).
Below we provide two use cases. Vignette 1 shows how to use the built-in training dataset to predict continuous cell cycle phase. Vignette 2 shows how to make a training dataset and build a predictor from it.
Users can also view the vignettes via browseVignettes("peco").
training_human stores built-in training data of 101 significant cyclic genes. Below are the slots contained in training_human:
predict.yy: a gene-by-sample matrix (101 by 888) that stores predicted cyclic expression values.
cellcycle_peco_reordered: cell cycle phase on a unit circle (angle), ordered from 0 to \(2\pi\).
cellcycle_function: a list of 101 functions corresponding to the top 101 cyclic genes identified in our dataset.
sigma: standard error associated with the cyclic trends of gene expression.
pve: proportion of variance explained by the cyclic trend.
peco is integrated with the SingleCellExperiment object in Bioconductor. Below is an example of passing a SingleCellExperiment object to perform cell cycle phase prediction.
sce_top101genes includes 101 genes and 888 single-cell samples and one assay slot of counts.
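For readers starting from their own data, the following is a minimal sketch of wrapping a raw count matrix in a SingleCellExperiment object with a single counts assay, which is the shape described above. The matrix values, the object name sce_toy, and the row names geneB and geneC are made up for illustration; whether peco's functions accept such an object without further requirements (for example on gene identifiers) is an assumption.

```r
library(SingleCellExperiment)

# Made-up counts: 3 genes (rows) by 5 cells (columns).
# The first row name is the Ensembl ID used for CDK1 in this vignette;
# "geneB" and "geneC" are placeholders, not real identifiers.
set.seed(4)
counts <- matrix(rpois(15, lambda = 10), nrow = 3,
                 dimnames = list(c("ENSG00000170312", "geneB", "geneC"),
                                 paste0("cell", 1:5)))

# Store the matrix in the "counts" assay slot.
sce_toy <- SingleCellExperiment(assays = list(counts = counts))

assayNames(sce_toy)  # names of the assay slots
dim(sce_toy)         # genes by cells
```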
Transform the expression values to quantile-normalized counts-per-million values. peco uses the cpm_quantNormed slot as input data for predictions.
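As a rough illustration of what such a transformation involves, here is a base R sketch on a toy matrix. It assumes the transform amounts to computing counts per million and then mapping each gene's values to standard normal quantiles by rank; the helper quant_norm_gene is hypothetical, and the way data_transform_quantile handles zeros and ties may differ.

```r
# Toy count matrix: 4 genes (rows) by 6 cells (columns); values are made up.
set.seed(1)
counts <- matrix(rpois(24, lambda = 20), nrow = 4,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))

# Counts per million: scale each cell (column) by its library size.
lib_size <- colSums(counts)
cpm <- t(t(counts) / lib_size) * 1e6

# Hypothetical helper: per-gene normal-scores transform. Each value is
# replaced by the standard normal quantile of its rank among the cells
# (ties are broken at random).
quant_norm_gene <- function(x) {
  r <- rank(x, ties.method = "random")
  qnorm((r - 0.5) / length(x))
}

# apply() returns cells by genes, so transpose back to genes by cells.
cpm_qn <- t(apply(cpm, 1, quant_norm_gene))
dimnames(cpm_qn) <- dimnames(cpm)

round(cpm_qn, 2)
```

With peco, the whole transformation is a single call: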
sce_top101genes <- data_transform_quantile(sce_top101genes)
#> computing on 2 cores
assays(sce_top101genes)
#> List of length 3
#> names(3): counts cpm cpm_quantNormed
Apply the prediction model using the function cycle_npreg_outsample, which returns the prediction results in a list object, pred_top101genes.
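To give some intuition for prediction from gene-wise cyclic functions and standard errors, the sketch below assigns each simulated cell the angle at which its expression is most likely. It assumes Gaussian noise and a fixed grid of 100 candidate angles. The functions in funs, the values in sigma, and the helper predict_angle are all hypothetical; this illustrates the general idea and is not the internals of cycle_npreg_outsample.

```r
set.seed(2)

# Hypothetical cyclic mean functions for three toy genes (not peco's fits).
funs <- list(
  g1 = function(theta) cos(theta),
  g2 = function(theta) sin(theta),
  g3 = function(theta) cos(theta - pi / 3)
)
sigma <- c(g1 = 0.2, g2 = 0.2, g3 = 0.2)  # assumed per-gene noise SDs

# Simulate 50 cells with known angles so the result can be checked.
theta_true <- runif(50, 0, 2 * pi)
Y <- t(sapply(names(funs), function(g) {
  funs[[g]](theta_true) + rnorm(50, sd = sigma[[g]])
}))  # genes by cells

# Candidate angles: midpoints of 100 equal bins on [0, 2*pi).
grid <- (seq_len(100) - 0.5) * 2 * pi / 100

# Mean expression of every gene at every candidate angle (genes by grid).
mu <- t(sapply(funs, function(f) f(grid)))

# For one cell, sum the Gaussian log-likelihoods over genes at each
# candidate angle and return the angle with the largest total.
predict_angle <- function(y) {
  z <- (y - mu) / sigma  # standardized residuals, genes by grid
  loglik <- colSums(dnorm(z, log = TRUE) - log(sigma))
  grid[which.max(loglik)]
}
theta_hat <- apply(Y, 2, predict_angle)

# Circular distance between predicted and simulated angles.
d <- abs(theta_hat - theta_true)
circ_err <- pmin(d, 2 * pi - d)
summary(circ_err)
```

The peco call that performs the actual prediction on sce_top101genes is: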
pred_top101genes <- cycle_npreg_outsample(
    Y_test=sce_top101genes,
    sigma_est=training_human$sigma[rownames(sce_top101genes),],
    funs_est=training_human$cellcycle_function[rownames(sce_top101genes)],
    method.trend="trendfilter",
    ncores=1,
    get_trend_estimates=FALSE)
pred_top101genes$Y contains a SingleCellExperiment object with the predicted cell cycle phase in the colData slot.
head(colData(pred_top101genes$Y)$cellcycle_peco)
#> 20170905-A01 20170905-A02 20170905-A03 20170905-A06 20170905-A07 20170905-A08 
#>     1.099557     4.680973     2.481858     4.303982     4.052655     1.413717
Visualize the prediction results for one gene. Below we choose CDK1 ("ENSG00000170312"). Because CDK1 is a known cell cycle gene, this visualization serves as a sanity check on the fitted results. The fitted function training_human$cellcycle_function[["ENSG00000170312"]] was obtained from our training data.
plot(y=assay(pred_top101genes$Y,"cpm_quantNormed")["ENSG00000170312",],
     x=colData(pred_top101genes$Y)$theta_shifted, main = "CDK1",
     ylab = "quantile normalized expression")
points(y=training_human$cellcycle_function[["ENSG00000170312"]](seq(0,2*pi, length.out=100)),
       x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
Visualize the prediction results for the top six genes. Use fit_cyclical_many to estimate the cyclic functions from the input data.
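As a small self-contained illustration of estimating a cyclic function from expression values and angles, the sketch below fits a first-order harmonic regression to simulated data and computes the proportion of variance explained. This is a deliberately simpler stand-in, not the method used by fit_cyclical_many; the data, the helper cyclic_fun, and the choice of a single harmonic are assumptions made for illustration.

```r
set.seed(3)

# Toy data: angles for 200 cells and one gene with a cyclic mean plus noise.
theta <- runif(200, 0, 2 * pi)
y <- 1.5 * cos(theta - 1) + rnorm(200, sd = 0.5)

# First-order harmonic regression: y ~ a + b * cos(theta) + c * sin(theta).
fit <- lm(y ~ cos(theta) + sin(theta))

# Hypothetical wrapper: expose the fit as a function of angle, similar in
# spirit to an entry of a cellcycle_function list.
cyclic_fun <- function(t) {
  as.numeric(predict(fit, newdata = data.frame(theta = t)))
}

# Proportion of variance explained by the fitted cyclic trend.
pve <- 1 - var(residuals(fit)) / var(y)

# Overlay the fitted trend on the data.
grid <- seq(0, 2 * pi, length.out = 100)
plot(theta, y, xlab = "angle", ylab = "expression")
lines(grid, cyclic_fun(grid), col = "blue", lwd = 2)
```

The peco workflow for the six genes, which uses trend filtering rather than this harmonic model, is: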
# predicted cell time in the input data
theta_predict = colData(pred_top101genes$Y)$cellcycle_peco
names(theta_predict) = rownames(colData(pred_top101genes$Y))
# expression values of the first 6 genes in the input data
yy_input = assay(pred_top101genes$Y,"cpm_quantNormed")[1:6,]
# apply trendfilter to estimate cyclic gene expression trend
fit_cyclic <- fit_cyclical_many(Y=yy_input, 
                                theta=theta_predict)
#> computing on 2 cores
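# look up gene symbols by row name; indexing the bare hgnc column by name
# would return NA unless that vector happens to carry names
gene_symbols = rowData(pred_top101genes$Y)[rownames(yy_input), "hgnc"]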
par(mfrow=c(2,3))
for (i in 1:6) {
    plot(y=yy_input[i,],
         x=fit_cyclic$cellcycle_peco_ordered,
         main = gene_symbols[i],
         ylab = "quantile normalized expression")
    points(y=fit_cyclic$cellcycle_function[[i]](seq(0,2*pi, length.out=100)),
           x=seq(0,2*pi, length.out=100), col = "blue", pch =16)
}
sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /home/biocbuild/bbs-3.18-bioc/R/lib/libRblas.so 
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_GB              LC_COLLATE=C              
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: America/New_York
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
#> [8] methods   base     
#> 
#> other attached packages:
#>  [1] doParallel_1.0.17           iterators_1.0.14           
#>  [3] foreach_1.5.2               SingleCellExperiment_1.24.0
#>  [5] SummarizedExperiment_1.32.0 Biobase_2.62.0             
#>  [7] GenomicRanges_1.54.0        GenomeInfoDb_1.38.0        
#>  [9] IRanges_2.36.0              S4Vectors_0.40.0           
#> [11] BiocGenerics_0.48.0         MatrixGenerics_1.14.0      
#> [13] matrixStats_1.0.0           peco_1.14.0                
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.2.0          viridisLite_0.4.2        
#>  [3] vipor_0.4.5               dplyr_1.1.3              
#>  [5] viridis_0.6.4             bitops_1.0-7             
#>  [7] fastmap_1.1.1             RCurl_1.98-1.12          
#>  [9] pracma_2.4.2              digest_0.6.33            
#> [11] rsvd_1.0.5                lifecycle_1.0.3          
#> [13] magrittr_2.0.3            compiler_4.3.1           
#> [15] rlang_1.1.1               sass_0.4.7               
#> [17] tools_4.3.1               igraph_1.5.1             
#> [19] utf8_1.2.4                yaml_2.3.7               
#> [21] knitr_1.44                S4Arrays_1.2.0           
#> [23] DelayedArray_0.28.0       abind_1.4-5              
#> [25] BiocParallel_1.36.0       grid_4.3.1               
#> [27] fansi_1.0.5               beachmat_2.18.0          
#> [29] colorspace_2.1-0          ggplot2_3.4.4            
#> [31] scales_1.2.1              cli_3.6.1                
#> [33] mvtnorm_1.2-3             rmarkdown_2.25           
#> [35] crayon_1.5.2              generics_0.1.3           
#> [37] DelayedMatrixStats_1.24.0 genlasso_1.6.1           
#> [39] scuttle_1.12.0            ggbeeswarm_0.7.2         
#> [41] cachem_1.0.8              geigen_2.3               
#> [43] zlibbioc_1.48.0           assertthat_0.2.1         
#> [45] XVector_0.42.0            vctrs_0.6.4              
#> [47] boot_1.3-28.1             Matrix_1.6-1.1           
#> [49] jsonlite_1.8.7            BiocSingular_1.18.0      
#> [51] BiocNeighbors_1.20.0      ggrepel_0.9.4            
#> [53] irlba_2.3.5.1             beeswarm_0.4.0           
#> [55] scater_1.30.0             jquerylib_0.1.4          
#> [57] glue_1.6.2                codetools_0.2-19         
#> [59] gtable_0.3.4              circular_0.5-0           
#> [61] ScaledMatrix_1.10.0       munsell_0.5.0            
#> [63] tibble_3.2.1              pillar_1.9.0             
#> [65] htmltools_0.5.6.1         conicfit_1.0.4           
#> [67] GenomeInfoDbData_1.2.11   R6_2.5.1                 
#> [69] sparseMatrixStats_1.14.0  evaluate_0.22            
#> [71] lattice_0.22-5            bslib_0.5.1              
#> [73] Rcpp_1.0.11               gridExtra_2.3            
#> [75] SparseArray_1.2.0         xfun_0.40                
#> [77] pkgconfig_2.0.3