scMAGeCK is a computational model to identify genes associated with multiple expression phenotypes from CRISPR screening coupled with single-cell RNA sequencing data (CROP-seq).
scMAGeCK is based on our previous MAGeCK and MAGeCK-VISPR models for pooled CRISPR screens, but further extends to scRNA-seq as the readout of the screening experiment. scMAGeCK consists of two modules: scMAGeCK-Robust Rank Aggregation (RRA), a sensitive and precise algorithm to detect genes whose perturbation links to one single marker expression; and scMAGeCK-LR, a linear-regression based approach that unravels the perturbation effects on thousands of gene expressions, especially from cells undergo multiple perturbations.
    library(scMAGeCK)
    ### BARCODE file contains cell identity information, generated from the cell identity collection step
    BARCODE <- system.file("extdata","barcode_rec.txt",package = "scMAGeCK")
    
    ### RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
    RDS <- system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")
    
    ### Set RRA executable file path. 
    ###    You can generate RRA executable file by following commands:
    ###        wget https://bitbucket.org/weililab/scmageck/downloads/RRA_0.5.9.zip
    ###        unzip RRA_0.5.9.zip
    ###        cd RRA_0.5.9
    ###        make
    RRAPATH <- "/Library/RRA_0.5.9/bin/RRA"
    
    target_gene <- "MKI67"
    
    rra_result <- scmageck_rra(BARCODE=BARCODE, RDS=RDS, GENE=target_gene, RRAPATH=RRAPATH, 
             LABEL='dox_mki67', NEGCTRL=NULL, KEEPTMP=FALSE, PATHWAY=FALSE, SAVEPATH=NULL)## Checking RRA...## RRA is not does not exist! Please check RRA executable file path    library(scMAGeCK)
    ### BARCODE file contains cell identity information, generated from the cell identity collection step
    BARCODE <- system.file("extdata","barcode_rec.txt",package = "scMAGeCK")
    ### RDS can be a Seurat object or local RDS file path that contains the scRNA-seq dataset
    RDS <- system.file("extdata","singles_dox_mki67_v3.RDS",package = "scMAGeCK")
    
    lr_result <- scmageck_lr(BARCODE=BARCODE, RDS=RDS, LABEL='dox_scmageck_lr', 
            NEGCTRL = 'NonTargetingControlGuideForHuman', PERMUTATION = 1000, SAVEPATH=NULL, LAMBDA=0.01)## Total barcode records: 8425## Neg Ctrl guide: NonTargetingControlGuideForHuman## Reading RDS file: /tmp/RtmpcxY7ur/Rinst159d52acc223/scMAGeCK/extdata/singles_dox_mki67_v3.RDS## Cell names in expression matrix and barcode file do not match. Try to remove possible trailing "-1"s...## 6704 ...## 6229 ...## Index matrix dimension: 5698 , 30## Selected genes: 25## Permutation: 100 / 1000 ...## Permutation: 200 / 1000 ...## Permutation: 300 / 1000 ...## Permutation: 400 / 1000 ...## Permutation: 500 / 1000 ...## Permutation: 600 / 1000 ...## Permutation: 700 / 1000 ...## Permutation: 800 / 1000 ...## Permutation: 900 / 1000 ...## Permutation: 1000 / 1000 ...The scmageck_rra function will output the ranking and p values of each perturbed genes, using the RRA program in MAGeCK. Users familiar with the MAGeCK program may find it similar with the gene_summary output in MAGeCK.
Here is the example output of scMAGeCK-RRA:
Row.names  items_in_group.low  lo_value.low  p.low  FDR.low goodsgrna.low  items_in_group.high  lo_value.high  p.high  FDR.high  goodsgrna.high
TP53    271     0.11832 0.95619 1       48      271     1.014e-83       4.9975e-06      0.00015 184Explanations of each column are below:
| Column | Content | 
|---|---|
| Row.names | Perturbed gene name | 
| items_in_group.low | The number of single-cells with each gene perturbed | 
| lo_value.low | The RRA score in negative selection (reducing the marker expression if this gene is perturbed). The RRA score uses a p value from rank order statistics to measure the degree of selection; the smaller score, the stronger the selection is. More information on the calculation of RRA score can be found in our original MAGeCK paper. | 
| p.low | The raw p-value (using permutation) of this gene in negative selection | 
| FDR.low | The false discovery rate of this gene in negative selection | 
| goodsgrna.low | The number of single-cells that passes the threshold and is considered in the RRA score calculation in negative selection | 
| items_in_group.high | The same as items_in_group.low: the number of single-cells with each gene perturbed) | 
| lo_value.high | The RRA score in positive selection (increasing the marker expression if this gene is perturbed | 
| p.high | The raw p-value (using permutation) of this gene in positive selection | 
| FDR.high | The false discovery rate of this gene in positive selection | 
| goodsgrna.high | The number of single-cells that passes the threshold and is considered in the RRA score calculation in positive selection | 
The scmageck_lr function will generate several files below:
| File | Description | 
|---|---|
| lr_score | The score (similar with log fold change) of each perturbed gene (rows) on each marker gene (columns) | 
| lr_score.pval | The associated p values of each score | 
| LR.RData | An R object to store scores and p values | 
The format of score.txt and score.pval.txt is a simple table file with rows corresponding to perturbed genes and columns corresponding to marker genes. For example in the score.txt,
Perturbedgene  APC                ARID1A               TP53               MKI67
     APC       0.138075836476524  -0.0343441660045313  0.214449590551132  -0.150287676553705This row records the effects of perturbing APC gene on the expressions of APC, ARID1A, TP53 and MKI67.
Questions? Comments? Join the MAGeCK Google group or email us (wli2@childrensnational.org) directly.
Any advice and suggestions will be greatly appreciated.