ID-translation {TCGAutils} | R Documentation |
These functions take a character vector of identifiers and use the GDC API to translate TCGA barcodes to Universally Unique Identifiers (UUIDs) and vice versa. These relationships are not one-to-one, so a data.frame is returned for all inputs. The UUID to TCGA barcode translation only applies to file and case UUIDs. API queries for this translation service with other types of UUIDs are not fully supported. Please double-check any results before using these features for analysis. Case / submitter identifiers are translated by default; see the id_type argument for details.
UUIDtoBarcode(id_vector, id_type = c("case_id", "file_id"),
    end_point = "participant", legacy = FALSE)

barcodeToUUID(barcodes, id_type = c("case_id", "file_id"), legacy = FALSE)

translateBuild(from, to = "UCSC")

extractBuild(string, build = c("UCSC", "NCBI"))
id_vector | A character vector of UUIDs
id_type | Either "case_id" or "file_id"
end_point | The cutoff point of the barcode that should be returned; only applies to "file_id" type translations (see the options listed below)
legacy | (logical, default FALSE) whether to search the legacy archives
barcodes | A character vector of TCGA barcodes
from | A build version name
to | The name of the desired version
string | A single character string
build | A vector of build version names (default UCSC, NCBI)
The end_point options reflect endpoints in the Genomic Data Commons API. These are summarized as follows:
participant: This default snippet of information includes project, tissue source site (TSS), and participant number (barcode format: TCGA-XX-XXXX)
sample: This adds the sample information to the participant barcode (TCGA-XX-XXXX-11X)
portion, analyte: Either of these options adds the portion and analyte information to the sample barcode (TCGA-XX-XXXX-11X-01X)
plate, center: Additional plate and center information is returned, i.e., the full barcode (TCGA-XX-XXXX-11X-01X-XXXX-XX)
Only these keywords need to be used to target the specific barcode endpoint. These endpoints only apply to "file_id" type translations to TCGA barcodes (see the id_type argument).
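To make the barcode levels above concrete, here is a minimal base R sketch that trims a full barcode to the fields implied by each end_point keyword. The helper truncate_barcode is hypothetical and is not part of TCGAutils; it works on the barcode string only and does not query the GDC API.

```r
# Hypothetical helper, not part of TCGAutils: trim a full TCGA barcode
# to the number of dash-separated fields implied by an end_point keyword.
truncate_barcode <- function(barcode, end_point = "participant") {
    # Number of fields kept for each keyword, following the list above
    n_fields <- c(participant = 3L, sample = 4L, portion = 5L,
                  analyte = 5L, plate = 7L, center = 7L)
    end_point <- match.arg(end_point, names(n_fields))
    parts <- strsplit(barcode, "-", fixed = TRUE)
    vapply(parts, function(p) {
        paste(head(p, n_fields[[end_point]]), collapse = "-")
    }, character(1L))
}

full <- "TCGA-B0-5117-11A-01D-1421-08"
truncate_barcode(full, "participant")  # "TCGA-B0-5117"
truncate_barcode(full, "sample")       # "TCGA-B0-5117-11A"
truncate_barcode(full, "analyte")      # "TCGA-B0-5117-11A-01D"
truncate_barcode(full, "center")       # "TCGA-B0-5117-11A-01D-1421-08"
```

The field counts are read off the barcode formats listed above, so "portion" and "analyte" give the same result, as do "plate" and "center".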
A data.frame of TCGA barcode identifiers and UUIDs
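Because the relationships are not one-to-one, it can help to group the returned rows by the input identifier before using them. The sketch below uses a toy data.frame as a stand-in for a translation result; the column names (file_id, barcode) and all identifier values are placeholders chosen for illustration, not actual GDC output.

```r
# Toy stand-in for a translation result. Column names and identifier
# values are placeholders, not real GDC output.
res <- data.frame(
    file_id = c("file-uuid-1", "file-uuid-1", "file-uuid-2"),
    barcode = c("TCGA-XX-XXXX-01A", "TCGA-XX-XXXX-11A", "TCGA-YY-YYYY-01A"),
    stringsAsFactors = FALSE
)

# Group barcodes by the identifier they were translated from
by_file <- split(res$barcode, res$file_id)

# Identifiers that map to more than one barcode
multi <- names(by_file)[lengths(by_file) > 1L]
multi
```

Checking for identifiers with several matches in this way is one option for the double-checking step recommended in the description.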
Two functions are available for working with build versions from NCBI or UCSC. translateBuild translates between UCSC and NCBI build versions. extractBuild uses grep patterns to find the first build within a string.
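The two ideas can be sketched in base R as a lookup table and a regular-expression search. The helpers to_ucsc and find_build and the pattern used are assumptions made for illustration; they are not the TCGAutils implementation, and the lookup table covers only two well-known build pairs.

```r
# Hypothetical helpers, not the TCGAutils implementation.

# Lookup from NCBI/GRC build names to UCSC names (two common pairs only)
build_map <- c(GRCh37 = "hg19", GRCh38 = "hg38")
to_ucsc <- function(ncbi) unname(build_map[ncbi])

# Return the first build label found in a string, ignoring case;
# the pattern is an assumption covering "GRChNN" and "hgNN" labels.
find_build <- function(string, pattern = "(grch|hg)[0-9]+") {
    m <- regexpr(pattern, string, ignore.case = TRUE)
    if (m == -1L) return(NA_character_)
    regmatches(string, m)
}

to_ucsc("GRCh38")  # "hg38"

fname <- "SCENA_p_TCGAb29and30_SNP_N_GenomeWideSNP_6_G05_569110.nocnv_grch38.seg.txt"
find_build(fname)  # "grch38"
```

This sketch returns the match as it appears in the string; the package functions may normalize or validate build names differently.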
Sean Davis, M. Ramos
## Translate UUIDs >> TCGA Barcode
uuids <- c("0001801b-54b0-4551-8d7a-d66fb59429bf",
    "002c67f2-ff52-4246-9d65-a3f69df6789e",
    "003143c8-bbbf-46b9-a96f-f58530f4bb82")

UUIDtoBarcode(uuids, id_type = "file_id", end_point = "sample")

UUIDtoBarcode("ae55b2d3-62a1-419e-9f9a-5ddfac356db4", id_type = "case_id")

## Translate TCGA Barcode >> UUIDs
fullBarcodes <- c("TCGA-B0-5117-11A-01D-1421-08",
    "TCGA-B0-5094-11A-01D-1421-08",
    "TCGA-E9-A295-10A-01D-A16D-09")

sample_ids <- TCGAbarcode(fullBarcodes, sample = TRUE)

barcodeToUUID(sample_ids)

participant_ids <- c("TCGA-CK-4948", "TCGA-D1-A17N",
    "TCGA-4V-A9QX", "TCGA-4V-A9QM")

barcodeToUUID(participant_ids)

translateBuild("GRCh35", "UCSC")

extractBuild(
    "SCENA_p_TCGAb29and30_SNP_N_GenomeWideSNP_6_G05_569110.nocnv_grch38.seg.txt"
)