GSK Cancer Cell Line Genomic Profiling Data
Marking another positive step in the collaborative fight against cancer, GlaxoSmithKline (GSK) has released genomic profiling data for more than 300 cancer cell lines via the National Cancer Institute’s cancer Biomedical Informatics Grid™ (caBIG®). Cancer cell lines can be manipulated in the laboratory and have been used extensively by GSK in the discovery and development of novel cancer therapeutics. These data are available through caArray.
caArray is an open source microarray data management system that allows users to submit, annotate, and download microarray data. caArray was developed using the caBIG compatibility guidelines, as well as the Microarray Gene Expression Data (MGED) Society standards for microarray data. Learn more about the caArray tool.
Data usage guidelines
This data set was generated and provided by GlaxoSmithKline. Any publication or presentation of results utilizing this data set will include (i) the appropriate cell line sourcing reference (e.g. ECACC/HPACC, DSMZ, ATCC); (ii) the catalog reference number; and (iii) in the case of ECACC/HPACC, an accurate reference to the work of the original depositor into ECACC/HPACC. The cell line sourcing reference and the catalog number are provided as Source Annotations in the Experiment and also appear in the tab-delimited .SDRF file that is available as a Supplemental File with this Experiment. The original depositor references are provided as Publications associated with this Experiment. Any publications or presentations generated from use of this data set should include appropriate acknowledgement to GSK.
Citing this dataset
The following citation should be used in association with this dataset:
Greshock J, Bachman KE, Degenhardt YY, Jing J, Wen YH, Eastman S, McNeil E, Moy C, Wegrzyn R, Auger K, Hardwicke MA, Wooster R. Cancer Res. 2010 May 1;70(9):3677-86. Epub 2010 Apr 20.
Links to data in caArray
https://array.nci.nih.gov/caarray/project/woost-00035 - SNP profiling data for the cancer cell lines, generated using the Affymetrix GeneChip® 500K Mapping Set (Mapping250K_Nsp and Mapping250K_Sty).
https://array.nci.nih.gov/caarray/project/woost-00041 - Transcript profiling data for the cancer cell lines, generated using Affymetrix GeneChip® U133 Plus 2.0 arrays.
For both of these datasets, the following information is provided in caArray:
- Overview tab: Summary information about the Experiment
- Contacts tab: GSK contact for the Experiment
- Annotations tab: Detailed annotations about the cell lines and their characteristics
- Data tab: Array data files available for local download
- Publications tab: Publications relevant to cell lines used in these studies
Links to data on FTP site
Because these datasets are very large, we also host them on an FTP site for convenient bulk download. If you want the entire dataset, we strongly encourage you to download the data via this route.
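For scripted bulk download, the following is a minimal Python sketch using the standard-library ftplib module, not an official client. It assumes that the FTP server accepts anonymous login, that each directory listed in this section is flat (files only, no subdirectories), and that the server is reachable from your network. The function names split_ftp_url and download_directory are hypothetical helpers introduced here for illustration.

```python
from ftplib import FTP
from pathlib import Path
from urllib.parse import urlparse


def split_ftp_url(url):
    """Split an ftp:// URL into (host, remote directory)."""
    parts = urlparse(url)
    return parts.hostname, parts.path


def download_directory(url, dest_dir):
    """Download every file in a flat FTP directory into dest_dir.

    Assumptions: anonymous login is accepted and the directory
    contains only regular files (RETR fails on subdirectories).
    """
    host, remote_dir = split_ftp_url(url)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with FTP(host) as ftp:
        ftp.login()          # anonymous login (assumed to be allowed)
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            # Some servers return paths; keep only the file name locally.
            local_path = dest / Path(name).name
            with open(local_path, "wb") as out:
                ftp.retrbinary(f"RETR {name}", out.write)


# Pure helper, safe to run without a network connection:
host, remote_dir = split_ftp_url("ftp://caftpd.nci.nih.gov/pub/caARRAY/SNP")
```

Calling download_directory("ftp://caftpd.nci.nih.gov/pub/caARRAY/SNP", "snp_data") would then attempt the full transfer. Check the total download size given for each directory and your available disk space before starting.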
ftp://caftpd.nci.nih.gov/pub/caARRAY/SNP - Contains all raw data (*.cel) files for the cancer cell lines run on the Affymetrix GeneChip® 500K Mapping Set (Mapping250K_Nsp and Mapping250K_Sty). In addition, the MAGE-TAB document sets describing these data sets are provided. GSK_500K.sdrf provides the mapping of the array data files to the corresponding samples.
Total download size: 18.23 GB
ftp://caftpd.nci.nih.gov/pub/caARRAY/transcript_profiling - Contains all raw data (*.cel) and MAS5-normalized (*.txt) files for the cancer cell lines run on the Affymetrix GeneChip® U133 Plus 2.0 arrays. In addition, the MAGE-TAB document sets describing these data sets are provided. GSK_RNA.sdrf provides the mapping of the array data files to the corresponding samples.
Total download size: 6.05 GB
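Because the .sdrf files are tab-delimited, the file-to-sample mapping can be read with a few lines of Python. The sketch below is illustrative only: it assumes the files use the standard MAGE-TAB column headers "Source Name" and "Array Data File", which has not been verified against GSK_500K.sdrf or GSK_RNA.sdrf, and the rows in the example are made up. The function name map_files_to_samples is a hypothetical helper.

```python
import csv
import io


def map_files_to_samples(sdrf_text,
                         file_col="Array Data File",
                         sample_col="Source Name"):
    """Return {array data file name: sample name} from SDRF text.

    The default column names are standard MAGE-TAB headers and are
    an assumption about the GSK files; pass other names if needed.
    """
    reader = csv.reader(io.StringIO(sdrf_text), delimiter="\t")
    header = next(reader)
    # SDRF headers can repeat (e.g. several "Protocol REF" columns),
    # so look columns up by the position of their first occurrence.
    file_idx = header.index(file_col)
    sample_idx = header.index(sample_col)
    mapping = {}
    for row in reader:
        if len(row) <= max(file_idx, sample_idx):
            continue  # skip short or blank lines
        if row[file_idx]:
            mapping[row[file_idx]] = row[sample_idx]
    return mapping


# Made-up rows for illustration; not taken from the GSK files.
example = (
    "Source Name\tArray Data File\n"
    "cell_line_A\tcell_line_A_Nsp.cel\n"
    "cell_line_A\tcell_line_A_Sty.cel\n"
)
mapping = map_files_to_samples(example)
```

To apply this to a downloaded file, read its contents as text (for example, open("GSK_500K.sdrf", newline="").read()) and pass the result to map_files_to_samples. If the headers differ, a ValueError from header.index indicates which column name needs adjusting.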