
The caBIG program has been retired, and while this website is being maintained temporarily to prevent broken links and provide access to information on the subset of caBIG projects that were transitioned into the new NCIP program, it will be archived in the near future. For information on the NCI's biomedical informatics program, please visit http://ncip.nci.nih.gov.

The information and links on this website are no longer being updated and are provided for reference purposes only.


GSK Cancer Cell Line Genomic Profiling Data

Marking another positive step in the collaborative fight against cancer, GlaxoSmithKline (GSK) has released genomic profiling data for over 300 cancer cell lines via the National Cancer Institute’s cancer Biomedical Informatics Grid™ (caBIG®). Cancer cell lines can be manipulated in the laboratory and have been used extensively by GSK in the discovery and development of novel cancer therapeutics. These data are available through caArray.

caArray is an open-source microarray data management system that allows users to submit, annotate, and download microarray data. caArray was developed using the caBIG compatibility guidelines, as well as the Microarray Gene Expression Data (MGED) Society standards for microarray data. Learn more about the caArray tool.

Data usage guidelines

This data set was generated and provided by GlaxoSmithKline. Any publication or presentation of results utilizing this data set will include (i) the appropriate cell line sourcing reference (e.g., ECACC/HPACC, DSMZ, ATCC); (ii) the catalog reference number; and (iii) in the case of ECACC/HPACC, an accurate reference to the work of the original depositor into ECACC/HPACC. The cell line sourcing reference and the catalog number are provided as Source Annotations in the Experiment and are also listed in the tab-delimited .SDRF file that is available as a Supplemental File with this Experiment. The original depositor references are provided as Publications associated with this Experiment. Any publications or presentations generated from use of this data set should include appropriate acknowledgement of GSK.

Citing this dataset

The following citation should be used in association with this dataset:

Greshock J, Bachman KE, Degenhardt YY, Jing J, Wen YH, Eastman S, McNeil E, Moy C, Wegrzyn R, Auger K, Hardwicke MA, Wooster R. Cancer Res. 2010 May 1;70(9):3677-86. Epub 2010 Apr 20.

Links to data in caArray

https://array.nci.nih.gov/caarray/project/woost-00035 - SNP profiling data for the cancer cell lines, generated using the Affymetrix GeneChip® 500K Mapping Set (Mapping250K_Nsp and Mapping250K_Sty).

https://array.nci.nih.gov/caarray/project/woost-00041 - Transcript profiling data for the cancer cell lines, generated using Affymetrix GeneChip® U133 Plus 2.0 arrays.

For both of these datasets, the following information is provided in caArray:

  • Overview tab: Summary information about the Experiment
  • Contacts tab: GSK contact for the Experiment
  • Annotations tab: Detailed annotations about the cell lines and their characteristics
  • Data tab: Array data files available for local download
  • Publications tab: Publications relevant to cell lines used in these studies

Links to data on FTP site

Because these datasets are very large, they are also hosted on an FTP site for convenient bulk download. If you want to download an entire dataset, we strongly encourage you to use the FTP site.

ftp://caftpd.nci.nih.gov/pub/caARRAY/SNP - Contains all raw data (*.cel) files for the cancer cell lines run on the Affymetrix GeneChip® 500K Mapping Set (Mapping250K_Nsp and Mapping250K_Sty). The MAGE-TAB document sets describing these data sets are also provided. GSK_500K.sdrf maps the array data files to the corresponding samples.

Total download size: 18.23 GB

ftp://caftpd.nci.nih.gov/pub/caARRAY/transcript_profiling - Contains all raw data (*.cel) and MAS5-normalized (*.txt) files for the cancer cell lines run on the Affymetrix GeneChip® U133 Plus 2.0 arrays. The MAGE-TAB document sets describing these data sets are also provided. GSK_RNA.sdrf maps the array data files to the corresponding samples.

Total download size: 6.05 GB
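
For readers who want to use the SDRF files (GSK_500K.sdrf or GSK_RNA.sdrf) to look up which array data files belong to which cell line, the following is a minimal Python sketch. It assumes the files use the standard MAGE-TAB column headers "Source Name" and "Array Data File"; the actual headers in the GSK files have not been checked here, so confirm them against the first row of the file before relying on this. The function name map_files_to_samples and the example rows are hypothetical and are not taken from the GSK data.

```python
import csv
import io
from collections import defaultdict

# ASSUMPTION: these headers follow the MAGE-TAB SDRF convention. Check them
# against the header row of the real SDRF file and adjust if they differ.
SAMPLE_COLUMN = "Source Name"
FILE_COLUMN = "Array Data File"


def map_files_to_samples(sdrf_handle, sample_column=SAMPLE_COLUMN,
                         file_column=FILE_COLUMN):
    """Return {sample name: [array data file, ...]} from a tab-delimited SDRF."""
    reader = csv.DictReader(sdrf_handle, delimiter="\t")
    missing = {sample_column, file_column} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"SDRF is missing expected column(s): {sorted(missing)}")

    files_by_sample = defaultdict(list)
    for row in reader:
        # Short rows yield None for absent cells, so normalise to "".
        sample = (row[sample_column] or "").strip()
        data_file = (row[file_column] or "").strip()
        if sample and data_file:
            files_by_sample[sample].append(data_file)
    return dict(files_by_sample)


# Made-up rows for illustration only; NOT taken from the GSK SDRF files.
EXAMPLE_SDRF = (
    "Source Name\tArray Data File\n"
    "CELL_LINE_A\tcell_line_a_Nsp.cel\n"
    "CELL_LINE_A\tcell_line_a_Sty.cel\n"
    "CELL_LINE_B\tcell_line_b_Nsp.cel\n"
)

mapping = map_files_to_samples(io.StringIO(EXAMPLE_SDRF))
for sample, files in mapping.items():
    print(sample, files)
```

With a downloaded file, the same function can be given an open file handle instead of the in-memory example, for instance open("GSK_500K.sdrf", newline=""). Grouping by sample rather than by file is a design choice: on the 500K Mapping Set each cell line is expected to have one Nsp and one Sty array, so a sample with fewer or more than two files is easy to spot.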

last modified 09-29-2011 03:34 PM