BIOLOGICAL SCIENCES / MEDICAL SCIENCES
Cytosine methylation profiling of cancer cell lines





*Sequenom, Inc., 3595 John Hopkins Court, San Diego, CA 92109; and
Ludwig Institute for Cancer Research, P.O. Box 2008, Royal Melbourne Hospital, Parkville, Victoria 3050, Australia
Contributed by Charles Cantor, December 27, 2007 (received for review December 14, 2007)
| Abstract |
|---|
colon cancer | DNA methylation | NCI-60 | MALDI-TOF
We compiled a set of >400 cancer-relevant genes and used them for a high-resolution scan of DNA methylation. The genes were selected to include a majority of the cancer consensus genes described by Futreal et al. (3) and a subset of known imprinted genes (www.geneimprint.com/). All genes were analyzed in 59 cell lines derived from nine different tumor types and in control DNA from six normal tissues. The cancer cell lines make up the National Cancer Institute (NCI) NCI-60 panel, which has been widely used for in vitro anticancer drug testing. Over the years, the NCI-60 set has become one of the best-characterized cell line collections available. These cell lines have also been analyzed with a variety of methods, including transcriptional profiling (refs. 4 and 5, and see http://dtp.nci.nih.gov/index.html), spectral karyotyping (6), and proteomic profiling (7). In addition, cytotoxicity profiles for >100,000 chemical compounds have been documented by the NCI's Developmental Therapeutics Program (DTP) (http://dtp.nci.nih.gov/index.html and ref. 8).
Here, we report the results of a large-scale DNA methylation profiling study comprising the quantitative analysis of >500 genomic target regions representing >400 genes in 59 cell lines, with confirmation of a subset of targets in 48 colorectal cancer/normal tissue pairs. The resulting data provide a broad panel of cancer-related DNA methylation changes that can be integrated with previous mutational, transcriptional, and proteomic datasets to obtain a more complete understanding of neoplastic transformation.
| Results |
|---|
The initial methylation data were filtered to remove poor-quality measurements, defined as amplicons with data available for <75% of all samples; these amplicons were excluded from further analysis. The filtered dataset contained 531 amplicons representing 430 genes. Among the excluded amplicons, failed PCR amplification was the most common cause of reaction failure.
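The quality filter amounts to a per-amplicon coverage threshold. The following is a minimal sketch, assuming a hypothetical in-memory layout in which each amplicon ID maps to a list of per-sample methylation values and a failed measurement is stored as `None`; amplicons measured in at least 75% of samples are kept, which mirrors the exclusion rule of <75%.

```python
from typing import Dict, List, Optional

# Hypothetical layout: amplicon ID -> per-sample methylation values (None = no usable data).
Measurements = Dict[str, List[Optional[float]]]


def filter_amplicons(data: Measurements, min_coverage: float = 0.75) -> Measurements:
    """Keep amplicons with usable data in at least `min_coverage` of all samples."""
    kept: Measurements = {}
    for amplicon, values in data.items():
        if not values:
            continue
        n_measured = sum(v is not None for v in values)
        # Amplicons with data for <75% of samples are excluded.
        if n_measured / len(values) >= min_coverage:
            kept[amplicon] = values
    return kept


# Illustrative values only (not data from this study).
example = {
    "AMP_A": [0.10, 0.20, 0.15, 0.30],  # 4/4 samples measured -> kept
    "AMP_B": [0.50, None, 0.40, 0.45],  # 3/4 = 75% -> kept
    "AMP_C": [None, None, 0.80, 0.70],  # 2/4 = 50% -> excluded
}
print(sorted(filter_amplicons(example)))  # ['AMP_A', 'AMP_B']
```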
All autosomes and the X chromosome are represented in the current gene set. The median amplicon length was 413 bp (range 171–683 bp), and the median CpG content was 33 CpGs per amplicon (range 6–81). For each sample, a total of 11,723 CpG sites were analyzed. The base-specific cleavage chemistry used here does not always allow a quantitative readout for every individual CpG in an amplicon; some values represent the methylation state of a short stretch of adjacent CpG sites, which we refer to as CpG units. In this study, the 11,723 CpG sites were represented by 7,216 CpG units. To reduce the complexity of the dataset, we computed an amplicon-specific mean methylation value for each sample and used these values for subsequent analyses. Sequence-specific details are given in supporting information (SI) Table 2. We analyzed DNA methylation in the NCI-60 panel of 59 cell lines and used six commercially available DNAs from adult tissues as "healthy" control samples.
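The reduction from CpG units to one value per amplicon and sample can be sketched as below. The text does not state whether CpG units were weighted by the number of CpG sites they contain, so this sketch assumes an unweighted mean by default and shows CpG-count weighting only as a hypothetical alternative; CpG units that could not be quantified are skipped.

```python
from statistics import mean
from typing import List, Optional, Tuple

# Hypothetical input for one amplicon in one sample:
# (number of CpG sites in the CpG unit, methylation ratio 0-1 or None if not quantifiable).
CpGUnit = Tuple[int, Optional[float]]


def amplicon_mean(units: List[CpGUnit], weight_by_cpg: bool = False) -> Optional[float]:
    """Summarize CpG-unit methylation values into one value per amplicon."""
    measured = [(n_cpg, m) for n_cpg, m in units if m is not None]
    if not measured:
        return None
    if weight_by_cpg:
        # Alternative: units covering more CpG sites count proportionally more.
        total_cpg = sum(n_cpg for n_cpg, _ in measured)
        return sum(n_cpg * m for n_cpg, m in measured) / total_cpg
    # Default: unweighted mean over the quantifiable CpG units.
    return mean(m for _, m in measured)


# Illustrative values only.
units = [(1, 0.10), (3, 0.50), (2, None), (1, 0.30)]
print(amplicon_mean(units))                      # about 0.30
print(amplicon_mean(units, weight_by_cpg=True))  # about 0.38
```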
Stability. All bisulfite-based methylation analysis methods suffer from considerable measurement variability introduced by the chemical treatment of genomic DNA. To assess the degree of this inherent variability, we previously dissected the method into its four components and measured the variability of each step (10). The results demonstrated that the greatest source of process-dependent variability is the bisulfite conversion reaction (SD = 10–15%). To determine whether these results apply to the model system used in this study, we performed duplicate measurements of four control DNAs in 96 amplicons and found the data to be highly reproducible (R² = 0.98; SI Fig. 5). We also evaluated the effect of primer design on the quantitative measurements by designing two different but overlapping amplicons for the ERBB2 gene. The quantitative values from both reactions were almost identical and highly correlated (R² = 0.96; see also Fig. 1a).
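The reproducibility figures (R² = 0.98 for duplicates, R² = 0.96 for the two ERBB2 amplicons) are coefficients of determination between paired measurements. A minimal sketch, assuming R² here is the squared Pearson correlation of paired per-amplicon (or per-CpG-unit) methylation values; the replicate values are illustrative, not data from this study.

```python
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation between two paired series of measurements."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired series with at least two values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


# Illustrative duplicate measurements (methylation ratios).
replicate_1 = [0.05, 0.12, 0.40, 0.78, 0.91]
replicate_2 = [0.07, 0.10, 0.44, 0.75, 0.93]
print(round(r_squared(replicate_1, replicate_2), 3))
```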
Recent studies have shown reduced epigenetic marking in a 1-kb window around the transcription start site: in active Drosophila melanogaster promoters, histone occupancy is decreased, and in normal human tissue samples, DNA methylation is reduced within this core region (11, 12). To investigate this relationship further, we mapped the distance of each measured CpG in the dataset from the 5' UTR (>700,000 data points). CpG methylation in normal samples showed the expected core window of unmethylated CpG sites within 1 kb around the 5' UTR (Fig. 1e). In cancer cell lines, average methylation levels are generally elevated, but the same symmetrical decrease in methylation is observed. These results thus confirm previous findings and extend them to cancer cell lines (see also SI Text and SI Fig. 6).
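Mapping methylation against distance from the gene start reduces to computing a strand-aware signed distance for each CpG and averaging methylation within distance bins. The sketch below assumes hypothetical inputs (genomic positions, one start coordinate and strand per gene, 250-bp bins, a ±2-kb window); it illustrates the idea and is not the RMySQL/UCSC-based pipeline described in Methods.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def signed_distance(cpg_pos: int, gene_start: int, strand: str) -> int:
    """Distance of a CpG from the gene start; negative = upstream, positive = downstream."""
    d = cpg_pos - gene_start
    return d if strand == "+" else -d


def binned_profile(
    points: Iterable[Tuple[int, float]], bin_size: int = 250, max_dist: int = 2000
) -> Dict[int, float]:
    """Mean methylation per distance bin, keyed by the bin's left edge in bp."""
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for dist, meth in points:
        if abs(dist) > max_dist:
            continue
        left_edge = (dist // bin_size) * bin_size  # floor division also handles negatives
        sums[left_edge] += meth
        counts[left_edge] += 1
    return {edge: sums[edge] / counts[edge] for edge in sorted(sums)}


# Illustrative (distance, methylation) pairs, not data from this study.
points = [
    (signed_distance(9_900, 10_000, "+"), 0.10),
    (signed_distance(9_800, 10_000, "+"), 0.30),
    (signed_distance(9_900, 10_000, "-"), 0.20),
    (signed_distance(15_000, 10_000, "+"), 0.90),  # outside the window
]
print(binned_profile(points))
```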
Methylation-Based Cell-Line Clustering. To examine relationships among cell lines and CpG sites, we performed an unsupervised two-dimensional hierarchical clustering analysis, which provides an unbiased view of these relationships (Fig. 2).
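The study used `heatmap.2` in R (see Methods), and the linkage and distance settings are not stated here. As an illustration only, the following sketch implements naive agglomerative clustering with average linkage on Euclidean distance (an assumed choice). Applying the same function to the transposed matrix clusters amplicons instead of samples, which gives the second dimension of a two-dimensional clustering. Sample and amplicon names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_linkage(profiles: Dict[str, Sequence[float]], n_clusters: int) -> List[List[str]]:
    """Merge the two closest clusters (mean pairwise distance) until n_clusters remain."""
    clusters: List[List[str]] = [[name] for name in profiles]

    def cluster_distance(c1: List[str], c2: List[str]) -> float:
        total = sum(euclidean(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters


def transpose(matrix: Dict[str, Sequence[float]], column_names: Sequence[str]) -> Dict[str, List[float]]:
    """Turn sample -> values into column -> values, to cluster the other dimension."""
    rows = list(matrix.values())
    return {col: [row[k] for row in rows] for k, col in enumerate(column_names)}


# Hypothetical mean methylation of three amplicons in four samples.
profiles = {
    "N1": [0.05, 0.10, 0.10],
    "N2": [0.10, 0.05, 0.10],
    "C1": [0.80, 0.70, 0.90],
    "C2": [0.75, 0.80, 0.85],
}
print(average_linkage(profiles, 2))
print(average_linkage(transpose(profiles, ["AMP1", "AMP2", "AMP3"]), 2))
```

A full dendrogram would record every merge height; this sketch only returns the final flat clusters.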
Confirmation of Cell-Line Results in Colon Cancer Samples.
It remains unclear whether the observed methylation differences are a consequence of manipulating the cell lines during in vitro growth or whether they represent cancer-specific characteristics. Our model system therefore carries a risk of overinterpreting the detected methylation differences. To explore the validity of our findings, we chose the colon cancer cell-line models for confirmation in clinical samples. We selected a set of 50 genes that showed significant differential methylation (ΔM > 20%, P < 0.001, two-sided t test) in the colon cancer cell lines. To assess the specificity of our findings, we also selected 14 genes that did not show any methylation differences in the cell lines. We investigated the methylation status of these genes in 48 matched pairs of colon cancer tissue and adjacent normal colon tissue. The majority of patients were male (30 male, 18 female); the median age at diagnosis was 65 years (range 46–83). Fourteen patients had experienced local or distant cancer recurrence, and all stages (I–IV) were evenly represented. The analysis of methylation differences between normal and cancer tissue samples confirmed the cell-line findings for the majority of genes: of the 50 differentially methylated genes, 42 (84%) were also significantly differentially methylated in the clinical tissue samples. Additionally, none of the 14 genes without a methylation difference in the cell lines was differentially methylated in the clinical samples (SI Table 2).
We next used the methylation patterns to characterize relationships among the colon cancer samples and to explore potential associations with their clinical features. None of the clinical features correlated strongly with the resulting colon cancer methylation groups (SI Fig. 7a). We explored the similarity between methylation patterns of cell-line samples and their tissue counterparts by hierarchical clustering (Fig. 3). As expected, the normal control tissues grouped with the normal colon tissue samples, and the colon cancer cell lines grouped with the colon cancer tissue samples. However, the segregation of normal and colon cancer tissue samples was not perfect: a subset of colon cancer samples (n = 10) clustered with the normal tissue samples (Fig. 3).
Finally, we compared our findings with results from a recent methylation study of colon cancer tumors that analyzed DNA methylation with a different technology (16). A total of 38 genes were shared by both datasets, and the results of the two studies are in good agreement (92% concordance). Nine genes were hypermethylated in colon cancer in both datasets, and 26 genes showed no colon cancer-specific methylation in either dataset. Two genes were identified as hypermethylated only in the previous study, and one gene only in our study.
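The 92% concordance follows directly from the four counts given above: genes with the same call in both studies divided by all shared genes. A minimal sketch of the arithmetic (the function name is hypothetical):

```python
def concordance(both_hyper: int, both_not: int, only_previous: int, only_ours: int) -> float:
    """Fraction of shared genes that received the same call in both studies."""
    total = both_hyper + both_not + only_previous + only_ours
    return (both_hyper + both_not) / total


# Counts for the 38 genes shared with the earlier colon cancer study.
shared = (9, 26, 2, 1)
print(sum(shared))                     # 38
print(round(concordance(*shared), 3))  # 0.921
```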
Differentially Methylated Genes. Tissue-specific DNA methylation has been observed in normal tissues (17), and several cancer-specific methylation markers have been described. Whether such markers are specific to a single cancer type remains unclear; several have been found to be differentially methylated in multiple cancer types and might be more universally involved in cancer progression. Here, we attempted to identify groups of genes that are differentially methylated between each type of cancer cell line and normal tissues. We then examined the individual groups and determined which genes overlap across multiple cancer types and which are found only in specific tumor types.
Because many genetic loci were tested in many separate runs (one for each cell line), the results will contain false positives arising from multiple testing in a high-dimensional dataset. Although it does not eliminate the problem completely, we included a minimum methylation difference of 20% as an additional selection criterion to filter out false positives. We classified a gene as differentially methylated when the difference in methylation between the normal samples and the subset of cancer cell lines was >20% and the P value of a two-sided t test was <0.001.
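The selection rule (absolute difference >20% and two-sided t test P < 0.001) can be sketched with the standard library alone. This sketch assumes methylation values on a 0 to 100% scale and uses Welch's unequal-variance t test, the default form of R's `t.test`; the text does not say whether the equal-variance form was used instead. The two-sided P value comes from the regularized incomplete beta function, evaluated with the standard continued-fraction (Lentz) method. Function names and example values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Optional, Sequence


def _beta_cf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step of the continued fraction.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def _betai(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _beta_cf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: float) -> float:
    """P(|T| >= |t|) for Student's t distribution with df degrees of freedom."""
    return _betai(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided Welch t test P value (assumes non-zero variance in at least one group)."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t_two_sided_p(t, df)


def call_differential(normal: Sequence[float], cancer: Sequence[float],
                      min_diff: float = 20.0, alpha: float = 0.001) -> Optional[str]:
    """Return 'hyper', 'hypo', or None using the >20% and P < 0.001 criteria."""
    diff = mean(cancer) - mean(normal)
    if abs(diff) > min_diff and welch_t_test(cancer, normal) < alpha:
        return "hyper" if diff > 0 else "hypo"
    return None


# Hypothetical percent methylation of one amplicon in normal tissues and one tumor type.
normal = [5, 6, 4, 5, 7, 5]
cancer = [60, 65, 58, 70, 62]
print(call_differential(normal, cancer))  # hyper
```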
The results for the leukemia and prostate cancer cell lines should be viewed with some caution. The leukemia samples form a biologically heterogeneous group, and prostate cancer is the smallest subgroup, containing only two cell lines that, in addition, do not show prostate cancer-specific gene-expression signatures (5). Their results are therefore less likely to be representative.
A total of 71 genes were significantly hypermethylated in at least one tumor type. A large fraction of these genes (n = 30, 42%) were hypermethylated in only one tumor type, and nearly 10% (n = 7) in five or more tumor types (TSPYL, PAX8, LEP, PHOX2B, and TMPRSS2 in five tumor types; MYOD1 in six; and PAX5 in eight).
Seven genes were hypomethylated (TCL1A, SLC22A2, TRPM5, IGF2, and PEG3 in one tumor type; KCNQ1 and DLK1 in two tumor types). As suggested by our previous analysis (SI Fig. 8), CNS neoplasms (n = 4) and melanomas (n = 3) had the highest numbers of hypomethylated genes. Interestingly, almost all of the hypomethylated genes are known to be imprinted, which might point to a loss of imprinting in these cases (SI Table 3).
PRC2 Target Identification for Colon Cancer and Other Tumor Types. A retrospective analysis of DNA methylation by Widschwendter et al. provided evidence that genes targeted by Polycomb repressive complex 2 (PRC2) are silenced in human colon cancer (18).
We were able to retrieve information about PRC2 binding sites for 440 amplicons, including 79 amplicons with more than one PRC2-binding site. We calculated the fraction of amplicons containing one or more PRC2 binding sites in two sets: genes that did not show significant methylation differences and genes that did show significant methylation differences between cancer cell lines and normal tissue. PRC2 targets were significantly enriched (P < 0.001, Fisher's exact test) among the hypermethylated genes in six of the nine tumor types. Among the tumor types with sufficient numbers of samples (excluding leukemia and prostate), only the melanoma-specific gene set was not enriched for PRC2 targets; all other tumor types were 2- to 6-fold enriched (Table 1). A graphical representation of gene–tumor associations reveals that highly connected genes also tend to be PRC2 targets (Fig. 4 and SI Fig. 9).
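The enrichment test compares how often amplicons carry a PRC2 binding site in the hypermethylated versus the non-differential gene set, which is a 2 × 2 contingency table. Below is a standard-library sketch of a two-sided Fisher's exact test (summing the probabilities of all tables no more likely than the observed one, the convention used by R's `fisher.test`) together with the fold enrichment. The table layout and the counts in the usage lines are hypothetical, not the values behind Table 1.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Rows: hypermethylated / not differential; columns: PRC2 target / no PRC2 site.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell given fixed margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = table_prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    # A small relative tolerance counts numerically tied tables as "as extreme".
    p = sum(table_prob(x) for x in range(lo, hi + 1) if table_prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def fold_enrichment(a: int, b: int, c: int, d: int) -> float:
    """PRC2-target fraction among hypermethylated genes relative to non-differential genes."""
    return (a / (a + b)) / (c / (c + d))


# Hypothetical counts: 12 of 20 hypermethylated vs. 50 of 400 non-differential amplicons.
print(round(fold_enrichment(12, 8, 50, 350), 2))  # 4.8
print(fisher_exact_two_sided(12, 8, 50, 350))
```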
| Discussion |
|---|
We are aware that comparing cancer cell-line samples with normal tissue DNA samples is a major limitation of our study. It remains unclear what fraction of the observed methylation changes should be attributed to changes acquired by the cell lines in vitro. Epigenetic instability has been reported in cultured embryonic stem cells (21), but high-resolution scans have not been performed in cancer cell lines. Expression profiles of glioblastoma cell lines (U251 and U87) have been shown to diverge markedly between in vitro culture and in vivo growth as s.c. or intracerebral xenografts (22). To assess the impact of these limitations and to evaluate the practical applicability of our findings, we verified the results in the colon cancer model. Using 96 clinical samples from 48 patients, we confirmed 84% (42 of 50) of the differentially methylated genes detected in the cell-line model. Eight genes no longer showed a significant methylation difference, which might be because the methylation differences between colon cancer and normal colon tissue samples were generally smaller than those observed in cell lines. To exclude overinterpretation of the results due to nonspecific promoter hypermethylation, we also analyzed a set of genes that did not show methylation differences in the cell-line model, as well as one gene that showed hypomethylation. We confirmed unchanged methylation in 12 genes with low methylation levels (<20%) and in two genes with higher methylation levels (RUNX3 = 40% and TRPM5 = 80%). KCNQ1, a gene known to be imprinted, was hypomethylated in both colon cancer cell lines and colon cancer tissues.
We further compared our results with findings from another comprehensive methylation study of colon cancer tissue samples (16). The results of the two studies are highly concordant. The few observed discrepancies might simply reflect differences in the interrogated sites; unfortunately, we could not investigate this issue because the publication did not report the exact genomic locations of the analyzed amplicons.
One of the most interesting results of this study was that our findings could be integrated into a recently developed biological model involving Polycomb repressive complex 2 (PRC2). So far, most functional studies have been performed on stem cells, and association studies have mainly focused on colon cancer samples or colon cancer cell lines (18, 23). Our study provides quantitative methylation data for at least seven different tumor types (excluding leukemias and prostate cancer). For every tumor type, a different but partially overlapping set of genes showed higher methylation levels than in normal samples. Interestingly, six of these seven sets of hypermethylated genes were enriched for PRC2 target genes. These findings suggest that PRC2 target methylation is a common event in cancer and is not limited to individual tumor types.
Our study also confirms the feasibility of the current method for large-scale, quantitative, high-resolution mapping of DNA methylation. The approach is well suited for large-scale validation of genome-wide methylation studies and for expanding the current dataset by analyzing additional regions.
We hope that the availability of this large-scale methylation dataset will stimulate interdisciplinary research that integrates multiple available datasets. Such integration might help to design and execute experiments that address some of the questions that remain unresolved.
| Methods |
|---|
Clinical Colon Cancer Tissue. Colorectal cancer and normal colon tissue samples were obtained from the Royal Melbourne Hospital (RMH) Tissue Bank as part of the Ludwig Colon Cancer Initiative biomarker project. Samples were obtained with informed consent, under an approved protocol, from patients undergoing resection for histologically confirmed colorectal cancer; the matched normal tissue was taken from the same resection specimen at a site adjacent to the tumor.
Tissue samples were snap-frozen in liquid nitrogen within 30 min of collection and stored at –80°C. Matched tissue sample pairs were cut into 2-mm cube sections weighing 10–15 mg. After manual dissection, DNA was extracted from the tissue sections with a Qiagen DNeasy blood and tissue kit. Briefly, samples were lysed by Proteinase K digestion for 3 h at 56°C, followed by selective binding of DNA to a spin-column membrane, washing, and elution of the DNA in a buffer solution. DNA was quantified with a biophotometer, and A260/A280 ratios were in the range of 1.7–2.0. DNA samples were normalized to a concentration of 50 µg/ml.
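Normalizing each extract to 50 µg/ml, and later drawing the 1 µg of DNA used for bisulfite conversion, is C1V1 = C2V2 arithmetic plus a purity check against the stated A260/A280 range. A minimal sketch; the function names and the example stock concentration are hypothetical (concentrations in µg/ml, volumes in µl).

```python
from typing import Tuple

TARGET_UG_PER_ML = 50.0  # normalized DNA concentration


def normalization_volumes(stock_ug_per_ml: float, final_ul: float,
                          target_ug_per_ml: float = TARGET_UG_PER_ML) -> Tuple[float, float]:
    """Return (stock volume, diluent volume) in µl via C1 * V1 = C2 * V2."""
    if stock_ug_per_ml < target_ug_per_ml:
        raise ValueError("stock is already below the target concentration")
    stock_ul = target_ug_per_ml * final_ul / stock_ug_per_ml
    return stock_ul, final_ul - stock_ul


def input_volume_ul(mass_ug: float, conc_ug_per_ml: float = TARGET_UG_PER_ML) -> float:
    """Volume (µl) that holds `mass_ug` of DNA at the given concentration."""
    return mass_ug * 1000.0 / conc_ug_per_ml  # 1 ml = 1000 µl


def purity_ok(a260_a280: float, low: float = 1.7, high: float = 2.0) -> bool:
    """Check the A260/A280 ratio against the 1.7 to 2.0 range."""
    return low <= a260_a280 <= high


print(normalization_volumes(200.0, 100.0))  # (25.0, 75.0)
print(input_volume_ul(1.0))                 # 20.0
print(purity_ok(1.85), purity_ok(1.5))      # True False
```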
Clinical data for patients and histopathological data for tumor samples were derived from the Australian Comprehensive Clinical Outcomes and Research Database (ACCORD) of the Bio21 Molecular Medicine Informatics Model (Bio21:MMIM) (http://mmim.ssg.org.au). This unique resource comprises datasets physically located at various organizations that can be integrated, searched, and queried seamlessly through a federated data integrator. All patients consented to their data being captured (data are deidentified where appropriate), and all data collection and linkage are approved by the relevant human research ethics committees.
DNA Methylation Analysis. Bisulfite treatment. Sodium bisulfite conversion of genomic DNA was performed with the EZ-96 DNA Methylation Kit (Zymo Research), following the manufacturer's protocol with 1 µg of genomic DNA and the alternative conversion protocol (a two-temperature DNA denaturation).
Methylation analysis. Quantitative methylation analysis was performed on Sequenom's MassARRAY platform, which combines MALDI-TOF mass spectrometry with RNA base-specific cleavage (MassCLEAVE); the resulting mass pattern of cleavage fragments is then analyzed for methylation status. PCR primers were designed with MethPrimer (www.urogene.org/methprimer/). Where feasible, amplicons were designed to cover CpG islands (CGIs) in the region of the 5' UTR. Each reverse primer carried an additional T7 promoter tag for in vitro transcription, and each forward primer carried a 10-mer tag to adjust for melting-temperature differences. The MassCLEAVE biochemistry was performed as previously described (9; for details see also SI Text). Mass spectra were acquired with a MassARRAY Compact MALDI-TOF (Sequenom), and methylation ratios were calculated from the spectra with EpiTyper software v1.0 (Sequenom).
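The principle behind the methylation ratios can be illustrated with a simplified model. After bisulfite treatment, a methylated CpG is read as a G instead of an A in the reverse-strand transcript, which shifts the cleavage fragment mass by about 16 Da per methylated site, so partially methylated fragments produce a ladder of peaks. The sketch below assumes that the peak intensities for 0, 1, ..., k methylated CpGs of one CpG unit have already been extracted; it illustrates the idea and is not the algorithm implemented in the EpiTyper software.

```python
from typing import List, Optional, Sequence

G_MINUS_A_DA = 16.0  # approximate mass difference between a G and an A nucleotide


def expected_peak_masses(unmethylated_mass_da: float, n_cpg: int) -> List[float]:
    """Masses of fragment variants carrying 0..n_cpg methylated CpG sites."""
    return [unmethylated_mass_da + j * G_MINUS_A_DA for j in range(n_cpg + 1)]


def methylation_ratio(intensities: Sequence[float]) -> Optional[float]:
    """Mean methylation of a CpG unit from its peak intensities.

    intensities[j] is the signal of the variant with j methylated CpG sites.
    """
    n_cpg = len(intensities) - 1
    total = sum(intensities)
    if n_cpg < 1 or total <= 0:
        return None
    methylated_sites = sum(j * signal for j, signal in enumerate(intensities))
    return methylated_sites / (n_cpg * total)


# Hypothetical fragment mass and peak intensities.
print(expected_peak_masses(2000.0, 2))       # [2000.0, 2016.0, 2032.0]
print(methylation_ratio([30.0, 70.0]))       # 0.7
print(methylation_ratio([50.0, 0.0, 50.0]))  # 0.5
```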
Statistical Methods. All statistical analyses were performed in the R statistical environment (www.r-project.org). Distances from gene start sites were calculated with the RMySQL package and the SQL database version of the UCSC genome browser (http://genome.ucsc.edu/cgi-bin/hgGateway). Two-dimensional clustering was performed with the heatmap.2 function in the gregmisc package. Classical multidimensional scaling was performed with the cmdscale function, and the results were visualized with the scatterplot3d function from the package of the same name. Tests for statistical significance (t test, Wilcoxon test, and Fisher's exact test) were performed with the standard functions of R's built-in stats package.
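Classical multidimensional scaling, which is what R's `cmdscale` computes, double-centers the matrix of squared distances and scales its leading eigenvectors by the square roots of their eigenvalues. The standard-library sketch below obtains the leading eigenpairs by power iteration with deflation. This is a simplification that assumes a Euclidean-like distance matrix (non-negative leading eigenvalues) and reasonably well-separated eigenvalues; it is not the implementation used in the study.

```python
import math
import random
from typing import List, Sequence


def _matvec(m: List[List[float]], v: Sequence[float]) -> List[float]:
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def classical_mds(dist: Sequence[Sequence[float]], k: int = 2,
                  iters: int = 1000, seed: int = 0) -> List[List[float]]:
    """Embed n objects in k dimensions from an n x n symmetric distance matrix."""
    n = len(dist)
    d2 = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]  # symmetric, so row means equal column means
    grand_mean = sum(row_mean) / n
    # Double centering: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + grand_mean) for j in range(n)]
         for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * k for _ in range(n)]
    for comp in range(k):
        v = [rng.random() - 0.5 for _ in range(n)]
        for _ in range(iters):  # power iteration for the leading eigenvector
            w = _matvec(b, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        eigenvalue = sum(vi * wi for vi, wi in zip(v, _matvec(b, v)))
        if eigenvalue <= 1e-9:
            break  # no remaining positive dimension; leave coordinates at 0
        for i in range(n):
            coords[i][comp] = v[i] * math.sqrt(eigenvalue)
        # Deflate so that the next pass finds the next eigenpair.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


# Hypothetical example: four points at the corners of a unit square.
pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
print(classical_mds(dist, k=2))
```

The recovered coordinates match the input configuration only up to rotation and reflection, as with any classical MDS solution.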
For sequence-motif detection, we used a permutation-based method. We randomly sampled n sequences from the pool of all analyzed sequences, where n equals the number of sequences in the low- or high-methylation group, and counted how often each possible 6-mer (4^6 = 4,096) occurred in the sampled subset. One thousand permutations were performed for each analysis. A sequence motif was considered overrepresented if it occurred more often in the analyzed group of sequences than in any of the 1,000 random draws. Graphical representations of the gene–tissue relationships were generated with the dot algorithm implemented in Graphviz.
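The permutation scheme can be sketched as follows. The text does not specify whether a 6-mer was counted once per sequence or once per occurrence, so this sketch assumes total overlapping occurrences on the given strand only; `group` and `pool` are hypothetical inputs standing for the low- or high-methylation sequences and all analyzed sequences, respectively.

```python
import random
from collections import Counter
from typing import Iterable, List, Sequence


def kmer_counts(seqs: Iterable[str], k: int = 6) -> Counter:
    """Total (overlapping) occurrences of each k-mer across the sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def overrepresented_kmers(group: Sequence[str], pool: Sequence[str], k: int = 6,
                          n_perm: int = 1000, seed: int = 0) -> List[str]:
    """k-mers seen more often in `group` than in every random draw of equal size from `pool`."""
    rng = random.Random(seed)
    pool_list = list(pool)
    observed = kmer_counts(group, k)
    max_random: Counter = Counter()  # highest count of each k-mer over all random draws
    for _ in range(n_perm):
        draw = rng.sample(pool_list, len(group))
        for kmer, count in kmer_counts(draw, k).items():
            if count > max_random[kmer]:
                max_random[kmer] = count
    return sorted(kmer for kmer, count in observed.items() if count > max_random[kmer])


# Hypothetical toy data: three sequences sharing a motif versus a uniform background.
group = ["ACGTACGT", "ACGTACGA", "ACGTACGC"]
pool = ["TTTTTTTTTT"] * 50
print(overrepresented_kmers(group, pool, n_perm=20))
```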
| Footnotes |
|---|
To whom correspondence may be addressed. E-mail: mehrich{at}sequenom.com or ccantor{at}sequenom.com
Freely available online through the PNAS open access option.
Author contributions: M.E. designed research; M.E. and J.T. performed research; M.E., P.G., L.L., and M.G. contributed new reagents/analytic tools; M.E. analyzed data; and M.E., P.G., L.L., M.G., C.C., and D.v.d.B. wrote the paper.
Conflict of interest statement: M.E., J.T., C.C., and D.v.d.B. are shareholders and full-time employees of Sequenom, Inc.
This article contains supporting information online at www.pnas.org/cgi/content/full/0712251105/DC1.
© 2008 by The National Academy of Sciences of the USA
| References |
|---|