Project 2. Genomic Fingerprinting

The very high predictive value of ER/PR status for response to endocrine therapy has been repeatedly confirmed. The most recent overview analysis of adjuvant therapy shows essentially no benefit of tamoxifen for tumors lacking ER expression, and the NSABP P1 trial showed that tamoxifen reduced the incidence of ER+ tumors only. Gene expression profiling of breast tumors has begun to reveal several distinct phenotypes that may have characteristic biologic behaviors. Genomic approaches allow investigators to study many thousands of genes simultaneously and may lead to a more fundamental classification of cancer, either by profiling expression (transcriptional or proteomic profiling) or by cataloging genomic variability (mutations or sequence polymorphisms). Transcriptional profiling has been done using spotted cDNA or synthetic oligonucleotide arrays, which produce large data sets that require sophisticated computational and statistical tools to analyze. We have used self-organizing maps (SOMs) and hierarchical clustering algorithms to group tumors, to uncover new classifications and to validate existing ones. Other investigators have used similar clustering algorithms to study breast cancer and have suggested new ways to classify this heterogeneous disease based on its gene expression profile. In addition to cancer classification, array data may identify genes whose expression varies among clinically or pathologically defined groups, potentially providing targets for diagnosis and therapy.


OBJECTIVES

This project combines investigators from the Harvard School of Public Health, the Brigham and Women's Hospital, the Dana-Farber Cancer Institute and the Whitehead Institute for Biomedical Research. ER- breast cancer comprises 30-40% of all breast cancer in U.S. women. The etiology and growth-promoting pathways in these cancers are not fully elucidated, impeding development of successful therapies. Investigators in our COE will use expression arrays and transcriptional profiling to study ER- breast cancers. This project will test the following hypotheses:

  1. All, or a distinct group of, estrogen-insensitive breast cancers are distinguished from estrogen-sensitive breast cancers by the expression of clusters of genes. Natural classifications, based upon unbiased expression profiles, will separate truly estrogen-sensitive cancers from estrogen-insensitive cancers.

  2. Estrogen-insensitive cancers utilize a variable but finite number of growth-promoting pathways to escape proliferation control, differentiation and apoptotic pathways. The "fingerprints" of these pathways will classify ER- cancers into discrete groups, segregating by predominant carcinogenic pathways.

  3. By comparing expression profiles to detailed studies of specific pathways, fingerprints of different biochemical pathways may be identified. These will help to rationally sub-classify ER- cancers, and suggest targets for therapy.

COLLABORATORS

Myles A. Brown, M.D.-Dr. Brown is Associate Professor of Medicine, Dana-Farber Cancer Institute and Harvard Medical School. He is a member of the Executive Committee of the Dana-Farber Cancer Institute Women's Cancers Program and the DF/HCC Breast Cancer Program. His lab focuses on the role of coregulators in nuclear receptor function. He will serve as Principal Investigator of the COE. In addition, he will serve as a co-investigator on Project 6 and a collaborator on Projects 2 and 3. Email: Myles_Brown@dfci.harvard.edu

Todd Golub, M.D.-Dr. Golub is Assistant Professor of Pediatrics, Dana-Farber Cancer Institute and Harvard Medical School. He is also Director, Cancer Genomics, Whitehead/MIT Center for Genome Research. He has made several of the major innovations in the phenotyping of cancers by expression profiling. He will serve as a co-investigator on Project 2. Email: TGOLUB@PARTNERS.ORG

J. Dirk Iglehart, M.D.-Dr. Iglehart is the Richard Wilson Professor of Surgery, Harvard Medical School and Chief, Division of Surgical Oncology, Brigham and Women's Hospital and the Charles Dana Investigator in Cancer Genetics, Dana-Farber Cancer Institute. He is the Director of the Dana-Farber Cancer Institute Women's Cancers Program, DF/HCC Breast Cancer Program and the Principal Investigator of the DF/HCC SPORE in Breast Cancer. He is an expert in the genetics of breast cancer and will serve as a co-investigator on Project 2. Email: JIGLEHART@PARTNERS.ORG

Andrea L. Richardson, M.D., Ph.D.-Dr. Richardson is an Instructor in Pathology, Brigham and Women's Hospital and Harvard Medical School. She is a member of the DF/HCC Breast Cancer Program and the DF/HCC SPORE in Breast Cancer. She is an expert in the pathology of breast cancer and will serve as a co-investigator on Project 2. Email: Andrea_Richardson@dfci.harvard.edu

Wing Hung Wong, Ph.D.-Dr. Wong is Professor of Computational Biology (Biostatistics), Harvard School of Public Health. His group focuses on the computational methods required for gene expression profiling. He is a co-investigator on Project 2. Email: wwong@hsph.harvard.edu


DATA

Project Update 2003

A comprehensive study was performed on 90 tumors from the DF/HCC Breast Program Tissue Resource. Expression array data were collected from U95A Affymetrix GeneChips and analyzed with dChip. dChip reads .cel files from Affymetrix software, normalizes arrays and calculates scaled expression values. dChip also excludes 'outlier' probe sets and arrays; the program is described by Li and Wong. Clustering algorithms in dChip filter expression data and provide a hierarchical representation of these tumors (Figure 1). Clinical information, linked by prospective informed consent, is available in our clinical database for all tissue analyzed in this project. We used dChip to perform unsupervised clustering of 89 tumors (excluding one outlier array) that had been selected to look for molecular determinants of lymph node metastasis. Genes were filtered on the basis of their expression value and their variability across the data set. The 443 resulting filtered genes are clustered on the vertical axis and the individual tumors are clustered across the top, both according to similarity of expression patterns. A dendrogram shows the relation of individual samples to the whole group, and the panel across the top displays individual characteristics of each tumor.

Figure 1. Hierarchical clustering of 89 primary breast cancers. Clustering routines in dChip software were used to order the primary breast cancers in this study. The dendrogram at the top orders the 89 tumors into hierarchical clusters. Two main clusters are indicated above the dendrogram, Cluster I and II. The array shows the expression of 443 probe sets (genes). Highly expressed genes are darker and genes expressed less strongly are shown in shades of gray. The upper panel (shaded rows) shows the clinical and molecular data for each individual case. Dark shading refers to positive nodes, high-grade histology and ER, HER2 and p53 positive tumors.
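The filtering and clustering steps described above can be illustrated with a short sketch. This is not dChip's implementation: the thresholds (min_mean, min_cv), the use of 1 - Pearson correlation as the distance, average linkage, and all function names are assumptions chosen for illustration only. The input is assumed to be a genes x samples matrix of scaled expression values.

```python
import numpy as np

def filter_genes(expr, min_mean=1.0, min_cv=0.5):
    """Return indices of genes (rows) whose mean expression and coefficient
    of variation both reach the thresholds (threshold values are assumptions)."""
    mean = expr.mean(axis=1)
    sd = expr.std(axis=1, ddof=1)
    # Coefficient of variation; genes with non-positive mean get CV 0.
    cv = np.divide(sd, mean, out=np.zeros_like(sd), where=mean > 0)
    return np.flatnonzero((mean >= min_mean) & (cv >= min_cv))

def average_linkage(dist, n_clusters=2):
    """Naive agglomerative clustering with average linkage.
    Merges the closest pair of clusters until n_clusters remain."""
    clusters = [[i] for i in range(dist.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = np.empty(dist.shape[0], dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels

def cluster_samples(expr, n_clusters=2):
    """Cluster samples (columns) of a filtered genes x samples matrix.
    Rows are standardized first, so every gene must have nonzero variance
    (true for any gene that passed a positive CV threshold)."""
    centered = expr - expr.mean(axis=1, keepdims=True)
    standardized = centered / expr.std(axis=1, ddof=1, keepdims=True)
    dist = 1.0 - np.corrcoef(standardized.T)  # sample-by-sample distance
    return average_linkage(dist, n_clusters)
```

Calling filter_genes and then cluster_samples on the retained rows yields one cluster label per tumor, analogous to cutting the dendrogram at its top split into two groups. The naive merge loop is adequate for a cohort of this size but would be slow for thousands of samples.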

Unsupervised hierarchical clustering on 89 tumors produced two groups, shown in Figure 1 and further explained in Table 2. Cluster I contained 34 cancers, significantly enriched for ER- tumors. Cluster II contained low grade ductal and lobular cancers, ER+ tumors and, in data not shown, all six normal tissues clustering together as a group.

Although the cohort was selected for the prediction of lymph node status, we also examined it for other characteristics by supervised techniques. In particular, hormone receptor status was a robust discriminant in our tumors. Using a variety of techniques, we were able to distinguish ER+ tumors (including low positive cases) from those deemed ER- by receptor immunostaining. For instance, we were able to correctly predict ER status with an error rate of less than 5% and a permutation p value of <0.0001. Multidimensional scaling, a visual way to represent complex data, is shown in Figure 2. In this figure, data in high-dimensional space have been scaled to two dimensions. ER- cancers are shown in blue and cluster apart from the ER+ cancers, shown in red. In general, low positive cases (shown in green) cluster with the ER+ cancers. These results are reassuring. Although we extracted whole tumors, containing both epithelial and stromal elements, these tumors were easily distinguished by a variety of statistical and computational methods. Therefore, we are confident in our ability to handle and analyze information coming from primary cancers.
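As an illustration of how a permutation p value for a prediction error rate can be obtained, the sketch below pairs a leave-one-out nearest-centroid classifier with label shuffling. The classifier, the function names and the default number of permutations are assumptions; the text does not state which supervised methods produced the reported figures.

```python
import random

def loo_error_rate(samples, labels):
    """Leave-one-out error rate of a nearest-centroid classifier.
    samples: list of equal-length feature lists; labels: list of class labels."""
    errors = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        # Accumulate per-class feature sums from all samples except the held-out one.
        sums, counts = {}, {}
        for j, (xj, yj) in enumerate(zip(samples, labels)):
            if j == i:
                continue
            acc = sums.setdefault(yj, [0.0] * len(xj))
            for k, v in enumerate(xj):
                acc[k] += v
            counts[yj] = counts.get(yj, 0) + 1

        def sq_dist(c):
            # Squared Euclidean distance from x to the centroid of class c.
            return sum((v - s / counts[c]) ** 2 for v, s in zip(x, sums[c]))

        predicted = min(sums, key=sq_dist)
        errors += predicted != y
    return errors / len(samples)

def permutation_p_value(samples, labels, n_perm=1000, seed=0):
    """Return (observed error, p value): the fraction of label shufflings
    whose error rate is at least as low as the observed one."""
    rng = random.Random(seed)
    observed = loo_error_rate(samples, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error_rate(samples, shuffled) <= observed:
            hits += 1
    # Add-one correction keeps the estimate from being exactly zero.
    return observed, (hits + 1) / (n_perm + 1)
```

With this estimator the smallest attainable p value is 1/(n_perm + 1), so supporting a claim of p < 0.0001 would take at least 10,000 permutations.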

Figure 2. Multidimensional scaling of expression data from 89 tumors. ER- cancers (blue) form a cluster to the right and above, and ER+ cancers (red) form a tight cluster in the lower half. The ER low-positive cancers (green) tend to cluster with the ER+ cancers.
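For readers unfamiliar with multidimensional scaling, the following is a minimal sketch of the classical (Torgerson) variant, which embeds samples in two dimensions from a matrix of pairwise distances. The specific MDS variant and distance measure used for Figure 2 are not stated here, so both the algorithm choice and the function names below are assumptions.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical MDS: find coordinates whose Euclidean distances
    approximate the symmetric distance matrix `dist`."""
    n = dist.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # Keep the largest eigenvalues; clip small negatives caused by rounding
    # or by distances that are not exactly Euclidean.
    top = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def correlation_distance(expr):
    """1 - Pearson correlation between samples (columns of a genes x samples matrix)."""
    return 1.0 - np.corrcoef(expr.T)
```

Applying classical_mds to correlation_distance(expr) gives one (x, y) point per tumor, which can then be colored by ER status in a scatter plot.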

USEFUL LINKS

dChip
Whitehead Cancer Genomics
