SLIDES & TRANSCRIPTS
Monday, June 17

Epidemiology and Prognostic Assessment of Soft Tissue Sarcomas


Paul Meltzer, MD, PhD

Slide 1:

Thank you, Scott. It is a pleasure to be here and give you my perspective on sarcoma research. I really have to thank the preceding speakers for setting the stage so well. The previous talk, especially, was incredibly lucid and a great prelude to what I am going to talk about. I will continue to make the case for splitting rather than lumping, and I will give you my argument right now. If we are to develop disease-specific, targeted therapies, which I think is what everyone wants to do for all forms of cancer, not just sarcoma, then we must be able to characterize the specific changes that will turn out to be good targets. We can only do that by truly facing the somewhat daunting complexity of sarcomas and developing a classification scheme that recognizes their molecular complexity.

I am going to tell you right away that I am going to dodge all the issues about how the things I am going to talk about might be used in a practical, everyday clinical setting. One could talk for hours about that problem. As far as I am concerned, what I am talking about is purely a research technology, which is creeping into clinical use, but that is a separate topic. Right now, we have a tremendous need to do these things in a research context.

I work in the Genome Institute, and I can't give any talk without pointing out the incredible progress that has brought us to the end of this timeline, with the human genome sequence now available to us.

For cancer biologists, this effort, which was reported just a little over a year ago, has created a radically different situation. The finished, proofread human genome sequence is expected by next spring, and it puts us in a position to take whole-genome approaches to biological questions.

These include three areas that are all relevant to sarcoma research: gene expression at the genome scale, gene variation, and gene function. I am going to focus on just the first of these, gene expression.

This is a picture of the MIT Whitehead Sequencing Center. I show it to illustrate that we have gone through a kind of culture shift in the way we do biology, in that robotics and computational biology have come to the forefront. You see this rather large lab with a small number of people and a lot of machines. This is really, I think, the future of modern biology.

This gives you an idea of how rapidly microarrays have taken off since the first paper came out in 1995. It is simply the result of a PubMed search.

What you can see is the number of publications increasing about three-and-a-half-fold every year, with over 1,000 publications last year.

Why has this technology taken the world by storm? It is very simple. It gives us a way to generate massive quantities of data that in the past we had to gather gene by gene, and it provides a detailed gene expression phenotype of any biosample of interest in a rapid and quantitative fashion.

So, this is the type of technology that we use in our laboratory: the printed cDNA microarray. When these arrays are hybridized with labeled representations of the cellular mRNA pool, they generate the familiar red/green patterns of spots. We don't stare at spots; instead we convert them into numbers. The software keeps track of what is present at each spot and delivers a large spreadsheet of numbers, which then becomes the purview of the computational biologist.
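To make the step from spots to numbers concrete, here is a minimal Python sketch, assuming each spot has already been reduced to one background-subtracted intensity per channel. The function name `spot_log_ratios`, the intensity floor and the median-centring step are illustrative assumptions, not the software used in the speaker's laboratory; they only show how a red/green pair becomes one log2 ratio per gene in the spreadsheet.

```python
import math
from statistics import median

def spot_log_ratios(red, green, floor=1.0):
    """Turn per-spot red/green intensities into median-centred log2 ratios.

    red, green: background-subtracted intensities, one value per spot.
    floor: assumed minimum intensity so weak or negative spots never hit log(0).
    """
    raw = [math.log2(max(r, floor) / max(g, floor)) for r, g in zip(red, green)]
    # Assumed global normalization: most genes are not changing, so shift the
    # ratios until their median is zero.
    centre = median(raw)
    return [x - centre for x in raw]

# Example: three spots, the first four times brighter in red than in green.
print(spot_log_ratios([800, 200, 400], [200, 200, 400]))  # [2.0, 0.0, 0.0]
```

A log2 ratio of 1 means a gene is twice as abundant in the red-labeled sample, 0 means no change, and -1 means half as abundant.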

From a practical point of view, in clinical studies with tumors, I think this is an important slide. It indicates that the technology is somewhat limited by sample acquisition. Essentially, we can work comfortably with samples of one microgram of RNA or more, corresponding to something like 100,000 cells or more. Below that, it becomes harder and harder, and no one is really claiming to do single-cell expression profiling quite yet. So, sample acquisition and adequate sample size become terribly important issues, especially with sarcomas and other diseases that may be sampled without open biopsy.
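The figures quoted here imply roughly 10 picograms of total RNA per cell (1 microgram divided by 100,000 cells). The sketch below does that back-of-the-envelope arithmetic; the 10 pg/cell constant is derived only from the speaker's round numbers, and real yields vary with cell type and extraction method. By the same ratio, a sample of 1,000 cells would give only about 10 nanograms, a hundredfold below the comfortable range.

```python
# Ratio implied by the talk: about 1 microgram of total RNA from about 100,000 cells.
PICOGRAMS_PER_MICROGRAM = 1_000_000
RNA_PG_PER_CELL = 1 * PICOGRAMS_PER_MICROGRAM / 100_000  # 10 pg per cell

def cells_for_rna(micrograms):
    """Approximate number of cells needed to yield `micrograms` of total RNA."""
    return micrograms * PICOGRAMS_PER_MICROGRAM / RNA_PG_PER_CELL

def rna_from_cells(n_cells):
    """Approximate total RNA, in micrograms, obtained from `n_cells` cells."""
    return n_cells * RNA_PG_PER_CELL / PICOGRAMS_PER_MICROGRAM

print(cells_for_rna(1.0))     # 100000.0
print(rna_from_cells(1_000))  # 0.01, i.e. about 10 ng
```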

So, what have we learned from the expression profiling of cancers so far? First, it is very clear that distinct histologies have distinct patterns of gene expression. So, different forms of cancer have very clearly recognizable patterns of gene expression that are easy to separate.

Using expression data, you can actually develop formal rules, which can be used to take an unknown sample, test it, and classify it accurately. There is, again, no doubt that this can be done, and many, many publications support these first two points.

You can, indeed, recognize novel subgroups that you didn't anticipate at the beginning of a study. Some of these, indeed, seem to have clinical and/or biological implications. This is perhaps a good point for me to mention that you can take the large amount of data which is generated by a microarray study and analyze it in one of two fashions.

You can either do what we call unsupervised analysis -- that means taking all the data, putting it in the hopper of the computational machine, turning the crank and seeing what comes out. If everything goes as you hope, tumors will separate out by themselves spontaneously into interesting and relevant groups that may tell you something you really didn't expect a priori.

This contrasts with supervised analysis, where we take the samples and annotate them as carefully as we can with as much information as we can get from all forms of classification -- the clinical parameters, expert pathological classification, all possible molecular markers and so on -- and then look for features in the microarray data which separate samples of, let us say, different grade, different propensity to metastasis, different translocation, and the like. So, both of these have a place in doing this kind of analysis.
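The unsupervised approach described above is often carried out with hierarchical clustering. The sketch below assumes average linkage and a distance of 1 minus the Pearson correlation between sample profiles; these are common choices, not necessarily the ones used in the studies described in this talk, and the function names and toy profiles are hypothetical. Note that no class labels are passed in.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

def cluster(profiles, n_clusters):
    """Average-linkage agglomerative clustering of samples.

    profiles: dict mapping sample name -> list of log ratios (same gene order).
    Distance between samples is 1 - Pearson r; the closest pair of groups is
    merged repeatedly until n_clusters groups remain.
    """
    groups = [[name] for name in profiles]

    def distance(a, b):
        return mean([1 - pearson(profiles[i], profiles[j]) for i in a for j in b])

    while len(groups) > n_clusters:
        pairs = [(distance(groups[i], groups[j]), i, j)
                 for i in range(len(groups)) for j in range(i + 1, len(groups))]
        _, i, j = min(pairs)
        groups[i] = groups[i] + groups[j]
        del groups[j]  # j > i, so index i is still valid
    return groups

toy = {
    "T1": [2.0, 1.0, 0.0, -1.0],
    "T2": [2.2, 0.9, 0.1, -1.1],
    "T3": [-1.0, 0.0, 1.0, 2.0],
    "T4": [-0.9, 0.1, 1.2, 1.8],
}
print(cluster(toy, 2))
```

Stopping at a fixed number of groups is a simplification; in practice the full merge tree (dendrogram) is usually inspected instead.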

You can certainly find incredibly interesting genes for follow-up studies that you wouldn't have thought of any other way, and this has certainly turned out to be an incredible hypothesis-generating tool for cancer research, and that is certainly one of its most important roles.

So, what do you get out of a microarray experiment? Primarily, you get a single thing, a table of expressed genes, the genes that are present in a given sample. You get their quantitative level of expression. So, you know the relative expression of the genes on this list. Hopefully, if you have got the right samples, you learn something about their disease or tissue specificity and, through appropriate bioinformatics, you can place them in relevant pathways.

Of course, the hope, really the credo of the field, is that within this table there must be information relating to the important clinical behavior of the tumors, their fundamental underlying molecular mechanism of pathogenesis, and the future markers and drug targets that we want to discover.

So, what are some potential implications of all of this for microarray studies of sarcomas? I am going to give some very simple illustrative examples, mainly for the benefit of people who aren't familiar with this field. We would hope to get an improved classification of tumors. This is an example from our lab's published data on small round blue cell tumors. These are relatively similar diseases, although, of course, distinguishable in expert hands: in this case, Ewing sarcoma, lymphoma, rhabdomyosarcoma and neuroblastoma.

When we looked at them by microarray analysis, we were able to achieve very sharp separation of these four types of cancer according to their pattern of gene expression.

Each of these balls shows where the gene expression profile of an individual sample plots in an arbitrary three-dimensional space. So, it is a very clear, sharp separation.
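Plots like this are commonly produced by projecting each sample's profile onto its first few principal components. The talk does not say which projection produced the slide, so principal component analysis, the power-iteration approach and the name `pca_project` are all assumptions in this pure-Python sketch. It builds a gene-by-gene covariance matrix, which is fine for a toy example but far too slow for 20,000 genes.

```python
import math

def pca_project(samples, k=3, iters=500):
    """Project each sample onto its top-k principal components.

    samples: list of profiles (one list of log ratios per sample, same gene order).
    Uses power iteration with deflation on the covariance matrix.
    """
    n, g = len(samples), len(samples[0])
    means = [sum(s[j] for s in samples) / n for j in range(g)]
    centred = [[s[j] - means[j] for j in range(g)] for s in samples]
    # Gene-by-gene covariance matrix.
    cov = [[sum(r[a] * r[b] for r in centred) / (n - 1) for b in range(g)]
           for a in range(g)]
    components = []
    for _ in range(k):
        v = [1.0 + 0.1 * j for j in range(g)]  # arbitrary starting direction
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(g)) for a in range(g)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        if norm == 0:  # no variance left to explain; stop early
            break
        # Rayleigh quotient: the variance captured along direction v.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(g) for b in range(g))
        components.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(g)] for a in range(g)]
    return [[sum(r[j] * c[j] for j in range(g)) for c in components] for r in centred]

samples = [[5, 0, 0, 1], [6, 1, 0, 0], [-5, 0, 1, 0], [-6, 0, 0, 1]]
coords = pca_project(samples)
print([[round(x, 2) for x in row] for row in coords])
```

Each row of `coords` gives one sample's position in the three-dimensional plot; the sign of each axis is arbitrary.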

What drives this separation? This is a typical 2-D plot of the kind we are now used to seeing in many, many publications.

For example, we pick up MIC2 (CD99) as a marker for Ewing sarcoma, which tends to confirm that we pick up things already recognized in conventional pathology. IGF-2 is a strong marker for rhabdomyosarcoma.

So, you find the things you tend to expect, and this happens again and again. Here is a separation of synovial sarcoma from the reference group and a separation of gastrointestinal stromal tumor from the reference group. Essentially every tumor that we have looked at has followed the rule of having its own characteristic pattern of gene expression, although that pattern is harder to dig out of this kind of data for some tumors than for others.

So, how far can this be taken? Can it be taken to the level of separating tumors that are really extremely similar, such as these morphologic subsets of leiomyosarcoma? I would argue that this is probably the case, if sufficient samples can be accrued to give statistical significance to this kind of analysis. I think our goal, with this type of research, really ought to be to develop a complete molecular taxonomy of all forms of sarcoma.

So, that is the first thing we would hope to get: improved classification. The second is gene discovery. We really want to discover the genes that drive sarcoma growth, metastasis, angiogenesis and the like, as Marc mentioned so clearly a few minutes ago.

Microarrays can be used for gene discovery in a couple of different ways. One is, again, through the tumor-specific gene expression profile. Less widely appreciated is the idea that we can also look for genomic alterations with microarrays. I will illustrate that very quickly with the search for copy number changes, that is, gain or loss of chromosomal segments. There are lots of known amplifications and deletions in various types of sarcomas, as illustrated in this partial table, and we would like to map these out in more detail to help identify some of the crucial genes driving sarcoma growth.

It turns out that you can also do comparative genomic hybridization (CGH), a technique for looking at copy number change, on cDNA microarrays.

This is just an example of a genome scan of a single soft tissue tumor. So, this is now the entire human genome spread out on this axis, and here are the relative gene copy numbers on this axis.

It may not be instantly obvious, but out of this background noise (most genes aren't changing) we see some spikes, and these spikes represent regions of increased copy number at specific portions of the genome.

The more you do these types of experiments, the more you realize that we have been seeing only a very small part of the complexity of copy number change in sarcomas. Here, you can see three regions showing very nice spikes supported by multiple array elements.

Here is a magnification of part of that. Again, you can see two nice spikes now in this particular segment of the genome, where there is clearly a gene amplification event.

So, this would be a portion of the genome that would be worth looking at for genes which might be important in the growth of this tumor.
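One simple way to call such spikes, assuming the array elements have been ordered by genomic position along one chromosome and converted to log2 copy-number ratios, is to look for runs of consecutive elements above a threshold. The threshold of 0.8 and the minimum of three supporting elements are illustrative assumptions meant to mirror a spike "supported by multiple array elements"; they are not published cut-offs, and `find_gains` is a hypothetical name.

```python
def find_gains(positions, log_ratios, threshold=0.8, min_elements=3):
    """Return (start, end) positions of candidate gained regions.

    positions, log_ratios: array elements sorted by genomic position.
    A region is a run of at least `min_elements` consecutive elements whose
    log2 copy-number ratio exceeds `threshold`; isolated high elements are
    treated as noise.
    """
    regions, run = [], []
    for pos, ratio in zip(positions, log_ratios):
        if ratio > threshold:
            run.append(pos)
        else:
            if len(run) >= min_elements:
                regions.append((run[0], run[-1]))
            run = []
    if len(run) >= min_elements:  # a run that reaches the end of the chromosome
        regions.append((run[0], run[-1]))
    return regions

ratios = [0.1, -0.2, 1.5, 1.7, 1.2, 0.0, 2.0, 0.1, 0.9, 1.1]
print(find_gains(list(range(1, 11)), ratios))  # [(3, 5)]
```

The single high element at position 7 is rejected as unsupported, which is the trade-off the `min_elements` parameter controls.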

So, what one has to do is collect a lot of data like this and look for consensus regions of interest. What one would hope to do with this type of CGH data is correlate it with the expression profile. That reinforces the expression data by helping to identify specific genes that are not only highly expressed but also selected by the tumor, as evidenced by the increase in copy number.
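A minimal sketch of that combination, assuming copy-number and expression log2 ratios are available for the same genes in the same tumors. The cut-offs, the 50% consensus requirement, the function name and the gene names are all hypothetical; the point is only the logic of keeping genes that are recurrently gained and also overexpressed wherever they are gained.

```python
def amplified_and_overexpressed(copy_ratios, expr_ratios,
                                min_fraction=0.5, gain_cut=0.8, expr_cut=1.0):
    """Genes gained in at least `min_fraction` of tumors and overexpressed
    in every tumor where they are gained.

    copy_ratios, expr_ratios: dict gene -> list of log2 ratios, one per tumor,
    with tumors in the same order in both tables. All cut-offs are illustrative.
    """
    hits = []
    for gene, copies in copy_ratios.items():
        expr = expr_ratios.get(gene)
        if expr is None:
            continue  # no expression measurement for this gene
        gained = [i for i, c in enumerate(copies) if c > gain_cut]
        if not gained or len(gained) < min_fraction * len(copies):
            continue  # not a recurrent (consensus) gain
        if all(expr[i] > expr_cut for i in gained):
            hits.append(gene)
    return sorted(hits)

# Toy data for four tumors; gene names are placeholders.
copy_ratios = {
    "GENE_A": [1.2, 1.0, 0.1, 0.9],
    "GENE_B": [1.1, 1.3, 0.9, 0.0],
    "GENE_C": [1.5, 0.0, 0.1, 0.2],
    "GENE_D": [1.4, 1.2, 1.1, 1.0],
}
expr_ratios = {
    "GENE_A": [2.0, 1.5, 0.0, 1.8],
    "GENE_B": [0.2, 0.1, 1.5, 0.0],
    "GENE_C": [2.5, 0.0, 0.0, 0.1],
}
print(amplified_and_overexpressed(copy_ratios, expr_ratios))  # ['GENE_A']
```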

We also want to be able to incorporate the CGH pattern into the classification. Classification is certainly a difficult problem, and if you can bring the relatively tumor-specific patterns of gain into it, I think it is going to be possible to increase the resolution that expression-profiling analyses can achieve.

So, classification, gene discovery. The third point I would make would be to recognize key pathways or drug targets. This is not necessarily a simple issue. There are a lot of genes to go through to find the ones that are the most important.

Our experience to date, I think, suggests that you will find interesting candidates. You can certainly rule out candidates by discovering that they are not expressed in a particular form of sarcoma.

The example I like best is this one from our study of GIST. This shows the gene expression profile of the set of GIST samples over here.

It was interesting to see that kit came out right in the middle of this very consistently expressed pattern. What was perhaps even more remarkable was that when we did what we call a weighted list, or weighted gene discriminator, analysis, we found that among the top discriminators for GIST, kit was actually the strongest single gene. So, just by fairly straightforward computational analysis, we were able to come up with kit as the top gene of interest for GIST. Certainly, other cases won't necessarily be so simple, but this gives you some feeling of optimism that interesting genes do tend to jump out at you when you are doing this type of research.
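The talk does not give the formula behind the weighted gene list. The sketch below assumes one widely used choice, a signal-to-noise weight (difference of class means divided by the sum of class standard deviations), and ranks genes by it for one target class versus all other samples. The function name and the toy values are hypothetical and are not data from the GIST study.

```python
from statistics import mean, stdev

def weighted_gene_list(expr, labels, target, top=10):
    """Rank genes by how strongly they are higher in `target` samples than in the rest.

    expr: dict gene -> list of log ratios, one per sample.
    labels: class label for each sample, in the same order.
    Weight = (mean_in - mean_out) / (sd_in + sd_out), an assumed signal-to-noise score.
    Needs at least two samples inside and outside the target class.
    """
    scores = {}
    for gene, values in expr.items():
        inside = [v for v, lab in zip(values, labels) if lab == target]
        outside = [v for v, lab in zip(values, labels) if lab != target]
        spread = stdev(inside) + stdev(outside)
        scores[gene] = (mean(inside) - mean(outside)) / spread if spread else 0.0
    return sorted(scores, key=scores.get, reverse=True)[:top]

labels = ["GIST"] * 3 + ["other"] * 3
expr = {
    "G1": [0.5, 0.0, 0.4, 0.1, 0.3, 0.2],
    "G2": [3.0, 2.5, 3.2, -0.5, 0.0, -0.2],
    "G3": [-1.0, -1.2, -0.8, 1.0, 1.1, 0.9],
}
print(weighted_gene_list(expr, labels, "GIST"))  # ['G2', 'G1', 'G3']
```

Genes that are strongly *under*-expressed in the target class land at the bottom of this list; ranking by absolute weight would surface them too.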

Let me show an example of how this can be extended to other forms of sarcoma. This is what we call a random permutation analysis, comparing several forms of sarcoma. This curve shows the weight, or statistical strength, that genes would be expected to have at random, and this is the actual data obtained.

The gap between the actual curve and the randomized curve indicates an overabundance of informative genes, and these disease-specific genes are the ones we want to discover. When this is plotted in a more conventional way, you get this type of plot. Looking from left to right, each of those color bars represents a different form of sarcoma, and the genes plotted underneath it are the ones that are relatively specific for that type of sarcoma. As you can see, they tend to separate out rather sharply. These would be the genes related to the histogenesis or to the specific genetic alterations in each of these tumor types, and one would hope that this set includes genes worth following up in more detail.
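A random permutation analysis of this kind can be sketched as follows. It assumes a signal-to-noise weight per gene (an assumption; the talk does not give the formula), computes the sorted weights for the real labels, then shuffles the labels many times and averages the sorted weights to get the curve expected by chance. Function names and toy data are hypothetical.

```python
import random
from statistics import mean, stdev

def snr(values, labels, target):
    """Assumed signal-to-noise weight of one gene for `target` versus the rest."""
    inside = [v for v, lab in zip(values, labels) if lab == target]
    outside = [v for v, lab in zip(values, labels) if lab != target]
    spread = stdev(inside) + stdev(outside)
    return (mean(inside) - mean(outside)) / spread if spread else 0.0

def permutation_curves(expr, labels, target, n_perm=200, seed=0):
    """Return (observed, expected) weight curves, each sorted strongest first.

    observed uses the real labels; expected averages the sorted weights over
    `n_perm` random shuffles of the labels, which keeps class sizes fixed.
    """
    rng = random.Random(seed)
    observed = sorted((snr(v, labels, target) for v in expr.values()), reverse=True)
    totals = [0.0] * len(observed)
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        perm = sorted((snr(v, shuffled, target) for v in expr.values()), reverse=True)
        totals = [t + p for t, p in zip(totals, perm)]
    expected = [t / n_perm for t in totals]
    return observed, expected

labels = ["A"] * 4 + ["B"] * 4
expr = {
    "G1": [3.0, 3.2, 2.9, 3.1, 0.0, 0.1, -0.1, 0.05],  # informative toy gene
    "G2": [0.1, -0.2, 0.3, 0.0, 0.2, -0.1, 0.05, -0.3],
    "G3": [0.2, 0.0, -0.1, 0.1, 0.0, 0.3, -0.2, 0.1],
}
observed, expected = permutation_curves(expr, labels, "A")
print(sum(o > e for o, e in zip(observed, expected)), "genes above the random curve")
```

The gap between `observed` and `expected` at the top ranks corresponds to the overabundance of informative genes described above.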

So, how do you actually do good clinical correlative studies using microarrays? I think it is rather challenging. Because this has been called "discovery-driven" research, one might think that we don't really need to think about it ahead of time. Actually, I think it is quite important to think about it ahead of time. You need to define the question that you want to answer at the beginning. You need to identify an appropriate patient sample for answering that question. And you have to apply appropriate, rigorous statistical analysis to what is hopefully very high-quality array data.

The result should be a set of genes that carry the information relevant to the question you have posed. At this point, one should develop a formal classifier that can be applied to an unknown test set to demonstrate the validity of the analysis. I think it is crucial that this whole process be validated on an additional, completely independent sample set. I think a lot of the publications you may have seen so far in this field, in other types of cancer, may not hold up in the long run when they are tested on additional sample sets. So, it is very important to have this type of validation: as in other types of clinical research, we should go through a pilot phase followed by a validation phase.
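The pilot-then-validation workflow can be illustrated with a nearest-centroid classifier, a deliberately simple stand-in chosen for this sketch, not the classifier used in the studies described. The discipline it shows is that the centroids (and, in a real study, any gene selection) are computed from the pilot set only, and the independent set is scored once at the end. Function names and toy profiles are hypothetical.

```python
from statistics import mean

def train_centroids(profiles, labels):
    """Average the training profiles of each class into one centroid per class."""
    centroids = {}
    for cls in sorted(set(labels)):
        members = [p for p, lab in zip(profiles, labels) if lab == cls]
        centroids[cls] = [mean(col) for col in zip(*members)]
    return centroids

def classify(profile, centroids):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda c: sum((a - b) ** 2 for a, b in zip(profile, centroids[c])))

def accuracy(profiles, labels, centroids):
    """Fraction of samples whose predicted class matches the known class."""
    hits = sum(classify(p, centroids) == lab for p, lab in zip(profiles, labels))
    return hits / len(labels)

# Pilot phase: build the classifier from the pilot samples only.
pilot = [[2.0, 0.0], [2.2, 0.1], [0.0, 2.0], [0.1, 1.9]]
pilot_labels = ["A", "A", "B", "B"]
centroids = train_centroids(pilot, pilot_labels)

# Validation phase: score a completely independent set, once.
validation = [[1.9, 0.2], [0.2, 2.1]]
validation_labels = ["A", "B"]
print(accuracy(validation, validation_labels, centroids))  # 1.0
```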

So, what are some of the obstacles to sarcoma microarray studies? I should first mention certain things that aren't obstacles. One is whether we are looking at microdissected samples or relatively larger pieces of tissue. I think larger pieces of tissue are satisfactory, and they are easier to process, so data can be generated more rapidly. The analysis will still hold up in the majority of cases, and it will be possible to go back and assign specific genes to particular cellular components of a tumor by in situ procedures, using the tissue microarray technique, for example. So, I don't think that is terribly important.

The power of the technology itself is not so much an issue any more. Very accurate information can be obtained across a large portion of the genome: it is routine to get data on 20,000 or 30,000 genes in a single hybridization, so you can cover most of the genome at this point. Fairly routine computational tools are available, although the work still requires the participation of experts in bioinformatics and mathematics, who are in short supply. These relatively standard analysis tools have been very, very useful. So, these things are not tremendous obstacles at the present time.

What I see as problematic in sarcoma are primarily issues such as obtaining adequate sample numbers. This has been alluded to already. A decent-sized microarray study in the current era involves hundreds of samples. To obtain hundreds of samples across a spectrum of disease with a lot of heterogeneity, and to represent each subset in adequate numbers, is quite a challenge. I think this can only be done in the context of multi-institutional collaborative studies.

That doesn't mean that we should pack up and not do this because we can't get enough samples. One way to think about it is in terms of the resolution of an image. You can start out with a relatively small number of samples and see, from that small number, the gross separations between tumor types that are easy to distinguish. This probably won't come as a big surprise to anybody, but the more samples you add in each subset, the better the separation you will see. Still, adequate sample numbers are clearly a major issue.
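To put rough numbers on the resolution analogy, a standard normal-approximation formula for comparing two groups on a single gene gives the samples needed per group as n = 2(z<sub>1-α/2</sub> + z<sub>1-β</sub>)² / d², where d is the difference in means divided by the standard deviation. This is a textbook per-gene estimate, not a sizing rule for whole microarray studies; with thousands of genes tested at once, α must be much smaller, which pushes the numbers up. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def samples_per_group(effect_size, alpha=0.05, power=0.8):
    """Approximate samples per group to detect a difference of `effect_size`
    standard deviations in one gene (two-sided test, normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for d in (2.0, 1.0, 0.5):
    print(d, samples_per_group(d))
```

Halving the difference you want to resolve roughly quadruples the samples needed in each subset, which is one way to see why finer distinctions between similar sarcomas demand so many more samples.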

Adequate sample quantity is an enormous problem, especially with diseases that are frequently biopsied with a skinny needle. It is conceivable that you can do microarray studies on 1,000 cells, but it is really not optimal. So, core biopsies or open biopsies are really the preferable material. I would argue that anyone who believes that taking better care of their patients requires understanding the biology needs to obtain adequate tissue.

I think every time a patient with a rare disease gets treated without tissue being banked for future research, in adequate quantity and quality for what is going to be the cancer genomics of the 21st century, that is really a shame. It just delays the day when we get answers about what is going on with that type of patient. So, adequate sample quantity is critical.

We certainly want links to clinical trials, so that the samples are characterized not just pathologically but with respect to all the clinical variables known to matter at the time the study is done.

Finally, we need to establish interdisciplinary collaborations that include the clinicians caring for the patients, the pathologist, the molecular biologist, and the biostatistician. All of these things have to happen if you are really going to get to the bottom of the mystery of sarcomas.

That is basically what I had to say, and hopefully it gets us a little closer to being back on time.
