Thank you, Scott. It is a pleasure to be here and give you my
perspective on sarcoma research. I really have to thank the preceding
speakers for setting the stage so well. The previous talk, especially,
was incredibly lucid and a great prelude to what I am
going to talk about. I will continue to try to make the case for
splitting rather than lumping, and I will give you my argument
right now. If we are to develop disease-specific, targeted therapies,
which I think is what everyone wants to do for all forms of cancer,
not just sarcoma, then we must be able to characterize the specific
changes that will turn out to be the good targets. We can only
do that by truly facing the somewhat daunting complexity of sarcomas
and really developing an appropriate classification scheme that
recognizes their molecular complexity.
I am going to tell you right away that I am going to dodge all
the issues about how the things I am going to talk about might
be used in a practical, everyday clinical setting. I think one
could talk for hours about that problem. As far as I am concerned,
what I am talking about is purely a research technology, which
is creeping into clinical use, but I think that is a separate
topic. Right now, we have a tremendous need to do these things
in a research context.
I work in
the Genome Institute, and I can't give any talk without pointing
out the incredible progress that has been made. We now stand at the end of this
timeline, with the human genomic sequence available to
us.
For cancer
biologists, this effort, which was reported just a little over a year ago,
has created a radically different situation.
The complete, proofread, corrected human genome sequence is anticipated
to be finished by next spring, and it puts us in a position now
to take whole-genome approaches to biological questions.
These include
three areas that are all relevant to sarcoma research -- gene
expression at the genome scale, gene variation, and gene function.
I am going to focus just on this top issue.
This is a picture of the MIT Whitehead Sequencing Center. I show
this just to illustrate that we have gone through kind of a culture
shift in the way we do biology, in that robotics and computational
biology have come to the forefront. You see this rather large
lab with a small number of people and a lot of machines. This
is really, I think, the future of modern biology.
This just
gives you the idea of how incredibly rapidly microarrays have
taken off since the first paper came out in 1995. This is just
the result of a PubMed search.
What you can
see is the number of publications increasing three-and-a-half-fold
every year, with over 1,000 publications last year.
Why has this
technology taken the world by storm? It is very simple. It gives
us a way to generate the mass quantities of data that we, in the
past, have had to approach in a gene-by-gene way, and provides
a detailed gene expression phenotype of any biosample of interest
in very rapid and quantitative fashion.
So, this is
the type of technology that we use in our laboratory -- the printed
cDNA microarray. These are hybridized with representations of
the cellular mRNA pool, and they generate these familiar red/green
patterns of spots. We don't stare at spots, but rather convert
them into numbers, so the computer software keeps track of what
is present at each spot and delivers to us a large spreadsheet
of numbers, which then becomes the purview of the computational
biologist.
From a practical
point of view, in clinical studies with tumors, I think this is
an important slide. It just indicates that the technology is somewhat
limited by sample acquisition. Essentially, we can work comfortably
with samples from one microgram of RNA on up, corresponding to
something like 100,000 cells on up. As you go below that, it becomes
harder and harder, and no one is really claiming to do single-cell
expression profiling quite yet. So, sample acquisition and adequate
sample size become terribly important issues, especially with
sarcomas and other diseases which may be sampled without open
biopsy.
So, what have
we learned from the expression profiling of cancers so far? First,
it is very clear that distinct histologies have distinct patterns
of gene expression. So, different forms of cancer have very clearly
recognizable patterns of gene expression that are easy to separate.
Using expression data, you can actually develop formal rules,
which can be used to take an unknown sample, test it, and classify
it accurately. There is, again, no doubt that this can be done,
and many, many publications support these first two points.
You can, indeed,
recognize novel subgroups that you didn't anticipate at the beginning
of a study. Some of these, indeed, seem to have clinical and/or
biological implications. This is perhaps a good point for me to
mention that you can take the large amount of data which is generated
by a microarray study and analyze it in one of two fashions.
You can either
do what we call unsupervised analysis -- that means taking all
the data, putting it in the hopper of the computational machine,
turning the crank and seeing what comes out. If everything goes
as you hope, tumors will separate out by themselves spontaneously
into interesting and relevant groups that may tell you something
you really didn't expect a priori.
This contrasts with supervised analysis, where we take the samples
and annotate them as carefully as we can with as much information
as we can get from all forms of classification -- the clinical
parameters, expert pathological classification, all possible molecular
markers and so on -- and then look for features in the microarray
data which separate samples of, let us say, different grade, different
propensity to metastasis, different translocation, and the like.
So, both of these have a place in doing this kind of analysis.
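To make the distinction concrete, here is a minimal Python sketch of the two modes. It is only an illustration under stated assumptions, not the software used in our laboratory: the toy expression values, the gene and sample names, the class labels, the distance threshold, and both function names are hypothetical. The unsupervised function groups samples using nothing but the data; the supervised function uses the annotations to rank genes by a simple signal-to-noise score.

```python
from statistics import mean, stdev

# Hypothetical toy expression table: one profile per sample, one value per gene.
GENES = ["geneA", "geneB", "geneC"]
SAMPLES = {
    "s1": [9.0, 1.0, 5.0],
    "s2": [8.5, 1.2, 5.1],
    "s3": [1.0, 9.0, 4.9],
    "s4": [1.3, 8.7, 5.2],
}
# Hypothetical annotations, used only by the supervised analysis.
LABELS = {"s1": "typeX", "s2": "typeX", "s3": "typeY", "s4": "typeY"}


def distance(a, b):
    """Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def unsupervised_groups(samples, threshold):
    """Single-linkage grouping: samples closer than `threshold` end up
    in the same group. No labels are consulted."""
    groups = []
    for name, profile in samples.items():
        matches = [g for g in groups
                   if any(distance(profile, samples[m]) < threshold for m in g)]
        merged = {name}.union(*matches) if matches else {name}
        groups = [g for g in groups if g not in matches] + [merged]
    return sorted(sorted(g) for g in groups)


def supervised_ranking(samples, labels, genes):
    """Rank genes by absolute signal-to-noise between two annotated classes."""
    first, second = sorted(set(labels.values()))
    scores = {}
    for i, gene in enumerate(genes):
        a = [samples[s][i] for s in samples if labels[s] == first]
        b = [samples[s][i] for s in samples if labels[s] == second]
        scores[gene] = abs(mean(a) - mean(b)) / (stdev(a) + stdev(b))
    return sorted(genes, key=lambda g: scores[g], reverse=True)


if __name__ == "__main__":
    print(unsupervised_groups(SAMPLES, threshold=3.0))
    print(supervised_ranking(SAMPLES, LABELS, GENES))
```

In this toy case the two approaches agree, because the groups are obvious. With real tumors, the unsupervised groups may or may not line up with the annotations, which is exactly why both are worth doing.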
You can certainly
find incredibly interesting genes for follow-up studies that you
wouldn't have thought of any other way. This has
turned out to be an incredible hypothesis-generating tool for
cancer research, and that is one of its most important
roles.
So, what do
you get out of a microarray experiment? Primarily, you get a single
thing, a table of expressed genes, the genes that are present
in a given sample. You get their quantitative level of expression.
So, you know the relative expression of the genes on this list.
Hopefully, if you have got the right samples, you learn something
about their disease or tissue specificity and, through appropriate
bioinformatics, you can place them in relevant pathways.
Of course,
the hope is that -- really, the credo of the field is that --
within this table there must be information which relates to the
important clinical behavior of the tumors; their fundamental
underlying molecular mechanism of pathogenesis; and the future
markers and drug targets that we want to discover.
So, what are some potential implications of all of this for microarray
studies of sarcomas? I am going to just give some very simple
illustrative examples, mainly for the benefit of people who aren't
familiar with this field. We would hope to get an improved classification
of tumors. This is just an example from the published data from
our lab on small round blue cell tumors. These are relatively similar
and, of course, distinguishable -- in expert hands -- diseases:
in this case, two lymphomas, a rhabdo, and a neuroblastoma.
When we looked
at their behavior in a microarray analysis, we were able to achieve
very sharp separation of these four types of cancer, according
to their pattern of gene expression.
So, each of
these balls represents a picture of where the gene expression
profile for an individual sample plots in an arbitrary, three-dimensional
space. So, it is a very, very clear, sharp separation.
What is included
in this separation? This is just a typical 2-D plot that we are
used to seeing now from many, many publications.
For example, in Ewing sarcoma, we pick up MIC2 as a
marker, tending to validate that we pick up
things that are recognized in conventional pathology. IGF-2 is
a strong marker for rhabdomyosarcoma.
So, you find
the things you do tend to expect, and this happens again and again.
Here is a separation of synovial sarcoma from the reference group,
a separation of gastrointestinal stromal tumor from the reference
group, and essentially every tumor that we have looked at has
followed the rule of having its characteristic pattern of gene
expression, which can be dug out of this kind of data with
varying levels of difficulty.
So, how far
can this be taken? Can it be taken to the level of separating
tumors that are really extremely similar, such as these morphologic
subsets of leiomyosarcoma? I would argue that this is probably
the case, if sufficient samples can be accrued to give statistical
significance to this kind of analysis. I think our goal, with
this type of research, really ought to be to develop a complete
molecular taxonomy of all forms of sarcoma.
So, that is
the first thing we would hope to get -- improved classification.
Next would be some information regarding gene discovery. So, we
really want to discover the genes that drive sarcoma growth, metastasis,
angiogenesis, and the like, as Marc mentioned so clearly a few
minutes ago.
Microarrays can be used in gene discovery in a couple of different
ways. One would be, again, through the tumor-specific gene expression
profile. Not as widely disseminated is the concept that we can
also look for genomic alterations with microarrays. I will just
illustrate that very quickly through searching for copy number
changes -- that is, gain or loss of chromosomal segments -- with
microarrays. So, there are lots of known amplifications and deletions
in various types of sarcomas, as illustrated on this partial table,
and we would like to be able to map these out in more detail to
help to identify some of the crucial genes that are driving sarcoma
growth.
It turns out
that you can also do comparative genomic hybridization, a technique
for looking at copy number change, on cDNA microarrays.
This is just
an example of a genome scan of a single soft tissue tumor. So,
this is now the entire human genome spread out on this axis, and
here are the relative gene copy numbers on this axis.
What may not
be instantly obvious to you is that, out of this sort of noise here --
because most genes aren't changing -- we see some spikes, and
these spikes represent regions of increased copy number at specific
portions of the genome.
The more you do these types of experiments, the more you realize
that we have only been seeing a very small part of the picture of complexity
of copy number change in sarcomas. Here, you can see three regions
that are showing very nice spikes supported by multiple array
elements.
Here is a
magnification of part of that. Again, you can see two nice spikes
now in this particular segment of the genome, where there is clearly
a gene amplification event.
So, this would
be a portion of the genome that would be worth looking at for
genes which might be important in the growth of this tumor.
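The idea of a spike "supported by multiple array elements" can be sketched in a few lines of Python. This is a simplified illustration, not our actual analysis pipeline: the function name, the ratio threshold of 2.0, the minimum of two consecutive elements, and the example ratios are all assumptions chosen for the demonstration. The input is taken to be copy number ratios (tumor over reference) for array elements already sorted by genome position.

```python
def amplified_regions(ratios, threshold=2.0, min_elements=2):
    """Return (start, end) index pairs for runs of at least `min_elements`
    consecutive array elements whose copy number ratio exceeds `threshold`.

    `ratios` is assumed to be ordered by genome position. Isolated single
    elements above the threshold are treated as noise and dropped.
    """
    regions = []
    start = None
    # A sentinel below any threshold closes a run that reaches the last element.
    for i, ratio in enumerate(list(ratios) + [float("-inf")]):
        if ratio > threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_elements:
                regions.append((start, i - 1))
            start = None
    return regions


if __name__ == "__main__":
    # Hypothetical ratios along one chromosome segment.
    example = [1.0, 1.1, 0.9, 3.2, 3.5, 2.8, 1.0, 2.6, 1.0, 1.1, 4.0, 4.2]
    print(amplified_regions(example))
```

Requiring more than one adjacent element is the simplest way to separate a real gain from a single noisy spot; a real analysis would also want to account for the physical spacing of the elements.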
So, what one
has to do is collect a lot of data like this and look for consensus
regions of interest. What one would hope to do with this type
of CGH data is to correlate it with the expression profile.
That reinforces the expression data by helping to identify
specific genes which are not only highly expressed, but selected
by the tumor, as evidenced by the increase in copy number.
We also want to be able to incorporate the CGH pattern into the
classification. You certainly get into the classification problem;
and if you can bring the relatively tumor-specific patterns of
gain into this, I think it is going to be possible to increase
the resolution that can be obtained by expression-profiling types
of analyses.
So, classification,
gene discovery. The third point I would make would be to recognize
key pathways or drug targets. This is not necessarily a simple
issue. There are a lot of genes to go through to find the ones
that are the most important.
Our experience
to date, I think, suggests that you will find interesting candidates.
You can certainly rule out candidates by discovering that they
are not expressed in a particular form of sarcoma.
The example
that I like best is this one from the study that we did of GIST.
This shows the gene expression profile and the set of GIST samples
over here.
It was interesting
to see that KIT came out right in the middle of this very consistently
expressed pattern. What was perhaps even more remarkable was that
when we did what we call a weighted list or a weighted gene discriminator
analysis, we found that, amongst the top discriminators for GIST,
KIT was actually the strongest single gene in our analysis.
So, just by fairly straightforward computational analysis, we
were able to come up with KIT as the top gene of interest for
GIST. Certainly, other cases won't necessarily be so simple, but
this gives you some feeling of optimism that interesting genes
do tend to jump out at you when you are doing this type of research.
I will show
an example of how this can be extended to other forms of sarcoma.
So, this is just what we call a random permutation analysis, comparing
several forms of sarcoma. This is the curve that would be expected
at random for the weight, or statistical strength, of a gene,
and this is the actual data that is obtained.
This gap between
the actual curve and the randomized curve indicates an overabundance
of informative genes -- so, these are the ones that we want to
discover that are disease specific. When this is plotted in a
more conventional way, you get this type of plot. So, looking
from left to right, each of those color bars represents a different
form of sarcoma, and the genes plotted underneath it are the ones
that are relatively specific for that type of sarcoma. As you
can see, they tend to separate out rather sharply. So, these would
be the genes related to the histogenesis or the specific genetic
alterations in each of these tumor types. One would hope that,
within this set of genes, would be ones that would be worth following
up in more detail.
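For those who have not seen a random permutation analysis, the comparison of the actual curve with the randomized curve can be sketched roughly as follows. This is a minimal Python illustration, not the exact procedure behind the slide: the signal-to-noise weight, the two-class setup, the synthetic data generator, and all function and parameter names are assumptions. Gene weights are computed once with the true labels and then many times with the sample labels shuffled; the averaged shuffled curve is what would be expected at random.

```python
import random
from statistics import mean, stdev


def gene_weights(profiles, labels):
    """Sorted (descending) absolute signal-to-noise weight of every gene
    for a two-class labeling (labels are 0 or 1)."""
    weights = []
    for g in range(len(profiles[0])):
        a = [p[g] for p, lab in zip(profiles, labels) if lab == 0]
        b = [p[g] for p, lab in zip(profiles, labels) if lab == 1]
        spread = stdev(a) + stdev(b)
        weights.append(abs(mean(a) - mean(b)) / spread if spread else 0.0)
    return sorted(weights, reverse=True)


def permuted_curve(profiles, labels, n_perm=200, seed=0):
    """Average sorted weight curve over random relabelings of the samples."""
    rng = random.Random(seed)
    total = [0.0] * len(profiles[0])
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # class sizes are preserved, only assignment changes
        for i, w in enumerate(gene_weights(profiles, shuffled)):
            total[i] += w
    return [t / n_perm for t in total]


def toy_data(seed=1, n_per_class=6, n_genes=20, n_informative=3, shift=4.0):
    """Synthetic data: the first `n_informative` genes are shifted in class 1."""
    rng = random.Random(seed)
    profiles, labels = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_genes)]
            if label == 1:
                for g in range(n_informative):
                    row[g] += shift
            profiles.append(row)
            labels.append(label)
    return profiles, labels


if __name__ == "__main__":
    profiles, labels = toy_data()
    actual = gene_weights(profiles, labels)
    expected = permuted_curve(profiles, labels)
    for rank, (a, e) in enumerate(zip(actual[:5], expected[:5]), start=1):
        print(f"rank {rank}: actual {a:.2f}  permuted {e:.2f}")
```

The genes whose actual weight sits well above the permuted curve at the same rank are the candidates for being truly disease specific.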
So, how do
you go about actually doing good clinical correlative studies
using microarrays? I think it is actually rather challenging.
Although this has been called "discovery-driven" research, which
might suggest that we don't really need to think about it ahead of time,
I think it is actually quite important to think about it ahead
of time. You really need to define the question that you want
to answer at the beginning. You need to identify an appropriate
patient sample that will work for answering that question. You
have to apply appropriate and rigorous statistical analysis to
what is hopefully very high-quality array data.
The result
of this should be genes which carry the information relevant to
the question that you have posed. At this point, one should develop
a formal classifier that can be used to apply to an unknown test
set to demonstrate the validity of this analysis. I think it is
really rather crucial that this whole process be validated on
an additional, completely independent sample set. I think a lot
of the publications that you may have seen so far in this field,
in other types of cancer, may not necessarily hold up in the long
run, when they are validated with additional sample sets. So,
it is very important, I think, to have this type of validation:
as in other types of clinical research, we should go through
a pilot phase followed by a validation phase.
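The pilot-then-validation logic can be sketched with a very simple formal classifier. This Python example is illustrative only: a nearest-centroid rule stands in for whatever classifier a given study would actually use, and the sample values, class names, and function names are hypothetical. The important point is structural: the classifier is built on the pilot set alone and then scored on a separate set that played no part in building it.

```python
from statistics import mean


def train_centroids(profiles, labels):
    """Average expression profile (centroid) of each class in the pilot set."""
    centroids = {}
    for label in set(labels):
        members = [p for p, lab in zip(profiles, labels) if lab == label]
        centroids[label] = [mean(column) for column in zip(*members)]
    return centroids


def classify(profile, centroids):
    """Assign an unknown sample to the class with the nearest centroid."""
    def squared_distance(centroid):
        return sum((x - y) ** 2 for x, y in zip(profile, centroid))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


def accuracy(profiles, labels, centroids):
    """Fraction of samples whose predicted class matches the annotation."""
    hits = sum(classify(p, centroids) == lab for p, lab in zip(profiles, labels))
    return hits / len(labels)


if __name__ == "__main__":
    # Pilot phase: hypothetical annotated samples used to build the classifier.
    pilot_profiles = [[9.0, 1.0], [8.0, 2.0], [1.0, 9.0], [2.0, 8.0]]
    pilot_labels = ["typeA", "typeA", "typeB", "typeB"]
    centroids = train_centroids(pilot_profiles, pilot_labels)

    # Validation phase: independent samples never seen during training.
    validation_profiles = [[8.5, 1.5], [1.5, 8.5]]
    validation_labels = ["typeA", "typeB"]
    print(accuracy(validation_profiles, validation_labels, centroids))
```

Accuracy measured on the pilot set itself tends to be optimistic, because the discriminating genes were selected on those same samples. Only the number from the independent set tells you whether the result is likely to hold up.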
So, what are
some of the obstacles to sarcoma microarray studies? I think I
should say at the beginning certain things that aren't obstacles.
We do not need to worry about whether we are looking at
microdissected samples or relatively larger pieces of tissue.
I think larger pieces of tissue are satisfactory because they
are easier to process -- data can be generated more rapidly. The
analysis will still withstand this in the majority of cases, and
it will be possible to go back and assign specific genes to particular
cellular components of a tumor tissue by in situ procedures, using
the tissue microarray technique, for example. So, I don't think
that is terribly important.
The power
of the technology itself is not so much an issue any more. Very
accurate information can be obtained for a large portion of the genome;
it is routine to get data on 20 or 30 thousand genes in a single
hybridization. So, you can get most of the genome at this point.
There are fairly routine computational tools available. It still
requires the participation of experts in bioinformatics and mathematics,
who are still in short supply, but the tools for analysis are relatively
standard and have been very, very useful. So, these things
are not tremendous obstacles at the present time.
What I see
in sarcoma as being a problematic area would be primarily issues
such as obtaining adequate sample numbers. This has been alluded
to already. A decent-sized microarray study in the current era
involves hundreds of samples. To obtain hundreds of samples across
a spectrum of disease that has a lot of heterogeneity, and to
represent each subset with adequate numbers is quite a challenge.
I think this can only be done in the context of multi-institutional
collaborative studies.
This doesn't
mean that we should pack up and not
do this because we can't get enough samples. One way to think of
this is in terms of the resolution of an image. You can start out
with a relatively small number of samples and you will see from
that small number of samples the gross separations between tumor
types that are easy to distinguish. This probably won't come as
a big surprise to anybody -- but the more samples you add in each
subset, the better the separation that you will see. But adequate
sample numbers are clearly a major issue.
Adequate sample
quantity is an enormous problem, especially when talking about
diseases that are frequently biopsied with skinny needle biopsy.
It is conceivable that you can do microarray studies on 1,000
cells; but it is really not optimal to do that. So, core biopsies
or open biopsies are really a preferable material. I would argue
that if anyone believes that to take better care of their patients,
they need to understand the biology, then they need to obtain
adequate tissues.
Every
time a patient with a rare disease gets treated without tissue
being banked for future research in adequate quantity and quality
to do the things that are going to be the genomics of cancer for
the 21st century, I think that is really a shame. It just delays
the day until we get answers about what is going on with that
type of patient. So, adequate sample quantity is critical.
We certainly want to have links to clinical
trials, so that the samples are not just characterized pathologically,
but characterized with respect to all the clinical variables that
matter that we know about at the time the study is done.
Finally, we
need to establish interdisciplinary collaborations that include
the clinicians caring for the patients, the pathologist, the molecular
biologist, and the biostatistician. All of these things have to
happen if you are really going to get to the bottom of the mystery
of sarcomas.
That is basically
what I had to say, hopefully getting us closer to being back on time.