








SLIDES & TRANSCRIPTS
Monday, May 12, 2003
Gene profiling in ALL
Cheryl Willman, M.D.
Slide 1:
Good
morning. The topic that Bill and Hagop have given me is gene expression
profiling in acute lymphoblastic leukemia, and I will quickly
go through some of our own data and end with some recommendations.
Slide 2:
The three main questions in this field, as all of you know, are these:
will gene expression profiling allow us to discover novel, inherent
biologic subtypes of leukemia that we currently don't appreciate using
traditional morphologic, cytogenetic and immunophenotyping criteria?
Can we use gene expression profiling to improve risk stratification and
outcome prediction? Can we use it to identify new pathways and targets
for therapy?
Slide 3:
I
won't go through this slide in any detail, but most of you are aware
that there are two general computational approaches to this data.
How you preprocess your data is critical. It can make tremendous
differences in the conclusions you reach and the clustering patterns you
obtain.
How you select the genes on which you do your analysis is also very
critical, and there are two main approaches in this field. The first is
unsupervised learning, in which you take all of your gene expression
arrays, without knowledge of any particular clinical or genetic
parameter, and use any of a number of algorithms (hierarchical
clustering, self-organizing maps, principal component analysis) to
discover intrinsic biology.
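Hierarchical clustering, the first of the unsupervised algorithms listed above, can be sketched in a few lines. This is a minimal illustration, not VXInsight and not the speaker's pipeline: it assumes a small matrix with one row per array and one column per gene, uses 1 minus the Pearson correlation as the distance and average linkage as the merge rule (both common choices, but assumptions here), and the function names and toy values are hypothetical. No clinical label is used anywhere.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_linkage(samples, n_clusters):
    """Agglomerative clustering of arrays (rows) until n_clusters remain.

    Distance between arrays is 1 - Pearson correlation; the distance between
    two clusters is the mean of all pairwise array distances (average linkage).
    """
    dist = {(i, j): 1.0 - pearson(samples[i], samples[j])
            for i, j in combinations(range(len(samples)), 2)}

    def d(i, j):
        return dist[(i, j)] if i < j else dist[(j, i)]

    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > n_clusters:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pairs = [(i, j) for i in clusters[a] for j in clusters[b]]
            avg = sum(d(i, j) for i, j in pairs) / len(pairs)
            if best is None or avg < best[0]:
                best = (avg, a, b)
        _, a, b = best
        clusters[a] += clusters[b]  # merge the closest pair (a < b, so index a stays valid)
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


# Toy data (synthetic): six arrays x four genes with two opposite patterns.
toy = [
    [5.0, 1.0, 4.0, 1.0],
    [6.0, 1.5, 4.5, 0.5],
    [5.5, 0.8, 4.2, 1.2],
    [1.0, 5.0, 1.0, 4.0],
    [1.5, 6.0, 0.5, 4.5],
    [0.8, 5.5, 1.2, 4.2],
]

if __name__ == "__main__":
    print(average_linkage(toy, 2))  # arrays 0-2 form one group, arrays 3-5 the other
```

Swapping the distance or the linkage rule can change the resulting groups, which echoes the point above that analytic choices affect the clustering patterns you obtain.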
The second approach, which is very important for modeling outcome and
developing potential targets and new pathways, is called supervised
machine learning, in which you know in advance the particular parameter
you are trying to model, such as outcome.
You develop a training set of your cases and try to model the gene
expression profile that correlates with that parameter. Then, if you are
really careful, you always take that model into an independent test set
to validate it.
The major approaches here are support vector machines, neural networks
and Bayesian networks. Without elaborating further, it is very important
to do some sort of statistical analysis of your conclusions, because
there often isn't one right answer in this field.
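The train-then-test discipline and the call for a statistical check on the result can be illustrated together. The speaker names support vector machines, neural networks and Bayesian networks; this sketch deliberately swaps in a much simpler nearest-centroid classifier so the workflow stays visible. It assumes synthetic data, hypothetical outcome labels "CCR" and "failure", and a label-permutation test on the held-out set as the statistical check; none of these details comes from the talk.

```python
import random


def centroid(rows):
    """Column-wise mean: the average expression profile of a group of arrays."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def fit_nearest_centroid(X, y):
    """'Training': store one mean expression profile per outcome label."""
    return {lab: centroid([x for x, yi in zip(X, y) if yi == lab]) for lab in sorted(set(y))}


def predict(model, x):
    """Assign an array to the label whose centroid is nearest (squared Euclidean)."""
    return min(model, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, model[lab])))


def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)


def permutation_p_value(pred, truth, n_perm=2000, seed=0):
    """Add-one-corrected fraction of random relabelings of the test set that
    agree with the predictions at least as well as the true labels do."""
    rng = random.Random(seed)
    observed = accuracy(pred, truth)
    shuffled = list(truth)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += accuracy(pred, shuffled) >= observed
    return (hits + 1) / (n_perm + 1)


def toy_cohort(n_per_class, rng):
    """Synthetic arrays with three 'genes'; the classes differ on the first two."""
    means = {"CCR": [3.0, 0.0, 1.0], "failure": [0.0, 3.0, 1.0]}
    X, y = [], []
    for lab, mu in means.items():
        for _ in range(n_per_class):
            X.append([m + rng.gauss(0, 0.3) for m in mu])
            y.append(lab)
    return X, y


if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = toy_cohort(10, rng)  # the model is built on these only
    X_test, y_test = toy_cohort(5, rng)     # held out until the model is fixed
    model = fit_nearest_centroid(X_train, y_train)
    pred = [predict(model, x) for x in X_test]
    print(accuracy(pred, y_test), permutation_p_value(pred, y_test))
```

The test set is touched only after the model is fixed. The permutation p-value asks how often a random relabeling of the held-out cases, with the same class balance, would agree with the predictions at least as well as the real outcomes do.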
Slide 4:
Most of you have seen the data that we have generated on 127 infants
registered to a pediatric oncology trial, and I will just show a couple
of slides from that.
The sample set consisted of 79 children classified as ALL and 48
classified as AML.
Slide 5:
When we use VXInsight, a wonderful unsupervised learning program for
discovering intrinsic biology that was developed at Sandia National
Laboratories and is a very powerful clustering algorithm, we find three
distinct groups, as many of you have seen.
The result is displayed as a mountain-and-terrain landscape.
Slide 6:
When we ask whether those three groups correspond to ALL versus AML
morphology, they do not. I think this is very interesting.
We have now finished the analysis of the top group and submitted it for
publication. It is interesting because this leukemia in infants truly
looks like a very immature stem cell leukemia, with features almost of
hemangioblasts: activation of endothelial and cell growth pathways and
very interesting perturbations of the TGF-beta and bone morphogenetic
protein signaling pathways.
We have been able to model many new targets and diagnostic genes in this
particular group that are predictive of outcome.
A second group of cases, predominantly lymphoid, is very interesting in
that we see activation of many lymphoid antigens, as you might expect,
as well as some interesting genes associated with viruses, which we are
still sorting out.
Finally, the last group is a mixture of AML and ALL cases. It is quite
interesting because the feature these cases share is perturbation of the
ras signaling pathway in particular.
I think it is interesting that we may be able to use gene expression
arrays to cluster and identify cases, a mixture of ALL and AML, that
would benefit from agents targeting ras signaling.
Slide 7:
When we look at the cases with MLL gene rearrangements in this cohort,
you can see, shown in blue, that MLL rearrangements are found in each
group.
What is interesting is that there are two very distinct groups: in both
the A and B clusters, at the top and on the left, the cases that carry
MLL gene rearrangements have a t(4;11) translocation.
Those two sets of t(4;11) MLL translocations have very different gene
expression profiles and different targets. Most of the MLL cases in the
third group, which is associated with perturbed ras signaling, have
t(9;11), t(10;11), t(1;11) and other variant MLL translocations.
Slide 8:
We have gone on to use support vector machines and Bayesian nets to model
novel targets for each of the variant MLL translocations.
It is particularly interesting that cases with the t(9;11) translocation
again show perturbation of the ras signaling pathway, t(11;19) cases show
genes in the IL-2 pathway, and t(4;11) cases show very novel sets of
genes.
I see Scott Armstrong here from the Farber. I think one of the really
interesting things about this technology is that, despite all the
caveats and despite the fact that we all do it differently, the profiles
we derive in different labs tend to be very similar. I find that
reassuring.
Slide 9:
When we look at Flt-3 expression in the different MLL variants, we see
something very interesting: Flt-3 levels are not uniformly high across
the MLL cases but differ distinctly by genotype, with particularly high
levels in t(4;11), t(11;19) and t(9;11), but not in t(1;11), t(10;11)
and other MLL translocations. We need to think about that when we design
clinical trials using Flt-3 inhibitors.
Slide 10:
I want to use another couple of minutes to summarize our developing data
in pediatric B cell ALL, which, as you know, we currently risk-stratify
in the United States according to genotypes that have, over many, many
years, been associated with different outcomes.
Children with trisomies or t(12;21) are treated on less intensive
regimens than children with t(4;11) or t(9;22).
Slide 11:
So, one of the things we wanted to do was to determine whether gene
expression profiling could improve or refine risk classification in
pediatric B cell ALL, and whether we could define novel genes associated
with outcome.
So, much like Jim Downing, we started with a set of 255 cases. The
design of this cohort of pediatric ALL cases from the former POG is very
complicated. It is essentially a set of internal case-control studies,
in which cases were balanced for genotype and, within each genotype, for
failure or response.
So, within the cohort of 255 cases there is a series of t(9;22) cases
with equal numbers of responders and failures, a series of t(4;11) cases
balanced the same way, a series of t(12;21) cases, and children who did
or did not have induction failure.
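The balanced design described here, in which failures and responders are matched within each genotype, can be sketched as stratified sampling. This is a minimal illustration under stated assumptions: the case records, their keys ("id", "genotype", "outcome"), the function name and the per-stratum count are hypothetical, and the talk does not describe the actual POG case selection beyond this balancing.

```python
import random
from collections import Counter, defaultdict


def balanced_case_control(cases, per_outcome, seed=0):
    """Within each genotype, draw equal numbers of failures and responders.

    cases: dicts with hypothetical keys 'id', 'genotype' and 'outcome'
    ('failure' or 'response'). If a stratum is short, both outcomes in that
    genotype are cut to the same smaller number so the pair stays balanced.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for case in cases:
        strata[(case["genotype"], case["outcome"])].append(case)
    selected = []
    for genotype in sorted({g for g, _ in strata}):
        failures = strata[(genotype, "failure")]
        responders = strata[(genotype, "response")]
        k = min(per_outcome, len(failures), len(responders))
        selected += rng.sample(failures, k) + rng.sample(responders, k)
    return selected


# Synthetic pool: many t(12;21) responders but few failures; the reverse for t(9;22).
pool = (
    [{"id": i, "genotype": "t(12;21)", "outcome": "response"} for i in range(20)]
    + [{"id": 100 + i, "genotype": "t(12;21)", "outcome": "failure"} for i in range(5)]
    + [{"id": 200 + i, "genotype": "t(9;22)", "outcome": "response"} for i in range(6)]
    + [{"id": 300 + i, "genotype": "t(9;22)", "outcome": "failure"} for i in range(8)]
)

if __name__ == "__main__":
    chosen = balanced_case_control(pool, per_outcome=5)
    print(Counter((c["genotype"], c["outcome"]) for c in chosen))
```

Each genotype then contributes equal numbers of failures and responders, so genotype alone cannot explain an outcome signal in the selected set.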
Follow-up for all of the selected cases extended well beyond four years,
so we are looking at outcome data at four years.
Now, using supervised learning methods to model gene expression profiles
for karyotype, as Jim did, we could come up with gene expression
profiles that were statistically significantly associated with
hyperdiploidy, t(12;21), t(4;11) and t(1;19).
We had a hard time modeling our t(9;22) cases, and I think you will see
why in a minute. Data emerging from Mike Cleary's group at Stanford show
the same thing: the t(9;22) cases are very heterogeneous in their gene
expression profiles.
Slide 12:
So, in our hands, we select up front 147 genes that are predictive of
these karyotypes. In other words, I can take 147 genes out of the 13,000
we analyzed and predict which karyotype a child has. If I then cluster
in an unsupervised way using only those 147 genes, I can draw a pretty
picture for you.
Slide 13:
[No text is associated with this slide.]
Slide 14:
However, if I go back into VXInsight and say, "I want to look at the
expression of all of the genes in these 255 cases and understand what
the intrinsic groups are," we come up with six very interesting groups
of B cell ALL: A, B, C, X, Y and Z.
These groups have very distinct gene expression profiles. In particular,
as I will show you in a minute, C and Z contain almost all of the
hyperdiploid and t(12;21) cases, yet C and Z have two very distinct gene
expression profiles. A and B contain most of the t(1;19) cases, and the
t(9;22) cases are scattered throughout.
What is interesting is that group X again shows many perturbations of
the ras pathway, as well as of DNA repair genes. Finally, embedded in
here were two groups of T cell ALL with very different gene expression
profiles.
We are currently doing an analysis of variance to identify the genes
that are most statistically associated with each of these peaks, and
that is nearly finished.
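A per-gene one-way analysis of variance across the VXInsight groups might look like the sketch below. The talk does not specify the exact test, so this assumes the textbook one-way ANOVA F statistic; the function names, gene names, cluster labels and values are hypothetical, and a real analysis would also convert F to p-values and correct for testing thousands of genes.

```python
def anova_f(groups):
    """One-way ANOVA F statistic for one gene; groups = values per cluster.

    Assumes at least two clusters and some within-cluster variation
    (otherwise the denominator is zero).
    """
    values = [v for g in groups for v in g]
    k, n = len(groups), len(values)
    grand = sum(values) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def rank_genes(expr, cluster_of):
    """Rank genes by F across clusters, largest first.

    expr: {gene: [expression per sample]}; cluster_of: cluster label per sample.
    """
    labels = sorted(set(cluster_of))
    scores = {}
    for gene, vals in expr.items():
        groups = [[v for v, c in zip(vals, cluster_of) if c == lab] for lab in labels]
        scores[gene] = anova_f(groups)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Synthetic example: nine arrays in three clusters, two genes.
clusters = ["C", "C", "C", "Z", "Z", "Z", "X", "X", "X"]
expr = {
    "gene_a": [9, 10, 11, 1, 2, 3, 5, 6, 4],  # differs strongly between clusters
    "gene_b": [5, 6, 4, 5, 4, 6, 6, 5, 4],    # same mean in every cluster
}

if __name__ == "__main__":
    for gene, f in rank_genes(expr, clusters):
        print(gene, round(f, 2))  # gene_a 49.0, then gene_b 0.0
```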
I won't go through
the cytogenetics.
Slide 15:
One of the really exciting things about this cohort, though, which I
will cover in my last two slides, is that we were able to model outcome
in this group quite successfully.
We are really
excited about this because it has led to some novel genes that we
hope will be very powerful in this disease.
So, using Bayesian networks, we asked which children have long-term
response (continuous complete remission, or CCR) and which children had
lower event-free survival or induction failure. Modeling across 13,000
genes, we came up with one gene, which we call G0, and you win a prize
at this meeting if you help us name this new gene.
It was not fully cloned in the Human Genome Project. We have now cloned
it, and it appears to be a member of a novel receptor family. We think
it is really, really interesting, but we would like to have a better
name than G0.
G0 is the mathematical term in a Bayesian net for the gene that has the
most power to discriminate class. So this gene, G0, discriminates
children who do extremely well (high levels) from children who have
lower event-free survival and induction failure (low levels).
The expression
of G0 is modified by another gene, G1, which further splits the
groups, and another gene, G2.
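Learning a Bayesian network is beyond a short example, but the idea of a root gene with the most power to discriminate class can be illustrated with a simpler, swapped-in criterion. This sketch is not the speaker's method: it scores each gene by the information gain of its best single cut point on outcome labels, as a decision-tree root split would, and the gene names, labels and values are hypothetical. In this analogy, a modifier gene like G1 would be found by repeating the search within each branch.

```python
import math


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                for c in set(labels))


def best_split(values, labels):
    """Best threshold on one gene: returns (information gain, threshold)."""
    n = len(labels)
    base = entropy(labels)
    best = (0.0, None)
    points = sorted(set(values))
    for lo, hi in zip(points, points[1:]):
        t = (lo + hi) / 2  # candidate cut halfway between observed values
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best[0]:
            best = (gain, t)
    return best


def root_gene(expr, labels):
    """(gene, gain, threshold) for the gene whose best split is most informative."""
    return max(((gene,) + best_split(vals, labels) for gene, vals in expr.items()),
               key=lambda row: row[1])


# Synthetic example: four long-term responders and four early failures.
outcome = ["CCR"] * 4 + ["failure"] * 4
expr = {
    "gene_x": [8.0, 9.0, 7.0, 8.5, 2.0, 3.0, 1.0, 2.5],  # high in CCR
    "gene_y": [5.0, 1.0, 5.0, 1.0, 5.0, 1.0, 5.0, 1.0],  # unrelated to outcome
}

if __name__ == "__main__":
    print(root_gene(expr, outcome))  # ('gene_x', 1.0, 5.0)
```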
Slide 16:
So, what are these genes? The first gene, G0, as I said, is a completely
novel gene that was represented only by an EST from the Human Genome
Project, and we have now fully cloned it. It is a novel transmembrane
receptor.
G1 is a very
interesting novel signaling G protein which binds to protein kinase
C, and G2 is the IL-10 receptor alpha.
What is really interesting in our modeling of G0 is that we find that
the G0 level, low or high, strongly predicts response in this disease.
Low G0 is quite significantly associated with induction failure. High G0
is associated with event-free survival, particularly on AlinC #15 and
particularly in males. The association is stronger in males than in
females, which I think is interesting and which we don't quite
understand yet.
High G0 is associated
with higher event-free survival in patients with t(12;21) and normal
karyotype. Indeed, if you have high G0 and a t(12;21), you are the
child who tends to live for a very long time. If you have low G0 and a
t(12;21), your event-free survival tends to be shorter.
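Statements such as "high G0 is associated with higher event-free survival" are usually read from survival curves. The talk does not say which survival method was used; this is a minimal sketch assuming a Kaplan-Meier estimator applied separately to G0-high and G0-low groups, with hypothetical function names and synthetic follow-up times, reading the estimate at four years to match the follow-up horizon described for this cohort. A formal comparison would add a log-rank test or a Cox model, which are omitted here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of event-free survival.

    times: follow-up in years; events: True if an event (e.g. relapse or
    induction failure) occurred at that time, False if the child was censored.
    Returns a list of (time, estimated EFS) steps at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        tied = [e for tt, e in data[i:] if tt == t]  # all records at time t
        n_events = sum(tied)
        if n_events:
            surv *= 1 - n_events / at_risk
            curve.append((t, surv))
        at_risk -= len(tied)  # events and censored records both leave the risk set
        i += len(tied)
    return curve


def efs_at(curve, t):
    """Estimated EFS at time t (value of the last step at or before t)."""
    s = 1.0
    for time, value in curve:
        if time <= t:
            s = value
    return s


# Synthetic follow-up (years) for a hypothetical G0-high and G0-low group.
g0_high = kaplan_meier([1.5, 4.5, 5.0, 5.0, 6.0], [True, False, False, False, False])
g0_low = kaplan_meier([0.5, 1.0, 2.0, 3.0, 5.0], [True, True, False, True, False])

if __name__ == "__main__":
    print(round(efs_at(g0_high, 4), 2), round(efs_at(g0_low, 4), 2))  # 0.8 0.3
```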
Our numbers are quite small in some of the subsets once we break them
out cytogenetically, so, obviously, we are validating this.
Interestingly, when we take G0 into the St. Jude data set that is now
publicly available on the website, we find that G0 is quite predictive
of outcome overall in this completely independently derived data set,
and that it also predicts outcome in infants with ALL in clusters A and
C.
Slide 17:
So, to close, what are my recommendations about gene expression
profiling in pediatric and adult ALL? It upsets me to think that Rick
Klausner's vision of the Director's Challenge might not be sustained.
So, I think it is very critical that the NCI continue funding grants to
collaborative groups of investigators to do this kind of work.
Although we have been doing this for five years, we are nowhere near
done; we are just beginning, and it is really important to sustain
funding for this kind of initiative. It is not clear whether that is
going to happen.
It is absolutely critical to validate. Once we use our data sets to come
up with these very interesting new genes that predict response or give
us insights into new biology, we have to go on to new, well-defined
cohorts to validate them.
I think it is really essential to design, study and compare pediatric
and adult cohorts of ALL. What is the difference between kids and adults
with ALL of the same karyotype, or between kids and adults who fall into
these novel intrinsic biologic clusters? It is really important to do
classification, outcome modeling and identification of new targets.
Again, many of us come from cooperative groups, and we have learned that
it is essential to focus on uniformly treated patient cohorts with very
high-quality outcome and correlative science data.
Obviously, in this field you are only as good as your statisticians and
computational folks. So, it is really important to link the groups who
have the patient cohorts to good technical cores and to good informatics
and biocomputing cores; you just have to have great biostatisticians and
great biocomputing.
Another thing that I think is really important for all of us as we go
forward is maintaining data that are updated and constantly reannotated
as gene identification continues through the Human Genome Project. We
also need publicly available websites that contain all of the ALL data,
as well as data on other cancers, where the actual quantitative CEL
files are available for reanalysis by other groups and the clinical data
are present so that other groups can work on modeling.
So, public availability of everyone's data on a single website is, I
think, really important.
Finally, I think a real strength of these array cohorts, as we are now
doing in the Children's Oncology Group, is using the cohorts we have
designed for arrays for proteomic, pharmacogenomic and epidemiologic
studies.
Thank you very much.
Slide 18:
[No text is associated with this slide.]