SLIDES & TRANSCRIPTS
Monday, May 12, 2003

Gene profiling in ALL

Cheryl Willman, M.D.

Slide 1:

Good morning. The topic that Bill and Hagop have given me is gene expression profiling in acute lymphoblastic leukemia, and I will quickly go through some of our own data and end with some recommendations.

Slide 2:

The three main questions in this field, as all of you know, are these. Will gene expression profiling allow us to discover novel inherent biologic subtypes of leukemia that we currently don't appreciate using traditional morphologic, cytogenetic and immunophenotyping criteria?

Can we use gene expression profiling to improve risk stratification and outcome prediction? Can we use it to lead to identification of new pathways and targets for therapy?

Slide 3:

I won't go through this slide in any detail, but most of you are aware that there are two general computational approaches to this data.

How you preprocess your data is critical. It can cause tremendous differences in the actual conclusions and clustering patterns that you develop.
How you select the genes that you are going to do your analysis on is also critical. There are two main approaches in this field. The first is unsupervised learning, in which you take all your gene expression arrays, without knowledge of any particular clinical or genetic parameter, and use any number of algorithms (hierarchical clustering, self-organizing maps, principal component analysis) to discover intrinsic biology.
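
As an illustration of the unsupervised approach, here is a minimal sketch of average-linkage hierarchical clustering with a correlation distance. It is not the speaker's pipeline; the function names and the toy expression values are invented for the example, and no clinical labels are used.

```python
import math

def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return 1.0 - cov / math.sqrt(var_a * var_b)

def average_linkage_clusters(profiles, n_clusters):
    """Agglomerative clustering: repeatedly merge the two closest clusters."""
    clusters = [[i] for i in range(len(profiles))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [pearson_distance(profiles[i], profiles[j]) for i in c1 for j in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        pairs = [(linkage(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Toy data (invented): 6 samples x 4 genes.
toy_profiles = [
    [8.0, 7.5, 1.0, 1.2],
    [7.8, 7.9, 1.1, 0.9],
    [8.2, 7.0, 0.8, 1.0],
    [1.0, 1.3, 8.1, 7.7],
    [0.9, 1.1, 7.9, 8.2],
    [1.2, 0.8, 8.3, 7.5],
]
groups = sorted(average_linkage_clusters(toy_profiles, n_clusters=2))
print(groups)
```

The number of clusters is fixed in advance here for simplicity; in practice that choice, like the distance measure and the preprocessing, can change the groups that emerge.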

The second approach, which is very important for modeling outcome and developing potential targets and new pathways is called supervised or machine learning, in which you know a particular parameter that you are trying to model, such as outcome.

You develop a training set of your cases, try to model the gene expression profile that correlates with that parameter and, if you are really careful, you always take that into a test set to validate it.
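
The training set and test set discipline can be sketched as follows. This is an assumption-laden toy: it uses a nearest-centroid classifier rather than any method named in the talk, and the expression values and outcome labels are invented. The point it shows is that the model is fit on the training cases only and then scored on cases it has never seen.

```python
def centroid(rows):
    """Mean expression per gene across a list of samples."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def fit_centroids(samples, labels):
    """Learn one centroid per class from the training set only."""
    by_class = {}
    for row, label in zip(samples, labels):
        by_class.setdefault(label, []).append(row)
    return {label: centroid(rows) for label, rows in by_class.items()}

def predict(centroids, row):
    """Assign the class whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((x - y) ** 2 for x, y in zip(row, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Invented toy data: 3 genes per sample, labels are hypothetical outcomes.
train_x = [[5.0, 1.0, 2.0], [5.5, 0.8, 2.1], [1.0, 4.9, 2.0], [1.2, 5.2, 1.9]]
train_y = ["CCR", "CCR", "failure", "failure"]
test_x = [[5.2, 1.1, 2.0], [0.9, 5.0, 2.2]]
test_y = ["CCR", "failure"]

model = fit_centroids(train_x, train_y)          # training cases only
test_predictions = [predict(model, row) for row in test_x]
test_accuracy = sum(p == t for p, t in zip(test_predictions, test_y)) / len(test_y)
print(test_predictions, test_accuracy)
```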

Major approaches here are something called support vector machines, neural networks and Bayesian networks. It is very important, without further elaboration, to do some sort of statistical analysis of your conclusions, because it is clear there often isn't one right answer in this field.
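
One common way to put a statistical check on such a conclusion is a label permutation test. The sketch below is an assumption, not something described in the talk. It asks how often randomly shuffled outcome labels produce a difference in mean expression as large as the observed one. All values are invented.

```python
import random

def mean_difference(values, labels):
    """Difference in mean expression between the two label groups."""
    group1 = [v for v, l in zip(values, labels) if l == 1]
    group0 = [v for v, l in zip(values, labels) if l == 0]
    return sum(group1) / len(group1) - sum(group0) / len(group0)

def permutation_p_value(values, labels, n_permutations=2000, seed=0):
    """Two-sided p-value: how often shuffled labels give a difference as large."""
    rng = random.Random(seed)
    observed = abs(mean_difference(values, labels))
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if abs(mean_difference(values, shuffled)) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)

# Invented expression values for one gene in 12 cases.
expression = [7.9, 8.1, 8.4, 7.7, 8.0, 8.3, 2.1, 1.8, 2.4, 2.0, 1.9, 2.2]
outcome = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
p_value = permutation_p_value(expression, outcome)
print(p_value)
```

When thousands of genes are tested this way, the per-gene p-values would also need a multiple-testing correction, which this sketch leaves out.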

Slide 4:

Most of you have seen the data that we have generated on 127 infants, registered to a pediatric oncology trial, and I will just show a couple of slides from that.

The sample set consisted of 79 children classified as ALL and 48 AML.

Slide 5:

When we use VXInsight, a wonderful unsupervised learning program developed at Sandia National Laboratories with a very powerful clustering algorithm, to discover intrinsic biology, we find three distinct groups, as many of you have seen.

This is displayed in a mountain and terrain motif.

Slide 6:

When we ask if those three groups are related to the ALL versus AML morphology, they are not. I think this is very interesting.

That top group is interesting. We have now finished the analysis of this pathway and submitted it for publication. This leukemia in infants truly looks to be a very immature stem cell leukemia, with features almost of hemangioblasts, with activation of endothelial and cell growth pathways and very interesting perturbations of the TGF beta and bone morphogenetic protein signaling pathways.

We have been able to model lots of new targets and diagnostics genes in this particular group that are predictive of outcome.

A second group of cases, predominantly lymphoid, are very interesting, in that we see activation in many lymphoid antigens, as you might expect, and some interesting genes associated with viruses, which we are still sorting out.

Finally, the last group is a mixture of AML and ALL cases, which is quite interesting because the common shared feature of these cases is perturbation of the ras signaling pathway in particular.

I think it is interesting that we may be able to use gene expression arrays to cluster and identify cases, mixtures of ALL and AML, that would benefit from ras signaling agents.

Slide 7:

When we look at the cases with MLL gene rearrangements, as you have seen, you can find MLL rearrangements, shown in blue, in each group.

What is interesting is that there are two very distinct groups. Both A and B clusters on the top and the left have a t(4;11) translocation in their MLL gene rearrangements, in the cases that have them.

Those two t(4;11) MLL translocation sets have very different gene expression profiles and different targets. Most of the MLL cases in that third group, associated with perturbed ras signaling, have t(9;11), t(10;11), t(1;11) and other MLL variant translocations.

Slide 8:

We have gone on and used support vector machines and Bayesian nets to model novel targets for each of the variant MLL translocations.

It is particularly interesting that the t(9;11) translocations again have perturbation of the ras signaling pathway, t(11;19) genes in the IL-2 pathway, and very novel sets of genes with t(4;11).

I see Scott Armstrong here from the Farber. I think one of the really interesting things about this technology is that, despite all the caveats, despite the fact that we all do it differently, it is interesting to me how similar the profiles are that we tend to derive in different labs. I think that is kind of reassuring.

Slide 9:

When we look at Flt-3 expression in different MLL variants, we see something very interesting, in that high levels of Flt-3 are not uniform in the MLL cases, but quite distinct, depending on genotype. So, we need to think about that in terms of clinical trial design using Flt-3 inhibitors: levels are particularly high in t(4;11), t(11;19) and t(9;11), but not in t(1;11), t(10;11) and other MLL translocations.

Slide 10:

I want to use another couple of minutes to summarize our developing data in pediatric B cell ALL, which as you know, we currently risk stratify in the United States according to different genotypes which have, over many, many years been associated with different outcomes.

Children with trisomies, or t(12;21), are treated on less intense regimens than children with t(4;11) or t(9;22).

Slide 11:

So, one of the things we wanted to do was determine whether we could do better at risk classification in pediatric B cell ALL using gene expression profiling, or could we refine risk classification, and could we define novel genes associated with outcome.

So, much like Jim Downing, we first started with a set of 255 cases. The actual design of this cohort of former POG pediatric ALL cases is very complicated. It is a set of interior case-control studies, of sorts, in which cases were balanced for genotype and for failure or response within a genotype.

So, within the cohort of 255 cases will be a series of t(9;22)s that equally responded or failed, a series of t(4;11) that equally responded or failed, a series of t(12;21)s, children who had induction failure or did not.

Follow-up for all the selected cases is out well over four years. So, we are looking at follow-up data at four years.

Now, taking supervised learning methods, in which we model gene expression profiles for karyotype, as did Jim, we could come up with gene expression profiles that were statistically significantly associated with hyperdiploid, t(12;21), t(2;11), t(1;19).

Our t(9;22)s, we had a hard time modeling, and I think you will see why in a minute. Data emerging from Mike Cleary's group at Stanford shows the same thing, that the t(9;22) cases are very heterogeneous in their gene expression profiles.

Slide 12:

So, if we take, in our hands, 147 genes that we select up front to be predictive of these karyotypes -- in other words, I can take 147 genes out of the 13,000 we analyzed and predict which karyotype a child has -- and I cluster, using only those 147 genes in an unsupervised way, I can draw a pretty picture for you.
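
Selecting a small set of label-predictive genes up front can be sketched as a simple ranking. This is a hypothetical illustration, not the selection method used in the talk: each gene is scored by the ratio of between-class to within-class variation, and the top k are kept. The data, class assignments and function names are invented.

```python
def f_like_score(values, labels):
    """Ratio of between-class to within-class variation for one gene."""
    groups = {}
    for v, l in zip(values, labels):
        groups.setdefault(l, []).append(v)
    grand_mean = sum(values) / len(values)
    between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                  for g in groups.values())
    within = sum((v - sum(g) / len(g)) ** 2
                 for g in groups.values() for v in g)
    return between / (within + 1e-9)  # small constant avoids division by zero

def top_genes(matrix, labels, k):
    """Return indices of the k genes (columns) with the highest scores."""
    columns = list(zip(*matrix))
    scores = [f_like_score(list(col), labels) for col in columns]
    return sorted(range(len(columns)), key=lambda g: scores[g], reverse=True)[:k]

# Invented data: 6 cases x 4 genes; labels are hypothetical karyotype classes.
cases = [
    [9.0, 3.1, 5.0, 1.0],
    [8.8, 2.9, 5.2, 1.1],
    [1.0, 3.0, 5.1, 8.9],
    [1.2, 3.2, 4.9, 9.1],
    [5.0, 3.0, 5.0, 5.0],
    [5.1, 3.1, 5.1, 4.9],
]
karyotype = ["hyperdiploid", "hyperdiploid", "t(12;21)", "t(12;21)", "t(1;19)", "t(1;19)"]
selected = top_genes(cases, karyotype, k=2)
print(sorted(selected))
```

Because the genes are chosen with knowledge of the karyotype, clustering on them afterwards will tend to reproduce the karyotype groups by construction, which is why the resulting picture looks so clean.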

Slide 13:

[No text is associated with this slide.]

Slide 14:

However, if I go back into VXInsight and say, I want to look at the expression of all of the genes in these 255 cases, and understand what the intrinsic groups are, we come up with six very interesting groups of B cell ALL, A,B,C,X,Y,Z.

These have very distinct gene expression profiles. In particular, as I will show you in a minute, C and Z contain almost all the hyperdiploid and t(12;21) cases, yet C and Z have two very distinct gene expression profiles. A and B have most of the t(1;19)s, and the t(9;22)s are scattered throughout.

What is interesting is this X group has a lot of perturbations, again, of the ras pathway and DNA repair genes, which I think is interesting. Finally, embedded in here were two groups of T cell ALL with very different gene expression profiles.

We are currently doing the sort of analysis of variance to come up with the genes that are most statistically associated with each of these peaks, and that is nearly finished.

I won't go through the cytogenetics.

Slide 15:

One of the real exciting things, though, about this cohort, in my last two slides, is we were able to model outcome in this group quite successfully.

We are really excited about this because it has led to some novel genes that we hope will be very powerful in this disease.

So, using Bayesian networks, we went in and asked which children have long-term response, CCR, which children had lower event-free survival or induction failure, and modeling through 13,000 genes, we come up with one, which is called G0, and you win a prize at this meeting if you help us name this new gene.

It wasn't fully cloned in the human genome project. It is a novel receptor we have now cloned. It appears to be a novel receptor family. We think it is really, really interesting, but we would like to have a better name than G0.

G0 is a mathematical term in a Bayesian net for the gene that has the most power to discriminate class. So, this gene, G0, is discriminating at high levels children who do extremely well, at low levels children who have lower event-free survival and induction failure.
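
The idea of a single gene with the most power to discriminate class can be illustrated with a much simpler stand-in than a Bayesian network. The sketch below is an assumption: it calls each gene high or low relative to its median and ranks genes by mutual information with the outcome class. A real Bayesian network learner scores whole network structures, which this does not attempt. Gene names and values are placeholders.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def discretize(values):
    """Label each value 'high' or 'low' relative to the median."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return ["high" if v > median else "low" for v in values]

def most_discriminating_gene(expression_by_gene, classes):
    """Return the gene whose high/low state shares the most information with class."""
    scores = {gene: mutual_information(discretize(vals), classes)
              for gene, vals in expression_by_gene.items()}
    return max(scores, key=scores.get), scores

# Invented data for 8 cases; gene names are placeholders.
toy_expression = {
    "gene_a": [9.1, 8.7, 9.4, 8.9, 2.0, 2.3, 1.8, 2.1],  # tracks outcome closely
    "gene_b": [5.0, 2.0, 5.1, 2.1, 5.2, 2.2, 4.9, 1.9],  # unrelated to outcome
}
toy_classes = ["CCR"] * 4 + ["failure"] * 4
best_gene, gene_scores = most_discriminating_gene(toy_expression, toy_classes)
print(best_gene, gene_scores)
```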

The expression of G0 is modified by another gene, G1, which further splits the groups, and another gene, G2.

Slide 16:

So, what are these genes? The first gene, G0, as I said, is a completely novel gene, which had only an EST from the human genome project and which we have now fully cloned. It is a novel transmembrane receptor.

G1 is a very interesting novel signaling G protein which binds to protein kinase C, and G2 is the IL-10 receptor alpha.

What is really interesting in the model we are doing with G0 is we find that either low or high G0 really predicts response in this disease.

Low G0 is associated with induction failure quite significantly. High G0 is associated with event-free survival, particularly on AlinC #15, and particularly in males. It is stronger in males than females, something we don't quite understand yet, which I think is interesting.

High G0 is associated with higher event-free survival in patients with t(12;21) and normal karyotype. Indeed, if you have high G0 and a t(12;21), you are the child who tends to live for a very long time. If you have low G0 and a t(12;21), you are the child who does not have as long event-free survival.
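
A comparison of event-free survival between high and low expression groups is usually drawn with Kaplan-Meier curves. The following is a minimal sketch of that estimator under stated assumptions: the follow-up times, event flags and the high/low split are all invented and do not reproduce any result from the talk.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up in years; events: 1 = event observed, 0 = censored.
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        failures = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        leaving = sum(1 for ti in times if ti == t)
        if failures:
            survival *= 1.0 - failures / at_risk
            curve.append((t, survival))
        # Everyone with follow-up ending at t leaves the risk set.
        at_risk -= leaving
    return curve

# Invented follow-up data, split by a hypothetical high/low G0 call.
high_g0_times, high_g0_events = [4.0, 4.5, 5.0, 5.2, 3.0], [0, 0, 0, 0, 1]
low_g0_times, low_g0_events = [0.5, 1.0, 1.5, 4.2, 2.0], [1, 1, 1, 0, 1]

high_curve = kaplan_meier(high_g0_times, high_g0_events)
low_curve = kaplan_meier(low_g0_times, low_g0_events)
print(high_curve)
print(low_curve)
```

With subsets this small, the curves are very unstable, so a formal comparison such as a log-rank test on an independent cohort would be needed before drawing conclusions.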

Our numbers are quite small in some of the subsets, as we break them out cytogenetically. So, obviously, we are validating this.

Interestingly, when we take G0 back into the St. Jude data set that is now publicly available on the website, we find that G0 is quite predictive of outcome in this very independently derived data set overall, and we also find that it predicts outcome in infants with ALL in clusters A and C.

Slide 17:

So, to close, what are my recommendations about gene expression profiling in pediatric and adult ALL? It upsets me to think that Rick Klausner's vision of the Director's Challenge might not be sustained.

So, I think it is very critical that the NCI continue funding streams of grants of collaborative investigators to do this kind of work.

Even though we did it for five years, I think we are nowhere near done. We are just beginning, and it is really important to sustain funding for this kind of initiative. It is not clear whether that is going to happen.

It is absolutely critical to validate. So, once we develop data sets to come up with these very interesting new genes that predict for response, or give us insights into new biology, we have to go on to new well-defined clusters to validate.

I think it is really essential to design, study and compare pediatric and adult cohorts of ALL. What is the difference between kids and adults with ALL who have the same karyotype, or between kids and adults who fall in these intrinsic novel biologic clusters? It is really important to do classification, outcome modeling, and identification of new targets.

Again, many of us come from cooperative groups. I think it is so essential, we have learned, to focus on uniformly treated patient cohorts, who have very high quality outcome and correlative science data.

Obviously, you are only as good in this field as your statisticians and computational folks. So, it is really important to link the groups who have the patient cohorts to good technical cores and good informatics and biocomputing cores, and you just have to have great biostatisticians and great biocomputing.

Another thing that is really important for all of us, I think, as we go forward is maintaining updated, constantly reannotated data as gene identification continues to push through the human genome project. We need publicly available websites that contain all of the ALL data, as well as data on other cancers, where the actual quantitative CEL files are available for reanalysis by other groups, and where the clinical data is actually present so that other groups can play with modeling.

So, public availability of everyone's data in a single website, I think, is really important.

Then I think a real power of these array cohorts, as we are now doing in the Children's Oncology Group, is taking cohorts we have designed for arrays and using them for proteomic, pharmacogenomic and epidemiologic studies. Thank you very much.

Slide 18:

[No text is associated with this slide.]
