SLIDES & TRANSCRIPTS
Tuesday, February 1, 2000

New Agents and Strategies
Elihu Estey, MD

Slide 1:

DR. LARSON: Thank you very much, Stan. We will ask Eli Estey to come up and present the final discussion this morning on new agents and strategies. Eli is from the University of Texas, M. D. Anderson Cancer Center.

DR. ESTEY: I am going to start by talking about limitations of some formerly new agents and particularly topotecan and fludarabine. I am going to try to convince everyone, and I think it will be obvious during the afternoon sessions, that there is no shortage of new agents. Given that, I think this raises a question, in whom should these new agents be studied because there are many to be studied?

Should a trial be limited to, say, multiply relapsed patients? Then I am going to talk about the limitations of traditional statistical designs, the Phase I, the standard Phase II designs in doing what we want to accomplish, and I will try to be brief.



Slide 2:

Beginning with fludarabine and topotecan, obviously M. D. Anderson has had considerable experience investigating these agents. Here we are looking at survival probability in patients given topotecan or fludarabine, and I am not sure that the comparison with idarubicin or Ara-C would be germane. In fact, there is very little difference, but the point is that with either of these two regimens, the FLAG, the topotecan-Ara-C or whatever, there is certainly nothing to suggest that these things are really major breakthroughs in patients with a minus five or minus seven.


Slide 3:

Similarly, you can look at the results in patients with a normal karyotype or inversion 16 or t(8;21) which is the other side of the coin. Here we are looking at patients who got CAT or topotecan-Ara-C, and basically we are doing the time to death model. You can see that if you compare idarubicin versus CAT or topo-Ara-C versus CAT, there is nothing to suggest that these things are any different today than the baseline idarubicin-Ara-C regimen.

Certainly I think we have to recognize that even though these are very interesting drugs, perhaps, they certainly have their limitations.

 


Slide 4:

On the other hand, there is a plethora of new agents that are available. You can probably read these quicker than I can say them, but the point to stress is that these are actually all agents that are in trial or will soon be in trial or going through the review process at M. D. Anderson.

 


Slide 5:

There is another slide that shows the same thing, and you will hear about many of these, I am sure this afternoon.

 


Slide 6:

Of course, I do not want to leave out transplant in this slide that Dr. Gene Anderson gave me, and obviously there are other transplant approaches that could be investigated as well in the same way as the chemotherapy approaches.

 


Slide 7:

Then the question is: okay, if there are so many new things to investigate, how can we possibly investigate them all? One of the premises that I have and I feel very strongly about, and I will return to it a little bit later, is that it is very difficult to know a priori which of these is going to be successful and which is not.

I think you might say, "Okay, there is some real rationale for investigating these, and there are lots to investigate. In whom should we investigate them? Should we only investigate them in people after multiple failures -- which is often what has been done in the past? Should they be investigated in certain patients at first relapse? In other words, would it be fair, for example, to give somebody liposomal daunomycin or even something that is not an anthracycline or an Ara-C at first relapse?"

Should they be given after a failure of the initial induction course? Traditionally, people get two induction courses. Should we think about abandoning that strategy at least in certain patients? Then finally, is it possible that in certain poor prognosis patients in whom the karyotype is unfavorable, should new agents be given right at the beginning?

 


Slide 8:

We have a little bit of data to throw some light on these topics, even though it is very biased. I will try to share it with you. Basically, here we are comparing high-dose Ara-C based regimens, and that can be FLAG or FLAG-IDA or Ara-C or whatever. We are comparing the results in patients who got high-dose Ara-C based regimens and patients who got investigational regimens at M.D. Anderson, and the investigational regimens are obviously very heterogeneous. They probably include some Phase I drugs, Phase II drugs, etc., and we will start off looking at patients who had a first CR of less than a year.

Everyone knows that this is the big prognostic factor for the success of a reinduction attempt, and the first thing to note is that the doctors at M. D. Anderson are really not sure what they want to do. In somebody who has had a CR of less than a year, well, about half of the patients got high-dose Ara-C based regimens and in the other half of the cases, the doctors felt, gee, you know, maybe I should get away from that and do something else. So this is obviously an issue of interest, I think at least to us. Basically, what you can see over here is that the CR rate is certainly higher in the patients who got the high-dose Ara-C than the investigational regimens. Certainly that is the case when you compare the high-dose Ara-C and the investigational regimens in people who had longer first CRs, and obviously these rates over here are higher than these rates over there. So then the question is, well, yes, according to this there is certainly a benefit in anybody at first relapse to give them a high-dose Ara-C based regimen based solely on the reinduction CR rate.

 


Slide 9:

Does this mean anything in terms of survival? Now, basically what we are showing here is that if you look at the survival of patients who had initial first CRs of less than a year according to whether they got high-dose Ara-C or an investigational regimen at relapse, there was very little difference.

 


Slide 10:

In contrast, if you look at the people who had a longer first CR according to whether they got high-dose Ara-C or an investigational regimen at first relapse, there is more of a difference.

Now, obviously, I would be the first to disclaim all these results because who is to say why one doctor gave Patient A high-dose Ara-C and one gave Patient B an investigational regimen, etc., but I think they are very interesting data.

 


Slide 11:

The next group of patients in whom we could consider looking at new agents early on are people who fail the initial course of therapy. As I said before, usually what is done is they get a second course of the same therapy.

In fact, here we are looking at 190 patients who failed course one of the initial treatment, and they received a second course. One thing that I always feel dumb about when I talk about these things is that these are all averages, and obviously there are some patients who do better than others, etc., but I am stuck with what I have for now. The bottom line is that the CR rate in these patients was 31 percent. The median survival time was 6 months, and the probability that they would be alive at 3 years is less than 10 percent. That is shown graphically over here, and

 


Slide 12:

so certainly I think it is germane to ask the question: should patients who fail on average, and obviously there are exceptions to this that would come up, but to me the question is should people who fail an initial course of chemotherapy necessarily get course two, or should they rather be given some of the huge number of things that we have to investigate? Should they get those things there rather than waiting for them to fail or die on the second course?

 


Slide 13:

Okay, I will try to briefly summarize some of what I have just been saying. If you look here, we are looking at the CR rate with high-dose Ara-C and median survival with high-dose Ara-C. Importantly, there is a survival advantage with high-dose Ara-C compared to investigational treatment in people who are getting their first salvage therapy who had a relatively long first CR. The answer is, as I tried to show, yes, there appears to be a survival advantage if you give them high-dose Ara-C.

So certainly our policy at M. D. Anderson is yes, we want to give them high-dose Ara-C, but obviously we are going to add new things to the high-dose Ara-C. The thing that we actually have just begun, and probably Dr. Gandhi will talk about it at the breakout session, is UCN-01 plus Ara-C, but they do get Ara-C.

In contrast, for the people who had short first remissions, less than a year, there doesn't appear to be any survival advantage with high-dose Ara-C. So certainly in our opinion, these people are appropriate candidates for Phase I or Phase II studies.

In people who fail the initial induction course, our policy has been in the past just to give them a second course. So we really don't know the answer, but basically now what we are going to do is to do a little study where people either will get a Phase II drug or high-dose Ara-C.

Slide 14:

The next topic that I would like to come to is the designs used to investigate these drugs.

Now, one thing I didn't bring a slide of is the Phase I study, and I think there are three tremendous problems with Phase I studies as traditionally done. When I say "traditionally done," I mean they are done with a so-called "three-plus-three rule," where if zero of the first three have toxicity, you go and you put on the next three at the higher level. If one of the first three has toxicity, you put three more on.

First of all, and the statistical literature is full of this kind of thing, if you look at the operating characteristics of the so-called "three-plus-three" design, and by operating characteristics, I mean let us say the goal is you want to produce toxicity in 20 percent of the patients. You feel that is necessary to have an antileukemia effect. If you look at the likelihood that the three-plus-three design will actually accomplish that, it is startlingly low. For that reason certainly the modern trend, and hopefully the NCI will get more into this, is to do CRM designs, which are essentially Bayesian designs that I will speak about in a second. The operating characteristics of those designs for identifying the dose you are interested in are much better.
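To make "operating characteristics" concrete, here is a minimal simulation sketch of a three-plus-three escalation in Python. The dose levels and their true toxicity probabilities (`TOX`) are hypothetical values chosen only for illustration, and the stopping convention (declare the next-lower dose once two or more patients at a level have toxicity) is one common reading of the rule, not necessarily the exact variant the speaker has in mind.

```python
import random

def three_plus_three(tox_probs, rng):
    """Simulate one 3+3 trial; return the index of the dose declared
    as the MTD, or -1 if even the lowest dose is judged too toxic."""
    level = 0
    while level < len(tox_probs):
        p = tox_probs[level]
        # Treat a first cohort of three patients at this level.
        dlt = sum(rng.random() < p for _ in range(3))
        if dlt == 1:
            # One toxicity in three: put three more on at the same level.
            dlt += sum(rng.random() < p for _ in range(3))
            if dlt >= 2:
                return level - 1
        elif dlt >= 2:
            return level - 1
        # Zero of three, or one of six: escalate.
        level += 1
    return len(tox_probs) - 1  # escalated past the highest dose

def selection_frequencies(tox_probs, n_trials=10000, seed=0):
    """Fraction of simulated trials that declare each dose level."""
    rng = random.Random(seed)
    counts = {i: 0 for i in range(-1, len(tox_probs))}
    for _ in range(n_trials):
        counts[three_plus_three(tox_probs, rng)] += 1
    return {level: c / n_trials for level, c in counts.items()}

# Hypothetical true toxicity rates; index 2 is the 20 percent target dose.
TOX = [0.05, 0.10, 0.20, 0.35, 0.50]
freqs = selection_frequencies(TOX)
for level, freq in freqs.items():
    label = "none" if level < 0 else f"dose {level} (true tox {TOX[level]:.2f})"
    print(f"{label}: selected in {freq:.1%} of simulated trials")
```

The fraction printed for the 20 percent dose is the quantity the speaker calls the likelihood that the design "will actually accomplish that"; changing `TOX` shows how sensitive it is to the assumed dose-toxicity curve.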

That is one problem with the Phase I study, but to me that is a relatively minor problem with the Phase I study. To me the major problem with Phase I studies is that in testing antileukemia agents, we are forced to start at doses that are way too low.

I mean whenever we have looked at this, what we have found is invariably by the time you get to the dose you are going to use in the Phase II study, you have gone through four, five, six escalations of drugs. Essentially, what is done is they take the solid tumor MTD, and they say, "Oh, here is where we are going to start the leukemia study," and despite the vast corpus of information that this leads to inadequately treated patients, patients develop no toxicity. That is great, but there is no response either in patients with relapsed AML, and I think that is something that perhaps we could talk about later. Are the doses that we start these drugs at, are they too low?

I think it is fine to say, "First do no harm," but I am not sure that that aphorism is applicable to the type of patients that we are here to talk about today. Of course, the third problem with these Phase I studies, and to me it has received remarkably little attention, is that there is no attention paid to the heterogeneity. It is assumed that the patients are homogeneous. So if a 66-year-old man has toxicity, that is regarded in exactly the same way as a 23-year-old patient who had toxicity, when clearly the likelihood of toxicity in these two groups of patients must be vastly different.

I think there really are issues to discuss in the way that we do Phase I studies, and hopefully maybe we can return to that.

What I would like to talk about now, though, is something that I want to spend a few minutes on which is the selection design that we have been working on at M. D. Anderson.

Basically, the rationale for it is what you saw before. It is that there are many new ideas to test. I listed them, and you will hear about them this afternoon. I hope I don't sound too nihilistic or anti-intellectual or whatever the horrible word is, but in my opinion you always need clinical data to identify the best idea. That is, before you begin the trial, it is impossible to predict which of that whole list of things is going to be better than the other. All you have to do is look at history. In my experience at M. D. Anderson, one of the three things that have really made the most impact in terms of lives of patients is 2-CdA in hairy cell leukemia. To this day, no one knows why 2-CdA works for hairy cell leukemia and works less well for other diseases.

In fact, it was not predicted to work as well for hairy cell leukemia as in these other diseases. The second is interferon which obviously has made a difference in the lives of people with CML and yet today there is still debate as to why interferon works. Who is to know if you had no preclinical rationale, whether this drug would be selected for testing today? Then of course, the third is ATRA, and I think there is still debate as to which came first; was it the biologic idea? Ah, we know about PML/RAR alpha, let us use ATRA; or was it the empirical observation of Chinese investigators that led to the biological interest in ATRA?

At any rate the point is that you need clinical data to identify the best idea.

Obviously there is a limited number of patients even at M. D. Anderson.



Slide 15:

So basically we are forced to decide. We can say, "Okay, we can study two or three things and have a relatively large number of patients," or the complementary thing is you can study a relatively large number of things and have a relatively small number of patients. If you really believe this, that you cannot tell before you treat the patients, then it follows that the worst false negative is an idea that is not investigated. What would have happened if somebody said, "Ah, 2-CdA, hairy cell leukemia; no biological rationale; forget it"? So on the basis that the worst false negative is an idea that is not investigated, what we have done is said that we want to investigate as many things as we can. With the help of Peter Thall in our biostatistics department, we have come up with a pre-Phase II or selection design that I am going to illustrate with regard to a study that we have actually run. In fact, what you will see with this design is that the probabilities of identifying actual treatment advances are good.

One of the things that maybe we can return to in the afternoon is if you really believe this, is the wisest strategy for drug development in the United States for all three cooperative groups to do huge randomized studies investigating one idea, or should there be more pilot studies of the type that I am describing? I am not really sure of the answer, but I think certainly it is something that needs to be discussed.

Okay, these things are Bayesian principles, and I think this is something that people are going to need to come to grips with. As time goes on, there is going to be increasing application of Bayesian statistical methods, and not really to bore anybody, but one of the things that people don't really appreciate about the classical or frequentist method is the problems with it. There are two articles that I would refer people to. One is in June in the Annals of Internal Medicine, an editorial about Bayesian methodology from Johns Hopkins, and the second is an article in an obscure journal called the American Scientist by Dr. Donald Berry who has just come to M. D. Anderson. He has made his living in Bayesian statistics as one of the foremost authorities in the world, and the article in the American Scientist that Dr. Berry wrote is called "Statistical Analysis and the Illusion of Objectivity." People very frequently don't realize this illusion that they are operating on. A very, very simple thing here in classical P value statistics is that the analysis is tied to the design, and when you try to explain this to people, sometimes they can have real problems.

Let us say I have data that the CR rate in one group is 8 out of 10 and in the other, it is 6 out of 10. Somebody would say, "Okay, there is the data," but if you interpret it in classical statistics, it would matter to you to know what kind of trial you planned: did you plan to enter 30 patients and look once; did you plan to enter 30 patients and look twice? The interpretation of that same data, 8 out of 10 versus 6 out of 10 or whatever, would depend on your plan even though the data is exactly the same. When you say that to people, they have a hard time understanding that. One of the nice things about Bayesian statistics is that it takes us away from some of these problems and these illusions of objectivity that people feel are tied to statistics.
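As a small sketch of the Bayesian side of this point, the Python snippet below estimates the posterior probability that the first group's true CR rate exceeds the second's, given 8 out of 10 versus 6 out of 10. The uniform Beta(1, 1) prior on each rate and the function name `prob_a_beats_b` are assumptions made for illustration only. Note that nothing in the calculation asks how many patients or how many looks were planned; it uses only the prior and the observed data.

```python
import random

def prob_a_beats_b(succ_a, n_a, succ_b, n_b, draws=100000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B | data), assuming
    independent uniform Beta(1, 1) priors on the two response rates."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        # Posterior for each rate is Beta(1 + successes, 1 + failures).
        rate_a = rng.betavariate(1 + succ_a, 1 + n_a - succ_a)
        rate_b = rng.betavariate(1 + succ_b, 1 + n_b - succ_b)
        wins += rate_a > rate_b
    return wins / draws

# The speaker's example: 8 of 10 CRs in one group, 6 of 10 in the other.
p_a_better = prob_a_beats_b(8, 10, 6, 10)
print(f"Posterior probability that the 8/10 group is truly better: {p_a_better:.2f}")
```

A different prior would give a different number, which is the subjective input a Bayesian analysis makes explicit.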

 


Slide 16:

That is all I will say about that, and now, what I am going to do is try to illustrate our pre-Phase II selection design that I talked about in the context of "no drug too stupid to test," to quote Dr. McCullough in something that he told me. Basically what the design is, is that you establish a prior. So, for example, in the study that I am going to tell you about, which is a randomized trial of four agents in patients with newly diagnosed AML characterized by unfavorable karyotype, they had an early CR rate, and I will get back to that in a second, of around 48 percent. That was the prior.

Then you have to select the treatments, and no matter how great the selection design is, you still have to select treatments, but at least you are going to try more treatments. So you select the treatments, and obviously if you are going to randomize, the prior has got to be the same for each. The prior basically is: okay, a priori, what do I expect this agent to do, and then you randomize the treatments. As each response is known, you update the prior. There is none of this "I have got to wait for 40 patients before I look at the data." You stop their treatment arms early. Otherwise you enter a fixed number of patients, and then you select the best treatment for confirmatory studies.
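The "update the prior as each response is known" step can be sketched with a Beta prior on the early CR rate, as in the Python below. The prior mean of 0.48 comes from the talk; the prior weight (`PRIOR_STRENGTH`, roughly "worth two patients") and the sequence of outcomes are hypothetical, chosen only to show the mechanics.

```python
PRIOR_MEAN = 0.48        # historical early CR rate quoted in the talk
PRIOR_STRENGTH = 2.0     # hypothetical: prior counts as about two patients

def update(state, responded):
    """Return the Beta(a, b) parameters after one more patient outcome."""
    a, b = state
    return (a + 1, b) if responded else (a, b + 1)

def posterior_mean(state):
    """Mean of a Beta(a, b) distribution."""
    a, b = state
    return a / (a + b)

# Beta prior centred on the historical rate.
state = (PRIOR_MEAN * PRIOR_STRENGTH, (1 - PRIOR_MEAN) * PRIOR_STRENGTH)

# Made-up outcomes for one arm (True = early CR), processed one at a time.
outcomes = [True, False, False, True, True]
for patient, responded in enumerate(outcomes, start=1):
    state = update(state, responded)
    print(f"after patient {patient}: estimated CR rate {posterior_mean(state):.3f}")
```

Because the estimate is refreshed after every patient, a stopping rule can be checked at any point rather than only at a pre-planned sample size.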



Slide 17:

Okay, so here is our chemotherapy plus or minus thalidomide study. I have to show some hypotheses that we are interested in looking at which are very important, and I will get to that in a second. Basically, we had a little bit of data that suggested that increased angiogenesis was associated with failure of chemotherapy. We sort of felt that increased cellular levels of VEGF were associated with failure of chemotherapy. That was published, and even though VEGF can have a plethora of actions, it seemed that that was one possibility, and then there is something to suggest that thalidomide can inhibit angiogenesis. Maybe it would do that in AML -- you know, I use MDS and AML interchangeably sometimes -- and then that such inhibition would enhance response.

 


Slide 18:

The vehicle that we used to test this was this study in patients with abnormal karyotype with untreated AML or high-risk MDS. Patients were randomized to these four treatments. So here is liposomal daunorubicin and Ara-C, liposomal daunorubicin, Ara-C and thalidomide, and you can read it quicker than I can, lipo-dauno plus topotecan, lipo-dauno plus topotecan plus thalidomide. So in accordance with what I asked Dr. List, we said, "Okay, well, at least in these patients who do so poorly, no Ara-C for them," and the

 

 


Slide 19:

stopping rules for the study were this: We would enter a maximum of 20 patients in any arm, and we would stop, for example, if the early CR rate out of a number evaluated was say less than or equal to one out of five.

When people see this, they say, "My God, how can you learn anything from this study? The numbers are so small." So we are randomizing among four arms, and let us say the true early CR rate is obviously just the rate you would see, the true answer if you had a billion, an infinite number of patients. This is the answer, and the historical rate is 48 percent. So let us say that among these four arms, three were just the same as the old 48 percent. One was 68 percent. What you do is you run computer simulations. You can do 10,000 simulations in a matter of 30 seconds, and the program is freely available. It is called Multi 98, and you can design trials just like this very simply. Under this circumstance you have your design with the less than or equal to one out of five and all that stuff, all the technical details. What you want is for that design to identify, to select before the confirmatory therapy, the arm that is in fact 68 percent. The probability that that will happen is 75 percent which are the typical figures you come up with.

Now, people say, "Well, you know, gee, that means in fact the power is only, if you go to the classical terms, is only around 70 percent," and most people are used to 80 percent. That is the magic figure, 80 percent or 90 percent.
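The kind of simulation described here can be sketched in a few lines of Python. This is not the Multi 98 program and does not reproduce its Bayesian rules; it is a simplified stand-in that assumes equal allocation, a single early look (an arm is dropped if it has one or fewer CRs among its first five patients), a maximum of 20 patients per arm, and selection of the surviving arm with the most CRs, ties broken at random. All function and constant names are hypothetical.

```python
import random

MAX_PER_ARM = 20   # maximum patients per arm, as stated in the talk
EARLY_LOOK = 5     # early look after five patients
EARLY_MIN_CR = 2   # arm is dropped with <= 1 CR among the first five

def run_trial(true_rates, rng):
    """Simulate one selection trial; return the index of the selected
    arm, or None if every arm was dropped at the early look."""
    best_arms, best_count = [], -1
    for arm, rate in enumerate(true_rates):
        responses = [rng.random() < rate for _ in range(MAX_PER_ARM)]
        if sum(responses[:EARLY_LOOK]) < EARLY_MIN_CR:
            continue  # arm stopped early, not eligible for selection
        crs = sum(responses)
        if crs > best_count:
            best_arms, best_count = [arm], crs
        elif crs == best_count:
            best_arms.append(arm)
    return rng.choice(best_arms) if best_arms else None

def selection_probability(true_rates, target_arm, n_sims=10000, seed=0):
    """Fraction of simulated trials in which target_arm is selected."""
    rng = random.Random(seed)
    hits = sum(run_trial(true_rates, rng) == target_arm for _ in range(n_sims))
    return hits / n_sims

# Scenario from the talk: three arms at the historical 48 percent, one at 68.
RATES = [0.48, 0.48, 0.48, 0.68]
p_select = selection_probability(RATES, target_arm=3)
print(f"Probability of selecting the 68 percent arm: {p_select:.2f}")
```

Running the same function with all four rates set to 0.48 gives a feel for how often such a design picks a "winner" when no arm is actually better.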

 


Slide 20:

In fact, if you look at it from the way that we look at it, you say, "Okay, if I didn't know which of these four is best a priori, and I am just not going to do this trial, I am just going to pick one of these four from big Phase II studies, the typical Phase II study with 50 patients." My probability of being wrong assuming that I cannot tell beforehand is three out of four, 75 percent. So the 30 percent false-negative rate competes with the 75 percent false-negative rate.

Now, one of the problems with the design is let us say all four are the same. Presumably then you are not terribly interested in one. The probability that you pick one is around 50 percent. It is one minus this. So basically it has a high false-positive rate, but of course that is one of the things you get. You cannot have your cake and eat it, too. You cannot treat a relatively small number of patients and have very low false-positive and very low false-negative rates. At M.D. Anderson, our philosophy is we would much rather have a false positive than a false negative. This is something that we can discuss later, and I am not sure I agree with that, but you know, I have become a believer.



Slide 21:

So what are some of the problems with the selection design as I illustrated it? As I say, this is something that to me would be something that people could talk about and might form alternatives to the very large Phase IIB studies with 50 patients and, in particular, the 300-patient randomized studies where you are looking to detect a difference of 10 percent where the control is 20 percent and you want to get that to 30 percent.

At any rate, the problems are with false negatives, and we have dealt with that. A major one is heterogeneity of prognosis. What happens if your first five patients by chance are all over 80 years old? We are learning how to deal with that, but another problem is loss of information, loss of prognostic information.

 


Slide 22:

This is something that really needs to be said. One of the things that we are trying to do in the study is collect information about number of blood vessels and plasma VEGF levels and cellular VEGF levels, and obviously, if we stop the studies early, then we won't get as much information as we might. This is something that obviously needs to be said and I am sure everyone realizes it, but because I feel very strongly about it, I will say it again, that all the results in here are just average results. We obviously need to get to the point where we look at AML the way we look at pneumonia. If you had tried INH in all patients with pneumonia, it would be a bust, but obviously if you had tuberculous pneumonia, it would be a big success. So we have got to learn to define these subtypes that respond to these therapies better than we are doing. Having spent my whole life looking at these clinical prognostic factors, I have come to realize this is what you have got to look at. With this design it is certainly possible that you won't get the information that you would like to get.

I can just tell you because in our great study here with the four arms, if you remember, the lipo-dauno-Ara-C, the lipo-dauno-Ara-C-thalidomide, lipo-dauno-topotecan and lipo-dauno-topotecan plus thalidomide, despite my advice that Dr. List try these therapies without Ara-C, it turned out in the arms without Ara-C, the CR rate was zero out of six, and what that led to was stopping the study because the probability that the response rate would be the desired 68 percent was something like 1 percent.

So obviously we had to stop and that puts us in a real bind because people would say, "Well, gee, here you are not gaining biologic information on these particular therapies," but on the other hand you are faced with a situation where the likelihood that it is an improvement is so low that your hand is sort of forced.

 


Slide 23:

So the last little thing that I would like to talk about is the definition of response, and in particular the definition of CR. Obviously, everyone knows that survival is the most objective of these end points, but on the other hand, everyone realizes that they would like to have things before survival, and the thing that people have looked at is CR. There can be two reasons for looking at something like CR: one, because it actually translates into survival which is what my prejudice has always been, but in a lot of the discussions I have had, especially with Dr. Appelbaum and Dr. Ken Harsch, they have made me realize that there is another thing, too. It can demonstrate that the drug actually has some activity whether or not it prolongs survival, but what I have tried to concentrate on in the way we define CR is whether the actual definition translates into an improvement in survival.

This is just to remind people of stuff that was done by Dr. Freireich 40 years ago, and here are these AML patients and they are looking at how long the response lasts according to the type of response that they had. The CRs do better than the others, and for many years there was the dogma that this reflected the fact that the CRs would have done better even if they never had been treated. They were just more favorable. It has nothing to do with the chemotherapy.

 


Slide 24:

Dr. Freireich was pressing it, and he probably figured out that 30 years later somebody might criticize him. So what they did in this paper was in fact they said, "Okay, let us look at the time that the people who get a CR live according to whether they are in remission or not," and so here in the blue is the CRs -- how long they lived. Here in the pink is the no CRs and how long they lived, and in the yellow, or whatever that is. They subtracted the time that the CR patients spent in remission, and when you took away the time that they spent in remission, their survival was exactly the same as the people who never got a remission which of course suggests that the reason that these people lived longer was not because they were inherently better but because in fact the chemotherapy produced a response, and the improvement in survival was due solely to the time they spent in that response.



Slide 25:

The definition of CR stayed static for many years, and then finally in the early 1980s people began to realize that if you went into CR in two courses, this was not as good as if you went into CR in one course. That led to things like what I talked about earlier, perhaps not giving people a second course, because those remissions were not great in terms of what you were interested in, the patient living. Then we have begun to address the concept of binary CR a little bit more. This is something that was just published. Here we are looking at 1101 of our patients and now we are just focusing on first CR.

So these are people in whom response is known after the first course. There are 1101 of these patients, and 741 of them went into CR, and we will call the time that it took them to go into CR TC. So that is just 20 days, 30 days, whatever, and here are the patients. So 71 of these patients are arbitrarily called resistant, etc., and 299 died. So here we have got TC, and we have got TR, and then the time from CR to death, we will call TCD. Of the 740 who went into remission, five hundred and whatever subsequently died. We are interested in TCD over here. The purpose of what we were interested in trying to do was to see the relationship between TCD which is the time from CR to death and TC which is the time to go into CR after we adjust for the important covariates that predict response.



Slide 26:

The analysis is shown over here, and the bottom line of all this literally is the bottom line because basically after you adjust for all these covariates what you see is that there is a very negative association between TC and TCD.

In other words, the longer it takes you to go into remission, the shorter the subsequent remission. Of course, what this suggests is that the idea that CR is a binary end point, that it is yes or no, is not correct.

 


Slide 27:

In fact, if you look at this graphically, what you can see over here is this, and here we are looking at the time to CR or resistance, and here we are looking at the median subsequent survival.

For example, let us say three people took 30 days each to go into CR. One lived 1 year after that. One lived 2 years after that, and one lived 3 years after that. The median would be 2 years, and you plot it up here. You can see that -- and here are the CR patients, and here are the resistant patients -- you can see that as the time to go into CR increases, there is a very, very dramatic fall in their subsequent survival after they get a CR such that by the time it takes about 45 days to get the CR, the survival of these patients -- their subsequent survival -- is more like patients who are resistant than patients who got an earlier CR.

Thus, these CRs we call cosmetic, and this has led us to focus on this concept of early CR. The idea is not only should you get a CR in one course, but you should get it within 6 weeks of starting that course because otherwise it is a cosmetic CR. You may be in CR, but the fact is you are more like a resistant patient.

This thing is open to criticism on several grounds and the most telling actually was addressed to me by Dr. Lowenberg somewhere. He said, "Gee, you know, Eli, it is sort of a self-fulfilling prophecy, because you know if you keep waiting for them to get into CR and you don't treat them, you know that is what is going to happen," and I accept that criticism.

One of the things we have done is to look at it in a subset of people. Let us say you go in with 35 days and you still see the same relationship. The other thing that people could say is that this is with a particular strategy.

For example, if we gave people double induction, we might not see this same relationship, but the argument against giving them double induction is that we would like to see what the first course does and have some idea of their sensitivity before just sort of blindly going into the second course.

At any rate, that is getting away from the field, and so I would like to leave you with the idea that there are many new things to test -- not necessarily limited to topotecan or fludarabine -- but I think if there are so many things to test, we need to think about should we test them in a wider variety of patients than we have tested them in before, and should we come up with new statistical models or paradigms or whatever, in particular in Phase I and early Phase II to test these drugs? Thanks for your attention.

DR. LARSON: Thank you very much, Eli.

I think we will break for lunch now. I would like to thank all of the speakers this morning for their outstanding overviews, and we hope we can continue these discussions in the afternoon sessions.

We would like to reassemble at 1 o'clock to begin the breakout sessions. The first working group on therapeutic resistance will meet back in this room at 1 o'clock, and simultaneously the working group B on antibody-delivered therapy will meet right across the hall, also at one.

The second set of simultaneous sessions will meet at three-thirty.

 
