Matthew Sydes: Thank you very much for having me here this morning. I’ve been asked to give a statistician’s perspective on it. Being a statistician, I’m one of very few in the room with you today. For those of you who don’t like listening to statisticians speak, I rather like this picture; please allow yourself to just wander off into the woods with a heron for a couple of minutes. For the rest of you, I’m going to cover these topics here. And just to remind you that statisticians can have disclosures too, here are mine.

I just wanted to summarize. I find it quite useful to visualize when the trials happened that are contributing to this discussion about what you’re doing for these patients. I’ve just sort of drawn out the number of patients, the time at which they were recruited and when these trials have reported. Now take notice of whether they reported early, because sometimes when a trial reports early, when it’s been provoked to report early, then the next time it reports, the effect size is usually a little bit more modest. We should always be aware of that.

You’ve got trials that have reported long term, and you’ll get long-term data from STAMPEDE at ESMO next month. You’ve got a number of trials here. As Chris said, as you’ve heard, STAMPEDE was metastatic and non-metastatic, but the number of metastatic patients is about the same as for CHAARTED and GETUG-15 added together. But you see there’s a number of small differences, at least down the side. And then you’ve got data coming through from two years ago, looking at adding abiraterone in your metastatic hormone-sensitive patients. Again, 1,000 and 1,200 patients across these. One reported early, one reported at the planned event, but they planned to report together and published together. You’ve now got the long-term data from LATITUDE out as well.

How do you know the effect of these? Well, you can do a meta-analysis. Many people publish meta-analyses on these. I’ve chosen to report the data from STOPCAP because they did a brilliant job of engaging all of the trial teams in reporting this information. You can add up what you see in docetaxel and you get a hazard ratio of 0.77. You can add up the abiraterone trials and you see a hazard ratio of 0.62, and it’s very easy to just stop there and say, “Well, of course abiraterone shows a larger effect. That’s what we should be using.”
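As a rough illustration of what "adding up" trials can mean, here is a minimal sketch of fixed-effect inverse-variance pooling on the log hazard ratio scale. The trial inputs are hypothetical placeholders, not the STOPCAP data, and the function name `pool_hazard_ratios` is an assumption for illustration; the methods STOPCAP actually used may differ.

```python
import math

Z_95 = 1.959963984540054  # normal quantile for a two-sided 95% interval


def pool_hazard_ratios(trials):
    """Fixed-effect inverse-variance pooling of hazard ratios.

    `trials` is a list of (hr, ci_lower, ci_upper) tuples with 95% intervals.
    Returns (pooled_hr, pooled_lower, pooled_upper).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for hr, lower, upper in trials:
        log_hr = math.log(hr)
        # Recover the standard error of log(HR) from the interval width.
        se = (math.log(upper) - math.log(lower)) / (2 * Z_95)
        weight = 1.0 / se ** 2  # more precise trials get more weight
        weighted_sum += weight * log_hr
        total_weight += weight
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return (
        math.exp(pooled_log),
        math.exp(pooled_log - Z_95 * pooled_se),
        math.exp(pooled_log + Z_95 * pooled_se),
    )


# Hypothetical inputs for illustration only (not real trial results).
hypothetical_trials = [(0.70, 0.55, 0.89), (0.85, 0.70, 1.03)]
pooled_hr, pooled_lower, pooled_upper = pool_hazard_ratios(hypothetical_trials)
print(f"pooled HR {pooled_hr:.2f} (95% CI {pooled_lower:.2f} to {pooled_upper:.2f})")
```

The pooled estimate always sits between the individual trial estimates and has a narrower interval than any one of them, which is why a single pooled number can look more decisive than the underlying trials justify.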

But actually you want to understand the data in a broader context, I think. You need to look at a network meta-analysis. In Claire Vale’s paper in Annals of Oncology, this is what they did. They drew out the network, and so you can look at the docetaxel nodes that join together. You can look at the abiraterone nodes that join together, and in a network what you’re trying to do is kind of triangulate from one to the other. Ideally, if you’ve got a network, you’ve got multiple triangulations going on. They were able to produce a table, based on the published data that was there, of the probability of which was likely to be the best treatment in terms of overall survival. And abiraterone, from the published data, came up with a 94% probability of being the better treatment there. Docetaxel was likely to be the second or third treatment. Interestingly, there’s the zoledronic acid and celecoxib data that came from STAMPEDE; don’t forget that. Actually that was a little interesting in metastatic patients. But you can be pretty sure that giving ADT alone was the least favorable of the treatments.
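To make the idea of triangulating through a shared comparator concrete, here is a minimal sketch of a single-triangle adjusted indirect comparison (often called the Bucher method). It is a simplification, not the network meta-analysis used in the Annals of Oncology paper. The point estimates 0.62 and 0.77 are the ones quoted in this talk; the standard errors are hypothetical, and the function names are assumptions for illustration.

```python
import math


def indirect_hazard_ratio(hr_a_vs_control, hr_b_vs_control):
    """Indirect HR for A versus B through a shared control (one 'triangle')."""
    return math.exp(math.log(hr_a_vs_control) - math.log(hr_b_vs_control))


def indirect_log_hr_se(se_a, se_b):
    """Standard error of the indirect log HR: the two variances add."""
    return math.sqrt(se_a ** 2 + se_b ** 2)


# Point estimates quoted in the talk, each versus ADT alone:
# abiraterone 0.62, docetaxel 0.77.
hr_abi_vs_doc = indirect_hazard_ratio(0.62, 0.77)
print(f"indirect HR, abiraterone vs docetaxel: {hr_abi_vs_doc:.2f}")

# Hypothetical standard errors of the two log HRs (assumed, not from the talk).
se_indirect = indirect_log_hr_se(0.08, 0.07)
print(f"indirect log-HR standard error: {se_indirect:.3f}")
```

Because the variances add, the indirect estimate is always less precise than either of the comparisons it is built from, and it inherits any differences between the trials in patients, era and follow-up.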

Now you’ve got new data that I don’t think have been put into any networks, in TITAN and in the ENZAMET studies, different drugs. But you’re thinking about these within this context. One industry trial, one academic, both reported early, very similar size to these trials as well.

I’ve tried to draw out for you what a future network is going to look like. I haven’t put in all of the trials that could be there, of course. You’ve still got the docetaxel data there. You still have the abiraterone data there. If you’re triangulating, you’ll want to remember the radiotherapy data from STAMPEDE and HORRAD; it’s going to be important to triangulate for that, at least in low-volume patients. And so mostly, with those triangulation points, when you’re going from one point to another, you’re always going through something else. But you’ll notice that two of the research nodes here are joined together. There is a little data from STAMPEDE on docetaxel and abiraterone patients randomized contemporaneously.

It’s small, but it’s the only direct head-to-head data that you’ve got in this setting. It has value in that approach. The activity framework that many of you will have seen a hundred times from STAMPEDE is over on the left. The docetaxel data just include the patients in yellow, so mostly metastatic patients: 1,800 metastatic and non-metastatic patients. You’ve got the information there. The abiraterone patients are the ones that were in that report, and you’ll see that in the middle we’ve got the patients that contribute to that comparison: just the docetaxel and the abiraterone arms. And of course, the characteristics of those patients sit somewhere in between those two, the docetaxel and the abiraterone papers. For docetaxel, STAMPEDE alone reported a hazard ratio of 0.78. In abiraterone, 0.63. That’s all comers.

What did we see when we looked at docetaxel versus abiraterone, those head-to-head data? Well, actually you see a substantial advantage, a clear advantage, for abiraterone in terms of failure-free survival. But as you work towards longer-term outcome measures, it seems from the head-to-head data that there’s no obvious difference between them, which I think is really rather interesting. You want to consider the toxicity profiles and the disease state your patients are in.

Now why is that the case? Why is it that the quite different effects that we saw in those two comparisons actually don’t seem to translate into something so clear in that little bit of head-to-head data, that bonus comparison data from STAMPEDE? Well, the eligibility criteria for STAMPEDE are pretty much unchanged in 15 years. But there have been little subtle shifts in the patients joining the trial. When we stopped recruiting to the docetaxel comparison, people became a little bit happier to recruit the older patients into the trial, for example. There’s an age shift; there are very little shifts in there.

Even within one trial, although you can define your outcome and you can define your eligibility criteria, who people recruit to the trial is beyond your control. But you’ve had bigger shifts, I think, in what happens to patients around trials as well, particularly in terms of second-line therapy. If your patient relapsed in 2004, what happened to them is quite different to what happens to your patients if they relapse today, for example, in what’s available. You’ve got that context. The era is important. The key message, if you take nothing else away from today, is this: if you find it difficult to make comparisons between two papers from one consistently run trial, how careful do you need to be when you’re looking at papers from different trials and trying to interpret the messages that they’re giving?

If we look at the STAMPEDE and the STOPCAP data: STAMPEDE, that is just the metastatic values, is on the right-hand side from our Annals of Oncology paper, and the STOPCAP data are on the other side. In terms of failure-free survival, these two ways of estimating come out with quite similar hazard ratios of 0.56 and 0.59, but for overall survival they do seem to tell different messages. Why is that? Well, they’re two different approaches to trying to address the same problem. STAMPEDE is direct evidence from 566 patients. It’s a short time window. It’s not today; it’s from a few years ago. It’s consistent assessment methods. STOPCAP is indirect. It’s multiple triangulation. It’s 6,000 patients. It’s bigger, but it’s a longer time window. And you’ve got to juggle patients from multiple trials.

What else are you going to consider when you’re looking at your future network? Well, you’re going to think about that backbone of care, because now patients are having docetaxel in some of the instances. So in ENZAMET and TITAN, many of those patients did not have docetaxel but some did. They will be different nodes in a network. You’ll need to think about previous local therapy and the burden, just as Chris mentioned. You know that that’s a different distribution in TITAN than in ENZAMET, and than in both of the comparisons we’ve reported from STAMPEDE. Previous local therapy and metastatic burden seem to be linked together. They seem to be quite different, for example, in CHAARTED and GETUG-15. There are other things you’re going to want to think about. Not everything is reported yet, at least by metastatic burden. A few nodes from the network will drop out. I can update you, Chris, that the abiraterone volume data from STAMPEDE was published in European Urology last weekend, and you’ll see the docetaxel volume data at ESMO in a few weeks.

You’ll think about the control arm. ENZAMET had first-generation antiandrogens in the control arm. You want to think about the timing of recruitment. With docetaxel, the timing of randomization was different in ENZAMET and TITAN, as I understand. The timing of assessments will be different. So will the length of reporting, but actually also the utility of reporting: how people report things varies a little. I’d be remiss if I didn’t try and nag you about one thing while I was up here on this stage.

Let me just show you TITAN for a second, just to be cautious. I picked on this trial just because it’s very recent; everybody seems to do this. It’s very short data. You’ve got the numbers at risk. You can see there’s 1,055 patients in the trial. They don’t all appear at time zero here. But you can see by the time we get to two and a half years, there’s only 30 patients at risk, and nobody has made it to three years. These are very immature data.

You can see the majority of patients seem to disappear between one and a half and two years. That reflects when recruitment stopped. I wish we had the number of events or the number of censorings here. Somehow the New England Journal didn’t insist on that being put there, for some reason. Our KMunicate survey was accepted yesterday in BMJ Open. Many of you may have participated in that. The results, when they come out, I hope will transform the way that we present numbers at risk and events underneath these curves. If there are any journal editors in the room who want to talk to me about this, I would love to talk to you.

But they’ve also read across, and this is what I want to grumble about. Right across from the Y axis, they’ve told us a median survival. You can see your curves get nowhere near this, so you get “not estimable.” It leads to this kind of useless double column in this figure, where you just see a whole table of “not estimable.” It’s not helpful. Please don’t do this. Actually, just going back to the artwork, I think these two people have a wonderful look of disapproval. Don’t make them angry. One of them has a pitchfork. But they have also given us data at two years. They have read up as well, and this is the way we should do it. Pick a time, pre-specify a time, read up from the curve. That’s a much better way to do this. These two people would approve, I think.
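The difference between reading across and reading up can be sketched with a small Kaplan-Meier calculation. This is a minimal illustration, not the TITAN analysis: the follow-up times are hypothetical, and the function names `kaplan_meier`, `median_survival` and `survival_at` are assumptions made for this example.

```python
def kaplan_meier(times, events):
    """Return the Kaplan-Meier curve as a list of (time, survival) steps.

    `times` are follow-up times; `events` are 1 for death, 0 for censored.
    """
    curve = []
    survival = 1.0
    for t in sorted({t for t, e in zip(times, events) if e}):
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e)
        survival *= (at_risk - deaths) / at_risk
        curve.append((t, survival))
    return curve


def median_survival(curve):
    """Read across from 50%: first time the curve reaches 0.5, else None."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # the curve never reaches 50%: not estimable


def survival_at(curve, landmark):
    """Read up from a pre-specified time: survival probability at `landmark`."""
    survival = 1.0
    for t, s in curve:
        if t > landmark:
            break
        survival = s
    return survival


# Hypothetical, immature follow-up in months (illustration only, not TITAN data).
times = [6, 12, 18, 20, 22, 24, 26, 28, 30, 30]
events = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
curve = kaplan_meier(times, events)
print("median survival:", median_survival(curve))
print("survival at 24 months:", round(survival_at(curve, 24), 4))
```

With few events the curve never drops to 50%, so the median comes back as not estimable, while the pre-specified 24-month reading still gives a usable number. The landmark should be chosen in advance and should fall where enough patients are still at risk.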

What you really need to pick this apart is individual patient data. It’s difficult to do this from aggregate data. The STOPCAP team is on the case. They are going to help you with this. A brilliant group. I won’t go into the adverts, but statisticians come to a lot of these clinical conferences. You should come to methodology conferences. There is great stuff here for you. Anybody interested in data monitoring committees, we run a lovely course on this.

Let me conclude. Those of you who are in the woods, come back into the room. You’ve got an increasing number of trials in metastatic hormone-sensitive prostate cancer and more to come. It’s a great time for you all: challenging, but great. You’ve got very little direct head-to-head data. It’s all about the triangulation for you, so you need to be cautious interpreting your papers. Just be humble, I think, is what we’re told to be: humble in our scientific approaches and in how we interpret our data. Consider all of the characteristics and help each other with clear outcome measures. Thank you very much for your attention. Thank you for having me.