Introduction

Ideally, the overarching aim of cancer treatment is cure. The term cancer cure refers to an enduring complete clinical remission of a cancer for the duration of the patient’s natural life span, regardless of the presence or absence of late sequelae of treatment [1]. To measure potential cure according to this definition, overall survival (OS) is considered the most important endpoint in clinical trials. However, OS has shifted from a primary to a secondary endpoint in most clinical trials in hematologic malignancies because of the very long observation periods often required [2]. For example, recent advances in the treatment of indolent lymphoma have resulted in very long survival for most patients, reaching normal life expectancy in some subgroups. OS as the primary endpoint is therefore difficult to measure within the timeframe of an individual trial in this disease entity, because follow-up of 10 years or longer would be required. Besides OS, health-related quality of life (HRQOL) is an important outcome parameter in indolent lymphoma, although it may also be affected by patients’ comorbidities with increasing age. These aspects underline the importance of defining surrogate endpoints as tools that provide early estimates of expected improvements in OS.

Although there are many examples of clinical trials demonstrating that, particularly in frontline therapy, prolongation of progression-free survival (PFS) resulted in prolongation of OS [3,4,5], longer PFS is not always a reliable surrogate for OS [6, 7], partly because of the increasing number and efficacy of salvage treatment options.

Additionally, the field has become a victim of its own success: even PFS differences are becoming increasingly difficult to capture in indolent lymphoma, so an earlier and possibly more precise measure of treatment success is enticing to trialists and pharmaceutical companies alike. Measurable (or minimal) residual disease (MRD), assessed in peripheral blood or bone marrow, has the theoretical and biological plausibility to fulfill exactly this promise: it detects the remaining presence of malignant cells down to a level of 1 in 10,000 cells and is more precise than imaging-based response assessment in indolent lymphoma [8, 9]. Lower levels of MRD should therefore be expected to translate into longer time to progression, at least at the trial or patient-cohort level.
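To make the "1 in 10,000" figure concrete: an MRD level is the fraction of malignant cells among the cells analyzed, usually quoted as a power of ten. The minimal Python sketch below assumes the malignant-cell and total-cell counts are already available from the assay; the function names and the default 10⁻⁴ cut-off are illustrative assumptions, not a standardized assay definition.

```python
import math


def mrd_level(malignant_cells: int, total_cells: int) -> float:
    """Fraction of malignant cells among all cells analyzed (hypothetical helper)."""
    if total_cells <= 0:
        raise ValueError("total_cells must be positive")
    return malignant_cells / total_cells


def log10_level(level: float) -> float:
    """Express an MRD level as a power of ten, e.g. 1e-4 -> -4."""
    return math.log10(level)


def is_below_threshold(level: float, threshold: float = 1e-4) -> bool:
    """True if the MRD level lies below the assumed cut-off (default 1 in 10,000)."""
    return level < threshold


# Hypothetical sample: 15 malignant cells among 500,000 cells analyzed
level = mrd_level(15, 500_000)
print(f"MRD level {level:.1e} (log10 = {log10_level(level):.2f})")
print("below 1 in 10,000:", is_below_threshold(level))
print("below 1 in 100,000:", is_below_threshold(level, threshold=1e-5))
```

Being below a cut-off is not the same as being reliably undetectable: the achievable sensitivity also depends on how many cells are analyzed and on the assay itself, which this sketch deliberately ignores.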

Although the prognostic impact of MRD on survival has been well established in chronic lymphocytic leukemia (CLL) [10] and follicular lymphoma (FL) [11], and its use as an intermediate endpoint in clinical trials has been accepted by the FDA and EMA [12], data on its use as a surrogate endpoint in this context are scarce. Notably, on April 12, 2024, the Oncologic Drugs Advisory Committee (ODAC) of the FDA voted in favor of using minimal residual disease as a surrogate endpoint for accelerated approval of new treatments for multiple myeloma [13].

In this perspective, response parameters such as MRD, complete response and overall response, as well as PFS, are discussed as surrogate endpoints in indolent lymphoproliferative disorders with respect to their validation, potential use and limitations.

Surrogate endpoints and their validation

A surrogate endpoint, as accepted by European health care stakeholders (the European Network for Health Technology Assessment, EUnetHTA) [14] and the US Food and Drug Administration (FDA) [15], is an intermediate endpoint that is not only prognostic for the true endpoint but also captures the extent to which a treatment influences the true endpoint.

Several statistical approaches have been proposed to formally validate surrogate endpoints. Most rely on a two-level meta-analytic method [16, 17] in which both the patient-level and the trial-level correlation between the surrogate and the true endpoint are calculated, ideally based on individual patient data.

Patient-level correlation describes the degree to which a surrogate and the true endpoint are associated in individual patients, i.e., the prognostic value of the surrogate (Fig. 1). Trial-level correlation, however, is vital for validating a surrogate endpoint for use in clinical trials: it reflects the association between treatment effects on both endpoints, e.g., the correlation of the hazard ratio (HR) for PFS with the HR for OS across randomized trials. Ideally, once a surrogate has been validated through a meta-analytic approach, the effect on OS can be estimated from the effect on the surrogate, or surrogate threshold effects [18] can be determined, i.e., the threshold HR for PFS beyond which an OS difference can be assumed.
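The trial-level idea and the surrogate threshold effect (STE), the smallest effect on the surrogate whose prediction interval for the true-endpoint effect excludes "no effect", can be sketched in a few lines of Python. This is a deliberate simplification under stated assumptions: the input is one pair of log hazard ratios (PFS, OS) per trial, the fit is unweighted ordinary least squares rather than the trial-size-weighted or mixed-effects models used in formal meta-analyses, and the 95% prediction interval uses a normal quantile instead of a t quantile. All function names and data are hypothetical.

```python
import math
from statistics import NormalDist


def fit_trial_level(log_hr_surr, log_hr_true):
    """Unweighted OLS of log HR(true endpoint) on log HR(surrogate), one point per trial."""
    n = len(log_hr_surr)
    if n != len(log_hr_true) or n < 3:
        raise ValueError("need at least 3 paired trial-level estimates")
    xbar = sum(log_hr_surr) / n
    ybar = sum(log_hr_true) / n
    sxx = sum((x - xbar) ** 2 for x in log_hr_surr)
    syy = sum((y - ybar) ** 2 for y in log_hr_true)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(log_hr_surr, log_hr_true))
    slope = sxy / sxx
    rss = syy - slope * sxy  # residual sum of squares of the fitted line
    return {
        "n": n, "xbar": xbar, "sxx": sxx,
        "slope": slope, "intercept": ybar - slope * xbar,
        "r2": 1 - rss / syy,  # trial-level coefficient of determination
        "s": math.sqrt(max(rss, 0.0) / (n - 2)),  # residual standard error
    }


def upper_prediction_limit(fit, x, level=0.95):
    """Upper limit of the approximate prediction interval for log HR(true) at surrogate effect x."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal approximation (assumption)
    se = fit["s"] * math.sqrt(1 + 1 / fit["n"] + (x - fit["xbar"]) ** 2 / fit["sxx"])
    return fit["intercept"] + fit["slope"] * x + z * se


def surrogate_threshold_effect(fit, lower, level=0.95, tol=1e-9):
    """Largest log HR(surrogate) in [lower, 0] whose upper prediction limit for
    log HR(true) is still below 0, found by bisection (assumes a single crossing)."""
    if upper_prediction_limit(fit, 0.0, level) <= 0 or upper_prediction_limit(fit, lower, level) >= 0:
        return None  # no sign change: threshold not identifiable in this range
    lo, hi = lower, 0.0  # limit is < 0 at lo and > 0 at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if upper_prediction_limit(fit, mid, level) < 0:
            lo = mid
        else:
            hi = mid
    return lo


# Invented per-trial hazard ratios; not data from any real trial
hr_pfs = [0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
x = [math.log(h) for h in hr_pfs]
noise = [0.02, -0.03, 0.01, -0.01, 0.03, -0.02, 0.0]
y = [0.7 * xi + e for xi, e in zip(x, noise)]
fit = fit_trial_level(x, y)
ste = surrogate_threshold_effect(fit, lower=min(x))
print(f"trial-level R^2 = {fit['r2']:.2f}")
print(f"STE: HR(PFS) below about {math.exp(ste):.2f}")
```

Because the STE is read off the prediction interval, fewer trials or noisier hazard ratios widen that interval and push the threshold toward stronger PFS effects, which mirrors the data scarcity typical of indolent lymphoma trials.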

Fig. 1: Examples of validation for surrogate endpoints.

A: End-of-treatment MRD status correlates with overall survival at the patient level, i.e., it has significant prognostic value. B: The odds ratio of reaching undetectable MRD across multiple trials is strongly associated with the hazard ratio for PFS, i.e., trial-level correlation. Figure adapted from Simon F, et al. End Point Surrogacy in First-Line Chronic Lymphocytic Leukemia. J Clin Oncol. JCO.24.01192 [22]. Printed with permission.
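Patient-level prognostic value (panel A) is often first inspected by comparing survival curves between MRD-defined groups. The sketch below is a plain Kaplan-Meier estimator in Python applied to small, invented cohorts; it shows only this descriptive comparison and does not compute the formal patient-level correlation coefficient estimated in the two-level meta-analytic method. Function names and data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the patient died (event), 0 if censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients sharing this time point (events and censorings)
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # censored patients leave the risk set after time t
    return curve


def survival_at(curve, t):
    """Step-function lookup of S(t) from a Kaplan-Meier curve."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


# Invented end-of-treatment cohorts (months, death flag); not trial data
u_mrd = kaplan_meier([12, 24, 36, 48, 60, 60, 60, 60], [0, 0, 1, 0, 0, 0, 0, 0])
mrd_pos = kaplan_meier([6, 12, 18, 24, 30, 60, 60, 60], [1, 1, 0, 1, 1, 0, 0, 0])
print(f"S(60) undetectable MRD: {survival_at(u_mrd, 60):.2f}")
print(f"S(60) detectable MRD:   {survival_at(mrd_pos, 60):.2f}")
```

A visible gap between such curves supports prognostic value only; whether a treatment that improves MRD also improves OS is the trial-level question addressed by panel B.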

However, there is still no formal international consensus on the optimal statistical method for measuring both effects, and thresholds remain arbitrarily defined. While some statisticians argue that any correlation coefficient above 0.5 between surrogate and true endpoint might suffice, recent publications have defined correlations above 0.8 as strong and below 0.6 as weak [19].
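The cut-offs quoted above can be written as a simple decision rule. The Python sketch assumes the >0.8 and <0.6 bands attributed to [19] and treats everything in between as moderate; the function name and the "moderate" label for the middle band are assumptions, and these thresholds are conventions rather than an international consensus.

```python
def classify_surrogacy(r: float, strong: float = 0.8, weak: float = 0.6) -> str:
    """Label a surrogate/true-endpoint correlation coefficient by fixed cut-offs."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    if r > strong:
        return "strong"
    if r < weak:
        return "weak"
    return "moderate"


# The same coefficient can pass one convention and fail another
for r in (0.55, 0.71, 0.85):
    print(r, classify_surrogacy(r), "| passes >0.5 rule:", r > 0.5)
```

A coefficient of 0.55, for example, would be acceptable under the ">0.5" view but weak under the stricter bands, which is exactly why the choice of convention matters.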

Additionally, the two-level approach has limitations: it does not account for differential censoring, and it relies on a large number of trials or trial comparisons [20] and, especially, on enough events to estimate treatment effects precisely, an increasingly difficult requirement in indolent lymphomas.

Surrogate endpoints in mature B-cell neoplasms

In CLL, the most commonly used surrogate endpoints in phase 2 and 3 clinical trials are overall and complete response rates, rates of undetectable measurable residual disease and PFS (Table 1). Within a set of chemotherapy trials, a high patient-level correlation of 0.8 was shown between PFS and OS [21]. Similar observations were made with chemotherapy/chemoimmunotherapy and limited-duration targeted therapy, with patient-level correlations of >0.8 [22]. Trial-level surrogacy between PFS and OS with targeted therapies, including continuous BTK inhibition, was >0.7, indicating a moderate correlation between PFS and OS.

Table 1: Available data on surrogate endpoints in mature B-cell neoplasms.

In follicular lymphoma (FL), the best-validated trial-level surrogate for long-term PFS is the proportion of patients in ongoing complete response (CR) at 30 months after initiation of frontline therapy (CR30) [23]. It was prospectively identified by the global Follicular Lymphoma Analysis of Surrogate Hypothesis (FLASH) consortium from its individual patient data (IPD)-based analysis of 13 chemoimmunotherapy trials including 3,837 patients, showing a strong trial-level correlation (R² = 0.88). CR30 was also validated as a robust surrogate for PFS at the individual patient level and outperformed CR at 24 months (CR24). In the analysis by Shi et al., an absolute improvement in CR30 of 11 percentage points over a control rate of 50% predicted a significant treatment effect on PFS, demonstrating its applicability as a surrogate endpoint for PFS [23]. Although CR30 has not been examined as a predictor of OS in FL, it is currently implemented as a co-primary endpoint in several ongoing frontline FL trials. By contrast, although widely accepted as a strong predictor of adverse outcome, progression of disease within 24 months (POD24) did not perform well as a surrogate for OS in either FL [24] or MZL [25].
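The threshold reported by Shi et al. is simple arithmetic on response rates. The Python sketch below checks whether a hypothetical experimental arm clears an 11-percentage-point gain over a 50% control CR30 rate and, as an illustration only, restates that gain as an odds ratio; whether the published threshold was derived on the odds-ratio scale is not stated here, and the function names are hypothetical. Rates are handled in percent so that the boundary case is not lost to floating-point rounding.

```python
def meets_cr30_threshold(cr30_exp_pct: float, cr30_ctrl_pct: float = 50.0,
                         min_gain_pct: float = 11.0) -> bool:
    """True if the absolute CR30 gain (in percentage points) reaches the threshold."""
    return cr30_exp_pct - cr30_ctrl_pct >= min_gain_pct


def odds_ratio(p_exp_pct: float, p_ctrl_pct: float) -> float:
    """Odds ratio of CR30 in the experimental vs. control arm (rates in percent)."""
    def odds(p_pct: float) -> float:
        p = p_pct / 100
        return p / (1 - p)
    return odds(p_exp_pct) / odds(p_ctrl_pct)


print(meets_cr30_threshold(61.0))  # boundary case: exactly 11 points
print(meets_cr30_threshold(58.0))  # gain of only 8 points
print(f"OR at 61% vs 50%: {odds_ratio(61.0, 50.0):.2f}")
```

Note that the same absolute gain corresponds to different odds ratios at different control rates, so the 50% reference rate is part of the threshold's definition.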

In advanced-stage marginal zone lymphoma (MZL), CR at 24 months (CR24, as distinct from CR30) has been explored as a potential surrogate for PFS using IPD from the 401 patients in the three-arm IELSG19 study, and was found to be a robust surrogate for the 8-year PFS rate [26].

Although US FDA policy supports considering regulatory submissions in lymphoma based on “durable response” rates, and some recent approvals of the covalent BTK inhibitors ibrutinib and zanubrutinib in Waldenström macroglobulinemia have cited this criterion, the exact definition of “durable” and the proportion of responses required to demonstrate such durability remain unclear, and the formal surrogacy of this endpoint has not been established [27].

MRD as a surrogate endpoint

CLL

In CLL, response endpoints such as undetectable MRD rates were strongly associated with PFS at the trial level (R > 0.8), suggesting their utility as intermediate endpoints. However, the correlation between MRD and OS was only modest (R = 0.71), and this estimate was limited by the small number of data points for OS events. Definitive conclusions on MRD surrogacy for OS therefore cannot yet be drawn [28]. Conversely, overall response rate (ORR) does not correlate with OS across different treatment modalities [29]. Overall, the available evidence supports the use of time-to-event endpoints as primary outcome measures in randomized CLL studies, while the use of response endpoints requires further surrogacy validation.
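Because trial-level results are reported here as R and in the FLASH CR30 analysis above as R², it helps to convert between the two. Assuming a simple single-predictor linear trial-level regression with a positive association, R² is the square of R, so R = 0.71 corresponds to roughly half of the between-trial variation in the OS effect being explained by the MRD effect. The minimal Python sketch below encodes only this relationship; the function names are hypothetical.

```python
import math


def r_to_r2(r: float) -> float:
    """Coefficient of determination for a single-predictor linear fit."""
    return r * r


def r2_to_r(r2: float) -> float:
    """Correlation coefficient implied by R^2 (sign assumed positive)."""
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("R^2 must lie in [0, 1]")
    return math.sqrt(r2)


print(f"R = 0.71   -> R^2 = {r_to_r2(0.71):.2f}")  # MRD vs OS in CLL
print(f"R^2 = 0.88 -> R = {r2_to_r(0.88):.2f}")    # CR30 vs PFS in FL
print(f"R = 0.8    -> R^2 = {r_to_r2(0.8):.2f}")   # MRD vs PFS cut-off in CLL
```

Comparing values on the same scale avoids overstating or understating surrogacy when results from different analyses are set side by side.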

Indolent lymphoma

Although optimal methodology and international harmonization of MRD assays in indolent NHL lag far behind those in CLL, MRD shows emerging promise as a potential surrogate endpoint in follicular lymphoma [11]. An analysis of long-term outcome in the GALLIUM study, using PCR for the t(14;18) translocation and/or clonal immunoglobulin (Ig) rearrangement at a sensitivity of 10⁻⁵, showed an overall strong association between MRD status and PFS. However, the impact differed across treatment regimens (MRD status was not prognostic in obinutuzumab-treated patients), underlining that further evaluation is necessary before this finding can be generalized. In the same study, PET response status at the end of induction therapy added to the prognostic performance of MRD status, supporting further exploration of PET response parameters as another potential surrogate.

Conclusion

Especially in mature B-cell neoplasms, OS differences are increasingly difficult to capture within the practical timeframe of clinical trials. For example, CLL14 showed clear superiority of venetoclax plus obinutuzumab over chlorambucil plus obinutuzumab in terms of PFS, but OS differences have not yet crossed significance boundaries even after 6 years of follow-up [30].

When trying to validate endpoints, even with large datasets, the correlation of surrogate candidates such as PFS, time to next treatment (TTNT), response rates (e.g., CR30) or MRD with OS is difficult to evaluate because of the small number of events and the efficacy of treatments at relapse, especially in the era of novel agents. However, the recent unanimous ODAC vote on the use of MRD as a regulatory endpoint in multiple myeloma, based on an amount of data similar to that available in CLL, shows the trust that trialists place in this promising response assessment. Additionally, novel approaches that assess MRD at higher resolution, such as next-generation sequencing (NGS)-based approaches including CAPP-seq, which have shown higher prognostic value [31, 32], might eventually provide the granularity needed to validate MRD as a true surrogate endpoint for OS.

Finally, even if analyses ultimately show only a moderate or weak correlation with OS, the value of prolonged PFS or low MRD levels, and their impact on patients’ quality of life, should not be underestimated, especially when a long-lasting remission is achieved with a well-tolerated, time-limited treatment.

Combined efforts across the scientific community are required to enable the analysis of large, aggregated datasets and thereby strengthen the evidence for surrogate parameters.