Background – For men on active surveillance for prostate cancer, biomarkers may improve the prediction of reclassification to a higher grade or volume cancer. This study examined the association of urinary Prostate Cancer Gene 3 (PCA3) and TMPRSS2:ERG (T2:ERG) with biopsy-based reclassification.
Methods – Urine was collected at baseline and at 6, 12, and 24 months in the multi-institutional Canary Prostate Active Surveillance Study (PASS), and PCA3 and T2:ERG levels were quantitated. Reclassification was defined as an increase in Gleason score or an increase in the ratio of biopsy cores with cancer to ≥34%. The association of biomarker scores, adjusted for common clinical variables, with short- and long-term reclassification was evaluated. The discriminatory capacity of models with clinical variables alone or with biomarkers was assessed using receiver operating characteristic (ROC) curves and decision curve analysis (DCA).
Results – Seven hundred and eighty-two men contributed 2069 urine specimens. After adjusting for prostate-specific antigen (PSA), prostate size, and the ratio of biopsy cores with cancer, PCA3 but not T2:ERG was associated with short-term reclassification at the first surveillance biopsy (OR = 1.3; 95% CI 1.0–1.7, p = 0.02). The addition of PCA3 to a model with clinical variables improved the area under the curve from 0.743 to 0.753 and increased net benefit minimally. After adjusting for clinical variables, neither marker nor marker kinetics was associated with time to reclassification in subsequent biopsies.
Conclusions – PCA3 but not T2:ERG was associated with cancer reclassification in the first surveillance biopsy but provided negligible improvement over clinical variables alone in ROC analysis or DCA. Neither marker was associated with reclassification in subsequent biopsies.
Introduction
Active surveillance (AS) is a management strategy for many clinically localized prostate cancers that allows men to delay or be spared the potential morbidities of treatment1. Cancers that appear relatively low risk at diagnosis are monitored, typically with regular clinical exams, serial prostate-specific antigen (PSA) measurements, and repeat prostate biopsies. However, uncertainty about the possibility of occult aggressive cancer is one factor tempering widespread adoption of AS2,3. Furthermore, optimal surveillance schedules and triggers for intervention have not yet been established1, resulting in substantial variation in the practice of AS4. Biomarkers that are collected noninvasively and that improve the prediction of aggressive behavior could improve the utilization of AS and inform decisions about the intensity of surveillance regimens.
During AS, treatment is usually recommended when higher grade or volume disease is found by biopsy. Biomarkers that detect the presence of occult high-grade or high-volume disease or that predict future reclassification to high grade or volume cancer could have substantial clinical utility. Importantly, biomarkers should incrementally improve upon models that are based on commonly available clinical variables.
Materials and Methods
Study Population
The multi-center Canary Prostate Active Surveillance Study (PASS) enrolls men diagnosed with clinically localized prostate cancer who have chosen AS to manage their cancer. Men provide informed consent under institutional review board supervision at nine centers (clinicaltrials.gov NCT00756665). Under the PASS protocol, PSA is measured every 3 months, clinic visits occur every 6 months, and ultrasound-guided biopsies are performed 6–12 months and 24 months after diagnosis, and then every 2 years. Specimens, including post-digital rectal exam urine, are collected at study entry and every 6 months. Follow-up data were collected from September 2008 to February 2017. Men were included in the analysis if, prior to urine collection, they had a Gleason score ≤ 3+4 and a ratio of biopsy cores with cancer to total cores collected of <34%. Men were excluded if they enrolled in PASS >5 years after diagnosis or if they had no on-study biopsy for endpoint determination.
Biomarker collection
Urine samples were collected, assays performed, and biomarker scores calculated as described previously9. For this analysis, all urine specimens from study entry, 6-, 12-, and 24-month visits that were available in July 2014 were assayed.
Statistical models for reclassification
Reclassification was defined as an increase in primary or secondary Gleason grade at biopsy and/or an increase in the biopsy cores with cancer to total cores collected (cores ratio) to ≥34%10. Several models of reclassification were considered, as depicted in the study schematic (Fig. 1).
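The endpoint definition above can be illustrated with a short sketch. The `Biopsy` record and `is_reclassified` function are hypothetical names introduced here for illustration, and the choice to compare each biopsy against a single prior biopsy is an assumption; the study's actual data handling is not described at this level of detail.

```python
from dataclasses import dataclass


@dataclass
class Biopsy:
    """Hypothetical record of one biopsy (field names are illustrative)."""
    primary: int         # primary Gleason grade, e.g. 3
    secondary: int       # secondary Gleason grade, e.g. 3 or 4
    positive_cores: int  # cores containing cancer
    total_cores: int     # total cores collected (assumed > 0)

    @property
    def cores_ratio(self) -> float:
        return self.positive_cores / self.total_cores


def is_reclassified(prior: Biopsy, current: Biopsy, threshold: float = 0.34) -> bool:
    """Grade and/or volume reclassification relative to a prior biopsy."""
    # Grade: an increase in either the primary or the secondary Gleason grade.
    grade_up = current.primary > prior.primary or current.secondary > prior.secondary
    # Volume: the cores ratio rises from below the threshold to >= 34%.
    volume_up = prior.cores_ratio < threshold <= current.cores_ratio
    return grade_up or volume_up
```

For example, a man going from Gleason 3+3 with 1 of 12 positive cores to Gleason 3+3 with 5 of 12 positive cores would count as reclassified by volume under this sketch, while 4 of 12 positive cores (about 33%) would not.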
Modeling short-term biopsy reclassification
The association between urine biomarkers collected immediately prior to a biopsy and reclassification at the biopsy was modeled using logistic regression. The analysis was stratified by the first surveillance biopsy (sBx1, sometimes called confirmatory biopsy; Fig. 1a, n = 552) and subsequent surveillance biopsies (Fig. 1b, n = 446).
Modeling time to reclassification with longitudinal biomarkers
The association between PCA3 or T2:ERG and the time to biopsy reclassification was modeled using a partly conditional Cox proportional hazards (PH) model11. Participants were excluded from this analysis if they reclassified at the sBx1. Each participant had the first urine biomarkers assayed after diagnosis and prior to sBx1 and had up to 3 additional urine samples collected up to 2 years after the first. The first urine biomarker sample along with the urine biomarker kinetics at each observation time were covariates in the partly conditional Cox model (Fig. 1c, n = 405). Participants without reclassification were censored at date of last study contact, treatment, or 2 years after their last biopsy, whichever came first.
Urine biomarker kinetics were calculated based on a linear mixed-effect model (LMEM), in which the natural log of the urine biomarkers was modeled as a linear function of time since diagnosis, with a random intercept indicating the individual-specific ln(urine biomarker) at diagnosis, and a random slope reflecting individual-specific rate of change over time. A PCA3 or T2:ERG kinetics (PCA3k or T2:ERGk) value for each participant based on the first urine sample up to an observation time was then derived based on the best linear unbiased predictor (BLUP) estimator from the LMEM11. Intra-class correlation (ICC) was calculated to determine the proportion of total variability in biomarker scores explained by between-participant variability.
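For reference, the kinetics model described above can be written out as a standard random-intercept, random-slope formulation. The notation (Y_ij for the biomarker score of participant i at sample j, t_ij for time since diagnosis, and the variance symbols) is assumed here rather than taken from the original text. The ICC expression shown is the usual random-intercept simplification, and the percent-change line assumes the annual change is obtained by back-transforming the fixed slope; neither computation is specified in detail in the text.

```latex
\begin{align}
\ln Y_{ij} &= (\beta_0 + b_{0i}) + (\beta_1 + b_{1i})\, t_{ij} + \varepsilon_{ij},
  \qquad (b_{0i}, b_{1i})^{\top} \sim N(\mathbf{0}, \mathbf{G}), \quad
  \varepsilon_{ij} \sim N(0, \sigma^2) \\
\text{kinetics}_i(t) &= \hat{\beta}_1 + \hat{b}_{1i}(t)
  \quad \text{(BLUP using samples observed up to time } t\text{)} \\
\mathrm{ICC} &= \frac{\sigma_{b_0}^2}{\sigma_{b_0}^2 + \sigma^2} \\
\text{annual \% change} &= 100 \left( e^{\hat{\beta}_1} - 1 \right)
\end{align}
```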
For statistical models, the natural log (ln) of the urine biomarkers was calculated as ln(PCA3) and, owing to possible values of 0 in T2:ERG, ln(T2:ERG+1). We also adjusted for, as appropriate, most recent ln(PSA), ln[time since diagnosis (in years)], ln(prostate size), ln[maximum ratio of positive to total biopsy cores (cores ratio)], diagnostic Gleason score, number of prior biopsies, number of prior negative biopsies, cT-stage (T1a–c vs T2a–c), body mass index (BMI; obese, overweight, or normal), race (Caucasian, African American, or other), ethnicity (Hispanic vs non-Hispanic/other), age at diagnosis, and family history of PCa. Non-significant clinical variables were removed by backward elimination based on a p value cutoff of 0.05. Robust variance estimators were used to account for multiple biopsies or urine specimens within participant, where appropriate (Fig. 1b, c).
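The two transforms described above can be sketched as follows. The function names are hypothetical. The last helper assumes that a reported odds ratio refers to a one-unit increase in the ln-transformed score, which the text does not state explicitly; under that assumption it converts the odds ratio to one for a given fold change in the raw score.

```python
import math


def ln_pca3(pca3_score: float) -> float:
    """ln(PCA3); the score is assumed to be strictly positive."""
    if pca3_score <= 0:
        raise ValueError("PCA3 score must be > 0 for a log transform")
    return math.log(pca3_score)


def ln_t2erg(t2erg_score: float) -> float:
    """ln(T2:ERG + 1), so that a score of 0 maps to 0."""
    if t2erg_score < 0:
        raise ValueError("T2:ERG score must be >= 0")
    return math.log1p(t2erg_score)


def odds_ratio_for_fold_change(or_per_ln_unit: float, fold: float) -> float:
    """OR for a `fold`-fold change in the raw score.

    Assumes `or_per_ln_unit` is the OR per one-unit increase in ln(score):
    a k-fold change shifts ln(score) by ln(k), so the OR becomes OR ** ln(k).
    """
    return or_per_ln_unit ** math.log(fold)
```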
Evaluation of clinical performance
If either PCA3 or T2:ERG was significant in a model, model performance was compared between a clinical model containing clinical and biopsy variables and a model with the urine biomarkers added. Model performance was assessed with receiver operating characteristic (ROC) curves and area under the curve (AUC). Confidence intervals (CIs) for model performance assessments were obtained from 1000 bootstrap samples. Decision curve analysis (DCA) was used to report the clinical net benefit of each model compared to biopsy-all and biopsy-none strategies12. All analyses were performed with SAS version 9.4 and R version 3.4.1. Code is available upon request.
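As an illustration of the DCA comparison described above, the sketch below computes net benefit at a single risk threshold using the standard decision-curve formula (true positives per person minus false positives per person weighted by the odds of the threshold). It is not the study's code, which was written in SAS and R; the function names and input layout are assumptions.

```python
def net_benefit(y_true, risk, threshold):
    """Net benefit of biopsying everyone with predicted risk >= threshold.

    y_true: 1 if reclassified at biopsy, else 0.
    risk: model-predicted probability of reclassification.
    threshold: risk threshold strictly between 0 and 1.
    """
    n = len(y_true)
    tp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(y_true, risk) if r >= threshold and y == 0)
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_biopsy_all(y_true, threshold):
    """Net benefit of the biopsy-all strategy (everyone is treated as positive)."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


def net_benefit_biopsy_none():
    """Net benefit of the biopsy-none strategy is zero by definition."""
    return 0.0
```

A decision curve is obtained by evaluating these functions over a range of thresholds and plotting each strategy's net benefit against the threshold.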
Results
There were 782 participants included, with a median follow-up of 4.9 years (interquartile range: 3.5, 6.5) among censored participants. Urine specimens were collected at 6- to 12-month increments such that 627 participants contributed at least 2 specimens, 448 contributed at least 3 specimens, and 209 contributed 4 specimens. There were 552 participants who had urine collected prior to the sBx1 (Fig. 1). Median age was 63, PSA was 4.8, prostate size was 40 cc, 94% were initially diagnosed with Gleason 3+3 cancer, and the median ratio of cores containing cancer to total biopsy cores (cores ratio) was 8.3% (Table 1). Results of each analysis depicted in Fig. 1 are described below.
Analysis of reclassification at next biopsy
Of the 552 men with urine biomarkers assessed prior to the sBx1 (Fig. 1a), 130 (24%) were reclassified at that biopsy. In a logistic regression model adjusted for PSA, cores ratio, and prostate size, PCA3 score was associated with reclassification in the sBx1 (odds ratio (OR) = 1.3; 95% CI: 1.0–1.7) and T2:ERG score was not (Table 2). A model in which the endpoint was only Gleason grade reclassification was similar (data not shown).
There was a small change in the AUC of a model for predicting reclassification at sBx1 using clinical variables plus PCA3 vs a model with only clinical variables: 0.753 [95% CI 0.707–0.800] vs 0.743 [0.693–0.791] with and without PCA3, respectively (Fig. 2). No improvement in AUC was found for a model including T2:ERG (Fig. 2). Similar findings were observed in DCA (Fig. 2). A model with clinical variables and PCA3 showed a minimal increase in net benefit relative to a model with only clinical variables. All models with clinical variables showed an improvement over PCA3 alone, and all models showed an improvement over biopsy-all and biopsy-none strategies.
In the 446 men with urine biomarkers assessed prior to subsequent surveillance biopsies (Fig. 1b), 85 (19%) reclassified at the biopsy immediately following biomarker assessment (Supplemental Table 1). In a logistic regression model adjusted for clinical variables, neither PCA3 nor T2:ERG was associated with reclassification (OR = 1.01; 95% CI: 0.77–1.32, p = 0.96, and OR = 1.12; 95% CI: 1.00–1.27, p = 0.06, respectively; Supplemental Table 2).
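An AUC with a bootstrap confidence interval, as reported above, can be sketched as follows. The text specifies 1000 bootstrap samples but not how the interval was formed, so the percentile method used here is an assumption, as are the function names and the fixed random seed.

```python
import random


def auc(y_true, risk):
    """AUC as the probability that an event has a higher risk than a non-event."""
    pos = [r for y, r in zip(y_true, risk) if y == 1]
    neg = [r for y, r in zip(y_true, risk) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    # Ties between an event and a non-event count as half a concordant pair.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(y_true, risk, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, see lead-in)."""
    auc(y_true, risk)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [y_true[i] for i in idx]
        # Skip resamples that contain only events or only non-events.
        if 0 < sum(ys) < n:
            stats.append(auc(ys, [risk[i] for i in idx]))
    stats.sort()
    lo = stats[round((alpha / 2) * n_boot)]
    hi = stats[round((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

Comparing two models (for example, clinical variables with and without PCA3) would apply the same resampled indices to both sets of predicted risks, so that the difference in AUC is evaluated on identical bootstrap samples.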
Analysis of time to reclassification for longitudinal biomarkers
There were 405 participants included in the time-to-event analysis who had their first urine sample collected prior to the sBx1 and who did not reclassify at the sBx1 (Fig. 1c). With a median follow-up of 3.6 years from the first urine collection, 103 (25%) participants reclassified at any subsequent surveillance biopsy (Supplemental Table 3).
The annual percentage of change in PCA3 estimated by LMEM was 9.8 (95% CI 7.3–12.3, p < 0.001). As determined by ICC, 85% of the observed variation in PCA3 was explained by between-participant variation and 15% by within-participant variation. The annual percentage of change in T2:ERG was 11.3 (95% CI 5.2–17.8, p < 0.001), and 68% of the observed variation was explained by between-participant variation and 32% by within-participant variation. Biomarker kinetics were calculated by deriving a BLUP estimator from an LMEM11. No significant differences in slopes were found between participants with reclassification vs those with no reclassification for either biomarker (Supplemental Fig. 1).
In a Cox PH model adjusted for time since diagnosis, BMI, prostate size, cores ratio, biopsies since diagnosis (0 vs 1+), negative biopsies since diagnosis (0 vs 1+), and PSA, no significant association was found between baseline PCA3 or T2:ERG and reclassification (hazard ratio (HR) for PCA3 = 1.16; 95% CI 0.86–1.57, p = 0.33, and HR for T2:ERG = 0.92; 95% CI 0.75–1.12, p = 0.40) or between PCA3 or T2:ERG kinetics and reclassification (HR for 0.10 increase in PCA3k = 0.96; 95% CI 0.44–2.09, p = 0.92, and HR for 0.10 increase in T2:ERGk = 1.56; 95% CI 0.73–3.34, p = 0.26) (Table 3).
Discussion
Non-invasive assays for diagnosing or monitoring prostate cancer have the potential to aid treatment decisions and improve clinical outcomes. In this context, biomarkers assayed in urine represent an attractive approach and several urine-based assays have been developed. The Progensa PCA3 assay is a commercially available, analytically validated diagnostic test that has been approved by the US Food and Drug Administration to inform biopsy decision making in men with no known cancer and a previous negative biopsy. PCA3 is a prostate-specific non-coding RNA and has been shown in many studies to improve predictive accuracy for cancer on initial biopsy7,8,13,14 and to be correlated with more aggressive cancer at prostatectomy15,16. At the time we initiated this work, the T2:ERG assay had been analytically validated6 and was being developed as a commercial assay. Thus we hypothesized that both biomarker assays could improve management of patients using AS.
In this report from a multi-center contemporary AS cohort, we evaluated the association between urinary PCA3 and T2:ERG and biopsy reclassification using urine collected at multiple times during surveillance. The study was designed according to PRoBE criteria17, and analyses were tailored to help inform varying decisions made during AS. After adjusting for clinical and biopsy variables, we observed a significant association of PCA3 with reclassification at the sBx1, but only a modest improvement in AUC was found between a model with clinical variables only and a model with clinical variables plus PCA3. Similarly, in DCA minimal improvement in net benefit was observed when PCA3 was added to models. We also found no association between either baseline PCA3 or T2:ERG and time to reclassification and no association between changes in the biomarker scores over time and time to reclassification.
Our motivation for performing this study was that biomarkers that improve discrimination between indolent cancers and more aggressive tumors will not only support the practice of AS but also promote less intensive biopsy regimes. However, to optimize patient management and clinical utility, biomarkers should improve upon existing information. Multimodal risk assessment approaches that combine several sources of information into a risk score have been developed, such as the Prostate Cancer Prevention Trial (PCPT) Risk Calculator, which is used prior to a diagnosis of cancer to predict the risk of finding high-grade cancer in the next biopsy18, and PCA3 has been shown to improve the net benefit of the PCPT Risk Calculator19. Several commercial biomarker panels employ such a multimodal strategy for predicting the risk of finding high-grade cancer prior to a diagnosis of cancer, including the 4Kscore20 and more recently the Select score21. Because men who are candidates for AS have already been diagnosed with cancer and have more clinical information available than prior to diagnosis, risk models that include information from the index biopsy have been developed for the AS setting22,23. These models utilize commonly available clinical variables and provide utility in risk prediction while on AS.
Studies evaluating the use of PCA3 in AS have been limited and sample sizes have been small24,25. In the current study, which is the largest study to date of PCA3 in men using AS, after adjustment for clinical variables available after cancer diagnosis, we found a significant association of PCA3 with reclassification at the sBx1 (adjusted OR = 1.3, p = 0.02) but not for subsequent biopsies (adjusted OR = 1.01, p = 0.96). Although we found no association between T2:ERG and biopsy reclassification, some studies have suggested improved performance when PCA3 and T2:ERG are used in combination26 or combined into a Mi-Prostate Score (MiPS) for the initial diagnosis of PCa27. We thus combined PCA3 and T2:ERG into a MiPS, but found little or no improvement over PCA3 alone (data not shown).
In ROC curve analysis of models for predicting reclassification at sBx1, addition of PCA3 to a model that contained PSA, prostate size, and ratio of positive to total biopsy cores improved the AUC of the model only minimally. However, in a clinical setting, predictions are not being made over the full range of sensitivity and specificity. The likely use for a biomarker measured after a diagnosis of low-risk prostate cancer would be to “rule out” men who are at very low risk of harboring high grade or volume cancer, allowing them to delay having a biopsy. Thus we examined specificity at 95% sensitivity. The addition of PCA3 to a clinical model increased specificity at 95% sensitivity (0.156 vs 0.095), but the difference was not significant. Similarly, a slight improvement was seen in the net benefit of proceeding to biopsy for models including PCA3 at about 15% risk threshold, but at other risk thresholds addition of PCA3 to clinical variables made no improvement in the benefit of a decision to proceed to biopsy.
We also evaluated whether the urinary biomarkers, alone or in combination, improved upon clinical variables for prediction of time to future reclassification. Although PCA3 alone was associated with time to reclassification, when incorporated into a multivariable model neither PCA3 nor T2:ERG was.
Using the same model, we evaluated whether changes in PCA3 or T2:ERG scores measured over time (biomarker kinetics) were associated with reclassification. We used samples collected prior to the sBx1, and at 6-month intervals up to 2 years, and employed an analytic strategy that allowed the models to account for each individual biomarker measurement while utilizing information from the general trend across all participants and accommodating random variability in the biomarkers. Although PCA3 increased over time, and both PCA3 score and T2:ERG score were higher in men who reclassified than in those who did not, there were no significant differences in the slopes of either biomarker over time for event vs non-event participants (Supplemental Fig. 1), and we found no association between biomarker kinetics and reclassification. Our results are consistent with the one other longitudinal study of PCA3 in a smaller, more uniform risk cohort, suggesting that longitudinal PCA3 measurements do not add value over a single PCA3 measurement25.
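The "rule out" metric discussed above, specificity at a fixed sensitivity, can be sketched as follows. The function name and the tie-handling convention (a man is flagged for biopsy when his predicted risk is at or above the threshold) are assumptions made for illustration.

```python
import math


def specificity_at_sensitivity(y_true, risk, target_sensitivity=0.95):
    """Return (threshold, specificity) at the highest risk threshold that
    still flags at least `target_sensitivity` of the reclassified men.

    y_true: 1 if reclassified, else 0; risk: predicted probability.
    target_sensitivity is assumed to lie in (0, 1].
    """
    pos = sorted(r for y, r in zip(y_true, risk) if y == 1)
    neg = [r for y, r in zip(y_true, risk) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    # Number of events that must be flagged (risk >= threshold).
    n_needed = math.ceil(target_sensitivity * len(pos))
    # The n_needed-th largest event risk is the highest usable threshold.
    threshold = pos[len(pos) - n_needed]
    # Specificity: fraction of non-events falling below the threshold.
    specificity = sum(r < threshold for r in neg) / len(neg)
    return threshold, specificity
```

Under this convention, a specificity of 0.156 at 95% sensitivity would mean that roughly 16% of men who did not reclassify fall below the threshold and could defer biopsy, while 95% of men who did reclassify would still be flagged.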
We found a surprisingly large amount of variability in the longitudinal samples, in particular for T2:ERG. To assess the variation statistically, we used ICC, which can be interpreted as the proportion of total variability explained by between-participant variability. The variation in longitudinal measurements of PCA3 is very similar to that of PSA11: for both PCA3 and PSA, 15% of the variability can be attributed to random variability. However, 32% of the variability in T2:ERG was attributed to noise.
This study has limitations. First, our design does not allow us to address whether the observed variability is due to biology, specimen collection methods, or assay performance. Second, although this is the largest study of urine biomarkers in AS published, the sample size is somewhat modest. However, while we expect that a larger sample size may narrow confidence intervals, we do not expect that it would result in different conclusions. Another limitation may be the reliability of our endpoint. Biopsy reclassification is imperfect in that it may reflect minimal changes in the tumor that may have little clinical importance. Nonetheless, our definition is consistent with those used in most AS cohorts and, importantly, drives treatment decisions in contemporary clinical practice. We did evaluate biomarker scores in only the 94% of men who were diagnosed with 3+3 disease and found no difference in results. Similarly, we evaluated biomarker scores in the 31 men who reclassified to 4+3 or higher at sBx1 and found no difference from the scores in the men who reclassified to 3+4 (data not shown).
In conclusion, we found that PCA3 was associated with reclassification at sBx1 in a multivariable model, but PCA3, or PCA3 and T2:ERG together, demonstrated minimal improvement in the clinical utility of a multivariable model. Neither PCA3 nor T2:ERG was associated with time to reclassification at subsequent biopsies. Overall, we found that these markers add little or no improvement over clinical variables in predicting biopsy reclassification during AS.
Compliance with ethical standards: Conflict of interest – The authors declare that they have no conflict of interest.
Authors: Lisa F. Newcomb1,2, Yingye Zheng3, Anna V. Faino3, Daniella Bianchi-Frias4, Matthew R. Cooperberg5,6, Marshall D. Brown3, James D. Brooks7, Atreya Dash8, Michael D. Fabrizio9, Martin E. Gleave10, Michael Liss11, Todd M. Morgan12, Ian M. Thompson13, Andrew A. Wagner14, Peter R. Carroll5, Peter S. Nelson4, Daniel W. Lin1,2
1. Cancer Prevention Program, Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA