Disruption of National Cancer Database Data Models in the First … – JAMA Network

  • November 3, 2023

eTable 1. Observed and Expected Cancer Cases by Race and Ethnicity, NCDB 2020
eTable 2. Observed and Expected Cancer Cases by Sex, NCDB 2020
eTable 3. Observed and Expected Cancer Cases by Age at Diagnosis, NCDB 2020
eTable 4. Observed and Expected Cancer Cases by Insurance Status, NCDB 2020
eTable 5. Observed and Expected Cancer Cases by High School Diploma Status, NCDB 2020
eTable 6. Observed and Expected Cancer Cases by Household Income, NCDB 2020
eTable 7. Observed and Expected Cancer Cases by Patient’s Region, NCDB 2020
eTable 8. National Cancer Database Reporting Tools and Data Elements
Lum SS, Browner AE, Palis B, et al. Disruption of National Cancer Database Data Models in the First Year of the COVID-19 Pandemic. JAMA Surg. 2023;158(6):643–650. doi:10.1001/jamasurg.2023.0652
© 2023
Importance  Each year, the National Cancer Database (NCDB) collects and analyzes data used in reports to support research, quality measures, and Commission on Cancer program accreditation. Because data models used to generate these reports have been historically stable, year-to-year variances have been attributed to changes within the cancer program rather than data modeling. Cancer submissions in 2020 were anticipated to be significantly different from prior years because of the COVID-19 pandemic. This study involved a validation analysis of the variances in observed to expected 2020 NCDB cancer data in comparison with 2019 and 2018.
Observations  The NCDB captured a total of 1 223 221 overall cancer cases in 2020, a decrease of 14.4% (Δ = −206 099) compared with 2019. The early months of the COVID-19 pandemic (March-May 2020) coincided with a nadir of cancer cases in April 2020 that did not recover to overall prepandemic levels through the remainder of 2020. In the early months of the COVID-19 pandemic, the proportion of early-stage disease decreased sharply overall, while the proportion of late-stage disease increased. However, differences in observed to expected stage distribution in 2020 varied by primary disease site. Statistically significant differences in the overall observed to expected proportions of race and ethnicity, sex, insurance type, geographic location, education, and income were identified, but consistent patterns were not evident.
Conclusions and Relevance  Historically stable NCDB data models used for research, administrative, and quality improvement purposes were disrupted during the first year of the COVID-19 pandemic. NCDB data users will need to carefully interpret disease- and program-specific findings for years to come to account for pandemic year aberrations when running models that include 2020.
The National Cancer Database (NCDB) is one of the largest cancer registries in the world and is leveraged to improve the quality of cancer care in the United States. The NCDB collects data on 1.5 million new cancer cases each year, representing more than 70% of all cancer cases in the United States. Nearly 1500 Commission on Cancer (CoC)–accredited programs use NCDB data for quality assurance purposes. In addition, researchers analyze NCDB data for comparative effectiveness studies and to optimize the safety and equity of cancer care. Year-over-year NCDB data have been reliably stable.1
The COVID-19 pandemic destabilized usual patterns of cancer care. In early 2020, preliminary reports demonstrated that patients with cancer were more likely to both contract COVID-19 and experience adverse outcomes.2 Cancer screening and treatment triage guidelines recommended intentional delays.3 As ambulatory and hospital-based care resumed, albeit differently from previous years, economic consequences of COVID-19 revealed inequitable impacts on communities.4,5 These forces may have further altered the incidence of cancer diagnosis and treatment during the height of the pandemic.5-8
The impact of the COVID-19 pandemic on the number and distribution of cancer diagnoses captured in the NCDB has not been determined. While monitoring real-time data submissions,9 the CoC noted variances in new cancer diagnoses in the first several months of 2020. In an effort to better understand the NCDB perspective of the COVID-19 pandemic, we examined all cancer cases added to the NCDB in the years leading up to the pandemic and 2020, which represents the first full year of US involvement. Our objective was to validate and report changes in the trajectory of cancer cases captured by the NCDB in the first year of the pandemic so that users of 2020 NCDB data sets may recalibrate their models accordingly.
This study reviewed cases from the NCDB, a joint project of the CoC of the American College of Surgeons and the American Cancer Society. Cases were selected of adults 18 years or older who were diagnosed with cancer and/or received first-course treatment at the reporting facility from January 1, 2018, through December 31, 2020. Reporting facilities included CoC-accredited programs, excluding Veterans Affairs–affiliated programs. Data were defined and selected using the International Classification of Diseases for Oncology, 3rd Edition10 by primary site topography and histology. All cases were abstracted according to the Standards for Oncology Registry Entry manual.11 Complete reporting of all 2020 cancer cases at the annual call for data was finalized in March 2022. Cancer reporting completeness was verified to confirm that case counts were not affected by cancer registrar workforce issues during the first year of the COVID-19 pandemic in 2020.12
Seventeen primary cancer sites were defined using International Classification of Diseases for Oncology, 3rd Edition topography and histology codes.10 The American Joint Committee on Cancer tumor, node, and metastasis staging and stage group followed the American Joint Committee on Cancer 8th edition manual.13 Tumor stage at diagnosis was defined as American Joint Committee on Cancer pathologic stage group; a clinical stage group was used if the pathologic stage group was unknown. Unstageable malignant neoplasms were excluded from primary site stage analyses. Independent variables included race and ethnicity, sex, age at diagnosis, insurance status, education, household income, and geographic region. Race and ethnicity were assigned by self-report and categorized as Asian including Hawaiian or Pacific Islander, Hispanic, non-Hispanic Black, non-Hispanic White, and other/unknown. Sex was determined by self-report and defined as male, female, and other/unknown with all other gender codes included in the other/unknown group. Insurance categories included Medicaid, Medicare, no insurance, private, other, and unknown. The American Community Survey income (median household income quartiles for 2016-2020) and education (percent without high school degree quartiles for 2016-2020) and American Cancer Society Region were based on patient zip code of residence at the time of diagnosis.
Descriptive univariate statistics for patient demographic and tumor staging characteristics were used to compare observed findings with expected findings for 2020. Independent autoregressive time series models used past observations between January 2018 and December 2019 (monthly, 24 data time points), for all primary sites combined and for each site separately, as input to a regression equation to predict the number of cases expected between January and December 2020. For all models, the response was the number of observed cases. The explanatory variables consisted of diagnosis month, year, stage, race and ethnicity, sex, age at diagnosis, insurance status, education, household income, and geographic region, each modeled independently. It was hypothesized that variation exists between the proportion of observed and expected cases. Data analyses used SAS version 9.4 (SAS Institute). The significance threshold was P < .05, corrected for multiple comparisons.
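The forecasting step described here can be illustrated with a minimal sketch. It assumes a first-order autoregressive model fitted by ordinary least squares; the report does not state the model order or estimation settings used in SAS, so the AR(1) choice, the function names, and the synthetic monthly counts below are all illustrative assumptions rather than the authors' implementation.

```python
def fit_ar1(series):
    """Ordinary least-squares fit of y[t] = c + phi * y[t-1]."""
    x, y = series[:-1], series[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx
    c = mean_y - phi * mean_x
    return c, phi


def forecast(series, steps):
    """Iterate the fitted recursion forward to obtain expected counts."""
    c, phi = fit_ar1(series)
    predictions, last = [], series[-1]
    for _ in range(steps):
        last = c + phi * last
        predictions.append(last)
    return predictions


def pct_deviation(observed, expected):
    """Percentage by which an observed count departs from its expected value."""
    return 100 * (observed - expected) / expected


# Synthetic placeholder counts for 24 months (NOT NCDB data).
history = [115_000 + 300 * t for t in range(24)]

# Expected counts for the 12 months that follow the training window.
expected_2020 = forecast(history, 12)
```

Each observed monthly count for 2020 could then be passed to `pct_deviation` together with the matching entry of `expected_2020`. A production analysis would more likely rely on an established time series package, with model diagnostics and prediction intervals.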
Overall, 4 045 097 cancer cases were captured by the NCDB from 2018 to 2020, including 1 392 556 in 2018, 1 429 320 in 2019, and 1 223 221 in 2020 (Table 1). The median (IQR) age at diagnosis was 66 (57-74) years, and 2 173 284 patients (53.7%) were female. Overall, 140 923 patients (3.5%) were Asian including Hawaiian or Pacific Islander, 268 204 (6.6%) were Hispanic, 445 188 (11.0%) were non-Hispanic Black, and 3 057 339 (75.6%) were non-Hispanic White. Stage at diagnosis included 1 251 334 stage I (30.9%), 553 852 stage II (13.7%), 518 849 stage III (12.8%), 654 029 stage IV (16.2%), 270 168 other stage (6.7%), and 796 865 unknown (19.7%). Additional demographic characteristics regarding race and ethnicity, sex, age, insurance, education, income, and geographic location are shown in eTables 1-7 in the Supplement.
Prior to the COVID-19 pandemic, cancer cases captured by the NCDB increased by 2.6% (Δ = +36 764 in 2019 compared with 2018). In 2020, the first year of the pandemic, cases captured by the NCDB decreased by 14.4% (Δ = −206 099) compared with 2019 (Table 1). A decrease in the monthly reported number of cases early in the pandemic, defined as March to May 2020, was observed. This number reached a nadir in April 2020 and partially recovered midyear but not to the levels seen in prior years except for September, during which observed cases fell within the expected range (Figure 1). The reduction of cases diagnosed in 2020 was observed across the 17 primary cancer sites examined, ranging from −10.5% in head and neck cancer to −20.6% in thyroid cancer and other endocrine cancer.
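The year-over-year changes quoted above follow directly from the annual case counts reported in Table 1. As a small arithmetic check, the sketch below recomputes them; the helper name `yoy_change` is illustrative only.

```python
# Annual NCDB case counts as reported in the text (Table 1).
cases = {2018: 1_392_556, 2019: 1_429_320, 2020: 1_223_221}


def yoy_change(counts, year):
    """Return (absolute change, percentage change) relative to the prior year."""
    previous, current = counts[year - 1], counts[year]
    delta = current - previous
    return delta, 100 * delta / previous


for year in (2019, 2020):
    delta, pct = yoy_change(cases, year)
    print(f"{year}: change = {delta:+,} ({pct:+.1f}%)")
```

The three annual counts also sum to the 4 045 097 total cases reported for 2018 through 2020.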
Across all stage groups, monthly stage reporting followed patterns similar to case volumes. Overall, the proportion of early-stage disease at diagnosis decreased from March to June 2020, with a corresponding increase in the proportion of those with late-stage disease, peaking in April 2020 and correcting to prior years’ percentages by July 2020 (Figure 2). However, the observed to expected 2020 stage distributions for individual disease sites with stageable disease varied (Table 2). Digestive system cancer showed a decrease in the observed to expected proportion of stage I cases (35 708 [17.1%] vs 43 055 [17.8%]; 95% CI, 17.7-18.0; P < .001) and an increase in that of stage IV cases (58 312 [27.9%] vs 64 162 [26.6%]; 95% CI, 26.4-26.7; P < .001). In contrast, for breast cancer, the proportion of observed to expected stage I cases increased (130 858 [56.4%] vs 148 639 [54.1%]; 95% CI, 53.9-54.3; P < .001) and the proportion of observed to expected stage IV cases decreased (10 681 [4.6%] vs 13 314 [4.8%]; 95% CI, 4.8-4.9; P < .001).
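One common way to frame an observed-versus-expected comparison of this kind is a normal-approximation (Wald) interval around the expected proportion, flagging observed proportions that fall outside it. The report does not state how its intervals were computed, so the method below is an assumption and will not necessarily reproduce the published intervals. The denominator is back-calculated from the rounded percentage for expected stage I digestive system cases and is therefore only approximate.

```python
import math


def wald_interval(count, total, z=1.96):
    """Normal-approximation 95% interval for the proportion count / total."""
    p = count / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p - half_width, p + half_width


def outside_interval(proportion, interval):
    """True when a proportion lies outside the (low, high) interval."""
    low, high = interval
    return proportion < low or proportion > high


# Expected stage I digestive system cases as reported: 43 055 (17.8%).
expected_count = 43_055
# ASSUMPTION: denominator recovered from the rounded 17.8%, so approximate.
expected_total = round(expected_count / 0.178)

interval = wald_interval(expected_count, expected_total)
observed_proportion = 0.171  # observed stage I proportion as reported (17.1%)

print(f"expected interval: {interval[0]:.3%} to {interval[1]:.3%}")
print("observed outside interval:", outside_interval(observed_proportion, interval))
```

With denominators in the hundreds of thousands, such intervals are very narrow, which is one reason that differences of less than 1 percentage point can still be statistically significant.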
This study identified differences between the observed to expected 2020 case distributions across sociodemographic strata. Findings for racial and ethnic groups are shown in eTable 1 in the Supplement. Overall, the proportion of non-Hispanic White patients showed a statistically significant increase, while Asian or Pacific Islander, Hispanic, non-Hispanic Black, and other/unknown patients showed a statistically significant decrease in observed to expected cases. At the primary site level, similar trends followed, with notable exceptions, such as lack of statistically significant changes in the observed to expected proportions of Hispanic patients with digestive system cancer and non-Hispanic Black patients with breast cancer.
For age groups (eTable 3 in the Supplement), significant differences were found among decades, with increases in the age groups of 60 to 69 years and 70 to 79 years and decreases in the younger and older age categories at presentation in 2020. Insurance status distribution (eTable 4 in the Supplement) showed increases in patients with Medicare and decreases in all other insurance categories. Education status (eTable 5 in the Supplement) reflected higher proportions of the top 3 education quartiles and a decrease in the proportion of the lowest education quartile in 2020. Median household income quartile (eTable 6 in the Supplement) comparisons showed decreases in the proportions of the lowest- and highest-income brackets and increases in the 2 middle income brackets for cases diagnosed in 2020. Geographic region of residence at diagnosis (eTable 7 in the Supplement) showed increases in the proportion of cases in the North, North Central, and Southeast regions, and decreases in the Northeast, South, and West regions.
Observed to expected proportions among different primary sites of disease for all sociodemographic categories varied in directionality and statistical significance. Case loss in 2020 was evident across all independent variables evaluated. The supplemental patient demographic tables (eTables 1-7 in the Supplement) provide additional details regarding differences in race and ethnicity, sex, age group, insurance type, education, income, and geographic region.
As a requirement for accreditation, CoC member programs must adhere to stringent case reporting standards to maintain the quality of the data they submit to the NCDB.14,15 This study validates the unique findings in the 2020 NCDB data set to inform NCDB data users of pandemic-related divergences from historically consistent data. The COVID-19 pandemic was associated with significant changes in diagnoses of all cancer types in 2020, with a 14.4% overall decline in the number of reported cancer cases in the NCDB compared with the prior year, representing more than 200 000 cancer cases that were not diagnosed and/or treated at CoC facilities. These cancer cases may be missing for now but are expected to appear in 2021 and beyond, potentially at more advanced stages.16
The monthly decline in cancer cases, along with the sharp decrease in the proportion of early-stage disease and increase in late-stage disease observed in early 2020, correlates with the timing of stay-at-home orders and triage guidelines.3,17 These trends mirror timelines previously documented for cancer screening, diagnosis, and treatment, as the health care system was strained in the early days of the pandemic.6-8,16,18,19 Temporal geographic variation in pandemic spread has been well documented20,21 and significant variance in geographic distribution of cancer cases in the NCDB was noted in 2020.
To their credit, cancer programs steadily recovered cases in the second half of 2020. Yearly observed to expected differences in stage at presentation for overall cancer diagnoses and at the primary site level showed statistically significant differences compared with prior years. However, the yearly data did not reflect month to month variation. Even with the decrease in overall stage reporting, the proportion of cases presenting in each stage group remained remarkably stable. A rebound in recovery of backlogged cases was not observed. The reasons for these variances by stage and site may be multifactorial and related to the manner of clinical presentation, the relative role of cancer prevention, screening, and early detection, and resource constraints amid the COVID-19 pandemic.
Demographic characteristics of cancer cases in 2020 in the NCDB reflected in some ways previous reports of exposure and access to general medical care during the outset of the pandemic.4,5,22 The observed to expected proportions decreased for patients diagnosed with cancer in 2020 who were of Asian or Pacific Islander, Hispanic, and non-Hispanic Black race and ethnicity, had lower education and income levels, or had Medicaid or no insurance. However, the observed to expected proportion of patients with private insurance, at highest income levels, and of younger age also decreased.
Greater scrutiny at the primary site level revealed a remarkable lack of skewness. The observed to expected proportions of non-Hispanic Black patients presenting with breast cancer in 2020 were no different from prior years. Although statistically significant, the observed to expected proportions of Asian or Pacific Islander and Hispanic patients with breast cancer differed by less than 1%. For patients with Medicaid, the observed to expected proportions in 2020 for digestive system, female genital, head and neck, lymphoma, or soft-tissue malignant neoplasms, among others, were not significantly different from previous years. Whether these findings reflect disparities in cancer prevention, screening, treatment, or outcomes for marginalized groups during the first pandemic year will require further study.4,5,22
The current study focused on the NCDB, which reflects patients treated at CoC-accredited institutions only. A comparison of incident cancer cases in the NCDB with US Cancer Statistics data showed that the NCDB captured 72% of US cancer cases from 2012 through 2014.1 NCDB case coverage varies by primary site and geographic region and has lower coverage rates for Asian, Hispanic, and Pacific Islander patients. It is unknown whether the findings in this report are reproducible in the approximately 30% of US cancer cases treated in non–CoC-accredited facilities. However, reports from other cancer registries and large observational data sets have disclosed similar pandemic-related findings regarding incidence and stage migration.16,23-26
Because the NCDB is the data source for reporting tools for CoC-accredited cancer programs,27 NCDB reports will harbor relevant and significant changes in the findings for 2020 compared with 2018 and 2019. Observed to expected variations in case counts, stage at diagnosis, demographic characteristics, and geographic location for 2020 compared with prior years will be reflected in institutional NCDB site reports by stage distribution, hospital comparison benchmarks, and survival (eTable 8 in the Supplement).
The Rapid Cancer Reporting System (previously the Rapid Quality Reporting System)9 provides CoC-accredited cancer programs with historic and real-time performance data regarding compliance with CoC quality measures. Expected performance rates of compliance with CoC quality measures have been incorporated into accreditation standards to expand evidence-based best practices across CoC programs. Quality measures reflecting time to treatment and stage or age eligibility will be affected by 2020 NCDB data. Cancer quality improvement program reports, which summarize hospital benchmarks, Rapid Cancer Reporting System, survival, and operative mortality for each facility, will similarly be affected by 2020 data now and in future reports using follow-up data. As COVID-19 infection itself may have affected survival outcomes, short-term outcomes such as 30- and 90-day mortality should be considered when assessing treatment outcomes. Furthermore, all CoC reports will need to be analyzed for variance against prior reports, with priority given to those required for CoC accreditation.
With its unique treatment, sociodemographic, and outcome variables, the NCDB has been widely used in cancer outcomes research.28 NCDB participant user files have been used in more than 1500 publications on PubMed.14 The variances in data noted in patient demographic characteristics, stage at diagnosis, monthly temporal trends, and geographic region will likely impact future treatment and survival results. Researchers using NCDB data from 2020 and beyond should be aware of these findings in the 2020 cohort when investigating institutional and disease-specific hypotheses and consider performing their own validation studies prior to incorporating 2020 data. Because significant variance was identified in 2020 NCDB data compared with prior years, we recommend use of the following disclaimer language in publications using 2020 NCDB data: “This study includes data from the year 2020, the first year of the COVID-19 pandemic. Variability in reporting 2020 cases in the NCDB must be considered when interpreting results.” The CoC will follow this cohort over time and is monitoring 2021 data through the Rapid Cancer Reporting System. We will conduct similar validity studies with the next data cohorts in 2021 and 2022.
In contrast to other reports,2,4,29,30 this study assesses cancer data in the NCDB rather than the impact of COVID-19 on patients with cancer, treatment adherence, or outcomes. Only gross categories, not specific disease sites, were evaluated, and highly statistically significant findings may not portend clinical significance. Individual primary site findings demonstrated unique patterns that did not consistently reflect overall trends. We do not know if these individual small effect sizes will generate larger differences when incorporated into reports where interactions may occur. Future research will evaluate COVID-19 infection-related associations, treatment adherence and outcomes, and long-term follow-up of the 14.4% of cancer diagnoses that were not treated as expected in CoC-accredited cancer programs.
This report represents a high-level overview to notify the cancer community of the disruption in historically stable NCDB data observed in the first year of the COVID-19 pandemic. NCDB data models for hospital benchmarks, cancer program accreditation, quality measures, and research should acknowledge deviations in the 2020 data and account for monthly and discrete disease site and demographic findings that annualized and grouped results may mask. The cumulative effect of these validated variances on clinical outcomes in 2020 and beyond has yet to be determined.
Accepted for Publication: January 17, 2023.
Published Online: April 12, 2023. doi:10.1001/jamasurg.2023.0652
Corresponding Author: Sharon S. Lum, MD, MBA, Loma Linda University School of Medicine, 11175 Campus St, CP21111, Loma Linda, CA 92354 (slum@llu.edu).
Author Contributions: Dr Lum and Mr Palis had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis.
Concept and design: Lum, Browner, Palis, Nelson, Nogueira, McCabe, Mullett, Wick.
Acquisition, analysis, or interpretation of data: All authors.
Drafting of the manuscript: Lum, Browner, Palis, Nelson, Boffa, Wick.
Critical revision of the manuscript for important intellectual content: Lum, Browner, Palis, Nelson, Boffa, Nogueira, Hawhee, McCabe, Mullett.
Statistical analysis: Lum, Browner, Palis, Nogueira, Mullett.
Administrative, technical, or material support: Browner, Palis, Nelson, Hawhee, McCabe, Mullett.
Supervision: Lum, Nelson, McCabe.
Conflict of Interest Disclosures: Dr Boffa reported honoraria from Iovance outside the submitted work. Dr Wick reported grants from Agency for Healthcare Research and Quality outside the submitted work. No other disclosures were reported.
Additional Contributions: The authors wish to acknowledge the Commission on Cancer and National Cancer Database of the American College of Surgeons.
© 2023 American Medical Association. All Rights Reserved.