Cancer registries play a central role in the documentation of cancer information. Over the past few decades, the data quality of cancer registries has been the subject of considerable debate among healthcare planners and has received increasing attention among epidemiologists. This is because of the importance of cancer registry data in health services planning and epidemiological research. Cancer registries are responsible for collecting the basic demographic and disease information of every patient diagnosed with cancer and producing high-quality cancer statistics. The quality of cancer registry data is evaluated using different techniques to improve the registration process, completeness, and accuracy. This review aims to describe the quality of cancer registration as reported in the literature, highlighting the effect of the completeness and accuracy of cancer data on survival estimates. A limited number of studies have looked at the quality of cancer data. The existing literature indicated several limitations in the quality of cancer data that influence the estimates of cancer survival and contribute to international variations in cancer survival. These limitations can cause survival to be either underestimated or overestimated. No specific data field was reported to be responsible for the change in survival estimates. However, the importance of some clinical fields, such as clinical stage and treatment, has been highlighted in the literature. Survival statistics based on cancer registries were also affected by the presence of death certificate only (DCO) registrations. Complete and accurate data are crucial for obtaining reliable results and valid inferences in oncology research.
accuracy, completeness, cancer, data, survival
SCR: Scottish Cancer Registry, DCO: death certificate only, OCR: Ontario Cancer Registry, EMR: electronic medical record
Cancer registries play a central role in the documentation of cancer information. Over the past few decades, the data quality of cancer registries has been the subject of considerable debate among healthcare planners and has received increasing attention among epidemiologists. This is because of the importance of cancer registry data in health services planning, epidemiological research, and evaluation of cancer health care services [1]. Data quality at cancer registries is determined by at least four important factors: comparability, completeness, validity (accuracy), and timeliness. Each of these prerequisite characteristics has its own assessment procedures [2]. For instance, completeness can be evaluated by either qualitative or quantitative methods. The former include historic data methods, mortality-to-incidence ratios, and histological verification of diagnosis. The latter examine the degree of completeness of registration by applying independent case ascertainment comparisons, capture-recapture methods, and death certificate methods [3].
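As a minimal illustrative sketch, not drawn from the reviewed studies, two of the methods named above can be expressed numerically: the mortality-to-incidence ratio and a two-source capture-recapture estimate of completeness. The function names and all counts are hypothetical. The capture-recapture step uses Chapman's form of the Lincoln-Petersen estimator and assumes that the two sources identify cases independently, an assumption that real registry sources often violate.

```python
def mortality_incidence_ratio(deaths: int, incident_cases: int) -> float:
    """Ratio of cancer deaths to registered incident cases in the same period."""
    if incident_cases <= 0:
        raise ValueError("incident_cases must be positive")
    return deaths / incident_cases


def chapman_estimate(source_a: int, source_b: int, in_both: int) -> float:
    """Estimated true number of cases from two overlapping sources.

    Chapman's variant of the Lincoln-Petersen estimator; assumes the two
    sources capture cases independently of each other.
    """
    if in_both < 0 or in_both > min(source_a, source_b):
        raise ValueError("in_both must lie between 0 and the smaller source")
    return (source_a + 1) * (source_b + 1) / (in_both + 1) - 1


def estimated_completeness(source_a: int, source_b: int, in_both: int) -> float:
    """Share of the estimated true case total found by the combined sources."""
    observed = source_a + source_b - in_both  # distinct cases actually found
    return observed / chapman_estimate(source_a, source_b, in_both)


# Hypothetical counts: 900 cases from hospital notifications, 800 from
# pathology reports, 750 of which appear in both sources.
completeness = estimated_completeness(900, 800, 750)
print(f"Estimated completeness: {completeness:.1%}")
print(f"M:I ratio: {mortality_incidence_ratio(450, 950):.2f}")
```

A mortality-to-incidence ratio that is higher than expected for a given cancer site is usually read as a sign of under-registration of incident cases, whereas the capture-recapture estimate gives a direct, if assumption-dependent, figure for completeness.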
Completeness is the extent to which all appropriate data items within the jurisdiction covered by the cancer registry are recorded [4]. This metric plays a fundamental role in assessing the quality of data. Various studies have documented completeness rates that range from 36% to 99.3% [5]. Factors that influence completeness encompass the application of multiple data sources, proactive methods for identifying cases, and the promptness of data submission. Ensuring the thoroughness of case identification by reporting facilities is an essential element of the quality assurance procedures of a central cancer registry. It is imperative for registries to validate that all relevant tumor cases are being reported by the facilities [6].
Another important dimension of data quality is accuracy, defined as the proportion of cases recorded with a given characteristic that truly have that attribute [4]. Accuracy can be evaluated by comparing registry data against a reference standard, such as medical records. Studies have documented agreement rates ranging from 84.7% to 99.6% for various data elements [7]. Furthermore, certain registries have identified misclassification of diagnostic criteria, laterality, and gender as areas needing improvement. The use of multiple data sources and logical validation checks can help to improve accuracy [8].
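The comparison against a reference standard can be sketched as a per-field agreement rate. The following is a hypothetical example only: the record layout, the field names, and the patient values are assumptions made for illustration and do not come from any of the cited studies.

```python
def field_agreement(registry, reference, fields):
    """Percent agreement per field over patients present in both sources.

    registry and reference map a patient identifier to a dict of field values.
    """
    shared_ids = registry.keys() & reference.keys()
    if not shared_ids:
        raise ValueError("the two sources have no patients in common")
    rates = {}
    for field in fields:
        matches = sum(
            1
            for pid in shared_ids
            if registry[pid].get(field) == reference[pid].get(field)
        )
        rates[field] = 100.0 * matches / len(shared_ids)
    return rates


# Hypothetical registry records and re-abstracted medical records.
registry = {
    "p1": {"site": "C34.1", "sex": "M"},
    "p2": {"site": "C34.9", "sex": "F"},
    "p3": {"site": "C18.7", "sex": "F"},
    "p4": {"site": "C50.9", "sex": "F"},
}
reference = {
    "p1": {"site": "C34.1", "sex": "M"},
    "p2": {"site": "C34.3", "sex": "F"},  # site code differs from registry
    "p3": {"site": "C18.7", "sex": "F"},
    "p4": {"site": "C50.9", "sex": "F"},
}

agreement = field_agreement(registry, reference, ["site", "sex"])
print(agreement)
```

The discrepancy rate for a field is simply 100 minus its agreement rate. Simple percent agreement does not correct for agreement expected by chance, so studies often report a chance-corrected statistic alongside it.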
The quality of cancer data is a crucial aspect of cancer research and patient care. Therefore, ensuring the accuracy and reliability of these data is essential for making informed decisions about treatment options, predicting patient outcomes, and developing effective strategies for cancer prevention and management. Given the critical role of completeness and accuracy of cancer data that has been reported by several researchers, the reviewed studies were concerned with those two aspects of data quality to identify their levels at cancer registries and medical databases. This review aims to describe and discuss the quality of cancer registration as reported in the literature, highlighting the effect of the completeness and accuracy of cancer data on survival estimates.
A series of ‘in-house’ studies was conducted by Brewster and colleagues [9–15] and published between 1994 and 2008. These studies assessed data accuracy and case ascertainment for all cancer registrations at the Scottish Cancer Registry (SCR), which is responsible for collecting cancer data in Scotland. Some of them concluded that SCR data had a high level of accuracy and completeness. Furthermore, the overall completeness and accuracy at the SCR in 1992 were estimated at 96.5% [13, 14].
However, accurate capture of pathologic details was challenging. Brewster et al. [11] assessed the accuracy of lung cancer registration using a random sample of 340 patients registered with lung carcinoma in 1990. These registrations were compared with the relevant medical records, which were available for the 309 patients included in the study. Death certificate registrations accounted for 20 cases. The comparison was based on selected data items, including demographic data, treatment and histological verification data, and site and morphologic data. Their results revealed discrepancies that were higher for site codes, morphology codes, and histological verification, reported as 56.5%, 47.25%, and 12.5%, respectively. The researchers attributed these errors to missing data and to the timing of the re-abstraction of the database for this study.
A similar methodology was used, and similar findings were reported, in earlier studies of colorectal and breast tumors and non-melanoma skin cancer. Although the overall data quality at the SCR was considered high, there were a few limitations in some data items related to those cancers. In addition, under-ascertainment of intracranial tumors was reported, with the SCR missing 46% of cases [16].
In addition, a retrospective review of the SCR data in 1997 found significant issues with reliability for grades of differentiation, staging variables, and dates of treatment [13]. Similar problems have been reported in the Thames Cancer Registry [17]. Accordingly, Klassen et al. [18] pointed out that the stage of disease at diagnosis and histological grade are the two clinical characteristics often missing in the cancer registry.
In comparisons of registry data with hospital data, some authors argue that hospital data are more reliable than cancer registry data [19]. However, Schouten et al. [20] suggested that data collected by clinicians cannot be considered a ‘gold standard’ for cancer registries, because cancer registries and clinicians may collect data from a ‘different perspective’. Data collected by clinicians are mainly used to determine the treatment and prognosis of patients and are therefore characterized by less basic detail, whereas cancer registry staff are trained to follow coding rules. This study, however, was cross-sectional: it measured the difference in quality between clinical and registry data at one point in time only.
These suggestions were supported to a certain extent by Gregor et al. [21], who reported considerable limitations in the medical records, which were available for 91.2% of the participants (4465) diagnosed in 1995. Data were missing in important prognostic fields. For example, 423 cases had no staging detail, and 999 patients had no microscopic verification. Comparisons of hospital data with cancer registry data have been reported in further studies [22, 23].
Regarding other literature on the UK cancer registries, one study found that cancer records were characterized by high standards of completeness, accuracy, and reliability [24]. However, another study [25] found that cancer data in the United Kingdom were incomplete in many data fields, such as sex, age, staging information, and treatment. In addition, Adams et al. [26] suggest that there is a socioeconomic gradient in the quality of data from cancer registries in the United Kingdom. For example, death certificate only (DCO) registrations, in which the death certificate is the only evidence of a cancer diagnosis, are the most commonly available records for the most deprived population. Jones et al. [27] added that such registrations are associated with males rather than females. The presence of these registrations reflects the fact that the diagnosis of cancer was recorded post-mortem. In such cases, the date of incidence is taken to be the date of death, and this negatively affects the overall cancer survival estimate. Despite the continuing efforts of cancer registries to collect cancer incidence data, some literature has reported evidence of missing and inaccurate cancer registry data under a voluntary case reporting policy. Consequently, these limitations create frequent problems in data analysis and interpretation. Cancer survival may be underestimated or overestimated depending on the processes of data collection, reporting, and analysis.
The incompleteness of medical records may introduce a selective bias in the cancer survival estimates of patients with relatively poor or good prognoses. Brenner et al. [28] examined the impact of data incompleteness on five-year survival estimates in patients aged 15 years and older, with a first diagnosis of one of the common types of cancer in Finland, in the period between 1990 and 1999. They concluded that selective under-ascertainment of patients with a good prognosis may lead to an underestimation of cancer patient survival, whereas an opposite effect could result from selective under-ascertainment of patients with poor prognosis.
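The direction of this selection bias can be illustrated with simple arithmetic. The sketch below is not the method used by Brenner et al. [28]; it assumes a hypothetical population split into a good-prognosis and a poor-prognosis group, each with an assumed survival proportion and an assumed proportion of cases that reach the registry.

```python
def observed_survival(groups):
    """Survival proportion among registered cases.

    groups is a list of (true_cases, survival_proportion, ascertained_proportion)
    tuples, one per prognostic group. All values here are hypothetical.
    """
    registered = sum(n * c for n, _, c in groups)
    survivors = sum(n * s * c for n, s, c in groups)
    if registered == 0:
        raise ValueError("no registered cases")
    return survivors / registered


# 500 good-prognosis cases (80% survive) and 500 poor-prognosis cases (20%).
complete = observed_survival([(500, 0.8, 1.0), (500, 0.2, 1.0)])
missing_good = observed_survival([(500, 0.8, 0.8), (500, 0.2, 1.0)])
missing_poor = observed_survival([(500, 0.8, 1.0), (500, 0.2, 0.8)])

print(f"complete registration:       {complete:.3f}")
print(f"20% of good prognosis lost:  {missing_good:.3f}")
print(f"20% of poor prognosis lost:  {missing_poor:.3f}")
```

Under these assumed figures, losing good-prognosis cases pulls the estimate below the value obtained with complete registration, and losing poor-prognosis cases pushes it above, which matches the two directions of bias described above.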
A study from Germany reported some incompleteness of Hamburg Cancer Registry data on certain malignancies such as colon, prostate, and urinary bladder. The authors conclude that such limitations in data quality may impact the validity of the cancer registry to produce convincing survival estimates [29].
Survival estimates based on data from cancer registries are affected by the presence of DCO registrations as well as by the incompleteness of case ascertainment [30, 31]. DCO cases are associated with patients who have shorter survival times on average. Pollock et al. [30] examined the impact of adding DCO registrations to district health authority data on the estimation of 5-year survival of colorectal cancer. As mentioned earlier, for DCO registrations, the incidence date (date of diagnosis) and date of death are the same. Therefore, the duration of survival is considered to be zero. They found that survival rates decreased by 8.6%. They concluded that DCO registrations play an important role in lowering the survival estimate for the population concerned, and that the solution is not to exclude DCO registrations but to improve their quality. Likewise, another study examined the relationship between those two factors and pointed out that a high percentage of DCO registrations indicates poor case ascertainment, whereas a low percentage does not necessarily reflect complete case ascertainment [32].
Moreover, Berrino [33] suggested that these factors may cause either underestimation or overestimation of cancer survival. A study by Robinson et al. [34] aimed to assess the impact of DCO registrations and incomplete ascertainment on survival estimates for certain cancer sites, including lung cancer. Their methodology was based on comparing 5-year survival estimates using data from the Finland and Thames cancer registries. They compared survival estimates before and after adjustment for DCO registrations, for incompleteness, and for both. Their findings confirmed the observations reported by Berrino [33] and showed that the 5-year survival estimate was influenced by the presence of both DCO registrations and under-ascertainment. They reported that the Finland registry data were virtually complete and had few DCO registrations because cancer registration is mandatory by law in Finland. Consequently, adjustments for those factors had little effect on survival estimates.
On the other hand, the Thames Cancer Registry in the United Kingdom, where cancer registration is voluntary, had data showing less completeness and higher DCO proportions. As a result, adjusting for under-ascertainment caused a substantial increase in survival estimates, whereas adjusting for DCO led to a marked reduction in survival estimates. In other words, the presence of DCO registrations and incomplete case ascertainment in cancer registries can bias the calculated survival estimate in opposite directions. This study relied on the proportions of DCO registrations in both the Finland and Thames cancer registries as a measure of data quality. This approach, however, was criticized as providing only a broad indicator of data quality and not an accurate estimate of completeness [32]. In addition, it would not be sufficient to reflect the overall completeness of data in both registries, as the researchers assessed completeness for the sex and cancer site fields only.
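Because a DCO registration carries a survival time of zero, its effect on a crude survival proportion can be shown with a short calculation. This is a simplified sketch with hypothetical counts: it uses a plain proportion surviving to 5 years, ignores censoring, and is not the method used by Pollock et al. [30].

```python
def survival_proportion(survivors: int, cases: int) -> float:
    """Crude proportion of registered cases alive at the end of follow-up."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return survivors / cases


def survival_with_dco(survivors: int, cases: int, dco_cases: int) -> float:
    """Same proportion after adding DCO cases, none of whom count as survivors."""
    return survival_proportion(survivors, cases + dco_cases)


# Hypothetical cohort: 400 of 1000 registered patients alive at 5 years,
# plus 100 additional DCO registrations.
without_dco = survival_proportion(400, 1000)
with_dco = survival_with_dco(400, 1000, 100)

print(f"excluding DCO: {without_dco:.1%}")
print(f"including DCO: {with_dco:.1%}")
```

Every DCO case enlarges the denominator without adding a survivor, so the estimate can only fall when such cases are included.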
Beral et al. [35] have reported that the incompleteness in the UK cancer registrations made survival estimates misleading and seem significantly worse than they are. In contrast, mortality statistics are more reliable than survival rates. This is because death certificates are legally required for burial and cremation. Therefore, death information in the cancer registry is almost complete, as the death certificates are automatically transferred to the regional registries. The authors discussed many limited aspects of the UK cancer survival statistics which are based on cancer registries’ data. They suggest that the absence of non-fatal case registrations can distort the overall survival estimate. In addition, cancer screening also causes substantial distortions by artificially prolonging the recorded survival duration. Similarly, another recent British study reported that incompleteness of case ascertainment can lead to bias in estimates of cancer survival producing very low survival figures [36].
Different findings have been reported where registration is mandatory. In Denmark, Norgaard et al. [22] examined the quality of data on hematological malignancies in a population-based hospital discharge registry by independent comparison with data from the Danish Cancer Registry. Further, they investigated the impact of missing data on survival estimates. They used the positive predictive value (PPV) to assess the degree of completeness. The PPV is the proportion of recorded (positive) cases that are confirmed as true cases; a high PPV indicates that most recorded diagnoses are genuine, which can be useful in assessing the quality of the registry data. They found similarities in the Kaplan-Meier survival curves and in the completeness level.
The findings of this study were consistent with those of Hall et al. [4], who assessed the quality of the Ontario Cancer Registry (OCR) data. They took clinical information on 898 patients with squamous carcinoma of the head and neck, including index tumor site, date of diagnosis, vital status, date of death, and cause of death, from a prospective database at the Kingston Regional Cancer Centre and compared it with the same data elements in the OCR for the same patients. Their results indicate a high level of case ascertainment at the registry and no differences in the survival estimate between the two data sources.
Completeness of ascertainment and accuracy of registration, together with other factors, contribute significantly to the international variations in cancer survival [37]. These variations have been widely reported in the literature. More importantly, the EUROCARE publications have presented strong evidence of lung cancer survival differences between European countries. In particular, the EUROCARE-4 study indicates how under-reporting of cancer data undermines the accuracy of international cancer survival comparison and interpretation; consequently, survival estimates may not be reliable for evaluating the quality of cancer care [38].
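A comparison of this kind can be sketched with sets of patient identifiers. The example below is hypothetical and is not the procedure used by Norgaard et al. [22]; it assumes that one source is treated as the reference and that patients can be linked by a shared identifier.

```python
def ppv_and_completeness(recorded_ids, reference_ids):
    """Return (PPV, completeness) of one data source against a reference.

    PPV          = confirmed cases / all cases recorded in the source
    completeness = confirmed cases / all cases in the reference
    """
    recorded, reference = set(recorded_ids), set(reference_ids)
    if not recorded or not reference:
        raise ValueError("both sources must contain at least one case")
    confirmed = recorded & reference  # cases found in both sources
    return len(confirmed) / len(recorded), len(confirmed) / len(reference)


# Hypothetical identifiers: four discharge-registry cases, five reference cases.
discharge_registry = {1, 2, 3, 4}
cancer_registry = {2, 3, 4, 5, 6}

ppv, completeness = ppv_and_completeness(discharge_registry, cancer_registry)
print(f"PPV: {ppv:.2f}, completeness: {completeness:.2f}")
```

The two measures answer different questions: the PPV describes how trustworthy a recorded diagnosis is, while completeness describes how many true cases the source captures, so a source can score well on one and poorly on the other.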
The quality of cancer data and its impact on survival estimates are influenced by various factors outlined in recent research. Issues such as loss to follow-up cases can lead to biased survival estimates, especially in small population-based cancer registries, with censoring of cases potentially overestimating survival rates [39]. Accurate survival estimates require the combination of high-quality cancer registry data with electronic medical record (EMR) data. The use of EMR data alone for survival estimates can result in an overestimation of survival times [40]. Additionally, missed deaths within cancer registry data can result in inflated long-term survival estimates, highlighting the importance of detecting and addressing such discrepancies [41].
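The inflation caused by missed deaths can be illustrated with a small Kaplan-Meier calculation. This is a minimal sketch under stated assumptions: the estimator is written out in plain Python for illustration, the five follow-up records are hypothetical, and a missed death is modeled as a patient wrongly recorded as alive at the closing date of follow-up.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier curve as a list of (time, survival) at each death time.

    times  : follow-up time for each patient
    events : 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def survival_at(curve, t):
    """Estimated survival at time t (1.0 before the first death)."""
    value = 1.0
    for time, survival in curve:
        if time <= t:
            value = survival
    return value


# True follow-up: five patients die at years 1 to 5.
true_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 1, 1, 1, 1])
# Registry view: the deaths at years 2 and 4 were missed, so those patients
# are recorded as alive (censored) at the closing date, year 5.
registry_curve = kaplan_meier([1, 5, 3, 5, 5], [1, 0, 1, 0, 1])

print(f"true 3-year survival:     {survival_at(true_curve, 3):.2f}")
print(f"registry 3-year survival: {survival_at(registry_curve, 3):.2f}")
```

In this toy cohort the patients with missed deaths stay in the risk set as apparent survivors, so the registry-based estimate is higher than the true value at every time point after the first missed death.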
More importantly, survival rates are considered essential indicators of the effectiveness of cancer services, reflecting the prospects of cure and the overall quality of cancer care [42]. Several studies underscore the critical role of high-quality cancer data for measuring survival and emphasize the need for accurate and comparable survival estimates to assess the overall effectiveness of cancer care in a population, guiding treatment decisions, and improving outcomes for cancer patients [43–45].
In summary, cancer registries collect the basic demographic and disease information of every patient diagnosed with cancer. This routinely collected information produces high-quality research about the incidence, patterns, and mortality from cancer-related conditions. Incomplete or inaccurate data can lead to misinformation about cancer statistics and services. The quality of cancer registry data is evaluated using different techniques to improve the registration process, completeness, and accuracy.
The review aimed to shed light on the existing literature regarding the quality of cancer data at registries and medical settings in terms of completeness and accuracy, and how these aspects affect cancer survival estimates. This review was limited to certain literature sources including Ovid MEDLINE in process, EMBASE, PubMed, and Web of Knowledge from inception until 2023. Further relevant studies were identified by searching references in the obtained full-text papers. However, only articles published in English were included. Being a descriptive review, it does not systematically search and review the completeness and accuracy of cancer registration and its effect on cancer survival estimates. However, it tried to bridge the gap in knowledge and practice regarding the significance of data quality in cancer statistics.
A limited number of studies have looked at the quality of cancer data. The existing literature indicated several limitations in the quality of cancer data that influence the estimates of cancer survival and contribute to international variations in cancer survival. These limitations can cause survival to be either underestimated or overestimated. No specific data field was reported to be responsible for the change in survival estimates. However, the importance of some clinical fields, such as clinical stage and treatment, has been highlighted in the literature. The presence of DCO registrations can significantly affect the accuracy and interpretation of cancer survival estimates from registry data, and appropriate measures should be taken to address this issue. Furthermore, maintaining high levels of completeness and accuracy is crucial for cancer registries to provide reliable data for cancer control planning, research, and policy development. Standardized metrics and reporting of data quality are necessary to ensure confidence in the usefulness of cancer registry data.