The number of finished consultant episodes (FCEs; an FCE is the time an inpatient spends under the care of one consultant) for all diagnoses in England from 1 April 2014 to 31 March 2015 was obtained from inpatient activity reports published by NHS Digital [1]. Diagnoses were coded using three- and four-character International Classification of Diseases, tenth revision (ICD-10) codes. The FCEs for codes in ICD-10 chapters I-XIV and XVI-XVII were examined. Pregnancy-related conditions, symptoms, signs, abnormal clinical and laboratory findings, and external causes of morbidity and mortality were excluded.
Three- or four-character ICD-10 codes were assigned to specific conditions in each disease category by agreement between Dr Valerie Kuan and the following clinicians: Professor Aroon Hingorani (benign neoplastic, cancers, cardiovascular, digestive, ear, endocrine, eye, genitourinary, haematological or immunological, infections, musculoskeletal, neurological, perinatal or congenital, psychiatric, respiratory, and skin), Dr Osman Bhatti (benign neoplastic, cancers, ear, endocrine, eye, haematological or immunological, musculoskeletal, neurological, perinatal or congenital, psychiatric, respiratory, and skin), Dr Shanaz Husain (benign neoplastic, cancers, ear, endocrine, eye, haematological or immunological, musculoskeletal, neurological, perinatal or congenital, psychiatric, respiratory, and skin), Dr Shailen Sutaria (infections), Professor Dorothea Nitsch (genitourinary), Mrs Melanie Hingorani (eye), Dr Constantinos Parisinos (digestive), Dr Tom Lumbers (cardiovascular) and Dr Reecha Sofat (cardiovascular).
Conditions whose codes had more than 10,000 FCEs were included. A condition with fewer than 10,000 FCEs was also included if it was considered clinically important.
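The chapter restriction and the inclusion rule described above can be sketched in Python as follows. This is a minimal illustration, not the study's code: it assumes the standard ICD-10 layout in which chapters I-XIV cover codes beginning with A to N and chapters XVI-XVII cover codes beginning with P and Q. The function names, the per-condition FCE totals and the set of clinically important conditions are hypothetical inputs, and how FCEs were aggregated per condition is not specified here.

```python
# Illustrative sketch only; names and inputs are assumptions, not the study's code.

# First letters of ICD-10 codes in chapters I-XIV (A-N) and XVI-XVII (P, Q).
INCLUDED_FIRST_LETTERS = set("ABCDEFGHIJKLMNPQ")

# Threshold stated in the text: strictly more than 10,000 FCEs.
FCE_THRESHOLD = 10_000


def in_examined_chapters(icd10_code: str) -> bool:
    """Return True if the code falls in chapters I-XIV or XVI-XVII."""
    return icd10_code[:1].upper() in INCLUDED_FIRST_LETTERS


def select_conditions(fce_by_condition: dict, clinically_important: set) -> list:
    """Keep conditions above the FCE threshold or flagged as clinically important.

    fce_by_condition: hypothetical mapping of condition name -> FCE count.
    clinically_important: hypothetical set of names flagged by clinicians.
    """
    return sorted(
        name
        for name, fces in fce_by_condition.items()
        if fces > FCE_THRESHOLD or name in clinically_important
    )
```

For example, `in_examined_chapters("O80")` is false because chapter XV (pregnancy) is excluded, and a condition with exactly 10,000 FCEs is kept only if it appears in `clinically_important`.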
Infections were categorised by organ system and causal organism. The chronic infections with long-term sequelae that were included were HIV, chronic viral hepatitis, tuberculosis, and rheumatic fever. Acute infections were limited to hospital admissions. Obesity was considered only for individuals above the age of 18 years.
A total of 308 physical and mental health conditions involving intensive use of healthcare resources were selected. These included health conditions from the Quality and Outcomes Framework (QOF), modified where applicable to give more granular phenotypes reflecting distinct pathological pathways, such as type 1 diabetes mellitus, type 2 diabetes mellitus and diabetes mellitus (‘other’ or ‘unspecified’).
Health conditions were harmonised across primary and secondary care coding systems and organised into 16 disease categories corresponding closely to ICD-10 chapters.
Phenotyping algorithms defining 302 of the 308 conditions were based on diagnosis or procedural codes. The case definitions for the remaining six conditions used blood test values or other measures, namely: estimated glomerular filtration rate (eGFR) for chronic kidney disease, total cholesterol (TC) for raised total cholesterol, low-density lipoprotein cholesterol (LDL-C) for raised LDL-C, high-density lipoprotein cholesterol (HDL-C) for low HDL-C, triglyceride (TG) for raised triglyceride and body mass index (BMI) for obesity. Phenotyping algorithms for eleven conditions (stable angina, unstable angina, myocardial infarction, coronary heart disease not otherwise specified, hypertension, peripheral arterial disease, atrial fibrillation, abdominal aortic aneurysm, type 1 diabetes, type 2 diabetes and diabetes other or not specified) were adapted from algorithms and codelists that had previously been defined in the CALIBER portal.
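The six measurement-based case definitions above could be expressed as simple threshold rules, as in the Python sketch below. The numerical cut-offs are illustrative placeholders only and are not taken from this study; the rule structure, the field names and the function `conditions_met` are likewise assumptions made for the example. The direction of each comparison follows the condition names in the text (raised or low), and the age restriction for obesity follows the statement that obesity was considered only above the age of 18 years.

```python
# Illustrative sketch only; cut-offs are PLACEHOLDERS, not the study's values.
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Rule:
    condition: str
    measure: str                     # key in the measurements dict (hypothetical)
    direction: str                   # "above" or "below" the cut-off
    cutoff: float                    # placeholder value for illustration
    min_age: Optional[float] = None  # rule applies only to ages above this


RULES = [
    Rule("chronic kidney disease", "egfr", "below", 60.0),
    Rule("raised total cholesterol", "tc", "above", 5.0),
    Rule("raised LDL-C", "ldl_c", "above", 3.0),
    Rule("low HDL-C", "hdl_c", "below", 1.0),
    Rule("raised triglyceride", "tg", "above", 1.7),
    Rule("obesity", "bmi", "above", 30.0, min_age=18),
]


def conditions_met(measurements: dict, age: float, rules=RULES) -> list:
    """Return the conditions whose threshold rule is satisfied."""
    met = []
    for rule in rules:
        value = measurements.get(rule.measure)
        if value is None:
            continue  # no measurement recorded for this rule
        if rule.min_age is not None and age <= rule.min_age:
            continue  # e.g. obesity is not assessed at or below 18 years
        if rule.direction == "above":
            hit = value > rule.cutoff
        else:
            hit = value < rule.cutoff
        if hit:
            met.append(rule.condition)
    return met
```

Keeping the rules in a table rather than in branching code makes it straightforward to substitute the actual cut-offs and units used in a given analysis.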
Diagnoses and procedures are recorded in CPRD using Read codes. ICD-10 diagnosis codes and Office of Population Censuses and Surveys Classification of Interventions and Procedures version 4 (OPCS-4) procedural codes are used in HES-APC.
The selection of ICD-10 codes for the ICD-10 codelists is described above. These ICD-10 codes were mapped to Read codes as follows: (i) cross-maps provided by NHS Digital were used to look up similar terms between ICD-10 and Read codes [2]; (ii) a list of keywords for each condition was constructed in agreement with the clinicians responsible for diseases in the respective categories; (iii) keyword searches were performed on a lookup file provided by CPRD (medical.txt), which contains Read codes, CPRD medcodes and a verbal description common to both codes; (iv) Read codes with their corresponding descriptive terms identified from the cross-mapping and keyword searches were concatenated into a long list; (v) the long list was further supplemented by Read codes in the Read code hierarchy that were adjacent to the codes identified from the cross-mapping and keyword searches, which were considered possible candidates for inclusion in the respective codelists; and (vi) this long list was pruned in collaboration with the clinicians responsible for the respective categories to obtain the final Read codelists for each condition.
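The automated part of this mapping (cross-map lookup, keyword search, concatenation and supplementation with adjacent codes) can be sketched in Python as below. This is a sketch under stated assumptions rather than the procedure actually run: the input structures, the function names and the toy codes in the usage note are hypothetical, the real layout of the cross-map and of medical.txt is not reproduced, and "adjacent" is interpreted here as sharing the same parent in a five-character, dot-padded Read code hierarchy. The final pruning by clinicians is a manual step and is not modelled.

```python
# Illustrative sketch only; data structures and names are assumptions.

def parent_of(read_code: str) -> str:
    """Parent in a 5-character, dot-padded hierarchy (assumed layout).

    Example: "X101." -> "X10..", "X10.." -> "X1...".
    """
    stem = read_code[:5].rstrip(".")
    return stem[:-1].ljust(5, ".") if stem else ""


def build_long_list(icd10_codes, cross_map, dictionary, keywords):
    """Return {read_code: description} candidates for clinician review.

    icd10_codes: ICD-10 codes in the condition's codelist.
    cross_map:   hypothetical dict of ICD-10 code -> list of Read codes.
    dictionary:  hypothetical list of (read_code, description) pairs.
    keywords:    search terms agreed with clinicians.
    """
    descriptions = dict(dictionary)

    # Read codes reached through the ICD-10 cross-map.
    mapped = {rc for icd in icd10_codes for rc in cross_map.get(icd, [])}

    # Case-insensitive keyword search over the descriptions.
    lowered = [k.lower() for k in keywords]
    matched = {
        rc for rc, desc in dictionary
        if any(k in desc.lower() for k in lowered)
    }

    # Concatenate both sources into the seed list.
    seeds = mapped | matched

    # Supplement with codes that share a parent with any seed code.
    parents = {parent_of(rc) for rc in seeds}
    neighbours = {rc for rc in descriptions if parent_of(rc) in parents}

    long_list = seeds | neighbours
    return {rc: descriptions.get(rc, "") for rc in sorted(long_list)}
```

With a toy dictionary containing the made-up codes `X10..`, `X101.`, `X102.` and `Y20..`, a cross-map from `I21` to `X101.` and the keyword `infarct`, the function returns the three `X10` codes and leaves `Y20..` out, since it shares no parent with any seed code.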
Where procedures were identified in Read codes for a specific condition, a keyword search for these procedures was performed in the OPCS-4 dictionary. The matching terms were concatenated with adjacent terms within the OPCS-4 hierarchy that were considered potentially relevant to the specific condition to form a long list. The final OPCS-4 codelist was constructed from this long list in collaboration with the clinicians responsible for diseases in the respective categories.