Using SAS® to Analyze ICD-9 and ICD-10 Diagnosis Codes Found in Administrative Health-Care Data

Kathy Fraeman
Director, Data Analytics and Principal Data Analyst, Evidera

Presentation Description
Administrative health-care data – including insurance claims data, electronic medical records (EMR) data, and hospitalization data – contains standardized diagnosis codes to identify diseases and other medical conditions. These codes are known by the abbreviation ICD, which stands for International Classification of Diseases. Much of the currently available health-care data contains the ninth version of these codes, referred to as ICD-9, although the more recent 10th version, ICD-10, is becoming more common. These diagnosis codes are typically saved as character variables, are often stored in arrays of multiple codes representing primary and secondary diagnoses, and can be associated with either outpatient medical visits or inpatient hospitalizations. SAS® text processing functions, array processing, and the SAS colon modifier can be used to analyze the text of these codes and to identify similar codes or ranges of ICD codes. In epidemiologic analyses, groups of multiple ICD diagnosis codes are typically used to define more general comorbidities or medical outcomes. These disease definitions based on multiple ICD diagnosis codes, also known as coding algorithms, can either be hardcoded in a SAS program or defined outside the program. When coding algorithm definitions based on ICD codes are stored externally, the definitions can be read into SAS, transformed to SAS format, and dynamically converted into SAS programming statements.
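
As a minimal sketch of the array processing and colon modifier mentioned above, the following SAS step flags claims that carry a diabetes diagnosis in any of several diagnosis fields. The data set name CLAIMS, the variables PATID and DX1-DX3, the sample records, and the code prefixes chosen (ICD-9 250, ICD-10 E10 and E11) are all illustrative assumptions, not a validated coding algorithm. The codes are assumed to be stored without decimal points.

```sas
/* Hypothetical claims data: one row per claim, up to three diagnosis codes. */
/* A period in list input is read as a missing (blank) character value.      */
data claims;
   infile datalines truncover;
   input patid $ dx1 $ dx2 $ dx3 $;
   datalines;
P1 25000 4019 .
P2 E119 I10 .
P3 4280 . .
;
run;

data flagged;
   set claims;
   array dx{*} dx1-dx3;          /* primary and secondary diagnosis codes */
   diabetes = 0;
   do i = 1 to dim(dx);
      /* The colon modifier truncates the longer operand to the length of */
      /* the shorter one, so =: '250' matches any code beginning with 250 */
      if dx{i} =: '250' or dx{i} in: ('E10', 'E11') then diabetes = 1;
   end;
   drop i;
run;
```

In this sketch a hardcoded prefix list stands in for the coding algorithm. An externally stored definition would replace the literal prefixes with values read from a file or data set.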

