June 16, 2017

How to Solve a Problem Like Missing Data

by Alex Sutherland and Catherine L. Saunders

Missing data is a challenge for statisticians, policymakers, and analysts, particularly when a robust evidence base is needed. It usually arises for one of three reasons: data collection is done improperly, the data contains mistakes, or the data simply does not exist because of non-response. The Second Longitudinal Study of Young People in England (LSYPE2), research designed to understand the compulsory education, school-to-work transition, careers, and lives of young people in the UK, suffers from the last of these.

The overall aim of the study is to produce a dataset that can serve as a resource for evidence-based policy development. A significant barrier to this aim is that, on top of the more 'run of the mill' missingness (the manner in which data is missing from a sample of a population) that bedevils longitudinal studies, LSYPE2 has systematically incomplete data owing to a boycott of Key Stage 2 (KS2) testing in 2010, which occurred before the study began. Boycotts of national tests leave gaps in pupils' attainment records and, in the case of LSYPE2, threaten to undermine a large-scale (and expensive) longitudinal study with substantial policy relevance. In LSYPE2, KS2 data was missing for approximately 30 per cent of the cohort…

Alex Sutherland is a research leader at RAND Europe and Catherine Saunders is a statistician working in the Cambridge Centre for Health Services Research. Both were involved in 'Missing Data in the Second Longitudinal Study of Young People in England (LSYPE2)', a report for the UK Department for Education.

This commentary originally appeared on Statistics Views on June 16, 2017.