National Health and Nutrition Examination Survey

Overview

The National Health and Nutrition Examination Survey (NHANES) is a program of studies designed to assess the health and nutritional status of adults and children in the United States. The survey is unique in that it combines interviews and physical examinations. NHANES is a major program of the National Center for Health Statistics (NCHS). NCHS is part of the Centers for Disease Control and Prevention (CDC) and is responsible for producing vital and health statistics for the nation.

The NHANES program began in the early 1960s and has been conducted as a series of surveys focusing on different population groups or health topics. In 1999, the survey became a continuous program that has a changing focus on a variety of health and nutrition measurements to meet emerging needs. The survey examines a nationally representative sample of about 5,000 persons each year. These persons are located in counties across the country, 15 of which are visited each year.

The NHANES interview includes demographic, socioeconomic, dietary, and health-related questions. The examination component consists of medical, dental, and physiological measurements, as well as laboratory tests administered by highly trained medical personnel.

Public-Use Data

Much of the NHANES data are available in public-use form through the NHANES website. Because the survey is ongoing and its data are complex, data are released after each two-year cycle and in separate categories. For Continuous NHANES, each cycle is divided into five sections labeled by collection method: Demographics, Dietary, Examination, Laboratory, and Questionnaire. Within each section are many individual components, each a group of related variables packaged in a single data file. To look for specific variables, you can perform a keyword search by following the Search Variables link from the Questionnaires, Datasets, and Related Documentation page. You can search across all survey cycles or restrict your search to a single data release cycle.
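Because each component is a separate file, most analyses start by joining components on the respondent sequence number, SEQN. The following is a minimal Python sketch of that join, not an official NCHS procedure. It assumes each component has already been loaded into a list of rows; the rows and values below are made up for illustration, and the helper name left_join_on_seqn is hypothetical. RIDAGEYR (age in years) and BMXBMI (body mass index) are used as example variable names.

```python
# Toy rows standing in for two components of one cycle (all values are made up).
demographics = [
    {"SEQN": 1001, "RIDAGEYR": 34},
    {"SEQN": 1002, "RIDAGEYR": 61},
    {"SEQN": 1003, "RIDAGEYR": 8},
]
body_measures = [
    {"SEQN": 1001, "BMXBMI": 27.4},
    {"SEQN": 1003, "BMXBMI": 16.9},
]


def left_join_on_seqn(left, right):
    """Attach right-hand variables to every left-hand row with the same SEQN.

    Rows in `left` with no match in `right` are kept unchanged.
    """
    right_by_id = {row["SEQN"]: row for row in right}
    merged = []
    for row in left:
        combined = dict(row)  # copy so the input rows are not modified
        combined.update(right_by_id.get(row["SEQN"], {}))
        merged.append(combined)
    return merged


merged = left_join_on_seqn(demographics, body_measures)
for row in merged:
    print(row)
```

A left join from the demographics rows is used here so that participants who lack a record in the second component stay in the merged data and can be handled explicitly, rather than disappearing silently.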

Restricted-Use Data

NHANES also has restricted-use data. Some variables or entire data files are not publicly released due to disclosure concerns, for example, geographic identifiers and some sensitive topics. These files are only available through the Research Data Center (RDC). Please review the Data Release and Access Policy for more information. The Limited Access Data component page for each survey cycle contains documentation, including a codebook with frequencies, to assist data users preparing proposals to use the Research Data Center.

Harmonized Public Data

A recent project from an interdisciplinary group of scholars harmonizes 30 years of NHANES data. The authors have

[D]eveloped a set of curated and unified datasets and accompanied code by merging 614 separate files and harmonizing unrestricted data across NHANES III (1988-1994) and Continuous (1999-2018), totaling 134,310 participants and 4,740 variables. The variables convey 1) demographic information, 2) dietary consumption, 3) physical examination results, 4) occupation, 5) questionnaire items (e.g., physical activity, general health status, medical conditions), 6) medications, 7) mortality status linked from the National Death Index, 8) survey weights, 9) environmental exposure biomarker measurements, and 10) chemical comments that indicate which measurements are below or above the lower limit of detection. We also provide a data dictionary listing the variables and their descriptions to help researchers browse the data. We also provide R markdown files to show example codes on calculating summary statistics and running regression models.

The authors have created 10 separate data modules based on NHANES categories: 1) Mortality, 2) Demographics, 3) Questionnaire, 4) Dietary, 5) Medications, 6) Occupation, 7) Chemicals, 8) Comments, 9) Weights, and 10) Response.

The data modules, data dictionaries, and cleaning documentation are available for download here. The datasets are available in both CSV and RData formats.

In addition to the data and dictionaries, the authors provide R markdown files for getting started with analysis, including merging datasets together, accounting for sampling design, creating summary statistics, and running regression models.
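To illustrate why the sampling design matters, here is a small Python sketch of a weighted mean. It is separate from the authors' R markdown files and does not reproduce them. The column names bmi and exam_weight and all values are hypothetical; the real variable names should be taken from the data dictionary. The sketch covers only the point estimate. Valid standard errors also require the survey's strata and sampling-unit information, which this example does not handle.

```python
def weighted_mean(rows, value_key, weight_key):
    """Return the survey-weighted mean of `value_key`, skipping incomplete rows."""
    weighted_sum = 0.0
    total_weight = 0.0
    for row in rows:
        value = row.get(value_key)
        weight = row.get(weight_key)
        if value is None or weight is None:
            continue  # drop rows missing either the measurement or the weight
        weighted_sum += value * weight
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no rows with both a value and a positive total weight")
    return weighted_sum / total_weight


# Hypothetical merged rows: each weight is the number of people a participant represents.
rows = [
    {"bmi": 20.0, "exam_weight": 1000.0},
    {"bmi": 30.0, "exam_weight": 3000.0},
    {"bmi": None, "exam_weight": 2000.0},  # missing measurement, skipped
]

estimate = weighted_mean(rows, "bmi", "exam_weight")
unweighted = sum(r["bmi"] for r in rows if r["bmi"] is not None) / 2
print(estimate, unweighted)
```

In this toy data the weighted estimate differs from the simple average because the second participant represents three times as many people as the first.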

Sample Code

The provided sample code uses the harmonized datasets to demonstrate merging modules, running models, and calculating summary statistics. The code can be found here.