Automated identification of implausible values in growth data from pediatric electronic health records

Carrie Daymont, Michelle E. Ross, A. Russell Localio, Alexander G. Fiks, Richard C. Wasserman, Robert W. Grundmeier

Research output: Contribution to journal › Article

9 Citations (Scopus)

Abstract

Objective: Large electronic health record (EHR) datasets are increasingly used to facilitate research on growth, but measurement and recording errors can lead to biased results. We developed and tested an automated method for identifying implausible values in pediatric EHR growth data. Materials and Methods: Using deidentified data from 46 primary care sites, we developed an algorithm to identify weight and height values that should be excluded from analysis, including implausible values and values that were recorded repeatedly without remeasurement. The foundation of the algorithm is a comparison of each measurement, expressed as a standard deviation score, with a weighted moving average of a child's other measurements. We evaluated the performance of the algorithm by (1) comparing its results with the judgment of physician reviewers for a stratified random selection of 400 measurements and (2) evaluating its accuracy in a dataset with simulated errors. Results: Of 2 000 595 growth measurements from 280 610 patients 1 to 21 years old, 3.8% of weight and 4.5% of height values were identified as implausible or excluded for other reasons. The proportion excluded varied widely by primary care site. The automated method had a sensitivity of 97% (95% confidence interval [CI], 94-99%) and a specificity of 90% (95% CI, 85-94%) for identifying implausible values compared to physician judgment, and identified 95% (weight) and 98% (height) of simulated errors. Discussion and Conclusion: This automated, flexible, and validated method for preparing large datasets will facilitate the use of pediatric EHR growth datasets for research.
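The core comparison described in the abstract (each measurement's standard deviation score against a weighted moving average of the same child's other measurements) can be illustrated with a minimal Python sketch. This is not the authors' published algorithm: the `Measurement` class, the inverse-age-gap weighting, the `power` parameter, and the `threshold` of 3 SD units are all hypothetical choices made only to show the general shape of such a check.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Measurement:
    age_days: float   # child's age at the time of measurement
    sd_score: float   # measurement expressed as a standard deviation score


def weighted_average_of_others(
    measurements: List[Measurement], index: int, power: float = 1.5
) -> Optional[float]:
    """Weighted average of the SD scores of all measurements except `index`.

    Assumption: measurements closer in age get more weight, using the
    hypothetical weight 1 / (1 + age gap in days) ** power.
    """
    target = measurements[index]
    weighted_sum = 0.0
    weight_total = 0.0
    for j, other in enumerate(measurements):
        if j == index:
            continue  # compare against the child's *other* measurements only
        gap = abs(other.age_days - target.age_days)
        weight = 1.0 / (1.0 + gap) ** power
        weighted_sum += weight * other.sd_score
        weight_total += weight
    if weight_total == 0.0:
        return None  # no other measurements to compare against
    return weighted_sum / weight_total


def flag_implausible(
    measurements: List[Measurement], threshold: float = 3.0, power: float = 1.5
) -> List[bool]:
    """Flag measurements whose SD score deviates from the weighted average
    of the child's other measurements by more than `threshold` (assumed)."""
    flags = []
    for i, m in enumerate(measurements):
        avg = weighted_average_of_others(measurements, i, power)
        flags.append(avg is not None and abs(m.sd_score - avg) > threshold)
    return flags


# Hypothetical series for one child; the third value is far from the rest.
series = [
    Measurement(365, 0.10),
    Measurement(400, 0.20),
    Measurement(430, 5.00),
    Measurement(500, 0.00),
    Measurement(600, 0.15),
]
flags = flag_implausible(series)
```

In this sketch a child with a single measurement is never flagged, because there is nothing to compare against. A real pipeline would also need the handling for repeated (carried-forward) values that the abstract mentions, which this example does not attempt.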

Original language: English (US)
Pages (from-to): 1080-1087
Number of pages: 8
Journal: Journal of the American Medical Informatics Association
Volume: 24
Issue number: 6
DOIs: 10.1093/jamia/ocx037
State: Published - Nov 1 2017

Fingerprint

Electronic Health Records
Pediatrics
Growth
Weights and Measures
Primary Health Care
Confidence Intervals
Physicians
Research
Datasets

All Science Journal Classification (ASJC) codes

  • Health Informatics

Cite this

Daymont, Carrie ; Ross, Michelle E. ; Localio, A. Russell ; Fiks, Alexander G. ; Wasserman, Richard C. ; Grundmeier, Robert W. / Automated identification of implausible values in growth data from pediatric electronic health records. In: Journal of the American Medical Informatics Association. 2017 ; Vol. 24, No. 6. pp. 1080-1087.
@article{db58a03dd96a4c75987a912c174e0d95,
title = "Automated identification of implausible values in growth data from pediatric electronic health records",
abstract = "Objective: Large electronic health record (EHR) datasets are increasingly used to facilitate research on growth, but measurement and recording errors can lead to biased results. We developed and tested an automated method for identifying implausible values in pediatric EHR growth data. Materials and Methods: Using deidentified data from 46 primary care sites, we developed an algorithm to identify weight and height values that should be excluded from analysis, including implausible values and values that were recorded repeatedly without remeasurement. The foundation of the algorithm is a comparison of each measurement, expressed as a standard deviation score, with a weighted moving average of a child's other measurements. We evaluated the performance of the algorithm by (1) comparing its results with the judgment of physician reviewers for a stratified random selection of 400 measurements and (2) evaluating its accuracy in a dataset with simulated errors. Results: Of 2 000 595 growth measurements from 280 610 patients 1 to 21 years old, 3.8{\%} of weight and 4.5{\%} of height values were identified as implausible or excluded for other reasons. The proportion excluded varied widely by primary care site. The automated method had a sensitivity of 97{\%} (95{\%} confidence interval [CI], 94-99{\%}) and a specificity of 90{\%} (95{\%} CI, 85-94{\%}) for identifying implausible values compared to physician judgment, and identified 95{\%} (weight) and 98{\%} (height) of simulated errors. Discussion and Conclusion: This automated, flexible, and validated method for preparing large datasets will facilitate the use of pediatric EHR growth datasets for research.",
author = "Daymont, Carrie and Ross, {Michelle E.} and Localio, {A. Russell} and Fiks, {Alexander G.} and Wasserman, {Richard C.} and Grundmeier, {Robert W.}",
year = "2017",
month = "11",
day = "1",
doi = "10.1093/jamia/ocx037",
language = "English (US)",
volume = "24",
pages = "1080--1087",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "6",

}

Automated identification of implausible values in growth data from pediatric electronic health records. / Daymont, Carrie; Ross, Michelle E.; Localio, A. Russell; Fiks, Alexander G.; Wasserman, Richard C.; Grundmeier, Robert W.

In: Journal of the American Medical Informatics Association, Vol. 24, No. 6, 01.11.2017, p. 1080-1087.


TY - JOUR

T1 - Automated identification of implausible values in growth data from pediatric electronic health records

AU - Daymont, Carrie

AU - Ross, Michelle E.

AU - Localio, A. Russell

AU - Fiks, Alexander G.

AU - Wasserman, Richard C.

AU - Grundmeier, Robert W.

PY - 2017/11/1

Y1 - 2017/11/1

N2 - Objective: Large electronic health record (EHR) datasets are increasingly used to facilitate research on growth, but measurement and recording errors can lead to biased results. We developed and tested an automated method for identifying implausible values in pediatric EHR growth data. Materials and Methods: Using deidentified data from 46 primary care sites, we developed an algorithm to identify weight and height values that should be excluded from analysis, including implausible values and values that were recorded repeatedly without remeasurement. The foundation of the algorithm is a comparison of each measurement, expressed as a standard deviation score, with a weighted moving average of a child's other measurements. We evaluated the performance of the algorithm by (1) comparing its results with the judgment of physician reviewers for a stratified random selection of 400 measurements and (2) evaluating its accuracy in a dataset with simulated errors. Results: Of 2 000 595 growth measurements from 280 610 patients 1 to 21 years old, 3.8% of weight and 4.5% of height values were identified as implausible or excluded for other reasons. The proportion excluded varied widely by primary care site. The automated method had a sensitivity of 97% (95% confidence interval [CI], 94-99%) and a specificity of 90% (95% CI, 85-94%) for identifying implausible values compared to physician judgment, and identified 95% (weight) and 98% (height) of simulated errors. Discussion and Conclusion: This automated, flexible, and validated method for preparing large datasets will facilitate the use of pediatric EHR growth datasets for research.

AB - Objective: Large electronic health record (EHR) datasets are increasingly used to facilitate research on growth, but measurement and recording errors can lead to biased results. We developed and tested an automated method for identifying implausible values in pediatric EHR growth data. Materials and Methods: Using deidentified data from 46 primary care sites, we developed an algorithm to identify weight and height values that should be excluded from analysis, including implausible values and values that were recorded repeatedly without remeasurement. The foundation of the algorithm is a comparison of each measurement, expressed as a standard deviation score, with a weighted moving average of a child's other measurements. We evaluated the performance of the algorithm by (1) comparing its results with the judgment of physician reviewers for a stratified random selection of 400 measurements and (2) evaluating its accuracy in a dataset with simulated errors. Results: Of 2 000 595 growth measurements from 280 610 patients 1 to 21 years old, 3.8% of weight and 4.5% of height values were identified as implausible or excluded for other reasons. The proportion excluded varied widely by primary care site. The automated method had a sensitivity of 97% (95% confidence interval [CI], 94-99%) and a specificity of 90% (95% CI, 85-94%) for identifying implausible values compared to physician judgment, and identified 95% (weight) and 98% (height) of simulated errors. Discussion and Conclusion: This automated, flexible, and validated method for preparing large datasets will facilitate the use of pediatric EHR growth datasets for research.

UR - http://www.scopus.com/inward/record.url?scp=85032943891&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85032943891&partnerID=8YFLogxK

U2 - 10.1093/jamia/ocx037

DO - 10.1093/jamia/ocx037

M3 - Article

C2 - 28453637

AN - SCOPUS:85032943891

VL - 24

SP - 1080

EP - 1087

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 6

ER -