Examining the impacts of crash data aggregation on SPF estimation

Agnimitra Sengupta, Vikash Varun Gayah, Eric T. Donnell

Research output: Contribution to journal › Article › peer-review

1 Scopus citation


The American Association of State Highway and Transportation Officials’ Highway Safety Manual (HSM) includes a collection of safety performance functions (SPFs), statistical models that estimate the expected crash frequency of roadway segments, intersections, and interchanges. These models are applied in several steps of the safety management process, including screening the road network for opportunities to improve safety and evaluating the performance of safety countermeasure deployments. The SPFs in the HSM are generally estimated using negative binomial regression. In some instances, they are estimated using annual crash frequency and site-specific (e.g., traffic volume) data, while in other instances they are estimated using aggregate crash frequency and site-specific data. This paper explores the differences that result from estimating SPFs with aggregate versus disaggregate data, applying the same methods used to estimate the SPFs in the HSM. A synthetic dataset, generated in a manner consistent with the properties of the negative binomial distribution, was first used to conduct these comparisons. Then, an observational dataset from Pennsylvania was used to compare the SPFs from both aggregate and disaggregate data. The results show that SPFs estimated using the panel (disaggregate) data and the aggregated data provide similar model coefficients, although some differences may arise. However, the overdispersion parameter obtained using each dataset can differ significantly. These differences result in systematic biases in calculations of expected crash frequency when Empirical Bayes adjustments are applied, which, as the paper demonstrates, could lead to different outcomes in a network screening exercise. Overall, these results reveal that aggregating crash data might result in biased SPF outputs and lead to inconsistent Empirical Bayes adjustments.
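
To see why a different overdispersion parameter matters for the Empirical Bayes adjustment, a minimal sketch may help. It assumes the standard HSM weighting form, in which the weight on the SPF prediction is w = 1 / (1 + k × N_predicted), with k the overdispersion parameter and N_predicted the SPF-predicted crash count over the study period. The function name and the numbers below are hypothetical illustrations, not values or code from the paper.

```python
def eb_expected_crashes(predicted: float, observed: float, k: float) -> float:
    """Empirical Bayes estimate of expected crash frequency for one site.

    predicted: SPF-predicted crashes over the study period (assumed input)
    observed:  crashes actually observed over the same period
    k:         overdispersion parameter of the negative binomial SPF
    """
    # Assumed HSM-style weight: a larger k places less trust in the SPF prediction.
    weight = 1.0 / (1.0 + k * predicted)
    return weight * predicted + (1.0 - weight) * observed


# Hypothetical site: the SPF predicts 4 crashes, 10 were observed.
# The two k values stand in for overdispersion estimates from two
# different datasets (e.g., disaggregate vs. aggregate); they are made up.
estimate_small_k = eb_expected_crashes(predicted=4.0, observed=10.0, k=0.25)
estimate_large_k = eb_expected_crashes(predicted=4.0, observed=10.0, k=1.0)
print(estimate_small_k, estimate_large_k)
```

With identical predicted and observed counts, the two overdispersion values give different Empirical Bayes estimates, so the same site could be ranked differently in a network screening exercise even when the model coefficients agree.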

Original language: English (US)
Article number: 106313
Journal: Accident Analysis and Prevention
State: Published - Sep 2021

All Science Journal Classification (ASJC) codes

  • Human Factors and Ergonomics
  • Safety, Risk, Reliability and Quality
  • Public Health, Environmental and Occupational Health

