Evaluating the predictive power of an SPF for two-lane rural roads with random parameters on out-of-sample observations

Research output: Contribution to journal, Article

Abstract

Negative binomial (NB) regression is among the most common statistical methods for modeling crash frequencies due to its simple functional form and ability to handle the over-dispersion commonly found in crash data. However, a drawback of this approach is that regression parameters are assumed to be the same across observations, which could contribute to biased parameter estimates. To alleviate this concern, the random parameters negative binomial (RPNB) model was recently proposed, which allows regression parameters to differ across observations following some known distribution. The resulting coefficients should be less biased, and thus the RPNB approach is believed to provide a more accurate relationship between independent variables and expected crash frequency. However, the prediction accuracy of the RPNB model relative to the standard NB model has not been thoroughly evaluated, particularly with respect to out-of-sample observations for which unique random parameters cannot be estimated. In this paper, the predictive power of the RPNB and NB models is examined using two-lane rural highway data from three engineering Districts in Pennsylvania. Multiple evaluation metrics are applied to assess each model type: root-mean-square error (RMSE), mean absolute error (MAE), coefficients from calibration functions, and cumulative residual (CURE) plots. The results show that the RPNB model outperforms the NB model when applied to within-sample observations (i.e., those used to estimate the model) by making use of the observation-specific coefficients. However, the predictions of the RPNB model appear to be similar to, or slightly less precise than, those of the traditional NB model when applied to out-of-sample observations. Since the RPNB model is estimated using a simulation-based approach, sensitivity tests were also performed to see how the parameter estimates change with the number of Halton draws used to perform the simulation. For the sample sizes used in this paper, the estimates were fairly insensitive when more than 50 Halton draws were used. The findings suggest that the RPNB model is more reliable when applied to the same set of sites that were used to estimate the model but might not be as robust as the traditional NB model when applied to other sites.
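
As a rough illustration of three of the evaluation metrics named in the abstract, the following Python sketch computes RMSE, MAE, and the running sum of residuals that underlies a CURE plot. The function names, the sample crash counts, and the choice of traffic volume as the ordering variable are assumptions made for illustration; none of them are taken from the paper.

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted crash counts."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error between observed and predicted crash counts."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def cumulative_residuals(observed, predicted, sort_key):
    """Running sum of residuals (observed - predicted), ordered by sort_key.

    Plotting the second element of each pair against the first gives the
    curve drawn in a CURE plot.
    """
    order = sorted(range(len(observed)), key=lambda i: sort_key[i])
    running = 0.0
    curve = []
    for i in order:
        running += observed[i] - predicted[i]
        curve.append((sort_key[i], running))
    return curve


# Hypothetical values, chosen only to exercise the functions.
observed = [0, 2, 1, 3]               # observed crashes per segment
predicted = [0.5, 1.5, 1.0, 2.0]      # model-predicted crashes per segment
traffic_volume = [1200, 3400, 800, 5100]  # assumed ordering variable

print(rmse(observed, predicted))
print(mae(observed, predicted))
print(cumulative_residuals(observed, predicted, traffic_volume))
```

The same functions could be applied separately to the estimation sample and to a held-out sample to compare within-sample and out-of-sample performance in the manner the abstract describes.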

Original language: English (US)
Article number: 105275
Journal: Accident Analysis and Prevention
Volume: 132
DOI: 10.1016/j.aap.2019.105275
State: Published - Nov 1 2019

Fingerprint

Rural roads
Statistical Models
road
regression
Mean square error
Sample Size
Calibration
simulation

All Science Journal Classification (ASJC) codes

  • Human Factors and Ergonomics
  • Safety, Risk, Reliability and Quality
  • Public Health, Environmental and Occupational Health

Cite this

@article{9c6acce54cb84b0189434123b4de5d0f,
title = "Evaluating the predictive power of an SPF for two-lane rural roads with random parameters on out-of-sample observations",
abstract = "Negative binomial (NB) regression is among the most common statistical methods for modeling crash frequencies due to its simple functional form and ability to handle the over-dispersion commonly found in crash data. However, a drawback of this approach is that regression parameters are assumed to be the same across observations, which could contribute to biased parameter estimates. To alleviate this concern, the random parameters negative binomial (RPNB) model was recently proposed, which allows regression parameters to differ across observations following some known distribution. The resulting coefficients should be less biased, and thus the RPNB approach is believed to provide a more accurate relationship between independent variables and expected crash frequency. However, the prediction accuracy of the RPNB model relative to the standard NB model has not been thoroughly evaluated, particularly with respect to out-of-sample observations for which unique random parameters cannot be estimated. In this paper, the predictive power of the RPNB and NB models is examined using two-lane rural highway data from three engineering Districts in Pennsylvania. Multiple evaluation metrics are applied to assess each model type: root-mean-square error (RMSE), mean absolute error (MAE), coefficients from calibration functions, and cumulative residual (CURE) plots. The results show that the RPNB model outperforms the NB model when applied to within-sample observations (i.e., those used to estimate the model) by making use of the observation-specific coefficients. However, the predictions of the RPNB model appear to be similar to, or slightly less precise than, those of the traditional NB model when applied to out-of-sample observations. Since the RPNB model is estimated using a simulation-based approach, sensitivity tests were also performed to see how the parameter estimates change with the number of Halton draws used to perform the simulation. For the sample sizes used in this paper, the estimates were fairly insensitive when more than 50 Halton draws were used. The findings suggest that the RPNB model is more reliable when applied to the same set of sites that were used to estimate the model but might not be as robust as the traditional NB model when applied to other sites.",
author = "Tang, Houjun and Gayah, Vikash Varun and Donnell, Eric Todd",
year = "2019",
month = "11",
day = "1",
doi = "10.1016/j.aap.2019.105275",
language = "English (US)",
volume = "132",
journal = "Accident Analysis and Prevention",
issn = "0001-4575",
publisher = "Elsevier Limited",

}

TY - JOUR

T1 - Evaluating the predictive power of an SPF for two-lane rural roads with random parameters on out-of-sample observations

AU - Tang, Houjun

AU - Gayah, Vikash Varun

AU - Donnell, Eric Todd

PY - 2019/11/1

Y1 - 2019/11/1

N2 - Negative binomial (NB) regression is among the most common statistical methods for modeling crash frequencies due to its simple functional form and ability to handle the over-dispersion commonly found in crash data. However, a drawback of this approach is that regression parameters are assumed to be the same across observations, which could contribute to biased parameter estimates. To alleviate this concern, the random parameters negative binomial (RPNB) model was recently proposed, which allows regression parameters to differ across observations following some known distribution. The resulting coefficients should be less biased, and thus the RPNB approach is believed to provide a more accurate relationship between independent variables and expected crash frequency. However, the prediction accuracy of the RPNB model relative to the standard NB model has not been thoroughly evaluated, particularly with respect to out-of-sample observations for which unique random parameters cannot be estimated. In this paper, the predictive power of the RPNB and NB models is examined using two-lane rural highway data from three engineering Districts in Pennsylvania. Multiple evaluation metrics are applied to assess each model type: root-mean-square error (RMSE), mean absolute error (MAE), coefficients from calibration functions, and cumulative residual (CURE) plots. The results show that the RPNB model outperforms the NB model when applied to within-sample observations (i.e., those used to estimate the model) by making use of the observation-specific coefficients. However, the predictions of the RPNB model appear to be similar to, or slightly less precise than, those of the traditional NB model when applied to out-of-sample observations. Since the RPNB model is estimated using a simulation-based approach, sensitivity tests were also performed to see how the parameter estimates change with the number of Halton draws used to perform the simulation. For the sample sizes used in this paper, the estimates were fairly insensitive when more than 50 Halton draws were used. The findings suggest that the RPNB model is more reliable when applied to the same set of sites that were used to estimate the model but might not be as robust as the traditional NB model when applied to other sites.

UR - http://www.scopus.com/inward/record.url?scp=85071165867&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071165867&partnerID=8YFLogxK

U2 - 10.1016/j.aap.2019.105275

DO - 10.1016/j.aap.2019.105275

M3 - Article

C2 - 31465933

AN - SCOPUS:85071165867

VL - 132

JO - Accident Analysis and Prevention

JF - Accident Analysis and Prevention

SN - 0001-4575

M1 - 105275

ER -