### Abstract

Count regression models have been widely applied in traffic safety research to estimate expected crash frequencies on road segments. Data mining algorithms, such as classification and regression trees, have recently been introduced into the field to overcome some of the assumptions associated with statistical models. However, these data-driven algorithms usually provide non-parametric output, making it difficult to draw statistical inference or to evaluate how independent variables are associated with expected crash frequencies. In this paper, the model-based recursive partitioning (MOB) algorithm is applied to crash frequency analysis. The algorithm borrows the recursive data partitioning used in tree models but produces user-specified statistical models as outputs. The objective of this paper is to explore the potential of the MOB algorithm as a methodological alternative to parametric modeling methods in crash frequency analysis. To accomplish this objective, four types of models are compared using 8 years of data collected from two-lane rural highways in Pennsylvania: a standard negative binomial (NB) regression model, an NB model developed using the MOB algorithm (MOB-NB), adjusted NB models that incorporate variables identified by the MOB algorithm, and a random parameters NB (RPNB) model. The models are compared in terms of goodness of fit, the sign and magnitude of the statistical associations between the independent and dependent variables, and predictive power. The results show that the MOB-NB model fits the data better than the other NB models and performs similarly to the RPNB model, suggesting that the MOB-NB model may capture unobserved heterogeneity by dividing the data into subgroups. The presence of a passing zone and the posted speed limit are two covariates identified by the MOB algorithm that differentiate variable effects among subgroups.
In addition, the MOB-NB model provides the highest prediction accuracy on both the training and test data sets, although the differences among models are small. The comparison results indicate that the MOB algorithm is a promising alternative for identifying covariates, evaluating variable associations and their instability, and making predictions in a crash frequency context.
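The subgroup idea can be illustrated with a minimal Python sketch, offered under stated assumptions rather than as the authors' implementation or the MOB algorithm itself. Each node is an intercept-only NB2 model (the mean is the sample mean, and the dispersion α is found by golden-section search on log α), and a candidate partitioning variable is judged by comparing the AIC of one pooled model with that of separate per-subgroup models. MOB as generally described instead fits full regression models in each node and selects partitioning variables with parameter-instability tests; the AIC comparison here is a deliberate simplification. The helper names (`nb2_loglik`, `fit_intercept_nb`, `compare_split`) and the example counts are hypothetical.

```python
import math

# Hypothetical helpers: a simplified illustration, not the authors' code
# and not the full MOB algorithm (no covariates, no instability tests).

def nb2_loglik(y, mu, alpha):
    """Log-likelihood of counts y under NB2 with mean mu, Var = mu + alpha*mu**2."""
    r = 1.0 / alpha  # NB "size" parameter
    return sum(
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu))
        for k in y
    )

def fit_intercept_nb(y, lo=-10.0, hi=5.0, iters=80):
    """Intercept-only NB2 fit: returns (mu, alpha, log-likelihood).

    With a single common mean, the MLE of mu is the sample mean (assumed > 0);
    alpha is found by golden-section search on log(alpha) in [lo, hi].
    """
    mu = sum(y) / len(y)
    f = lambda la: nb2_loglik(y, mu, math.exp(la))
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(iters):
        if fc > fd:  # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - g * (b - a)
            fc = f(c)
        else:        # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + g * (b - a)
            fd = f(d)
    la = (a + b) / 2
    return mu, math.exp(la), f(la)

def aic(loglik, n_params):
    return 2 * n_params - 2 * loglik

def compare_split(y, z):
    """AIC of one pooled NB fit vs. separate NB fits for each level of z."""
    _, _, ll_pooled = fit_intercept_nb(y)
    groups = {}
    for yi, zi in zip(y, z):
        groups.setdefault(zi, []).append(yi)
    ll_split = sum(fit_intercept_nb(ys)[2] for ys in groups.values())
    # Each intercept-only NB model has two parameters: mu and alpha.
    return aic(ll_pooled, 2), aic(ll_split, 2 * len(groups))

# Usage with hypothetical segment crash counts and a hypothetical
# binary partitioning variable (e.g. passing zone present or not).
y = [0, 1, 0, 2, 1, 0, 1, 0, 3, 5, 2, 6, 4, 7, 3, 5]
z = [0] * 8 + [1] * 8
pooled_aic, split_aic = compare_split(y, z)
print(pooled_aic, split_aic)
```

A lower AIC for the split model suggests that the subgroups defined by `z` differ in their crash frequency distributions. In a full MOB-style analysis, each node model would also include covariates, and the decision to split would rest on tests of parameter instability across the partitioning variable.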

Original language | English (US) |
---|---|
Article number | 105274 |
Journal | Accident Analysis and Prevention |
Volume | 132 |
DOI | 10.1016/j.aap.2019.105274 |
State | Published - Nov 2019 |

### All Science Journal Classification (ASJC) codes

- Human Factors and Ergonomics
- Safety, Risk, Reliability and Quality
- Public Health, Environmental and Occupational Health

### Cite this

**Application of a model-based recursive partitioning algorithm to predict crash frequency.** / Tang, Houjun; Donnell, Eric T.

Research output: Contribution to journal › Article

TY - JOUR

T1 - Application of a model-based recursive partitioning algorithm to predict crash frequency

AU - Tang, Houjun

AU - Donnell, Eric T.

PY - 2019/11

Y1 - 2019/11

UR - http://www.scopus.com/inward/record.url?scp=85070905446&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85070905446&partnerID=8YFLogxK

U2 - 10.1016/j.aap.2019.105274

DO - 10.1016/j.aap.2019.105274

M3 - Article

C2 - 31446099

AN - SCOPUS:85070905446

VL - 132

JO - Accident Analysis and Prevention

JF - Accident Analysis and Prevention

SN - 0001-4575

M1 - 105274

ER -