Count regression models have been applied widely in traffic safety research to estimate expected crash frequencies on road segments. Data mining algorithms, such as classification and regression trees, have recently been introduced into the field to overcome some of the assumptions associated with statistical models. However, these data-driven algorithms usually provide non-parametric output, making it difficult to draw statistical inferences or to evaluate how independent variables are associated with expected crash frequencies. In this paper, the model-based recursive partitioning (MOB) algorithm is applied to crash frequency analysis. The algorithm combines the recursive partitioning of data used in tree models with user-defined statistical models as outputs. The objective of this paper is to explore the potential of the MOB algorithm as a methodological alternative to parametric modeling methods in crash frequency analysis. To accomplish this objective, a standard negative binomial (NB) regression model, an NB model developed using the MOB algorithm (MOB-NB), adjusted NB models that incorporate variables identified by the MOB algorithm, and a random parameters NB (RPNB) model are compared using 8 years of data collected from two-lane rural highways in Pennsylvania. The models are compared in terms of goodness of fit, the sign and magnitude of the statistical associations between the independent and dependent variables, and predictive power. The results show that the MOB-NB model fits the data better than the other NB models and performs similarly to the RPNB model, suggesting that the MOB-NB model may be capturing unobserved heterogeneity by dividing the data into subgroups. The presence of a passing zone and the posted speed limit are the two covariates identified by the MOB algorithm that differentiate variable effects among subgroups. In addition, the MOB-NB model provides the highest prediction accuracy on both the training and test data sets, although the differences among models are small. The comparison results indicate that the MOB algorithm is a promising alternative for identifying covariates, evaluating variable associations and instability, and making predictions in a crash frequency context.
All Science Journal Classification (ASJC) codes
- Human Factors and Ergonomics
- Safety, Risk, Reliability and Quality
- Public Health, Environmental and Occupational Health