TY - GEN
T1 - Learning to Describe Player Form in the MLB
AU - Heaton, Connor
AU - Mitra, Prasenjit
N1 - Publisher Copyright: © 2022, The Author(s), under exclusive license to Springer Nature Switzerland AG.
PY - 2022
Y1 - 2022
N2 - Major League Baseball (MLB) has a storied history of using statistics to better understand and discuss the game of baseball, with an entire discipline of statistics dedicated to the craft, known as sabermetrics. At their core, all sabermetrics seek to quantify some aspect of the game, often a specific aspect of a player’s skill set, such as a batter’s ability to drive in runs (RBI) or a pitcher’s ability to keep batters from reaching base (WHIP). While useful, such statistics are fundamentally limited by the fact that they are derived from an account of what happened on the field, not how it happened. As a first step towards alleviating this shortcoming, we present a novel, contrastive learning-based framework for describing player form in the MLB. We use form to refer to the way in which a player has impacted the course of play in their recent appearances. Concretely, a player’s form is described by a 72-dimensional vector. By comparing clusters of players resulting from our form representations and those resulting from traditional sabermetrics, we demonstrate that our form representations contain information about how players impact the course of play that is not present in traditional, publicly available statistics. We believe these embeddings could be utilized to predict both in-game and game-level events, such as the result of an at-bat or the winner of a game.
AB - Major League Baseball (MLB) has a storied history of using statistics to better understand and discuss the game of baseball, with an entire discipline of statistics dedicated to the craft, known as sabermetrics. At their core, all sabermetrics seek to quantify some aspect of the game, often a specific aspect of a player’s skill set, such as a batter’s ability to drive in runs (RBI) or a pitcher’s ability to keep batters from reaching base (WHIP). While useful, such statistics are fundamentally limited by the fact that they are derived from an account of what happened on the field, not how it happened. As a first step towards alleviating this shortcoming, we present a novel, contrastive learning-based framework for describing player form in the MLB. We use form to refer to the way in which a player has impacted the course of play in their recent appearances. Concretely, a player’s form is described by a 72-dimensional vector. By comparing clusters of players resulting from our form representations and those resulting from traditional sabermetrics, we demonstrate that our form representations contain information about how players impact the course of play that is not present in traditional, publicly available statistics. We believe these embeddings could be utilized to predict both in-game and game-level events, such as the result of an at-bat or the winner of a game.
UR - http://www.scopus.com/inward/record.url?scp=85130233206&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85130233206&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-02044-5_8
DO - 10.1007/978-3-031-02044-5_8
M3 - Conference contribution
AN - SCOPUS:85130233206
SN - 9783031020438
T3 - Communications in Computer and Information Science
SP - 93
EP - 102
BT - Machine Learning and Data Mining for Sports Analytics - 8th International Workshop, MLSA 2021, Revised Selected Papers
A2 - Brefeld, Ulf
A2 - Davis, Jesse
A2 - Van Haaren, Jan
A2 - Zimmermann, Albrecht
PB - Springer Science and Business Media Deutschland GmbH
T2 - 8th International Workshop on Machine Learning and Data Mining for Sports Analytics, MLSA 2021
Y2 - 13 September 2021 through 13 September 2021
ER -