TY - GEN
T1 - Natural Language Processing for Theoretical Framework Selection in Engineering Education Research
AU - Berdanier, Catherine G.P.
AU - McComb, Christopher M.
AU - Zhu, Weiwei
N1 - Publisher Copyright: © 2020 IEEE.
PY - 2020/10/21
Y1 - 2020/10/21
N2 - This research paper presents recent work exploring the power of natural language processing (NLP) methods applied to qualitative engineering education data. As NLP and other machine learning methods are developed for qualitative data, it is important to prioritize the role that theory plays in rigorous qualitative research, where the selection of a theoretical framework serves as the lens by which the research project is framed, results are analyzed, and findings are brought to light. Indeed, the view from a different theoretical lens can highlight novel findings. We explore the viability of NLP methods for helping researchers select appropriate frameworks. We present our method for training a Python-based NLP algorithm to analyze an existing set of interview data through one theoretical lens: Community of Practice theory, an oft-used theory in the literature on graduate education, the topic of the interview corpus under investigation. We present and test two methods for developing dictionaries by which to train the algorithm: an expert-curated dictionary and a machine-generated dictionary compiled by mining the theoretical framework sections of published literature employing Community of Practice theory. We apply these two dictionaries to analyze a corpus of 54 interview transcripts investigating graduate engineering attrition. The high-dimensional data from NLP can be compared using Principal Component Analysis (PCA) visualization and pairwise distance plots to determine which method yields the more well-defined structure, indicating agreement between the dictionary and the corpus of interview transcripts. In the discussion, we highlight opportunities for using these automated methods to help researchers with qualitative data analysis and warn against potential dangers and ethical ramifications of using machine learning and NLP for social science data. This work will have an impact on the disciplinary communities working to embed computational language-based methods into engineering education research, and on the qualitative methods communities across social science and education disciplines.
UR - http://www.scopus.com/inward/record.url?scp=85098561036&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85098561036&partnerID=8YFLogxK
U2 - 10.1109/FIE44824.2020.9274115
DO - 10.1109/FIE44824.2020.9274115
M3 - Conference contribution
AN - SCOPUS:85098561036
T3 - Proceedings - Frontiers in Education Conference, FIE
BT - 2020 IEEE Frontiers in Education Conference, FIE 2020 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2020 IEEE Frontiers in Education Conference, FIE 2020
Y2 - 21 October 2020 through 24 October 2020
ER -