Assessing Scientific Practices Using Machine-Learning Methods: How Closely Do They Match Clinical Interview Performance?

Elizabeth P. Beggrow, Minsu Ha, Ross H. Nehm, Dennis Keith Pearl, William J. Boone

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

The landscape of science education is being transformed by the new Framework for Science Education (National Research Council, A framework for K-12 science education: practices, crosscutting concepts, and core ideas. The National Academies Press, Washington, DC, 2012), which emphasizes the centrality of scientific practices, such as explanation, argumentation, and communication, in science teaching, learning, and assessment. A major challenge facing the field of science education is developing assessment tools that are capable of validly and efficiently evaluating these practices. Our study examined the efficacy of a free, open-source machine-learning tool for evaluating the quality of students' written explanations of the causes of evolutionary change relative to three other approaches: (1) human-scored written explanations, (2) a multiple-choice test, and (3) clinical oral interviews. A large sample of undergraduates (n = 104) exposed to varying amounts of evolution content completed all three assessments: a clinical oral interview, a written open-response assessment, and a multiple-choice test. Rasch analysis was used to compute linear person measures and linear item measures on a single logit scale. We found that the multiple-choice test displayed poor person and item fit (mean square outfit > 1.3), while both oral interview measures and computer-generated written response measures exhibited acceptable fit (average mean square outfit for interview: person 0.97, item 0.97; computer: person 1.03, item 1.06). Multiple-choice test measures were more weakly associated with interview measures (r = 0.35) than were the computer-scored explanation measures (r = 0.63). Overall, Rasch analysis indicated that computer-scored written explanation measures (1) have the strongest correspondence to oral interview measures; (2) are capable of capturing students' normative scientific and naive ideas as accurately as human-scored explanations; and (3) more validly detect understanding than the multiple-choice assessment. These findings demonstrate the great potential of machine-learning tools for assessing key scientific practices highlighted in the new Framework for Science Education.
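
To make the approach described in the abstract more concrete, the sketch below (Python, using scikit-learn and SciPy) illustrates the general kind of pipeline involved: training a text classifier on human-scored written explanations and then correlating the resulting computer-generated scores with interview-based measures. It is a minimal illustrative sketch, not the open-source tool, rubric, or Rasch calibration used in the study; all example sentences, labels, and interview measures are hypothetical.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from scipy.stats import pearsonr

# Hypothetical training data: short written explanations of evolutionary change
# with human-assigned labels (1 = normative key concept present, 0 = naive idea).
train_texts = [
    "Individuals with the trait survive and reproduce more often",
    "The animals needed longer legs so their legs grew over time",
    "Heritable variation in the population is selected over generations",
    "The species changed because it wanted to adapt to its environment",
]
train_labels = [1, 0, 1, 0]

# A common baseline for automated scoring of short constructed responses:
# TF-IDF features plus a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(train_texts, train_labels)

# Score new (hypothetical) student explanations; the predicted probability of
# the "key concept present" class serves as a crude computer-generated score.
new_texts = [
    "Mutations create heritable variation and natural selection acts on it",
    "The organism tried harder and passed the change to its offspring",
    "Differential survival and reproduction shift trait frequencies over time",
    "They adapted because they needed to in the new environment",
]
computer_scores = model.predict_proba(new_texts)[:, 1]

# Compare the computer-generated scores with (hypothetical) interview-based
# Rasch person measures in logits, analogous to the r = 0.63 association
# between computer-scored explanations and oral interviews reported above.
interview_measures = [1.4, -0.9, 1.1, -0.5]
r, p = pearsonr(computer_scores, interview_measures)
print(f"Pearson r = {r:.2f}")

Note that the study itself placed the interview, human-scored, computer-scored, and multiple-choice data on a common logit scale via Rasch analysis before comparing them; the sketch correlates raw classifier probabilities only for illustration.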

Original language: English (US)
Pages (from-to): 160-182
Number of pages: 23
Journal: Journal of Science Education and Technology
Volume: 23
Issue number: 1
DOI: 10.1007/s10956-013-9461-9
ISSN: 1059-0145
Publisher: Springer Netherlands
State: Published - Feb 1 2014


All Science Journal Classification (ASJC) codes

  • Education
  • Engineering (all)
