TY - JOUR
T1 - Statistical approach for automated weighting of datasets
T2 - Application to heat capacity data
AU - Zomorodpoosh, S.
AU - Bocklund, B.
AU - Obaied, A.
AU - Otis, R.
AU - Liu, Z. K.
AU - Roslyakova, I.
N1 - Funding Information:
S. Zomorodpoosh and I. Roslyakova acknowledge funding from the Collaborative Research Center “Superalloys Single Crystal” (SFB TR-103 project T2) of the German Research Foundation (DFG). B. Bocklund and Z.-K. Liu were supported by a NASA Space Technology Research Fellowship, USA, grant number 80NSSC18K116. A. Obaied acknowledges funding from IMPRS-SurMat, Germany. A part of the research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration.
Publisher Copyright:
© 2020 Elsevier Ltd
PY - 2020/12
Y1 - 2020/12
N2 - An essential step in CALPHAD is assigning relative weights to different datasets, but there is no consensus as to the best approach regarding this issue. Currently, such an assignment of weights for experimental or first-principles data is performed manually based on the knowledge and experience of the modeler. Since the existing manual treatment is subjective and time consuming, manipulation of such data is rapidly advancing toward automated procedures through statistical and data mining tools. In the present study, we propose an automated approach to determine the weight of datasets based on the K-Fold Cross-Validation method, modified under the conditions that each fold is selected non-randomly and contains an unequal number of observations. This approach can be considered for researchers as a support tool to evaluate the reliability of each dataset involved in the CALPHAD modeling and quantify the impact of weighting by statistical analysis of the corresponding model. We demonstrate the efficacy of this method through the evaluation of heat capacity data of fcc nickel, hcp magnesium, and bcc iron.
UR - http://www.scopus.com/inward/record.url?scp=85089347771&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85089347771&partnerID=8YFLogxK
U2 - 10.1016/j.calphad.2020.101994
DO - 10.1016/j.calphad.2020.101994
M3 - Article
AN - SCOPUS:85089347771
VL - 71
JO - Calphad: Computer Coupling of Phase Diagrams and Thermochemistry
JF - Calphad: Computer Coupling of Phase Diagrams and Thermochemistry
SN - 0364-5916
M1 - 101994
ER -