Applying classification techniques to remotely-collected program execution data

Murali Haran, Alan Karr, Alessandro Orso, Adam Porter, Ashish Sanil

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

37 Citations (Scopus)

Abstract

There is an increasing interest in techniques that support measurement and analysis of fielded software systems. One of the main goals of these techniques is to better understand how software actually behaves in the field. In particular, many of these techniques require a way to distinguish, in the field, failing from passing executions. So far, researchers and practitioners have only partially addressed this problem: they have simply assumed that program failure status is either obvious (i.e., the program crashes) or provided by an external source (e.g., the users). In this paper, we propose a technique for automatically classifying execution data, collected in the field, as coming from either passing or failing program runs. (Failing program runs are executions that terminate with a failure, such as a wrong outcome.) We use statistical learning algorithms to build the classification models. Our approach builds the models by analyzing executions performed in a controlled environment (e.g., test cases run in-house) and then uses the models to predict whether execution data produced by a fielded instance were generated by a passing or failing program execution. We also present results from an initial feasibility study, based on multiple versions of a software subject, in which we investigate several issues vital to the applicability of the technique. Finally, we present some lessons learned regarding the interplay between the reliability of classification models and the amount and type of data collected.
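To make the workflow described in the abstract concrete, below is a minimal, hypothetical sketch of the classification step in Python with scikit-learn. It assumes each execution is summarized as a feature vector of instrumented-entity counts (e.g., statement or branch execution counts), trains a classifier on labeled in-house runs, and applies it to execution data reported from the field. The feature layout, random data, and choice of classifier are illustrative assumptions only, not the authors' actual implementation.

# Minimal sketch (assumed setup, not the paper's implementation):
# classify fielded runs as passing (0) or failing (1) using a model
# trained on in-house executions with known outcomes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# In-house (controlled) executions: one row per run, one column per
# instrumented entity (hypothetical count data and labels).
in_house_data = rng.poisson(lam=3.0, size=(200, 50))
in_house_labels = rng.integers(0, 2, size=200)  # 1 = failing, 0 = passing

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(in_house_data, in_house_labels)

# Execution data collected from fielded instances (same feature layout).
field_data = rng.poisson(lam=3.0, size=(10, 50))
predicted = model.predict(field_data)  # predicted pass/fail status per run
print(predicted)

In practice, the reliability of such a model depends on how much execution data is collected per run and which program entities are instrumented, which is the trade-off the abstract's feasibility study examines.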

Original language: English (US)
Title of host publication: ESEC/FSE'05 - Proceedings of the Joint 10th European Software Engineering Conference (ESEC) and 13th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-13)
Editors: H. Gall
Pages: 146-155
Number of pages: 10
ISBN (Print): 1595930140, 9781595930149
State: Published - Dec 1 2005
Event: ESEC/FSE'05 - Joint 10th European Software Engineering Conference (ESEC) and 13th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-13) - Lisbon, Portugal
Duration: Sep 5 2005 - Sep 9 2005

Publication series

Name: ESEC/FSE'05 - Proceedings of the Joint 10th European Software Engineering Conference (ESEC) and 13th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-13)

Other

Other: ESEC/FSE'05 - Joint 10th European Software Engineering Conference (ESEC) and 13th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-13)
Country: Portugal
City: Lisbon
Period: 9/5/05 - 9/9/05

All Science Journal Classification (ASJC) codes

  • Engineering (all)

Cite this

Haran, M., Karr, A., Orso, A., Porter, A., & Sanil, A. (2005). Applying classification techniques to remotely-collected program execution data. In H. Gall (Ed.), ESEC/FSE'05 - Proceedings of the Joint 10th European Software Engineering Conference (ESEC) and 13th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-13) (pp. 146-155). (ESEC/FSE'05 - Proceedings of the Joint 10th European Software Engineering Conference (ESEC) and 13th ACM SIGSOFT Symposium on the Foundations of Software Engineering (FSE-13)).