A Bayesian Sampling Method for Product Feature Extraction from Large-Scale Textual Data

Sunghoon Lim, Conrad S. Tucker

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

The authors of this work propose an algorithm that determines optimal search keyword combinations for querying online product data sources in order to minimize identification errors during the product feature extraction process. Data-driven product design methodologies based on acquiring and mining online product-feature-related data are presented with two fundamental challenges: (1) determining optimal search keywords that result in relevant product-related data being returned and (2) determining how many search keywords are sufficient to minimize identification errors during the product feature extraction process. These challenges exist because online data, which is primarily textual in nature, may violate several statistical assumptions relating to the independence and identical distribution of samples relating to a query. Existing design methodologies have predetermined search terms that are used to acquire textual data online, which makes the resulting data acquired a function of the quality of the search term(s) themselves. Furthermore, the lack of independence and identical distribution of text data from online sources impacts the quality of the acquired data. For example, a designer may search for a product feature using the term "screen," which may return relevant results such as "the screen size is just perfect" but may also contain irrelevant noise such as "researchers should really screen for this type of error." A text mining algorithm is introduced to determine the optimal terms without labeled training data that would maximize the veracity of the data acquired to make a valid conclusion. A case study involving real-world smartphones is used to validate the proposed methodology.
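The "screen" example in the abstract can be made concrete with a small sketch. This is only a toy illustration of why a keyword combination can return less noise than a single keyword; it is not the Bayesian sampling method proposed in the paper. The function names `matches` and `query`, and the third sentence in the corpus, are hypothetical and chosen for illustration.

```python
# Toy illustration only: NOT the Bayesian sampling method from the paper.
# The first two sentences come from the abstract; the third is hypothetical.
corpus = [
    "the screen size is just perfect",
    "researchers should really screen for this type of error",
    "the battery drains quickly",  # hypothetical filler sentence
]


def matches(sentence, keywords):
    """Return True if every keyword occurs as a whole word in the sentence."""
    words = set(sentence.lower().split())
    return all(keyword in words for keyword in keywords)


def query(corpus, keywords):
    """Return the sentences that contain all of the given keywords."""
    return [sentence for sentence in corpus if matches(sentence, keywords)]


# A single keyword returns both the relevant sentence and the noise.
single = query(corpus, ["screen"])

# A keyword combination keeps only the product-related sentence.
combined = query(corpus, ["screen", "size"])

print(len(single), len(combined))
```

In this toy corpus the single keyword matches both the product-related sentence and the unrelated one, while the combination matches only the product-related sentence. Choosing which combinations to use, and how many, without labeled data is the problem the paper addresses.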

Original language: English (US)
Article number: 061403
Journal: Journal of Mechanical Design, Transactions of the ASME
Volume: 138
Issue number: 6
DOI: 10.1115/1.4033238
State: Published - Jun 1 2016

Fingerprint

Feature extraction
Sampling
Smartphones
Product design

All Science Journal Classification (ASJC) codes

  • Mechanics of Materials
  • Mechanical Engineering
  • Computer Science Applications
  • Computer Graphics and Computer-Aided Design

Cite this

@article{42e24c4bb89d4c9093f9915bfde501c2,
title = "A Bayesian Sampling Method for Product Feature Extraction from Large-Scale Textual Data",
abstract = "The authors of this work propose an algorithm that determines optimal search keyword combinations for querying online product data sources in order to minimize identification errors during the product feature extraction process. Data-driven product design methodologies based on acquiring and mining online product-feature-related data are presented with two fundamental challenges: (1) determining optimal search keywords that result in relevant product-related data being returned and (2) determining how many search keywords are sufficient to minimize identification errors during the product feature extraction process. These challenges exist because online data, which is primarily textual in nature, may violate several statistical assumptions relating to the independence and identical distribution of samples relating to a query. Existing design methodologies have predetermined search terms that are used to acquire textual data online, which makes the resulting data acquired a function of the quality of the search term(s) themselves. Furthermore, the lack of independence and identical distribution of text data from online sources impacts the quality of the acquired data. For example, a designer may search for a product feature using the term ``screen,'' which may return relevant results such as ``the screen size is just perfect'' but may also contain irrelevant noise such as ``researchers should really screen for this type of error.'' A text mining algorithm is introduced to determine the optimal terms without labeled training data that would maximize the veracity of the data acquired to make a valid conclusion. A case study involving real-world smartphones is used to validate the proposed methodology.",
author = "Lim, Sunghoon and Tucker, {Conrad S.}",
year = "2016",
month = "6",
day = "1",
doi = "10.1115/1.4033238",
language = "English (US)",
volume = "138",
journal = "Journal of Mechanical Design - Transactions of the ASME",
issn = "1050-0472",
publisher = "ASME",
number = "6",
}

A Bayesian Sampling Method for Product Feature Extraction from Large-Scale Textual Data. / Lim, Sunghoon; Tucker, Conrad S.

In: Journal of Mechanical Design, Transactions of the ASME, Vol. 138, No. 6, 061403, 01.06.2016.


TY - JOUR

T1 - A Bayesian Sampling Method for Product Feature Extraction from Large-Scale Textual Data

AU - Lim, Sunghoon

AU - Tucker, Conrad S.

PY - 2016/6/1

Y1 - 2016/6/1

AB - The authors of this work propose an algorithm that determines optimal search keyword combinations for querying online product data sources in order to minimize identification errors during the product feature extraction process. Data-driven product design methodologies based on acquiring and mining online product-feature-related data are presented with two fundamental challenges: (1) determining optimal search keywords that result in relevant product-related data being returned and (2) determining how many search keywords are sufficient to minimize identification errors during the product feature extraction process. These challenges exist because online data, which is primarily textual in nature, may violate several statistical assumptions relating to the independence and identical distribution of samples relating to a query. Existing design methodologies have predetermined search terms that are used to acquire textual data online, which makes the resulting data acquired a function of the quality of the search term(s) themselves. Furthermore, the lack of independence and identical distribution of text data from online sources impacts the quality of the acquired data. For example, a designer may search for a product feature using the term "screen," which may return relevant results such as "the screen size is just perfect" but may also contain irrelevant noise such as "researchers should really screen for this type of error." A text mining algorithm is introduced to determine the optimal terms without labeled training data that would maximize the veracity of the data acquired to make a valid conclusion. A case study involving real-world smartphones is used to validate the proposed methodology.

UR - http://www.scopus.com/inward/record.url?scp=84971513897&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84971513897&partnerID=8YFLogxK

U2 - 10.1115/1.4033238

DO - 10.1115/1.4033238

M3 - Article

AN - SCOPUS:84971513897

VL - 138

JO - Journal of Mechanical Design - Transactions of the ASME

JF - Journal of Mechanical Design - Transactions of the ASME

SN - 1050-0472

IS - 6

M1 - 061403

ER -