Text truth: An unsupervised approach to discover trustworthy information from multi-sourced text data

Hengtong Zhang, Yaliang Li, Fenglong Ma, Jing Gao, Lu Su

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

9 Citations (Scopus)

Abstract

Truth discovery has attracted increasing attention due to its ability to distill trustworthy information from noisy multi-sourced data without any supervision. However, most existing truth discovery methods are designed for structured data and cannot meet the strong need to extract trustworthy information from raw text data, because text data has its own unique characteristics. The major challenges of inferring true information from text data stem from the multifactorial property of text answers (i.e., an answer may contain multiple key factors) and the diversity of word usage (i.e., different words may have the same semantic meaning). To tackle these challenges, we propose a novel truth discovery method, named “TextTruth”, which jointly groups the keywords extracted from the answers to a specific question into multiple interpretable factors and infers the trustworthiness of both answer factors and answer providers. The answers to each question can then be ranked based on the estimated trustworthiness of their factors. The proposed method works in an unsupervised manner and can therefore be applied to various application scenarios that involve text data. Experiments on three real-world datasets show that the proposed TextTruth model can accurately select trustworthy answers, even when these answers are formed by multiple factors.
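
The abstract does not specify the model itself, so the following is only a toy sketch of the general idea it describes: keywords are mapped to factors, factor trustworthiness and provider reliability are estimated jointly, and answers are ranked by the trustworthiness of their factors. It is not the TextTruth algorithm. The data, the hand-made synonym map `factor_of` (standing in for the learned keyword grouping), the function name `rank_answers`, and the simple averaging updates are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical toy data: provider -> keywords of their answer to ONE question.
answers = {
    "u1": ["fever", "cough"],
    "u2": ["high temperature", "coughing"],
    "u3": ["fever", "rash"],
    "u4": ["headache"],
}

# Hypothetical synonym map standing in for a learned keyword-to-factor grouping.
factor_of = {
    "fever": "fever", "high temperature": "fever",
    "cough": "cough", "coughing": "cough",
    "rash": "rash", "headache": "headache",
}


def rank_answers(answers, factor_of, iterations=10):
    """Alternate between factor trust and provider reliability, then rank."""
    # Set of factors each provider's answer mentions (assumed non-empty).
    factors = {u: {factor_of[k] for k in kws} for u, kws in answers.items()}
    reliability = {u: 1.0 for u in answers}
    trust = {}
    for _ in range(iterations):
        # Factor trust: share of total provider reliability that backs it.
        total = sum(reliability.values())
        support = defaultdict(float)
        for u, fs in factors.items():
            for f in fs:
                support[f] += reliability[u]
        trust = {f: s / total for f, s in support.items()}
        # Provider reliability: mean trust of the factors they mention.
        reliability = {
            u: sum(trust[f] for f in fs) / len(fs) for u, fs in factors.items()
        }
    # Answer score: mean trust of its factors, highest first.
    scores = {u: sum(trust[f] for f in fs) / len(fs) for u, fs in factors.items()}
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, trust, reliability


ranking, trust, reliability = rank_answers(answers, factor_of)
print(ranking)
```

In this sketch, answers that use different words for the same factor ("fever" and "high temperature") reinforce each other, which is the effect the abstract attributes to grouping keywords into factors.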

Original language: English (US)
Title of host publication: KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Publisher: Association for Computing Machinery
Pages: 2729-2737
Number of pages: 9
ISBN (Print): 9781450355520
DOI: https://doi.org/10.1145/3219819.3219977
State: Published - Jul 19 2018
Event: 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018 - London, United Kingdom
Duration: Aug 19 2018 → Aug 23 2018

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

Other

Other: 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2018
Country: United Kingdom
City: London
Period: 8/19/18 → 8/23/18

Fingerprint

Semantics
Experiments

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems

Cite this

Zhang, H., Li, Y., Ma, F., Gao, J., & Su, L. (2018). Text truth: An unsupervised approach to discover trustworthy information from multi-sourced text data. In KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 2729-2737). (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining). Association for Computing Machinery. https://doi.org/10.1145/3219819.3219977
@inproceedings{8da5f22a539f48c48ee71b56722ea327,
title = "Text truth: An unsupervised approach to discover trustworthy information from multi-sourced text data",
author = "Hengtong Zhang and Yaliang Li and Fenglong Ma and Jing Gao and Lu Su",
year = "2018",
month = "7",
day = "19",
doi = "10.1145/3219819.3219977",
language = "English (US)",
isbn = "9781450355520",
series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
publisher = "Association for Computing Machinery",
pages = "2729--2737",
booktitle = "KDD 2018 - Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

}


