Estimation of discourse segmentation labels from crowd data

Ziheng Huang, Jialu Zhong, Rebecca Jane Passonneau

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

4 Citations (Scopus)

Abstract

For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations where we assume judgments are not independent, based on the observation that an annotator's segments all tend to be several utterances long. The data consists of crowd labels for annotation of discourse segment boundaries. The new methods extend Hidden Markov Models to relax the independence assumption. The two methods are distinct, so positive labels proposed by both are taken to be ground truth. In addition, results of the models are checked using metrics that test whether an annotator's accuracy relative to a given model remains consistent across different conversations.
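The majority-vote baseline and the final combination step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the 0/1 encoding of boundary labels, and the toy data are all assumptions, and the two "method" outputs are hard-coded stand-ins for the HMM-based models, which are not reproduced here.

```python
from typing import List


def majority_vote(crowd: List[List[int]]) -> List[int]:
    """Baseline: label an utterance as a boundary (1) when more than half
    of the annotators marked it. crowd[a][i] is annotator a's 0/1 label
    for utterance i (hypothetical encoding)."""
    n_annotators = len(crowd)
    n_items = len(crowd[0])
    return [
        1 if 2 * sum(row[i] for row in crowd) > n_annotators else 0
        for i in range(n_items)
    ]


def intersect_positive(labels_a: List[int], labels_b: List[int]) -> List[int]:
    """Keep a positive label only where both methods propose one."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    return [1 if a == 1 and b == 1 else 0 for a, b in zip(labels_a, labels_b)]


# Toy crowd data: 3 annotators, 6 utterances (invented for illustration).
crowd = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 1, 0, 1, 0],
]
baseline = majority_vote(crowd)

# Stand-ins for the outputs of the two sequential models (assumed values).
method_a = [0, 1, 0, 0, 1, 1]
method_b = [0, 1, 1, 0, 1, 0]
combined = intersect_positive(method_a, method_b)
```

Taking the intersection is a conservative choice: a boundary survives only when two different models agree on it, which trades some recall for precision.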

Original language: English (US)
Title of host publication: Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing
Publisher: Association for Computational Linguistics (ACL)
Pages: 2190-2200
Number of pages: 11
ISBN (Electronic): 9781941643327
State: Published - 2015
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015 - Lisbon, Portugal
Duration: Sep 17 2015 to Sep 21 2015

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2015
Country: Portugal
City: Lisbon
Period: 9/17/15 to 9/21/15

Fingerprint

Labels
Hidden Markov models

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Huang, Z., Zhong, J., & Passonneau, R. J. (2015). Estimation of discourse segmentation labels from crowd data. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing (pp. 2190-2200). Association for Computational Linguistics (ACL).
Huang, Ziheng ; Zhong, Jialu ; Passonneau, Rebecca Jane. / Estimation of discourse segmentation labels from crowd data. Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2015. pp. 2190-2200
@inproceedings{c760ee8768c74861993ca98fa3454d63,
title = "Estimation of discourse segmentation labels from crowd data",
abstract = "For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations where we assume judgments are not independent, based on the observation that an annotator's segments all tend to be several utterances long. The data consists of crowd labels for annotation of discourse segment boundaries. The new methods extend Hidden Markov Models to relax the independence assumption. The two methods are distinct, so positive labels proposed by both are taken to be ground truth. In addition, results of the models are checked using metrics that test whether an annotator's accuracy relative to a given model remains consistent across different conversations.",
author = "Ziheng Huang and Jialu Zhong and Passonneau, {Rebecca Jane}",
year = "2015",
language = "English (US)",
pages = "2190--2200",
booktitle = "Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing",
publisher = "Association for Computational Linguistics (ACL)",
}

Huang, Z, Zhong, J & Passonneau, RJ 2015, Estimation of discourse segmentation labels from crowd data. in Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), pp. 2190-2200, Conference on Empirical Methods in Natural Language Processing, EMNLP 2015, Lisbon, Portugal, 9/17/15.

Estimation of discourse segmentation labels from crowd data. / Huang, Ziheng; Zhong, Jialu; Passonneau, Rebecca Jane.

Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL), 2015. p. 2190-2200.

TY - GEN

T1 - Estimation of discourse segmentation labels from crowd data

AU - Huang, Ziheng

AU - Zhong, Jialu

AU - Passonneau, Rebecca Jane

PY - 2015

Y1 - 2015

N2 - For annotation tasks involving independent judgments, probabilistic models have been used to infer ground truth labels from data where a crowd of many annotators labels the same items. Such models have been shown to produce results superior to taking the majority vote, but have not been applied to sequential data. We present two methods to infer ground truth labels from sequential annotations where we assume judgments are not independent, based on the observation that an annotator's segments all tend to be several utterances long. The data consists of crowd labels for annotation of discourse segment boundaries. The new methods extend Hidden Markov Models to relax the independence assumption. The two methods are distinct, so positive labels proposed by both are taken to be ground truth. In addition, results of the models are checked using metrics that test whether an annotator's accuracy relative to a given model remains consistent across different conversations.

UR - http://www.scopus.com/inward/record.url?scp=84959922810&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959922810&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84959922810

SP - 2190

EP - 2200

BT - Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing

PB - Association for Computational Linguistics (ACL)

ER -

Huang Z, Zhong J, Passonneau RJ. Estimation of discourse segmentation labels from crowd data. In Conference Proceedings - EMNLP 2015: Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics (ACL). 2015. p. 2190-2200