Unambiguity regularization for unsupervised learning of probabilistic grammars

Kewei Tu, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

17 Scopus citations

Abstract

We introduce a novel approach named unambiguity regularization for unsupervised learning of probabilistic natural language grammars. The approach is based on the observation that natural language is remarkably unambiguous: only a tiny fraction of the many possible parses of a natural language sentence is syntactically valid. We incorporate an inductive bias into grammar learning that favors grammars yielding unambiguous parses of natural language sentences. The resulting family of algorithms includes the expectation-maximization algorithm (EM) and its variant, Viterbi EM, as well as a so-called softmax-EM algorithm. The softmax-EM algorithm can be implemented as a simple and computationally efficient extension of standard EM. In our experiments on unsupervised dependency grammar learning, we show that unambiguity regularization is beneficial to learning, and that in combination with annealing (of the regularization strength) and sparsity priors it leads to an improvement over the current state of the art.
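The abstract does not spell out the regularized E-step, so the following is a minimal sketch under a stated assumption, not the authors' implementation. It assumes that softmax-EM replaces the usual EM posterior over parses p(z|x) with a sharpened distribution q(z) ∝ p(z|x)^(1/(1-σ)) for a regularization strength 0 ≤ σ < 1, that σ = 0 recovers standard EM, and that σ ≥ 1 behaves like Viterbi EM. Because a parse's probability is a product of rule probabilities, the same sharpening can be obtained by raising every rule probability to 1/(1-σ) before the usual expected-count computation, which would explain why the extension is cheap. The toy grammar, the enumerated parses, and the helper names `parse_weight` and `softmax_em_posterior` are hypothetical; a real learner would use dynamic programming (e.g. inside-outside) instead of enumerating parses.

```python
import math
from typing import Dict, List

# Hypothetical toy grammar: rule name -> probability (not from the paper).
Theta = Dict[str, float]


def parse_weight(rules: List[str], theta: Theta, alpha: float = 1.0) -> float:
    """Product of rule probabilities, each raised to `alpha`.

    Since a parse's probability is a product of rule probabilities,
    this equals (parse probability) ** alpha.
    """
    return math.prod(theta[r] ** alpha for r in rules)


def softmax_em_posterior(joint: List[float], sigma: float) -> List[float]:
    """Assumed regularized E-step: q(z) proportional to p(z, x) ** (1 / (1 - sigma)).

    Normalizing over parses makes p(z, x) and p(z | x) interchangeable here.
    sigma = 0 gives the ordinary EM posterior; sigma >= 1 is treated as
    Viterbi EM, putting all mass on the most probable parse.
    Expects at least one parse with positive probability.
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    if sigma >= 1.0:
        best = max(range(len(joint)), key=lambda i: joint[i])
        return [1.0 if i == best else 0.0 for i in range(len(joint))]
    alpha = 1.0 / (1.0 - sigma)
    # Work in log space to avoid underflow when alpha is large (sigma near 1).
    logs = [alpha * math.log(p) if p > 0 else -math.inf for p in joint]
    top = max(logs)
    weights = [math.exp(l - top) for l in logs]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    theta = {"A": 0.5, "B": 0.3, "C": 0.2}
    parses = [["A", "B"], ["A", "C"], ["B", "C"]]  # toy parses of one sentence
    joint = [parse_weight(z, theta) for z in parses]
    for sigma in (0.0, 0.5, 0.9, 1.0):
        print(sigma, [round(q, 3) for q in softmax_em_posterior(joint, sigma)])
```

With σ = 0 the printed distribution is the ordinary EM posterior; as σ grows the mass concentrates on the most probable parse, which is the unambiguity bias described above. Annealing the regularization strength would correspond to changing σ across EM iterations; the schedule used in the paper is not given here.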

Original language: English (US)
Title of host publication: EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference
Pages: 1324-1334
Number of pages: 11
State: Published - Dec 1 2012
Event: 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012 - Jeju Island, Korea, Republic of
Duration: Jul 12 2012 to Jul 14 2012

Publication series

Name: EMNLP-CoNLL 2012 - 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, Proceedings of the Conference

Other

Other: 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, EMNLP-CoNLL 2012
Country: Korea, Republic of
City: Jeju Island
Period: 7/12/12 to 7/14/12

All Science Journal Classification (ASJC) codes

  • Software

