Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers

Jan Oliver Wallgrün, Frank Hardisty, Alan Maceachren, Morteza Karimzadeh, Yiting Ju, Scott Pezanowski

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Citations (Scopus)

Abstract

This article presents an approach to place reference corpus building and application of the approach to a Geo-Microblog Corpus that will foster research and development in the areas of microblog/twitter geoparsing and geographic information retrieval. Our corpus currently consists of 6000 tweets with identified and georeferenced place names. 30% of the tweets contain at least one place name. The corpus is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is developed to be generally applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and present an analysis of causes of disagreement between the lay persons performing place identification in our crowdsourcing approach.

Original language: English (US)
Title of host publication: Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014
Editors: Ross S. Purves, Christopher B. Jones
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450331357
DOI: https://doi.org/10.1145/2675354.2675701
State: Published - Nov 4 2014
Event: 8th Workshop on Geographic Information Retrieval, GIR 2014 - Dallas, United States
Duration: Nov 4 2014 to Nov 7 2014

Publication series

Name: Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014

Other

Other: 8th Workshop on Geographic Information Retrieval, GIR 2014
Country: United States
City: Dallas
Period: 11/4/14 to 11/7/14

Fingerprint

twitter
place name
evaluation
information retrieval
research and development
cause
human being
present
analysis

All Science Journal Classification (ASJC) codes

  • Geography, Planning and Development

Cite this

Wallgrün, J. O., Hardisty, F., Maceachren, A., Karimzadeh, M., Ju, Y., & Pezanowski, S. (2014). Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers. In R. S. Purves, & C. B. Jones (Eds.), Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014 [a4] (Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014). Association for Computing Machinery, Inc. https://doi.org/10.1145/2675354.2675701
@inproceedings{01755e655dac4581bdfd74bf33b82958,
title = "Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers",
abstract = "This article presents an approach to place reference corpus building and application of the approach to a Geo-Microblog Corpus that will foster research and development in the areas of microblog/twitter geoparsing and geographic information retrieval. Our corpus currently consists of 6000 tweets with identified and georeferenced place names. 30{\%} of the tweets contain at least one place name. The corpus is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is developed to be generally applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and present an analysis of causes of disagreement between the lay persons performing place identification in our crowdsourcing approach.",
author = "Wallgr{\"u}n, {Jan Oliver} and Frank Hardisty and Alan Maceachren and Morteza Karimzadeh and Yiting Ju and Scott Pezanowski",
year = "2014",
month = "11",
day = "4",
doi = "10.1145/2675354.2675701",
language = "English (US)",
series = "Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014",
publisher = "Association for Computing Machinery, Inc",
editor = "Purves, {Ross S.} and Jones, {Christopher B.}",
booktitle = "Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014",
}

TY - GEN
T1 - Construction and first analysis of a corpus for the evaluation and training of microblog/twitter geoparsers
AU - Wallgrün, Jan Oliver
AU - Hardisty, Frank
AU - Maceachren, Alan
AU - Karimzadeh, Morteza
AU - Ju, Yiting
AU - Pezanowski, Scott
PY - 2014/11/4
Y1 - 2014/11/4
AB - This article presents an approach to place reference corpus building and application of the approach to a Geo-Microblog Corpus that will foster research and development in the areas of microblog/twitter geoparsing and geographic information retrieval. Our corpus currently consists of 6000 tweets with identified and georeferenced place names. 30% of the tweets contain at least one place name. The corpus is intended to support the evaluation, comparison, and training of geoparsers. We introduce our corpus building framework, which is developed to be generally applicable beyond microblogs, and explain how we use crowdsourcing and geovisual analytics technology to support the construction of relatively large corpora. We then report on the corpus building work and present an analysis of causes of disagreement between the lay persons performing place identification in our crowdsourcing approach.
UR - http://www.scopus.com/inward/record.url?scp=84942429201&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84942429201&partnerID=8YFLogxK
U2 - 10.1145/2675354.2675701
DO - 10.1145/2675354.2675701
M3 - Conference contribution
T3 - Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014
BT - Proceedings of the 8th Workshop on Geographic Information Retrieval, GIR 2014
A2 - Purves, Ross S.
A2 - Jones, Christopher B.
PB - Association for Computing Machinery, Inc
ER -
