GeoTxt: A scalable geoparsing system for unstructured text geolocation

Morteza Karimzadeh, Scott Pezanowski, Alan Maceachren, Jan O. Wallgrün

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

In this article we present GeoTxt, a scalable geoparsing system for the recognition and geolocation of place names in unstructured text. GeoTxt offers six named entity recognition (NER) algorithms for place name recognition, and utilizes an enterprise search engine for the indexing, ranking, and retrieval of toponyms, enabling scalable geoparsing for streaming text. GeoTxt offers a flexible application programming interface (API), allowing for customized attribute and/or spatial ranking of retrieved toponyms. We evaluate the system on a corpus of manually geo-annotated tweets. First, we benchmark the performance of the six NERs that GeoTxt provides access to. Second, we assess GeoTxt toponym resolution accuracy incrementally, demonstrating improvements in toponym resolution achieved (or not achieved) by adding specific heuristics and disambiguation methods. Compared to using the GeoNames web service, GeoTxt's toponym resolution demonstrates a 20% accuracy gain. Our results show that places mentioned in the same tweet do not tend to be geographically proximate.
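The abstract mentions "customized attribute and/or spatial ranking of retrieved toponyms." The following minimal Python sketch is one way such ranking might look in principle. It is not GeoTxt's API or its actual ranking function. The `Candidate` record, the `rank_candidates` helper, the log-scaled population and distance scores, and the weights are all illustrative assumptions. The sketch orders gazetteer candidates for an ambiguous place name by an attribute score (population). When a reference point is supplied, it also applies a spatial score based on great-circle distance to that point.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


@dataclass
class Candidate:
    # Hypothetical gazetteer record, loosely modeled on GeoNames-style fields.
    name: str
    country: str
    lat: float
    lon: float
    population: int


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points given in degrees, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def rank_candidates(
    candidates: List[Candidate],
    reference: Optional[Tuple[float, float]] = None,
    attr_weight: float = 1.0,
    spatial_weight: float = 1.0,
) -> List[Candidate]:
    """Sort candidates by a weighted sum of attribute and spatial scores (hypothetical).

    Attribute score: log10(population + 1), so more populous places rank higher.
    Spatial score: -log10(distance_km + 1) to `reference`, so nearer places rank
    higher. The spatial term is skipped when no reference point is given.
    """

    def score(c: Candidate) -> float:
        s = attr_weight * math.log10(c.population + 1)
        if reference is not None:
            d = haversine_km(reference[0], reference[1], c.lat, c.lon)
            s -= spatial_weight * math.log10(d + 1)
        return s

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    # Approximate, illustrative coordinates and populations only.
    paris_fr = Candidate("Paris", "FR", 48.8566, 2.3522, 2_138_551)
    paris_tx = Candidate("Paris", "US", 33.6609, -95.5555, 24_171)

    # Attribute-only ranking favours the larger place.
    print([c.country for c in rank_candidates([paris_tx, paris_fr])])

    # Adding a spatial term relative to a point near Dallas, Texas,
    # favours the nearby candidate instead.
    dallas = (32.7767, -96.7970)
    print([c.country for c in rank_candidates([paris_fr, paris_tx], reference=dallas, spatial_weight=2.0)])
```

The weights control how much proximity can override prominence. Under these assumed scores, population decides the ranking when no location context is available, and a sufficiently weighted spatial term can promote a smaller but nearby namesake.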

Original language: English (US)
Pages (from-to): 118-136
Number of pages: 19
Journal: Transactions in GIS
Volume: 23
Issue number: 1
DOIs: 10.1111/tgis.12510
State: Published - Feb 1 2019

All Science Journal Classification (ASJC) codes

  • Earth and Planetary Sciences(all)

Cite this

Karimzadeh, Morteza ; Pezanowski, Scott ; Maceachren, Alan ; Wallgrün, Jan O. / GeoTxt: A scalable geoparsing system for unstructured text geolocation. In: Transactions in GIS. 2019 ; Vol. 23, No. 1. pp. 118-136.
@article{c422151aa2fe4655a0d9ac6907423c34,
title = "GeoTxt: A scalable geoparsing system for unstructured text geolocation",
abstract = "In this article we present GeoTxt, a scalable geoparsing system for the recognition and geolocation of place names in unstructured text. GeoTxt offers six named entity recognition (NER) algorithms for place name recognition, and utilizes an enterprise search engine for the indexing, ranking, and retrieval of toponyms, enabling scalable geoparsing for streaming text. GeoTxt offers a flexible application programming interface (API), allowing for customized attribute and/or spatial ranking of retrieved toponyms. We evaluate the system on a corpus of manually geo-annotated tweets. First, we benchmark the performance of the six NERs that GeoTxt provides access to. Second, we assess GeoTxt toponym resolution accuracy incrementally, demonstrating improvements in toponym resolution achieved (or not achieved) by adding specific heuristics and disambiguation methods. Compared to using the GeoNames web service, GeoTxt's toponym resolution demonstrates a 20{\%} accuracy gain. Our results show that places mentioned in the same tweet do not tend to be geographically proximate.",
author = "Morteza Karimzadeh and Scott Pezanowski and Alan Maceachren and Wallgr{\"u}n, {Jan O.}",
year = "2019",
month = "2",
day = "1",
doi = "10.1111/tgis.12510",
language = "English (US)",
volume = "23",
pages = "118--136",
journal = "Transactions in GIS",
issn = "1361-1682",
publisher = "Wiley-Blackwell",
number = "1",

}

TY - JOUR

T1 - GeoTxt

T2 - A scalable geoparsing system for unstructured text geolocation

AU - Karimzadeh, Morteza

AU - Pezanowski, Scott

AU - Maceachren, Alan

AU - Wallgrün, Jan O.

PY - 2019/2/1

Y1 - 2019/2/1

N2 - In this article we present GeoTxt, a scalable geoparsing system for the recognition and geolocation of place names in unstructured text. GeoTxt offers six named entity recognition (NER) algorithms for place name recognition, and utilizes an enterprise search engine for the indexing, ranking, and retrieval of toponyms, enabling scalable geoparsing for streaming text. GeoTxt offers a flexible application programming interface (API), allowing for customized attribute and/or spatial ranking of retrieved toponyms. We evaluate the system on a corpus of manually geo-annotated tweets. First, we benchmark the performance of the six NERs that GeoTxt provides access to. Second, we assess GeoTxt toponym resolution accuracy incrementally, demonstrating improvements in toponym resolution achieved (or not achieved) by adding specific heuristics and disambiguation methods. Compared to using the GeoNames web service, GeoTxt's toponym resolution demonstrates a 20% accuracy gain. Our results show that places mentioned in the same tweet do not tend to be geographically proximate.

AB - In this article we present GeoTxt, a scalable geoparsing system for the recognition and geolocation of place names in unstructured text. GeoTxt offers six named entity recognition (NER) algorithms for place name recognition, and utilizes an enterprise search engine for the indexing, ranking, and retrieval of toponyms, enabling scalable geoparsing for streaming text. GeoTxt offers a flexible application programming interface (API), allowing for customized attribute and/or spatial ranking of retrieved toponyms. We evaluate the system on a corpus of manually geo-annotated tweets. First, we benchmark the performance of the six NERs that GeoTxt provides access to. Second, we assess GeoTxt toponym resolution accuracy incrementally, demonstrating improvements in toponym resolution achieved (or not achieved) by adding specific heuristics and disambiguation methods. Compared to using the GeoNames web service, GeoTxt's toponym resolution demonstrates a 20% accuracy gain. Our results show that places mentioned in the same tweet do not tend to be geographically proximate.

UR - http://www.scopus.com/inward/record.url?scp=85060155279&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85060155279&partnerID=8YFLogxK

U2 - 10.1111/tgis.12510

DO - 10.1111/tgis.12510

M3 - Article

AN - SCOPUS:85060155279

VL - 23

SP - 118

EP - 136

JO - Transactions in GIS

JF - Transactions in GIS

SN - 1361-1682

IS - 1

ER -