Geoannotator: A collaborative semi-automatic platform for constructing geo-annotated text corpora

Morteza Karimzadeh, Alan Maceachren

Research output: Contribution to journal (Article)

Abstract

Ground-truth datasets are essential for the training and evaluation of any automated algorithm. As such, gold-standard annotated corpora underlie most advances in natural language processing (NLP). However, only a few relatively small (geo-)annotated datasets are available for geoparsing, i.e., the automatic recognition and geolocation of place references in unstructured text. Creating geoparsing corpora that include both the recognition of place names in text and the matching of those names to toponyms in a geographic gazetteer (a process we call geo-annotation) is a laborious, time-consuming and expensive task. The field lacks both efficient geo-annotation tools to support corpus building and design guidelines for developing such tools. Here, we present the iterative design of GeoAnnotator, a web-based, semi-automatic and collaborative visual analytics platform for geo-annotation. GeoAnnotator facilitates collaborative, multi-annotator creation of large corpora of geo-annotated text by producing computationally generated pre-annotations that human annotators can then improve. The resulting corpora can be used to improve and benchmark geoparsing algorithms as well as various other spatial language-related methods. Further, the iterative design process and the resulting design decisions can inform annotation platforms tailored for other application domains of NLP.
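The two steps of geo-annotation described in the abstract (recognizing place names, then matching them to gazetteer entries) and the idea of a machine-made pre-annotation that a human later corrects can be illustrated with a minimal Python sketch. This is not GeoAnnotator's implementation: the toy gazetteer, its identifiers, the approximate coordinates, and the names `PreAnnotation`, `pre_annotate`, and `correct` are all assumptions made up for illustration, and the capitalized-word matcher stands in for a real place-name recognizer.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Toy gazetteer (hypothetical IDs; coordinates are approximate).
GAZETTEER = {
    "paris": [
        {"id": "toy-1", "name": "Paris, France", "lat": 48.86, "lon": 2.35},
        {"id": "toy-2", "name": "Paris, Texas, USA", "lat": 33.66, "lon": -95.56},
    ],
    "london": [
        {"id": "toy-3", "name": "London, UK", "lat": 51.51, "lon": -0.13},
    ],
}


@dataclass
class PreAnnotation:
    start: int               # character offset where the place name begins
    end: int                 # character offset just past the place name
    text: str                # the surface form found in the text
    candidates: list         # all gazetteer entries sharing that name
    chosen: Optional[dict]   # current best guess; an annotator may override it


def pre_annotate(text):
    """Recognize capitalized words and match them against the toy gazetteer."""
    annotations = []
    for match in re.finditer(r"\b[A-Z][a-z]+\b", text):
        candidates = GAZETTEER.get(match.group().lower())
        if candidates:
            # Naive default: pick the first candidate as the pre-annotation.
            annotations.append(
                PreAnnotation(match.start(), match.end(), match.group(),
                              candidates, candidates[0])
            )
    return annotations


def correct(annotation, gazetteer_id):
    """Simulate a human annotator choosing a different gazetteer entry."""
    for candidate in annotation.candidates:
        if candidate["id"] == gazetteer_id:
            annotation.chosen = candidate
            return annotation
    raise ValueError(f"{gazetteer_id} is not a candidate for {annotation.text!r}")


text = "She flew from Paris to London."
annotations = pre_annotate(text)
correct(annotations[0], "toy-2")  # the annotator decides Paris, Texas was meant
```

The sketch keeps every candidate alongside the chosen one so that the human correction step only has to pick among alternatives rather than search the gazetteer again; whether GeoAnnotator works this way is not stated in the abstract.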

Original language: English (US)
Article number: 161
Journal: ISPRS International Journal of Geo-Information
Volume: 8
Issue number: 4
DOI: 10.3390/ijgi8040161
State: Published - Mar 27 2019

All Science Journal Classification (ASJC) codes

  • Geography, Planning and Development
  • Computers in Earth Sciences
  • Earth and Planetary Sciences (miscellaneous)

Cite this

@article{ed5916efcd5d480da5613494a558b382,
title = "Geoannotator: A collaborative semi-automatic platform for constructing geo-annotated text corpora",
author = "Morteza Karimzadeh and Alan Maceachren",
year = "2019",
month = "3",
day = "27",
doi = "10.3390/ijgi8040161",
language = "English (US)",
volume = "8",
journal = "ISPRS International Journal of Geo-Information",
issn = "2220-9964",
publisher = "MDPI AG",
number = "4",
}

Geoannotator: A collaborative semi-automatic platform for constructing geo-annotated text corpora. / Karimzadeh, Morteza; Maceachren, Alan.

In: ISPRS International Journal of Geo-Information, Vol. 8, No. 4, 161, 27.03.2019.

TY - JOUR

T1 - Geoannotator

T2 - A collaborative semi-automatic platform for constructing geo-annotated text corpora

AU - Karimzadeh, Morteza

AU - Maceachren, Alan

PY - 2019/3/27

Y1 - 2019/3/27

UR - http://www.scopus.com/inward/record.url?scp=85066443192&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85066443192&partnerID=8YFLogxK

U2 - 10.3390/ijgi8040161

DO - 10.3390/ijgi8040161

M3 - Article

AN - SCOPUS:85066443192

VL - 8

JO - ISPRS International Journal of Geo-Information

JF - ISPRS International Journal of Geo-Information

SN - 2220-9964

IS - 4

M1 - 161

ER -