Abstract
How does spatial language vary regionally? This study investigates whether regional linguistic variation in spatial language can be explored by collecting and analyzing a Spatially-strAtified Route Direction Corpus (SARD Corpus) from volunteered spatial language text on the Web. Because the World Wide Web makes content sharing fast, it has quickly become a hotbed for volunteered spatial language text, such as the directions on hotels' websites. These route directions can serve as a representation of everyday spatial language usage on the WWW. The spatial coverage and abundance of this data source are appealing, but collecting and analyzing large quantities of spatially distributed data remains challenging. By automatically crawling, classifying, and geo-referencing web documents that contain route directions, we have built the SARD Corpus, which covers the U.S., the U.K., and Australia. We implement a semantic categorical analysis scheme to explore regional variation in the usage of cardinal versus relative directions. Preliminary results show both similarities and differences at the national level and geographic patterns at the regional level. The design and implementation of a method for building a large-scale geo-referenced corpus from Web documents offers a methodological contribution to corpus linguistics, spatial cognition, and the GISciences.
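As a rough illustration of what comparing cardinal and relative direction usage across regions could look like, the following Python sketch counts direction terms in region-tagged route directions. It is not the study's analysis scheme: the term lists, the function names `count_direction_terms` and `cardinal_share_by_region`, and the sample sentences are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical term lists for illustration only; the study's actual
# semantic categories are not given in the abstract.
CARDINAL_TERMS = {"north", "south", "east", "west",
                  "northeast", "northwest", "southeast", "southwest"}
RELATIVE_TERMS = {"left", "right", "straight"}


def count_direction_terms(text):
    """Count cardinal and relative direction terms in one document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in CARDINAL_TERMS:
            counts["cardinal"] += 1
        elif token in RELATIVE_TERMS:
            counts["relative"] += 1
    return counts


def cardinal_share_by_region(documents):
    """Aggregate counts per region from (region, text) pairs.

    Returns {region: cardinal / (cardinal + relative)}, or None for a
    region whose documents contain no direction terms at all.
    """
    totals = {}
    for region, text in documents:
        totals.setdefault(region, Counter()).update(count_direction_terms(text))
    shares = {}
    for region, agg in totals.items():
        total = agg["cardinal"] + agg["relative"]
        shares[region] = agg["cardinal"] / total if total else None
    return shares


# Made-up sample sentences, not drawn from the SARD Corpus.
sample_docs = [
    ("US", "Head north on Main St, then turn left."),
    ("UK", "Turn right, then turn left at the roundabout."),
]
shares = cardinal_share_by_region(sample_docs)
```

Plain keyword matching of this kind is deliberately naive: words such as "right" and "left" also have non-directional senses, so a real analysis would need some form of disambiguation.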
Original language | English (US) |
---|---|
Pages (from-to) | 49-52 |
Number of pages | 4 |
Journal | CEUR Workshop Proceedings |
Volume | 620 |
State | Published - Dec 1 2010 |
Event | Workshop on Computational Models of Spatial Language Interpretation at Spatial Cognition 2010, COSLI 2010 - Portland, OR, United States; Duration: Aug 15 2010 → Aug 15 2010 |
All Science Journal Classification (ASJC) codes
- Computer Science (all)