Automatic tag recommendation for metadata annotation using probabilistic topic modeling

Suppawong Tuarob, Line C. Pouchard, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

42 Scopus citations

Abstract

The increasing complexity and advancement of ecological and environmental sciences encourage scientists across the world to collect data from multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data increases not only in amount but also in the diversity of data sources spread around the world. This poses a huge challenge for scientists who have to search for information manually. To alleviate such problems, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple repositories and makes it searchable. However, harvested metadata records are sometimes poorly annotated or lack meaningful keywords, which could hinder effective retrieval. Here, we develop algorithms for automatic annotation of metadata. We transform the problem into a tag recommendation problem with a controlled tag library, and propose two variants of an algorithm for recommending tags. Our experiments on four datasets of environmental science metadata records not only show promising performance for our method, but also shed light on the different natures of the datasets.

Original language: English (US)
Title of host publication: JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Pages: 239-248
Number of pages: 10
State: Published - Aug 23 2013
Event: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013 - Indianapolis, IN, United States
Duration: Jul 22 2013 to Jul 26 2013

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013
Country: United States
City: Indianapolis, IN
Period: 7/22/13 to 7/26/13

All Science Journal Classification (ASJC) codes

  • Engineering (all)
