Automatic tag recommendation for metadata annotation using probabilistic topic modeling

Suppawong Tuarob, Line C. Pouchard, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

40 Citations (Scopus)

Abstract

The increasing complexity and advancement of ecological and environmental sciences encourage scientists across the world to collect data at multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data grow not only in amount but also in the diversity of data sources spread around the world. This poses a huge challenge for scientists who have to search for information manually. To alleviate such problems, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple repositories and makes it searchable. However, harvested metadata records are sometimes poorly annotated or lack meaningful keywords, which can hinder effective retrieval. Here, we develop algorithms for automatic annotation of metadata. We transform the problem into a tag recommendation problem with a controlled tag library and propose two variants of an algorithm for recommending tags. Our experiments on four datasets of environmental science metadata records not only show great promise for the performance of our method but also shed light on the different natures of the datasets.
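The abstract frames metadata annotation as ranking tags drawn from a controlled tag library. As an illustration of how a probabilistic topic model can produce such a ranking, the sketch below assumes a topic model has already been trained and that its document-topic proportions p(z | d) and topic-tag probabilities p(t | z) are available; each library tag is scored by p(t | d) = sum over z of p(t | z) p(z | d), and the top k tags are returned. The function name `recommend_tags`, its parameters, and the toy distributions are hypothetical, and this shows the general technique only; it does not reproduce the authors' two variants.

```python
from typing import Dict, List, Sequence, Tuple


def recommend_tags(
    doc_topics: Sequence[float],
    topic_word: Sequence[Dict[str, float]],
    tag_library: Sequence[str],
    k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank controlled-library tags by p(tag | doc) = sum_z p(tag | z) * p(z | doc).

    doc_topics[z] is the (assumed, precomputed) proportion of topic z in the record;
    topic_word[z] maps a term to its (assumed, precomputed) probability under topic z.
    Only tags in tag_library can be returned, which enforces the controlled vocabulary.
    """
    if len(doc_topics) != len(topic_word):
        raise ValueError("doc_topics and topic_word must cover the same topics")
    scores = []
    for tag in tag_library:
        # Marginalize over topics; a tag unseen in a topic contributes 0 for that topic.
        score = sum(p_z * topic.get(tag, 0.0) for p_z, topic in zip(doc_topics, topic_word))
        scores.append((tag, score))
    # Highest score first; ties broken alphabetically so the output is deterministic.
    scores.sort(key=lambda pair: (-pair[1], pair[0]))
    return scores[:k]


if __name__ == "__main__":
    # Hypothetical two-topic model: topic 0 leans hydrological, topic 1 ecological.
    topics = [
        {"streamflow": 0.4, "precipitation": 0.3, "soil": 0.1},
        {"species": 0.5, "biomass": 0.2, "soil": 0.2},
    ]
    library = ["streamflow", "precipitation", "soil", "species", "biomass"]
    top = recommend_tags([0.7, 0.3], topics, library, k=3)
    print([tag for tag, _ in top])  # ['streamflow', 'precipitation', 'species']
```

Restricting candidates to `tag_library` rather than to every term in the model is what makes this a controlled-vocabulary recommender: a poorly annotated record can only receive keywords that already exist in the library.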

Original language: English (US)
Title of host publication: JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries
Pages: 239-248
Number of pages: 10
DOI: https://doi.org/10.1145/2467696.2467706
State: Published - Aug 23 2013
Event: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013 - Indianapolis, IN, United States
Duration: Jul 22 2013 - Jul 26 2013

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013
Country: United States
City: Indianapolis, IN
Period: 7/22/13 - 7/26/13

All Science Journal Classification (ASJC) codes

  • Engineering(all)

Cite this

Tuarob, S., Pouchard, L. C., & Giles, C. L. (2013). Automatic tag recommendation for metadata annotation using probabilistic topic modeling. In JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries (pp. 239-248). (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). https://doi.org/10.1145/2467696.2467706
Tuarob, Suppawong; Pouchard, Line C.; Giles, C. Lee / Automatic tag recommendation for metadata annotation using probabilistic topic modeling. JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. 2013. pp. 239-248 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).
@inproceedings{95321ff2db364c79b637f626fdd3d133,
title = "Automatic tag recommendation for metadata annotation using probabilistic topic modeling",
abstract = "The increasing complexity and advancement of ecological and environmental sciences encourage scientists across the world to collect data at multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data grow not only in amount but also in the diversity of data sources spread around the world. This poses a huge challenge for scientists who have to search for information manually. To alleviate such problems, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple repositories and makes it searchable. However, harvested metadata records are sometimes poorly annotated or lack meaningful keywords, which can hinder effective retrieval. Here, we develop algorithms for automatic annotation of metadata. We transform the problem into a tag recommendation problem with a controlled tag library and propose two variants of an algorithm for recommending tags. Our experiments on four datasets of environmental science metadata records not only show great promise for the performance of our method but also shed light on the different natures of the datasets.",
author = "Tuarob, Suppawong and Pouchard, {Line C.} and Giles, {C. Lee}",
year = "2013",
month = "8",
day = "23",
doi = "10.1145/2467696.2467706",
language = "English (US)",
isbn = "9781450320764",
series = "Proceedings of the ACM/IEEE Joint Conference on Digital Libraries",
pages = "239--248",
booktitle = "JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries",

}

Tuarob, S, Pouchard, LC & Giles, CL 2013, Automatic tag recommendation for metadata annotation using probabilistic topic modeling. in JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, pp. 239-248, 13th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL 2013, Indianapolis, IN, United States, 7/22/13. https://doi.org/10.1145/2467696.2467706

Automatic tag recommendation for metadata annotation using probabilistic topic modeling. / Tuarob, Suppawong; Pouchard, Line C.; Giles, C. Lee.

JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. 2013. p. 239-248 (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries).

TY - GEN

T1 - Automatic tag recommendation for metadata annotation using probabilistic topic modeling

AU - Tuarob, Suppawong

AU - Pouchard, Line C.

AU - Giles, C. Lee

PY - 2013/8/23

Y1 - 2013/8/23

N2 - The increasing complexity and advancement of ecological and environmental sciences encourage scientists across the world to collect data at multiple places, times, and thematic scales to verify their hypotheses. Accumulated over time, such data grow not only in amount but also in the diversity of data sources spread around the world. This poses a huge challenge for scientists who have to search for information manually. To alleviate such problems, ONEMercury has recently been implemented as part of the DataONE project to serve as a portal for accessing environmental and observational data across the globe. ONEMercury harvests metadata from the data hosted by multiple repositories and makes it searchable. However, harvested metadata records are sometimes poorly annotated or lack meaningful keywords, which can hinder effective retrieval. Here, we develop algorithms for automatic annotation of metadata. We transform the problem into a tag recommendation problem with a controlled tag library and propose two variants of an algorithm for recommending tags. Our experiments on four datasets of environmental science metadata records not only show great promise for the performance of our method but also shed light on the different natures of the datasets.

UR - http://www.scopus.com/inward/record.url?scp=84882251627&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84882251627&partnerID=8YFLogxK

U2 - 10.1145/2467696.2467706

DO - 10.1145/2467696.2467706

M3 - Conference contribution

AN - SCOPUS:84882251627

SN - 9781450320764

T3 - Proceedings of the ACM/IEEE Joint Conference on Digital Libraries

SP - 239

EP - 248

BT - JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries

ER -

Tuarob S, Pouchard LC, Giles CL. Automatic tag recommendation for metadata annotation using probabilistic topic modeling. In JCDL 2013 - Proceedings of the 13th ACM/IEEE-CS Joint Conference on Digital Libraries. 2013. p. 239-248. (Proceedings of the ACM/IEEE Joint Conference on Digital Libraries). https://doi.org/10.1145/2467696.2467706