Towards building a scholarly big data platform: Challenges, lessons and opportunities

Zhaohui Wu, Jian Wu, Madian Khabsa, Kyle Williams, Hung Hsuan Chen, Wenyi Huang, Suppawong Tuarob, Sagnik Ray Choudhury, Alexander Ororbia, Prasenjit Mitra, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

32 Scopus citations

Abstract

We introduce a big data platform that provides various services for harvesting scholarly information and enabling efficient scholarly applications. The core architecture of the platform is built on a secured private cloud. It crawls data using a scholarly focused crawler that leverages a dynamic scheduler, processes the data with a MapReduce-based crawl-extraction-ingestion (CEI) workflow, and stores it in distributed repositories and databases. Services such as scholarly data harvesting, information extraction, and user information and log data analytics are integrated into the platform and provided through an OAI and RESTful API. We also introduce a set of scholarly applications built on top of this platform, including citation recommendation and collaborator discovery.

Original language: English (US)
Title of host publication: 2014 IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 117-126
Number of pages: 10
ISBN (Electronic): 9781479955695
State: Published - Dec 1 2014
Event: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014 - London, United Kingdom
Duration: Sep 8 2014 to Sep 12 2014

Publication series

Name: Proceedings of the ACM/IEEE Joint Conference on Digital Libraries
ISSN (Print): 1552-5996

Other

Other: 2014 14th IEEE/ACM Joint Conference on Digital Libraries, JCDL 2014
Country: United Kingdom
City: London
Period: 9/8/14 to 9/12/14

All Science Journal Classification (ASJC) codes

  • Engineering(all)
