Document analysis and retrieval tasks in scientific digital libraries

Sujatha Das Gollapalli, Cornelia Caragea, Xiaoli Li, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Machine Learning (ML) algorithms have opened up new possibilities for the acquisition and processing of documents in Information Retrieval (IR) systems. Indeed, it is now possible to automate several labor-intensive document tasks, such as categorization and entity extraction. Consequently, the application of machine learning techniques to various large-scale IR tasks has attracted significant research interest in both the ML and IR communities. This tutorial provides a reference summary of our research in applying machine learning techniques to diverse tasks in Digital Libraries (DL). Digital library portals are specialized IR systems that work on collections of documents related to particular domains. We focus on open-access, scientific digital libraries such as CiteSeerX, which involve several crawling, ranking, content analysis, and metadata extraction tasks. We elaborate on the challenges involved in these tasks and highlight how machine learning methods can successfully address them.

Original language: English (US)
Title of host publication: Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers
Editors: Pavel Braslavski, Yana Volkovich, Marcel Worring, Nikolay Karpov, Dmitry I. Ignatov
Publisher: Springer Verlag
Pages: 3-20
Number of pages: 18
ISBN (Print): 9783319254845
DOI: https://doi.org/10.1007/978-3-319-25485-2_1
State: Published - Jan 1 2015
Event: 8th Russian Summer School on Information Retrieval, RuSSIR 2014 - Nizhniy Novgorod, Russian Federation
Duration: Aug 18 2014 to Aug 22 2014

Publication series

Name: Communications in Computer and Information Science
Volume: 505
ISSN (Print): 1865-0929

Other

Other: 8th Russian Summer School on Information Retrieval, RuSSIR 2014
Country: Russian Federation
City: Nizhniy Novgorod
Period: 8/18/14 to 8/22/14

Fingerprint

Document Analysis
Document Retrieval
Digital libraries
Learning systems
Machine Learning
Information Retrieval
Information retrieval systems
Content Analysis
Categorization
Metadata
Learning algorithms
Learning Algorithm
Ranking
Personnel
Processing

All Science Journal Classification (ASJC) codes

  • Computer Science(all)
  • Mathematics(all)

Cite this

Gollapalli, S. D., Caragea, C., Li, X., & Giles, C. L. (2015). Document analysis and retrieval tasks in scientific digital libraries. In P. Braslavski, Y. Volkovich, M. Worring, N. Karpov, & D. I. Ignatov (Eds.), Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers (pp. 3-20). (Communications in Computer and Information Science; Vol. 505). Springer Verlag. https://doi.org/10.1007/978-3-319-25485-2_1
Gollapalli, Sujatha Das ; Caragea, Cornelia ; Li, Xiaoli ; Giles, C. Lee. / Document analysis and retrieval tasks in scientific digital libraries. Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers. editor / Pavel Braslavski ; Yana Volkovich ; Marcel Worring ; Nikolay Karpov ; Dmitry I. Ignatov. Springer Verlag, 2015. pp. 3-20 (Communications in Computer and Information Science).
@inproceedings{41fe6f282c1d449daa2342b34dd460d2,
title = "Document analysis and retrieval tasks in scientific digital libraries",
author = "Gollapalli, {Sujatha Das} and Cornelia Caragea and Xiaoli Li and Giles, {C. Lee}",
year = "2015",
month = "1",
day = "1",
doi = "10.1007/978-3-319-25485-2_1",
language = "English (US)",
isbn = "9783319254845",
series = "Communications in Computer and Information Science",
publisher = "Springer Verlag",
pages = "3--20",
editor = "Pavel Braslavski and Yana Volkovich and Marcel Worring and Nikolay Karpov and Ignatov, {Dmitry I.}",
booktitle = "Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers",
address = "Germany",
}

Gollapalli, SD, Caragea, C, Li, X & Giles, CL 2015, Document analysis and retrieval tasks in scientific digital libraries. in P Braslavski, Y Volkovich, M Worring, N Karpov & DI Ignatov (eds), Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers. Communications in Computer and Information Science, vol. 505, Springer Verlag, pp. 3-20, 8th Russian Summer School on Information Retrieval, RuSSIR 2014, Nizhniy, Novgorod, Russian Federation, 8/18/14. https://doi.org/10.1007/978-3-319-25485-2_1

Document analysis and retrieval tasks in scientific digital libraries. / Gollapalli, Sujatha Das; Caragea, Cornelia; Li, Xiaoli; Giles, C. Lee.

Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers. ed. / Pavel Braslavski; Yana Volkovich; Marcel Worring; Nikolay Karpov; Dmitry I. Ignatov. Springer Verlag, 2015. p. 3-20 (Communications in Computer and Information Science; Vol. 505).

TY - GEN

T1 - Document analysis and retrieval tasks in scientific digital libraries

AU - Gollapalli, Sujatha Das

AU - Caragea, Cornelia

AU - Li, Xiaoli

AU - Giles, C. Lee

PY - 2015/1/1

Y1 - 2015/1/1

UR - http://www.scopus.com/inward/record.url?scp=84951786317&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84951786317&partnerID=8YFLogxK

U2 - 10.1007/978-3-319-25485-2_1

DO - 10.1007/978-3-319-25485-2_1

M3 - Conference contribution

AN - SCOPUS:84951786317

SN - 9783319254845

T3 - Communications in Computer and Information Science

SP - 3

EP - 20

BT - Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers

A2 - Braslavski, Pavel

A2 - Volkovich, Yana

A2 - Worring, Marcel

A2 - Karpov, Nikolay

A2 - Ignatov, Dmitry I.

PB - Springer Verlag

ER -

Gollapalli SD, Caragea C, Li X, Giles CL. Document analysis and retrieval tasks in scientific digital libraries. In Braslavski P, Volkovich Y, Worring M, Karpov N, Ignatov DI, editors, Information Retrieval - 8th Russian Summer School, RuSSIR 2014, Revised Selected Papers. Springer Verlag. 2015. p. 3-20. (Communications in Computer and Information Science). https://doi.org/10.1007/978-3-319-25485-2_1