Panorama: Extending digital libraries with topical crawlers

Gautam Pant, Kostas Tsioutsiouliklis, Judy Johnson, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

32 Citations (Scopus)

Abstract

A large amount of research, technical and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of information supplied by the documents. These libraries may span an entire area of interest (e.g., computer science) or be limited to documents within a small organization. While tools that index, classify, rank and retrieve documents from such libraries are important, it would be worthwhile to complement these tools with information available on the Web. We propose one such technique that uses a topical crawler driven by the information extracted from a research document. The goal of the crawler is to harvest a collection of Web pages that are focused on the topical subspaces associated with the given document. The collection created through Web crawling is further processed using lexical and linkage analysis. The entire process is automated and uses machine learning techniques to both guide the crawler as well as analyze the collection it fetches. A report is generated at the end that provides visual cues and information to the researcher.

Original language: English (US)
Title of host publication: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
Editors: H. Chen, M. Christel, E.P. Lim
Pages: 142-150
Number of pages: 9
State: Published - Oct 18 2004
Event: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004 - Tucson, AZ, United States
Duration: Jun 7 2004 to Jun 11 2004

Publication series

Name: Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004

Other

Other: Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004
Country: United States
City: Tucson, AZ
Period: 6/7/04 to 6/11/04

Fingerprint

Digital libraries
Computer science
Learning systems
Websites
available information
computer science
organization
learning
Web crawler

All Science Journal Classification (ASJC) codes

  • Software
  • Information Systems
  • Computer Science Applications
  • Library and Information Sciences

Cite this

Pant, G., Tsioutsiouliklis, K., Johnson, J., & Giles, C. L. (2004). Panorama: Extending digital libraries with topical crawlers. In H. Chen, M. Christel, & E. P. Lim (Eds.), Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004 (pp. 142-150). (Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004).
Pant, Gautam ; Tsioutsiouliklis, Kostas ; Johnson, Judy ; Giles, C. Lee. / Panorama : Extending digital libraries with topical crawlers. Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004. editor / H. Chen ; M. Christel ; E.P. Lim. 2004. pp. 142-150 (Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004).
@inproceedings{97bb5cd637b54637babd00b627b68e3c,
title = "Panorama: Extending digital libraries with topical crawlers",
abstract = "A large amount of research, technical and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of information supplied by the documents. These libraries may span an entire area of interest (e.g., computer science) or be limited to documents within a small organization. While tools that index, classify, rank and retrieve documents from such libraries are important, it would be worthwhile to complement these tools with information available on the Web. We propose one such technique that uses a topical crawler driven by the information extracted from a research document. The goal of the crawler is to harvest a collection of Web pages that are focused on the topical subspaces associated with the given document. The collection created through Web crawling is further processed using lexical and linkage analysis. The entire process is automated and uses machine learning techniques to both guide the crawler as well as analyze the collection it fetches. A report is generated at the end that provides visual cues and information to the researcher.",
author = "Gautam Pant and Kostas Tsioutsiouliklis and Judy Johnson and Giles, {C. Lee}",
year = "2004",
month = "10",
day = "18",
language = "English (US)",
isbn = "1581138326",
series = "Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004",
pages = "142--150",
editor = "H. Chen and M. Christel and E.P. Lim",
booktitle = "Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004",
}

Pant, G, Tsioutsiouliklis, K, Johnson, J & Giles, CL 2004, Panorama: Extending digital libraries with topical crawlers. in H Chen, M Christel & EP Lim (eds), Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004. Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004, pp. 142-150, Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global reach and Diverse Impact, JCDL 2004, Tucson, AZ, United States, 6/7/04.

TY - GEN

T1 - Panorama

T2 - Extending digital libraries with topical crawlers

AU - Pant, Gautam

AU - Tsioutsiouliklis, Kostas

AU - Johnson, Judy

AU - Giles, C. Lee

PY - 2004/10/18

Y1 - 2004/10/18

AB - A large amount of research, technical and professional documents are available today in digital formats. Digital libraries are created to facilitate search and retrieval of information supplied by the documents. These libraries may span an entire area of interest (e.g., computer science) or be limited to documents within a small organization. While tools that index, classify, rank and retrieve documents from such libraries are important, it would be worthwhile to complement these tools with information available on the Web. We propose one such technique that uses a topical crawler driven by the information extracted from a research document. The goal of the crawler is to harvest a collection of Web pages that are focused on the topical subspaces associated with the given document. The collection created through Web crawling is further processed using lexical and linkage analysis. The entire process is automated and uses machine learning techniques to both guide the crawler as well as analyze the collection it fetches. A report is generated at the end that provides visual cues and information to the researcher.

UR - http://www.scopus.com/inward/record.url?scp=4944227235&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=4944227235&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:4944227235

SN - 1581138326

SN - 9781581138320

T3 - Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004

SP - 142

EP - 150

BT - Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004

A2 - Chen, H.

A2 - Christel, M.

A2 - Lim, E.P.

ER -

Pant G, Tsioutsiouliklis K, Johnson J, Giles CL. Panorama: Extending digital libraries with topical crawlers. In Chen H, Christel M, Lim EP, editors, Proceedings of the Fourth ACM/IEEE Joint Conference on Digital Libraries; Global Reach and Diverse Impact, JCDL 2004. 2004. p. 142-150. (Proceedings of the ACM IEEE International Conference on Digital Libraries, JCDL 2004).