CiteSeerχ - A scalable autonomous scientific digital library

Huajing Li, Isaac G. Councill, Levent Bolelli, Ding Zhou, Yang Song, Wang Chien Lee, Anand Sivasubramaniam, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

7 Citations (Scopus)

Abstract

CiteSeer is a scientific literature digital library and search engine which automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture inconveniences system maintenance and reduces the flexibility of the system in terms of new feature development, algorithm updates, and system interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results based on a prototype system show the new architecture enhances flexibility, scalability, and performance for CiteSeer. In addition, new services in development for the next generation CiteSeer system are discussed.

Original language: English (US)
Title of host publication: Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06
DOIs: https://doi.org/10.1145/1146847.1146865
State: Published - Dec 1 2006
Event: 1st International Conference on Scalable Information Systems, InfoScale '06 - Hong Kong, China
Duration: May 30 2006 to Jun 1 2006

Publication series

Name: ACM International Conference Proceeding Series
Volume: 152

Other

Other: 1st International Conference on Scalable Information Systems, InfoScale '06
Country: China
City: Hong Kong
Period: 5/30/06 to 6/1/06

Fingerprint

Information science
Digital libraries
Search engines
Interoperability
Computer science
Web services
Scalability

All Science Journal Classification (ASJC) codes

  • Software
  • Human-Computer Interaction
  • Computer Vision and Pattern Recognition
  • Computer Networks and Communications

Cite this

Li, H., Councill, I. G., Bolelli, L., Zhou, D., Song, Y., Lee, W. C., ... Giles, C. L. (2006). CiteSeerχ - A scalable autonomous scientific digital library. In Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06 [1146865] (ACM International Conference Proceeding Series; Vol. 152). https://doi.org/10.1145/1146847.1146865
Li, Huajing ; Councill, Isaac G. ; Bolelli, Levent ; Zhou, Ding ; Song, Yang ; Lee, Wang Chien ; Sivasubramaniam, Anand ; Giles, C. Lee / CiteSeerχ - A scalable autonomous scientific digital library. Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06. 2006. (ACM International Conference Proceeding Series).
@inproceedings{68cb07b0c8094ca7a45f21c3536a0a84,
title = "CiteSeerχ - A scalable autonomous scientific digital library",
abstract = "CiteSeer is a scientific literature digital library and search engine which automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture inconveniences system maintenance and reduces the flexibility of the system in terms of new feature development, algorithm updates, and system interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results based on a prototype system show the new architecture enhances flexibility, scalability, and performance for CiteSeer. In addition, new services in development for the next generation CiteSeer system are discussed.",
author = "Huajing Li and Councill, {Isaac G.} and Levent Bolelli and Ding Zhou and Yang Song and Lee, {Wang Chien} and Anand Sivasubramaniam and Giles, {C. Lee}",
year = "2006",
month = "12",
day = "1",
doi = "10.1145/1146847.1146865",
language = "English (US)",
isbn = "1595934286",
series = "ACM International Conference Proceeding Series",
booktitle = "Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06",

}

Li, H, Councill, IG, Bolelli, L, Zhou, D, Song, Y, Lee, WC, Sivasubramaniam, A & Giles, CL 2006, CiteSeerχ - A scalable autonomous scientific digital library. in Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06., 1146865, ACM International Conference Proceeding Series, vol. 152, 1st International Conference on Scalable Information Systems, InfoScale '06, Hong Kong, China, 5/30/06. https://doi.org/10.1145/1146847.1146865

CiteSeerχ - A scalable autonomous scientific digital library. / Li, Huajing; Councill, Isaac G.; Bolelli, Levent; Zhou, Ding; Song, Yang; Lee, Wang Chien; Sivasubramaniam, Anand; Giles, C. Lee.

Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06. 2006. 1146865 (ACM International Conference Proceeding Series; Vol. 152).

TY - GEN

T1 - CiteSeerχ - A scalable autonomous scientific digital library

AU - Li, Huajing

AU - Councill, Isaac G.

AU - Bolelli, Levent

AU - Zhou, Ding

AU - Song, Yang

AU - Lee, Wang Chien

AU - Sivasubramaniam, Anand

AU - Giles, C. Lee

PY - 2006/12/1

Y1 - 2006/12/1

AB - CiteSeer is a scientific literature digital library and search engine which automatically crawls and indexes scientific documents in the fields of computer and information science. Since its inception in 1997, CiteSeer has grown to index over 730,000 documents and serves over 800,000 requests daily, pushing the limits of the current system's capabilities. In addition, CiteSeer's monolithic architecture inconveniences system maintenance and reduces the flexibility of the system in terms of new feature development, algorithm updates, and system interoperability. In this paper, we discuss the problems of the current CiteSeer architecture and propose a new architecture for a next generation CiteSeer application. The new architecture is based on modular web services and pluggable service components. Preliminary results based on a prototype system show the new architecture enhances flexibility, scalability, and performance for CiteSeer. In addition, new services in development for the next generation CiteSeer system are discussed.

UR - http://www.scopus.com/inward/record.url?scp=34547294914&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=34547294914&partnerID=8YFLogxK

U2 - 10.1145/1146847.1146865

DO - 10.1145/1146847.1146865

M3 - Conference contribution

AN - SCOPUS:34547294914

SN - 1595934286

SN - 9781595934284

T3 - ACM International Conference Proceeding Series

BT - Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06

ER -

Li H, Councill IG, Bolelli L, Zhou D, Song Y, Lee WC et al. CiteSeerχ - A scalable autonomous scientific digital library. In Proceedings of the 1st International Conference on Scalable Information Systems, InfoScale '06. 2006. 1146865. (ACM International Conference Proceeding Series). https://doi.org/10.1145/1146847.1146865