US-Singapore Planning Visit: Collaborative Research on Next Generation Bibliographic Search Engine

Project: Research project

Project Details




This award supports a planning visit to enable Dr. Dongwon Lee at the Pennsylvania State University in University Park to meet with Dr. Min-Yen Kan at the National University of Singapore. The planned collaborative research aims at building a next generation bibliographic search engine, with improvements in areas such as focused crawling and indexing, metadata extraction, and metadata quality. The researchers will discuss: 1) the detailed scope and goals of the collaborative research; 2) the large-scale metadata cleaning algorithm under co-development; and 3) the scheduling and logistics of a proposed workshop. The primary characteristic of a BSE (Bibliographic Search Engine) versus standard search engines is the unique nature and structure of the data associated with bibliographic references. This calls for a finely focused search that looks at data features more specific and limited than simple keywords. The sample data would be representative and contain many records with flawed entries (missing values, ambiguous names and titles, mixed and redundant citations). Progress would depend upon discovery of new algorithmic properties and system-level features associated with format recognition and error correction. Metadata extraction is a key and difficult research area, and one that requires international collaborative efforts to make significant progress. The need for new approaches to large-scale metadata manipulation and management is pressing.

Improved BSE systems could not only search bibliographic text more efficiently, but could also identify and report errors and corrupted data, since there is foreknowledge of what a given entry can contain. Improved BSE systems could also extract metadata across document genres, thereby enabling uniform building of new collections. This would be of great benefit to communities handling specific document types.

Effective start/end date: 7/15/06 to 6/30/07


  • National Science Foundation: $5,852.00

