This award is funded under the American Recovery and Reinvestment Act of 2009 (Public Law 111-5).
This project is concerned with retrieving the information contained in the tables, figures, and references of digital documents on the web. At present, a large amount of the information presented in tables and figures is not easily accessible by automated methods: end users cannot confidently pose a query to a search engine and directly retrieve the data shown in a document's tables and figures. While there has been substantial research on table boundary detection and some research on table content extraction, none of these methods has attempted to identify the data and separate it from the metadata present in tables and figures. Doing so would represent a significant step forward. The research goal of this proposal is to make fundamental contributions to digital document processing and information retrieval. The objective is to identify data clearly, separate it from the metadata automatically, and store it in a digital repository. End users will then be able to effectively query and retrieve whole datasets or the parts of them that interest them.
Effective start/end date: 6/1/09 → 5/31/14
- National Science Foundation: $449,782.00