Accelerating Substructure Similarity Search for Formula Retrieval

Wei Zhong, Shaurya Rohatgi, Jian Wu, C. Lee Giles, Richard Zanibbi

Research output: Chapter in Book/Report/Conference proceedingConference contribution

19 Scopus citations

Abstract

Formula retrieval systems using substructure matching are effective, but suffer from slow retrieval times caused by the complexity of structure matching. We present a specialized inverted index and rank-safe dynamic pruning algorithm for faster substructure retrieval. Formulas are indexed from their Operator Tree (OPT) representations. Our model is evaluated using the NTCIR-12 Wikipedia Formula Browsing Task and a new formula corpus produced from Math StackExchange posts. Our approach preserves the effectiveness of structure matching while allowing queries to be executed in real-time.

Original languageEnglish (US)
Title of host publicationAdvances in Information Retrieval - 42nd European Conference on IR Research, ECIR 2020, Proceedings
EditorsJoemon M. Jose, Emine Yilmaz, João Magalhães, Flávio Martins, Pablo Castells, Nicola Ferro, Mário J. Silva
PublisherSpringer
Pages714-727
Number of pages14
ISBN (Print)9783030454388
DOIs
StatePublished - 2020
Event42nd European Conference on IR Research, ECIR 2020 - Lisbon, Portugal
Duration: Apr 14 2020Apr 17 2020

Publication series

NameLecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume12035 LNCS
ISSN (Print)0302-9743
ISSN (Electronic)1611-3349

Conference

Conference42nd European Conference on IR Research, ECIR 2020
Country/TerritoryPortugal
CityLisbon
Period4/14/204/17/20

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)

Fingerprint

Dive into the research topics of 'Accelerating Substructure Similarity Search for Formula Retrieval'. Together they form a unique fingerprint.

Cite this