Learning relational Bayesian classifiers from RDF data

Harris T. Lin, Neeraj Koul, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

12 Scopus citations

Abstract

The increasing availability of large RDF datasets offers an exciting opportunity to use such data to build predictive models using machine learning algorithms. However, the massive size and distributed nature of RDF data call for approaches to learning from RDF data in a setting where the data can be accessed only through a query interface, e.g., the SPARQL endpoint of the RDF store. In applications where the data are subject to frequent updates, there is a need for algorithms that allow the predictive model to be incrementally updated in response to changes in the data. Furthermore, in some applications, the attributes that are relevant for specific prediction tasks are not known a priori and hence need to be discovered by the algorithm. We present an approach to learning Relational Bayesian Classifiers (RBCs) from RDF data that addresses these scenarios. Specifically, we show how to build RBCs from RDF data using statistical queries through the SPARQL endpoint of the RDF store. We compare the communication complexity of our algorithm with that of an algorithm that requires direct centralized access to the data and hence has to retrieve the entire RDF dataset from the remote location for processing. We establish the conditions under which the RBC models can be incrementally updated in response to the addition or deletion of RDF data. We show how our approach can be extended to the setting where the attributes that are relevant for prediction are not known a priori, by selectively crawling the RDF data for attributes of interest. We provide an open-source implementation and evaluate the proposed approach on several large RDF datasets.
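The abstract does not spell out the learning procedure, so the following is only a minimal sketch of why statistical queries can suffice, not the authors' algorithm. It assumes a naive-Bayes-style RBC that uses the independent-value estimate for multiset-valued attributes (each value of an instance contributes one count) with Laplace smoothing; that estimator choice is an assumption here. Under it, the model consists entirely of counts, which a SPARQL 1.1 `COUNT ... GROUP BY` query can return without shipping raw triples, and which can be adjusted by adding or subtracting counts when instances are added or deleted. The query builder `value_count_query`, the class `CountBasedRBC`, all IRIs, and the toy data are hypothetical, and the sketch does not model the conditions the paper establishes for when incremental updates are valid.

```python
from collections import Counter, defaultdict
from math import log


def value_count_query(instance_type, class_pred, attr_pred):
    """Hypothetical SPARQL 1.1 aggregate query returning rows
    (?value, ?label, ?n) for one attribute; IRIs are caller-supplied."""
    return (
        "SELECT ?value ?label (COUNT(*) AS ?n) WHERE {\n"
        f"  ?x a <{instance_type}> ;\n"
        f"     <{class_pred}> ?label ;\n"
        f"     <{attr_pred}> ?value .\n"
        "} GROUP BY ?value ?label"
    )


class CountBasedRBC:
    """Relational naive Bayes over multiset-valued attributes using the
    (assumed) independent-value estimate. The model is only counts, so it
    can be filled from aggregate query results and updated by +/- counts."""

    def __init__(self):
        self.class_counts = Counter()             # N(label)
        self.value_counts = defaultdict(Counter)  # attr -> N(value, label)
        self.attr_totals = defaultdict(Counter)   # attr -> sum_v N(value, label)

    def add_class_counts(self, rows):
        for label, n in rows:
            self.class_counts[label] += n

    def add_value_counts(self, attr, rows):
        for value, label, n in rows:
            self.value_counts[attr][(value, label)] += n
            self.attr_totals[attr][label] += n

    def update(self, label, attrs, sign=1):
        """Add (sign=1) or retract (sign=-1) one labeled instance;
        attrs maps attribute name -> list of values (a multiset)."""
        self.class_counts[label] += sign
        for attr, vals in attrs.items():
            self.add_value_counts(attr, [(v, label, sign) for v in vals])

    def labels(self):
        return [c for c, k in self.class_counts.items() if k > 0]

    def log_score(self, label, attrs):
        labels = self.labels()
        n = sum(self.class_counts[c] for c in labels)
        # Laplace-smoothed class prior
        score = log((self.class_counts[label] + 1) / (n + len(labels)))
        for attr, vals in attrs.items():
            counts = self.value_counts[attr]
            vocab = len({v for (v, _), k in counts.items() if k > 0}) or 1
            total = self.attr_totals[attr][label]
            for v in vals:
                # One smoothed P(value | label) factor per value in the multiset
                score += log((counts[(v, label)] + 1) / (total + vocab))
        return score

    def predict(self, attrs):
        return max(self.labels(), key=lambda c: self.log_score(c, attrs))


if __name__ == "__main__":
    m = CountBasedRBC()
    # Made-up rows standing in for aggregate query results.
    m.add_class_counts([("comedy", 2), ("drama", 2)])
    m.add_value_counts("actor", [("a1", "comedy", 2), ("a2", "comedy", 1),
                                 ("a2", "drama", 1), ("a3", "drama", 2)])
    print(m.predict({"actor": ["a1"]}))  # comedy
    for _ in range(3):                   # three new drama films featuring a1
        m.update("drama", {"actor": ["a1"]})
    print(m.predict({"actor": ["a1"]}))  # drama
    for _ in range(3):                   # retract them again
        m.update("drama", {"actor": ["a1"]}, sign=-1)
    print(m.predict({"actor": ["a1"]}))  # comedy
```

Because addition and deletion only shift counts, retracting the three added instances restores exactly the original model in this sketch; whether an actual RBC variant admits such updates depends on its estimator, which is what the paper's conditions address.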

Original language: English (US)
Title of host publication: The Semantic Web, ISWC 2011 - 10th International Semantic Web Conference, Proceedings
Pages: 389-404
Number of pages: 16
Edition: PART 1
DOIs: https://doi.org/10.1007/978-3-642-25073-6_25
State: Published - Nov 2 2011
Event: 10th International Semantic Web Conference, ISWC 2011 - Bonn, Germany
Duration: Oct 23 2011 to Oct 27 2011

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Number: PART 1
Volume: 7031 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 10th International Semantic Web Conference, ISWC 2011
Country: Germany
City: Bonn
Period: 10/23/11 to 10/27/11

All Science Journal Classification (ASJC) codes

  • Theoretical Computer Science
  • Computer Science(all)


Cite this

Lin, H. T., Koul, N., & Honavar, V. (2011). Learning relational Bayesian classifiers from RDF data. In The Semantic Web, ISWC 2011 - 10th International Semantic Web Conference, Proceedings (PART 1 ed., pp. 389-404). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 7031 LNCS, No. PART 1). https://doi.org/10.1007/978-3-642-25073-6_25