Learning classifiers from chains of multiple interlinked RDF data stores

Harris T. Lin, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution


Abstract

The emergence of many interlinked, physically distributed, and autonomously maintained RDF stores offers unprecedented opportunities for predictive modeling and knowledge discovery from such data. However, existing machine learning approaches are limited in their applicability because it is neither desirable nor feasible to gather all of the data in a centralized location for analysis, owing to access, memory, bandwidth, and computational restrictions, and sometimes to privacy and confidentiality constraints. Against this background, we consider the problem of learning predictive models from multiple interlinked RDF stores. Specifically, we: (i) introduce statistical-query-based formulations of several representative algorithms for learning classifiers from RDF data; (ii) introduce a distributed learning framework for learning classifiers from multiple interlinked RDF stores that form a chain; (iii) identify three special cases of RDF data fragmentation and describe effective strategies for learning predictive models in each case; (iv) consider a novel application of a matrix reconstruction technique from the field of Computerized Tomography [1] that approximates the statistics needed by the learning algorithm from projections obtained with count queries, thus dramatically reducing the amount of information transmitted from the remote data sources to the learner; and (v) report results of experiments with a real-world social network data set (Last.fm) which demonstrate the feasibility of the proposed approach.
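Item (iv) can be illustrated with a small sketch. The abstract does not say which Computerized Tomography reconstruction algorithm is used, so the code below assumes one: the multiplicative algebraic reconstruction technique (MART) with unit relaxation, which, when the only projections available are row and column sums, reduces to iterative proportional fitting. The setting is also an assumption for illustration: the learner needs a class-by-attribute-value count table (the kind of sufficient statistic a Naive Bayes learner uses), but can only issue count queries that return its row sums (instances per class, answered by one store) and its column sums (instances per attribute value, answered by another store). The function name `mart_reconstruct` and its parameters are hypothetical.

```python
from typing import List, Optional, Sequence

Matrix = List[List[float]]


def mart_reconstruct(
    row_sums: Sequence[float],
    col_sums: Sequence[float],
    init: Optional[Matrix] = None,
    iters: int = 100,
    tol: float = 1e-9,
) -> Matrix:
    """Approximate a non-negative count matrix from its row and column sums.

    Hypothetical helper (not the paper's code): MART with unit relaxation,
    which for row/column-sum projections equals iterative proportional fitting.
    """
    n, m = len(row_sums), len(col_sums)
    total = float(sum(row_sums))
    if abs(total - sum(col_sums)) > tol * max(1.0, total):
        raise ValueError("both projections must have the same total count")
    # Strictly positive starting estimate: uniform unless a prior is supplied.
    if init is None:
        x = [[1.0] * m for _ in range(n)]
    else:
        x = [[float(v) for v in row] for row in init]
    for _ in range(iters):
        # Horizontal projection: rescale each row to its queried sum.
        for i in range(n):
            s = sum(x[i])
            if s > 0:
                x[i] = [v * row_sums[i] / s for v in x[i]]
        # Vertical projection: rescale each column to its queried sum.
        for j in range(m):
            s = sum(x[i][j] for i in range(n))
            if s > 0:
                for i in range(n):
                    x[i][j] *= col_sums[j] / s
        # Columns now match up to rounding; stop once rows also agree within tol.
        if max((abs(sum(x[i]) - row_sums[i]) for i in range(n)), default=0.0) < tol:
            break
    return x


# Example: class counts [30, 70] from one store, attribute-value counts
# [50, 50] from another (illustrative numbers, not data from the paper).
if __name__ == "__main__":
    print(mart_reconstruct([30, 70], [50, 50]))
```

With a uniform starting matrix the reconstruction is the independence estimate r_i * c_j / N, here [[15.0, 15.0], [35.0, 35.0]]. A non-uniform starting matrix (for instance, one estimated from a small sample, again an assumption) keeps its odds ratios while being rescaled to match the queried margins. In this sketch the learner receives n + m counts instead of the n × m joint table, which is where the bandwidth saving comes from; the cost is that the joint counts are only approximated.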

Original language: English (US)
Title of host publication: Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013
Pages: 94-101
Number of pages: 8
DOIs: https://doi.org/10.1109/BigData.Congress.2013.22
State: Published - Oct 28 2013
Event: 2013 IEEE International Congress on Big Data, BigData 2013 - Santa Clara, CA, United States
Duration: Jun 27 2013 to Jul 2 2013

Publication series

Name: Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013



All Science Journal Classification (ASJC) codes

  • Computer Science Applications

Cite this

Lin, H. T., & Honavar, V. (2013). Learning classifiers from chains of multiple interlinked RDF data stores. In Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013 (pp. 94-101). [6597124] (Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013). https://doi.org/10.1109/BigData.Congress.2013.22
@inproceedings{e9c76b0db7ba4bc98c7d026cfc1607a8,
title = "Learning classifiers from chains of multiple interlinked RDF data stores",
author = "Lin, {Harris T.} and Vasant Honavar",
year = "2013",
month = "10",
day = "28",
doi = "10.1109/BigData.Congress.2013.22",
language = "English (US)",
isbn = "9780768550060",
series = "Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013",
pages = "94--101",
booktitle = "Proceedings - 2013 IEEE International Congress on Big Data, BigData 2013",

}



