Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies

Harris T. Lin, Ngot Bui, Vasant Honavar

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

1 Citation (Scopus)

Abstract

Rapid growth of RDF data in the Linked Open Data (LOD) cloud offers unprecedented opportunities for analyzing such data using machine learning algorithms. The massive size and distributed nature of the LOD cloud present a challenging machine learning problem in which the data can only be accessed remotely, i.e., through a query interface such as the SPARQL endpoint of the data store. Existing approaches to learning classifiers from RDF data in such a setting fail to take advantage of the RDF Schema (RDFS) associated with the data store, which asserts subclass hierarchies that the learner can potentially exploit. Against this background, we present a general approach that augments an existing directed graphical model with hidden variables that encode subclass hierarchies via probabilistic constraints. We also present an algorithm, ProbAVT, that adopts the variational Bayesian expectation maximization approach to efficiently learn parameters in such settings. Our experiments with several synthetic and real-world datasets show that: (i) ProbAVT matches or outperforms its counterpart that does not incorporate background knowledge in the form of subclass hierarchies; (ii) ProbAVT remains competitive with other state-of-the-art models that incorporate subclass hierarchies, and is able to scale up to large hierarchies consisting of tens of thousands of nodes.
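To make the notion of an RDFS subclass hierarchy concrete, the following is a minimal sketch, not the authors' ProbAVT algorithm. It assumes the `rdfs:subClassOf` triples have already been retrieved from a data store and shows how the full set of superclasses of each class could be derived from them. The function name `superclasses` and the `ex:` class names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Predicate that asserts a direct subclass relation in RDFS.
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(triples):
    """Map each class to the set of all its direct and indirect superclasses.

    `triples` is an iterable of (subject, predicate, object) strings.
    Triples with any other predicate are ignored.
    """
    # Direct parents of every class that appears as a subject.
    parents = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == SUBCLASS_OF:
            parents[subject].add(obj)

    closure = {}
    for cls in list(parents):
        seen = set()
        stack = list(parents[cls])
        # Depth-first walk up the hierarchy; `seen` also guards against cycles.
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(parents.get(node, ()))
        closure[cls] = seen
    return closure


# Hypothetical example data: a three-level hierarchy plus one unrelated triple.
example_triples = [
    ("ex:Dog", SUBCLASS_OF, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS_OF, "ex:Animal"),
    ("ex:Rex", "rdf:type", "ex:Dog"),
]
hierarchy = superclasses(example_triples)
```

Here `hierarchy["ex:Dog"]` contains both `ex:Mammal` and `ex:Animal`, which is the kind of background knowledge the abstract says a learner could exploit.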

Original language: English (US)
Title of host publication: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015
Editors: Feng Luo, Kemafor Ogan, Mohammed J. Zaki, Laura Haas, Beng Chin Ooi, Vipin Kumar, Sudarsan Rachuri, Saumyadipta Pyne, Howard Ho, Xiaohua Hu, Shipeng Yu, Morris Hui-I Hsiao, Jian Li
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 1807-1813
Number of pages: 7
ISBN (Electronic): 9781479999255
DOI: https://doi.org/10.1109/BigData.2015.7363953
State: Published - Dec 22 2015
Event: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015 - Santa Clara, United States
Duration: Oct 29 2015 to Nov 1 2015

Publication series

Name: Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

Other

Other: 3rd IEEE International Conference on Big Data, IEEE Big Data 2015
Country: United States
City: Santa Clara
Period: 10/29/15 to 11/1/15

Fingerprint

Learning systems
Classifiers
Learning algorithms
Experiments

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Computer Science Applications
  • Information Systems
  • Software

Cite this

Lin, H. T., Bui, N., & Honavar, V. (2015). Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies. In F. Luo, K. Ogan, M. J. Zaki, L. Haas, B. C. Ooi, V. Kumar, S. Rachuri, S. Pyne, H. Ho, X. Hu, S. Yu, M. H-I. Hsiao, ... J. Li (Eds.), Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015 (pp. 1807-1813). [7363953] (Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2015.7363953
Lin, Harris T. ; Bui, Ngot ; Honavar, Vasant. / Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies. Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015. editor / Feng Luo ; Kemafor Ogan ; Mohammed J. Zaki ; Laura Haas ; Beng Chin Ooi ; Vipin Kumar ; Sudarsan Rachuri ; Saumyadipta Pyne ; Howard Ho ; Xiaohua Hu ; Shipeng Yu ; Morris Hui-I Hsiao ; Jian Li. Institute of Electrical and Electronics Engineers Inc., 2015. pp. 1807-1813 (Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015).
@inproceedings{84e2769e196d40d88ac8ce1be4bf6cfb,
title = "Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies",
abstract = "Rapid growth of RDF data in the Linked Open Data (LOD) cloud offers unprecedented opportunities for analyzing such data using machine learning algorithms. The massive size and distributed nature of LOD cloud present a challenging machine learning problem where the data can only be accessed remotely, i.e. through a query interface such as the SPARQL end-point of the data store. Existing approaches to learning classifiers from RDF data in such a setting fail to take advantage of RDF schema (RDFS) associated with the data store that asserts subclass hierarchies which provide information that can potentially be exploited by the learner. Against this background, we present a general approach that augments an existing directed graphical model with hidden variables that encode subclass hierarchies via probabilistic constraints. We also present an algorithm ProbAVT that adopts the variational Bayesian expectation maximization approach to efficiently learn parameters in such settings. Our experiments with several synthetic and real world datasets show that: (i) ProbAVT matches or outperforms its counterpart that does not incorporate background knowledge in the form of subclass hierarchies; (ii) ProbAVT remains competitive compared to other state-of-art models that incorporate subclass hierarchies, and is able to scale up to large hierarchies consisting of over tens of thousands of nodes.",
author = "Lin, {Harris T.} and Ngot Bui and Vasant Honavar",
year = "2015",
month = "12",
day = "22",
doi = "10.1109/BigData.2015.7363953",
language = "English (US)",
series = "Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
pages = "1807--1813",
editor = "Feng Luo and Kemafor Ogan and Zaki, {Mohammed J.} and Laura Haas and Ooi, {Beng Chin} and Vipin Kumar and Sudarsan Rachuri and Saumyadipta Pyne and Howard Ho and Xiaohua Hu and Shipeng Yu and Hsiao, {Morris Hui-I} and Jian Li",
booktitle = "Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015",
address = "United States",

}

Lin, HT, Bui, N & Honavar, V 2015, Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies. in F Luo, K Ogan, MJ Zaki, L Haas, BC Ooi, V Kumar, S Rachuri, S Pyne, H Ho, X Hu, S Yu, MH-I Hsiao & J Li (eds), Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, 7363953, Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015, Institute of Electrical and Electronics Engineers Inc., pp. 1807-1813, 3rd IEEE International Conference on Big Data, IEEE Big Data 2015, Santa Clara, United States, 10/29/15. https://doi.org/10.1109/BigData.2015.7363953

TY - GEN

T1 - Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies

AU - Lin, Harris T.

AU - Bui, Ngot

AU - Honavar, Vasant

PY - 2015/12/22

Y1 - 2015/12/22

AB - Rapid growth of RDF data in the Linked Open Data (LOD) cloud offers unprecedented opportunities for analyzing such data using machine learning algorithms. The massive size and distributed nature of LOD cloud present a challenging machine learning problem where the data can only be accessed remotely, i.e. through a query interface such as the SPARQL end-point of the data store. Existing approaches to learning classifiers from RDF data in such a setting fail to take advantage of RDF schema (RDFS) associated with the data store that asserts subclass hierarchies which provide information that can potentially be exploited by the learner. Against this background, we present a general approach that augments an existing directed graphical model with hidden variables that encode subclass hierarchies via probabilistic constraints. We also present an algorithm ProbAVT that adopts the variational Bayesian expectation maximization approach to efficiently learn parameters in such settings. Our experiments with several synthetic and real world datasets show that: (i) ProbAVT matches or outperforms its counterpart that does not incorporate background knowledge in the form of subclass hierarchies; (ii) ProbAVT remains competitive compared to other state-of-art models that incorporate subclass hierarchies, and is able to scale up to large hierarchies consisting of over tens of thousands of nodes.

UR - http://www.scopus.com/inward/record.url?scp=84963749395&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84963749395&partnerID=8YFLogxK

U2 - 10.1109/BigData.2015.7363953

DO - 10.1109/BigData.2015.7363953

M3 - Conference contribution

T3 - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

SP - 1807

EP - 1813

BT - Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015

A2 - Luo, Feng

A2 - Ogan, Kemafor

A2 - Zaki, Mohammed J.

A2 - Haas, Laura

A2 - Ooi, Beng Chin

A2 - Kumar, Vipin

A2 - Rachuri, Sudarsan

A2 - Pyne, Saumyadipta

A2 - Ho, Howard

A2 - Hu, Xiaohua

A2 - Yu, Shipeng

A2 - Hsiao, Morris Hui-I

A2 - Li, Jian

PB - Institute of Electrical and Electronics Engineers Inc.

ER -

Lin HT, Bui N, Honavar V. Learning classifiers from remote RDF data stores augmented with RDFS subclass hierarchies. In Luo F, Ogan K, Zaki MJ, Haas L, Ooi BC, Kumar V, Rachuri S, Pyne S, Ho H, Hu X, Yu S, Hsiao MH-I, Li J, editors, Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015. Institute of Electrical and Electronics Engineers Inc. 2015. p. 1807-1813. 7363953. (Proceedings - 2015 IEEE International Conference on Big Data, IEEE Big Data 2015). https://doi.org/10.1109/BigData.2015.7363953