Unstructured data extraction in distributed NoSQL

Richard Kwadzo Lomotey, Ralph Deters

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

1 Citation (Scopus)

Abstract

While 'Big data' has made voluminous data easily accessible, it has also brought challenges. Existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on a Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper extends the initially deployed version by introducing a reusable dictionary and association rules that improve the quality of the extracted terms. In addition, the tool in its present stage adapts better to user searches based on the most frequently searched term.
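The abstract credits two refinements to TouchR: association rules over the extracted terms and adaptation to the most frequently searched term. This record does not describe how the paper implements either, so what follows is only a minimal Python sketch under stated assumptions, not TouchR's method: each NoSQL document is assumed to be already reduced to a set of extracted terms (the HMM extraction step is not shown), rules are limited to single-term pairs a → b scored with the standard support and confidence measures, and `mine_pair_rules`, `SearchLog`, `suggest`, the thresholds and the sample terms are all hypothetical.

```python
# Hypothetical sketch; not TouchR's implementation.
from collections import Counter
from itertools import permutations


def mine_pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Mine single-term association rules a -> b from per-document term sets.

    support(a -> b)    = fraction of documents containing both a and b
    confidence(a -> b) = count(a and b) / count(a)
    Returns (a, b, support, confidence) tuples, highest confidence first.
    """
    n = len(transactions)
    if n == 0:
        return []
    item_counts = Counter()
    pair_counts = Counter()
    for terms in transactions:
        unique = set(terms)
        item_counts.update(unique)
        # Count every ordered pair of distinct terms once per document.
        pair_counts.update(permutations(sorted(unique), 2))
    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        confidence = both / item_counts[a]
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    # Ties are broken by support, then alphabetically, so output is deterministic.
    rules.sort(key=lambda r: (-r[3], -r[2], r[0], r[1]))
    return rules


class SearchLog:
    """Hypothetical log of searched terms (case-insensitive counts)."""

    def __init__(self):
        self.counts = Counter()

    def record(self, term):
        self.counts[term.lower()] += 1

    def most_frequent(self, k=1):
        return [term for term, _ in self.counts.most_common(k)]


def suggest(term, rules, log, k=3):
    """Related terms for `term`, ranked by search count, then rule confidence."""
    related = [(b, conf) for a, b, _, conf in rules if a == term]
    related.sort(key=lambda x: (-log.counts[x[0]], -x[1], x[0]))
    return [b for b, _ in related[:k]]


# Example: each set stands for the terms extracted from one NoSQL document.
docs = [
    {"patient", "diagnosis", "diabetes"},
    {"patient", "diabetes", "insulin"},
    {"patient", "diagnosis"},
    {"insulin", "diabetes"},
]
rules = mine_pair_rules(docs)
log = SearchLog()
for q in ["Diagnosis", "diagnosis", "insulin"]:
    log.record(q)

print(rules[0])                        # ('diagnosis', 'patient', 0.5, 1.0)
print(log.most_frequent())             # ['diagnosis']
print(suggest("patient", rules, log))  # ['diagnosis', 'diabetes']
```

Restricting rules to term pairs keeps the sketch short; a fuller miner such as Apriori would also produce rules with multi-term antecedents. Ranking suggestions by search count before confidence is one possible reading of "adapts to user searches", chosen here for illustration only.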

Original language: English (US)
Title of host publication: 2013 7th IEEE International Conference on Digital Ecosystems and Technologies
Subtitle of host publication: Smart Planet and Cyber Physical Systems as Embodiment of Digital Ecosystems, DEST 2013
Pages: 160-165
Number of pages: 6
DOI: 10.1109/DEST.2013.6611347
State: Published - Oct 22 2013
Event: 2013 7th IEEE International Conference on Digital Ecosystems and Technologies: Smart Planet and Cyber Physical Systems as Embodiment of Digital Ecosystems, DEST 2013 - Menlo Park, CA, United States
Duration: Jul 24 2013 to Jul 26 2013

Publication series

Name: IEEE International Conference on Digital Ecosystems and Technologies
ISSN (Print): 2150-4938
ISSN (Electronic): 2150-4946

Other

Other: 2013 7th IEEE International Conference on Digital Ecosystems and Technologies: Smart Planet and Cyber Physical Systems as Embodiment of Digital Ecosystems, DEST 2013
Country: United States
City: Menlo Park, CA
Period: 7/24/13 to 7/26/13

Fingerprint

Association rules
Hidden Markov models
Glossaries
Data mining
Big data

All Science Journal Classification (ASJC) codes

  • Computer Graphics and Computer-Aided Design
  • Computer Networks and Communications
  • Environmental Engineering

Cite this

Lomotey, R. K., & Deters, R. (2013). Unstructured data extraction in distributed NoSQL. In 2013 7th IEEE International Conference on Digital Ecosystems and Technologies: Smart Planet and Cyber Physical Systems as Embodiment of Digital Ecosystems, DEST 2013 (pp. 160-165). [6611347] (IEEE International Conference on Digital Ecosystems and Technologies). https://doi.org/10.1109/DEST.2013.6611347
@inproceedings{f2ef0ddeaad14762b0f47a389b25bce5,
title = "Unstructured data extraction in distributed NoSQL",
abstract = "While 'Big data' has made voluminous data easily accessible, it has also brought challenges. Existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on a Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper extends the initially deployed version by introducing a reusable dictionary and association rules that improve the quality of the extracted terms. In addition, the tool in its present stage adapts better to user searches based on the most frequently searched term.",
author = "Lomotey, {Richard Kwadzo} and Ralph Deters",
year = "2013",
month = "10",
day = "22",
doi = "10.1109/DEST.2013.6611347",
language = "English (US)",
isbn = "9781479907861",
series = "IEEE International Conference on Digital Ecosystems and Technologies",
pages = "160--165",
booktitle = "2013 7th IEEE International Conference on Digital Ecosystems and Technologies",
}

TY - GEN

T1 - Unstructured data extraction in distributed NoSQL

AU - Lomotey, Richard Kwadzo

AU - Deters, Ralph

PY - 2013/10/22

Y1 - 2013/10/22

N2 - While 'Big data' has made voluminous data easily accessible, it has also brought challenges. Existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on a Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper extends the initially deployed version by introducing a reusable dictionary and association rules that improve the quality of the extracted terms. In addition, the tool in its present stage adapts better to user searches based on the most frequently searched term.

AB - While 'Big data' has made voluminous data easily accessible, it has also brought challenges. Existing Knowledge Discovery in Databases (KDD) processes, which were proposed for schema-oriented data sources, are no longer applicable because today's data is unstructured. Previously, we deployed a tool called TouchR, which relies on a Hidden Markov Model (HMM) to extract terms from unstructured data sources (specifically, NoSQL databases). This paper extends the initially deployed version by introducing a reusable dictionary and association rules that improve the quality of the extracted terms. In addition, the tool in its present stage adapts better to user searches based on the most frequently searched term.

UR - http://www.scopus.com/inward/record.url?scp=84885820360&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84885820360&partnerID=8YFLogxK

U2 - 10.1109/DEST.2013.6611347

DO - 10.1109/DEST.2013.6611347

M3 - Conference contribution

AN - SCOPUS:84885820360

SN - 9781479907861

T3 - IEEE International Conference on Digital Ecosystems and Technologies

SP - 160

EP - 165

BT - 2013 7th IEEE International Conference on Digital Ecosystems and Technologies

ER -
