FishStore: Fast ingestion and indexing of raw data

Badrish Chandramouli, Dong Xie, Yinan Li, Donald Kossmann

Research output: Contribution to journal, Conference article, peer-review

1 Scopus citation

Abstract

The last decade has witnessed a huge increase in data being ingested into the cloud from a variety of data sources. The ingested data takes various forms such as JSON, CSV, and binary formats. Traditionally, data is either ingested into storage in raw form, indexed ad hoc using range indices, or cooked into analytics-friendly columnar formats. None of these solutions is able to handle modern requirements on storage: making the data available immediately for ad hoc and streaming queries while ingesting at extremely high throughputs. We demonstrate FishStore, our open-source concurrent latch-free storage layer for data with flexible schema. FishStore builds on recent advances in parsing and indexing techniques, and is based on multi-chain hash indexing of dynamically registered predicated subsets of data. We find predicated subset hashing to be a powerful primitive that supports a broad range of queries on ingested data and admits a higher-performance implementation (by up to an order of magnitude) than current alternatives.

Original language: English (US)
Pages (from-to): 1922-1925
Number of pages: 4
Journal: Proceedings of the VLDB Endowment
Volume: 12
Issue number: 12
State: Published - 2018
Event: 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States
Duration: Aug 26 2019 → Aug 30 2019

All Science Journal Classification (ASJC) codes

  • Computer Science (miscellaneous)
  • Computer Science (all)
