Abstract
The last decade has witnessed a huge increase in data being ingested into the cloud from a variety of data sources. The ingested data takes various forms such as JSON, CSV, and binary formats. Traditionally, data is either ingested into storage in raw form, indexed ad hoc using range indices, or cooked into analytics-friendly columnar formats. None of these solutions can handle modern storage requirements: making the data available immediately for ad-hoc and streaming queries while ingesting at extremely high throughput. We demonstrate FishStore, our open-source, concurrent, latch-free storage layer for data with flexible schema. FishStore builds on recent advances in parsing and indexing techniques, and is based on multi-chain hash indexing of dynamically registered predicated subsets of data. We find predicated subset hashing to be a powerful primitive that supports a broad range of queries on ingested data and admits an implementation that outperforms current alternatives by up to an order of magnitude.
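
The idea of hash-indexing dynamically registered predicated subsets can be illustrated with a small sketch. This is not FishStore's implementation or API: it is a single-threaded toy (so neither concurrent nor latch-free), and the class `SubsetHashLog`, its methods, and the sample predicates are all hypothetical names chosen for illustration. Each registered predicate maps a record to a value (or `None` if the record is outside the subset), and every (predicate, value) pair gets its own backward-linked chain through an append-only log.

```python
import json


class SubsetHashLog:
    """Toy append-only log with one hash chain per (predicate id, value) pair.

    Illustrative only: all names here are assumptions, not FishStore's API.
    """

    def __init__(self):
        self.log = []     # entries are (record, back_pointers)
        self.preds = {}   # predicate id -> function(record) -> value or None
        self.heads = {}   # (predicate id, value) -> address of newest chain entry

    def register(self, pred_id, fn):
        # Predicates can be added at any time; they apply to later ingests only.
        self.preds[pred_id] = fn

    def ingest(self, raw):
        record = json.loads(raw)
        address = len(self.log)
        back = {}
        for pred_id, fn in self.preds.items():
            value = fn(record)
            if value is None:
                continue  # record is not part of this predicated subset
            key = (pred_id, value)
            back[key] = self.heads.get(key)  # link to previous entry, if any
            self.heads[key] = address        # this record becomes the chain head
        self.log.append((record, back))
        return address

    def scan(self, pred_id, value):
        # Walk one chain from newest to oldest, touching only matching records.
        key = (pred_id, value)
        address = self.heads.get(key)
        while address is not None:
            record, back = self.log[address]
            yield record
            address = back[key]


store = SubsetHashLog()
store.register("lang", lambda r: r.get("lang"))
store.register("popular", lambda r: True if r.get("likes", 0) > 100 else None)
store.ingest('{"id": 1, "lang": "en", "likes": 5}')
store.ingest('{"id": 2, "lang": "fr", "likes": 500}')
store.ingest('{"id": 3, "lang": "en", "likes": 900}')

ids_en = [r["id"] for r in store.scan("lang", "en")]
ids_popular = [r["id"] for r in store.scan("popular", True)]
```

In this sketch a single record can sit on several chains at once, which is what lets one log serve different kinds of queries without a separate copy of the data per index.
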
Original language | English (US) |
---|---|
Pages (from-to) | 1922-1925 |
Number of pages | 4 |
Journal | Proceedings of the VLDB Endowment |
Volume | 12 |
Issue number | 12 |
DOIs | |
State | Published - 2019 |
Event | 45th International Conference on Very Large Data Bases, VLDB 2019 - Los Angeles, United States. Duration: Aug 26 2019 → Aug 30 2019 |
All Science Journal Classification (ASJC) codes
- Computer Science (miscellaneous)
- Computer Science(all)