Generative adversarial networks for increasing the veracity of big data

Matthew L. Dering, Conrad S. Tucker

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

11 Scopus citations

Abstract

This work describes how automated data generation integrates into a big data pipeline. A lack of veracity in big data can produce models that are inaccurate or biased by trends in the training data, and these problems become difficult to overcome as a pipeline matures. This work describes the use of a Generative Adversarial Network to generate sketch data, such as the sketches that might be used in a human verification task. The generated sketches are checked for recognizability using a crowd-sourcing methodology: generated sketches were correctly recognized 43.8% of the time, in contrast to human-drawn sketches, which were correctly recognized 87.7% of the time. The method is scalable and can be used to generate realistic data in many domains and to bootstrap a dataset for training a model prior to deployment.
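
The recognition rates quoted in the abstract are per-source fractions of crowd responses that name the correct sketch category. A minimal sketch of that calculation follows, assuming each response is recorded as a (source, true label, guessed label) triple. The function name `recognition_rate`, the record layout, and the toy responses are all hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict


def recognition_rate(responses):
    """Return, per source, the fraction of responses whose guess matches the true label.

    `responses` is an iterable of (source, true_label, guessed_label) triples.
    This record layout is an assumption made for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, true_label, guessed_label in responses:
        total[source] += 1
        # Compare labels case-insensitively and ignore surrounding whitespace.
        if guessed_label.strip().lower() == true_label.strip().lower():
            correct[source] += 1
    return {source: correct[source] / total[source] for source in total}


# Hypothetical toy responses, NOT data from the paper.
responses = [
    ("generated", "cat", "cat"),
    ("generated", "cat", "dog"),
    ("human", "cat", "cat"),
    ("human", "chair", "Chair"),
]
rates = recognition_rate(responses)
```

Under this layout, `rates["generated"]` and `rates["human"]` would correspond to the two percentages the abstract compares, computed here on the toy data only.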

Original language: English (US)
Title of host publication: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Editors: Jian-Yun Nie, Zoran Obradovic, Toyotaro Suzumura, Rumi Ghosh, Raghunath Nambiar, Chonggang Wang, Hui Zang, Ricardo Baeza-Yates, Ricardo Baeza-Yates, Xiaohua Hu, Jeremy Kepner, Alfredo Cuzzocrea, Jian Tang, Masashi Toyoda
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 2595-2602
Number of pages: 8
ISBN (Electronic): 9781538627143
DOIs: https://doi.org/10.1109/BigData.2017.8258219
State: Published - Jul 1 2017
Event: 5th IEEE International Conference on Big Data, Big Data 2017 - Boston, United States
Duration: Dec 11 2017 to Dec 14 2017

Publication series

Name: Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017
Volume: 2018-January

Other

Other: 5th IEEE International Conference on Big Data, Big Data 2017
Country: United States
City: Boston
Period: 12/11/17 to 12/14/17

All Science Journal Classification (ASJC) codes

  • Computer Networks and Communications
  • Hardware and Architecture
  • Information Systems
  • Information Systems and Management
  • Control and Optimization

  • Cite this

    Dering, M. L., & Tucker, C. S. (2017). Generative adversarial networks for increasing the veracity of big data. In J-Y. Nie, Z. Obradovic, T. Suzumura, R. Ghosh, R. Nambiar, C. Wang, H. Zang, R. Baeza-Yates, R. Baeza-Yates, X. Hu, J. Kepner, A. Cuzzocrea, J. Tang, & M. Toyoda (Eds.), Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017 (pp. 2595-2602). (Proceedings - 2017 IEEE International Conference on Big Data, Big Data 2017; Vol. 2018-January). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/BigData.2017.8258219