Text extraction from smartphone screenshots to archive in situ Media Behavior

Agnese Chiatti, Xiao Yang, Miriam Brinberg, Mu Jung Cho, Anupriya Gagneja, Nilam Ram, Byron Reeves, C. Lee Giles

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

8 Scopus citations

Abstract

Life experiences are increasingly intertwined with digital devices, which makes screens a preferred, if not required, data source for behavioral studies and health interventions. Extracting text from digital screenshots is therefore a key prerequisite for accurate analyses of media behaviors. A data set of smartphone screenshots offers the opportunity (i) to test existing image processing and text recognition methods, and (ii) to identify and discuss the computational challenges specific to this case. Our aim is to assess whether and how state-of-the-art methodologies can be applied to this novel data set. We show that combining OpenCV-based pre-processing with a long short-term memory (LSTM) based release of Tesseract OCR, without ad hoc training, achieved 74% text accuracy at the character level. We discuss the incidence of different error factors and their implications for the quality of the extracted text, and outline future research trajectories.
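To make the pipeline and the character-level figure concrete, the following is a minimal sketch, not the authors' implementation. It assumes that "character accuracy" means one minus the edit distance between OCR output and ground truth, divided by the ground-truth length; the abstract does not state the exact formula. The pre-processing steps (grayscale conversion and Otsu binarization) and the Tesseract options (`--oem 1` for the LSTM engine, `--psm 6`) are likewise illustrative assumptions, and the function names `levenshtein`, `char_accuracy`, and `ocr_screenshot` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) between two strings."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def char_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - edit_distance / len(reference), floored at 0."""
    if not reference:
        return 1.0 if not hypothesis else 0.0
    return max(0.0, 1.0 - levenshtein(reference, hypothesis) / len(reference))


def ocr_screenshot(path: str) -> str:
    """Hypothetical pipeline: OpenCV pre-processing, then Tesseract's LSTM engine.

    Requires opencv-python, pytesseract, and a Tesseract 4+ install;
    imports are local so the metric above works without them.
    """
    import cv2
    import pytesseract

    image = cv2.imread(path)
    if image is None:
        raise FileNotFoundError(path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Otsu binarization is an assumed choice; the paper's exact steps may differ.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    # --oem 1 selects the LSTM engine; --psm 6 assumes a uniform block of text.
    return pytesseract.image_to_string(binary, config="--oem 1 --psm 6")
```

Under this assumed definition, a 74% character accuracy corresponds to roughly 26 character edits per 100 ground-truth characters. A typical use would be `char_accuracy(ground_truth, ocr_screenshot("screenshot.png"))`, where the file name is a placeholder.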

Original language: English (US)
Title of host publication: Proceedings of the Knowledge Capture Conference, K-CAP 2017
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450355537
DOIs: https://doi.org/10.1145/3148011.3154468
State: Published - Dec 4 2017
Event: 9th International Conference on Knowledge Capture, K-CAP 2017 - Austin, United States
Duration: Dec 4 2017 to Dec 6 2017

Publication series

Name: Proceedings of the Knowledge Capture Conference, K-CAP 2017

Other

Other: 9th International Conference on Knowledge Capture, K-CAP 2017
Country: United States
City: Austin
Period: 12/4/17 to 12/6/17

All Science Journal Classification (ASJC) codes

  • Computational Theory and Mathematics
  • Software
  • Computer Science Applications
  • Information Systems

Cite this

Chiatti, A., Yang, X., Brinberg, M., Cho, M. J., Gagneja, A., Ram, N., Reeves, B., & Giles, C. L. (2017). Text extraction from smartphone screenshots to archive in situ Media Behavior. In Proceedings of the Knowledge Capture Conference, K-CAP 2017 [40] (Proceedings of the Knowledge Capture Conference, K-CAP 2017). Association for Computing Machinery, Inc. https://doi.org/10.1145/3148011.3154468