TY - JOUR
T1 - I Spy You
T2 - Eavesdropping Continuous Speech on Smartphones via Motion Sensors
AU - Zhang, Shijia
AU - Liu, Yilin
AU - Gowda, Mahanth
N1 - Funding Information:
We sincerely thank the editors and reviewers for their comments and feedback. This research was partially supported by NSF grant CNS-2008384.
Publisher Copyright:
© 2023 ACM.
PY - 2023/1/11
Y1 - 2023/1/11
N2 - This paper presents iSpyU, a system that demonstrates the feasibility of recognizing natural speech content played on a phone during conference calls (Skype, Zoom, etc.) using a fusion of motion sensors such as the accelerometer and gyroscope. While an app needs user permission to access the microphone, motion sensors are zero-permission sensors and are thus accessible to a developer without alerting the user. This allows a malicious app to potentially eavesdrop on sensitive speech content played by the user's phone. In designing the attack, iSpyU tackles a number of technical challenges, including: (i) the low sampling rate of motion sensors (500 Hz, compared with 44 kHz for a microphone); and (ii) the lack of large-scale training datasets for training Automatic Speech Recognition (ASR) models on motion sensor data. iSpyU systematically addresses these challenges through a combination of techniques in synthetic training data generation, ASR modeling, and domain adaptation. Extensive measurement studies on modern smartphones show a word-level accuracy of 53.3-59.9% over a dictionary of 2000-10000 words, and a character-level accuracy of 70.0-74.8%. We believe such levels of accuracy pose a significant threat when viewed from a privacy perspective.
UR - http://www.scopus.com/inward/record.url?scp=85146435299&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85146435299&partnerID=8YFLogxK
U2 - 10.1145/3569486
DO - 10.1145/3569486
M3 - Article
AN - SCOPUS:85146435299
SN - 2474-9567
VL - 6
JO - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
JF - Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies
IS - 4
M1 - 197
ER -