Ubiquitous finger motion tracking enables a number of exciting applications in augmented reality, sports analytics, rehabilitation and healthcare, etc. While camera-based finger motion tracking is mature, largely due to the availability of massive training datasets, there is a dearth of training data for developing robust machine learning (ML) models for wearable IoT devices equipped with Inertial Measurement Unit (IMU) sensors. Toward addressing this problem, this paper presents ZeroNet, a system that demonstrates the feasibility of developing ML models for IMU sensors with zero training overhead. ZeroNet harvests training data from publicly available videos to perform inferences on IMU data. The mismatch between the video and IMU domains introduces a number of challenges arising from differences in sensor and camera coordinate systems, users' body sizes, speed and orientation changes during gesturing, sensor position variations, etc. ZeroNet addresses these challenges by systematically extracting motion data from videos and transforming it into the acceleration and orientation information measured by IMU sensors. Furthermore, ZeroNet exploits data-augmentation techniques that create synthetic variations in the harvested training data, enhancing the generalizability and robustness of the ML models to user diversity. An evaluation with 10 users demonstrates a top-1 accuracy of 82.4% and a top-3 accuracy of 94.8% for recognition of 50 finger gestures, indicating promise. While we have only scratched the surface, we outline a number of interesting possibilities for extending this work in the cross-disciplinary areas of computer vision, machine learning, and wearable IoT to enable novel applications in finger motion tracking.
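The video-to-IMU transform mentioned above can be illustrated with a minimal sketch: differentiating a tracked keypoint trajectory twice yields linear acceleration, which can then be re-expressed in the sensor frame as the specific force an accelerometer would report. This is a hypothetical illustration, not ZeroNet's actual pipeline; the function names, the known rotation `R`, and the metric keypoint coordinates are all assumptions.

```python
import numpy as np

def video_to_imu_accel(positions, fps):
    """Estimate acceleration from a video keypoint trajectory (hypothetical sketch).

    positions: (T, 3) keypoint coordinates in meters, camera frame
    fps: video frame rate in Hz
    """
    dt = 1.0 / fps
    vel = np.gradient(positions, dt, axis=0, edge_order=2)  # m/s
    acc = np.gradient(vel, dt, axis=0, edge_order=2)        # m/s^2
    return acc

def camera_to_sensor(acc_camera, R):
    """Express acceleration as the specific force an accelerometer measures.

    An accelerometer reports f = R^T (a - g); R is the (assumed-known)
    rotation from the sensor frame to the camera frame.
    """
    g = np.array([0.0, 0.0, -9.81])  # gravity along the camera frame's -z axis (assumption)
    return (acc_camera - g) @ R      # row-vector form of R^T (a - g)
```

For a stationary sensor with identity orientation, `camera_to_sensor` returns the familiar +9.81 m/s^2 reading along z, matching what a real accelerometer measures at rest.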
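The data-augmentation idea described above can likewise be sketched: small random rotations model sensor-orientation variation, amplitude scaling models body-size differences, and time stretching models gesturing-speed changes. The parameter ranges below are illustrative assumptions, not values from the paper.

```python
import numpy as np

def augment_imu_sample(signal, rng):
    """Create one synthetic variation of a harvested (T, 3) IMU signal.

    Hypothetical sketch of augmentation for user diversity: random rotation,
    amplitude scaling, and time stretching, with assumed parameter ranges.
    """
    # 1. Random rotation about the vertical axis (sensor orientation variation).
    theta = rng.uniform(-np.pi / 6, np.pi / 6)
    c, s = np.cos(theta), np.sin(theta)
    Rz = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    out = signal @ Rz.T

    # 2. Amplitude scaling (body-size variation).
    out = out * rng.uniform(0.8, 1.2)

    # 3. Time stretching via linear interpolation (gesturing-speed variation).
    T = len(out)
    stretch = rng.uniform(0.8, 1.2)
    src = np.linspace(0.0, 1.0, T)
    dst = np.linspace(0.0, 1.0, max(2, int(round(T * stretch))))
    out = np.stack([np.interp(dst, src, out[:, k]) for k in range(3)], axis=1)
    return out
```

Applying such a pass many times to each harvested sample would multiply the effective size of the video-derived training set without any additional data collection.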