Mobile devices such as smartphones enable users to generate and share videos at an increasing rate. In some cases, these videos contain valuable information that can be exploited for a variety of purposes. However, instead of centrally collecting and processing videos for information retrieval, we consider crowdprocessing videos, where each mobile device locally processes its stored videos. While the computational capability of mobile devices continues to improve, processing videos using deep learning, i.e., convolutional neural networks, is still a demanding task for mobile devices. To this end, we design and build CrowdVision, a computing platform that enables mobile devices to crowdprocess videos using deep learning in a distributed and energy-efficient manner by leveraging cloud offload. CrowdVision processes videos quickly and efficiently with offload under various settings and network connections, and greatly outperforms the existing computation offload framework (e.g., with a 2× speed-up). In doing so, CrowdVision tackles several challenges: (i) how to exploit the computational characteristics of deep learning for video processing; (ii) how to parallelize processing and offloading for acceleration; and (iii) how to optimize both time and energy at runtime simply by determining the right moments to offload.