In recent years, significant research efforts have been spent towards building intelligent and user-friendly IoT systems to enable a new generation of applications capable of performing complex sensing and recognition tasks. In many of such applications, there are usually multiple different sensors monitoring the same object. Each of these sensors can be regarded as an information source and provides us a unique “view” of the observed object. Intuitively, if we can combine the complementary information carried by multiple sensors, we will be able to improve the sensing performance. Towards this end, we propose DeepFusion, a unified multi-sensor deep learning framework, to learn informative representations of heterogeneous sensory data. DeepFusion can combine different sensors’ information weighted by the quality of their data and incorporate cross-sensor correlations, and thus can benefit a wide spectrum of IoT applications. To evaluate the proposed DeepFusion model, we set up two real-world human activity recognition testbeds using commercialized wearable and wireless sensing devices. Experiment results show that DeepFusion can outperform the state-of-the-art human activity recognition methods.