To improve the diagnostic accuracy of cervical dysplasia, it is important to fuse the multimodal information collected during a patient's screening visit. However, current multimodal frameworks suffer from low sensitivity at high specificity levels because of their limited ability to learn correlations among highly heterogeneous modalities. In this paper, we design a deep learning framework for cervical dysplasia diagnosis that leverages multimodal information. We first employ a convolutional neural network (CNN) to convert the low-level image data into a feature vector that can be fused with the non-image modalities. We then jointly learn the non-linear correlations among all modalities in a deep neural network. Our multimodal framework is an end-to-end deep network that learns complementary features from the image and non-image modalities. It automatically produces the final diagnosis for cervical dysplasia with 87.83% sensitivity at 90% specificity on a large dataset, significantly outperforming methods that use any single source of information alone as well as previous multimodal frameworks.
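The fusion step described above can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: it assumes a 128-dimensional CNN image embedding and a 10-dimensional non-image feature vector (hypothetical sizes), which are concatenated and passed through a small fully connected network with randomly initialized stand-in weights to produce a diagnosis score.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical dimensions: 128-d CNN image embedding, 10-d non-image features.
IMG_DIM, NONIMG_DIM, HIDDEN = 128, 10, 64

# Randomly initialized weights as stand-ins for parameters that would be
# learned end-to-end together with the CNN.
W1 = rng.normal(0.0, 0.1, (IMG_DIM + NONIMG_DIM, HIDDEN))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (HIDDEN, 1))
b2 = np.zeros(1)

def fuse_and_diagnose(img_feat, nonimg_feat):
    """Concatenate both modalities and score dysplasia risk with a small MLP."""
    x = np.concatenate([img_feat, nonimg_feat])
    h = relu(x @ W1 + b1)                # joint non-linear correlation layer
    return float(sigmoid(h @ W2 + b2))   # probability-like diagnosis score

img_feat = rng.normal(size=IMG_DIM)        # stand-in for the CNN feature vector
nonimg_feat = rng.normal(size=NONIMG_DIM)  # stand-in for clinical record features
score = fuse_and_diagnose(img_feat, nonimg_feat)
print(0.0 <= score <= 1.0)  # True: sigmoid output always lies in (0, 1)
```

In the full framework, the CNN and the fusion layers would be trained jointly so that the image embedding is optimized for fusion, rather than fixed in advance as in this sketch.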