Recent advances in attention networks have gained enormous interest in time series data mining. Various attention mechanisms are proposed to soft-select relevant timestamps from temporal data by assigning learnable attention scores. However, many real-world tasks involve complex multivariate time series that continuously measure target from multiple views. Different views may provide information of different levels of quality varied over time, and thus should be assigned with different attention scores as well. Unfortunately, the existing attention-based architectures cannot be directly used to jointly learn the attention scores in both time and view domains, due to the data structure complexity. Towards this end, we propose a novel multi-view attention network, namely MuVAN, to learn fine-grained attentional representations from multivariate temporal data. MuVAN is a unified deep learning model that can jointly calculate the two-dimensional attention scores to estimate the quality of information contributed by each view within different timestamps. By constructing a hybrid focus procedure, we are able to bring more diversity to attention, in order to fully utilize the multi-view information. To evaluate the performance of our model, we carry out experiments on three real-world benchmark datasets. Experimental results show that the proposed MuVAN model outperforms the state-of-the-art deep representation approaches in different real-world tasks. Analytical results through a case study demonstrate that MuVAN can discover discriminative and meaningful attention scores across views over time, which improves the feature representation of multivariate temporal data.