Predicting user confusion can help improve information presentation on websites, mobile apps, and virtual reality interfaces. One promising information source for such prediction is eye-tracking data capturing gaze movements on the screen. Using eye-tracking data coupled with think-aloud records, we explore whether users' confusion is correlated primarily with fixation-level features. We find that a random forest achieves an accuracy of more than 70% when predicting user confusion using only fixation features. In addition, adding user-level features (age and gender) improves the accuracy to more than 90%. We also find that balancing the classes before training improves performance. We test two balancing algorithms, the Synthetic Minority Oversampling Technique (SMOTE) and Adaptive Synthetic Sampling (ADASYN), and find that SMOTE provides the larger performance increase. Overall, this research has implications for researchers interested in inferring users' cognitive states from eye-tracking data.
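To illustrate the class-balancing step, the following is a minimal sketch of the SMOTE idea in plain NumPy: each synthetic minority sample is an interpolation between an existing minority sample and one of its k nearest minority-class neighbours. The feature values below are hypothetical stand-ins for fixation-level features, not data from this study, and the abstract does not specify the authors' implementation (in practice, a library such as imbalanced-learn would typically be used).

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: synthesize n_new minority-class samples by
    interpolating between a sample and one of its k nearest minority
    neighbours."""
    rng = np.random.default_rng(rng)
    n = len(X_min)
    # pairwise distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # exclude self-matches
    nbrs = np.argsort(d, axis=1)[:, :k]      # k nearest neighbours per sample
    base = rng.integers(0, n, size=n_new)    # random seed samples
    pick = nbrs[base, rng.integers(0, min(k, n - 1), size=n_new)]
    u = rng.random((n_new, 1))               # interpolation weights in [0, 1)
    return X_min[base] + u * (X_min[pick] - X_min[base])

# hypothetical fixation-level feature vectors for the minority ("confused") class
X_min = np.array([[0.2, 1.0], [0.4, 1.2], [0.3, 0.9], [0.5, 1.1]])
X_syn = smote(X_min, n_new=6, k=2, rng=0)
print(X_syn.shape)  # six synthetic samples with the same feature dimension
```

ADASYN follows the same interpolation scheme but biases generation toward minority samples that are harder to learn (those with more majority-class neighbours).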