Efficient and robust routing is central to wireless sensor networks (WSNs) that feature energy-constrained nodes, unreliable links, and frequent topology changes. Most existing routing techniques are designed to reduce routing cost by optimizing a single goal, e.g., routing path length, load balance, or retransmission rate. In real scenarios, however, these factors affect routing performance in complex ways, which calls for a more sophisticated scheme that makes the correct trade-offs among them. In this paper, we present a novel routing scheme, AdaR, that adaptively learns an optimal routing strategy with respect to multiple optimization goals. We base our approach on a least-squares reinforcement learning technique that is both data-efficient and insensitive to initial settings, and thus ideal for ad hoc sensor networks. Experimental results suggest a significant performance gain over a naïve Q-learning-based implementation.