Learning under a Wasserstein loss, a.k.a. Wasserstein loss minimization (WLM), is an emerging research topic for gaining insights from a large set of structured objects. Despite being conceptually simple, WLM problems are computationally challenging because they involve minimizing over functions of quantities (i.e. Wasserstein distances) that themselves require numerical algorithms to compute. In this paper, we introduce a stochastic approach based on simulated annealing for solving WLMs. Particularly, we have developed a Gibbs sampler to approximate effectively and efficiently the partial gradients of a sequence of Wasserstein losses. Our new approach has the advantages of numerical stability and readiness for warm starts. These characteristics are valuable for WLM problems that often require multiple levels of iterations in which the oracle for computing the value and gradient of a loss function is embedded. We applied the method to optimal transport with Coulomb cost and the Wasserstein non-negative matrix factorization problem, and made comparisons with the existing method of entropy regularization.