There is an emerging demand for data fusion techniques and algorithms capable of combining conventional "hard" sensor inputs, such as video, radar, and multispectral sensor data, with "soft" data, including textual situation reports and open-source web information, and with "hard/soft" data such as image or video data that includes human-generated annotations. New techniques that assist in sense-making over a wide range of vastly heterogeneous sources are critical to improving tactical situational awareness in counterinsurgency (COIN) and other asymmetric warfare situations. A major challenge in this area is the lack of realistic datasets available for test and evaluation of such algorithms. While "soft" message sets exist, they tend to be of limited use for data fusion applications because they lack critical message pedigree and other metadata. They also lack corresponding hard sensor data that presents reasonable "fusion opportunities" for evaluating the ability to make connections and inferences that span the soft and hard datasets. This paper outlines the design methodologies, content, and some potential use cases of a COIN-based synthetic soft and hard dataset created under a United States Multi-disciplinary University Research Initiative (MURI) program funded by the U.S. Army Research Office (ARO). The dataset includes realistic synthetic reports from a variety of sources, corresponding synthetic hard data, and an extensive supporting database that maintains "ground truth" through logical grouping of related data into "vignettes." The supporting database also maintains the pedigree of messages and other critical metadata.