A semantic framework is proposed for information fusion in sensor networks, targeting object and situation assessment. The overall vision is to construct machine representations that enable human-like perceptual understanding of observed scenes through the fusion of heterogeneous sensor data. To this end, the framework is hierarchical and is based on the Data Fusion Information Group (DFIG) model. Unlike a simple set-theoretic information fusion methodology, which loses information, the framework models relational dependencies as cross-machines, called relational Probabilistic Finite State Automata (PFSA), built using the xD-Markov machine construction. This leads to a tractable approach for modeling composite patterns as structured sets for both object and scene representation. An illustrative example demonstrates the superior capability of the proposed methodology for pattern classification in urban scenarios.
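To make the idea of a cross-machine concrete, the following is a minimal sketch, not the authors' implementation. It assumes the common reading of the xD-Markov construction: the state is the last `depth` symbols of one symbolized sensor stream, and the machine stores the empirical probability of the next symbol in a second stream given that state. The function name `estimate_cross_morph`, the parameter names, and the toy sequences are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def estimate_cross_morph(source, target, depth=1):
    """Estimate a cross-machine emission table (illustrative assumption).

    State    : tuple of the last `depth` symbols observed in `source`.
    Emission : the symbol of `target` at the current time step.
    Returns a dict mapping state -> {target_symbol: probability}.
    """
    if len(source) != len(target):
        raise ValueError("source and target must have the same length")
    if depth < 1:
        raise ValueError("depth must be at least 1")

    # Count how often each target symbol follows each source state.
    counts = defaultdict(Counter)
    for t in range(depth, len(source)):
        state = tuple(source[t - depth:t])
        counts[state][target[t]] += 1

    # Normalize each row of counts into a probability distribution.
    morph = {}
    for state, row in counts.items():
        total = sum(row.values())
        morph[state] = {symbol: n / total for symbol, n in row.items()}
    return morph


# Toy symbol streams from two hypothetical sensors.
stream_a = list("abababab")
stream_b = list("xyxyxyxy")
cross = estimate_cross_morph(stream_a, stream_b, depth=1)
for state, row in sorted(cross.items()):
    print(state, row)
```

The resulting table captures how one stream conditions the other, which is the kind of relational dependency that a plain union of per-sensor features would not retain. A real system would also need a symbolization (partitioning) step for raw sensor data, which is omitted here.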