We consider an agent-target assignment problem in an unknown environment modeled as an undirected graph. Agents incur costs or collect rewards while traveling along the edges of this graph. Agents know neither the graph nor the locations of the targets on it; however, they can obtain local information about both through sensing and by communicating with other agents within a limited range. To solve this problem, we propose a new distributed algorithm that integrates Q-Learning with a distributed auction. The Q-Learning component estimates the assignment benefit of each agent-target pair by summing the rewards accumulated over the graph edges, while the auction component assigns agents to targets in a distributed fashion. The algorithm is shown to terminate with a near-optimal assignment in finite time. Here, optimality refers to maximizing the total assignment benefit, which may depend on both the intrinsic value of each agent-target pair and the routing cost incurred by the agent to reach the target.
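As a rough illustration of the auction component, the following sketch assigns agents to targets by iterative bidding on a fixed benefit matrix, which here stands in for the Q-learned sums of edge rewards. This is a minimal, centralized rendering of a Bertsekas-style auction under assumed names (`auction_assign`, `epsilon`, the toy `benefits` matrix); it is not the paper's distributed implementation.

```python
def auction_assign(benefits, epsilon=0.01):
    """Assign each agent (row) to a distinct target (column) by
    iterative bidding; returns an agent -> target mapping.

    `benefits[a][t]` plays the role of the Q-learned assignment
    benefit of agent a for target t (an assumption for this sketch).
    """
    n = len(benefits)
    prices = [0.0] * n        # current price of each target
    owner = [None] * n        # target index -> agent currently holding it
    unassigned = list(range(n))
    while unassigned:
        agent = unassigned.pop(0)
        # Net value of each target at the current prices.
        values = [benefits[agent][t] - prices[t] for t in range(n)]
        best = max(range(n), key=lambda t: values[t])
        best_val = values[best]
        second_val = max(v for t, v in enumerate(values) if t != best)
        # Bid up the best target's price; epsilon guarantees progress
        # and bounds the final suboptimality by n * epsilon.
        prices[best] += best_val - second_val + epsilon
        if owner[best] is not None:
            unassigned.append(owner[best])  # evict the previous holder
        owner[best] = agent
    return {owner[t]: t for t in range(n)}

# Toy benefit matrix for 3 agents x 3 targets (hypothetical values).
benefits = [[5.0, 1.0, 2.0],
            [1.0, 4.0, 1.0],
            [2.0, 1.0, 6.0]]
assignment = auction_assign(benefits)  # -> {0: 0, 1: 1, 2: 2}
```

In a distributed realization, the prices and bids would be exchanged among neighboring agents rather than held in shared arrays, consistent with the limited communication range assumed above.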