This paper describes the design and implementation of high performance micro-grained architectures. These architectures are capable of teraops performance. Each architecture is organized as a systolic array of processors. A prototyping system for the architectures is proposed. The prototyping system will provide control, I/O, and an interface to a host system for each of the micro-grained architectures. The prototyping system has been designed with flexibility in mind to support a wide variety of these micro-grained architectures. Beyond the research outlined here, we anticipate using the prototyping system as a `test-bed' for various class/student VLSI design projects within the department. Three micro-grained architectures are described: an associative memory-based architecture, a Mux-based architecture and a RAM-based architecture. These architectures will be useful for solving a number of important problems, such as: edge detection, locating connected components, two-dimensional signal and image processing, sorting elements, and performing element permutations.