Recent trends indicate that system intelligence is moving from main computational units to peripherals. In particular, several studies have shown that it is feasible to build an intelligent disk architecture by executing parts of the application code on an embedded processor attached to the disk system. This paper focuses on such an architecture and addresses the problem of deciding which parts of the application code should execute on that embedded processor. We target image and video processing applications, in which large data sets (mostly arrays) must be processed. To divide the work between the disk system and the host system, we use an optimizing compiler to identify computations that exhibit a filtering characteristic; that is, computations whose output data sets are much smaller than their input data sets. Performing such computations on the disk substantially reduces the volume of data that needs to be communicated from the disk to the host system. Our experimental results show significant improvements in execution cycles for six applications.
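As an illustration only, the filtering characteristic can be read as a ratio of output size to input size. The following minimal sketch is not the compiler analysis used in this work: the `Computation` record, the `place` function, and the 0.1 threshold are hypothetical names and values chosen for the example, and the sizes are hand-computed for a 512x512 8-bit image.

```python
from dataclasses import dataclass


@dataclass
class Computation:
    """Hypothetical size summary of one computation (e.g., a loop nest)."""
    name: str
    input_bytes: int   # bytes read from disk-resident arrays
    output_bytes: int  # bytes that would be sent on to the host


def filtering_ratio(c: Computation) -> float:
    """Output size divided by input size; small values indicate filtering."""
    if c.input_bytes <= 0:
        raise ValueError("input_bytes must be positive")
    return c.output_bytes / c.input_bytes


def place(c: Computation, threshold: float = 0.1) -> str:
    """Assumed placement rule: run on the disk if the ratio is at or below
    the threshold, otherwise on the host. The threshold is illustrative."""
    return "disk" if filtering_ratio(c) <= threshold else "host"


# Histogram of a 512x512 8-bit image: 256 bins with 4-byte counters.
histogram = Computation("histogram", 512 * 512, 256 * 4)
# Pixel-wise brightness adjustment: output is as large as the input.
brighten = Computation("brighten", 512 * 512, 512 * 512)

for comp in (histogram, brighten):
    print(comp.name, place(comp))
```

Under these assumptions the histogram, which reduces 262,144 input bytes to 1,024 output bytes, is a candidate for execution on the disk, whereas the brightness adjustment transfers as much data as it reads and gains nothing from being moved there.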