There has been relatively little analytical work on processor optimizations for multimedia applications. With the introduction of MMX by Intel, it is clear that this is an area of increasing importance. Building on previous work, we propose optimizations for multimedia architectures that support independent parallel execution of instructions within dynamically assembled traces, resulting in dramatic performance improvements. Specifically, we propose simplified instruction scheduling and register renaming algorithms due to constraints on trace formation. In addition, we suggest specific instruction pool and trace cache parameters. We constructed a simulator in order to measure the benefits of these processor optimizations for multimedia applications. The simulated machine, which could fetch/decode 2 instructions per cycle, performed better than a superscalar machine that could fetch/decode 8 instructions per cycle. Execution rates as high as 7.3 instructions per cycle were achieved for the benchmarks simulated, assuming 16 instructions per trace.