As product designs become more sophisticated, both finite element (FE) and design optimization (DO) models have grown larger. At the same time, there is increasing evidence that computing clusters built from commodity chips are capable of outperforming traditional supercomputers. This paper discusses the HYI-3D design optimization software system, which has been developed to work in both sequential and distributed processing environments. The two major objectives are (a) to implement the FE methodology using the well-known domain decomposition technique, in which the original FE model is split into a number of smaller subdomains, and (b) to implement a design optimization methodology for sizing, shape and topology optimization using coarse- and fine-grain parallelism. For the FE engine, a direct sparse solver is used at the subdomain level and the preconditioned conjugate gradient (PCG) method is used at the interface system level. The finite element equations are generated and assembled at the individual subdomain level in parallel. Matrix and vector operations involving sparse matrices form the bulk of the computations in this step. Once these equations are assembled, the condensed system-level equations are formed. These condensed equations are usually much smaller than the original system equations but also denser, and hence can still be computationally expensive. With respect to design optimization, multi-level parallelism is employed: not only can the finite element analysis be carried out in parallel, but so can other steps in the design optimization algorithm, namely gradient evaluation, the line search and the direction-finding problem. Numerical examples show the gains obtained from coarse- and fine-grain parallelism.
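
The substructuring scheme summarized above can be illustrated with a minimal Python sketch. It is not the HYI-3D implementation; it only assumes the standard condensation of interior unknowns onto the interface. The 1D model stiffness matrix, the three-subdomain partition, the dense Gaussian elimination standing in for a direct sparse solver, the Jacobi preconditioner built from the interface block diagonal, and all function names are assumptions made for illustration. The subdomain loops run sequentially here; in a distributed setting each subdomain solve would presumably be assigned to its own process.

```python
# Illustrative sketch only: substructuring on a small 1D model problem.
# All names are hypothetical and not taken from HYI-3D.

def dense_solve(A, b):
    """Gaussian elimination with partial pivoting (stand-in for a direct sparse solver)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            m = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= m * M[k][c]
    x = [0.0] * n
    for k in reversed(range(n)):
        s = sum(M[k][c] * x[c] for c in range(k + 1, n))
        x[k] = (M[k][n] - s) / M[k][k]
    return x

def matvec(A, x):
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def sub(A, rows, cols):
    """Extract the block of A with the given row and column indices."""
    return [[A[i][j] for j in cols] for i in rows]

def pcg(apply_A, b, diag, tol=1e-10, max_iter=100):
    """Jacobi-preconditioned conjugate gradient for a matrix-free SPD operator."""
    x = [0.0] * len(b)
    r = b[:]
    z = [ri / di for ri, di in zip(r, diag)]
    p = z[:]
    rz = sum(ri * zi for ri, zi in zip(r, z))
    for _ in range(max_iter):
        if sum(ri * ri for ri in r) ** 0.5 < tol:
            break
        Ap = apply_A(p)
        alpha = rz / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [ri / di for ri, di in zip(r, diag)]
        rz_new = sum(ri * zi for ri, zi in zip(r, z))
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x

def solve_by_substructuring(K, f, interiors, interface):
    B = interface
    K_BB = sub(K, B, B)
    # Per-subdomain blocks: interior-interior, interior-interface, interface-interior.
    blocks = [(I, sub(K, I, I), sub(K, I, B), sub(K, B, I)) for I in interiors]

    # Condensed right-hand side: g = f_B - sum_i K_BI K_II^{-1} f_I.
    g = [f[i] for i in B]
    for I, K_II, K_IB, K_BI in blocks:
        y = dense_solve(K_II, [f[i] for i in I])
        g = [gi - ci for gi, ci in zip(g, matvec(K_BI, y))]

    # Matrix-free action of the condensed (Schur complement) operator.
    def apply_S(x):
        out = matvec(K_BB, x)
        for I, K_II, K_IB, K_BI in blocks:
            y = dense_solve(K_II, matvec(K_IB, x))
            out = [oi - ci for oi, ci in zip(out, matvec(K_BI, y))]
        return out

    diag = [K_BB[i][i] for i in range(len(B))]  # assumed Jacobi preconditioner
    u_B = pcg(apply_S, g, diag)

    # Back-substitute for the interior unknowns of each subdomain.
    u = [0.0] * len(f)
    for i, ui in zip(B, u_B):
        u[i] = ui
    for I, K_II, K_IB, K_BI in blocks:
        rhs = [f[i] - ci for i, ci in zip(I, matvec(K_IB, u_B))]
        for i, ui in zip(I, dense_solve(K_II, rhs)):
            u[i] = ui
    return u

# Model problem: tridiagonal stiffness matrix, unit load, three subdomains.
n = 11
K = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
f = [1.0] * n
u = solve_by_substructuring(K, f, [[0, 1, 2], [4, 5, 6], [8, 9, 10]], [3, 7])
u_ref = dense_solve(K, f)  # monolithic reference solution
```

In this sketch the condensed operator is never formed explicitly; each PCG iteration applies it through one direct solve per subdomain, which is one common way to avoid storing the dense interface matrix. Whether HYI-3D forms the condensed matrix explicitly or applies it implicitly is not stated in the text above.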