Multiplication is the dominant operation for many applications implemented on field-programmable gate arrays (FPGAs). Although most current FPGA families have embedded hard multipliers, soft multipliers using lookup tables (LUTs) in the logic fabric remain important. This paper presents a novel circuit that combines radix-4 partial-product generation with addition (patent pending) and shows how it can be used to implement two's-complement multipliers. Single-cycle and pipelined designs for 8×8, 10×10, 12×12, 14×14 and 16×16 multipliers are compared to Xilinx LogiCORE IP multipliers. Proposed single-cycle parallel-tree multipliers use 35% to 45% fewer LUTs and have 9% to 22% less delay than LogiCORE IP multipliers. Proposed pipelined parallel-tree multipliers use 32% to 40% fewer LUTs than LogiCORE IP multipliers. Proposed parallel-array multipliers use even fewer LUTs than parallel-tree multipliers at the expense of increased delay.