In this work, we propose a novel method for implementing an ultrafast ultrasound beamformer for plane wave imaging (PWI) on a field-programmable gate array (FPGA). First, we propose a modified delay calculation method that (1) separates the transmit and receive delays, (2) reduces the size of the delay profile, and (3) enables parallel beamforming through delay reuse and data vectorization. Second, we propose a parallelized implementation of the beamformer on a single FPGA that (1) loads a pre-calculated delay profile from external memory instead of calculating delays at run time, (2) vectorizes channel data fetching, (3) compensates the transmit and receive delays separately, and (4) uses fixed summing networks to reduce the consumption of logic resources. The proposed method is also highly scalable, which we demonstrate by implementing the beamformer at beamforming rates ranging from 2.4 to 9.6 gigasamples per second on three FPGAs of different sizes, from entry-level to high-end. Power consumption was less than 3 W at the 2.4 gigasamples per second beamforming rate, which demonstrates the feasibility of implementing ultrafast ultrasound imaging in a handheld probe. The FPGA beamformer's results were compared with those of the Verasonics CPU beamformer to verify that image quality was not compromised for speed.
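
To illustrate why separating the transmit and receive delays can shrink the delay profile, the following is a minimal Python sketch, not the implementation described in this work. It assumes a linear array whose lateral pixel spacing equals the element pitch, so that the receive delay depends only on depth and on the lateral offset between pixel column and element. All names and constants (`C`, `FS`, `PITCH`, `DZ`, `build_receive_table`, and so on) are hypothetical and chosen for illustration only.

```python
import math

C = 1540.0        # assumed speed of sound in tissue, m/s
FS = 40e6         # assumed sampling rate, Hz
PITCH = 0.3e-3    # assumed element pitch = lateral pixel spacing, m
DZ = C / FS / 2   # assumed axial pixel spacing, m (hypothetical choice)


def transmit_delay(x, z, theta):
    """Seconds for a plane wave steered by angle theta to reach (x, z)."""
    return (z * math.cos(theta) + x * math.sin(theta)) / C


def build_receive_table(num_depths, num_offsets):
    """Receive delays in seconds, indexed by [depth index][|column - element|]."""
    return [
        [math.hypot((iz + 1) * DZ, k * PITCH) / C for k in range(num_offsets)]
        for iz in range(num_depths)
    ]


def total_delay_samples(table, ix, iz, element, theta):
    """Round-trip delay in samples for pixel (ix, iz) and one receive element."""
    x, z = ix * PITCH, (iz + 1) * DZ
    t_tx = transmit_delay(x, z, theta)     # depends on pixel and angle only
    t_rx = table[iz][abs(ix - element)]    # same entry reused across columns
    return round((t_tx + t_rx) * FS)


def direct_delay_samples(ix, iz, element, theta):
    """Reference: the same delay computed without the shared table."""
    x, z = ix * PITCH, (iz + 1) * DZ
    t_rx = math.hypot(z, x - element * PITCH) / C
    return round((transmit_delay(x, z, theta) + t_rx) * FS)
```

Under these assumptions, the shared receive table holds one entry per (depth, lateral offset) pair rather than one per (depth, column, element) triple, and the angle-dependent transmit term is added separately. Whether the method proposed here uses this exact indexing is not stated in the abstract.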