The H.264/AVC standard has been widely used for video compression in various application domains. Motion estimation accounts for most of the computational workload of an H.264/AVC encoder, and memory optimization plays an increasingly important role in encoder design. First, the dependency between motion vectors was analyzed and removed at the cost of a small loss in estimation accuracy, and a 3-stage macroblock-level pipeline architecture was then proposed to increase the parallelism of motion estimation. Next, an optimized memory organization strategy for reference frame data was put forward to avoid frequent row changes during SDRAM access. Finally, based on the 3-stage pipeline structure, a shared cyclic search window memory was proposed: 1) the data correlation between adjacent macroblocks was analyzed, 2) the required search window memory size was determined, and 3) a slice-based structure and its working process were discussed. Analysis and experimental results show that 50% of the on-chip memory resources and 50% of the cycles spent on off-chip SDRAM access can be saved. The whole design was implemented in Verilog HDL and integrated into an H.264 encoder, which successfully demonstrated 1280×720 video at 30 fps running at 120 MHz on a Cyclone III FPGA development board.
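The reuse of reference data between adjacent macroblocks can be illustrated with a small software model. The sketch below is not the authors' Verilog design. It assumes a ±16-pixel search range, 16×16 macroblocks, and a single macroblock row, and it stores reference columns in a ring buffer at address `x mod capacity`. When motion estimation moves to the next macroblock, only one new 16-column strip is fetched and the oldest strip is overwritten in place. The class name `CyclicSearchWindow`, the `extra_strips` parameter (a stand-in for strips held by later pipeline stages), and the fetch counter are all hypothetical. The pixel counts it prints apply only to this toy model and are not the savings reported above.

```python
"""Toy model of a cyclic (ring-buffer) search window shared by consecutive
macroblocks in one macroblock row. All sizes and names are assumptions."""

MB = 16                  # macroblock width/height in pixels
SR = 16                  # assumed search range: +/-16 pixels
WIN_W = MB + 2 * SR      # 48 reference columns per search window
WIN_H = MB + 2 * SR      # 48 reference rows per search window


class CyclicSearchWindow:
    def __init__(self, frame, extra_strips=0):
        # frame(x, y) returns a reference pixel; x, y are padded coordinates
        # in which column 0 is the left edge of macroblock 0's search window.
        # extra_strips (hypothetical) reserves additional 16-column strips
        # for macroblocks still held by later pipeline stages.
        self.frame = frame
        self.cap = WIN_W + extra_strips * MB
        self.buf = [[0] * self.cap for _ in range(WIN_H)]
        self.loaded = 0      # reference columns written so far
        self.fetched = 0     # pixels read from (simulated) off-chip memory

    def _load_column(self, x):
        slot = x % self.cap  # cyclic addressing: the oldest column is overwritten
        for r in range(WIN_H):
            self.buf[r][slot] = self.frame(x, r)
        self.fetched += WIN_H

    def advance_to(self, mb_x):
        """Ensure the search window of macroblock mb_x is resident."""
        right = mb_x * MB + WIN_W
        while self.loaded < right:   # after the first window, only new columns are fetched
            self._load_column(self.loaded)
            self.loaded += 1

    def pixel(self, x, r):
        # A column is valid while loaded - cap <= x < loaded.
        assert self.loaded - self.cap <= x < self.loaded
        return self.buf[r][x % self.cap]


if __name__ == "__main__":
    win = CyclicSearchWindow(lambda x, y: 1000 * x + y)
    for mb_x in range(4):
        win.advance_to(mb_x)
    print(win.fetched)           # 4608: 48*48 for the first window, then 16*48 per macroblock
    print(4 * WIN_W * WIN_H)     # 9216: reloading every full window without reuse
```

Because of the modulo addressing, no stored pixels are ever shifted. In hardware this would correspond to rotating a column or bank pointer rather than moving data between memories, although how the paper's slice-based structure realizes this is not detailed here.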