技術探索

A Low-complexity Coding Algorithm for H.264/AVC

  Abstract—In the present video coding standards, the capacities of frame memory and bus bandwidth are the major aspects that affect the performance of hardware. Frame memory compression(FMC) is a technique using spatial correlation of frame for reducing memory bandwidth to improve the using performance of memory and bus. This paper proposes a novel high compression rate, but low computation complexity coding scheme for the FMC.The proposed approach uses the spatial correlation between pixels through special compression’s order, and employs Golomb-Rice coding to encode the magnitude of pixel difference according to its occurring frequency. The rate-control of encoded bit streams is implemented with the concept of GOB (Group of Blocks), to achieve higher compression efficiency goal.

  Index Terms—FMC, GOB, Golomb-Rice Coding, videocompression

I. INTRODUCTION

  THE capacities of frame memory and bus bandwidth are major aspects that significantly affect the performance of hardware for present main standard video compression.The video processor is responsible to process the video data, such as motion estimation, intra prediction in MPEG-4, H.264/AVC codec. The overall performance of this multimedia hardware architecture depends on the capacity of external storage and system bus bandwidth as a result of the video processor accesses the reference frames that store in the external video frame memory through the system bus.

Fig. 1. An example of video processing hardware architecture with the FMC hardware units.Fig. 1. An example of video processing hardware architecture with the FMC hardware units.

  A valid approach to improve the system’s efficacy is to embed frame memory compression (FMC) hardware unit between the video processor and the system bus as shown in Figure 1. The FMC encoder provides high efficient compression to frame data that processed by the video processor, reduces the data transfer rate, and lowers the usage of the system bus bandwidth and memory. The video processor reads the compressed reference frame data from the external video frame memory and passes them to the FMC decoder to reconstruct the original reference frame data. Even though the transmission data between the system bus and the external memory can be reduced with implementing the FMC algorithm,the latency for the encoding and the decoding process of FMC also delays the data process and transmission time. Meanwhile,  

  C.-C. Lin, J.-S. Tu, Y.-J. Chang, and C.-L. Lin are in the Department of Information and Communications Research Laboratories, Industrial Technology Research Institute, Hsinchu, Taiwan. (e-mail: JackLin@itri.org.tw,sunriseJSTu@itri.org.tw, britpablo@itri.org.tw, chunlung@itri.org.tw).the hardware complexity of the video processor increases. If the FMC lowers the latency by using lossy compression, the quality of the reference frame will degrade, and the error propagation of the distortion will significantly affect the video encoding system’s performance. In other words, an efficient FMC algorithm includes four properties, low latency, random accessibility, low complexity and near lossless.

  Many researches have been made to achieve these four purposes. Transforming pixels from spatial domain into frequency domain, and using the energy aggregation in frequency domain for efficient compression is a common technique. For example, pixels are transformed into frequency domain by hierarchically conversion, such as Haar transform,and then the encoder quantizes the frequency domain coefficients followed by variable length encoding [3]. Previous techniques also take different consideration like pixel data of greyscale [4], and compression by fixed length or variable length encoding. For compression efficiency and low complexity, another example is to utilize a modified Hadamard transform and Golomb-Rice coding. The compression ratio is fixed at 50% [6] [7]. Frame memory compression techniques for a specific purpose have been proposed. For display devices,the modified Hadamard transform followed by Golomb-Rice coding have been used to achieve memory reduction for display devices [5] [8]. For mobile video applications, the discrete cosine transform and the modified bit plane zonal coding are employed for transformation and compression [10][16].However, the transform-based approach, in general, not only needs many additional computational resources but also highly complicates the hardware complexity. In addition, this external storage, but also improves the efficiency of accessing the system bus and the external storage. The proposed algorithm can apply to the present video compression standard such as H.264/AVC, to improve the performance of a video compression or decompression hardware system. At the same time, the quality of video is guaranteed, the reference frames can be restored in lossless way and the error propagation of the distortion can be prevented. The proposed FMC algorithm makes effective use of spatial correlation between pixels through special compression’s order, and employs variable length coding (Golomb-Rice encoding) approach. In addition to encode the magnitude of pixel difference optimally according to its occurring frequency, the concept of GOB(Group of Blocks) is implemented for rate control. An advantage resulting from GOB is that the bits can be shared in a group and improve the usage of bits.

II. PROPOSED FMC ALGORITHM

  The proposed algorithm is designed for input video compression, the video can be a static image or dynamic frame from a video, and can be applied for any application that required video compression. The FMC compression is a block-based compression, input of the compression unit is a block of the video. Figure 2 shows the flowchart of the proposed algorithm,including quantization, prediction,Golomb-Rice coding,padding,cutting and GOB-based rate control.The bit removed in quantization step will be stored for packing step afterward, the details will be explained later. In Golomb-Rice coding step, a modified Golomb-Rice coding has been applied for better compression efficiency in the algorithm even other variable length coding approaches are also acceptable.

  Quantization quantities every pixel and removes the bits from the binary representation of pixel value. The number of the bits to be removed is determined through QP (Quantization Parameter), which is assigned by users. No bits will be removed when QP equals to 0, and 1 bit will be removed when QP equals to 1. Figure 3 shows how the quantization process removes the bits. In the binary representation of the pixel value, the order of removing bits starts from the least significant bit to the most significant. In other word, the least significant bit will be removed at first, and the most significant bit will be the last.The quantization step in the proposed algorithm will retain the removed bits, while others quantization step will drop the bits.In the packing step, if there is space left in the result of the compression, the retained bits can be used for compensation.

Fig. 2. A flowchart of the proposed FMC algorithmFig. 2. A flowchart of the proposed FMC algorithm
Fig. 3. An example of quantization process.Fig. 3. An example of quantization process.

  A new prediction structure has been proposed here, the prediction uses the boundary and the internal pixel values as shown in Figure 4. Every rectangular box in Figure 4 represents a pixel and the number in the box is the coding order. The prediction starts from the coding order 0 pixel (bottom right corner pixel) which is predicted from the left adjacent pixel (L3)and the upper adjacent pixel (U3). Next, the coding order 1 pixel is predicted from the coding order 0 pixel and the upper adjacent pixel (U3). Then, the coding order 2 pixel can be predicted from the coding order 0 pixel and the coding order 1 pixel. All the predictions are performed according to the order of the prediction structure in Figure 4. The magnitude of the prediction error is residual value, and the residual values of all the pixels constitute a residual map. The residual values are small and focused on few specific values when the correlation between two pixels is higher. Due to this characteristic, the algorithm adopts the concept of Golomb-Rice coding and modifies Golomb-Rice coding for probability distribution to compress the residual values efficiently. If none of the adjacent pixel value is available, the internal encoded pixel value is the only input for the prediction as shown in Figure 5. If only one of the upper adjacent pixel or the left adjacent pixel is available,the prediction works with available adjacent pixel values(upper or left) and the internal encoded pixel value (see Figure6 ).

Fig. 4. An example of the proposed prediction structure.Fig. 4. An example of the proposed prediction structure.
Fig. 5. An example of the proposed prediction structure without any adjacent pixel value.Fig. 5. An example of the proposed prediction structure without any adjacent pixel value.
Fig. 6. An example of the proposed prediction structure when only the left adjacent pixel values are valid for prediction.Fig. 6. An example of the proposed prediction structure when only the left adjacent pixel values are valid for prediction.

  Golomb-Rice coding is a variable length coding, it encodes the most frequent occurrence to the shortest bit stream whereas the less frequent occurrence will use longer bit stream.Therefore, the performance of Golomb-Rice coding depends on the distribution of the input data. The proposed algorithm provides four probability distributions of Golomb-Rice coding according to the statistics and each probability distribution corresponding to a Golomb-Rice coding table. Each residual value in a compression unit adopts different Golomb-Rice coding table for variable length coding. Each entry of the Golomb-Rice coding table includes at least two columns which are coding value and code word. Association between the coding value and the code word is many to one, in other words,different coding value can associate with the same code word.Numbers of the entries can be reduced by replacing numerous coding values with a represented coding value. Each table includes a special entry, which code word is all 0 or 1.

  Distortion is unavoidable from the quantization step for each pixel coding result, and the maximum distortion can go to 1 bit error. Since Golomb-Rice coding result is variable length, the result of the encoded bit streams might be shorter than the maximum allowed bit length. Therefore, the removed bits at the quantization step can be used as padding bits to compensate the distortion if these situations occur. The proposed FMC algorithm applies an important compensation order using the bit importance, bits will be remanded according the order until all pixels compensation done or reach the maximum length allowed

  For the situation Golomb-Rice coding resulting a bit stream exceeds the maximum allowed length for each pixel, the algorithm proposes an overflow cutting. The cutting removes the code words or the residual value’s code word that exceed the limited length. Newly useable space after the cutting will be filled up by another suitable code word, which length is shorter than the useable space and the residual value error of the pixel is the lowest. Repeating the padding step until no available suitable code word or all pixels are encoded. If there is a space left when the cutting step is done, padding step will compensate the distortion with the space. At last, the remaining space will fill by all zero.

  In order to achieve high compression performance, random accessibility, and bit rate control purposes, the FMC algorithm employs the concept of Group of Blocks (GOB). As shown in Figure 7, a block is a compression unit, and a group of blocks is a random access unit and also a rate control unit. Blocks in the same group share parts of the useable bit spaces, if one of a block has high compression efficiency, in other word, the encoded bit stream of that block is short, then the remaining space of it will be included into a shared pool. When other blocks in the same group need space more than the available space, spaces available in the shared pool can be provided to them. The GOB-Based bit rate control mechanism is as follows.The blocks in the same GOB will provide shared bit rate si to be shared before compression. Let SP(i) be the total shared bit rate and the initial value is  T(i) is the target bit rate,pi is the useable bit rate after the target bit rate deduct the sharedbit rate pi = T(i) - si , block_len(i) is the i-th block’s result of the bit stream length. The total of the useable bit rate can becalculated as follows:

Fig. 7. A conceptual diagram of Group of Blocks.Fig. 7. A conceptual diagram of Group of Blocks.

III. EXPERIMENTAL RESULTS

  Implementation of the proposed algorithm and the previous work approach [14] has been used for comparison.The two algorithms are evaluated with the settings in Table 1, which are the eight common used 1080p video sequences: Pedestrian Area, Blue Sky, Riverbed, Tractor, Toys and Calendar, Rush Hour, Station2 and Sunflower. Each video sequence has different number of frames, from 217 frames to 690 frames.Comparison of the two algorithms is done under QP=0 and QP=1. The experimental results of the two algorithms are shown in Table 2. The best result is all blocks can encode with 100% of QP=0, which means all blocks are compressed lossless and meet the target bit rate. In other word, the ratio of QP=0 stands for the ratio of blocks that can be compressed lossless under the condition of the target bit rate. Higher total ratio in Table 2 represents more blocks can be encoded with slightly distortion (QP=1) under conforming the target bit rate.The result shows that the proposed FMC algorithm has higher ratio of blocks can be compressed lossless. Compared with the method of [14], the total ratio of the proposed algorithm is also higher, and it can always reach 94%.

  Size of GOB is also an aspect that affects the performance of the algorithm; therefore, different size of GOB is also adopted with our proposed FMC algorithm for the experiments. The result shows that 16x16 GOB is the best size for the compression efficiency of the proposed algorithm, which is to say the rate control based on GOB concept can positively improve the general compression efficiency.

Table 1. Settings of the evaluationsTable 1. Settings of the evaluations
Table 2. The experimental results of the proposed algorithm and [14] under QP=0 and QP=1 testing conditions.Table 2. The experimental results of the proposed algorithm and [14] under QP=0 and QP=1 testing conditions.
Table 3. Settings of the evaluationsTable 3. Settings of the evaluations

IV. CONCLUSIONS

  This paper proposes a low computation complexity coding scheme for the frame memory compression. The proposed FMC algorithm achieves higher compression efficiency than other previous work. By holding the removed bit at the quantization step for compensation, the distortion can be reduced to the lowest. Residual values resulted from the prediction through the internal and the boundary pixels are encoded by Golomb-Rice coding. Golomb-Rice coding can be replaced by other variable length coding. However, four Golomb-Rice coding tables designed according to the probability distribution in the proposed algorithm are provided to improve the coding efficiency. In the proposed approach, the rate-control of the encoded bit streams is implemented with the concept of GOB, to achieve higher compression efficiency goal.

  As a result, higher compression efficiency is achieved than with previous work [14]. The proposed algorithm will provide better performance with 16x16 GOB.

REFERENCES

  1. Oscal Tzyh-Chiang Chen et al., "Method for Reducing Buffered-Frame Memory Sizes and Accesses in a Video Codec", Jan. 2005 (US 2006/0171685 A1)
  2. Oscal Tzyh-Chiang Chen et al., "Method for Reducing Buffered-Frame Memory Sizes and Accesses in a Video Codec", Feb. 2005 (TW I277013(2))
  3. Faramarz Azadegan et al., " System and Method of Video Frame Memory Reduction of Video Decoders", Jan. 2000 (US 6,693,961 B1)
  4. Jeongnam Youn, "Lossy frame memory compression using intra refresh",Mar. 2008 (US 2009/0245391 A1)
  5. T.B. Yng. B.k. Lee; H. Yoo; , "A low complexity and lossless frame memory compression for display devices," Consumer Electronics, IEEE Transactions on, vol. 54, no. 3, pp. 1453-1458, August 2008.
  6. T. Y. Lee, "A new frame-recompression algorithm and its hardware design for MPEG-2 video decoders," Circuits and Systems for Video Technology, IEEE Transactions on, vol. 13, no. 6, pp. 529 - 534, June 2003.
  7. T. Y. Lee, "A new algorithm and its implementation for frame recompression," Consumer Electronics, IEEE Transactions on , vol. 47,no. 4, pp. 849-854, Nov 2001.
  8. T. L. B. Yng, B.-G. Lee, H. Yoo, "Low Complexity, Lossless Frame Memory Compression using Modified Hadamard Transform and Adaptive Golomb-Rice Coding," IADIS International Conference Computer Graphics and Visualization, pp. 89-96, 2008.
  9. Y. Lee, C.-E. Rhee, H.-J. Lee, "A New Frame Recompression Algorithm Integrated with H.264 Video Compression," Circuits and Systems, IEEE International Symposium on, pp.1621-1624, May 2007.
  10. Y. D. Wu, Y. Li, and C. Y. Lee, “A Novel Embedded Bandwidth-Aware Frame Compressor for Mobile Video Applications,” IEEE Intelligent Signal Processing and Communication System, pp. 1-4, Feb. 2009.
  11. S.-H. Lee, M.-K. Chung, S.-M. Park, C.-M. Kyung, "Lossless frame memory recompression for video codec preserving random accessibility of coding unit," Consumer Electronics, IEEE Transactions on, vol. 55, no.4, pp. 2105-2113, Nov. 2009
  12. S. Lee, N. Eum, M.-K. Chung, C.-M. Kyung, "Low latency variable length coding scheme for frame memory recompression," Multimedia and Expo, IEEE International Conference on, pp.232-237, July 2010.
  13. C.-C. Chen, S.-S. Chen, O.T.C. Chen, C.-F. Chen, and F.-C. Chen,"Effective memory reduction scheme used for reference frames in H.264/AVC video codec," Circuits and Systems, IEEE International Symposium on, pp. 1549- 1552, Aug. 2005.
  14. Y. Jin, Y. Lee, and H.-J. Lee, “A New Frame Memory Compression Algorithm with DPCM and VLC in a 4×4 Block,” EURASIP Journal on Advances in Signal Processing, vol. 2009, Article ID 629285, 18 pages,2009.
  15. N.Y.-C. Chang, T. -S. Chang, "Combined frame memory architecture for motion compensation in video decoding", IEEE International Symposium on Circuits and Systems, vol. 2, pp. 1806-1809, May, 2005
  16. M. Choi, J. Kim, I. J. Chang, W.-K. Cho, "Low-complexity frame scheduler using shared frame memory for multi-view video coding",International SoC Design Conference, pp. 498-502, Nov. 2012

Ching-Chieh Lin received the M.S.degree and in electronic engineering from National Central University, Taiwan. He joined Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, in 2009. He is also an active participant of ISO/IEC Moving Picture Experts Group(MPEG) and Joint Collaborative Team on Video Coding (JCT-VC). His current research interests include image processing and video processing.

Jih-Sheng Tu received the B.S.degree in computer and information science from National Chiao Tung University,Hsinchu, Taiwan, R.O.C., in 2005, and the M.S. degree in information system and Applications from National Tsing Hua University, Hsinchu, Taiwan, in 2007. He joined the Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, in 2007. His current research interests include video compression and signal processing.

Yao-Jen Chang received his B.S. degree in electrical engineering from Chung Yuan Christian University,Taiwan,in 2004 and the M.S. degree and the Ph.D.degree in communication engineering from National Central University,Taiwan,in 2006 and 2010, respectively. Since 2011, he has been a researcher in Industrial Technology Research Institute, Taiwan. Today,Dr.Chang is an active participant in Joint Collaborative Team on Video Coding (JCT-VC),which is established by ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG).His current research interests include: high efficiency video coding and screen content coding.

Chun-Lung Lin received the B.S.degree in mathematics from National Tsing Hua University,Hsinchu,Taiwan, R.O.C., in 2001,the M.S.degree in computer science from National Chiao Tung University, Hsinchu, Taiwan, in 2003, and the Ph.D. degree in computer science from National Tsing Hua University, in 2010.He joined the Industrial Technology Research Institute (ITRI),Hsinchu, Taiwan, in 2010 as a Researcher. His current research interests include image processing, video processing and sensor networks.