A potential kick-off for the development of a next-generation video compression standard is emerging. The informal Joint Video Exploration Team (JVET) of ITU-T SG16 and MPEG issued a draft Call for Evidence (CfE) [link] at its meeting in January, which has been approved by its parent bodies. The final CfE document is foreseen to be approved at the following meeting in April 2017.

Inter prediction

In HEVC, inter prediction is performed on the prediction block (PB) level. The corresponding prediction unit (PU) contains the information on how inter prediction is performed.

Inter prediction is called motion compensated prediction since shifted areas of the reference pictures are used for prediction of the current PB. The resulting displacement between the area in the reference picture and the current PB is interpreted as the motion of the area between the reference picture and the current picture.

Motion compensated prediction can be performed using one or two reference pictures as the prediction source. The number of available prediction sources depends on the slice type of the slice the current PU belongs to.
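The use of one or two prediction sources can be sketched as follows. This is a minimal illustration, not the normative HEVC process: it assumes integer-sample motion vectors only (no fractional-sample interpolation, no weighted prediction), pictures stored as lists of rows, and coordinates clamped to the picture border. The function names `fetch_block` and `predict` are hypothetical.

```python
def fetch_block(ref, x, y, w, h, mv):
    """Copy a w x h block from `ref`, displaced by the integer motion vector mv = (dx, dy).

    Coordinates outside the picture are clamped to the border (a simplification).
    """
    dx, dy = mv
    rows, cols = len(ref), len(ref[0])
    block = []
    for j in range(h):
        yy = min(max(y + dy + j, 0), rows - 1)
        row = []
        for i in range(w):
            xx = min(max(x + dx + i, 0), cols - 1)
            row.append(ref[yy][xx])
        block.append(row)
    return block


def predict(sources, x, y, w, h):
    """Motion compensated prediction from one or two (reference picture, mv) pairs.

    One source: the displaced block is the prediction (uni-prediction).
    Two sources: the two displaced blocks are averaged with rounding (bi-prediction).
    """
    blocks = [fetch_block(ref, x, y, w, h, mv) for ref, mv in sources]
    if len(blocks) == 1:
        return blocks[0]
    if len(blocks) == 2:
        a, b = blocks
        return [[(a[j][i] + b[j][i] + 1) >> 1 for i in range(w)] for j in range(h)]
    raise ValueError("expected one or two prediction sources")
```

With a single `(ref, mv)` pair the call returns the displaced block itself; with two pairs it returns their rounded average.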

Leading and Trailing Pictures

HEVC comprises a large number of different picture types. The picture types are indicated in the NAL unit headers of the NAL units carrying the slices of the pictures. Thereby, essential properties of the NAL unit payload are available at a very high level for use by applications.

The picture types include the following:

  • Random access point pictures, where a decoder may start decoding a coded video sequence. These are referred to as Intra Random Access Point (IRAP) pictures. Three IRAP picture types exist: Instantaneous Decoder Refresh (IDR), Clean Random Access (CRA), and Broken Link Access (BLA). The decoding process for a coded video sequence always starts at an IRAP.
  • Leading pictures, which precede a random access point picture in output order but are coded after it in the coded video sequence. Leading pictures which are independent of pictures preceding the random access point in coding order are called Random Access Decodable Leading pictures (RADL). Leading pictures which use pictures preceding the random access point in coding order for prediction might be corrupted if decoding starts at the corresponding IRAP. These are called Random Access Skipped Leading pictures (RASL).
  • Trailing pictures, which follow the IRAP in both decoding and output order, and follow the leading pictures in decoding order.
  • Pictures at which the temporal resolution of the coded video sequence may be switched by the decoder: Temporal Sublayer Access (TSA) and Stepwise Temporal Sublayer Access (STSA).
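Since the picture type is signalled by the NAL unit type, an application can classify pictures with a simple lookup. The sketch below is illustrative only: the numeric values are those of the NAL unit type table of the H.265 specification as recalled here and should be checked against the specification before use. The function names are hypothetical.

```python
def classify_nal_unit_type(nut):
    """Map an HEVC nal_unit_type value to a coarse picture category.

    Values follow the NAL unit type table of H.265 (to be verified against the spec).
    """
    if nut in (0, 1):
        return "trailing"
    if nut in (2, 3):
        return "TSA"
    if nut in (4, 5):
        return "STSA"
    if nut in (6, 7):
        return "RADL"
    if nut in (8, 9):
        return "RASL"
    if 16 <= nut <= 18:
        return "BLA"
    if nut in (19, 20):
        return "IDR"
    if nut == 21:
        return "CRA"
    return "other"


def is_irap(nut):
    """IRAP pictures occupy the range 16..23 (22 and 23 are reserved IRAP types)."""
    return 16 <= nut <= 23


def is_leading(nut):
    """Leading pictures are the RADL and RASL types."""
    return classify_nal_unit_type(nut) in ("RADL", "RASL")
```

A decoder that starts at a CRA picture could, for example, use `classify_nal_unit_type(nut) == "RASL"` to identify the associated pictures it should skip.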

The figure shows a coding structure with random access points, as well as leading and trailing pictures (illustrative example, not a recommended structure).

Sample Adaptive Offset (SAO) is a sample-based filtering operation that operates on a CTU basis.

If activated, it is applied right after the deblocking filter. Two SAO modes are specified: edge offset and band offset. The former is driven by local directional structures in the picture to be filtered, whereas the latter modifies the intensity values of the samples without a dependency on the neighborhood.
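The difference between the two modes can be illustrated with a short sketch. It is a simplified illustration under stated assumptions, not the normative process: samples are plain integers, band offset uses 32 equal-width bands with offsets for four consecutive bands, and edge offset classification is shown for a single sample and its two neighbors along one direction. The function names are hypothetical.

```python
def sao_band_offset(samples, band_position, offsets, bit_depth=8):
    """Band offset: add an offset to samples whose intensity falls into one of
    four consecutive bands starting at `band_position` (32 bands in total).

    The decision depends only on the sample value, not on its neighbors.
    """
    shift = bit_depth - 5              # 32 bands -> upper 5 bits select the band
    max_val = (1 << bit_depth) - 1
    out = []
    for s in samples:
        k = ((s >> shift) - band_position) & 31   # band index relative to start, with wrap-around
        if k < 4:
            s = min(max(s + offsets[k], 0), max_val)
        out.append(s)
    return out


def _sign(x):
    return (x > 0) - (x < 0)


def sao_edge_category(a, c, b):
    """Edge offset: classify sample c against its neighbors a and b along the
    chosen direction. 1 = local minimum, 2 and 3 = edge (concave / convex),
    4 = local maximum, 0 = none (no offset applied)."""
    s = _sign(c - a) + _sign(c - b)
    return {-2: 1, -1: 2, 1: 3, 2: 4}.get(s, 0)
```

For 8-bit video each band spans eight intensity values, so `band_position=8` selects the sample values 64 to 95.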

HEVC Intra Prediction Modes

A set of 35 intra prediction modes is available in HEVC, including a DC, a planar, and 33 angular prediction modes. While the DC and the planar mode target flat areas or areas with little structure, the angular modes provide directional prediction at a very fine granularity.

The modes are available for prediction block sizes from 4×4 to 32×32 samples. For luma and chroma blocks, the same prediction modes are applied. Some of the smoothing operations applied for luma intra prediction are omitted for chroma blocks, as further detailed below.

The prediction reference is constructed from the sample row and column adjacent to the predicted block. The reference extends over two times the block size in the horizontal and vertical directions, using the available samples from previously reconstructed blocks.
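As an example of how the reference samples are used, the DC mode can be sketched as follows. This is a simplified illustration: it assumes that all reference samples are available, uses only the N samples directly above and the N samples directly to the left of an N×N block, and omits the boundary smoothing applied to luma blocks. The function name `intra_dc_predict` is hypothetical.

```python
def intra_dc_predict(top, left):
    """DC intra prediction for an N x N block.

    `top` holds the N reference samples directly above the block, `left` the N
    samples directly to its left. Every predicted sample is set to the rounded
    mean of these 2N reference samples (boundary smoothing omitted).
    """
    n = len(top)
    if len(left) != n or n not in (4, 8, 16, 32):
        raise ValueError("expected equally long reference arrays of size 4, 8, 16 or 32")
    dc = (sum(top) + sum(left) + n) // (2 * n)   # + n rounds to nearest
    return [[dc] * n for _ in range(n)]
```

The angular and planar modes read the same reference row and column, but additionally make use of the extended part of the reference.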

Illustration for motion estimation

Motion estimation in a search range around the collocated block in the reference picture

The motion estimation stage operates on a prediction block level and is part of the encoder only. The estimator takes the current prediction block to be coded and tries to find the best matching area in an available reference picture. What constitutes the best match depends on the employed cost criterion.
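A minimal sketch of such a search is given below. It assumes an exhaustive (full) search over integer displacements and the sum of absolute differences (SAD) as the cost criterion; both are choices made for this illustration, and practical encoders typically use faster search patterns and rate-aware cost functions. The function names are hypothetical.

```python
def sad(cur, ref, cx, cy, rx, ry, size):
    """Sum of absolute differences between the size x size block of `cur` at
    (cx, cy) and the block of `ref` at (rx, ry)."""
    return sum(
        abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def full_search(cur, ref, cx, cy, size, search_range):
    """Test every integer displacement within +/- search_range around the
    collocated position and return (best_mv, best_cost).

    Candidates reaching outside the reference picture are skipped.
    """
    rows, cols = len(ref), len(ref[0])
    best_mv, best_cost = None, None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + size > cols or ry + size > rows:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, size)
            if best_cost is None or cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The returned displacement `(dx, dy)` is the motion vector candidate that minimizes the chosen cost within the search range.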

NAL Unit raw byte sequence payload

The network abstraction layer (NAL) defines the encapsulation of the coded video data for transportation and storage. The encoded information is organized in NAL units, which contain either data of the video coding layer (VCL) or non-VCL data.

A NAL unit consists of a two-byte NAL unit header and the raw byte sequence payload (RBSP). The NAL unit header includes the syntax element for the NAL unit type (NUT), which indicates the content of the NAL unit. The last byte of the RBSP containing coded data includes the RBSP stop bit (‘1’) followed by ‘0’ bits such that byte alignment is achieved. The concatenation of the bits of the RBSP bytes before the RBSP stop bit constitutes the string of data bits (SODB). The SODB contains all coded data that is needed for the decoding process. Within the SODB, the bits of the bytes are concatenated from the most to the least significant bit.
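The header layout and the role of the RBSP stop bit can be sketched as follows. The field names and bit widths (1, 6, 6, and 3 bits) follow the NAL unit header syntax of the H.265 specification; the function names are hypothetical, and the second function assumes that it is handed the RBSP bytes as described above rather than a raw byte stream.

```python
def parse_nal_unit_header(header):
    """Split the two-byte HEVC NAL unit header into its four fields."""
    if len(header) < 2:
        raise ValueError("NAL unit header needs two bytes")
    b0, b1 = header[0], header[1]
    return {
        "forbidden_zero_bit": b0 >> 7,
        "nal_unit_type": (b0 >> 1) & 0x3F,
        "nuh_layer_id": ((b0 & 0x01) << 5) | (b1 >> 3),
        "nuh_temporal_id_plus1": b1 & 0x07,
    }


def sodb_bit_length(rbsp):
    """Return the number of SODB bits in an RBSP.

    The stop bit is the last '1' bit of the RBSP; all bits before it, read
    from the most to the least significant bit of each byte, form the SODB.
    """
    i = len(rbsp) - 1
    while i >= 0 and rbsp[i] == 0:     # skip any all-zero bytes after the stop bit
        i -= 1
    if i < 0:
        raise ValueError("no RBSP stop bit found")
    last = rbsp[i]
    trailing_zeros = (last & -last).bit_length() - 1   # alignment zeros after the stop bit
    return i * 8 + (7 - trailing_zeros)
```

For instance, the header bytes `0x26 0x01` yield `nal_unit_type` 19, and an RBSP ending in the byte `0x80` carries no SODB bits in that last byte.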

The deblocking filter applies an adaptive smoothing filter operation across the boundaries of prediction and transform blocks inside the reconstructed picture.

The filtering strength depends on properties of the neighboring blocks, and the local decision whether to apply the deblocking filter depends on the difference between the edge sample values. If filtering across slice or tile boundaries is disabled, these boundaries are treated as if they were picture boundaries.
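How the filtering strength can follow from the properties of the two blocks adjacent to an edge is sketched below. The sketch is heavily simplified and rests on assumptions: each block is described by a hypothetical `BlockInfo` record with a single motion vector (in quarter-sample units) and a single reference picture, so the bi-prediction cases are not covered, and the subsequent sample-based on/off decision is omitted.

```python
from dataclasses import dataclass


@dataclass
class BlockInfo:
    """Hypothetical summary of the block on one side of an edge."""
    is_intra: bool
    has_nonzero_coeffs: bool
    ref_poc: int      # identifies the reference picture (unused for intra blocks)
    mv: tuple         # (mvx, mvy) in quarter-sample units


def boundary_strength(p, q, is_transform_edge):
    """Return a boundary strength of 0 (no filtering), 1, or 2 for the edge
    between blocks p and q (simplified, uni-prediction only)."""
    if p.is_intra or q.is_intra:
        return 2
    if is_transform_edge and (p.has_nonzero_coeffs or q.has_nonzero_coeffs):
        return 1
    if p.ref_poc != q.ref_poc:
        return 1
    # A motion vector difference of one integer sample or more (4 quarter samples)
    if abs(p.mv[0] - q.mv[0]) >= 4 or abs(p.mv[1] - q.mv[1]) >= 4:
        return 1
    return 0
```

Edges with a strength of 0 are left unfiltered; for the remaining edges, the local sample differences decide whether and how strongly the filter is applied.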