MPEG Encoding

MPEG stands for Moving Picture Experts Group, the industry committee that created the standard. MPEG is, in fact, a whole family of standards for digital video and audio signals using DCT compression. There are other ways of compressing signals, such as wavelet and fractal compression, but none has yet achieved the worldwide support that DCT has, and MPEG-2, which employs DCT compression, is certain to become the dominant standard in consumer equipment for the foreseeable future. MPEG takes the DCT compression algorithm and defines how it is used to reduce the data rate, and how packets of video and audio data are multiplexed together in a way that will be understood by an MPEG decoder.

DCT, or Discrete Cosine Transform to give it its full name, exploits the fact that adjacent pixels in a picture, whether physically close in the image (spatial) or in successive images (temporal), are likely to have the same or similar values. Small blocks of 8 x 8 pixels are 'transformed' mathematically in a way that tends to group the common digital signal elements in a block together. The DCT does not itself reduce the data, but it tends to concentrate the energy into the first few coefficients, and many of the higher frequency coefficients are often close to zero. Bit rate reduction is achieved by not transmitting the higher frequency elements, which have a high probability of not carrying useful information. (That is why, when things start to fail, the picture dissolves into little blocks.)
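
The energy-concentrating behaviour described above can be illustrated with a short sketch. It is not taken from the MPEG specification: it is a plain, unoptimised 8 x 8 DCT-II with orthonormal scaling, applied to a made-up smooth block. The helper names (dct_1d, dct_2d) and the test block are assumptions chosen for illustration only.

```python
import math

N = 8  # block size used by the DCT stage (8 x 8 pixels)

def dct_1d(v):
    """Orthonormal DCT-II of a length-N sequence (direct, unoptimised form)."""
    out = []
    for k in range(N):
        s = sum(v[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N)) for n in range(N))
        scale = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        out.append(scale * s)
    return out

def dct_2d(block):
    """Separable 2-D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]
    cols = [dct_1d([rows[y][u] for y in range(N)]) for u in range(N)]
    # cols[u][v] holds the coefficient for horizontal frequency u and
    # vertical frequency v; transpose so the result is indexed [v][u].
    return [[cols[u][v] for u in range(N)] for v in range(N)]

# Illustrative data: a gentle ramp, where neighbouring pixels are nearly equal.
block = [[100 + 2 * x + y for x in range(N)] for y in range(N)]
coeffs = dct_2d(block)

# With orthonormal scaling the transform preserves total energy, so the
# share held by the lowest-frequency 2 x 2 corner can be compared directly.
total_energy = sum(c * c for row in coeffs for c in row)
low_energy = sum(coeffs[v][u] ** 2 for v in range(2) for u in range(2))
energy_fraction = low_energy / total_energy

# Count coefficients small enough that a coarse quantiser would discard them.
near_zero = sum(1 for row in coeffs for c in row if abs(c) < 1.0)

print(f"energy in lowest 2 x 2 coefficients: {energy_fraction:.4%}")
print(f"coefficients with magnitude below 1: {near_zero} of {N * N}")
```

For a smooth block like this one, nearly all of the energy lands in a handful of low-frequency coefficients, which is what makes it cheap to drop the rest. A block containing sharp detail would spread its energy more widely.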

MPEG has its pedigree in two developments of the late 1980s: an ITU standard for video conferencing and video telephony (H.261), and JPEG, a joint working group between ISO/IEC and CCITT that chose a DCT-based algorithm for still picture coding in a competitive process early in 1988. MPEG was started in 1988 as a working group within ISO/IEC with the aim of defining standards for digital compression of motion video and audio signals.

MPEG's first aim was to define a video coding algorithm for application on 'digital storage media', in particular CD-ROM. Very rapidly the need for audio coding was added, and the scope was extended from being targeted solely at CD-ROM to defining a 'generic' algorithm capable of being used by virtually all applications, from storage-based multimedia systems to television broadcasting and communications applications such as VoD and videophones.

MPEG's first project, MPEG-1, was published in 1993 as a three-part standard defining audio and video compression coding methods and a multiplexing system for interleaving audio and video data so that they can be played back in close synchronisation. It has been applied in the CD-i system and Video-CD for publishing full-screen motion video on CD-ROM. A number of PC-based decoder systems are now available. It also forms the basis of a number of field trials for VoD services. It principally supports video coding at bit-rates up to about 1.5 Mbit/s, giving quality similar to VHS, and virtually transparent stereo audio quality at 192 kbit/s, and is optimised for non-interlaced video signals. MPEG-1 assumes progressive scanning - the alternate fields of interlace-scanned pictures are dropped to achieve this.

During 1990, MPEG recognised the need for a second, related standard for coding video at higher data rates and in an interlaced format. The MPEG-2 standard is capable of coding standard definition television at bit-rates from about 1.5 Mbit/s to some 15 Mbit/s. MPEG-2 also adds the option of multi-channel surround sound coding. MPEG-2 is backwards compatible with MPEG-1 (i.e. MPEG-2 decoders will decode MPEG-1 pictures and sound). It is interesting to note that, for video signals coded at bit-rates below about 3 Mbit/s, MPEG-1 may be more efficient than MPEG-2.

Both the MPEG-1 and MPEG-2 standards are split into three main parts: audio coding, video coding, and system management and multiplexing. MPEG itself is split into three main sub-groups, one responsible for each part, and a number of other sub-groups to advise on implementation matters, to perform subjective tests, and to study the requirements that must be supported.

Each of the sub-groups has followed a similar procedure. Initially the requirements that the system must support were analysed. This led to a statement of the problem and a call for proposals. That started a competitive phase, during which many proposals from different laboratories were put through tests aimed at identifying promising algorithmic techniques that could be used in the collaborative phase that followed. The competitive phase lasted about one year; the collaborative phase took over once the results had been evaluated from a large series of subjective tests of all the different proposals, based on the same input material and experimental conditions. During the collaborative phase a draft specification was produced and successively refined. At the end of this stage, formal approval of the standard within ISO/IEC and ITU-T takes about one year, during which the quality of the specification can be incrementally improved.

Work on MPEG-2 began in the summer of 1990. The Main Profile algorithm of the Video part of the standard was frozen in March 1993 so that no further changes would take place, and a draft of the whole audio, video and systems specification was completed in November 1993. The ISO/IEC approval process of balloting, revision and approval was completed in November 1994. The final text was published during 1995 and early implementations of the standard are now (March 1996) beginning to appear in consumer products.

MPEG aims to be a generic video coding system that supports different applications with different requirements. It is not possible to provide a single, unique method for all the different problems. Instead MPEG has followed a 'tool-kit' approach in which an extensive set of algorithmic 'tools' is defined. For instance, coding modes are provided for both scalable and non-scalable coding systems. The coding syntax that MPEG has defined provides tools to cover different applications, and parameters can be chosen to allow working at different bit-rates, picture sizes, resolutions and so on.

It is neither cost-effective nor an efficient use of bandwidth to support all the features of the standard in all applications. In order to make the standard practically useful and enforce interoperability between different implementations of the standard, MPEG has defined profiles and levels of the full standard. Roughly speaking, a profile is a sub-set, suitable for a particular application, of the full possible range of algorithmic tools, and a level is a defined range of parameter values (such as picture size) that are reasonable to implement and practically useful. There are as many as six MPEG-2 profiles, though only two are currently relevant to broadcasting: the main profile, which is essentially MPEG-1 extended to take account of interlace scanning and which encodes chroma at 4:2:0, and the professional profile, which has 4:2:2 chrominance resolution and is designed for production and post-production.
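
One way to picture the profile and level idea is as a pair of checks: the profile restricts which tools a stream may use, and the level bounds its parameters. The sketch below is a deliberately simplified model and not the standard's own definition. The class names, the single 'tool' modelled (chroma format), the assumption that the professional profile also accepts 4:2:0, and the numeric limits of the example level are all illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Profile:
    """A sub-set of the coding tools (hypothetical, heavily simplified)."""
    name: str
    chroma_formats: frozenset  # chroma sampling formats the profile permits

@dataclass(frozen=True)
class Level:
    """A range of parameter values (illustrative limits, not quoted from the standard)."""
    name: str
    max_width: int
    max_height: int
    max_bit_rate: int  # bits per second

@dataclass(frozen=True)
class StreamParams:
    """Parameters a bit-stream declares about itself (hypothetical)."""
    chroma_format: str
    width: int
    height: int
    bit_rate: int  # bits per second

MAIN = Profile("main", frozenset({"4:2:0"}))
# Assumption: the professional profile accepts 4:2:0 as well as 4:2:2.
PROFESSIONAL = Profile("professional", frozenset({"4:2:0", "4:2:2"}))
# Assumption: a standard-definition level with made-up round limits.
EXAMPLE_SD_LEVEL = Level("example-sd", 720, 576, 15_000_000)

def decoder_can_decode(stream, profile, level):
    """True if the stream stays inside the decoder's declared profile and level."""
    return (
        stream.chroma_format in profile.chroma_formats
        and stream.width <= level.max_width
        and stream.height <= level.max_height
        and stream.bit_rate <= level.max_bit_rate
    )

broadcast = StreamParams("4:2:0", 720, 576, 6_000_000)
studio = StreamParams("4:2:2", 720, 576, 12_000_000)

print(decoder_can_decode(broadcast, MAIN, EXAMPLE_SD_LEVEL))
print(decoder_can_decode(studio, MAIN, EXAMPLE_SD_LEVEL))
print(decoder_can_decode(studio, PROFESSIONAL, EXAMPLE_SD_LEVEL))
```

A real profile constrains many more tools than chroma format, and a real level constrains more parameters than the three shown here.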

MPEG-2 makes extensive use of motion compensated prediction to eliminate redundancy. The prediction error remaining after motion compensation is coded using DCT, followed by quantisation and statistical coding of the remaining data. MPEG has two types of prediction. The so-called 'P' pictures are predicted only from pictures that are displayed before the current picture. 'B' pictures on the other hand are predicted from two pictures, one that is displayed earlier and one later. In order to do this non-causal prediction the encoder has to reorder the sequence of pictures before sending them to the decoder and then the decoder has to return them to the correct display order. B-pictures add complexity to the system but also produce a significant saving in bit-rate. An important feature of the MPEG prediction system is the use of 'I frames' that are coded without motion compensation. These break the chain of predictive coding so that channel switching can be done with a sufficiently short latency.
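
The reordering needed for B-pictures can be sketched in a few lines. This is an illustration of the idea rather than the procedure laid down in the standard: pictures are represented only by labels such as "I0" or "B1", the function names are hypothetical, and the sketch assumes the sequence ends on an I or P picture so that every B-picture has a later reference.

```python
def display_to_coded_order(display):
    """Encoder side: send each reference (I or P) before the B-pictures that need it."""
    coded, pending_b = [], []
    for pic in display:
        if pic[0] == "B":
            pending_b.append(pic)    # hold back until the later reference is sent
        else:
            coded.append(pic)        # I or P picture: send it first...
            coded.extend(pending_b)  # ...then the B-pictures that precede it on screen
            pending_b = []
    if pending_b:
        # Assumption of this sketch: a B-picture always has a later reference.
        raise ValueError("sequence must end on an I or P picture")
    return coded

def coded_to_display_order(coded):
    """Decoder side: show B-pictures at once, hold each reference until the next arrives."""
    display, held_ref = [], None
    for pic in coded:
        if pic[0] == "B":
            display.append(pic)
        else:
            if held_ref is not None:
                display.append(held_ref)
            held_ref = pic
    if held_ref is not None:
        display.append(held_ref)
    return display

display = ["I0", "B1", "B2", "P3", "B4", "B5", "P6"]
coded = display_to_coded_order(display)
print(coded)                          # ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
print(coded_to_display_order(coded))  # back in the original display order
```

The decoder has to hold one reference picture back while the B-pictures pass through, which is part of the extra complexity and memory that B-pictures bring.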

The most significant extension of MPEG-2 Main Profile over MPEG-1 is an improvement in the options available within a picture for motion compensated prediction of interlaced signals. MPEG-1 treats each picture as a collection of samples from the same moment in time (known as frame-based coding). MPEG-2 understands interlace: samples within a frame come from two fields that may represent different moments in time. MPEG-2 therefore has modes in which the data can be predicted either using one motion vector, giving an offset to a previous frame, or using two vectors, giving offsets to two different fields.
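
The point that the two fields of a frame may come from different moments in time can be shown with a toy example. The sketch below is illustrative only: the tiny picture size, the moving bar and the helper names are assumptions, and no actual motion compensated prediction is performed. It builds an interlaced frame from two fields captured at different instants and compares neighbouring lines.

```python
WIDTH, HEIGHT = 8, 4  # tiny illustrative picture

def capture_field(bar_x, lines):
    """Return `lines` rows showing a 2-pixel-wide bright bar starting at column bar_x."""
    row = [255 if bar_x <= x < bar_x + 2 else 0 for x in range(WIDTH)]
    return [list(row) for _ in range(lines)]

def interleave(top, bottom):
    """Weave two fields into one frame: top field on even lines, bottom on odd lines."""
    frame = []
    for t, b in zip(top, bottom):
        frame.append(t)
        frame.append(b)
    return frame

def split_fields(frame):
    """Recover the two fields from an interlaced frame."""
    return frame[0::2], frame[1::2]

def line_difference(a, b):
    """Sum of absolute differences between two lines of pixels."""
    return sum(abs(p - q) for p, q in zip(a, b))

top = capture_field(2, HEIGHT // 2)     # sampled at one instant
bottom = capture_field(4, HEIGHT // 2)  # sampled one field period later; the bar has moved
frame = interleave(top, bottom)

within_frame = line_difference(frame[0], frame[1])  # adjacent frame lines, different fields
within_field = line_difference(top[0], top[1])      # adjacent lines of the same field

print("adjacent lines in the frame differ by:", within_frame)
print("adjacent lines in one field differ by:", within_field)
```

When the scene is moving, adjacent lines of the frame disagree while lines within a single field stay consistent, which is why treating the fields separately can give a better prediction.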

MPEG-2 audio is a compatible extension of MPEG-1 audio. Audio compression relies on the fact that the ear cannot hear lower-level sounds at frequencies close to louder ones. This psychoacoustic effect can be used to control the bit allocation to each frequency sub-band. MPEG audio achieves nearly transparent quality at 192 kbit/s/channel. With a minimal increase in bit-rate it is possible to encode Dolby Pro Logic surround sound signals. MPEG-2 audio's main extension to MPEG-1 is to provide compatible methods for coding multiple channel surround sound at between 384 and 512 kbit/s. MPEG-1 audio can be combined with MPEG-2 video, or vice versa.

The MPEG systems specification defines how to interleave multiple audio and video streams into a single stream, how to manage the buffering at the decoder, how to synchronise the streams on playback, and how to provide time identification for each of the streams. The MPEG-1 specification allows elementary streams sharing a common time-base to be multiplexed using a flexible packet size. The packet size is normally relatively large and is chosen by the application. MPEG-1 is suited to software processing, but is less satisfactory in an environment where data errors are common.

MPEG-2 extends this capability to support:

  • Multiple programs with independent time-bases
  • Error-prone environments
  • Remultiplexing
  • Scrambling

Two forms of multiplexed stream are defined by MPEG-2: the program stream and the transport stream. The program stream is similar to MPEG-1. All elementary streams share a common time-base, and it has the same features as MPEG-1 but additionally supports scrambling, trick modes, a directory of the contents of the multiplex and a map describing the features of the streams. It is intended for use in storage-based interactive systems where software processing is important.

The transport stream is intended for broadcast systems where error resilience is one of the most important properties. It supports multiple programs with independent time-bases, multiplexed together with a fixed packet size of 188 bytes. It carries an extensible description of the contents of the programs in the multiplex and supports remultiplexing and scrambling operations. (MPEG has not defined a method of scrambling - it has defined what can be scrambled and how access control data may be transmitted in an MPEG stream.) Transcoding between the different MPEG systems formats is possible and, by suitable choices of parameters, can be made relatively easy. It is likely that a systems digital interface specification may grow up around the transport stream.
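
A fixed packet size makes a transport stream simple to cut up and inspect. The sketch below shows one way to read the 4-byte header at the start of each 188-byte packet. The field layout used here (a sync byte of 0x47, a 13-bit packet identifier, 2-bit scrambling control and a 4-bit continuity counter) is not described in the text above; it follows the commonly documented MPEG-2 systems packet syntax and should be checked against the specification before being relied on. The function name and the returned dictionary keys are hypothetical.

```python
TS_PACKET_SIZE = 188  # fixed transport packet length in bytes
SYNC_BYTE = 0x47      # assumed value of the first byte of every packet

def parse_ts_header(packet: bytes) -> dict:
    """Decode the 4-byte header of one transport packet (sketch, no adaptation field parsing)."""
    if len(packet) != TS_PACKET_SIZE:
        raise ValueError("transport packets are exactly 188 bytes long")
    if packet[0] != SYNC_BYTE:
        raise ValueError("lost synchronisation: first byte is not 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),        # set when an uncorrectable error was detected
        "payload_unit_start": bool(b1 & 0x40),     # a new unit begins in this packet's payload
        "pid": ((b1 & 0x1F) << 8) | b2,            # 13-bit identifier of the stream carried
        "scrambling_control": (b3 >> 6) & 0x03,    # 0 means the payload is not scrambled
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,           # 4-bit per-PID counter, helps detect lost packets
    }

# A made-up packet: PID 0x100, payload only, counter 10, zero-filled payload.
example_packet = bytes([0x47, 0x41, 0x00, 0x1A]) + bytes(TS_PACKET_SIZE - 4)
header = parse_ts_header(example_packet)
print(header)
```

Because every packet has the same length and starts with a known byte, a receiver that loses its place after an error can find packet boundaries again, which suits the error-prone broadcast environment the transport stream is aimed at.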

Each of the MPEG specifications (audio, video and systems) allows encoders and decoders from different manufacturers to operate together. The interface between the two is the compressed bit-stream that represents the coded audio and video. To achieve interoperability MPEG has standardised the structure, content and meaning of the bit-stream and the way that it should be decoded to reconstruct the desired pictures or sound. The encoders are not standardised. This approach has the advantage of leaving considerable freedom to encoder manufacturers to improve their encoding strategies as more is learnt about encoding, or to address different market segments with different trade-offs of cost and complexity against picture quality.

The compliance parts of the standard specify when a bit-stream is compliant, when a decoder is compliant, and how to verify what has gone wrong if a decoder fails to decode a bit-stream properly. Compliant MPEG decoders are defined as being capable of decoding all bit-streams that comply with one of the defined profiles and levels. This means that all MPEG decoders decode only a sub-set of everything possible within MPEG. Decoders have to specify their capabilities (profile/level). MPEG has generated a number of test bit-streams that can be used to help with compliance testing. An essential tool in compliance testing is a bit-stream verifier. This is software that analyses a bit-stream to check whether or not it is compliant with the specification. It can be used as a 'referee' to determine whether it is the bit-stream or the decoder that is at fault if the system fails to work - if the bit-stream passes the verifier it must be the decoder that is wrong.