«End-to-End QoS Provision over Heterogeneous IP and non IP Broadband Wired and Wireless Network Environments A dissertation submitted in satisfaction ...»
Temporal scalability is achieved by distributing the frames of a video sequence over a set of layers. The more temporal layers used in the decoding process, the higher the frame rate of the decoded video. Temporal scalability has low complexity and is easy to implement, since it only involves handling individual frames. However, it affects the design of the inter-frame compression scheme of the video codec, because the inter-frame dependencies imposed by temporal prediction must remain resolvable by a decoder that receives only a subset of the temporal layers.
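The relationship between the number of decoded temporal layers and the frame rate can be sketched in Python. This is a minimal illustration under stated assumptions. The dyadic layer layout (each layer doubles the frame rate), the 16-frame sequence and the 30 fps full rate are all hypothetical choices. The sketch also presumes that every frame is predicted only from frames in lower layers, which is what keeps the dependencies resolvable when the upper layers are dropped.

```python
def temporal_layer(frame_index: int, num_layers: int) -> int:
    """Temporal layer (0 = base layer) of a frame in a dyadic hierarchy.

    Assumption: a period of 2**(num_layers - 1) frames; frames at multiples
    of the period form layer 0, and each further layer doubles the frame rate.
    """
    period = 2 ** (num_layers - 1)
    pos = frame_index % period
    if pos == 0:
        return 0
    trailing_zeros = (pos & -pos).bit_length() - 1  # e.g. pos=4 -> 2
    return num_layers - 1 - trailing_zeros


def frames_decoded(num_frames: int, num_layers: int, layers_received: int) -> list[int]:
    """Frames displayable when only the lowest `layers_received` layers arrive."""
    return [i for i in range(num_frames)
            if temporal_layer(i, num_layers) < layers_received]


FULL_RATE_FPS = 30.0  # assumed frame rate with all layers decoded
NUM_FRAMES = 16       # assumed sequence length

for k in range(1, 5):
    kept = frames_decoded(NUM_FRAMES, num_layers=4, layers_received=k)
    fps = FULL_RATE_FPS * len(kept) / NUM_FRAMES
    print(f"{k} layer(s): frames {kept} -> {fps} fps")
```

With four layers, the displayed rate doubles with each additional layer: 3.75, 7.5, 15 and 30 fps.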
In spatial scalability, a multi-resolution representation is used to divide each frame into a set of layers, so that decoding more layers yields a higher spatial resolution of the individual frames. A mobile device, for example, may have a maximum display resolution lower than the full resolution of the encoded video. In this case, the device resolution dictates the maximum number of spatial refinement layers worth receiving. Spatial scalability is desirable for such applications because the same bitstream can be decoded at different spatial resolutions.
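A minimal sketch of the receiver-side decision described above follows. The three-layer ladder (QCIF, CIF, 4CIF, each refinement doubling width and height) and the function name are hypothetical. The point is only that the display limit caps how many refinement layers are worth receiving.

```python
# Hypothetical spatial-layer ladder, lowest resolution first:
# QCIF (base), CIF and 4CIF; each refinement doubles width and height.
SPATIAL_LAYERS = [(176, 144), (352, 288), (704, 576)]


def spatial_layers_to_receive(max_width: int, max_height: int,
                              layers: list[tuple[int, int]] = SPATIAL_LAYERS) -> int:
    """Number of spatial layers (base + refinements) worth receiving on a
    display limited to max_width x max_height. The base layer is always
    received, even if it exceeds the display (it would then be downscaled)."""
    count = 1
    for width, height in layers[1:]:
        if width > max_width or height > max_height:
            break
        count += 1
    return count


print(spatial_layers_to_receive(320, 240))    # small handheld display: 1
print(spatial_layers_to_receive(1920, 1080))  # full-HD display: 3
```

A 320x240 display stops at the base layer, whereas a full-HD display can use all three layers.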
In signal-to-noise ratio (SNR) scalability, the amount of lossy compression applied through quantization is progressively adjusted across layers. Because quantization is the main means of achieving high compression ratios, SNR scalability is very important for obtaining a bitstream that scales in terms of bandwidth.
MPEG-4 supports conventional rectangular, frame-based visual coding as well as arbitrarily shaped object coding. Since the objects in a natural scene are not all equally important, object segmentation must partition the scene in such a way that the most important objects are identified. Each object is transmitted in its own elementary stream. In fact, each object can be split into multiple streams: a base layer (BL) stream and several enhancement layer (EL) streams. MPEG-4 supports three types of layered coding for each object: temporal scalability, spatial scalability and Fine Granular Scalability (FGS). The first two are similar to their MPEG-2 counterparts.
MPEG-4 FGS supports temporal scalability but not spatial scalability. In FGS Temporal Scalability (FGST), the enhancement layer also inserts new frames between the base layer frames, which makes the architecture more robust against packet losses. However, the MPEG-4 FGS profile suffers a considerable loss in compression efficiency compared to optimized single-rate streams, and particularly compared to conventional layered scalability structures.
The spatial scalability schemes of MPEG-2 and H.263+ require the subsampled frames to be first compressed, then decompressed and upsampled again in order to compute the differential frame of the next higher level. This leads to a very high complexity of the compression engine. Moreover, there is a mismatch between the block-based DCT used for compression and the subsampling procedure. A more attractive approach is to combine the transform of the compression procedure with the transform required for subsampling into a single operation. This is a feature of wavelet transform coding.
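The closed-loop procedure can be illustrated with a 1-D sketch. Pair averaging stands in for subsampling, sample repetition for upsampling, and plain uniform quantization for the complete base-layer DCT codec. All three are simplifying assumptions. The important detail is that the encoder must decode and upsample its own base layer before it can form the differential signal.

```python
def downsample(signal):
    """Halve the resolution by averaging neighboring pairs (even length assumed)."""
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]


def upsample(signal):
    """Double the resolution by repeating each sample."""
    return [s for s in signal for _ in range(2)]


def code_and_decode(signal, step):
    """Stand-in (assumption) for the base-layer codec: uniform scalar quantization."""
    return [step * round(s / step) for s in signal]


def spatial_pyramid(signal, step):
    """Closed-loop two-level spatial scalability in the MPEG-2/H.263+ style.

    The encoder has to run the decoder and the upsampler itself, so that the
    differential (enhancement) signal is computed against exactly what the
    receiver will reconstruct from the lower layer.
    """
    low_rec = code_and_decode(downsample(signal), step)  # compress + decompress
    prediction = upsample(low_rec)                       # upsample again
    residual = [x - p for x, p in zip(signal, prediction)]
    return low_rec, residual


signal = [10, 12, 20, 22, 30, 34, 8, 6]  # made-up sample values
low_rec, residual = spatial_pyramid(signal, step=4)
print(low_rec, residual)
```

In this sketch, where the residual itself is left uncoded, adding the residual to the upsampled base layer reproduces the original signal exactly. That only holds because the encoder worked from the decoded base layer rather than from the uncompressed one.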
In wavelet coding, the discrete wavelet transform (DWT) is applied to the entire frame instead of to small blocks of the frame as in DCT-based coding. Compression is performed by quantizing and entropy coding the resulting sub-bands. Since DWT-based coding provides a multiscale representation of a frame, it is a very good choice for spatially scalable video coding. In addition, since wavelet frame compression degrades frame quality more gracefully than DCT-based mechanisms at high compression ratios, it also works well with a scalable quantization scheme.
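The multiscale property can be seen with a single level of the 2-D Haar transform, the simplest wavelet. It is used here only for illustration; practical wavelet coders use longer filters (JPEG 2000, for instance, specifies the 5/3 and 9/7 wavelets). The function names are hypothetical. The LL sub-band is a half-resolution version of the frame, so a decoder can stop there for a lower spatial layer, and applying the transform again to LL yields further levels.

```python
def haar_1d(values):
    """One level of an (unnormalized) Haar DWT on an even-length sequence:
    pairwise averages form the low-pass half, half-differences the high-pass half."""
    pairs = list(zip(values[0::2], values[1::2]))
    return [(a + b) / 2 for a, b in pairs], [(a - b) / 2 for a, b in pairs]


def transpose(rows):
    return [list(col) for col in zip(*rows)]


def haar_2d(frame):
    """Split a frame (list of rows, even dimensions) into four sub-bands.

    Returns (LL, LH, HL, HH); LL is a half-width, half-height version of the
    frame. Sub-band naming conventions differ between texts.
    """
    # 1. Filter every row into a low-pass and a high-pass half.
    row_lo = [haar_1d(row)[0] for row in frame]
    row_hi = [haar_1d(row)[1] for row in frame]

    # 2. Filter every column of each half (transpose, filter, transpose back).
    def split_columns(half):
        cols = [haar_1d(col) for col in transpose(half)]
        return transpose([lo for lo, _ in cols]), transpose([hi for _, hi in cols])

    ll, lh = split_columns(row_lo)
    hl, hh = split_columns(row_hi)
    return ll, lh, hl, hh


frame = [[10, 10, 20, 20],
         [10, 10, 20, 20],
         [30, 30, 40, 40],
         [30, 30, 40, 40]]
ll, lh, hl, hh = haar_2d(frame)
print(ll)  # [[10.0, 20.0], [30.0, 40.0]]
```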
2.2.1 MPEG-4 Scalable video
The previously discussed conventional scalable coding schemes cannot efficiently address the problem of simple and efficient adaptation to time-varying network conditions or device characteristics. The reason is that they provide only coarse-granularity rate adaptation, and their coding efficiency often decreases due to the overhead associated with an increased number of layers.
To address this problem, FGS coding has been standardized in MPEG-4, as it provides fine-granularity scalability that adapts easily to various time-varying network and device resource constraints. Moreover, FGS enables a streaming server to perform minimal real-time processing and rate control when outputting a very large number of simultaneous unicast streams with different (network) rate requirements. FGS also adapts easily to unpredictable bandwidth variations caused by heterogeneous access technologies or by dynamic changes in network conditions. Furthermore, FGS enables low-complexity decoding with low memory requirements, which gives common receivers, in addition to powerful computers, the opportunity to stream and decode any desired video content. Hence, receiver-driven streaming solutions can simply select the portion of the FGS bit stream that fulfills these constraints.
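The minimal server-side processing mentioned above can be sketched as follows. The non-scalable base layer of each frame is always sent whole, and the FGS enhancement layer, which can be truncated at any point, is simply cut to fit each client's rate without re-encoding. The function names, frame sizes and client rates are hypothetical, and packet-header overhead is ignored.

```python
def enhancement_budget(target_bps: float, fps: float, base_bytes: int) -> int:
    """Bytes of FGS enhancement data that fit into one frame interval at
    target_bps once the base layer of that frame has been sent."""
    frame_budget = int(target_bps / 8 / fps)
    return max(0, frame_budget - base_bytes)


def serve_frame(base: bytes, enhancement: bytes, target_bps: float, fps: float) -> bytes:
    """Per-client rate control (sketch): send the base layer whole and cut
    the enhancement layer at the byte budget; nothing is re-encoded."""
    cut = enhancement_budget(target_bps, fps, len(base))
    return base + enhancement[:cut]


# One pre-encoded frame (made-up sizes) served to clients at different rates.
base, enhancement = bytes(2000), bytes(10000)
for rate in (300_000, 500_000, 1_000_000, 5_000_000):
    print(rate, len(serve_frame(base, enhancement, rate, fps=25)))
```

At 300 kbit/s the base layer alone exceeds the per-frame budget, so only the base layer is sent; at 5 Mbit/s the whole enhancement layer fits.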
In MPEG-4 FGS, a video sequence is represented by two layers of bit streams with identical spatial resolution, which are referred to as the base layer bit stream and the ﬁne granular enhancement layer bit stream, as illustrated in Figure 2.2.1.
The base layer bit stream is coded with non-scalable coding techniques, whereas the enhancement layer bit stream is generated by coding the difference between the original DCT coefficients and the reconstructed base layer coefficients using a bit-plane coding technique. The residual signal is represented with bit planes in the DCT domain, where the number of bit planes is not fixed but depends on the number of bit planes needed to represent the residual magnitude in binary format. Before a DCT residual picture is coded at the enhancement layer, the maximum number of bit planes of each color component (Y, U and V) is first found. In general, the three color components may have different numbers of bit planes. Figure 5.7 gives an example with 5 bit planes in the Y component and 4 bit planes in the U and V components. These three values are coded in the picture header of the enhancement layer stream and transmitted to the decoder.
All components are aligned at the least significant bit (LSB) plane, and the FGS encoder and decoder process the bit planes from the most significant bit (MSB) plane down to the LSB plane. Because the maximum number of bit planes may differ among the Y, U and V components, the first MSB planes may contain only one or two components. In the example given in Figure 2.2.1, only the Y component is present in the MSB plane. In this case, the bits for the coded block pattern (CBP) of each macroblock can be reduced significantly. Every macroblock in a bit plane is coded in row-scan order.
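The bit-plane bookkeeping described above can be sketched in a few lines. The residual values are made up so that Y needs 5 bit planes and U and V need 4, as in the example in the text. The function names are hypothetical, and the entropy coding of each plane (and of the signs) is omitted. The last function shows what a decoder recovers when the stream is cut after the first few MSB planes.

```python
def num_bit_planes(residuals):
    """Bit planes needed to represent the largest residual magnitude."""
    return max((abs(r) for r in residuals), default=0).bit_length()


def components_per_plane(components):
    """Components present in each bit plane, listed from the MSB plane down.

    All components are aligned at the LSB plane (index 0), so a component
    with n planes only has bits in plane indices 0 .. n-1.
    """
    counts = {name: num_bit_planes(res) for name, res in components.items()}
    top = max(counts.values())
    return [[name for name, n in counts.items() if n > plane]
            for plane in range(top - 1, -1, -1)]


def truncate_to_planes(residual, total_planes, planes_received):
    """Value recovered when only the first `planes_received` MSB planes
    arrive; this sketch simply treats the missing lower planes as zero."""
    dropped = max(0, total_planes - planes_received)
    magnitude = (abs(residual) >> dropped) << dropped
    return magnitude if residual >= 0 else -magnitude


# Made-up residuals chosen to reproduce the example in the text:
# 5 bit planes for Y, 4 for U and V.
components = {"Y": [17, -3, 0, 9], "U": [-12, 4], "V": [7, 8]}
print(components_per_plane(components))  # MSB plane holds only "Y"
print([truncate_to_planes(r, 5, 2) for r in components["Y"]])
```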
Since the enhancement layer bit stream can be truncated arbitrarily in any frame, MPEG-4 FGS provides the capability of easily adapting to channel bandwidth variations.
Figure 2.3: Four-level hierarchical-B prediction structure
2.2.2 Scalable Extension of H.264/MPEG-4 AVC
As with the scalable modes of other standards, the scalable extension of MPEG-4 AVC/H.264 enables scalability while keeping the base layer compatible with single-layer MPEG-4 AVC/H.264. It provides temporal, spatial and quality scalability, and these can be applied simultaneously. In MPEG-4 AVC/H.264, any frame can be marked as a reference frame and used for motion prediction of the following frames.
Such flexibility enables various motion-compensated prediction structures (Figure 2.2.2).
The prediction structure commonly used in the scalable extension of MPEG-4 AVC/H.264 is the hierarchical-B structure, as shown in Figure 2.2.2. Frames are categorized into different levels, and B-frames at level i use the nearest frames at lower levels (including level i − 1) as references. Except for the update step, MCTF and hierarchical-B have the same prediction structure; in fact, at the decoder, the decoding process of hierarchical-B and that of MCTF without the update step are identical. Such a hierarchical prediction structure exploits both short-term and long-term temporal correlations, as MCTF does. Another advantage is that the structure inherently provides multiple levels of temporal scalability. Other temporal scalability schemes compliant with MPEG-4 AVC/H.264 have also been presented and shown to provide increased efficiency and robustness on error-prone networks.
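The dyadic hierarchical-B structure can be generated programmatically. This sketch, with hypothetical names, assigns each frame of an 8-frame GOP a level and two references; the key frames' own predictions from outside the GOP are not modeled. The loop at the end checks that every reference lies at a lower level, which is what makes the levels usable as temporal layers.

```python
def hierarchical_b_gop(gop_size: int) -> dict:
    """Level and reference frames of each frame in one dyadic hierarchical-B GOP.

    Frames 0 and gop_size are key frames (level 0); their own references,
    which lie outside this GOP, are not modeled. Every other frame is
    bi-predicted from the nearest already-defined frames on either side.
    """
    if gop_size < 1 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    structure = {0: (0, ()), gop_size: (0, ())}
    level, step = 1, gop_size // 2
    while step >= 1:
        for frame in range(step, gop_size, 2 * step):
            structure[frame] = (level, (frame - step, frame + step))
        level, step = level + 1, step // 2
    return structure


gop = hierarchical_b_gop(8)  # four levels: 0, 1, 2 and 3
for frame, (level, refs) in sorted(gop.items()):
    # Every reference sits at a strictly lower level, so dropping the
    # highest levels never removes a frame that a kept frame depends on.
    assert all(gop[r][0] < level for r in refs)
    print(f"frame {frame}: level {level}, references {refs}")
```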
To achieve SNR scalability, enhancement layers with the same motion-compensated prediction structure as the base layer are generated with finer quantization step sizes. At each enhancement layer, the differential signal with respect to the previous layer is coded. Basically, it follows the scheme shown in Figure ??.
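The layered re-quantization can be illustrated with plain numbers. In this sketch, scalar quantization of a made-up list of coefficients stands in for the transform, motion compensation and entropy coding of a real encoder, and the step sizes 16, 4 and 1 are arbitrary. Only the idea of coding each layer's difference with a finer step comes from the text.

```python
def quantize(values, step):
    """Uniform scalar quantization followed by reconstruction."""
    return [step * round(v / step) for v in values]


def snr_layers(coeffs, steps):
    """Layer k re-quantizes, with the finer step steps[k], the difference
    between the original and the reconstruction from layers 0 .. k-1."""
    recon = [0] * len(coeffs)
    layers = []
    for step in steps:
        diff = [c - r for c, r in zip(coeffs, recon)]
        layer = quantize(diff, step)
        layers.append(layer)
        recon = [r + d for r, d in zip(recon, layer)]
    return layers


def max_error_per_layer(coeffs, layers):
    """Largest reconstruction error after decoding 1, 2, ... layers."""
    recon, errors = [0] * len(coeffs), []
    for layer in layers:
        recon = [r + d for r, d in zip(recon, layer)]
        errors.append(max(abs(c - r) for c, r in zip(coeffs, recon)))
    return errors


coeffs = [37, -22, 5, 0, 13, -9]  # made-up transform coefficients
layers = snr_layers(coeffs, steps=[16, 4, 1])
print(max_error_per_layer(coeffs, layers))  # [7, 2, 0]
```

Each layer leaves an error of at most half its step size, so the reconstruction error shrinks from 7 to 2 to 0 as layers are added.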
To achieve spatial scalability, the lower-resolution and higher-resolution signals are coded into different layers, and the coding of the higher-resolution signal uses the lower-resolution bits as a prediction. In contrast to previous coding schemes, the MPEG-4 AVC/H.264 scalable extension can constrain inter-layer prediction between resolutions: only intra-coded macroblocks are reconstructed to predict the higher resolution, whereas for inter-coded macroblocks only the motion-compensated residual signals may be used to predict the corresponding residual signals at the higher resolution. The advantage of this constraint is reduced decoding complexity, because the decoder does not need to perform motion compensation for the lower layer. The drawback is that the constraint may incur a coding-performance penalty.
2.3 Prioritization of video packets
For real-time multimedia streaming applications, packet prioritization is performed in such a way as to reflect the influence of each stream or packet on the end-to-end delay. Packets are classified by context-aware applications at the granularity of session, flow, layer and packet. The most important QoS parameters (rate, delay and error) are used to assign delay and loss priorities.
The bandwidth (rate) is usually mapped onto a layered coding mechanism such as MPEG-4 FGS.
Most of the available prioritization techniques operate at the granularity of session, flow or layer. Per-flow prioritization is based on user-based allocation within an access network. Prioritization for unequal error protection (UEP) maps better onto layer-level differentiation, as has been described for object scalability. Session-based prioritization is a better way to prioritize packets with respect to delay: since the video application context plays a critical role in delay prioritization, the relative delay index (RDI) is kept constant during the session.
In one such scheme, each video stream of an application can be classified according to its importance in receiving low-delay and low-loss packet delivery service from the network. For a videoconferencing application, for example, low delay is most important. Each packet is identified by a relative priority index (RPI), which is composed of two components: the relative delay index (RDI) and the relative loss index (RLI). These two components indicate the effect of a data segment's loss and delay on the perceived quality of the application.
As mentioned above, the importance of low-delay network service for a video stream depends on the application type and context.
Different packets within a stream may also differ in how important low delay is for them, so the delay requirements depend on the layered coding of the video compression. For example, the I, P and B frames of MPEG-4 have different requirements with regard to delay and packet loss. The same holds for the spatially scalable, SNR-scalable and data-partitioned layers of MPEG-4 and H.264.
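One way to represent the RPI and act on it is sketched below. The 0-7 scale, the mapping of I, P and B frames to RLI values and the tie-breaking rule are hypothetical choices. The sketch follows the text only in tagging each packet with an RDI that is constant for the session and an RLI that depends on the packet's role in the coded video.

```python
from dataclasses import dataclass

# Hypothetical 0..7 scale (7 = most important); the text does not fix one.
RLI_BY_FRAME_TYPE = {"I": 7, "P": 4, "B": 1}


@dataclass(frozen=True)
class Packet:
    seq: int
    frame_type: str
    rdi: int  # relative delay index: constant for the whole session
    rli: int  # relative loss index: derived from the frame type (assumption)


def make_packets(frame_types, session_rdi):
    """Tag each packet of a stream with its RPI = (RDI, RLI)."""
    return [Packet(seq, ft, session_rdi, RLI_BY_FRAME_TYPE[ft])
            for seq, ft in enumerate(frame_types)]


def drop_for_congestion(packets, n_to_drop):
    """Discard the n_to_drop packets with the lowest RLI (ties: latest
    packet first, a hypothetical rule) and keep the rest in order."""
    ranked = sorted(packets, key=lambda p: (p.rli, -p.seq))
    victims = {p.seq for p in ranked[:n_to_drop]}
    return [p for p in packets if p.seq not in victims]


stream = make_packets("IBBPBBP", session_rdi=7)  # videoconference: high RDI
kept = drop_for_congestion(stream, 3)
print("".join(p.frame_type for p in kept))  # IBPP
```

Dropping three packets from an IBBPBBP sequence removes only B-frame packets, leaving the I and P frames intact.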
The most widely used scheme for packetizing an MPEG-4 video stream is fixed-length packetization, in which video packets of similar length are formed. Because a smaller packet incurs a higher relative overhead but is more resilient to errors, the packet size of an MPEG-4 video stream trades off efficiency against error resiliency. To improve error resiliency, a discrete optimization mechanism that minimizes distortion can be used in the packetization of an embedded stream. Each packet is assigned a priority according to its impact on the end-to-end visual delay, and this priority can also be divided into an RLI and an RDI. If the assigned priority reflects the impact of each packet on end-to-end quality, graceful quality degradation can be achieved by dropping packets according to their priority index.
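The efficiency side of the packet-size trade-off can be quantified with a short sketch. The 40-byte header (IPv4, UDP and RTP fixed headers) and the 100 kB stream are assumptions for illustration. The resilience side is that each lost packet takes its whole payload with it, so larger payloads lose more video per loss.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP fixed headers (assumed transport)


def packetize(stream: bytes, payload_size: int) -> list[bytes]:
    """Fixed-length packetization: equal-size payloads, the last one may be shorter."""
    return [stream[i:i + payload_size] for i in range(0, len(stream), payload_size)]


def header_overhead(stream_len: int, payload_size: int) -> float:
    """Fraction of transmitted bytes spent on headers."""
    n_packets = -(-stream_len // payload_size)  # ceiling division
    header_total = n_packets * HEADER_BYTES
    return header_total / (stream_len + header_total)


for size in (200, 500, 1000, 1400):
    print(f"payload {size:5d} B: {100 * header_overhead(100_000, size):.1f} % overhead")
```

With these assumptions the header overhead falls from about 16.7 % at 200-byte payloads to about 2.8 % at 1400 bytes.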
2.4 Network-Adaptive Media Transport
Internet packet delivery is characterized by variations in throughput, delay and loss, which can severely affect the quality of real-time media. The challenge is to maximize the quality of audio or video at the receiver while simultaneously meeting bit-rate limitations and satisfying latency constraints. For the best end-to-end performance, Internet media applications must adapt to changing network characteristics; that is, they must be network adaptive. They should also be media aware, so that adaptation to changing network conditions can be performed efficiently.
A typical streaming media system comprises four major components that
should be designed and optimized in concert:
• The encoder application compresses video and audio signals and uploads them to the media server.
• The media server stores the compressed media streams and transmits them on demand, often serving hundreds of clients simultaneously.
• The transport mechanism delivers media packets from the server to the client for the best possible user experience, while sharing network resources fairly with other users.
• The client application decompresses and renders the video and audio packets and implements the interactive user controls.
To adapt to network conditions, the server receives feedback from the client, for example in the form of positive or negative acknowledgments. More sophisticated client feedback might report packet delay and jitter, link speeds or congestion.
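The text does not prescribe how the server should react to feedback. As one hypothetical policy, the sketch below uses additive-increase/multiplicative-decrease (AIMD). The sending rate grows slowly while packets are acknowledged and is halved when losses are reported, bounded below by an assumed base-layer rate and above by an assumed full-stream rate. All names and parameter values are illustrative.

```python
from dataclasses import dataclass


@dataclass
class AimdRateController:
    """Hypothetical feedback-driven rate control (AIMD); values are illustrative."""
    rate_bps: float
    min_bps: float                 # assumed floor, e.g. the base-layer rate
    max_bps: float                 # assumed ceiling, e.g. the full stream rate
    increase_bps: float = 50_000   # additive increase per loss-free report
    decrease_factor: float = 0.5   # multiplicative decrease on reported loss

    def on_feedback(self, acked: int, lost: int) -> float:
        """Update the rate from one client report (packets ACKed and
        reported lost since the previous report) and return it."""
        if lost > 0:
            self.rate_bps *= self.decrease_factor
        elif acked > 0:
            self.rate_bps += self.increase_bps
        self.rate_bps = min(self.max_bps, max(self.min_bps, self.rate_bps))
        return self.rate_bps


controller = AimdRateController(rate_bps=1_000_000, min_bps=300_000, max_bps=2_000_000)
for acked, lost in [(10, 0), (10, 0), (8, 2), (0, 5), (10, 0)]:
    print(controller.on_feedback(acked, lost))
```

The multiplicative decrease reacts quickly to reported congestion, while the small additive step probes for spare bandwidth gradually; the floor keeps at least the assumed base-layer rate flowing.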