End-to-End QoS Provision over Heterogeneous IP and non-IP Broadband Wired and Wireless Network Environments

A dissertation submitted in satisfaction ...
Our architecture integrates scalable video streaming, content-based prioritized packetization, and DiffServ/UMTS class coupling. The proposed architecture is depicted in Figure 4.1. It consists of three key components: (1) scalable video encoding (MPEG-4 FGS and the scalable extension of H.264/MPEG-4 AVC), (2) simple prioritized packetization according to the type of content (I, P, or B frame), and (3) DiffServ/UMTS class coupling, which achieves QoS continuity for scalable video streaming traffic delivered over DiffServ and UMTS network domains. Each of these components is discussed in detail in the following subsections.
4.2.1 Scalable Video Coding
Scalable Video Coding should meet a number of requirements in order to be suitable for multimedia streaming applications. For efficient utilization of the available bandwidth, the compression performance must be high. Also, the computational complexity of the codec must be kept low to allow cost-efficient, real-time implementations. When compared against other scalable video coding schemes, the fine granular scalability (FGS) coding method stands out for its ability to adapt to changing network conditions more accurately.
4.2.1.1 MPEG-4 FGS Scalable Video Coding

MPEG-4 FGS scalable video coding constitutes a new video coding technology that increases the flexibility of video streaming. As in conventional scalable encoding, the video is encoded into a BL and one or more ELs. For MPEG-4 FGS, the EL can be efficiently truncated in order to adapt the transmission rate to the underlying network conditions. This feature can be used by video servers to adapt the streamed video to the available bandwidth in real time (without requiring any computationally demanding re-encoding). In addition, the fine granularity property can be exploited by intermediate network nodes (including base stations, in the case of wireless networks) in order to adapt the video stream to the currently available downstream bandwidth.
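The rate adaptation enabled by EL truncation can be sketched as follows. This is a minimal illustration, not the codec's actual API; the function name, byte budget derivation, and numbers are assumptions for the example.

```python
# Sketch: rate-adaptive truncation of an FGS enhancement layer (EL).
# A server or intermediate node keeps only as many EL bytes per frame
# as the currently available downstream bandwidth allows.

def truncate_el(el_bytes: bytes, available_bps: int, fps: int = 30) -> bytes:
    """Cut the EL of one frame down to the per-frame byte budget."""
    budget = max(0, available_bps // 8 // fps)  # bytes available per frame
    return el_bytes[:budget]

frame_el = bytes(4000)                     # a 4000-byte EL for one frame
adapted = truncate_el(frame_el, 256_000)   # 256 kbps of downstream bandwidth
print(len(adapted))  # 1066
```

Because any prefix of the EL is decodable, no re-encoding is needed: the stream is shaped by simply discarding the tail of each frame's EL.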
In contrast to conventional scalable methods, complete reception of the EL is not required for successful decoding. The received part can be decoded, increasing the overall video quality according to the rate-distortion curve of the EL. The overall video quality also improves because an error concealment method is used: in our architecture, when a frame is lost, the decoder inserts the most recent successfully decoded frame in place of each lost frame. A packet is also considered lost if its delay exceeds the play-out buffer time (set to 1 s for the experiments discussed in the following section).
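The receiver-side loss handling described above can be sketched as follows. The 1 s deadline matches the experiments; the frame representation and function names are illustrative assumptions.

```python
# Sketch: play-out deadline check plus frame-copy error concealment.
# A frame is treated as lost if it is missing or arrives after the
# play-out buffer deadline; a lost frame is concealed by repeating the
# last successfully decoded frame.
PLAYOUT_BUFFER_S = 1.0

def receive(frames, delays):
    decoded, last_good = [], None
    for frame, delay in zip(frames, delays):
        if frame is not None and delay <= PLAYOUT_BUFFER_S:
            last_good = frame          # frame arrived in time: decode it
            decoded.append(frame)
        else:                          # lost or late: conceal with last frame
            decoded.append(last_good)
    return decoded

out = receive(["f0", "f1", "f2"], [0.2, 1.5, 0.3])
print(out)  # ['f0', 'f0', 'f2']  -- f1 missed its deadline
```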
4.2.1.2 FGS Scalable Extension of H.264/AVC
In order to provide FGS scalability, a picture is represented by an H.264/AVC-compatible base representation layer and one or more FGS enhancement representations, which encode the residual between the original prediction residuals and intra blocks and their reconstructed base representation layer. The base representation layer corresponds to a minimally acceptable decoded quality, which can be improved in a fine granular way by truncating the enhancement representation NAL units at any arbitrary point. Each enhancement representation contains a refinement signal that corresponds to a bisection of the quantization step size and is coded directly in the transform coefficient domain.
For the encoding of the enhancement representation layers, a new slice type called Progressive Refinement (PR) has been introduced. In order to provide quality enhancement layer NAL units that can be truncated at any arbitrary point, the coding order of transform coefficient levels has been modified for the progressive refinement slices. The transform coefficient blocks are scanned in several passes, and in each pass only a few coding symbols for a transform coefficient block are transmitted, so that all blocks are refined roughly evenly and the NAL unit remains truncatable at any point.
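The cyclical scanning idea behind PR slices can be sketched as follows. The block contents and symbols-per-pass value are illustrative, not taken from the H.264/AVC specification.

```python
# Sketch: multi-pass (cyclical) scanning of transform coefficient blocks.
# Instead of emitting each block's coefficients in full, the encoder makes
# several passes and emits only a few symbols per block per pass, so a
# bitstream cut at any point still refines every block roughly evenly.

def pr_scan(blocks, symbols_per_pass=2):
    stream = []
    max_len = max(len(b) for b in blocks)
    pos = 0
    while pos < max_len:
        for i, block in enumerate(blocks):
            for coeff in block[pos:pos + symbols_per_pass]:
                stream.append((i, coeff))  # (block index, coefficient)
        pos += symbols_per_pass
    return stream

blocks = [[9, 4, 1], [7, 2], [5, 3, 2, 1]]
stream = pr_scan(blocks)
# The first pass touches every block before any block is fully coded:
print(stream[:6])  # [(0, 9), (0, 4), (1, 7), (1, 2), (2, 5), (2, 3)]
```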
4.2.2 Prioritized Packetization

I define two groups of priority policies, one for the BL and one for the EL. These policies are used by the Edge Router of the DiffServ-aware underlying network to map packets to the appropriate traffic classes. The packetization process can affect the efficiency as well as the error resiliency of video streaming. A fixed-length packetization scheme is adopted for both BL and EL streams, as proposed by the MPEG-4 specification for transmitting MPEG-4 video bitstreams.
Based on the content of each packet, I assign priorities according to the anticipated impact of the packet's loss on the end-to-end video quality (considering both the loss itself and its dependencies). Each layer has a priority range, and each packet has a different priority according to its payload. Packets that contain data of an I-frame are marked with the lowest drop probability, packets that contain data of a P-frame with medium drop probability, and packets that contain data of a B-frame with high drop probability.
Note that the MPEG-4 FGS and H.264/AVC FGS specifications assume guaranteed delivery for the BL (base representation) and best-effort delivery for the EL. In our framework, I use EF for transmitting the BL and AF with different priorities for the EL, based on the frame type. With the assigned priorities, the packets are sent to the underlying network and receive different forwarding treatments. Table 4.1 depicts the relation between the type of EL content and the corresponding DiffServ classes. The first digit of the AF class indicates the forwarding priority and the second the packet drop precedence.
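The marking policy can be sketched as a simple lookup. This is an illustrative assumption: BL packets get EF, while EL packets get an AF class whose drop precedence follows the frame type (I lowest, P medium, B highest); the exact AF classes in Table 4.1 may differ from the AF11/AF12/AF13 labels used here.

```python
# Sketch: content-based DiffServ marking at the sender / edge router.
# BL packets get guaranteed delivery (EF); EL packets get an AF class
# whose drop precedence (second digit) grows with the frame's droppability.

def mark_packet(layer: str, frame_type: str) -> str:
    if layer == "BL":
        return "EF"                                  # base layer: EF PHB
    el_classes = {"I": "AF11",                       # lowest drop precedence
                  "P": "AF12",                       # medium drop precedence
                  "B": "AF13"}                       # highest drop precedence
    return el_classes[frame_type]

print(mark_packet("BL", "P"))   # EF
print(mark_packet("EL", "B"))   # AF13
```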
4.2.3 DiﬀServ/UMTS Classes Coupling
The proposed scalable video streaming traffic delivery framework adopts three different DiffServ/UMTS class coupling approaches, depicted in Table 4.2.
Note that the actual QoS that can be obtained heavily depends on the traﬃc engineering for both UMTS and DiﬀServ networks.
4.3 Framework Evaluation
This section evaluates the performance of the proposed architectural framework through a set of experimental cases. An ns-2-based simulation environment with the Enhanced UMTS Radio Access Network Extensions for ns-2 (EURANE) package is adopted for simulating the UMTS network.

Figure 4.2: Simulation Setup

I study the performance of our framework by enabling or disabling scalable video coding and/or prioritized transmission. The quality gains of scalable video coding in comparison with non-fine-grain SNR scalable video coding, and the quality gains of prioritized transmission in comparison with non-prioritized transmission, are discussed in detail for three different DiffServ/UMTS traffic class mapping approaches.
Figure 4.2 depicts our simulation setup, which includes a DiffServ-aware autonomous system with a single 512 kbps wired link and a single UMTS cell of 1 Mbps, with the following rate allocation for the supported traffic classes: 200 kbps for the Conversational class, 300 kbps for the Streaming class, 200 kbps for the Interactive 1 class, 100 kbps for each of the Interactive 2 and 3 classes, and 200 kbps for the Background class. For the DiffServ-aware network, the buffer management scheme is Weighted Random Early Detection (WRED). The qualitative remarks drawn from our experiments also apply to more complex heterogeneous IP/UMTS infrastructures.
Several YUV Quarter Common Intermediate Format (QCIF) (176x144) raw video sequences consisting of 300 to 2000 frames are used as video sources. The Microsoft MPEG-4 FGS codec and the scalable extension of H.264/AVC are used for encoding/decoding the YUV sequences. A number of background flows are also transmitted in the simulated network in order to fill the respective DiffServ/UMTS class capacity on the link. The background traffic is increased from 210 kbps to 540 kbps, driving the system into congestion.
In order to measure the improvements in video quality achieved by employing H.264/MPEG-4 AVC, we use the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity (SSIM) metrics. PSNR is one of the most widespread objective metrics for quality assessment and is derived from the Mean Square Error (MSE), one of the most commonly used objective metrics for assessing the application-level QoS of video transmissions.
Consider a video sequence represented by the decoded signal v(n, x, y) and the original signal v_or(n, x, y), where n is the frame index and x and y are the spatial coordinates. The average PSNR of the decoded video sequence over the frames with indices between n1 and n2 is given by the following equation:

$$ PSNR(n_1, n_2) = \frac{1}{n_2 - n_1 + 1} \sum_{n=n_1}^{n_2} 20 \log_{10}\!\left( \frac{V}{\sqrt{MSE(n)}} \right) $$

where V denotes the maximum greyscale value of the luminance. The average MSE of the decoded video sequence over the frames with indices between n1 and n2 is given by:

$$ MSE(n_1, n_2) = \frac{1}{n_2 - n_1 + 1} \sum_{n=n_1}^{n_2} \frac{1}{XY} \sum_{x=1}^{X} \sum_{y=1}^{Y} \left[ v(n, x, y) - v_{or}(n, x, y) \right]^2 $$

where X and Y are the horizontal and vertical frame dimensions.
Note that PSNR and MSE are well-defined only for luminance values. As mentioned in the literature, the Human Visual System (HVS) is much more sensitive to the sharpness of the luminance component than to that of the chrominance components; therefore, we consider only the luminance PSNR.
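The luminance-only MSE and PSNR computation can be sketched as follows, following the definitions above with V = 255 for 8-bit luminance. The tiny 2x2 frames are illustrative.

```python
# Sketch: per-frame luminance MSE and PSNR between an original and a
# decoded frame, each given as a 2-D list of 8-bit luminance samples.
import math

def mse(orig, dec):
    n = len(orig) * len(orig[0])
    return sum((o - d) ** 2
               for row_o, row_d in zip(orig, dec)
               for o, d in zip(row_o, row_d)) / n

def psnr(orig, dec, v=255):
    m = mse(orig, dec)
    return float("inf") if m == 0 else 20 * math.log10(v / math.sqrt(m))

orig = [[100, 110], [120, 130]]
dec  = [[101, 110], [120, 129]]   # two pixels off by 1  ->  MSE = 0.5
print(round(psnr(orig, dec), 2))  # 51.14
```

The sequence-level average PSNR of the equations above is simply the mean of these per-frame values over the frame indices n1 to n2.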
SSIM is a Full Reference Objective Metric for measuring the structural similarity between two image sequences, exploiting the general principle that the main function of the human visual system is the extraction of structural information from the viewing field. If v1 and v2 are two video signals with means mu_1, mu_2, variances sigma_1^2, sigma_2^2, and covariance sigma_12, then the SSIM is defined as:

$$ SSIM(v_1, v_2) = \frac{(2\mu_1\mu_2 + C_1)(2\sigma_{12} + C_2)}{(\mu_1^2 + \mu_2^2 + C_1)(\sigma_1^2 + \sigma_2^2 + C_2)} $$

where $C_1 = (K_1 L)^2$ and $C_2 = (K_2 L)^2$, L is the dynamic range of the pixel values, and K1 = 0.01 and K2 = 0.03 are the constants defined in the original SSIM proposal.
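A single-window SSIM computation can be sketched directly from this definition. Note this is a simplification: the standard SSIM index is computed over local windows and averaged, whereas this sketch treats each frame as one window.

```python
# Sketch: global (single-window) SSIM between two luminance frames given
# as flat sample lists, using C1 = (K1*L)^2 and C2 = (K2*L)^2 as above.

def ssim(x, y, L=255, K1=0.01, K2=0.03):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n                     # means
    vx = sum((a - mx) ** 2 for a in x) / (n - 1)        # variances
    vy = sum((b - my) ** 2 for b in y) / (n - 1)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    c1, c2 = (K1 * L) ** 2, (K2 * L) ** 2               # stabilizers
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

frame = [100, 120, 140, 160]
print(ssim(frame, frame))   # identical frames -> 1.0
```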
4.4 Results

The validation of the quality gains offered by the proposed framework concerns four simulation cases, each consisting of a number of experiments that refer to eight different source video sequences transmitted over an all-IP network consisting of a DiffServ-aware IP core network and a UMTS access network.
The first simulation case refers to a single-layer video stream transmission. The video frames are sent every 33 ms for 30 fps video. For this simulation scenario, I use EF for transmitting I frames, and AF12 and AF13 for transmitting P and B frames, respectively. The mapping of DiffServ classes to UMTS classes is done according to Table 4.2.
The second simulation case concerns a scalable video stream transmission consisting of two layers. For MPEG-4, the BL is encoded using the MPEG-4 FGS codec with MPEG-2 TM5 rate control at 128 kbps, and the EL at 256 kbps. For H.264, a scalable version of H.264/MPEG-4 AVC is used. For this simulation case, the mapping is a direct application of Tables 4.1 and 4.2.
The third simulation case concerns a scalable video stream transmission consisting of one BL and two ELs, i.e., EL1 and EL2. The BL remains encoded at 128 kbps as in the second simulation case, while each of the two ELs is encoded at 128 kbps. For this simulation scenario, I use EF for transmitting the BL, AF11 for transmitting EL1, and Best Effort (BE) for transmitting EL2. The mapping of DiffServ classes to UMTS classes follows Table 4.2.
The fourth simulation case adopts the setup of the third case, while applying the prioritized packetization scheme of the second case to the packets of the first EL; i.e., for this simulation scenario, I use EF for transmitting the BL, the mapping of Table 4.1 for transmitting EL1, and Best Effort (BE) for transmitting EL2.
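The four cases' marking policies can be summarized as a lookup table. This is a sketch of the configuration; the AF classes assumed for the frame-type mappings (AF11/AF12/AF13) are illustrative and stand in for the entries of Table 4.1.

```python
# Sketch: per-case marking policy, keyed by the packet's layer/frame type.
CASES = {
    1: {"I": "EF", "P": "AF12", "B": "AF13"},                  # single layer
    2: {"BL": "EF", "EL-I": "AF11", "EL-P": "AF12",
        "EL-B": "AF13"},                                       # BL + one EL
    3: {"BL": "EF", "EL1": "AF11", "EL2": "BE"},               # BL + two ELs
    4: {"BL": "EF", "EL1-I": "AF11", "EL1-P": "AF12",
        "EL1-B": "AF13", "EL2": "BE"},                         # case 3 + prio EL1
}
print(CASES[3]["EL2"])  # BE
print(CASES[4]["BL"])   # EF
```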
Tables 4.3 to 4.8 depict the simulation results in terms of the PSNR and SSIM video quality metrics for eight different YUV video sequences, for all simulation cases (1 to 4), for the three settings (I to III) concerning DiffServ/UMTS class coupling, and for the two fine-grain scalable video encoders. For Setting I, each configuration case increases the video quality, and the gain offered by each successive case is around 2 dB in terms of PSNR. For Setting II, Cases 3 and 4 produce the same results. As Tables 4.3 to 4.8 show, the scalable version of H.264 achieves quality gains of 0.7 to 1.2 dB over MPEG-4 FGS, due to the encoding/decoding and transmission efficiency of H.264. The highest quality gains are observed for Cases 3 and 4, where BE is used to transmit EL2, owing to the benefits of using H.264/MPEG-4 AVC.
For the Highway video sequence, I measure the packet/frame losses for I, P, and B frames for the four simulation cases and the three settings (I to III) concerning DiffServ/UMTS class coupling. For Cases 3 and 4, the depicted measurements concern EL1. The results presented in Tables 4.9 to 4.11 are in accordance with those depicted in Tables 4.3 to 4.8. For Setting I, each case improves upon the previous one, and Case 4 offers the best video quality gain, as it experiences the lowest packet/frame losses. For Settings II and III, Case 2 offers the best video quality.
As an overall remark on the above results, Case 4 of Setting I offers almost the same video quality as Case 2 of Settings II and III, without, however, employing the Conversational class. In the H.264 scalable extension, motion-compensated prediction (MCP) is performed using only the base layer representation of the reference picture, which increases the performance of the encoder/decoder compared to MPEG-4 FGS, where MCP is always done in the SNR base layer. By providing the same quality at lower bit rates, network and service providers can increase the number of consumers and also offer more demanding multimedia services to them.
Nowadays, continuous media applications over heterogeneous all-IP networks, such as video streaming and videoconferencing, have become very popular. Several approaches have been proposed in order to address end-to-end QoS, both from the network perspective, such as DiffServ and UMTS QoS traffic classes, and from the application perspective, such as scalable video coding and prioritized packetization.