Features

July 2008

Special Focus: Conferencing

UC driving protocol convergence

The road to SIP visual communications is paved with challenges, and benefits, for end-users.

by Stefan Karapetkov

With the emergence of the unified communications (UC) concept, organizations started morphing their voice, video, instant messaging and presence systems into one. This trend creates an interesting technical challenge.

Telephony call-control servers have started the migration from proprietary protocols to the standard session initiation protocol (SIP), and there are already a large number of standards-based SIP implementations, some of them open source. Even the remaining proprietary IP PBX systems on the market provide some level of SIP interoperability and allow third-party equipment to connect to the IP PBX. In addition, many presence and instant messaging systems support SIP via the SIP for instant messaging and presence leveraging extensions (SIMPLE) protocol.

The technical challenge that UC poses is how to connect all of these elements into one system that provides the full range of services to users. SIP is a functional common denominator that can interconnect the different applications within the organization. SIP also meets the requirements for scalable distributed visual communications, and has already been deployed in certain scenarios. A major step in the transition of visual communication networks from H.323 to SIP is replication of the features developed specifically for use in visual communications.

Dual-video streams allow a second "presentation" (sometimes also called content) audio-video stream to be created in parallel to the primary live audio-video stream. The second stream is used to share any type of content (e.g., slides, spreadsheets, X-rays, video clips).

The function is particularly appropriate for multiscreen setups (video endpoints can support up to four monitors). The benefit of this functionality is that users can share not just slides or spreadsheets but also moving images (e.g., Flash video, movie clips, commercials). The presentation channel has flexible resolution, frame rates and bit rates, and it supports dynamic images in high definition (HD). Another benefit of using a video channel for content sharing is that the media is encrypted, and once firewall and network address translation (NAT) traversal works for the live stream, it works for the presentation stream as well.

Dual-video streams

In H.323 networks, the dual-video streams function is standardized by H.239. The first issue with supporting dual-video streams in SIP is describing the content/presentation stream. In a SIP environment, the session description protocol (SDP) is used to describe media stream parameters. SIP endpoints and conferencing servers have to support RFC 4574, which defines the "label" attribute in SDP, and RFC 4796, which defines the "content" attribute. Next, the content stream has to be associated with a live stream. This can be done by supporting the RFC 3388 grouping of media lines in SDP.
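To make the SDP description concrete, the following minimal Python sketch builds the media sections for a live stream and a presentation stream, each tagged with the "label" and "content" attributes. The ports, payload types, codec and label values are assumptions chosen for illustration, and the output is a fragment rather than a complete session description: the session-level lines and the RFC 3388 grouping are left out, since the text does not say which grouping semantics apply.

```python
def media_section(port, payload_type, codec, label, content):
    """Return the SDP lines for one video stream (illustrative values only)."""
    return [
        f"m=video {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} {codec}",
        f"a=label:{label}",      # RFC 4574: identifies the stream in the session
        f"a=content:{content}",  # RFC 4796: describes what the stream carries
    ]


def dual_video_sdp():
    """Build an SDP fragment with a live stream and a presentation stream."""
    lines = []
    # Primary live video of the participants.
    lines += media_section(49170, 96, "H264/90000", label=1, content="main")
    # Second stream carrying the shared presentation content.
    lines += media_section(49172, 97, "H264/90000", label=2, content="slides")
    return "\r\n".join(lines) + "\r\n"


print(dual_video_sdp())
```

The "main" and "slides" values are taken from the set that RFC 4796 defines for the content attribute; a receiver can use them to decide which stream goes to which monitor.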

The remaining issue is how to identify who is sending the content and who is receiving it. This is usually done by tokens (the party that has the token can send content), and token-management protocols ensure there is only one token in the session, and that anyone can request and receive the token. The RFC 4582 binary floor control protocol (BFCP) defines the token-management mechanism, and can be used for dual-video stream implementation in SIP. Since everything has to be described in SDP, a way is also needed to describe the BFCP streams in SDP. This can be done by supporting the RFC 4583 SDP format for binary floor control protocol streams.
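The token idea can be illustrated with a toy floor manager. This is only a sketch of the single-token rule described above, not an implementation of the BFCP wire protocol; the class name, the method names and the first-come, first-served queue are assumptions made for the example.

```python
from collections import deque


class FloorManager:
    """Toy token manager: at most one participant may send content."""

    def __init__(self):
        self.holder = None      # participant currently holding the token
        self.waiting = deque()  # participants waiting for the token

    def request(self, user):
        """Grant the token if it is free; otherwise queue the request."""
        if self.holder is None or self.holder == user:
            self.holder = user
            return "granted"
        if user not in self.waiting:
            self.waiting.append(user)
        return "pending"

    def release(self, user):
        """Give up the token and hand it to the next requester, if any."""
        if user == self.holder:
            self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


floor = FloorManager()
print(floor.request("alice"))  # alice may send content
print(floor.request("bob"))    # bob has to wait
print(floor.release("alice"))  # the token passes to bob
```

Because only the holder may send, every participant always knows where the presentation stream comes from.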

Video-channel control is embedded in H.245, a sub-protocol in the H.323 family. The protocol lets the receiver of live and presentation streams send messages such as "flow control" back to the sender of these streams, telling the sender to modify the bit rate, usually to reduce it when the receiver detects high packet loss. By sending a "fast update" message, the receiver asks the sender to resend a full (intra) video frame, usually when a video frame is lost in transmission.

There is still no standard solution for replicating the video channel control functionality in SIP. One method is to use the SIP INFO message because it allows easy mapping of the H.245 messages into SIP. Several vendors in the market have embraced this approach, mainly because interworking between H.245 and SIP INFO is simple to implement and only touches the H.323-SIP signaling.
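As an illustration of the SIP INFO method, the sketch below builds a message body that asks the remote encoder for a full frame. The text does not say which body format vendors use; as an assumption, the example follows the XML media-control format published as RFC 5168, which covers the fast-update request only. Flow control is not shown, and the mandatory SIP headers of the INFO request are omitted.

```python
import xml.etree.ElementTree as ET

# Media type used for the XML media-control body (assumed from RFC 5168).
CONTENT_TYPE = "application/media_control+xml"


def fast_update_body():
    """Build the XML body that requests a full (intra) video frame."""
    root = ET.Element("media_control")
    primitive = ET.SubElement(root, "vc_primitive")
    to_encoder = ET.SubElement(primitive, "to_encoder")
    ET.SubElement(to_encoder, "picture_fast_update")
    declaration = '<?xml version="1.0" encoding="utf-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")


def info_payload():
    """Return the content headers and body to carry in a SIP INFO request."""
    body = fast_update_body()
    headers = {
        "Content-Type": CONTENT_TYPE,
        "Content-Length": str(len(body.encode("utf-8"))),
    }
    return headers, body


headers, body = info_payload()
print(headers)
print(body)
```

A gateway that receives an H.245 fast-update request could emit such a body toward the SIP side without touching the media.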

Far-end camera control (FECC) is a popular feature in visual communications. If H.323 terminals A and B are on a call, the feature allows terminal A to control the camera of terminal B. The assumption is that terminal B has a pan, tilt and zoom camera, and has the FECC feature enabled.

In a group conferencing setting, the key FECC benefit is that users can adjust the image that they get from the remote site, focus on a particular person or a group of people, and then move to another part of the room. In personal video settings, the feature can be used to adjust the camera if the remote party is sitting too close or too far from the camera.

In H.323, FECC is implemented via two standards: H.281 defines the binary data that is transmitted between terminal A and B to control the camera, while H.224 defines the format of the frames that carry the binary data.

In SIP, RFC 4573, "MIME Type Registration for RTP Payload Format for H.224," registers the H.224 media type, and defines the syntax and the semantics of the SDP parameters needed to support the FECC protocol using H.224 in SIP. In effect, RFC 4573 creates a tunnel through the SIP-based network, and allows video endpoints to exchange H.224/H.281 information exactly as they do in H.323-based networks.
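In SDP terms, the tunnel is one more media section next to audio and video. The short sketch below shows what such a section might look like; the port and the dynamic payload type are assumptions, and the function name is hypothetical.

```python
def fecc_media_section(port=49180, payload_type=100):
    """Return SDP lines announcing an H.224 channel for camera control."""
    return [
        # H.224 travels as an application stream over RTP.
        f"m=application {port} RTP/AVP {payload_type}",
        # Maps the dynamic payload type to the H.224 media type.
        f"a=rtpmap:{payload_type} H224/4800",
    ]


print("\r\n".join(fecc_media_section()))
```

The H.281 camera commands themselves are carried unchanged inside this stream, so the endpoints need no SIP-specific camera-control logic.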

Interworking the protocols

Interworking between the two protocols becomes an important issue. In general, there are two ways to bridge the SIP and H.323 networks: through a signaling gateway and through a conferencing server/multipoint control unit.

SIP and H.323 are different protocols with different message formats but they both can be used in similar ways. Comparing the call flows shows a lot of similarities in the call setup and call tear down, as well as in the mechanisms to spontaneously exchange information during the call.

A signaling gateway is a piece of software that takes incoming SIP messages, extracts the communication parameters, creates H.323 messages and sends them to the H.323 network. It also takes the incoming H.323 messages, extracts the communication parameters, creates corresponding SIP messages, and sends them to the SIP network. The gateway, therefore, looks like a SIP user agent to the SIP network and like an H.323 terminal to the H.323 network.
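The translation step can be sketched as a pair of lookup tables around a protocol-neutral call event. This is a deliberately simplified illustration: the message pairs are a first-order approximation of the basic call flow, the function and field names are hypothetical, and a real gateway also has to translate the media description between SDP and H.245.

```python
from dataclasses import dataclass

# Simplified correspondence between basic call messages (an approximation).
SIP_TO_EVENT = {"INVITE": "setup", "180": "ringing", "200": "answer", "BYE": "hangup"}
EVENT_TO_H323 = {
    "setup": "Setup",
    "ringing": "Alerting",
    "answer": "Connect",
    "hangup": "ReleaseComplete",
}
H323_TO_EVENT = {name: event for event, name in EVENT_TO_H323.items()}
EVENT_TO_SIP = {event: token for token, event in SIP_TO_EVENT.items()}


@dataclass
class CallEvent:
    """Communication parameters extracted from an incoming message."""
    kind: str
    caller: str
    callee: str


def sip_to_h323(token, caller, callee):
    """Extract the parameters of a SIP message and build the H.323 equivalent."""
    event = CallEvent(SIP_TO_EVENT[token], caller, callee)
    return {"message": EVENT_TO_H323[event.kind],
            "source": event.caller, "destination": event.callee}


def h323_to_sip(message, caller, callee):
    """Extract the parameters of an H.323 message and build the SIP equivalent."""
    event = CallEvent(H323_TO_EVENT[message], caller, callee)
    return {"message": EVENT_TO_SIP[event.kind],
            "from": event.caller, "to": event.callee}


print(sip_to_h323("INVITE", "alice", "bob"))
print(h323_to_sip("Connect", "alice", "bob"))
```

Each side only ever sees messages in its own protocol, which is why the gateway appears as a user agent on one network and as a terminal on the other.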

Both SIP and H.323 rely on the same protocols (the real-time transport protocol, or RTP, and its companion RTP control protocol) for transmitting media streams. The signaling gateway can then focus on mediating between the H.323 and SIP signaling but does not need to touch the media. This is important, as media processing is resource intensive. While signaling messages generate traffic on the order of a few kilobits per second, video media streams can reach megabits per second.

There are, however, several issues with the signaling gateway approach. First, media security is lost because H.323-based video networks use the advanced encryption standard (AES), while SIP refers to the secure real-time transport protocol (SRTP) for encryption. The encryption algorithms of these two standards and the key exchange procedures are incompatible. The consequence is that deploying a signaling gateway would result in failure of the media encryption (i.e., the audio and video streams will be transmitted unencrypted).

Another issue arises with video channel control approaches that rely on the RTP control protocol (RTCP), which is associated with RTP media. Such an approach goes against the concept of a signaling-only gateway because H.245 messages must be mapped into RTCP messages. There are currently no implementations where RTCP is independent from an RTP media stream, so media has to traverse the gateway in order to follow this approach.

The third issue is that signaling gateways only address SIP-H.323 interworking. ISDN and PSTN use different media, so ISDN/PSTN users cannot use this gateway to connect to the SIP network.

Due to these limitations, using the conferencing server as a gateway has been considered an alternative concept for H.323-SIP interworking. Conferencing servers can originate and terminate H.323 and SIP calls, and have sufficient processing power to handle the media. They already support AES encryption, and can add support for SRTP encryption.

Mechanisms for video channel control that use RTCP can be accommodated as well, since RTP and RTCP streams go through the conferencing server. The main disadvantages of this approach are that it creates a bottleneck (even point-to-point calls between SIP and H.323 domains have to go through the conferencing server) and that the additional conferencing server ports needed to support SIP-H.323 interworking come at a high cost.

Stefan Karapetkov is an emerging technologies director at Polycom, Pleasanton, Calif.
