Special Focus: Conferencing
UC driving protocol convergence
The road to SIP visual communications is paved with challenges, and benefits, for end users.
by Stefan Karapetkov

With the emergence of the
unified communications (UC) concept,
organizations started morphing their voice,
video, instant messaging and presence
systems into one. This trend creates an
interesting technical challenge.
Telephony call-control
servers have started the migration from
proprietary protocols to the standard session
initiation protocol (SIP), and there are
already a large number of standards-based SIP
implementations, some of them open source.
Even the remaining proprietary IP PBX
systems on the market provide some level of
SIP interoperability and allow third-party
equipment to connect to the IP PBX. In
addition, many presence and instant
messaging systems support SIP via the SIP
for instant messaging and presence
leveraging extensions (SIMPLE)
protocol.
The technical challenge
that UC poses is how to connect all of these
elements into one system that provides the
full range of services to users. SIP is a
functional common denominator that can
interconnect the different applications
within the organization. SIP also meets the
requirements for scalable distributed visual
communications, and has already been
deployed in certain scenarios. A major step
in the transition of visual communication
networks from H.323 to SIP is replication of
the features developed specifically for use
in visual communications.
Dual-video streams allow
a second "presentation" (sometimes also
called content) audio-video stream to be
created in parallel to the primary live
audio-video stream. The second stream is
used to share any type of content (e.g.,
slides, spreadsheets, X-rays, video clips).
The function is
particularly appropriate for multiscreen
setups (video endpoints can support up to
four monitors). The benefit of this
functionality is that users can share not
just slides or spreadsheets but also moving
images (e.g., Flash video, movie clips,
commercials). The presentation channel has
flexible resolution, frame rates and bit
rates, and it supports dynamic images in
high definition (HD). Another benefit of
using a video channel for content sharing is
that the media is encrypted, and once
firewall and network address translation
(NAT) traversal works for the live stream, it
works for the presentation stream as well.
Dual-video streams
In H.323 networks, the
dual-video streams function is standardized
by H.239. The first issue with supporting
dual-video streams in SIP is describing the
content/presentation stream. In a SIP
environment, the session description
protocol (SDP) is used to describe media
stream parameters. SIP endpoints and
conferencing servers have to support RFC
4574, which defines the SDP "label"
attribute, and RFC 4796, which defines the
SDP "content" attribute. Next, the content
stream has to be associated with a live
stream. This can be done by supporting the
RFC 3388 grouping of media lines in the SDP.
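
The SDP description of the two streams can be sketched in a few lines of Python. This is a minimal illustration, not a vendor implementation: the address, ports, payload type and codec are placeholder assumptions, the function names are hypothetical, and RFC 3388 grouping is left out. Only the "label" (RFC 4574) and "content" (RFC 4796) attributes come from the text above.

```python
def build_dual_video_sdp(host="192.0.2.10", live_port=49170, content_port=49172):
    """Return an SDP body describing a live and a presentation video stream.

    All addresses, ports and codec parameters are illustrative assumptions.
    """
    lines = [
        "v=0",
        f"o=- 0 0 IN IP4 {host}",
        "s=-",
        f"c=IN IP4 {host}",
        "t=0 0",
        # Live camera stream
        f"m=video {live_port} RTP/AVP 96",
        "a=rtpmap:96 H264/90000",
        "a=label:1",        # RFC 4574 label attribute
        "a=content:main",   # RFC 4796 content attribute
        # Presentation (content) stream
        f"m=video {content_port} RTP/AVP 96",
        "a=rtpmap:96 H264/90000",
        "a=label:2",
        "a=content:slides",
    ]
    return "\r\n".join(lines) + "\r\n"


def streams_by_content(sdp):
    """Map each a=content value to the a=label of the same media section.

    Assumes the label line precedes the content line, as in the SDP above.
    """
    result = {}
    label = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            label = None  # a new media section starts
        elif line.startswith("a=label:"):
            label = line[len("a=label:"):]
        elif line.startswith("a=content:"):
            result[line[len("a=content:"):]] = label
    return result


if __name__ == "__main__":
    print(streams_by_content(build_dual_video_sdp()))
```

A receiver that understands both attributes can tell the presentation stream from the live stream without guessing from the order of the media lines.
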
The remaining issue is
how to identify who is sending the content
and who is receiving it. This is usually
done by tokens (the party that has the token
can send content), and token-management
protocols ensure there is only one token in
the session, and that anyone can request and
receive the token. The RFC 4582 binary floor
control protocol (BFCP) defines the
token-management mechanism, and can be used
for dual-video stream implementation in SIP.
Since everything has to be described in SDP,
a way also is needed to describe the BFCP
streams in SDP. This can be done by
supporting the RFC 4583 SDP format for
binary floor control protocol streams.
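
The token idea itself is simple and can be modeled without any protocol detail. The sketch below is a toy model only: it does not implement BFCP messages or the RFC 4583 SDP attributes, and the class name and the first-come, first-served queue are assumptions made for illustration.

```python
class FloorToken:
    """Toy model of a single content-sharing token in a session."""

    def __init__(self):
        self.holder = None  # party currently allowed to send content
        self.queue = []     # parties waiting for the token

    def request(self, party):
        """Ask for the token; return True if the party now holds it."""
        if self.holder is None:
            self.holder = party
            return True
        if party != self.holder and party not in self.queue:
            self.queue.append(party)
        return party == self.holder

    def release(self, party):
        """Give up the token; return the new holder (or None)."""
        if party != self.holder:
            return self.holder  # only the holder can release
        self.holder = self.queue.pop(0) if self.queue else None
        return self.holder


if __name__ == "__main__":
    token = FloorToken()
    print(token.request("A"), token.request("B"), token.release("A"))
```

Whatever the wire protocol, the invariant is the same: at most one party holds the token at any time, and any party can ask for it.
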
Video-channel control is
embedded in H.245, a sub-protocol in the
H.323 family. The protocol allows sending
messages like "flow control" from the
receiver of live and presentation streams
back to the sender of these streams, and
telling the sender to modify the bit rate,
usually to reduce the bit rate when the
receiver detects high packet loss. By
sending a "fast update" message, the
receiver asks the sender to send a full
(intra-coded) video frame, usually when a
video frame is lost in transmission.
There is still no
standard solution for replicating the video
channel control functionality in SIP. One
method is to use the SIP INFO message
because it allows easy mapping of the H.245
messages into SIP. Several vendors in the
market have embraced this approach, mainly
because interworking between H.245 and SIP
INFO is simple to implement and only touches
the H.323-SIP signaling.
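
A mapping of this kind might look like the following sketch. It assumes the XML media control body described in RFC 5168, which is an informational document rather than a standard, and which covers only the fast update request. All header values and function names are hypothetical, and a real gateway would reuse the dialog state of the existing call.

```python
# XML body asking the remote encoder for a full picture (per RFC 5168).
FAST_UPDATE_BODY = (
    '<?xml version="1.0" encoding="utf-8"?>\r\n'
    "<media_control>\r\n"
    "  <vc_primitive>\r\n"
    "    <to_encoder>\r\n"
    "      <picture_fast_update/>\r\n"
    "    </to_encoder>\r\n"
    "  </vc_primitive>\r\n"
    "</media_control>\r\n"
)


def h245_to_info_body(command):
    """Return a SIP INFO body for an H.245 command name, or None.

    Only the fast update command is mapped here; flow control has no
    counterpart in this body format and is left unmapped.
    """
    if command == "videoFastUpdatePicture":
        return FAST_UPDATE_BODY
    return None


def build_fast_update_info(target, call_id, cseq):
    """Build the text of a SIP INFO request carrying a fast update body.

    The Via, From and To values are placeholders for illustration.
    """
    body = h245_to_info_body("videoFastUpdatePicture")
    headers = [
        f"INFO {target} SIP/2.0",
        "Via: SIP/2.0/UDP gw.example.com;branch=z9hG4bK-demo",
        "Max-Forwards: 70",
        "From: <sip:gw@example.com>;tag=gwdemo",
        f"To: <{target}>;tag=epdemo",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INFO",
        "Content-Type: application/media_control+xml",
        f"Content-Length: {len(body.encode('utf-8'))}",
    ]
    return "\r\n".join(headers) + "\r\n\r\n" + body


if __name__ == "__main__":
    print(build_fast_update_info("sip:room@example.com", "abc123", 2))
```

Because the request travels on the signaling path, the gateway can produce it without touching the media streams.
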
Far-end camera control
(FECC) is a popular feature in visual
communications. If H.323 terminals A and B
are on a call, the feature allows terminal A
to control the camera of terminal B. The
assumption is that terminal B has a pan,
tilt and zoom camera, and has the FECC
feature enabled.
In a group conferencing
setting, the key FECC benefit is that users
can adjust the image that they get from the
remote site, focus on a particular person or
a group of people, and then move to another
part of the room. In personal video
settings, the feature can be used to adjust
the camera if the remote party is sitting
too close or too far from the camera.
In H.323, FECC is
implemented via two standards: H.281 defines
the binary data that is transmitted between
terminal A and B to control the camera,
while H.224 defines the format of the frames
that carry the binary data.
In SIP, RFC 4573 ("MIME
Type Registration for RTP Payload Format for
H.224") registers the H.224 media type, and
defines the syntax and the semantics of the
SDP parameters needed to support the FECC
protocol using H.224 in SIP. In effect, RFC
4573 creates a tunnel through the SIP-based
network, and allows video endpoints to
exchange H.224/H.281 information exactly as
they do in H.323-based networks.
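
In SDP terms, the tunnel appears as one more media line. The sketch below shows what such a line might look like; the port and the dynamic payload type are placeholder assumptions, and the function name is hypothetical.

```python
def h224_media_lines(port=49180, payload_type=100):
    """Return SDP lines offering an H.224 channel for far-end camera control.

    The port and payload type are illustrative; H224 with a 4800 clock
    rate is the encoding registered by RFC 4573.
    """
    return [
        f"m=application {port} RTP/AVP {payload_type}",
        f"a=rtpmap:{payload_type} H224/4800",
    ]


if __name__ == "__main__":
    print("\r\n".join(h224_media_lines()))
```
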
Interworking the protocols
Interworking between the
two protocols becomes an important issue. In
general, there are two ways to bridge the
SIP and H.323 networks: through a signaling
gateway and through a conferencing
server/multipoint control unit.
SIP and H.323 are
different protocols with different message
formats but they both can be used in similar
ways. Comparing the call flows shows a lot
of similarities in the call setup and call
tear down, as well as in the mechanisms to
spontaneously exchange information during
the call.
A signaling gateway is a
piece of software that takes incoming SIP
messages, extracts the communication
parameters, creates H.323 messages and sends
them to the H.323 network. It also takes the
incoming H.323 messages, extracts the
communication parameters, creates
corresponding SIP messages, and sends them
to the SIP network. The gateway, therefore,
looks like a SIP user agent to the SIP
network and like an H.323 terminal to the
H.323 network.
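
At its simplest, the translation step is a lookup between corresponding call setup and tear-down messages. The sketch below covers only four commonly paired messages and ignores everything else a real gateway has to handle, such as capability exchange, addressing and error cases. The table and function names are hypothetical.

```python
# Simplified pairing of SIP messages with H.323 call signaling messages.
SIP_TO_H323 = {
    "INVITE": "Setup",
    "180 Ringing": "Alerting",
    "200 OK": "Connect",
    "BYE": "Release Complete",
}
H323_TO_SIP = {h323: sip for sip, h323 in SIP_TO_H323.items()}


def translate(message, direction):
    """Return the counterpart of a message, or None if it is not mapped.

    direction is "sip-to-h323" or "h323-to-sip".
    """
    table = SIP_TO_H323 if direction == "sip-to-h323" else H323_TO_SIP
    return table.get(message)


if __name__ == "__main__":
    for sip_message in SIP_TO_H323:
        print(sip_message, "->", translate(sip_message, "sip-to-h323"))
```
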
Both SIP and H.323 rely
on the same protocols (real-time transport
protocol, or RTP, and RTP control protocol,
or RTCP) for transmitting media streams.
The signaling gateway can then focus on
mediating between the H.323 and SIP
signaling but does not need to touch the
media. This is important, as media
processing is resource intensive. While
signaling messages generate traffic in the
magnitude of a few kilobits per second,
video media streams can be in the megabits
per second.
There are, however,
several issues with the signaling gateway
approach. First, media security is lost
because H.323-based video networks use the
advanced encryption standard (AES), while SIP
refers to secure RTP (SRTP) for encryption. The
encryption algorithms of these two standards
and the key exchange procedures are
incompatible. The consequence is that
deploying a signaling gateway would result
in failure of the media encryption (i.e.,
the audio and video streams will be
transmitted unencrypted).
Another issue arises when
video channel control is carried in the RTP
control protocol (RTCP), which is associated
with RTP media. This approach conflicts with
the concept of a signaling-only gateway
because H.245 messages must be mapped into
RTCP messages. There are currently no
implementations where RTCP is independent
from an RTP media stream, so media has to
traverse the gateway in order to follow this
approach.
The third issue is that
signaling gateways only address the
SIP-H.323 interworking; ISDN and PSTN have
different media and ISDN/PSTN users cannot
use this gateway to connect to the SIP
network.
Due to these limitations,
using the conferencing server as a gateway
has been considered an alternative concept
for H.323-SIP interworking. Conferencing
servers can originate and terminate H.323
and SIP calls, and have sufficient
processing power to handle the media. They
already support AES, and can add support for
SRTP encryption.
Mechanisms for video channel control that
use RTCP can be accommodated, as well, since
RTP and RTCP streams go through the
conferencing server. The main disadvantages
of this approach are that it creates a
bottleneck (even point-to-point calls between
SIP and H.323 domains have to go through the
conferencing server) and that the additional
conferencing server ports needed to support
SIP-H.323 interworking carry a high cost.
Stefan Karapetkov is an emerging technologies director at
Polycom, Pleasanton, Calif.