IP Networking and Audio Clarity
Voice packets carried over an IP network are subject to transmission problems that are
inherent to packet switching. Conditions in the network can introduce echo, jitter, or
delay, and these problems must be addressed with quality of service (QoS) mechanisms.
The clarity (that is, the “cleanliness” and “crispness”) of the audio signal is of utmost
importance. The listener must be able to recognize the identity and sense the mood of
the speaker. The following factors can affect clarity:
■ Fidelity: Fidelity is the degree to which a system, or a portion of a system, accurately
reproduces at its output the essential characteristics of the signal impressed upon
its input, or the result of a prescribed operation on the signal impressed upon its
input (definition from the Alliance for Telecommunications Industry Solutions
[ATIS]). The bandwidth of the transmission medium almost always limits the total
bandwidth of the spoken voice. Human speech typically requires a bandwidth from
100 to 10,000 Hz, although 90 percent of speech intelligence is contained between
100 and 3000 Hz.
■ Echo: Echo is a result of electrical impedance mismatches in the transmission path.
Echo is always present, even in traditional telephony networks, but at a level that
cannot be detected by the human ear. The two components that affect echo are
amplitude (loudness of the echo) and delay (the time between the spoken voice and
the echoed sound). You can control echo using suppressors or cancellers.
■ Jitter: Jitter is variation in the arrival time of coded speech packets at the far end
of a VoIP network. The varying arrival times can cause gaps in the re-creation and
playback of the voice signal. These gaps are undesirable and annoy the listener. This
variable delay is introduced in the network by variation in the routes of individual
packets, contention, or congestion. You can compensate for variable delay by using
dejitter buffers.
■ Delay: Delay is the time between the spoken voice and the arrival of the electronically
delivered voice at the far end. Delay results from multiple factors, including distance
(propagation delay), coding, compression, serialization, and buffers.
■ Packet loss: Voice packets might be dropped under various conditions, such as an
unstable network, network congestion, or too much variable delay in the network.
Lost voice packets are not recoverable, resulting in gaps in the conversation that are
perceptible to the user.
■ Side tone: Side tone is a purposeful design feature of the telephone that lets speakers
hear their own spoken audio in the earpiece. Without side tone, the speaker is left
with the impression that the telephone instrument is not working.
■ Background noise: Background noise is the low-volume audio that is heard from the
far-end connection. Certain bandwidth-saving technologies, such as voice activity
detection (VAD), can eliminate background noise altogether. When VAD is implemented,
the audio path from the speaker to the listener is open, while the audio path from
the silent listener back to the speaker is closed. The effect of VAD is often that
speakers think the connection is broken because they hear nothing from the other end.
Therefore, VAD is often combined with comfort noise generation (CNG) to prevent the
illusion that the call has been disconnected.
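The jitter and dejitter-buffer behavior described in the list above can be sketched in a few
lines of Python. This is a minimal illustration, not a production implementation. The
Packet type, the function names, and the sample timings are hypothetical. The smoothing
formula follows the interarrival jitter estimate defined in RFC 3550 (gain of 1/16), and the
fixed-size buffer model is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    seq: int            # sequence number
    sent_ms: float      # sender timestamp, in milliseconds
    arrived_ms: float   # receiver arrival time, in milliseconds


def interarrival_jitter(packets):
    """Running jitter estimate in the style of RFC 3550 (1/16 smoothing gain)."""
    jitter = 0.0
    prev = None
    for pkt in packets:
        if prev is not None:
            # Change in transit time between two consecutive packets.
            d = (pkt.arrived_ms - prev.arrived_ms) - (pkt.sent_ms - prev.sent_ms)
            jitter += (abs(d) - jitter) / 16.0
        prev = pkt
    return jitter


def late_packets(packets, buffer_ms):
    """Sequence numbers of packets that miss playout with a fixed dejitter buffer.

    Assumption: the first packet is held for buffer_ms, and every later packet
    is scheduled at that playout time plus its send-time offset from the first.
    """
    if not packets:
        return []
    first = packets[0]
    base = first.arrived_ms + buffer_ms
    late = []
    for pkt in packets:
        playout = base + (pkt.sent_ms - first.sent_ms)
        if pkt.arrived_ms > playout:
            late.append(pkt.seq)
    return late


# Hypothetical stream: one packet every 20 ms, with one packet delayed by congestion.
SAMPLE = [
    Packet(0, 0, 50),
    Packet(1, 20, 70),
    Packet(2, 40, 130),
    Packet(3, 60, 115),
    Packet(4, 80, 130),
]

print(interarrival_jitter(SAMPLE))
print(late_packets(SAMPLE, buffer_ms=30))   # [2]
print(late_packets(SAMPLE, buffer_ms=40))   # []
```

With these sample timings, a 30-ms buffer still plays packet 2 too late, whereas a 40-ms
buffer absorbs the variation. The cost is that the larger buffer adds 10 ms more to the
end-to-end delay, which is the trade-off a dejitter buffer always makes.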
The following sections cover some of these factors in more detail.
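As a rough illustration of how the delay components listed earlier (propagation, coding,
serialization, and buffers) add up, the following Python sketch totals a one-way delay
budget. All function names and figures are hypothetical. The 60-byte frame, the 25-ms
coding delay, the 40-ms dejitter buffer, and the assumed signal speed of about 200,000 km/s
are placeholders for illustration, not measurements.

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time needed to clock one frame onto the link, in milliseconds."""
    return frame_bytes * 8 * 1000.0 / link_bps


def propagation_delay_ms(distance_km, speed_km_per_s=200_000.0):
    """Time for the signal to cover the distance; the default speed is an assumption."""
    return distance_km * 1000.0 / speed_km_per_s


def one_way_delay_ms(components):
    """Sum the individual delay contributions, all given in milliseconds."""
    return sum(components.values())


# Hypothetical call path: a 64-kbps access link and a 4000-km route.
components = {
    "coding": 25.0,
    "serialization": serialization_delay_ms(frame_bytes=60, link_bps=64_000),
    "propagation": propagation_delay_ms(distance_km=4000),
    "dejitter_buffer": 40.0,
}

for name, value in components.items():
    print(f"{name:16s} {value:6.1f} ms")
print(f"{'total':16s} {one_way_delay_ms(components):6.1f} ms")
```

Breaking the total into named components makes it easy to see which contribution dominates
and what changing one factor, such as the link speed or the buffer size, does to the
overall delay.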