Voice Over Internet Protocol

ABSTRACT


Voice over Internet Protocol (VoIP) technology allows voice information to pass over IP data networks. Because the information is exchanged in packets over a shared data network, the technology greatly reduces the physical resources required to communicate by voice over long distances.

The basic functions performed by a VoIP network include signalling, database services, call connect and disconnect, and coding/decoding. The steps involved in originating an Internet telephone call are the conversion of the analogue voice signal to digital format and the compression and translation of the signal into Internet Protocol (IP) packets for transmission over the Internet; the process is reversed at the receiving end. VoIP software packages such as VocalTec or Net2Phone are available to the user. With the exception of phone-to-phone calls, the user must possess an array of equipment that should at minimum include VoIP software, an Internet connection, and a multimedia computer with a sound card, speakers, a microphone, and a modem.

The VoIP network acts as a gateway to the existing PSTN network. This gateway forms the interface for transportation of the voice content over the IP networks. Gateways are responsible for all call origination, call detection, analogue to digital conversion of voice, and creation of voice packets.


INTRODUCTION

The development of very fast, inexpensive microprocessors and special-purpose switching chips, coupled with highly reliable fibre-optic transmission systems, has made it possible to build economical, ubiquitous, high-speed packet-based data networks. Similarly, the development of very fast, inexpensive digital signal processors (DSPs) has made it practical to digitise and compress voice and fax signals into data packets. The natural evolution of these two developments is to combine digitised voice and fax packets with packet data, creating integrated data-voice networks. The voice-over-Internet protocol (VoIP) technology allows voice information to pass over IP data networks. Primarily, the cost savings that accrue from operating a single, shared network have motivated this convergence of telecommunications and data communications.

INTRANET TELEPHONY PAVES THE WAY FOR INTERNET TELEPHONY

Although progressing rapidly, Internet telephony still has some problems with reliability and sound quality, due primarily to limitations both in Internet bandwidth and current compression technology. As a result, most corporations looking to reduce their phone bills today confine their Internet-telephony applications to their intranets. With more predictable bandwidth available than the public Internet, intranets can support full-duplex, real-time voice communications. Corporations generally limit their Internet voice traffic to half-duplex asynchronous applications (e.g., voice messaging).

Internet telephony within an intranet enables users to save on long-distance bills between sites; they can make point-to-point calls via gateway servers attached to the local-area network (LAN). No PC–based telephony software or Internet account is required.

For example, User A in New York wants to make a (point-to-point) phone call to User B in the company's Geneva office. He picks up the phone and dials an extension to connect with the gateway server, which is equipped with a telephony board and compression-conversion software; the server configures the private branch exchange (PBX) to digitize the upcoming call. User A then dials the number of the Geneva office, and the gateway server transmits the (digitized, IP-packetized) call over the IP-based wide-area network (WAN) to the gateway at the Geneva end. The Geneva gateway converts the digital signal back to analog format and delivers it to the called party.

BASIC FLOW OF VOIP NETWORK

VoIP networks can replace the traditional public switched telephone network (PSTN) because they can perform the same functions. These functions include signaling, database services, call connect and disconnect, and coding-decoding.

Signaling: - Signaling in a VoIP network is accomplished by the exchange of IP datagram messages between the components. The format of these messages is defined by standard signaling protocols such as H.323.

Database services: - Database services are a way to locate an endpoint and translate the addressing that two networks use; for example, the PSTN uses phone numbers to identify endpoints, while a VoIP network could use an IP address and port numbers to identify an endpoint. A call control database contains these mappings and translations.
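The address translation described above can be sketched as a simple lookup table. This is only an illustration: the phone numbers, IP addresses, port, and the names CALL_CONTROL_DB, normalize, and resolve are all hypothetical, and a real call control database would be a network service rather than an in-memory dictionary.

```python
from typing import NamedTuple, Optional


class Endpoint(NamedTuple):
    ip: str
    port: int


# Hypothetical mappings from PSTN numbers to VoIP endpoints (made-up values).
CALL_CONTROL_DB = {
    "+12125550100": Endpoint("192.0.2.10", 5004),
    "+41225550199": Endpoint("198.51.100.7", 5004),
}


def normalize(number: str) -> str:
    """Strip spaces, dashes, and parentheses so lookups use one canonical form."""
    return "".join(ch for ch in number if ch.isdigit() or ch == "+")


def resolve(number: str) -> Optional[Endpoint]:
    """Translate a PSTN phone number into a VoIP endpoint, or None if unknown."""
    return CALL_CONTROL_DB.get(normalize(number))
```

A gateway would call resolve() with the dialled digits and open an IP session to the returned address and port, or fall back to the PSTN when the result is None.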

Call connect and disconnect (bearer control): - The connection of a call is made by two endpoints opening communication sessions between each other. In the PSTN, the public (or private) switch connects logical channels through the network to complete the calls. In a VoIP implementation, a multimedia stream (audio, video, or both) is transported in real time. The connection path is the bearer channel and represents the voice or video content being delivered. When communication is complete, the IP sessions are released and, optionally, network resources are freed.

CODEC operations: - Voice communication is analogue, while data networking is digital. Analogue waveforms are converted into digital information by using a coder-decoder (CODEC).
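As a rough illustration of the analogue-to-digital step, the sketch below samples a waveform and quantises each sample to an 8-bit code, then reverses the process. It assumes an 8 kHz sampling rate and plain uniform quantisation; practical voice CODECs use companding and compression that are not shown here, and the function names are hypothetical.

```python
import math

SAMPLE_RATE_HZ = 8000  # assumed telephony sampling rate


def encode(analogue, n_samples, bits=8):
    """Sample a function of time and quantise each sample to a signed integer code."""
    levels = 2 ** (bits - 1) - 1              # 127 for 8 bits
    codes = []
    for n in range(n_samples):
        value = analogue(n / SAMPLE_RATE_HZ)  # expected to lie in [-1.0, 1.0]
        value = max(-1.0, min(1.0, value))    # clip anything out of range
        codes.append(round(value * levels))
    return codes


def decode(codes, bits=8):
    """Map integer codes back to approximate analogue amplitudes."""
    levels = 2 ** (bits - 1) - 1
    return [c / levels for c in codes]


def tone(t):
    """A 1 kHz test tone standing in for the analogue voice signal."""
    return math.sin(2 * math.pi * 1000 * t)


codes = encode(tone, 8)
restored = decode(codes)
```

The restored amplitudes differ from the original samples by at most half a quantisation step, which is the price of representing the waveform digitally.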


VOICE GATEWAY

The VoIP network acts as a gateway to the existing PSTN network. This gateway forms the interface for transportation of the voice content over the IP network.

Gateways are responsible for call origination, call detection, analogue-to-digital conversion of voice, and creation of voice packets (CODEC functions). Voice (analogue and/or digital) compression, echo cancellation, silence suppression, and statistics gathering are optional features. The gateways must also perform some of the database services, such as phone number translations, host lookup, and signaling. The extent of gateway functionality depends on the VoIP-enabling products used. Fig. 1 shows the architecture of a typical gateway.

The DSP in a gateway is responsible for signal-processing functions such as analogue-to-digital conversion of voice signals, voice compression, echo cancellation, and voice-activity detection. Functions such as call origination, call detection, signaling, and phone number translation are performed by the microprocessor. Gateways exist in several forms; for example, the gateway could be a dedicated telecommunication equipment chassis, or even a generic PC running VoIP software.

VOIP QOS ISSUES

The transport of voice packets is affected by several factors, such as the amount of bandwidth available in the network connection, the delay that the packet experiences, and any packet loss or corruption that occurs. The ability of the network to deliver the voice packets quickly and consistently is referred to as Quality of Service (QoS).

Bandwidth and CODECs

In addition to performing the analogue-to-digital conversion, CODECs compress the voice data stream. Compression of the voice waveform results in bandwidth savings. The output from the CODECs is a data stream that is put into IP packets and transported across the network to an endpoint. The endpoints must use the same standards as well as a common set of CODEC parameters. Use of different standards or parameters at the endpoints will lead to unintelligible communication.

The table below shows some of the coding standards that are covered by the International Telecommunication Union (ITU). Use of complex coders with higher compression ratios reduces the bandwidth consumption. But there is a price to be paid for reduced bandwidth consumption: increased conversion delay. Another way to save bandwidth is the use of silence suppression, in which voice packets are not sent during the gaps in human conversation. The voice-activity detection technique allows the monitoring of silence in speech data.
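Silence suppression can be sketched with a very simple voice-activity detector that compares the energy of each frame with a threshold. The energy measure, the threshold value, and the function names are assumptions made for illustration; detectors in real gateways are considerably more elaborate.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(s * s for s in frame) / len(frame)


def suppress_silence(frames, threshold=1e-3):
    """Keep only (index, frame) pairs whose energy exceeds the assumed threshold."""
    return [(i, f) for i, f in enumerate(frames) if frame_energy(f) > threshold]


def fraction_saved(frames, threshold=1e-3):
    """Share of frames that would not be transmitted at all."""
    sent = len(suppress_silence(frames, threshold))
    return 1 - sent / len(frames)
```

Keeping the frame index alongside each transmitted frame lets the receiver know how long a gap to leave, or fill with comfort noise, when frames have been withheld.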



Packet delay

VoIP quality is also affected by packet delay. The end-to-end packet delay in a network is the sum of the incremental delays along the connection path. The use of voice CODECs adds a small amount of processing delay. A delay greater than 100 ms will interfere with normal conversation, and longer delays also make echoes noticeable. Network delays can be reduced through careful network architecture, equipment selection, and configuration.

The following are sources of delay in an end-to-end, voice-over-packet call:

Accumulation Delay

This delay is caused by the need to collect a frame of voice samples to be processed by the voice coder. It is related to the type of voice coder used and varies from a single sample time (125 microseconds) to many milliseconds. A representative list of standard voice coders and their frame times follows:

· G.726 adaptive differential pulse-code modulation (ADPCM) (16, 24, 32, 40 kbps): 125 microseconds

· G.728 LD-code excited linear prediction (CELP) (16 kbps): 2.5 milliseconds

· G.729 CS-ACELP (8 kbps): 10 milliseconds

· G.723.1 Multirate Coder (5.3, 6.3 kbps): 30 milliseconds

Processing Delay

This delay is caused by the actual process of encoding and collecting the encoded samples into a packet for transmission over the packet network. The encoding delay is a function of both the processor execution time and the type of algorithm used. Often, multiple voice-coder frames will be collected in a single packet to reduce the packet network overhead. For example, three frames of G.729 code words, equaling 30 milliseconds of speech, may be collected and packed into a single packet.
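The trade-off behind packing several coder frames into one packet can be worked through numerically. The sketch below assumes each packet carries 40 bytes of headers (IPv4, UDP, and RTP with no options), which is an assumption not stated in the text; link-layer framing is ignored and the function name is hypothetical.

```python
HEADER_BYTES = 20 + 8 + 12  # assumed IPv4 + UDP + RTP headers, no options


def packet_stats(bit_rate_bps, frame_ms, frames_per_packet):
    """Speech time, payload size, on-the-wire bit rate, and header overhead per packet."""
    speech_ms = frame_ms * frames_per_packet
    payload_bytes = bit_rate_bps * speech_ms / 1000 / 8
    packet_bytes = payload_bytes + HEADER_BYTES
    packets_per_second = 1000 / speech_ms
    return {
        "speech_ms": speech_ms,
        "payload_bytes": payload_bytes,
        "wire_kbps": packet_bytes * 8 * packets_per_second / 1000,
        "overhead_fraction": HEADER_BYTES / packet_bytes,
    }


# G.729 at 8 kbps with 10 ms frames: one frame per packet versus three.
one = packet_stats(8000, 10, 1)
three = packet_stats(8000, 10, 3)
```

Under these assumptions, moving from one to three G.729 frames per packet cuts the header share of each packet from 80 percent to about 57 percent, at the cost of waiting 30 milliseconds instead of 10 before a packet can be sent.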

Network Delay

This delay is caused by the physical medium and protocols used to transmit the voice data and by the buffers used to remove packet jitter on the receive side. Network delay is a function of the capacity of the links in the network and the processing that occurs as the packets transit the network. The jitter buffers add delay, which is used to remove the packet-delay variation to which each packet is subjected as it transits the packet network. This delay can be a significant part of the overall delay, as packet-delay variations can be as high as 70 to 100 milliseconds in some frame-relay and IP networks.
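The delay sources above add up to a one-way delay that can be checked against a budget. The sketch below uses the 100 ms figure quoted earlier as the budget; the individual component values in the usage note are made-up illustrations, not measurements.

```python
BUDGET_MS = 100  # conversational threshold quoted in the text


def one_way_delay(accumulation_ms, processing_ms, network_ms, jitter_buffer_ms):
    """End-to-end delay as the sum of the incremental delays along the path."""
    return accumulation_ms + processing_ms + network_ms + jitter_buffer_ms


def within_budget(total_ms, budget_ms=BUDGET_MS):
    """True when the total delay does not exceed the budget."""
    return total_ms <= budget_ms
```

For example, with hypothetical values of 30 ms accumulation, 5 ms processing, 40 ms network transit, and a 40 ms jitter buffer, the total is 115 ms and exceeds the budget; reducing accumulation to 10 ms brings it to 95 ms.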

Jitter

The delay problem is compounded by the need to remove jitter, a variable interpacket timing caused by the network a packet traverses. Removing jitter requires collecting packets and holding them long enough to allow the slowest packets to arrive in time to be played in the correct sequence. This causes additional delay.

The two conflicting goals of minimizing delay and removing jitter have engendered various schemes to adapt the jitter buffer size to match the time-varying requirements of network jitter removal. This adaptation has the explicit goal of minimizing the size and delay of the jitter buffer, while at the same time preventing buffer underflow caused by jitter.

There are two approaches to adapting the jitter buffer size; the one selected depends on the type of network the packets are traversing. The first approach is to measure the variation of packet level in the jitter buffer over a period of time and incrementally adapt the buffer size to match the calculated jitter. This approach works best with networks that provide consistent jitter performance over time, such as ATM networks. The second approach is to count the number of packets that arrive late and create a ratio of these packets to the number of packets that are successfully processed. This ratio is then used to adjust the jitter buffer to target a predetermined, allowable late-packet ratio. This approach works best with networks that have highly variable packet-interarrival intervals, such as IP networks.
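The second approach (adapting to a target late-packet ratio) might look roughly like the sketch below. The class name, the default depth, step size, limits, window length, and the shrink rule are all assumptions chosen for illustration rather than values from any particular product.

```python
class LateRatioJitterBuffer:
    """Adapts buffer depth toward a target late-packet ratio (hypothetical parameters)."""

    def __init__(self, depth_ms=40, target_ratio=0.01, step_ms=10,
                 min_ms=10, max_ms=200, window=100):
        self.depth_ms = depth_ms
        self.target_ratio = target_ratio
        self.step_ms = step_ms
        self.min_ms = min_ms
        self.max_ms = max_ms
        self.window = window  # packets observed between adjustments
        self.late = 0
        self.on_time = 0

    def on_packet(self, delay_variation_ms):
        """Count a packet as late if its delay variation exceeds the buffer depth."""
        if delay_variation_ms > self.depth_ms:
            self.late += 1
        else:
            self.on_time += 1
        if self.late + self.on_time >= self.window:
            self._adapt()

    def _adapt(self):
        # Ratio of late packets to successfully processed packets.
        ratio = self.late / max(self.on_time, 1)
        if ratio > self.target_ratio:
            self.depth_ms = min(self.depth_ms + self.step_ms, self.max_ms)
        elif ratio < self.target_ratio / 2:
            self.depth_ms = max(self.depth_ms - self.step_ms, self.min_ms)
        self.late = self.on_time = 0
```

Growing the buffer when too many packets are late and shrinking it only when the ratio is well under the target keeps delay low without reacting to every small fluctuation.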

In addition to the techniques described, the network must be configured and managed to provide minimal delay and jitter, enabling a consistent QoS.


Lost-Packet Compensation

Lost packets can be an even more severe problem, depending on the type of packet network that is being used. Because IP networks do not guarantee service, they will usually exhibit a much higher incidence of lost voice packets than ATM networks. In current IP networks, all voice frames are treated like data. Under peak loads and congestion, voice frames will be dropped equally with data frames. The data frames, however, are not time sensitive, and dropped packets can be appropriately corrected through the process of retransmission. Lost voice packets, however, cannot be dealt with in this manner. Some schemes used by voice-over-packet software to address the problem of lost frames are as follows:

· interpolate for lost speech packets by replaying the last packet received during the interval when the lost packet was supposed to be played out; this scheme is a simple method that fills the time between non-contiguous speech frames; it works well when the incidence of lost frames is infrequent; it does not work well if there are a number of lost packets in a row or a burst of lost packets

· send redundant information at the expense of bandwidth utilization; this basic approach replicates and sends the nth packet of voice information along with the (n+1)th packet; this method has the advantage of being able to correct for the lost packet exactly; however, this approach uses more bandwidth and also creates greater delay

· use a hybrid approach with a much lower bandwidth voice coder to provide redundant information carried along in the (n+1)th packet; this reduces the problem of the extra bandwidth required but fails to solve the problem of delay
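The first two schemes in the list above can be sketched together in a playout routine: a missing frame is recovered exactly when the next packet carries a redundant copy of it, and otherwise the last played frame is repeated. The packet representation (a dictionary from sequence number to a pair of frame and redundant previous frame) and the names are hypothetical.

```python
SILENCE = "<silence>"  # placeholder frame used before anything has been played


def play_out(received, total):
    """Build the played-out frame sequence for sequence numbers 0..total-1.

    received maps sequence number -> (frame, redundant copy of previous frame or None).
    """
    output, last = [], None
    for seq in range(total):
        if seq in received:
            frame = received[seq][0]
        elif seq + 1 in received and received[seq + 1][1] is not None:
            # Exact recovery from the redundant copy carried in packet n+1;
            # this requires waiting for that packet, hence the extra delay.
            frame = received[seq + 1][1]
        else:
            # Simple concealment: replay the last frame that was played.
            frame = last if last is not None else SILENCE
        output.append(frame)
        last = frame
    return output
```

Repeating the last frame costs no bandwidth but degrades quickly when several packets in a row are lost, whereas the redundant copy recovers the frame exactly at the price of bandwidth and delay.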


Echo Compensation

Echo in a telephone network is caused by signal reflections generated by the hybrid circuit that converts between a four-wire circuit (a separate transmit and receive pair) and a two-wire circuit (a single transmit and receive pair). These reflections of the speaker's voice are heard in the speaker's ear. Echo is present even in a conventional circuit-switched telephone network. However, it is acceptable because the round-trip delays through the network are smaller than 50 milliseconds and the echo is masked by the normal side tone every telephone generates.

Echo becomes a problem in voice-over-packet networks because the round-trip delay through the network is almost always greater than 50 milliseconds. Thus, echo-cancellation techniques are always used. ITU standard G.165 defines the performance requirements that currently apply to echo cancellers. The ITU is defining much more stringent performance requirements in the G.IEC specification.

Echo cancellation, a new concept for echo control, was invented at Bell Laboratories in 1964. It was a revolutionary departure from the previous technique of opening (temporarily disconnecting) the speech path to prevent echo signals from being returned over the long-distance circuit.

Echo is generated toward the packet network from the telephone network. The echo canceller compares the voice data received from the packet network with voice data being transmitted to the packet network. The echo from the telephone network hybrid is removed by a digital filter on the transmit path into the packet network.
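The digital filter on the transmit path is commonly an adaptive filter. The sketch below uses a normalised least-mean-squares (NLMS) filter, which is a technique chosen here for illustration and not something the text specifies; the tap count, step size, and the simulated hybrid (a half-amplitude echo delayed by two samples, with no near-end talker) are assumptions.

```python
import random


def nlms_echo_cancel(far_end, near_end, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the near-end signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, near_end):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        error = d - estimate  # echo-reduced sample sent into the packet network
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual


random.seed(0)
# Voice received from the packet network and played toward the telephone.
far = [random.uniform(-1, 1) for _ in range(2000)]
# Hypothetical hybrid reflection heading back toward the packet network.
echo = [0.0, 0.0] + [0.5 * s for s in far[:-2]]
out = nlms_echo_cancel(far, echo)
```

Once the filter has adapted, the residual in out is far smaller than the original echo. A practical canceller would also need to freeze adaptation while the near-end party is talking, which this sketch does not attempt.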

Two major types of information must be handled to interface telephony equipment to a packet network: voice and signaling information.

As shown in Figure, VoIP software interfaces to both streams of information from the telephony network and converts them to a single stream of packets transmitted to the packet network. The software functions are divided into four general areas.

Voice Packet Software Module

This software, also known as the voice-processing module, typically runs on a digital signal processor (DSP) and prepares voice samples for transmission over the packet network. Its components perform echo cancellation, voice compression, voice-activity detection, jitter removal, clock synchronization, and voice packetization.

Telephony-Signaling Gateway Software Module

This software interacts with the telephony equipment, translating signaling into state changes used by the packet protocol module to set up connections. These state changes include on-hook, off-hook, and trunk seizure. This software supports ear and mouth (E&M, also expanded as earth and magneto) Type I, II, III, IV, and V; loop or ground start foreign exchange station (FXS); foreign exchange office (FXO); and integrated services digital network (ISDN) basic rate interface (BRI) and primary rate interface (PRI).

Packet Protocol Module

This module processes signaling information and converts it from the telephony-signaling protocols to the specific packet-signaling protocol used to set up connections over the packet network (e.g., Q.933 and voice-over-frame relay signaling). It also adds protocol headers to both voice and signaling packets before transmission into the packet network.

Network-Management Module

This module provides the voice-management interface to configure and maintain the other modules of the voice-over-packet system. All management information is defined in Abstract Syntax Notation One (ASN.1) and complies with Simple Network Management Protocol (SNMP) V1 syntax. A proprietary voice packet management information base (MIB) is supported until standards evolve in the forums.

The software is partitioned to provide a well-defined interface to the DSP software usable for multiple voice packet protocols and applications. The DSP processes voice data and passes voice packets to the microprocessor with generic voice headers.

The microprocessor is responsible for moving voice packets and adapting the generic voice headers to the specific voice packet protocol that is called for by the application, such as Real-time Transport Protocol (RTP), voice over frame relay (VoFR), and voice telephony over ATM (VToA). The microprocessor also processes signaling information and converts it from supported telephony-signaling protocols to the packet network signaling protocol (e.g., H.323 for IP, frame relay, or ATM signaling).
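For the RTP case, adapting a generic voice header amounts to prefixing each voice payload with the 12-byte RTP fixed header. The sketch below builds that header with Python's struct module; the function name and the example field values are hypothetical, and header extensions and contributing-source lists are deliberately left out.

```python
import struct


def rtp_packet(payload, payload_type, seq, timestamp, ssrc, marker=False):
    """Prefix payload with an RTP fixed header (version 2, no padding, extension, or CSRCs)."""
    byte0 = 2 << 6  # version 2 in the top two bits; P, X, and CC are zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack(
        "!BBHII",                 # network byte order: 1 + 1 + 2 + 4 + 4 = 12 bytes
        byte0,
        byte1,
        seq & 0xFFFF,
        timestamp & 0xFFFFFFFF,
        ssrc & 0xFFFFFFFF,
    )
    return header + payload


# Example: two 10-byte G.729 frames (static payload type 18) in one packet.
pkt = rtp_packet(b"\x00" * 20, payload_type=18, seq=1, timestamp=160, ssrc=0x1234)
```

The sequence number lets the receiver detect lost or reordered packets, and the timestamp gives the jitter buffer the playout position of each packet.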

This partitioning provides a clean interface between the generic voice-processing functions, such as compression, echo cancellation, and voice-activity detection, and the application-specific signaling and voice protocol processing.



VOIP APPLICATIONS

A wide variety of applications are enabled by the transmission of voice over IP networks. This tutorial will explore three examples of these applications.

The first application, shown in Figure 1, is a network configuration of an organization with many branch offices (e.g., a bank) that wants to reduce costs and combine traffic to provide voice and data access to the main office. This is accomplished by using a packet network to provide standard data transmission while at the same time enhancing it to carry voice traffic along with the data. Because the bandwidth available for this access application is low, this network configuration will typically benefit if the voice traffic is compressed. Voice over packet provides the interworking function (IWF), which is the physical implementation of the hardware and software that allows the transmission of combined voice and data over the packet network. The interfaces the IWF must support in this case are analog interfaces, which directly connect to telephones or key systems. The IWF must emulate both the functions of a private branch exchange (PBX) for the telephony terminals at the branches and the functions of the telephony terminals for the PBX at the home office. The IWF accomplishes this by implementing signaling software that performs these functions.



A second VoIP application, shown in Figure 2, is a trunking application. In this scenario, an organization wishes to send voice traffic between two locations over the packet network and replace the tie trunks used to connect the PBXs at the locations. This application usually requires the IWF to support a higher-capacity digital channel than the branch application, such as a T1/E1 interface of 1.544 or 2.048 Mbps. The IWF emulates the signaling functions of a PBX, resulting in significant savings in the company's communications costs.

A third application of VoIP software is interworking with cellular networks, as shown in Figure 3. The voice data in a digital cellular network is already compressed and packetized for transmission over the air by the cellular phone. Packet networks can then transmit the compressed cellular voice packet, saving a tremendous amount of bandwidth. The IWF provides the transcoding function required to convert the cellular voice data to the format required by the public switched telephone network (PSTN).



