Hanoi Open University
Center for International Training Co-operation
Thesis:
Teacher : Nguyễn Thái Nguyên
Group 3 : Đồng Xuân Thắng -Cap
Lê Trọng Nghĩa
Nguyễn Xuân Tư
Mai Trọng Dũng
Bùi Thanh Nhàn
Ngô Thị Nhàn
Hanoi, 15 January 2003
Glossary
ATM : Asynchronous Transfer Mode
ACELP : Algebraic Code Excited Linear Prediction
ARQ : Automatic Repeat Request
ACF : Admission Confirm
DES : Data Encryption Standard
PSTN : Public Switched Telephone Network
PC : Personal Computer
PCM : Pulse Code Modulation
IP : Internet Protocol
ITU : International Telecommunication Union
IETF : Internet Engineering Task Force
ISUP : ISDN User Part
INAP : Intelligent Network Application Part
ITSP : Internet Telephony Service Provider
MAP : Mobile Application Part
MGCP : Media Gateway Control Protocol
MTP : Message Transfer Part
MP : Multipoint
MCU : Multipoint Control Unit
OLC : Open Logical Channel
QoS : Quality of Service
RC : Report Count
RSVP : Resource Reservation Protocol
RTCP : Real-time Transport Control Protocol
RTP : Real-time Transport Protocol
SIP : Session Initiation Protocol
SS7 : Signaling System No. 7
SCCP : Signaling Connection Control Part
STP : Signaling Transfer Point
TCP : Transmission Control Protocol
TCAP: Transaction Capabilities Application Part
UDP : User Datagram Protocol
VAD : Voice Activity Detector
VoIP : Voice over Internet Protocol
Overview of the thesis
VoIP - Voice over Internet Protocol
VoIP (Voice over IP, that is, voice delivered using the Internet Protocol) is a term used in IP telephony for a set of facilities for managing the delivery of voice information using the Internet Protocol (IP). In general, this means sending voice information in digital form in discrete packets rather than over the traditional circuit-committed protocols of the public switched telephone network (PSTN). A major advantage of VoIP and Internet telephony is that they avoid the tolls charged by ordinary telephone service.
VoIP, now used somewhat generally, derives from the VoIP Forum, an effort by major equipment providers, including Cisco, VocalTec, 3Com, and Netspeak, to promote the use of ITU-T H.323, the standard for sending voice (audio) and video using IP on the public Internet and within an intranet. The Forum also promotes the use of directory service standards, so that users can locate other users, and the use of touch-tone signals for automatic call distribution and voice mail.
In addition to IP, VoIP uses the Real-time Transport Protocol (RTP) to help ensure that packets are delivered in a timely way. On public networks it is currently difficult to guarantee Quality of Service (QoS). Better service is possible with private networks managed by an enterprise or by an Internet telephony service provider (ITSP).
A technique used by at least one equipment manufacturer, Netspeak, to help ensure faster packet delivery is to ping (Packet Internet or Inter-Network Groper) all possible network gateway computers that have access to the public network and choose the fastest path before establishing a Transmission Control Protocol (TCP) socket connection with the other end.
Using VoIP, an enterprise positions a “VoIP device” (such as Cisco’s AS5300 access server with the VoIP feature) at a gateway. The gateway receives packetized voice transmissions from users within the company and then routes them to other parts of its intranet (local area or wide area network) or, using a T-carrier or E-carrier interface, sends them over the public switched telephone network (PSTN).
Chapter 1: Voice over IP (VoIP) Technology
1. Fundamental features of the circuit-switched network and the Internet:
1.1. Fundamental features of the circuit-switched network:
The circuit-switched network is designed for fast connection setup, so that little time is wasted in establishing a call. In the circuit-switched network, users are given a dedicated channel over which to exchange information. When the exchange is complete, the channel is released. Because the number of channels is limited, calls can be lost when no channel is free. Utilization is low, but call quality is assured because a two-way 64 kbps channel is reserved for the caller and the called party. The circuit-switched network is optimized for real-time transmission with high service quality. All terminal equipment and switches are assigned fixed numbers, so no address has to be carried during the information exchange itself. The switching system uses the number of the called subscriber to determine the route. Because the bandwidth is guaranteed and does not change during a call, charges on the circuit-switched network are based on distance and call duration.
1.2. Fundamental features of the Internet:
The Internet is a packet-switched network suited to applications that do not exchange information in real time; for services such as email and file transfer, packet delay has little effect on service quality. Packet-switched networks do not reserve a fixed path between two users and therefore do not guarantee service quality. All information on the network is divided into packets, each of which carries the destination address and its sequence number.
Routers and hosts on the network forward these packets toward the destination address. On the Internet, all packets are treated the same, regardless of their contents. When packets reach the destination, they are reassembled in their original order. Because information is carried in packets, network utilization is maximized. However, the service quality of real-time applications is strongly affected. Charges are based not on distance or time but on the bandwidth used. On the Internet, each packet is addressed with an IP address, which is assigned to hosts and terminals. Routing is controlled by the destination IP address. To make addresses easier to understand and use, IP addresses can be mapped to names, as with the domain name service or email addresses.
Because IP addresses are limited, dial-up users are assigned an IP address temporarily. The address belongs to one terminal only while it is connected to the Internet and is released when it disconnects. The released IP address is then reused for another connection to the network.
1.3. Advantages of VoIP over the PSTN:
On the PSTN, users pay for the time they use: the longer the call, the higher the charge. At any one time they can talk to only one person. With VoIP, the subscriber's charge does not depend on the duration of the call. A subscriber can hold calls with several parties and exchange data, speech, pictures, drawings and video with other subscribers.
Figure 1: Basic structure of an IP telephone network
1.4. Outlook for VoIP technology:
+ Some technical features of IP telephony:
From the analysis of the fundamental features of the circuit-switched network and the Internet, we can see that IP telephony is a typical case of integrating real-time signals into a packet-switched network. First, we should classify IP telephony services. They differ in three characteristics: the type of terminal equipment, the position of the gateway between the IP and PSTN networks, and the main transmission network.
a. Terminal equipment and gateway: There are three main types of IP telephony: PC to PC, PC to Phone, and Phone to Phone.
+ PC to PC is the first model of IP telephony. The user at each end needs a PC equipped with audio hardware and software and connected to the Internet. This service needs neither a gateway nor the PSTN, because the PSTN never switches these calls; the main transmission network is the public Internet. Because of its sound quality and complexity of use, PC to PC has had little effect on traditional telephone service.
+ PC to Phone expands the number of reachable users, but for operators a PC-to-Phone call is more complex than a PC-to-PC call.
+ Phone to Phone is a very important market, consisting mainly of commercial services, because people prefer to communicate by telephone. However, this third model requires more investment capital, because it needs gateways into the PSTN close to the places where the service is provided. Phone-to-Phone services are very similar to traditional telephone services.
b. Transmission equipment: The distinction between Internet telephony and VoIP is based on the nature of the main transmission network. IP telephony covers the transmission of voice, fax and related services over packet-switched networks based on IP. Internet telephony and VoIP are the two basic types of IP telephony. Internet telephony is IP telephony in which the main transmission network is the public Internet (the global super-network).
Voice over IP is IP telephony in which the main transmission network is a private IP-based network.
Besides replacing long-distance and international telephone service, IP technology makes many other services possible, since any service can be carried over IP. This part covers only VoIP technology and concentrates on the case where the terminal equipment is a telephone on the circuit-switched network (Phone to Phone).
Figure 2: IP call: Phone to Phone
+ Special features of VoIP:
a. Adjustable quality: The quality of VoIP depends on each component, in particular on the low-bit-rate encoding and decoding at each end. The Internet is not a service-specific network; the methods of exchange are chosen entirely by the terminal systems. The terminal systems can therefore control the degree of compression according to the network bandwidth or the content to be transmitted.
b. Security: SIP is used to request a password and to authenticate the signaling messages that identify the terminal. The password also serves as the key for the transport protocol, so the whole session is encrypted for secure transmission.
c. User interface: VoIP terminal systems offer rich indications and can present instructions and a variety of graphical interfaces.
d. Connecting telephone and computer: VoIP makes it possible to handle these complex connections.
1.5. Conclusion:
VoIP technology has potential for future development and could replace the existing PSTN. Because the circuit-switched network and the Internet differ in their features, these differences must be resolved before VoIP can be offered to users of the circuit-switched network (Phone to Phone). Concretely, this requires address translation, signaling interworking between the two networks, and suitable coding for real-time transmission over the network.
2. Problems relating to VoIP technology and speech quality in VoIP:
Long-distance calls over the traditional circuit-switched telephone network are expensive. To reduce their cost, a public or private data network can be used for communication. A packet-switched network running IP is one example: the speech signals are carried in IP packets. Voice over IP (VoIP) is a good basis for designing a global multimedia communication system that could replace the existing network infrastructure, integrating audio, video, data, fax and so on into a single common network based on IP technology. Frame Relay or Asynchronous Transfer Mode (ATM) could be used in place of IP. VoIP is more economical for long-distance calls because the charge is based on the bandwidth used, not on distance. IP telephony uses speech compression to save bandwidth, which reduces cost, but its quality is not as good as that of the PSTN.
The biggest distinction in a multimedia network is between real-time and non-real-time services. Real-time services such as audio and video do not tolerate excessive delay in the network; for non-real-time services such as email and file transfer, delay is not a serious concern. To implement VoIP, therefore, special coding and compression methods are needed to reduce the bit rate of the speech signal, which cannot stay at the 64 kbps used in circuit switching.
2.1. Coding techniques and speech signal compression:
In speech transmission, voice is usually digitized and PCM-coded using A-law or μ-law at 64 kbps, which reproduces the sound fairly faithfully. For some applications, such as carrying speech over an IP network, speech is transmitted at a lower bit rate. This requires the speech coding and compression techniques defined in ITU and ETSI standards such as G.723.1, G.729, G.729A and GSM.
+ Standard G.723.1. According to this ITU standard, the codec operates at 5.3 kbps and 6.3 kbps. The compression technique uses MP-MLQ for the higher bit rate and ACELP for the lower bit rate. The delay introduced by the algorithm is 67.5 ms.
+ Standard G.729. According to this ITU standard, the codec operates at 8 kbps. The compression technique is conjugate-structure algebraic code excited linear prediction (CS-ACELP). The delay introduced by the algorithm is 25 ms.
+ Standard GSM 06.10. According to this ETSI standard, the codec operates at 13 kbps. The compression technique is regular pulse excitation with long-term prediction (RPE-LTP). The delay introduced by the algorithm is 40 ms.
2.2. Voice Activity Detector (VAD):
VAD is performed by a digital signal processor to reduce the amount of speech data transmitted: it automatically detects silence in the conversation and stops transmitting during that time. Silence accounts for about 50-60% of most conversations, because while one party speaks the other must listen. VAD allows the bandwidth of the silent periods to be saved and used for other data.
VAD works by monitoring the power of the speech signal, changes in that power, and the frequency of the signal. The difficulty for VAD is determining exactly when speech ends and when it begins. Typically, VAD waits about 200 ms after it stops detecting speech before it stops packetizing. This prevents VAD from cutting off the end of speech or cutting in during short pauses in the conversation.
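The behaviour described above can be sketched in a few lines of Python. This is only an illustration, assuming a detector that looks at frame energy alone; the function names, the -40 dB threshold and the 20 ms frame length are hypothetical choices, and only the 200 ms wait after the end of speech comes from the text.

```python
import math

def frame_energy_db(frame):
    """Mean power of one frame of samples (floats in [-1, 1]), in dB."""
    power = sum(s * s for s in frame) / len(frame)
    return 10 * math.log10(power) if power > 0 else -120.0

def vad_decisions(frames, threshold_db=-40.0, frame_ms=20, hangover_ms=200):
    """Return one True/False 'transmit this frame' decision per frame.

    threshold_db and frame_ms are assumed values for illustration;
    hangover_ms is the wait of about 200 ms mentioned in the text.
    """
    hangover_frames = hangover_ms // frame_ms
    remaining = 0          # silent frames still to be sent after speech ends
    decisions = []
    for frame in frames:
        if frame_energy_db(frame) > threshold_db:
            remaining = hangover_frames   # speech detected: re-arm the hangover
            decisions.append(True)
        elif remaining > 0:
            remaining -= 1                # silence, but still inside the hangover
            decisions.append(True)
        else:
            decisions.append(False)       # sustained silence: stop transmitting
    return decisions

# Two frames of "speech" followed by fifteen frames of silence (160 samples each).
speech = [0.5] * 160
silence = [0.0] * 160
decisions = vad_decisions([speech] * 2 + [silence] * 15)
```

With these inputs the two speech frames and the next ten silent frames (200 ms) are transmitted, and the last five silent frames are suppressed.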
2.3. Numbering and addressing:
Because the IP network and the switched circuit network (SCN) interwork, there are two types of address: addresses in the SCN and addresses in IP.
a. Numbering on the SCN:
On the circuit-switched network, every terminal and switch is assigned a number. E.164 numbers are telephone numbers that follow the structure and numbering plan described in Recommendation E.164 of the International Telecommunication Union. Routing on the circuit-switched network is controlled by the E.164 addressing system. To place a call, a user of the circuit-switched network dials the E.164 number of the callee.
+ Local number:
Caller Access Code + National Prefix + National Destination Code + Subscriber Number.
+ International numbers can use the following three structures:
Caller Access Code + International Prefix + Country Code + Identification Code + Subscriber Number.
Caller Access Code + International Prefix + Country Code + Destination Code + Subscriber Number.
Caller Access Code + International Prefix + Country Code + Global Subscriber Number.
b. Numbering on IP:
+ The prefix is an identifier of one or more digits that indicates the numbering type, network and service in use; it can be used to select the service provider and the type of service within a country.
+ The service provider selection code consists of digits that select service over the IP network or over the SCN, and thereby the appropriate switching.
+ The service provider can be selected in several ways: pre-selection by the user, dialing, or a password.
If the gateway connects to an SCN with many service providers, both the gateway and the gatekeeper must be able to identify and process the service provider selection code. If there are many service providers on the IP network, the gatekeeper must be able to identify and process the service provider selection code.
To match the most common address types on the Internet, name-based addresses similar to email addresses can be used: user@domain, user@host, user@IP-address, phone-number@gateway.
2.4. Billing:
To keep the network efficient, billing is handled by a separate host system. The billing host is responsible for collecting and storing the details of every call from the gateway or MGC. This data is used to produce invoices for customers, who can access the host to view their billing details on a website. Charges are calculated from the time used. The billing system should support two types of service: pre-paid and post-paid. The software must be able to perform the following functions:
- Authorizing the call.
- Reporting the account balance.
- Calculating charges from preset rates for different destinations.
- Reporting the maximum duration of the call.
- Updating the account balance after the call.
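For the pre-paid case, the functions listed above could look roughly like the following Python sketch. The rate table, prefixes, per-minute prices and function names are hypothetical and serve only to show how the maximum call duration and the updated balance follow from a preset rate per destination.

```python
from decimal import Decimal

# Hypothetical preset rates (currency units per minute), keyed by dialed prefix.
RATES = {"84": Decimal("0.05"), "1": Decimal("0.10")}

def rate_for(number):
    """Pick the rate whose prefix is the longest match for the dialed number."""
    matches = [prefix for prefix in RATES if number.startswith(prefix)]
    if not matches:
        raise ValueError("no rate defined for this destination")
    return RATES[max(matches, key=len)]

def authorize(balance, number):
    """Accept the call only if a rate exists and the account has credit."""
    return balance > 0 and any(number.startswith(p) for p in RATES)

def max_call_seconds(balance, number):
    """Longest call, in whole seconds, that the balance can pay for."""
    return int(balance / rate_for(number) * 60)

def balance_after_call(balance, number, seconds):
    """Account balance after charging for a call of the given duration."""
    cost = rate_for(number) * Decimal(seconds) / Decimal(60)
    return balance - cost

balance = Decimal("1.00")
limit = max_call_seconds(balance, "8441234567")          # 1200 seconds
remaining = balance_after_call(balance, "8441234567", 120)
```

Decimal arithmetic is used here rather than floating point so that balances do not accumulate rounding errors.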
2.5. Signaling interworking:
The signaling standard proposed between IP telephony and the PSTN is Signaling System No. 7 (SS7). SS7 is used to carry the following information:
- Call establishment information.
- Call control information.
- Property and application.
Signaling between the IP network and the SS7 network of the PSTN is handled by a signaling gateway. The signaling gateway connects to an STP in the SS7 network as a signaling point (SP) and transfers the signaling in full. The signaling gateway should support ISUP and SCCP/TCAP signaling messages.
Using SS7 signaling brings the IP telephone network the following benefits:
- Full interconnection with the PSTN.
- Provision of supplementary services.
- Improved call control.
- Improved trunk maintenance and management.
- Faster call establishment.
Although newer signaling protocols such as H.323 and SIP exist for VoIP networks, the standard in traditional telephony and in mobile networks is SS7. Therefore, if a VoIP-based network is to communicate with any traditional network, not only must it interwork at the media level through media gateways, it must also interwork with SS7. To support this, the IETF has developed a set of protocols known as Sigtran.
In order to understand Sigtran, it is worth considering the type of interworking that needs to occur. Imagine, for example, an MGC that controls one or more media gateways. The MGC is a call control entity in the network and, as such, uses call control signaling to and from other call control entities. If other call control entities use SS7, then the MGC must use SS7, at least to the extent that the other call control entities can communicate freely with it. This means that the MGC does not necessarily need to support the whole of SS7, just the necessary application protocols.
Consider Figure 3, which shows the SS7 stack. The bottom three layers are called the Message Transfer Part (MTP). This is a set of protocols responsible for getting a particular SS7 message from the source signaling point to the destination signaling point. Above the MTP we find either the Signaling Connection Control Part (SCCP) or the ISDN User Part (ISUP). ISUP is generally used for the establishment of regular phone calls. SCCP can also be used in the establishment of regular phone calls, but it is more often used for the transport of higher-layer applications, such as the GSM Mobile Application Part (MAP) or the Intelligent Network Application Part (INAP). In fact, most such applications use the services of the Transaction Capabilities Application Part (TCAP), which in turn uses the services of SCCP.
[Layers shown, top to bottom: Application Part over the Transaction Capabilities Application Part (TCAP) over the Signaling Connection Control Part (SCCP), with the ISDN User Part (ISUP) alongside; below both, MTP Level 3, MTP Level 2 and MTP Level 1.]
Figure 3: SS7 Stack
SCCP provides an enhanced addressing mechanism to enable signaling between entities even when those entities do not know each other’s signaling addresses (known as point codes). This addressing is known as global title addressing. Basically, it is a means whereby some other address, such as a telephone number, can be mapped to a point code, either at the node that initiated the message or at some other node between the originator and the destination of the message.
Figure 4 provides some examples of communication between different SS7 entities. Consider Scenario A. In this case, the two entities, represented by point code 1 and point code 2, communicate at each layer: a peer-to-peer relationship exists between the two entities at every layer. Scenario B has a peer-to-peer relationship at layer 1, layer 2, and layer 3 between point codes 1 and 2, 2 and 3, and 3 and 4. At the SCCP layer, a peer-to-peer relationship exists between point codes 1 and 2 and between point codes 2 and 4.
At the TCAP and Application layers, a peer-to-peer relationship can only take place between point codes 1 and 4. In other words, the application at point code 1 is only aware of the TCAP layer at point code 1 and the application layer at point code 4. Similarly, the TCAP layer at point code 1 is aware only of the application layer above it, the SCCP layer below it, and the corresponding TCAP layer at point code 4. It is not aware of any of the MTP layers. Equally, if we consider communication between point code 2 and point code 4, the SCCP layer at each point code knows only about the layer above (TCAP), the layer below (MTP3), and the corresponding SCCP peer. As far as the SCCP layers are concerned, nothing else exists. Therefore, SCCP neither knows nor cares that point code 3 exists. Consider Scenario C, where point code 3 is replaced by a gateway that supports standard SS7 on one side and an IP-based MTP emulation on the other side. Point code 4 does not support the lower SS7 layers at all, just an MTP emulation over IP. Provided that the MTP emulation at point code 4 appears to the SCCP layer as standard MTP, the SCCP layer does not care, nor do any of the layers above SCCP. Equally, the SCCP layers at point codes 1 and 2 do not care. Consequently, it is possible to implement SS7-based applications at point code 4 without implementing the whole SS7 stack. This is the concept behind the Sigtran protocol suite.
[Scenario A - Communication Between Adjacent Signaling Points: two signaling points, each with the stack ISUP / MTP3 / MTP2 / MTP1.]
[Scenario B - Communication Between Non-Adjacent Signaling Points: point codes 1 and 4 carry Application / TCAP / SCCP / MTP3 / MTP2 / MTP1; point code 2 carries SCCP / MTP3 / MTP2 / MTP1; point code 3 carries MTP3 / MTP2 / MTP1.]
[Scenario C: as Scenario B, except that MTP emulation over IP takes the place of MTP1 to MTP3 between the gateway at point code 3 and point code 4.]
Figure 4: Example SS7 Communication Scenarios
2.6. Security:
IP telephony services run on IP switching, so security is a very important requirement in order to:
+ Protect operators from malicious activity.
+ Protect operators from network failures caused by faulty network components.
+ Protect users from malicious activity.
To meet these goals, the network should provide the following five security services:
- Authentication.
- Authorization.
- Non-repudiation.
- Privacy.
- Data security.
An IP system may provide one or all of the above services, depending on the specific case and even on the specific subscriber. For network operators, protecting important information from unauthorized access is the top priority. Some suggestions follow:
+ Data encryption: This is the most effective practical method of protecting information transmitted across different networks. Because the data is normally already compressed by the gateway according to various standards, encrypting it may not be necessary. Where it is necessary, information on the network should be encrypted with DES (Data Encryption Standard) with a key at least 56 bits long.
+ Anti-virus: Viruses can cause serious damage to the software of the whole system, and can spread from other systems or from customers. This matters all the more when the system is built on a distributed processing architecture. Anti-virus software should be installed on gateways and on gatekeeper hosts.
+ Using a firewall: This is an important method of protecting the operator's network. A firewall has two basic mechanisms, blocking traffic and allowing traffic through:
- Block all incoming data unless the source has been authenticated.
- Pass all data except broadcast data and regional checking data.
Although a firewall is effective, encryption and authentication should also be used to ensure a high level of security.
+ Security for remote access:
The following methods are used to control remote access:
- Authentication: Remote subscribers must be verified.
- Access limit: Each remote subscriber is assigned a specific location on the server.
- Time limit: The connection time is fixed; when it is exceeded, the connection is dropped.
- Connection limit: The number of connections and the points from which they originate are restricted.
+ Security policy:
A security plan should include the following elements:
- Definition of access levels, governing which resources each user may access.
- How a subscriber or subscriber group accesses the network.
- Access rules: time, place and manner of using services.
- Instructions for billing.
- Requirements for network access and connection.
- The ability to strengthen security measures in specific cases.
- Security guidance for users.
2.7. Problems relating to call quality:
+ Delay:
- Algorithmic delay: This is caused by the codec and is inherent in the coding algorithm.
- Packet delay: This is the time needed to deliver an IP packet. It includes the delay incurred in passing through storage and forwarding equipment such as routers and switches.
- Propagation delay: This is the time needed for an optical or electrical signal to travel a given geographic distance through the transmission medium.
- Component delay: This is the delay introduced by the various components of a transmission system. For example, a frame passing through a router must move from the input port to the output port across the backplane. There is a minimum delay across the backplane, plus a variable delay caused by queuing and processing in the router.
+ Echo suppression:
The first problem caused by delay is echo. Echo can arise in a voice network from coupling between the earpiece and the mouthpiece of the handset; this is called acoustic echo. It also occurs when part of the electrical energy is reflected back to the speaker by the hybrid circuit in the PSTN; this is called hybrid echo.
If the one-way (end-to-end) delay is short, any echo generated by the voice circuit returns to the speaker quickly and is not noticeable. In practice, echo suppression is not needed if the one-way delay is less than 25 ms. However, the one-way delay in VoIP is almost always more than 25 ms, so echo suppression is required.
+ Talker overlap:
Even with the best echo suppression, two-way conversation becomes very difficult when the delay is too long, because the voices overlap. This happens when one party cuts in on the other's speech because of the large delay.
+ Jitter (variable delay):
Telephone service requires transmission with a fixed delay, but a data network delivers on a best-effort basis and cannot provide a fixed delay: different packets experience different delays, and so the frames they carry are delayed differently. The source generates frames at regular intervals, but because of jitter the destination gateway does not receive them at regular intervals. Jitter interrupts the call and makes the speech hard to understand. To remove the variable delay, the receiver must collect frames and hold them long enough for the latest frames to arrive in time to be played out in order; such a buffer can remove the jitter. This is not a concern in the PSTN, because the bandwidth is fixed. The larger the jitter, the longer frames are held in the buffer and the more delay is added. If the jitter is small, a small buffer is used. If the jitter grows as the load increases, the buffer size increases automatically.
Packets should arrive at fixed intervals (for example, every 20 ms). With jitter, this is no longer true. The figure below shows Packet 1 (P1) and Packet 3 (P3) arriving on time, while Packet 2 (P2) and Packet 4 (P4) arrive 12 ms and 5 ms later than expected, respectively.
Figure 5: Illustration of jitter
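The four-packet example can be worked through numerically. The Python sketch below is an illustration only: the absolute arrival times are assumed (only the 20 ms interval and the 12 ms and 5 ms delays come from the text), and the function names are hypothetical. It computes how late each packet is and how much buffering would let all four be played out at a steady 20 ms rhythm.

```python
def lateness_ms(arrivals_ms, interval_ms=20):
    """Delay of each packet relative to a fixed schedule anchored on the first packet."""
    start = arrivals_ms[0]
    return [t - (start + i * interval_ms) for i, t in enumerate(arrivals_ms)]

def playout_times_ms(arrivals_ms, buffer_ms, interval_ms=20):
    """Playout instants when the receiver holds the first packet for buffer_ms."""
    start = arrivals_ms[0] + buffer_ms
    return [start + i * interval_ms for i in range(len(arrivals_ms))]

# Assumed arrival times: P1 and P3 on time, P2 late by 12 ms, P4 late by 5 ms.
arrivals = [0, 32, 40, 65]

late = lateness_ms(arrivals)          # [0, 12, 0, 5]
buffer_ms = max(late)                 # smallest buffer that absorbs this jitter
playout = playout_times_ms(arrivals, buffer_ms)

# Every packet has arrived by the time it is due to be played.
on_time = all(a <= p for a, p in zip(arrivals, playout))
```

A buffer of 12 ms is enough for this trace, at the cost of 12 ms of extra delay on every packet; a real jitter buffer would estimate the required depth continuously rather than from a fixed trace.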
+ Packet loss:
The IP network does not guarantee that packets are delivered completely and in order. Packets are lost under congestion (caused by a broken transmission line or insufficient capacity). Because speech transmission is time-sensitive, TCP-based retransmission is not effective. If a speech sample is lost at the receiving end, the gap is simply skipped. If too many packets are lost, the voice breaks up. To conceal a loss, the previous packet can be replayed; this works only when a few samples are lost. For burst errors, interpolation is used: from the previous packets, the decoder predicts what the lost packet would have been. In practice, to use the IP network for advanced services such as video, mobile and high-quality voice, an additional signaling system is required, namely Signaling System No. 7.
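The simplest of the concealment methods mentioned above, replaying the previous packet, can be sketched as follows. This is a minimal Python illustration with assumed names; it covers only repetition, not the decoder-based interpolation used for burst losses, and the limit of two consecutive repeats is an arbitrary choice.

```python
def conceal_losses(frames, silence, max_repeats=2):
    """Fill lost frames (None) by replaying the last good frame.

    After max_repeats consecutive losses the output falls back to silence,
    since replaying the same frame for too long is audible.
    """
    output = []
    last_good = None
    repeats = 0
    for frame in frames:
        if frame is not None:
            last_good = frame
            repeats = 0
            output.append(frame)
        elif last_good is not None and repeats < max_repeats:
            repeats += 1
            output.append(last_good)   # replay the previous packet
        else:
            output.append(silence)     # burst too long, or nothing to replay yet
    return output

received = [b"A", None, b"C", None, None, None]
played = conceal_losses(received, silence=b"-")
```

Here the single lost frame after `A` and the first two lost frames after `C` are covered by repetition, and the third consecutive loss is played as silence.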
+ Bandwidth:
A traditional call uses a 64 kbps stream. When the call is carried over an IP network, it is digitized and compressed by a digital signal processor. Compression can reduce the bit rate to as little as 5.3 kbps per call; the speech is then packetized for the IP network, and IP/UDP/RTP headers are added. These headers increase the bandwidth of each call (about 40 bytes per packet). However, techniques such as RTP header compression can reduce the headers to 2 bytes. The bandwidth depends on the codec bit rate and the size of the speech packets. A private IP network has an advantage over the Internet because more bandwidth is available, so voice quality is better. The bandwidth to provision on the network is determined by the number of calls at peak time. VoIP can reduce bandwidth through speech compression and silence suppression.
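The effect of the headers on per-call bandwidth can be checked with a short calculation. The Python sketch below assumes one codec frame per packet, with packet durations of 30 ms for the 5.3 kbps codec and 20 ms for 64 kbps PCM; these durations, the 512 kbps link and the function names are assumptions for illustration, and link-layer overhead is ignored.

```python
import math

IP_UDP_RTP_HEADER_BYTES = 40   # 20 (IP) + 8 (UDP) + 12 (RTP)
COMPRESSED_HEADER_BYTES = 2    # after RTP header compression, as in the text

def payload_bytes(codec_bps, packet_ms):
    """Speech bytes carried in one packet, rounded up to whole bytes."""
    return math.ceil(codec_bps * packet_ms / 8000)

def call_bandwidth_kbps(codec_bps, packet_ms, header_bytes=IP_UDP_RTP_HEADER_BYTES):
    """One-way bandwidth of a call at the IP level, in kbps (bits per millisecond)."""
    return (payload_bytes(codec_bps, packet_ms) + header_bytes) * 8 / packet_ms

def max_calls(link_kbps, per_call_kbps):
    """Number of simultaneous calls that fit on a link of the given capacity."""
    return int(link_kbps // per_call_kbps)

low_rate = call_bandwidth_kbps(5300, 30)                             # 16.0 kbps
low_rate_compressed = call_bandwidth_kbps(5300, 30, COMPRESSED_HEADER_BYTES)
pcm = call_bandwidth_kbps(64000, 20)                                 # 80.0 kbps
calls_on_link = max_calls(512, low_rate)                             # 32 calls
```

Under these assumptions a 5.3 kbps codec actually consumes 16 kbps on the wire, about three times its nominal rate, which drops to under 6 kbps with 2-byte compressed headers.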
3. Transport protocols:
TCP and UDP are the two protocols for data transmission on an IP network.
+ TCP is a good protocol for data transmission: it provides flow and congestion control and protects the network from overload. However, using TCP has some drawbacks. Because it provides a reliable byte-stream service, retransmission of lost packets increases delay in the network. TCP has many features and is complex, which is of no benefit to VoIP. Speech signals may also need to be delivered to several users at the same time. An IP network should distribute such data efficiently by multicast, but TCP cannot provide this: to deliver the data to several destinations with TCP, a separate TCP connection is required for each, which costs bandwidth.
+ UDP is a simpler protocol than TCP, little more than an extension of IP, and is used when high reliability of service is not required. Its advantage is that no time is wasted retransmitting lost packets. It can use multicast and so save bandwidth when data is sent to many destinations. UDP also has disadvantages: it has no synchronization mechanism and no means of flow or congestion control. To solve this, UDP is combined with the real-time protocols.
3.1. Real-time protocols:
3.1.1. Real-time Transport Protocol (RTP):
RTP provides end-to-end delivery for real-time services such as audio and video. RTP is typically run over UDP (User Datagram Protocol); together, RTP and UDP supply the transport protocol functions. UDP provides multiplexing and checksum services. RTP can also be used with other transport protocols. When a host wants to send a packet, it must know the media type in order to format the packet; it adds the media-specific header, prepends the RTP header, and passes the result to the lower-layer transport. The packet is then sent over the network, by multicast or unicast, to the other participants.
The format of an RTP packet is as follows:
IP header (20 bytes) | UDP header (8 bytes) | RTP header (12 bytes) | Codec sample
Figure 6: Packet format with the Real-time Transport Protocol
The fields of the RTP header are:
+ Version (V, 2 bits): identifies the version of RTP.
+ Padding (P, 1 bit): if the padding bit is set, the packet contains one or more additional padding octets at the end that are not part of the payload. The last padding octet contains a count of how many padding octets should be ignored. Padding may be needed by some encryption algorithms with fixed block sizes, or for carrying several RTP packets in one lower-layer protocol data unit.
+ Extension (X, 1 bit): if the X bit is set, the fixed header is followed by a header extension.
+ CSRC count (CC, 4 bits): the number of CSRC identifiers (contributing sources) that follow the fixed header.
+ Marker (M, 1 bit): its interpretation is defined by a profile. It is intended to allow significant events, such as frame boundaries, to be marked in the packet stream. For voice, the M bit can mark the first packet sent after a silence period, which helps the receiver restart playout.
+ Payload type (PT, 7 bits): identifies the format of the RTP payload (the coding used, which may be changed to fit the bandwidth available on each path). The payload types are described in detail in the RTP profile.
+ Sequence number (16 bits): increases by one for each RTP data packet sent, and is used by the receiver to detect lost packets and to restore the packet order.
+ Timestamp (32 bits): reflects the sampling instant of the first octet in the RTP data packet. The sampling instant must be derived from a clock that increases monotonically and linearly in time, to allow synchronization.
+ SSRC (32 bits): identifies the synchronization source. The identifier is chosen at random so that no two synchronization sources in one RTP session have the same SSRC.
+ CSRC list (0 to 15 items, 32 bits each): identifies the contributing sources for the payload contained in the packet. The number of identifiers is given by the CC field. If there are more than 15 contributing sources, only 15 are identified. CSRC identifiers are inserted by mixers, using the SSRC identifiers of the contributing sources.
+ RTP header extension (variable length): an optional extension mechanism that allows individual implementations to experiment with new functions requiring additional information in the RTP header.
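The header layout above can be illustrated with a short sketch. The following Python code is only an illustration, not part of the thesis applications: the function names `pack_rtp_header` and `parse_rtp_header` are hypothetical, and the field widths are taken in bits (2, 1, 1, 4, 1, 7, 16, 32, 32). The packing function builds the 12-byte fixed header without padding, extension or CSRC entries.

```python
import struct

RTP_VERSION = 2  # value of the 2-bit version field


def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False):
    """Build the 12-byte fixed RTP header (P=0, X=0, CC=0)."""
    byte0 = RTP_VERSION << 6                      # V in the top two bits
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data):
    """Split an RTP header (fixed part plus CSRC list) into its fields."""
    if len(data) < 12:
        raise ValueError("an RTP header needs at least 12 bytes")
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F                             # number of CSRC entries
    csrc_end = 12 + 4 * cc
    if len(data) < csrc_end:
        raise ValueError("packet too short for the announced CSRC list")
    csrc_list = list(struct.unpack("!%dI" % cc, data[12:csrc_end]))
    return {
        "version": byte0 >> 6,
        "padding": bool(byte0 & 0x20),
        "extension": bool(byte0 & 0x10),
        "csrc_count": cc,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc_list": csrc_list,
    }


# Example values are arbitrary and chosen only for illustration.
header = pack_rtp_header(payload_type=0, seq=1, timestamp=160,
                         ssrc=0x12345678, marker=True)
fields = parse_rtp_header(header)
```

A receiver could compare `sequence_number` between consecutive packets to notice losses, as described for the sequence number field.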
3.1.2. Real-time Transport Control Protocol (RTCP):
RTCP is based on the periodic transmission of control packets to all participants in the communication session, using the same distribution mechanism as the data packets. The underlying protocol must multiplex the data and control packets, for example by using different UDP port numbers. The functions of RTCP are as follows:
+ Providing feedback on the quality of the data distribution. This is an integral part of the role of RTP as a transport protocol and is related to the flow and congestion control functions of other transport protocols. The feedback is directly useful for controlling adaptive encodings. Experiments with IP multicast have also shown that feedback from the receivers is critical for diagnosing distribution faults. Sending reception feedback to all participants lets an observer judge whether a problem is local or global. With a distribution mechanism such as IP multicast, an entity such as a network service provider that is not otherwise involved in the session can also receive the feedback and act as a third-party monitor to diagnose network problems.
+ Carrying a persistent identifier for an RTP source, called the canonical name (CNAME). Because the SSRC may change if a conflict is discovered or the program is restarted, receivers need the CNAME to keep track of each participant. Receivers also need the CNAME to associate the data streams from one participant across a set of related RTP sessions, for example audio and video.
+ The first two functions require all participants in the session to send RTCP packets, so the rate must be controlled for RTP to scale to a large number of participants. Because each participant sends its control packets to all the others, each one can independently observe the number of participants.
+ An optional function is to convey minimal session control information, for example participant identification to be displayed in the user interface. This is most useful in loosely controlled sessions where participants enter and leave without membership control or parameter negotiation. RTCP serves as a convenient channel to reach all participants, but it is not necessarily expected to satisfy all the control communication requirements of an application.
There are five RTCP packet types:
- SR: Sender reports are generated by participants that are also active senders (RTP sources). They describe the amount of data sent and correlate the RTP sampling timestamp with absolute time, so that different media can be synchronized.
- RR: Receiver reports are generated by the participants in an RTP session that receive media. Each report contains one block for each RTP source. Each block describes the instantaneous loss rate and the jitter (such as phase drift) observed from that source. The block also gives the last timestamp and the delay since a sender report was received, which allows sources to estimate their distance to the receivers.
- SDES: Source description packets are used for session control. They include the CNAME, a unique identifier in a form similar to an e-mail address. The CNAME is used to resolve conflicts in the synchronization source value and to associate the different media streams generated by the same user. SDES packets also identify participants by name, e-mail address and telephone number, which provides a simple form of session control.
- BYE: A user who leaves the session announces this with a BYE message, so that every member can keep track of the total number of participants.
- APP: Application-specific packets (APP) carry additional application-specific information.
The common RTCP header is laid out as follows:
Bit:    0-1     2    3-7    8-15    16-31
Field:  V = 2   P    RC     PT      Length
Figure 7: RTCP header
- Version (V, 2 bits): identifies the version of RTCP. At present it is set to 2.
- Padding (P, 1 bit): when set, the RTCP packet contains some additional padding octets at the end that are not part of the control information. The last padding octet counts how many padding octets were added. Padding may be used by some encryption algorithms with fixed data block sizes. In a compound RTCP packet, padding should be required only on the last individual packet, because the compound packet is encrypted as a whole.
- Reception report count (RC, 5 bits): the number of reception report blocks contained in the RTCP packet. A value of zero is valid.
- Packet type (PT, 8 bits): identifies which of the five packet types the packet carries. For example, the constant 200 identifies an RTCP SR packet.
- Length (16 bits): the length of the RTCP packet in 32-bit words minus one, including the header and any padding.
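As a rough illustration of the header fields just listed, the sketch below decodes the first four bytes of an RTCP packet. The function name `parse_rtcp_header` is hypothetical. The packet type codes 200 to 204 are the standard values for SR, RR, SDES, BYE and APP, and the length field is interpreted as the number of 32-bit words minus one.

```python
import struct

# Standard RTCP packet type codes for the five packet types.
RTCP_TYPES = {200: "SR", 201: "RR", 202: "SDES", 203: "BYE", 204: "APP"}


def parse_rtcp_header(data):
    """Decode the 4-byte common header of an RTCP packet."""
    if len(data) < 4:
        raise ValueError("an RTCP header needs at least 4 bytes")
    byte0, packet_type, length = struct.unpack("!BBH", data[:4])
    return {
        "version": byte0 >> 6,            # 2 bits
        "padding": bool(byte0 & 0x20),    # 1 bit
        "rc": byte0 & 0x1F,               # 5 bits, reception report count
        "packet_type": RTCP_TYPES.get(packet_type, "unknown"),
        "length_bytes": (length + 1) * 4  # length is in 32-bit words minus one
    }


# A receiver report with one report block is 32 bytes long, so length = 7.
example = struct.pack("!BBH", 0x81, 201, 7)
info = parse_rtcp_header(example)
```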
3.1.3. RSVP:
RSVP does not provide a separate transport protocol; it runs over IP. RSVP is only a control protocol that is used to ensure quality of service (QoS) for an application. A host that transmits data requiring a certain QoS sends a path message towards the destination address; this message contains information on the characteristics of the traffic flow. RSVP is not a routing protocol and does not itself select an optimal route, so it cannot guarantee an ideal QoS. RSVP is an important instrument for QoS, but it does not solve all the problems related to QoS. On the way to the destination, RSVP lets each router store information from the path message, which is used later when reservations are made along the path back towards the sender. When the receiver gets the path message, it can decide whether or not to receive the sender's data with the specified QoS. To maintain the specified QoS, RSVP periodically sends a reservation request along the reverse of the route taken by the path message.
The exact reservation route is obtained from the information stored by RSVP. A reservation message has two parts: the QoS that the receiver wants to obtain, and a description of the data packets that are to receive that QoS. RSVP supports several reservation styles. A host that receives data from several sources can issue a reservation request that allocates a separate bandwidth to each source. Alternatively, one bandwidth allocation may be shared by all senders, for example in an online discussion where usually only one person talks at a time.
3.1.4. Conclusion:
This chapter has covered the main issues in VoIP technology. A complete voice network over IP also needs multimedia communication standards, which are covered in the next chapter.
4. Introduction to standards:
4.1. Introduction to standards:
The standards for packet-based multimedia communication cover both VoIP and related telecommunication standards. The International Telecommunication Union (ITU-T) issued Recommendation H.323. H.323 is a complicated protocol and does not ensure good quality of service (QoS).
The Internet Engineering Task Force (IETF) has put forward two simpler protocol standards, SIP and MGCP. MGCP and H.248 of the ITU provide better assured quality and more flexibility, and they complement H.323 and SIP.
The relationship between the OSI reference model and the functions and protocols in VoIP is as follows.
Layer | OSI layer name | Functions and protocols in VoIP
7     | Application    | Contact with network
6     | Presentation   | Code
5     | Session        | H.323/SIP/MGCP/H.248
4     | Transport      | RTP/RTCP/TCP/UDP
3     | Network        | IP
2     | Data Link      | Frame Relay, ATM, Ethernet
1     | Physical       | Bit flow
Figure 8: Relationship between the OSI model and the protocols in VoIP
4.2. Standard H.323:
4.2.1. Introduction to H.323:
H.323 is a core technology that defines the components, protocols and procedures for providing real-time multimedia communication services such as voice, video and data over packet-switched networks based on IP. H.323 supports multipoint communication and provides many kinds of service, so it can be applied in many fields.
4.2.2. H.323 elements:
The H.323 standard defines four kinds of components which, working together in a network, provide point-to-point and point-to-multipoint multimedia communication services: terminals, gateways, gatekeepers and multipoint control units (MCUs). An H.323 proxy is a fifth component that may take part in protocol operation.
Figure 9: H.323 structure
+ Terminal: used for real-time, two-way multimedia communication. A terminal can be a personal computer or a stand-alone device running H.323. All terminals must support audio transmission, because voice communication is the basic service provided by an H.323 terminal. All terminals must support H.245, Q.931, RAS and RTP. H.245 is used to control the media streams. H.225 (derived from Q.931) is used for call signaling, to set up and release a call. RAS (registration/admission/status) is the protocol a terminal uses to communicate with the gatekeeper. RTP is the media transport protocol that carries the voice stream. The optional components of an H.323 terminal are video compression standards, the T.120 data conferencing protocols and MCU capabilities.
+ Gateway: a gateway connects two different networks. An H.323 gateway provides the connection between an H.323 network and a non-H.323 network (for example the PSTN). To connect two different networks, the gateway must translate the protocols for call setup and release, and convert the information and media formats. No gateway is needed for communication between two terminals in an H.323 network. A gateway consists of both hardware and software that encode and decode the voice signal and the signaling, pack the voice signal into IP packets, set up calls through the gatekeeper, and handle the connections with the PSTN.
Main functions of the gateway:
- PSTN interface: supports many different protocols to meet the connection requirements of the PSTN.
- IP interface: communicates with the IP network through local area and wide area networks.
- Voice processing: compresses the voice signal using standards such as G.729 and G.729A, and then packs the signal into IP packets.
- Signaling: only R2 and SS7 signaling are used to connect the PSTN and the IP gateway.
- Echo cancellation: cancels echo according to ITU standard G.165.
+ Gatekeeper: performs call control and call routing. It is a very important part of an IP telephone network, where it supervises and controls all the activities of the network. The gatekeeper must be able to provide almost all the functions of the PSTN. In addition, it must provide advanced functions based on the characteristics of the IP network:
- Address translation: converts the numbers of the public telephone network and of private networks to IP addresses and vice versa.
- Network management: manages and supervises the state and parameters of the network components, so that the operator can make timely adjustments.
- Admission control: prevents unregistered machines from using the resources of the system.
- Call handling: handles the calls between subscribers and gateways, including setup, connection, release and call-transfer notification.
- Database: stores the information on each subscriber and the system configuration.
- Mobile subscribers: allows subscribers to connect to and use the network from any point on the IP network.
- Billing: uses the database to calculate the charge for each subscriber and to provide charging and service information to the customer.
+ Multipoint control unit (MCU): an MCU supports multimedia conferences with many participants. The MCU combines the two basic system components that make multipoint communication possible: the multipoint controller (MC) and the multipoint processor (MP).
- The MC controls the media streams, for example by negotiating the codecs and setting up real-time multicast or unicast transmission sessions, through H.245 signaling. When an endpoint (terminal or gateway) joins a conference, it must establish an H.245 connection with the MC.
- The MP sends and receives media streams (for example, voice samples in RTP packets) to and from the conference participants. The MP may convert media between different formats and can combine media streams (for example, mixing sound from several sources).
+ H.323 proxy: a proxy built for the H.323 protocol. An H.323 proxy acts like any other proxy: it is usually placed at the firewall, where it manages and monitors all H.323 calls between the local network and the Internet. The proxy also ensures that only valid H.323 connections pass the firewall. The proxy operates at the application layer and inspects the packets exchanged between two communicating applications. The proxy can determine the destination of a call and make the requested connection. The proxy may also perform limited management with RSVP. An H.323 proxy must satisfy the requirements of an H.323 gateway and present the interfaces and functions that a gateway presents.
4.2.3. H.323 protocol structure:
This structure uses the unreliable transport protocol UDP for audio, video and registration packets. The reliable but slower protocol TCP is used for data and control packets in call signaling, and the T.120 protocol is used for conference data.
Function  | Data  | Call signaling and control              | Audio and video | Registration
Protocol  | T.120 | Call signaling (H.225), H.245 control   | RTP/RTCP        | H.225 RAS
Transport | TCP   | TCP                                     | UDP             | UDP
IP network layer
Data link layer
Physical layer
Figure 10: H.323 protocol structure
4.2.4. Signaling and control in H.323:
H.323 provides three main control protocols: H.225/Q.931 call signaling, H.225/RAS, and the H.245 control protocol. H.225/Q.931 is used to signal a call, H.225/RAS is used in setting up the call from the calling side to the called host, and after the call is established H.245 is used to negotiate and adjust the media streams.
+ H.225/RAS performs functions such as registration, admission, bandwidth changes and status reporting between terminals and the gatekeeper. It uses the unreliable transport protocol UDP.
In a LAN without a gatekeeper, the RAS signaling channel does not exist. In a LAN with a gatekeeper, the RAS signaling channel is established between each terminal and the gatekeeper.
+ H.225 call signaling: this channel carries the H.225 control messages that are used to establish the connection between two H.323 terminals. It is independent of the RAS signaling channel and of the H.245 control channel. In a system without a gatekeeper, the call signaling channel is established between the two terminals taking part in the call. In a system with a gatekeeper, the call signaling channel is established either between each terminal and the gatekeeper or directly between the two terminals; the choice between these options is made by the gatekeeper.
+ H.245 control protocol: after the call is established, the system uses the H.245 control protocol to exchange capabilities, open and close logical channels, request mode preferences, control flow, and send commands and indications.
This protocol is also used for master/slave determination, which avoids the conflict that occurs when two terminals attempt the same action simultaneously but only one such action may take place at a time.
4.2.5. Establishing a call in H.323
Figure 11: Call handling in H.323
1. The H.323 endpoint registers with the gatekeeper.
2. When the user picks up the receiver and dials the number to be called, the request is sent to the gatekeeper in a RAS message.
3. If the gatekeeper accepts the call, it replies with the IP address of the called subscriber and the bandwidth granted to the call.
4. The caller starts the call by sending a call setup message to the called side in an H.225 message.
5. The called subscriber receives notification of the incoming call, and the H.323 terminal (telephone) rings.
6. The two parties negotiate capabilities over the H.245 control channel, to ensure that the signals transmitted are ones that the receiving terminal is able to process.
7.8. An RSVP request is sent to the called side, then RTP is opened between the two sides and the call is established.
In figure 11, two terminals (H.323 endpoints) need to establish a VoIP call between them, and different gatekeepers control the two terminals. As a first step, the calling terminal requests permission from its gatekeeper to establish the call. This is done with the Admission Request (ARQ) message. The terminal indicates the type of call in question (two-party or multi-party), the endpoint’s own identifier, a call identifier (a unique string), a call reference value (an integer value also used in call signaling messages for the same call), and information regarding the other party or parties to participate in the call. The information regarding other parties to the call includes one or more aliases and/or signaling addresses. One of the most important mandatory parameters in the ARQ is the bandwidth parameter. This specifies the amount of bandwidth required in units of 100 bps.
Note that the endpoint should request the total media stream bandwidth needed, excluding overhead. Thus, if a two-party call is needed, with each party sending voice at 64 Kbps, then the bandwidth required is 128 Kbps, and the value carried in the bandwidth parameter is 1280. The purpose of the bandwidth parameter is to enable the gatekeeper to reserve resources for the call.
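The arithmetic behind the bandwidth parameter can be checked with a few lines of Python. This is only a sketch: the function name `arq_bandwidth` is hypothetical, and rounding up to the next whole unit is an assumption for rates that are not multiples of 100 bps.

```python
def arq_bandwidth(stream_rates_bps):
    """Return the ARQ bandwidth value in units of 100 bps.

    stream_rates_bps lists the media rate of every stream in the call,
    excluding packet overhead. Partial units are rounded up (assumption).
    """
    total_bps = sum(stream_rates_bps)
    return -(-total_bps // 100)  # ceiling division by 100


# Two-party call, each party sending voice at 64 Kbps.
value = arq_bandwidth([64000, 64000])
```

With the two 64 Kbps streams of the example, `value` is 1280, which matches the figure given in the text.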
The gatekeeper indicates a successful admission by responding to the endpoint with an Admission Confirm (ACF) message. This includes many of the same parameters that are included in the ARQ. The difference is that when a given parameter is used in the ARQ, it is simply a request from the endpoint, whereas a given parameter value in the ACF is a firm order from the gatekeeper. For example, the ACF includes the bandwidth parameter, which may be a lower value than that requested in the ARQ, in which case the endpoint must stay within the bandwidth limitations imposed by the gatekeeper.
Another parameter of particular interest in both the ARQ and the ACF is the call model parameter, which is optional in the ARQ and mandatory in the ACF. In the ARQ, the call model indicates whether the endpoint wants to send call signaling directly to the other party, or prefers that call signaling be passed via the gatekeeper. In the ACF, it represents the gatekeeper’s decision as to whether call signaling is to pass via the gatekeeper or directly between the terminals. In the example of figure 11, the calling gatekeeper has chosen not to be in the path of the call signaling.
The Setup message is the first call signaling message sent from one terminal to the other to establish the call. The message must contain the Q.931 Protocol Discriminator, a Call Reference, a Bearer Capability, and the User-User information element. Although the Bearer Capability information element is mandatory, the concept of a bearer, as used in the circuit-switched world, does not map very well to an IP network. For example, no B-channel exists in IP, and the actual agreement between endpoints regarding the bandwidth requirements is done as part of H.245 signaling, where RTP information such as the payload type is exchanged. Consequently, many of the fields in the Bearer Capability information element, as defined in Q.931, are not used in H.225.0. Of those fields that are used in H.225.0, many are used only when the call has originated from outside the H.323 network and has been received at a gateway, where the gateway performs a mapping from the signaling received to the appropriate H.225.0 messages.
A number of parameters are included within the mandatory User-to-User information element. These include the call identifier, the call type, a conference identifier, and information about the originating endpoint. Among the optional parameters, we may find a source alias, a destination alias, and an H.225.0 address. The User-to-User information element is included in all H.225.0 call signaling messages. It is the inclusion of this information element that enables Q.931 messages, originally designed for ISDN, to be adapted for use with H.323.
The Call Proceeding message may optionally be sent by the recipient of a Setup message to indicate that the Setup message has been received and that call establishment procedures are underway. When sent, it usually precedes the Alerting message, which indicates that the called device is “ringing”. Strictly speaking, the Alerting message is optional.
In addition to Call Proceeding and Alerting, we may also find the optional Progress message (not shown). Ultimately, when the called party answers, the called terminal returns a Connect message. Although some of the messages from the called party to the calling party, such as Call Proceeding and Alerting, are optional, the Connect message must be sent if the call is to be completed. The User-to-User information element contains the same set of parameters as defined for the Call Proceeding, Progress, and Alerting messages, with the addition of the Conference Identifier. These parameters are also used in a Setup message, and their use in the Connect message is to correlate this conference with that indicated in a Setup. Any H.245 address sent in a Connect message should match that sent in any earlier Call Proceeding, Alerting, or Progress message. In fact, the called terminal must include at least an H.245 signaling address to which H.245 messages must be sent, because H.245 messages are used to establish the media (that is, voice) flow between the parties.
In the example of figure 11, the H.245 message exchange begins after the Connect message is returned. This message exchange could, in fact, occur earlier than the Connect message. It is important to note that H.245 is not responsible for carrying the actual media. For example, there is no such thing as an H.245 packet containing a sample of coded voice. That is the job of RTP. Instead, H.245 is a control protocol that manages the establishment and release of media sessions. H.245 does this through messaging that enables the establishment of logical channels, where a logical channel is a unidirectional RTP stream from one party to the other.
A logical channel is opened by sending an Open Logical Channel (OLC) request message. This message contains a mandatory parameter called forward Logical Channel Parameters, which relates to the media to be sent in the forward direction, that is, from the endpoint issuing this command. It contains information such as the type of data to be sent, an RTP session ID, an RTP payload type, and an indication as to whether silence suppression is to be used. If the recipient of the message wants to accept the media to be sent, then it will return an Open Logical Channel Ack message containing the same logical channel number as received in the request and a transport address to which the media stream should be sent.
Strictly speaking, a logical channel is unidirectional. Therefore, in order to establish a two-way conversation, two logical channels must be opened, one in each direction. According to the description just presented, this requires four messages, which is rather cumbersome. Consequently, H.323 defines a bidirectional logical channel. This is a means of establishing two logical channels, one in each direction, in a slightly more efficient manner. Basically, a bidirectional logical channel really means two logical channels that are associated with each other. The establishment of these two channels can be achieved with just three H.245 messages rather than four. In order to do so, the initial OLC message not only contains information regarding the media that the calling endpoint wants to send, but it also contains reverse logical channel parameters. These indicate the type of media that the endpoint is willing to receive and to where that media should be sent.
Upon receipt of the request, the far endpoint may send an Open Logical Channel Ack message containing the same logical channel number for the forward logical channel, a logical channel number for the reverse logical channel, and descriptions related to the media formats that it is willing to send. These media formats should be chosen from the options originally received in the request, thereby ensuring that the called end will only send media that the calling end supports.
Upon receipt of the Open Logical Channel Ack, the originating endpoint responds with an Open Logical Channel Confirm message to indicate that all is well. RTP streams and RTCP messages can now flow in each direction.
5. The Session Initiation Protocol (SIP)
The Session Initiation Protocol (SIP) is considered by many to be a powerful alternative to H.323. It is considered to be a more flexible solution, simpler than H.323, easier to implement, better suited to the support of intelligent user devices, and better suited to the implementation of advanced features. Although H.323 may still have a larger installed base than SIP, most people in the VoIP community believe that the future of VoIP revolves around SIP. In fact, 3GPP has endorsed SIP as the session management protocol of choice for 3GPP Release 5, albeit with some enhancements.
Like H.323, SIP is simply a signaling protocol and does not carry the voice packets itself. Rather, it makes use of the services of RTP for the transport of the voice packets (the media stream).
5.1 The SIP Network Architecture
SIP defines two basic classes of network entities: clients and servers. Strictly speaking, a client, also known as a user agent client, is an application program that sends SIP requests. A server is an entity that responds to those requests. Thus, SIP is a client-server protocol. VoIP calls using SIP are originated by a client and terminated at a server. A client may be found within a user’s device, which could be, for example, a SIP phone. Clients may also be found within the same platform as a server. For example, SIP enables the use of proxies, which act as both clients and servers.
Four different types of servers are available: proxy servers, redirect servers, user agent servers, and registrars. A proxy server acts similarly to a proxy server used for Web access from a corporate local area network (LAN). Clients send requests to the proxy, which either handles those requests itself or forwards them on to other servers, perhaps after performing some translation. To those other servers, it appears as though the message is coming from the proxy rather than some entity hidden behind it. Given that a proxy both receives requests and sends requests, it incorporates both server and client functionality. Figure 12 shows an example of the operation of a proxy server. It does not take much imagination to realize how this type of functionality can be used for call forwarding/follow-me services.
A redirect server is a srev._.t and an entry containing 'c' is added to the dictionary. If 'i' is not zero, the concatenation of the dictionary entry at position 'i' and the character 'c' is made. This new string is added to both the decompression output and the dictionary.
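The decompression step described above can be sketched as follows. This is a minimal illustration which assumes that the compressed data is a sequence of (i, c) pairs in the style of LZ78, that dictionary positions are numbered from 1, and that an index of zero means that only the character 'c' is output. The function name `lz78_decompress` is hypothetical.

```python
def lz78_decompress(pairs):
    """Rebuild the text from a list of (i, c) pairs.

    Position 0 of the dictionary holds the empty string, so the case
    i == 0 and the case i != 0 can be handled by the same concatenation.
    """
    dictionary = [""]
    output = []
    for i, c in pairs:
        entry = dictionary[i] + c   # entry at position i followed by c
        output.append(entry)        # the new string goes to the output...
        dictionary.append(entry)    # ...and to the dictionary
    return "".join(output)


text = lz78_decompress([(0, "a"), (0, "b"), (1, "b"), (3, "a")])
```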
4.2.2 Huffman coding
For a given set of possible values and the frequency with which each value occurs, the Huffman algorithm determines a way to encode each value in binary. More importantly, it does this in such a way that an optimal encoding is created. In this section, I will only present the algorithm itself. A proof that it indeed constructs an optimal encoding can be found in the reference that is also the source of the information in this section.
The Huffman algorithm produces a binary string for each value that is to be encoded and these strings can have an arbitrary length. However, the decoding process has to know when a certain binary string has to be replaced by the appropriate value. To be able to do this, the strings must have the following property: none of the binary strings can be a prefix to another string.
Binary strings with this prefix property can be represented by a binary tree in which the branches themselves contain the labels zero and one. The strings which are created by traversing the tree from the root to the leaves are all strings with the prefix property.
The Huffman algorithm constructs such a tree in which the leaves are marked with the values that need to be encoded. To find out by which binary code a value has to be replaced, you only need to follow the path from the root to the leaf containing that value.
The decompression routine is also easy. The algorithm sets the current position at the root node and starts reading bits. For each bit the appropriate branch is followed. At intermediate nodes the same thing is done, but when a leaf has been reached, the value at that leaf can be output and the current position is reset to the root node. The algorithm then repeats itself.
As you can see, once the tree has been constructed, the algorithm itself is fairly easy. To construct the tree, the algorithm starts with a number of separate nodes, one for each value that needs to be encoded. With each node, the frequency of occurrence of the corresponding value is also associated. For example, suppose we have a file in which only the five characters a, b, c, d and e occur, each with a certain frequency.
Next, the algorithm looks for the two nodes with the smallest associated frequencies. These two nodes are removed from the list of nodes. A new node is then added to the list and the two removed nodes are its children. The new node's associated frequency is the sum of the frequencies of its children.
In this new list of nodes, the algorithm starts its search again for the two nodes with the smallest associated frequencies and the previous step is repeated. When there is only one node left in the list of nodes, this node is the root node of the tree and the algorithm stops.
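The tree construction and the decoding loop described in this section can be sketched in Python as follows. This is an illustration only, not the code of the applications discussed in this thesis. The frequencies for the characters a to e are made-up values, the function names are hypothetical, values are assumed to be single characters, and at least two distinct values are assumed.

```python
import heapq
from itertools import count


def build_huffman_tree(freqs):
    """Repeatedly merge the two nodes with the smallest frequencies."""
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), value) for value, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        # The new node has the two removed nodes as children.
        heapq.heappush(heap, (f1 + f2, next(tiebreak), (left, right)))
    return heap[0][2]  # the root node


def make_codes(tree, prefix=""):
    """Follow every path from the root; 0 = left branch, 1 = right branch."""
    if not isinstance(tree, tuple):  # a leaf holds a value
        return {tree: prefix}
    left, right = tree
    codes = make_codes(left, prefix + "0")
    codes.update(make_codes(right, prefix + "1"))
    return codes


def huffman_decode(bits, tree):
    """Walk the tree bit by bit and return to the root at each leaf."""
    output = []
    node = tree
    for bit in bits:
        node = node[0] if bit == "0" else node[1]
        if not isinstance(node, tuple):
            output.append(node)
            node = tree
    return "".join(output)


# Hypothetical frequencies for the five characters of the example.
frequencies = {"a": 45, "b": 13, "c": 12, "d": 16, "e": 9}
tree = build_huffman_tree(frequencies)
codes = make_codes(tree)
encoded = "".join(codes[ch] for ch in "abacabad")
decoded = huffman_decode(encoded, tree)
```

Because no code is a prefix of another, the decoder needs no separators between the codes in `encoded`.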
4.3 Waveform coding
Waveform coding tries to encode the waveform itself in an efficient way. The signal is stored in such a way that upon decoding, the resulting signal will have the same general shape as the original. Waveform coding techniques apply to audio signals in general and not just to speech as they try to encode every aspect of the signal.
The simplest form of waveform coding is PCM encoding of the signal. But a signal can be processed further to reduce the amount of storage needed for the waveform. In general, such techniques are lossy: the decoded data can differ from the original data. Waveform coding techniques usually offer good quality speech requiring a bandwidth of 16 kbps or more.
4.3.1 Differential coding
Differential coding tries to exploit the fact that with audio signals the value of one sample can be somewhat predicted by the values of the previous samples. Given a number of samples, the algorithms in this section will calculate a prediction of the next sampled value. They will then only store the difference between this predicted value and the actual value. This difference is usually not very large and can therefore be stored with fewer bits than the actual sampled value, resulting in compression. Because of the use of a predicted value, differential coding is also referred to as predictive coding
4.3.1.1 Differential PCM (DPCM)
Differential PCM merely calculates the difference between the predicted and actual values of a PCM signal and uses a fixed number of bits to store this difference. The number of bits used to store this difference determines the maximum slope that the signal can have if errors are to be avoided. If this slope is exceeded, the value of a sample can only be approximated, introducing an amount of error.
In the applications I developed, I have tested a DPCM compression scheme. It used uniformly quantised PCM data as input and produced DPCM output. The predicted value I used was simply the value of the previous sample. Personally, I found that using five bits to store the difference still produced very good speech quality upon decompression. Even with only four bits, the results were quite acceptable.
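A scheme of this kind might look like the sketch below. It is not the implementation used in the applications mentioned above: the function names, the initial prediction of zero and the choice to predict from the previously reconstructed sample (so that encoder and decoder stay in step) are all assumptions made for illustration.

```python
def dpcm_encode(samples, bits=5):
    """Encode integer PCM samples as clamped differences of `bits` bits each."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1  # signed range of a code
    predicted = 0  # assumed starting prediction
    codes = []
    for sample in samples:
        # Clamp the difference; a slope that is too steep can only be approximated.
        diff = max(lo, min(hi, sample - predicted))
        codes.append(diff)
        predicted += diff  # the value the decoder will reconstruct
    return codes


def dpcm_decode(codes):
    """Rebuild the signal by accumulating the stored differences."""
    value = 0
    out = []
    for diff in codes:
        value += diff
        out.append(value)
    return out
```

With five bits the difference is limited to the range -16 to 15, so a jump from 0 to 100 is reconstructed as 15, 30, 45 and so on until the decoder catches up. This is the slope limitation described in the previous paragraph.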
4.3.1.2 Adaptive DPCM (ADPCM)
An extension to DPCM is adaptive DPCM. With this encoding method, there is still a fixed number of bits used to store the difference. In contrast to the previous technique, which simply used all of those bits to store the difference, ADPCM uses some of the bits to encode a quantisation level. This way, the resolution of the difference can be adjusted.
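The idea in the paragraph above can be illustrated with a small sketch in which each code consists of a quantisation level and a scaled difference. The split into two level bits and three difference bits, the power-of-two step sizes and the function names are assumptions for this example only. Standardised ADPCM coders such as G.726 adapt their step size differently.

```python
def adpcm_encode(samples, level_bits=2, diff_bits=3):
    """Encode integer samples as (level, scaled difference) pairs."""
    lo, hi = -(1 << (diff_bits - 1)), (1 << (diff_bits - 1)) - 1
    max_level = (1 << level_bits) - 1
    predicted = 0  # assumed starting prediction
    codes = []
    for sample in samples:
        diff = sample - predicted
        # Choose the finest level whose range still covers the difference.
        level = 0
        while level < max_level and not (lo << level) <= diff <= (hi << level):
            level += 1
        step = 1 << level  # coarser levels use a larger step size
        q = max(lo, min(hi, round(diff / step)))
        codes.append((level, q))
        predicted += q * step  # track the decoder's reconstruction
    return codes


def adpcm_decode(codes):
    """Rebuild the signal from (level, scaled difference) pairs."""
    value = 0
    out = []
    for level, q in codes:
        value += q * (1 << level)
        out.append(value)
    return out
```

Small differences are stored exactly at level 0, while large jumps are stored with a coarser step, so the same five bits cover a wider range than plain DPCM at the cost of precision on steep slopes.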
4.3.1.3 Delta modulation (DM)
Delta modulation can be seen as a very simple form of DPCM. With this method, only one bit is used to encode the difference. One value then indicates an increase of the predicted value with a certain amount, the other indicates a decrease.
A variant of this scheme is called adaptive delta modulation (ADM). Here, the step size used to increase or decrease the predicted value can be adapted. This way, the original signal can be approximated more closely.
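Plain delta modulation with a fixed step size can be sketched as follows. The step size of 4, the starting value of zero and the function names are assumptions chosen for the example.

```python
def dm_encode(samples, step=4):
    """Emit one bit per sample: 1 raises the approximation, 0 lowers it."""
    approx = 0  # assumed starting value of the approximation
    bits = []
    for sample in samples:
        bit = 1 if sample >= approx else 0
        approx += step if bit else -step
        bits.append(bit)
    return bits


def dm_decode(bits, step=4):
    """Follow the bit stream, moving up or down by one step per bit."""
    approx = 0
    out = []
    for bit in bits:
        approx += step if bit else -step
        out.append(approx)
    return out
```

An adaptive variant would change step between samples, for instance enlarging it after several identical bits in a row. That rule is not shown here.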
4.3.2 Vector quantisation
With vector quantisation, the input is divided into equally sized pieces which are called vectors. Essential to this type of encoding is the presence of a 'codebook', an array of vectors. For each vector of the input, the closest match to a vector in the codebook is looked up. The index of this codebook entry is then used to encode the input vector.
It is important to note that this principle can be applied to a wide variety of data, not only to PCM data. For example, vector quantisation could be used to store an approximation of the error term of other compression techniques.
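The codebook lookup can be sketched as below. The squared Euclidean distance as the measure of closeness, the tiny codebook in the usage note and the function names are assumptions for illustration. How a good codebook is obtained is not covered.

```python
def vq_encode(samples, codebook):
    """Split samples into vectors and return the index of the closest codebook entry for each."""
    size = len(codebook[0])
    indices = []
    # Trailing samples that do not fill a whole vector are ignored in this sketch.
    for start in range(0, len(samples) - size + 1, size):
        vector = samples[start:start + size]
        best = min(
            range(len(codebook)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
        )
        indices.append(best)
    return indices


def vq_decode(indices, codebook):
    """Replace every index by the codebook vector it refers to."""
    out = []
    for index in indices:
        out.extend(codebook[index])
    return out
```

With the hypothetical codebook [(0, 0), (10, 10), (0, 10), (10, 0)], the input 1, -1, 9, 11, 2, 8 is encoded as the indices 0, 1, 2 and decoded as 0, 0, 10, 10, 0, 10.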
4.3.3 Transform coding
When we are considering PCM data, we are in fact looking at a signal in the time domain. With transform coding, the signal is transformed to its representation in another domain in which it can be compressed better than in its original form. When the signal is decompressed, an inverse transformation is applied to restore an approximation of the original signal.
One of the domains to which a signal could be transformed is the frequency domain. Using information about human vocal and auditory systems, a compression algorithm can decide which frequency components are most important. Those components can then be encoded with more precision than others. Examples of transformation schemes which are used for this purpose are the Discrete Fourier Transform (DFT) and the Discrete Cosine Transform (DCT).
Personally, I have experimented with transform coding using a wavelet transformation. A wavelet decomposition can be used to write a signal as a linear combination of certain wavelet basis functions. The coefficients used in the linear combination then form the wavelet representation of the signal. Using these coefficients for a certain wavelet basis, the original signal can be reconstructed.
A complete explanation of wavelets falls beyond the scope of this thesis. However, the theory of wavelets is very interesting and if you would like more information about it, a good introduction can be found in . This reference is also the source on which I based the implementation of my compression scheme.
The wavelets which I used to transform the signal are called Haar wavelets. They form a wavelet basis. Using these wavelets, decomposition and reconstruction can be done quite fast. They also possess a property called orthonormality which allows us to determine very easily which components of the transformed signal are most important.
In the scheme that I have tried, a uniformly quantised PCM signal was decomposed into its wavelet representation. Next, a number of coefficients with little importance were set to zero and the other coefficients were quantised. The resulting sequence of coefficients was then run-length encoded and finally, the data was compressed using Huffman coding.
For this last step, I first tried LZ78 compression, but this resulted in little extra compression. Sometimes the resulting code was even larger than before. With Huffman coding, the extra compression was much better. Typically, the code was compressed to sixty to eighty percent of its original size.
Unfortunately, the results are not very spectacular. If good speech quality is to be preserved, this scheme cannot achieve good compression ratios. In fact, I have had better compression results using simple DPCM compression. I believe that this is probably due to the type of wavelets I used.
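The first two steps of such a scheme, the Haar decomposition and the zeroing of unimportant coefficients, can be sketched as follows. This is not the implementation referred to above. The function names are assumptions, the input length is assumed to be a power of two, and the quantisation, run-length and Huffman stages are left out.

```python
import math


def haar_decompose(signal):
    """Full orthonormal Haar decomposition; len(signal) must be a power of two."""
    coeffs = list(signal)
    n = len(coeffs)
    while n > 1:
        half = n // 2
        sums = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        diffs = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2) for i in range(half)]
        coeffs[:n] = sums + diffs  # coarse part first, detail part after it
        n = half
    return coeffs


def haar_reconstruct(coeffs):
    """Invert haar_decompose."""
    out = list(coeffs)
    n = 1
    while n < len(out):
        merged = []
        for s, d in zip(out[:n], out[n:2 * n]):
            merged.append((s + d) / math.sqrt(2))
            merged.append((s - d) / math.sqrt(2))
        out[:2 * n] = merged
        n *= 2
    return out


def drop_small(coeffs, keep):
    """Set all but the `keep` largest-magnitude coefficients to zero."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    kept = set(order[:keep])
    return [c if i in kept else 0.0 for i, c in enumerate(coeffs)]
```

Because the basis is orthonormal, the energy of the signal equals the sum of the squared coefficients, so the magnitude of a coefficient directly indicates how much is lost when it is set to zero.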
4.4 Vocoding
Waveform coding methods simply try to model the waveform as closely as possible. But we can exploit the fact that we are using speech information to greatly reduce the required storage space. Vocoding techniques do this by encoding information about how the speech signal was produced by the human vocal system, rather than encoding the waveform itself.
The term vocoding is a combination of 'voice' and 'coding'. These techniques can produce intelligible communication at very low bit rates, usually below 4.8 kbps. However, the reproduced speech signal often sounds quite synthetic and the speaker is often not recognisable.
4.4.1 Speech production
To be able to understand how vocoding methods work, a brief explanation of speech production is required.
To produce speech, the lungs pump air through the trachea. For some sounds, this stream of air is periodically interrupted by the vocal cords.
The resulting airflow travels through the so-called vocal tract. The vocal tract extends from the opening in the vocal cords to the mouth. A part of the stream travels through the nose cavity.
The vocal tract has certain resonance characteristics. These characteristics can be altered by varying the shape of the vocal tract, for example by moving the position of the tongue. These resonance characteristics transform the flow of air originating from the vocal cords to create a specific sound.
The resonance frequencies are called formants.
Basically, there are three classes of speech sounds that can be produced. Other sounds belong to a mixture of the classes. These are the classes:
- Voiced sounds are created when the vocal cords vibrate open and closed. This way, periodic pulses of air come out of the opening of the vocal cords. The rate at which the opening and closing occurs determines the pitch of the sound.
- To produce unvoiced sounds, the vocal cords do not vibrate; they are held open. Air is then sent at high velocity through a constriction in the vocal tract, creating a noise-like turbulence.
- Plosive sounds result from building up air pressure behind a closure in the vocal tract and then suddenly releasing this air.
An important fact is that the shape of the vocal tract and the type of excitation (the flow of air coming out of the vocal cords) change relatively slowly. This means that for short time intervals, for example 20 ms, the speech production system can be considered to be almost stationary. Another important observation is that speech signals show a high degree of predictability. This is partly due to the periodic signal created by the vocal cords and partly due to the resonance characteristics of the vocal tract.
4.4.2 Vocoding basics
Instead of trying to encode the waveform itself, vocoding techniques try to determine parameters about how the speech signal was created and use these parameters to encode the signal. To reconstruct the signal, these parameters are fed into a model of the vocal system which outputs a speech signal.
Since the vocal tract and excitation signal change only relatively slowly, the signal that has to be analysed is split into several short pieces. Also, to make analysis somewhat easier, the assumption is made that a sound is either voiced or unvoiced.
A piece of the signal is then examined. If the signal is voiced, the pitch period is determined and accordingly the excitation signal is modelled as a series of periodic pulses. If the speech signal is unvoiced, the excitation will be modelled as noise.
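The two excitation models can be sketched as follows. The unit pulse amplitude, the uniform noise distribution, the seed parameter and the function name are assumptions made for this example.

```python
import random


def make_excitation(length, voiced, pitch_period=0, seed=0):
    """Return a model excitation: periodic pulses if voiced, noise otherwise."""
    if voiced:
        if pitch_period <= 0:
            raise ValueError("a voiced excitation needs a positive pitch period")
        # One unit pulse every pitch_period samples, zero in between.
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(length)]
    rng = random.Random(seed)  # seeded so that the example is repeatable
    return [rng.uniform(-1.0, 1.0) for _ in range(length)]
```

For a voiced piece with a pitch period of four samples, the first ten values are 1, 0, 0, 0, 1, 0, 0, 0, 1, 0.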
As we saw in the previous section, the vocal tract has certain resonance characteristics which alter the excitation signal. In vocoders the effect of the vocal tract is recreated through the use of a linear filter.
Perhaps it is not entirely clear what a linear filter is. A filter is any system that takes a signal f(x) as its input and produces a signal g(x) as its output. The output of a filter is also referred to as the response of the filter to a certain input signal. The filter is called a linear filter when scaling and superposition at the input result in scaling and superposition at the output.
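The linearity property can be checked on a simple example. The sketch below uses a small finite impulse response filter, which is only one kind of linear filter and not necessarily the kind used in a vocoder. The function name and the tap values are assumptions.

```python
def fir_filter(signal, taps):
    """Output sample n is a weighted sum of the current and earlier input samples."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(taps):
            if n - k >= 0:  # samples before the start are treated as zero
                acc += weight * signal[n - k]
        out.append(acc)
    return out


# Scaling and superposition at the input ...
a = [1.0, 2.0, 3.0, 4.0]
b = [0.0, 1.0, 0.0, -1.0]
taps = [0.5, 0.25]  # hypothetical filter coefficients
combined = fir_filter([2 * x + 3 * y for x, y in zip(a, b)], taps)
# ... give the same scaling and superposition at the output.
separate = [2 * x + 3 * y for x, y in zip(fir_filter(a, taps), fir_filter(b, taps))]
```

Here combined and separate hold the same values, which is exactly what the definition of a linear filter demands.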
A vocoding method will use a specific type of linear filter. The filter will contain certain parameters which have to be determined by the vocoder. This is so because the characteristics of the vocal tract change over time and the coder has to be able to model each state of the vocal tract approximately. Remember that the state of the vocal tract changes only relatively slowly, so for each piece of the input signal, the vocal tract can be considered to have fixed characteristics.
Due to this simple speech production model, speech can be encoded in a very compact way. On the other hand, this simple model is also the cause of the unnatural sounding speech which vocoders often produce.
Several types of vocoders exist, the oldest one dating back to 1939. They all use this simple representation of the speech production system. The main difference between the methods is the vocal tract model used. Below, I will only give a description of the Linear Predictive Coder (LPC) since this vocoder is often discussed in literature about VoIP.
4.4.3 Linear Predictive Coding (LPC)
The LPC coder uses the simple model described above. The excitation signal is considered either to be a periodic signal for voiced speech, or noise for unvoiced speech.
The vocal tract model which the LPC method uses is an approximation of a series of concatenated acoustic tubes, as figure 4.6 illustrates.
The LPC vocoder examines its input and estimates the parameters to use in the vocal tract filter. It then applies the inverse of this filter to the signal. The result of this is called the residue or residual signal and it basically describes which excitation signal should be used to model the speech signal as closely as possible. From this residual signal, it is relatively easy to determine if the signal is voiced or unvoiced and if necessary, to determine the pitch period.
To determine the parameters for the filter, the LPC algorithm basically determines the formants of the signal. This problem is solved through a difference equation which describes each sample being a linear combination of the previous ones. Such an equation is called a linear predictor, hence the name of the coder.
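One common way of finding the coefficients of such a linear predictor is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below uses that method as an assumption. The text does not state which procedure a particular LPC coder uses, and the function names are chosen for this example. The predictor has the form x[n] ≈ a[1]·x[n-1] + ... + a[p]·x[n-p], and the residual is what remains after subtracting the prediction.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n - lag], for lag = 0..max_lag."""
    return [
        sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r, order):
    """Return (predictor coefficients a[1..order], remaining prediction error).

    Assumes r[0] > 0, i.e. the frame is not completely silent.
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def residual(frame, coeffs):
    """Apply the inverse filter: subtract the predicted value from every sample."""
    out = []
    for n in range(len(frame)):
        prediction = sum(
            c * frame[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0
        )
        out.append(frame[n] - prediction)
    return out
```

For a test signal in which every sample is 0.9 times the previous one, a first-order predictor recovers a coefficient very close to 0.9 and the residual is practically zero after the first sample.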
The LPC method can produce intelligible speech at 2.4 kbps. The speech does sound quite synthetic however, like with most vocoding techniques.
4.5 Hybrid coding
Waveform coders in general do not perform well at data rates below 16 kbps. Vocoders on the other hand, can produce very low data rates while still allowing intelligible speech. However, the person producing the speech signal often cannot be recognised and the algorithms usually have problems with background noise.
Hybrid coders try to exploit the advantages of both techniques: they encode speech in such a way that results in a low data rate while keeping the speech intelligible and the speaker recognisable. Typical bandwidth requirements lie between 4.8 and 16 kbps.
The hybrid coders that will be discussed in this section are RELP, CELP, MPE and RPE coders. Here, only a brief description is given. A more detailed one can be found in which is the main source for the information in this section.
The basic problem with vocoders is their simplistic representation of the excitation signal: the signal is considered to be either voiced or unvoiced. It is this representation that causes the synthetic sound of these coders. The coders discussed below try to improve the representation of the excitation signal, each in their own way.
4.5.1 Residual Excited Linear Prediction (RELP)
The RELP coder works in almost the same way as the LPC coder. To analyse the signal, the parameters for the vocal tract filter are determined and the inverse of the resulting filter is applied to the signal. This gives us the residual signal.
The LPC coder then checked if the signal was voiced or unvoiced and used this to model an excitation signal. In the RELP coder however, the residual is not analysed any further, but will be used directly as the excitation for speech synthesis. The residual is compressed using waveform coding techniques to lower the bandwidth requirements. RELP coders can allow good speech quality at bit rates in the region of 9.6 kbps.
4.5.2 Codebook Excited Linear Prediction (CELP)
The CELP coder tries to overcome the synthetic sound of vocoders by allowing a wide variety of excitation signals, which are all captured in the CELP codebook. To determine which excitation signal to use, the coder performs an exhaustive search. For each entry in the codebook, the resulting speech signal is synthesised and the entry which created the smallest error is then chosen. The excitation signal is then encoded by the index of the corresponding entry. So basically, the coder uses Vector Quantisation to encode the excitation signal.
This technique is called an analysis-by-synthesis (AbS) technique because it analyses the signal by synthesising several possibilities and choosing the one which caused the least amount of error.
This exhaustive search is computationally very expensive. However, fast algorithms have been developed to be able to perform the search in real time. CELP techniques allow bit rates of even 4.8 kbps.
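The exhaustive analysis-by-synthesis search can be sketched as follows. This is a strongly simplified illustration: the all-pole synthesis filter, the plain squared error and the function names are assumptions, and refinements found in real CELP coders, such as gain terms and perceptual weighting, are left out.

```python
def synthesise(excitation, coeffs):
    """All-pole synthesis: y[n] = excitation[n] + sum of coeffs[k-1] * y[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        y = e + sum(
            c * out[n - k] for k, c in enumerate(coeffs, start=1) if n - k >= 0
        )
        out.append(y)
    return out


def search_codebook(target, codebook, coeffs):
    """Synthesise every codebook entry and return (best index, its squared error)."""
    best_index, best_error = None, float("inf")
    for index, excitation in enumerate(codebook):
        candidate = synthesise(excitation, coeffs)
        error = sum((t - c) ** 2 for t, c in zip(target, candidate))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error
```

Only the returned index has to be transmitted, since the decoder holds the same codebook and filter parameters and can synthesise the chosen entry itself.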
4.5.3 Multipulse and Regular Pulse Excited coding (MPE and RPE)
Like the previous method, MPE and RPE techniques try to improve the speech quality by giving a better representation of the excitation signal. With MPE, the excitation signal is modelled as a series of pulses, each with its own amplitude. The positions and amplitudes of the pulses are determined by an AbS procedure. The MPE method can produce high quality speech at rates around 9.6 kbps.
The RPE technique works in a similar fashion, only here the pulses are regularly spaced, as the name suggests. The GSM mobile telephone system uses an RPE variant which operates at approximately 13 kbps.
4.6 Other compression techniques
The compression principles discussed above cover pretty much the whole speech compression domain. Because of this, I was unable to find much information about compression techniques which do not fall into the categories of waveform coding, vocoding or hybrid coding.
But there is one technique which I find worth mentioning here, namely the use of artificial neural networks for speech compression. At this moment, there is not much information to be found about this particular use of neural networks, but there are documents which describe how neural networks can be used for lossy image compression. It is possible that similar techniques can be used for the compression of speech.
To do this, there are several ways in which artificial neural networks can be used. A neural net could be trained to predict the next sample, given a number of previous samples. This way, the network could perform the predictive function in differential coding schemes. If this prediction is done more accurately than with regular predictive techniques, this would result in better compression.
Another possible application is to use the neural network in a similar way as a vector quantiser. The network could be trained to map a number of inputs to a specific output. Then, either using a table lookup or another neural network, this number could be used to retrieve an appropriate waveform.
Perhaps a neural network could also be used to perform a speech analysis function which in turn could be used together with some vocoding or hybrid coding technique.
I realise that there is a lot of speculation in this section and unfortunately I did not have the time to conduct experiments using these techniques. However, I strongly believe that neural networks have great potential in a wide variety of applications, including speech compression.
4.7 Delay by compression
As we saw in the previous chapter, to be able to preserve good communication quality, the overall delay has to be kept as low as possible. This means that we have to take the delay caused by compression and decompression into account: even if we are able to compress the signal in an excellent way, this is of little use for real-time communication if it introduces an unacceptable amount of delay.
Delays during the compression stage can generally be divided into two categories. First of all, there is always some delay due to the calculations which need to be done. This amount of delay depends much on the capabilities of the system performing the compression.
Some compression techniques introduce a second type of delay: to compress a part of the speech signal, they need a portion of the signal which follows the part being handled. The amount of 'lookahead' needed determines the amount of delay introduced. For a specific algorithm this delay is fixed and does not vary among systems.
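The fixed part of the delay can be estimated with simple arithmetic. In the sketch below, the function name is an assumption, and the 30 ms frame length used in the usage note is taken from the G.723.1 recommendation rather than from this text. The coder has to collect a whole frame of speech plus the lookahead before it can start compressing.

```python
def algorithmic_delay_ms(frame_ms, lookahead_ms):
    """Fixed delay of a frame-based coder, excluding any computation time."""
    return frame_ms + lookahead_ms


def total_compression_delay_ms(frame_ms, lookahead_ms, processing_ms):
    """Add the system-dependent calculation time to the fixed delay."""
    return algorithmic_delay_ms(frame_ms, lookahead_ms) + processing_ms
```

For a coder with 30 ms frames and a 7.5 ms lookahead, algorithmic_delay_ms(30.0, 7.5) gives 37.5 ms, regardless of how fast the system performing the compression is.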
Decompressing the signal can usually be done much faster than compressing it. Of the compression schemes discussed in this chapter, transform coding probably introduces the most delay during decompression since, like during the compression stage, the signal has to undergo a transformation.
With computers becoming ever faster and specialised hardware becoming available, the fixed delay during the compression stage is probably the most important to consider.
4.8 Voice compression standards
To make interoperability between applications possible, it is important that standards are established. The most widely known standards in the VoIP domain are the G. series standards of the ITU-T. Other well known standards are the ETSI GSM standards. Here is a list of some standards:
| Standard | Description | Bit rate | MOS |
|---|---|---|---|
| G.711 | Pulse Code Modulation using eight bits per sample, sampling at 8000 Hz | 64 kbps | 4.3 |
| G.723.1 | Dual rate speech coder designed with low bit rate video telephony in mind [41]. The G.723.1 coder needs a 7.5 ms lookahead and uses one of these coding schemes: Multipulse Maximum Likelihood Quantisation (MP-MLQ) or Algebraic CELP (ACELP) | 6.3 and 5.3 kbps respectively | 4.1 |
| G.726 | Coder using ADPCM. Contains obsolete standards G.721 and G.723 | 16, 24, 32 and 40 kbps | 2-4.3 |
| G.727 | Five, four, three and two bits per sample embedded ADPCM. The encoding allows bit reductions at any point in the network without the need for coordination between sender and receiver [10] | 16, 24, 32 and 40 kbps | 2-4.3 |
| G.728 | Low Delay CELP (LD-CELP) | 16 kbps | 4.1 |
| G.729 | Conjugate Structure ACELP (CS-ACELP). Annex A: reduced complexity algorithm. Annex D: low rate extension. Annex E: high rate extension | 8 kbps (CS-ACELP), 8 kbps (Annex A), 6.4 kbps (Annex D) and 11.8 kbps (Annex E) | 4.1 (CS-ACELP) and 3.7 (Annex A) |
| GSM 06.10 | Full rate speech transcoding using Regular Pulse Excitation Long Term Prediction (RPE-LTP) | 13 kbps | 3.71 |
| GSM 06.20 | Half rate speech transcoding using Vector Sum Excited Linear Prediction (VSELP) | 5.6 kbps | 3.85 |
| GSM 06.60 | Enhanced full rate speech transcoding using ACELP | 12.2 kbps | 4.43 |
Some remarks have to be made at this point. First of all, unfortunately I was not able to find MOS information about some coders. Second, the Mean Opinion Scores are rather subjective and it is probably due to this fact that the MOS values often differ according to different sources. Sometimes these differences are even quite large. For example, in [40] it was mentioned that G.729 annex A had a MOS of 3.4 while [32] claimed that it was 4.0. In this particular case I chose to make a compromise and took the value of 3.7.
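The bit rates of the sample-based coders in the table follow directly from the sampling rate and the number of bits per sample. The short calculation below illustrates this. The function name is an assumption, and the 8000 Hz sampling rate is the telephone-quality rate listed for G.711.

```python
def bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of a coder that emits a fixed number of bits per sample."""
    return sample_rate_hz * bits_per_sample / 1000


pcm_rate = bit_rate_kbps(8000, 8)  # G.711: eight bits per sample
adpcm_rates = [bit_rate_kbps(8000, bits) for bits in (5, 4, 3, 2)]  # G.727
```

This gives 64 kbps for G.711 and 40, 32, 24 and 16 kbps for the five, four, three and two bit variants of ADPCM, matching the values in the table.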
4.9 Summary
For telephone quality communication using digitised speech, a bandwidth of 64 kbps is needed if the speech data is left uncompressed. But speech data can often be greatly compressed and this can reduce the amount of required bandwidth.
Some compression schemes do not take the nature of the data into account. Such techniques offer some compression, but usually they do not result in high compression ratios. However, they can be used to further reduce the amount of storage needed when another compression technique has already compressed the voice information.
Waveform coding techniques assume that the data is an audio signal, but in general they do not exploit the fact that the signal contains only speech data. They just try to model the waveform as closely as possible. This results in good speech quality at relatively high data rates (16 kbps or above).
Vocoders do exploit the fact that the data is in fact digitised speech. They do not encode the waveform itself, but an approximation of how it was produced by the human vocal system. Such techniques allow very high compression ratios while still providing intelligible communication (at rates of 4.8 kbps or below). However, the reproduced speech usually sounds quite synthetic.
A combination of waveform coding and vocoding techniques is used in hybrid coding schemes. They still rely on a speech production model, but they are able to reproduce the original signal much more closely through the application of waveform coding techniques. These methods can give good speech quality at medium data rates (between 4.8 and 16 kbps).
Compressing and decompressing speech data introduces a certain amount of delay into the communication. Because computers are becoming ever faster and because specialised hardware is becoming available, the amount of lookahead that a compression scheme requires is probably the most important delay component.
To be able to provide interoperability between different applications, it is important that standards are established. Well known compression standards in the VoIP world include the ITU-T's G. series standards and the ETSI's GSM standards.
Index
Glossary
General of the thesis
Chapter I: Voice over Internet Protocol (VoIP) Technology
Fundamental of channel switching network and Internet
Fundamental features of channel switching network
Fundamental features of Internet
Advantages of VoIP against PSTN
Outlook of VoIP technology
+ Some technical features of IP telephone
Terminal equipment and gateway
Transmission equipment
+ Special feature of VoIP
Adjustable quality
Security
User interface
Connecting telephone and computer
1.5. Conclusion
Problems relating to VoIP technology and talk quality on VoIP
Coding techniques and talk signal compression
Voice Activity Detector (VAD)
Number and address
+ Numbering on SCN network
+ Numbering on IP
Fee
Signal cooperation
Confidence
Troubles relating to calls quality
+ Delay
+ Echo suppression
+ Jitter changeable delay
+ Package loss
+ Bandwidth
Transfer modes
3.1 Real Time Mode
Real Time Post
Real Time Control Mode
RSVP
Conclusion
Introduction of standards
4.1 Introduction of standards
4.2 Standard H323
4.2.1. Introduction in H323
4.2.2. H323 Elements
+ Main functions of gateway
4.2.3. H323 Structure
4.2.4. Signal and control system in H323
4.2.5 Establishing the call in H323
The Session Initiation Protocol (SIP)
The SIP Network Architecture
SIP Call Establishment
Information in SIP Messages
The Resource Reservation Protocol (RRP)
Chapter II: Voice Communication
2.1 . Grabbing and reconstruction
2.1.1. Sampling and quantisation
2.1.2. Reconstruction
2.1.3 Mixing audio signals
2.2. Communication requirements
2.2.1. Error tolerance
2.2.2. Delay requirements
2.2.3. Tolerance for jitter
2.3. Communication patterns
2.4. Impact on VoIP
2.4.1. Sampling rate and quantisation
2.4.2. Packet length
2.4.3. Buffering
2.4.4. Delay
2.4.5. Silence suppression
2.5. Summary
Chapter III: Voice Communication
Quick Concept
1.1 How traditional long distance works
1.2 How long distance works with VoIP
2 Overview
Chapter IV. Compression Techniques
4.1.Preliminaries
4.2.General compression techniques
4.2.1. Lempel-Ziv compression
4.2.2 .Huffman coding
4.3. Waveform coding
4.3.1. Differential coding
4.3.1.1 Differential PCM (DPCM)
4.3.1.2 Adaptive DPCM (ADPCM)
4.3.1.3 Delta modulation (DM)
4.3.2 Vector quantisation
4.3.3 Transform coding
4.4 Vocoding
4.4.1 Speech production
4.4.2 Vocoding basics
4.4.3 Linear Predictive Coding (LPC)
4.5 Hybrid coding
4.5.1 Residual Excited Linear Prediction (RELP)
4.5.2 Codebook Excited Linear Prediction (CELP)
4.5.3 Multipulse and Regular Pulse Excited coding (MPE and RPE)
4.6 Other compression techniques
4.7 Delay by compression
4.8 Voice compression standards
4.9 Summary