GSM codec RFC
Note: Any unspecified parameter MUST be ignored by the receiver to ensure that additional parameters can be added in the future.
Type name: audio
Subtype name: GSM-HR
Required parameters: none
Optional parameters:
max-red: The maximum duration in milliseconds that elapses between the primary (first) transmission of a frame and any redundant transmission that the sender will use. This parameter allows a receiver to have a bounded delay when redundancy is used. Allowed values are integers from 0 (no redundancy will be used) up to an upper bound. If the parameter is omitted, no limitation on the use of redundancy is present.
Encoding considerations: This media type is framed and binary; see Section 4.
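The max-red rules can be read as a simple predicate over the delay of a redundant copy. The sketch below assumes an omitted parameter is passed as `None`; the function name `redundancy_allowed` is hypothetical and does not come from the specification.

```python
from typing import Optional


def redundancy_allowed(max_red_ms: Optional[int], delay_ms: int) -> bool:
    """Return True if a redundant copy sent delay_ms after the primary
    (first) transmission respects the declared max-red value.

    max_red_ms is None when the parameter was omitted (no limitation).
    """
    if delay_ms < 0:
        raise ValueError("a redundant copy cannot precede the primary transmission")
    if max_red_ms is None:
        return True  # parameter omitted: use of redundancy is unbounded
    if max_red_ms < 0:
        raise ValueError("max-red must be a non-negative integer")
    if max_red_ms == 0:
        return False  # 0 means no redundancy will be used at all
    return delay_ms <= max_red_ms


# A copy resent one 20 ms frame later fits within max-red=40; three frames later does not.
print(redundancy_allowed(40, 20), redundancy_allowed(40, 60))
```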
Interoperability considerations: The media subtype name contains "" to avoid potential conflict with any earlier drafts of GSM-HR RTP payload types that aren't bit-compatible. Transport within other framing protocols is not defined at this time. The "maxptime" parameter MUST be handled in the same way. For sendonly or sendrecv unicast media streams, the "max-red" parameter declares the limitation on redundancy that the stream sender will use. For recvonly streams, it indicates the desired value for the stream sent to the receiver.
This information is likely to simplify the media stream handling in the receiver, especially if no redundancy will be used, in which case "max-red" is set to 0. When a non-zero value is declared, the receiver MUST be prepared to allocate buffer memory for the given redundancy. More than one configuration may be provided if necessary by declaring multiple RTP payload types; however, the number of types should be kept small.
The number of frames encapsulated in each RTP payload strongly influences the overall bandwidth of the RTP stream because every packet carries its own header overhead. Packetizing more frames in each RTP payload reduces the number of packets sent, and hence the header overhead, at the expense of increased delay and reduced error robustness.
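The trade-off can be made concrete with a little arithmetic. This sketch assumes IPv4 (20 octets), UDP (8 octets), and a 12-octet RTP header with no extensions; the 14-octet speech frame and 1-octet per-frame table-of-contents entry are hypothetical sizes chosen for illustration, not values taken from this excerpt.

```python
FRAME_MS = 20           # one speech frame covers 20 ms
IP_UDP_RTP_OCTETS = 40  # assumed IPv4 (20) + UDP (8) + RTP (12), no extensions
SPEECH_OCTETS = 14      # hypothetical speech payload per frame
TOC_OCTETS = 1          # hypothetical table-of-contents octet per frame


def stream_bitrate(frames_per_packet: int) -> float:
    """Total bit rate in bit/s, headers included, for a given packetization."""
    packets_per_second = 1000 / (FRAME_MS * frames_per_packet)
    packet_octets = IP_UDP_RTP_OCTETS + frames_per_packet * (TOC_OCTETS + SPEECH_OCTETS)
    return packets_per_second * packet_octets * 8


for n in (1, 2, 4):
    # Packetization delay grows as n * 20 ms while the header share shrinks.
    print(n, stream_bitrate(n), n * FRAME_MS)
```

Under these assumptions, going from one to four frames per packet cuts the stream rate by more than half, while adding 60 ms of packetization delay.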
If forward error correction (FEC) is used, the amount of FEC-induced redundancy needs to be regulated such that the use of FEC itself does not cause a congestion problem.
The main security considerations for the RTP packet carrying the RTP payload format defined within this memo are confidentiality, integrity, and source authenticity. Confidentiality is achieved by encryption of the RTP payload, and integrity of the RTP packets through a suitable cryptographic integrity protection mechanism.
A cryptographic system may also allow the authentication of the source of the payload. A suitable security mechanism for this RTP payload format should provide confidentiality, integrity protection, and at least source authentication capable of determining whether or not an RTP packet is from a member of the RTP session.
The appropriate security mechanism depends on the application, the transport, and the signaling protocol employed. This RTP payload format and its media decoder do not exhibit any significant non-uniformity in the receiver-side computational complexity for packet processing, and thus are unlikely to pose a denial-of-service threat due to the receipt of pathological data; nor does the RTP payload format contain any active content.
Many thanks also go to Tomas Frankkila for useful input and comments. Authors: China, EMail: duanxiaodong chinamobile; China, EMail: wangshuaiyu chinamobile. (Standards Track)
See Section 10 for the changes made to this format relative to the earlier RFC. The payload format supports transmission of multiple channels, multiple frames per payload, the use of fast codec mode adaptation, robustness against packet loss and bit errors, and interoperation with existing AMR and AMR-WB transport formats on non-IP networks, as described in Section 3.
The payload format itself is specified in Section 4. In an N-channel session, a frame-block contains N speech frames, one from each of the channels, and all N speech frames represent exactly the same time period.
The byte order used in this document is network byte order, i.e., most significant byte first. The bit order is also most significant bit first. In all figures, the most significant bit is shown leftmost on a line and has the lowest number. Some bit fields may wrap over multiple lines, in which case the bits on the first line are more significant than the bits on the next line.
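These conventions translate directly into code. The sketch below uses Python's standard `struct` module for network byte order and treats bit 0 as the most significant bit of an octet. The helper names are hypothetical; packing pads the final octet with zero bits in the least significant positions, which matches the octet-alignment rule described elsewhere in this document.

```python
import struct


def to_network_order(value: int) -> bytes:
    """Encode a 16-bit unsigned integer in network byte order (big-endian)."""
    return struct.pack("!H", value)


def bit_at(octet: int, index: int) -> int:
    """Return bit `index` of an octet, where bit 0 is the most significant bit."""
    return (octet >> (7 - index)) & 1


def pack_bits_msb_first(bits: list[int]) -> bytes:
    """Pack a bit sequence MSB first, padding the last octet with zero LSBs."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


print(to_network_order(0x1234).hex(), pack_bits_msb_first([1, 0, 1]).hex())
```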
Due to their flexibility and robustness, they are also suitable for other real-time speech communication services over packet-switched networks such as the Internet. Because of the flexibility of these codecs, the behavior in a particular application is controlled by several parameters that select options or specify the acceptable values for a variable. These options and variables are described in general terms at appropriate points in the text of this specification as parameters to be established through out-of-band means.
The method used to signal these parameters at session setup or to arrange prior agreement of the participants is beyond the scope of this document; however, see Section 8. The AMR codec is a multi-mode codec that supports eight narrowband speech encoding modes, each with a different bit rate.
AMR speech encoding is performed on 20 ms speech frames, so each encoded AMR speech frame represents 20 ms of the original speech. AMR-WB supports nine wideband speech coding modes, each with its own bit rate. The multiple modes allow the encoding rate to be adapted to transmission conditions. For example, in GSM it is possible to dynamically adjust the speech encoding rate during a session to continuously adapt to varying transmission conditions, by dividing the fixed overall bandwidth between speech data and error-protection coding.
This enables the best possible trade-off between speech compression rate and error tolerance. To perform mode adaptation, the decoder (speech receiver) needs to signal to the encoder (speech sender) the new mode it prefers.
Since in most sessions speech is sent in both directions between the two ends, the mode requests from the decoder at one end to the encoder at the other end are piggybacked on the speech frames sent in the reverse direction. In other words, no out-of-band signaling is needed for sending these codec mode requests (CMRs).
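The following is a minimal sketch of how a mode request can ride in-band in the reverse-direction payload. It assumes the AMR octet-aligned payload header layout: a 4-bit CMR in the high nibble, four reserved zero bits, and CMR value 15 meaning "no mode request". That layout is an assumption and is not given in this excerpt; the function names are hypothetical.

```python
from typing import Optional

NO_MODE_REQUEST = 15  # assumed CMR value meaning "no mode request"


def build_header_octet(cmr: int) -> int:
    """Place a 4-bit codec mode request in the high nibble; reserved bits are zero."""
    if not 0 <= cmr <= 15:
        raise ValueError("CMR is a 4-bit field")
    return cmr << 4


def parse_header_octet(octet: int) -> Optional[int]:
    """Return the requested mode, or None if no mode is requested."""
    cmr = (octet >> 4) & 0x0F
    return None if cmr == NO_MODE_REQUEST else cmr


# The receiver at one end asks the far-end encoder for mode 7 in its own outgoing packet.
header = build_header_octet(7)
print(hex(header), parse_header_octet(header))
```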
Every AMR or AMR-WB codec implementation is required to support all the respective speech coding modes defined by the codec and must be able to handle mode switching to any of the modes at any time.
However, some transport systems may impose limitations on the number of modes supported and on how often the mode can change, due to bandwidth constraints. For this reason, the decoder is allowed to indicate its acceptance of a particular mode or a subset of the defined modes for the session using out-of-band means.
For example, the GSM radio link can only use a subset of at most four different modes in a given session. Moreover, for better interoperability with GSM through a gateway, the decoder is allowed to use out-of-band means to set the minimum number of frames between two mode changes and to limit the mode change among neighboring modes only.
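The three controls described above (a restricted mode subset, a minimum number of frames between mode changes, and changes only between neighboring modes) can be combined into one check. This is a sketch only. The field names in `ModeConstraints` are hypothetical, and the actual media type parameters are defined in Section 8.

```python
from dataclasses import dataclass


@dataclass
class ModeConstraints:
    allowed_modes: list[int]         # hypothetical negotiated subset, ordered by rate
    min_frames_between_changes: int  # hypothetical minimum spacing of mode changes
    neighbors_only: bool             # only step to an adjacent mode in allowed_modes


def change_permitted(c: ModeConstraints, current: int, proposed: int,
                     frames_since_last_change: int) -> bool:
    """Check a proposed mode switch; `current` must be one of allowed_modes."""
    if proposed == current:
        return True
    if proposed not in c.allowed_modes:
        return False
    if frames_since_last_change < c.min_frames_between_changes:
        return False
    if c.neighbors_only:
        step = abs(c.allowed_modes.index(proposed) - c.allowed_modes.index(current))
        return step == 1
    return True


# Four allowed modes, as on a GSM radio link, with neighbor-only switching.
gsm_like = ModeConstraints([0, 2, 5, 7], min_frames_between_changes=2, neighbors_only=True)
print(change_permitted(gsm_like, current=2, proposed=5, frames_since_last_change=2))
```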
Section 8 specifies a set of media type parameters that may be used to signal these mode adaptation controls at session setup. The codecs also have the option to reduce the number of transmitted bits and packets during silence periods to a minimum. The operation of sending comfort noise (CN) parameters at regular intervals during silence periods is usually called discontinuous transmission (DTX) or source controlled rate (SCR) operation.
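DTX can be pictured as a per-frame decision driven by voice activity. The sketch below sends a speech frame while voice is active, sends a comfort-noise update (SID) on the first silent frame and then every `sid_interval` frames, and sends nothing otherwise. The interval value and the absence of any hangover period are simplifying assumptions; real codecs define their own DTX timing.

```python
def dtx_schedule(voice_active: list[bool], sid_interval: int = 8) -> list[str]:
    """Decide per 20 ms frame whether to send SPEECH, SID, or nothing (NO_DATA).

    sid_interval is a hypothetical spacing, in frames, between CN updates.
    """
    out = []
    silent_frames = 0
    for active in voice_active:
        if active:
            out.append("SPEECH")
            silent_frames = 0
        else:
            # The first silent frame carries CN parameters, followed by periodic updates.
            out.append("SID" if silent_frames % sid_interval == 0 else "NO_DATA")
            silent_frames += 1
    return out


print(dtx_schedule([True, False, False, False], sid_interval=2))
```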
Support for Multi-Channel Session
Both the RTP payload format and the storage format defined in this document support multi-channel audio content. Although the AMR and AMR-WB codecs themselves do not support encoding multi-channel audio content into a single bit stream, they can be used to encode and decode each of the individual channels separately.
To transport or store the separately encoded multi-channel content, the speech frames for all channels that are framed and encoded for the same 20 ms periods are logically collected in a frame-block. At the session setup, out-of-band signaling must be used to indicate the number of channels in the session, and the order of the speech frames from different channels in each frame-block.
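Frame-block assembly amounts to transposing per-channel frame lists into per-period groups, in the channel order agreed at session setup. In this sketch, channel names such as "L" and "R" and the function name are hypothetical.

```python
def build_frame_blocks(channels: dict[str, list[bytes]],
                       order: list[str]) -> list[list[bytes]]:
    """Group per-channel speech frames into frame-blocks.

    channels maps a channel name to its frames for consecutive 20 ms periods;
    order is the channel order signaled at session setup.
    """
    lengths = {len(channels[name]) for name in order}
    if len(lengths) != 1:
        raise ValueError("every channel must cover the same 20 ms periods")
    n_periods = lengths.pop()
    # Frame-block t holds frame t of every channel, in the signaled order.
    return [[channels[name][t] for name in order] for t in range(n_periods)]


stereo = {"L": [b"l0", b"l1"], "R": [b"r0", b"r1"]}
print(build_frame_blocks(stereo, ["L", "R"]))
```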
When using SDP for signaling, the number of channels is specified in the rtpmap attribute, and the order of channels carried in each frame-block is determined at session setup.
The bits produced by the speech encoder are not equally sensitive to errors. This property has been exploited in cellular systems to achieve better voice quality by using unequal error protection and detection (UEP and UED) mechanisms. Class A bits are the most sensitive, and errors in them generally cause the frame to be discarded; on the other hand, it is acceptable to have some bit errors in the other bits, i.e., the class B and C bits. Moreover, a damaged frame is still useful for error concealment at the decoder, since some of the less sensitive bits can still be used.
This approach can improve speech quality compared to discarding the damaged frame. Link-layer protocols exist that do not discard packets containing bit errors.
With Internet traffic shifting towards a more multimedia-centric pattern, more link layers of this nature may emerge in the future. The relationship between UDP-Lite's partial checksum at the transport layer and the checksum coverage provided by the link-layer frame is described in the UDP-Lite specification [19]. Note that it is still important for the network designer to pay attention to the class B and C residual bit error rate.
Though less sensitive to errors than class A bits, class B and C bits are not insignificant, and undetected errors in these bits degrade speech quality. These disadvantages can be avoided, if needed, at the cost of some overhead in the form of a frame-wise CRC (Approach 2). For problem (a), the CRC makes it possible to detect bit errors in class A bits and use the frame for error concealment, which gives a small improvement in speech quality.
For problem (b), when multiple frames are transported in a payload, the per-frame CRCs remove the possibility that a single bit error in a class A bit causes all the frames to be discarded.
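The benefit of a per-frame CRC is that each frame is verified independently. This sketch uses a common CRC-8 (polynomial 0x07, zero initial value, MSB first) purely for illustration; it is not necessarily the polynomial this payload format specifies. For simplicity the CRC covers the whole frame, whereas the text describes protecting the class A bits.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, zero initial value. Polynomial is illustrative."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def usable_frames(frames: list[bytes], crcs: list[int]) -> list[bool]:
    """Check each frame on its own, so one corrupted frame does not
    invalidate the other frames carried in the same payload."""
    return [crc8(frame) == crc for frame, crc in zip(frames, crcs)]


sent = [b"abc", b"def"]
checks = [crc8(f) for f in sent]
received = [b"abc", b"dEf"]  # single bit error in the second frame
print(usable_frames(received, checks))
```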
The choice between the two approaches above must be based on the available bandwidth and the desired tolerance to bit errors.
Neither solution is appropriate for all cases. Section 8 defines parameters that may be used at session setup to choose between these approaches.
Robustness against Packet Loss
The payload format supports several means, including forward error correction (FEC) and frame interleaving, to increase robustness against packet loss.
Another possible scheme, which is more bandwidth efficient, is to use payload-external FEC. With AMR or AMR-WB, it is also possible to use the multi-rate capability of the codec to send redundant copies of a frame using either the same mode or another mode. Such a scheme is described next. It uses a sliding window to group the speech frame-blocks sent in each payload.
Figure 1 shows an example. The use of this approach does not require signaling at session setup. However, a parameter for bounding the maximum delay in transmitting any redundant frame is defined in Section 8.
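The sliding window can be sketched as follows: each packet carries the newest frame-block plus up to `redundancy` previously sent frame-blocks, oldest first. The window size is a hypothetical parameter. With 20 ms frames, the extra delay of the oldest redundant copy is `redundancy * 20` ms, which is the quantity a max-red limit bounds. A window of 1 corresponds to retransmitting each frame once in the following packet.

```python
def sliding_window_packets(frames: list[str], redundancy: int) -> list[list[str]]:
    """Each packet carries the newest frame-block plus up to `redundancy`
    previously sent frame-blocks, oldest first."""
    packets = []
    for n in range(len(frames)):
        start = max(0, n - redundancy)
        packets.append(frames[start:n + 1])
    return packets


# With a window of 1, losing the packet carrying f1 still leaves f1 in the next packet.
print(sliding_window_packets(["f0", "f1", "f2"], redundancy=1))
```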
In other words, the speech sender can choose to use this scheme without consulting the receiver, because a packet containing redundant frames does not look different from a packet with only new frames. If no packet is lost, the receiver may receive multiple copies of a frame for a given timestamp, possibly encoded with different modes. If multiple versions of the same speech frame are received, it is recommended that the speech decoder use the one encoded with the highest-rate mode.
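Selecting among duplicate versions comes down to keeping, per timestamp, the copy with the highest rate. In this sketch each received copy is assumed to carry its mode's bit rate in a hypothetical `bitrate_kbps` field; the rate values in the example are made up for illustration.

```python
from typing import NamedTuple


class ReceivedFrame(NamedTuple):
    timestamp: int
    bitrate_kbps: float  # hypothetical: rate of the mode that encoded this copy
    data: bytes


def pick_versions(frames: list[ReceivedFrame]) -> dict[int, ReceivedFrame]:
    """Keep, for each timestamp, the copy encoded with the highest-rate mode."""
    best: dict[int, ReceivedFrame] = {}
    for frame in frames:
        current = best.get(frame.timestamp)
        if current is None or frame.bitrate_kbps > current.bitrate_kbps:
            best[frame.timestamp] = frame
    return best


copies = [ReceivedFrame(0, 5.0, b"lo"), ReceivedFrame(0, 12.0, b"hi"),
          ReceivedFrame(160, 5.0, b"x")]
print({ts: f.data for ts, f in pick_versions(copies).items()})
```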
In most cases the mechanism in this payload format is more efficient and simpler than requiring both endpoints to support RFC in addition. There are two situations in which use of RFC is indicated: if the spread in time required between the primary and redundant encodings is larger than the duration of 5 frames, the bandwidth overhead of RFC will be lower; or, if a non-AMR codec is desired for the redundant encoding, the AMR payload format won't be able to carry it.
The sender is responsible for selecting an appropriate amount of redundancy based on feedback about the channel. The sender is also responsible for avoiding congestion, which may be exacerbated by redundancy (see Section 6 for more details).
Use of Frame Interleaving
To decrease protocol overhead, the payload design allows several speech frame-blocks to be encapsulated into a single RTP packet.
One of the drawbacks of such an approach is that packet loss can cause loss of several consecutive speech frame-blocks, which usually causes clearly audible distortion in the reconstructed speech.
Interleaving of frame-blocks can improve the speech quality in such cases by distributing the consecutive losses into a series of single frame-block losses. However, interleaving and bundling several frame-blocks per payload will also increase end-to-end delay and is therefore not appropriate for all types of applications.
Streaming applications will most likely be able to exploit interleaving to improve speech quality in lossy transmission conditions. This payload design supports the use of frame interleaving as an option.
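The effect of interleaving on loss patterns shows up with a generic block interleaver. This is an illustration of the idea only, not necessarily the exact interleaving rules of this payload format. Plain bundling puts consecutive frame-blocks in one packet; interleaving with depth 4 spreads them so that a lost packet removes isolated frame-blocks. The receiver must wait for all packets in an interleaving group before playout, which is the added delay mentioned above.

```python
def bundle(blocks: list[int], per_packet: int) -> list[list[int]]:
    """Consecutive frame-blocks share a packet (no interleaving)."""
    return [blocks[i:i + per_packet] for i in range(0, len(blocks), per_packet)]


def interleave(blocks: list[int], depth: int) -> list[list[int]]:
    """Packet p carries frame-blocks p, p + depth, p + 2 * depth, ..."""
    return [blocks[p::depth] for p in range(depth)]


blocks = list(range(8))
# Losing the second packet drops two adjacent blocks when bundled,
# but two blocks four positions apart when interleaved.
print(bundle(blocks, 2)[1], interleave(blocks, 4)[1])
```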
For the encoder (speech sender) to use frame interleaving in its outbound RTP packets for a given session, the decoder (speech receiver) needs to indicate its support via out-of-band means (see Section 8).
Bandwidth-Efficient or Octet-Aligned Mode
For a given session, the payload format can be either bandwidth efficient or octet aligned, depending on the mode of operation that is established for the session via out-of-band means.
In the octet-aligned format, all the fields in a payload, including the payload header, table of contents entries, and the speech frames themselves, are individually aligned to octet boundaries to make implementations efficient. In the bandwidth-efficient format, only the full payload is octet aligned, so fewer padding bits are added. Note that octet alignment of a field or payload means that the last octet is padded with zeroes in the least significant bits to fill the octet.
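The size difference between the two modes follows directly from where padding is applied. In this sketch the field widths (a 4-bit header field, two 6-bit table-of-contents entries, and two 95-bit speech frames) are hypothetical and chosen only to show the computation.

```python
import math


def octet_aligned_size(field_bits: list[int]) -> int:
    """Each field is padded to an octet boundary on its own."""
    return sum(math.ceil(bits / 8) for bits in field_bits)


def bandwidth_efficient_size(field_bits: list[int]) -> int:
    """Fields are packed back to back; only the whole payload is padded."""
    return math.ceil(sum(field_bits) / 8)


fields = [4, 6, 6, 95, 95]  # hypothetical header, ToC entries, and two speech frames
print(octet_aligned_size(fields), bandwidth_efficient_size(fields))
```

Here the octet-aligned payload costs one extra octet. The gap widens with more frames per packet, in exchange for simpler, octet-based field access.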
Also note that this padding is separate from the padding indicated by the P bit in the RTP header. Of the two operation modes, only the octet-aligned mode can use robust sorting, interleaving, and frame CRCs to make the speech transport more robust to packet loss and bit errors. This payload format is expected to be useful for both conversational and streaming services. Low delay is a very important factor for conversational services.
Low overhead is also required when the payload format traverses low bandwidth links, especially as the frequency of packets will be high. For low bandwidth links, it is also an advantage to support UED, which allows a link provider to reduce delay and packet loss, or to reduce the utilization of link resources.
A streaming service has less strict real-time requirements and therefore can use a larger number of frame-blocks per packet than a conversational service. However, including several frame-blocks per packet makes the transmission more vulnerable to packet loss, so interleaving may be used to reduce the effect that packet loss will have on speech quality.
A streaming server handling a large number of clients also needs a payload format that requires as few resources as possible when doing packetization. The octet-aligned and interleaving modes require the least amount of resources, while CRC, robust sorting, and bandwidth-efficient modes have higher demands.
This is specified in Section 4. AMR's capability to do fast mode switching is exploited in some non-IP networks to optimize speech quality. Here, each frame is retransmitted once in the following RTP payload packet. The mechanism described does not require signaling at session setup. However, signaling has been defined to allow the sender to voluntarily bound the buffering and delay requirements.
If nothing is signaled, the use of this mechanism is allowed and unbounded. For a given timestamp, the receiver may receive multiple copies of a frame containing encoded audio data. The cost of this scheme is bandwidth and the receiver delay necessary to allow the redundant copy to arrive. This redundancy scheme provides functionality similar to the generic redundancy scheme described in a separate RFC, but it works only if both the original frames and the redundant representations are GSM-HR frames. When other media coding schemes are desired, one has to resort to that RFC. The sender is responsible for selecting an appropriate amount of redundancy based on feedback about the channel conditions.
The sender is also responsible for avoiding congestion, which may be exacerbated by redundancy (see Section 9, Congestion Control, for more details). This payload format uses the fields of the RTP header in a manner consistent with the RTP specification. The duration of one speech frame is 20 ms. The sampling frequency is 8 kHz, corresponding to 160 speech samples per frame. Each packet covers a period of one or more contiguous 20 ms frame intervals.
During silence periods, no speech packets are sent; however, silence descriptor (SID) packets are transmitted from time to time. To allow for error resiliency through redundant transmission, the periods covered by multiple packets MAY overlap in time.
A receiver MUST be prepared to receive any speech frame multiple times. The rules regarding maximum payload size given in Section 3. The RTP timestamp corresponds to the sampling instant of the first sample encoded for the first frame in the packet.
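Because frames in a packet are contiguous and the packet timestamp belongs to the first frame, a receiver can derive each frame's timestamp and discard copies it already holds. This sketch assumes an RTP clock rate equal to the 8 kHz sampling frequency, which gives 160 timestamp units per 20 ms frame. It ignores timestamp wraparound when sorting, and the function names are hypothetical.

```python
CLOCK_HZ = 8000
SAMPLES_PER_FRAME = CLOCK_HZ * 20 // 1000  # 160 samples per 20 ms frame


def frame_timestamps(packet_ts: int, n_frames: int) -> list[int]:
    """RTP timestamp of each frame; the packet timestamp belongs to the first frame."""
    return [(packet_ts + i * SAMPLES_PER_FRAME) % 2**32 for i in range(n_frames)]


def store_frames(buffer: dict[int, bytes], packet_ts: int, frames: list[bytes]) -> None:
    """Insert frames keyed by timestamp, keeping the first copy received."""
    for ts, frame in zip(frame_timestamps(packet_ts, len(frames)), frames):
        buffer.setdefault(ts, frame)


buf: dict[int, bytes] = {}
store_frames(buf, 0, [b"a", b"b"])     # frames at 0 and 160
store_frames(buf, 160, [b"b2", b"c"])  # redundant copy of 160, new frame at 320
print([buf[ts] for ts in sorted(buf)])  # decoding order recovered from timestamps
```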
The timestamp is also used to recover the correct decoding order of the frames. The RTP header marker bit (M) SHALL be set to 1 whenever the first frame carried in the packet is the first frame in a talkspurt (see the definition of talkspurt in section 4). The assignment of an RTP payload type for the format defined in this memo is outside the scope of this document. The RTP profiles currently in use mandate binding the payload type dynamically for this payload format.
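The marker-bit rule depends only on the first frame of each packet. The sketch below assumes a fixed number of frames per packet and a list of hypothetical per-frame flags that mark talkspurt starts. A talkspurt that begins mid-packet does not set M for that packet.

```python
def marker_bits(talkspurt_start: list[bool], frames_per_packet: int) -> list[int]:
    """M = 1 only when the first frame carried in a packet begins a talkspurt."""
    return [1 if talkspurt_start[i] else 0
            for i in range(0, len(talkspurt_start), frames_per_packet)]


print(marker_bits([True, False, False, True, False, False], frames_per_packet=3))
```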
[Figure: general payload format layout]
The format of the ToC octet is shown in Figure 3.
[Figure 3: The TOC element]
The first bit (b1) of the first parameter is placed in bit 0 (the MSB) of the first octet (octet 1) of the payload field; the second bit is placed in bit 1 of the first octet, and so on. The last bit is placed in the LSB (bit 7) of the last octet.
An application implementing this payload format MUST understand all the payload parameters that are defined in this specification.
Any mapping of the parameters to a signaling protocol MUST support all parameters. So an implementation of this payload format in an application using SDP is required to understand all the payload parameters in their SDP-mapped form.
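As an illustration of handling parameters in their SDP-mapped form, the sketch below parses an SDP `a=fmtp:<payload type> name=value;...` attribute line. It assumes that "max-red" is carried there, which is the usual mapping for media type parameters. Unknown parameters are dropped, since a receiver must ignore them. `KNOWN_PARAMETERS` lists only the optional parameter shown in this excerpt and is an assumption.

```python
KNOWN_PARAMETERS = {"max-red"}  # assumption: the only parameter this sketch handles


def parse_fmtp(line: str) -> tuple[int, dict[str, str]]:
    """Parse 'a=fmtp:<pt> name=value;name=value', ignoring unknown parameters."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    pt_text, _, params_text = line[len(prefix):].partition(" ")
    params: dict[str, str] = {}
    for item in params_text.split(";"):
        name, sep, value = item.strip().partition("=")
        if sep and name.lower() in KNOWN_PARAMETERS:
            params[name.lower()] = value.strip()
    return int(pt_text), params


# 96 is a dynamic payload type, consistent with dynamic binding for this format.
print(parse_fmtp("a=fmtp:96 max-red=40; foo=1"))
```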