Early Media and Ringing Tone Generation in SIP

The concept of “early media” can be confusing. RFC 3960 defines it as:

   Early media refers to media (e.g., audio and video) that is exchanged
   before a particular session is accepted by the called user.  Within a
   dialog, early media occurs from the moment the initial INVITE is sent
   until the User Agent Server (UAS) generates a final response.  It may
   be unidirectional or bidirectional, and can be generated by the
   caller, the callee, or both.
...
   A UAC should develop its local policy regarding
   local ringing generation.  For example, a POTS ("Plain Old Telephone
   Service")-like SIP User Agent (UA) could implement the following
   local policy:

      1. Unless a 180 (Ringing) response is received, never generate
         local ringing.

      2. If a 180 (Ringing) has been received but there are no incoming
         media packets, generate local ringing.

      3. If a 180 (Ringing) has been received and there are incoming
         media packets, play them and do not generate local ringing.

         Note that a 180 (Ringing) response means that the callee is
         being alerted, and a UAS should send such a response if the
         callee is being alerted, regardless of the status of the early
         media session.
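
The three-rule policy quoted above can be sketched in a few lines of Python. This is only an illustration of the quoted rules, not code from the RFC: the names `RingbackAction` and `ringback_action` are made up for this example, and the two boolean inputs are assumed to be tracked elsewhere by the UA.

```python
from enum import Enum


class RingbackAction(Enum):
    """Hypothetical outcomes of the POTS-like local ringing policy."""
    NO_LOCAL_RINGING = "do not generate local ringing"
    LOCAL_RINGING = "generate local ringing"
    PLAY_INCOMING_MEDIA = "play incoming media, no local ringing"


def ringback_action(received_180: bool, incoming_media: bool) -> RingbackAction:
    """Apply the three rules of the example policy to the current call state."""
    if not received_180:
        # Rule 1: unless a 180 (Ringing) was received, never ring locally.
        # The quoted rules do not say what to do with media in this case.
        return RingbackAction.NO_LOCAL_RINGING
    if not incoming_media:
        # Rule 2: 180 received but no media packets are arriving.
        return RingbackAction.LOCAL_RINGING
    # Rule 3: 180 received and media packets are arriving.
    return RingbackAction.PLAY_INCOMING_MEDIA


if __name__ == "__main__":
    for got_180 in (False, True):
        for media in (False, True):
            print(got_180, media, ringback_action(got_180, media).value)
```

Note that the decision depends on whether media packets actually arrive, not only on what the signaling says.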

Simply put:

Early media is media exchanged before the call is answered, that is,
before the session is established.

RFC 3261 describes the two provisional responses involved:

21.1.2 180 Ringing

   The UA receiving the INVITE is trying to alert the user.  This
   response MAY be used to initiate local ringback.
...
21.1.5 183 Session Progress

   The 183 (Session Progress) response is used to convey information
   about the progress of the call that is not otherwise classified.

In practice:

  • If you know that the phone is ringing (you received a Q.931 ALERTING message, for instance), you send a 180 Ringing.
  • If you receive a notification indicating that the call is progressing, but you do not know for sure whether the user is being alerted, you send a 183 Session Progress.
  • Both can indicate early media by carrying SDP. If there is no SDP, the end device (softphone, gateway, etc.) has to generate the ringback tone or progress tone itself.
  • Usually you will see 180 without SDP and 183 with SDP. It is good practice to leave tone generation to the endpoints.
  • If you get a 183 (with SDP), you should open the media connection, because there is audio ready for the caller to hear.
  • In FreeSWITCH, if you set the ringback variable together with ignore_early_media, both 180 and 183 will trigger your locally generated (fake) ringback. If you set instant_ringback=true, it will not wait for an 18x response and will start the fake ringback immediately (Asterisk-style behavior).
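
The rules of thumb above can be summarized in a small Python sketch. It is a simplification for illustration only: the function names, the `callee_is_alerting` flag and the returned strings are all invented for this example, and a real endpoint would also check whether media packets actually arrive.

```python
def provisional_response(callee_is_alerting: bool) -> int:
    """Gateway side: pick the provisional response to send upstream.

    Send 180 when we know the far-end phone is ringing (for example after
    a Q.931 ALERTING message); otherwise report generic progress with 183.
    """
    return 180 if callee_is_alerting else 183


def caller_side_action(status: int, has_sdp: bool) -> str:
    """Caller side: decide what the user should hear for a 180 or 183."""
    if status not in (180, 183):
        raise ValueError(f"not a ringing/progress response: {status}")
    if has_sdp:
        # SDP in an 18x response signals early media: open the media
        # path and play whatever audio the far end sends.
        return "play early media"
    # No SDP: the end device has to produce the tone itself.
    return "generate local tone"


if __name__ == "__main__":
    print(provisional_response(callee_is_alerting=True))   # 180
    print(caller_side_action(183, has_sdp=True))           # play early media
    print(caller_side_action(180, has_sdp=False))          # generate local tone
```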

 

A good example to clarify the Early Media scenario:

  • Party A picks up his phone, hears dial tone, and enters a phone number.
  • After a while, he hears ringing. (This is “early” media because the call hasn’t been answered yet.)
  • Meanwhile, party B’s phone starts to ring.
  • After a few rings, party B picks up, and the call is established.
  • Party A and party B can now hear each other speak.

Another good example is a busy signal. Using the same parties as in the previous example:

  • Party A picks up his phone, hears dial tone, and enters a phone number.
  • After a while, he hears a busy signal. (This is “early” media: no call has been established.)
  • Party A hangs up.

The busy signal is an audible signal (a form of audio media, if you will) that lets the calling party know that the call has not gone through. The call was never connected, but it still carried sound. With per-call billing, such a call would usually not be billed, because it was never connected. The same holds true for calls that ring with no answer. It even holds true for calls to disconnected numbers, where you hear the Special Information Tones (SIT) and a recorded message.
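
To make the billing point concrete, here is a minimal Python sketch that treats a call as connected only if a 2xx final response was received. The function name `was_connected` and the sample response sequences are assumptions for illustration (486 Busy Here and 480 Temporarily Unavailable are used as typical failure codes); real billing rules vary by operator.

```python
from typing import Iterable


def was_connected(responses: Iterable[int]) -> bool:
    """Return True if any response is a 2xx, i.e. the call was answered.

    Anything heard before that point (ringback, busy tone, SIT and a
    recorded announcement) is early media on an unconnected call.
    """
    return any(200 <= code < 300 for code in responses)


if __name__ == "__main__":
    answered = [100, 180, 200]     # rings, then party B picks up
    busy = [100, 183, 486]         # busy tone heard as early media
    no_answer = [100, 180, 480]    # rings, nobody answers

    for name, seq in (("answered", answered), ("busy", busy), ("no answer", no_answer)):
        print(name, "connected" if was_connected(seq) else "not connected")
```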