Transaction layer
The SIP RFC divides the architecture into layers. We actually went through two of the layers in the discussion above: the first was the syntax and encoding layer that defines the message structure, and the second was the transport layer. Now it’s time to inspect the contents of the SIP message by taking a look at the transaction layer.
The SIP layers
Every SIP message is associated with a single transaction. Similar to HTTP, messages are either requests or responses, but unlike HTTP, matching responses to requests is not simple. HTTP uses TCP as its transport, so you can match a response based on the order of the requests. But a SIP transaction can have more than a single response, and, in some cases, more than one request. When a SIP device sends a request, it acts as a user agent client (UAC). The recipient of the request, the one that sends the response, acts as a user agent server (UAS). The layer above the transaction layer is named “transaction user,” or TU. Let’s look at a SIP request that a UAC can initiate:
REGISTER sip:arstechnica.com SIP/2.0
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bKmq0Tgb
CSeq: 153 REGISTER
We have already seen that [i]REGISTER[/i] is the method (type of request), [i]sip:arstechnica.com[/i] is the request-URI, and [i]SIP/2.0[/i] is the version. All the headers above are mandatory. At this point, we’ll cover the headers that are important to the transaction layer, and we’ll cover the rest of them when we get to the way proxies and registrars work. First, let’s examine the Via header.
The Via header has a parameter called “branch” with an odd value. The first 7 letters (z9hG4bK) are fixed, and they help identify this as a SIP transaction based on RFC 3261. These letters, often referred to as the “magic cookie,” would not appear in a request that uses the previous SIP RFC, which has different transaction matching rules. We’ll only look at the cases that have the magic cookie because it’s very rare today to encounter an implementation that has not caught up with the latest spec.
After the seven letters, the rest is just a random string. Every time you see a different branch value it’s a different transaction; conversely, if two messages have the same branch value, then they should belong to the same transaction. One exception to this rule is if the method in the CSeq header is different. This is because the CANCEL method uses the same branch value to identify which transaction you should cancel. So, to fully match two messages to the same transaction, both the branch and the CSeq method have to match. Naturally, this means that responses to a request will have matching values.
Before moving on, one final note on the Via header. When we refer to this header, we actually refer to the first, or topmost, Via header. Via is one of those headers that can appear multiple times within a message. The reason for this will become clear in the proxy section, but it’s important to note that you always match the transaction based on the first Via header and ignore the rest.
The UAS sends a response to an incoming request. SIP responses are divided into 6 different classes, and the first digit of the 3-digit response code identifies each class. A 1xx response means any response in the range of 100 to 199. The response types are:
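The matching rule described above (topmost Via branch plus CSeq method) can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function names `parse_headers` and `transaction_key` are hypothetical, and the parser ignores compact header names, folded header lines, and the other details a real SIP stack has to handle.

```python
MAGIC_COOKIE = "z9hG4bK"  # fixed prefix of an RFC 3261 branch value


def parse_headers(message: str) -> list[tuple[str, str]]:
    """Return (lowercased name, value) pairs in order, skipping the start line."""
    headers = []
    for line in message.strip().splitlines()[1:]:
        if not line.strip():
            break  # a blank line ends the header section
        name, _, value = line.partition(":")
        headers.append((name.strip().lower(), value.strip()))
    return headers


def transaction_key(message: str) -> tuple[str, str]:
    """Build a (branch, CSeq method) key from the topmost Via and the CSeq header."""
    headers = parse_headers(message)
    # Only the first Via counts; later Via headers are ignored.
    top_via = next(v for n, v in headers if n == "via").split(",")[0]
    cseq = next(v for n, v in headers if n == "cseq")

    branch = None
    for param in top_via.split(";")[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "branch":
            branch = value.strip()
    if branch is None or not branch.startswith(MAGIC_COOKIE):
        raise ValueError("topmost Via has no RFC 3261 branch")

    method = cseq.split()[1].upper()  # CSeq is "<number> <method>"
    return branch, method


REQUEST = (
    "REGISTER sip:arstechnica.com SIP/2.0\r\n"
    "Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bKmq0Tgb\r\n"
    "CSeq: 153 REGISTER\r\n\r\n"
)
RESPONSE = (
    "SIP/2.0 200 OK\r\n"
    "Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bKmq0Tgb;received=172.16.75.2\r\n"
    "CSeq: 153 REGISTER\r\n\r\n"
)
```

With these two messages, `transaction_key(REQUEST)` and `transaction_key(RESPONSE)` produce the same key, while a CANCEL that reuses the branch would produce a different key because its CSeq method differs.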
[list]
[*]1xx – [b]Provisional response[/b], which indicates that the request is being handled, but without a final response yet. For example, 180 Ringing is a common provisional response.
[*]2xx – [b]Successful response[/b]. The most common one is 200 OK.
[*]3xx – [b]Redirect response[/b]. A client receiving this response would know the user moved to a different location. For example, a phone may redirect all its calls to a different address by responding with a 302 Moved Temporarily.
[*]4xx – [b]Client error[/b], which means that the request cannot be fulfilled and the sender should modify its request. For example, you can send 401 Unauthorized if the request does not contain the correct user credentials.
[*]5xx – [b]Server error[/b], which usually indicates that the error is not related to the request, but to the state of the server or the server capabilities. For example, you would send 501 Not Implemented when receiving an unknown method.
[*]6xx – [b]Global error[/b], which indicates the request cannot be fulfilled by any server. It would be rare to receive such responses, as sending one requires global knowledge of the network.
[/list]
SIP dedicates special attention to making sure the response is sent back to the same source IP that sent the request. This is, in fact, one of the roles of the transport layer, not the transaction layer. The transport layer does this by adding a “received” parameter to the top Via header of the request. Later, [url=http://tools.ietf.org/html/rfc3581]RFC 3581[/url] defined a new parameter called “rport” to ensure that the response is sent back to the same originating port. Both of these additions were aimed at making SIP work over NAT. SIP’s default behavior is to send the response back on the same connection as the request; if that fails, it attempts to open a new connection. Therefore, none of the layers can assume a single transaction uses a single connection. A possible SIP response to the request above is:
SIP/2.0 200 OK
Via: SIP/2.0/UDP home.mynetwork.org;branch=z9hG4bKmq0Tgb;received=172.16.75.2
CSeq: 153 REGISTER
The example shows a successful response, but a UAS may choose to send an error response, such as the well-known “404 Not Found,” in an instance where the user is not known. Both the UAC and UAS maintain a state machine for each transaction, and each state machine has timers. Timers are necessary in case the other side does not respond in time, and they’re also required in case the layer above the transaction layer does not send a proper event and leaves the transaction open.
Ultimately, SIP has built each of its layers to be as decoupled as possible from the other layers, so that an error in any one layer has minimal impact on the rest. This separation makes it easy for programmers to split their software into smaller components.
The protocol distinguishes between 4 types of transactions, so it has 4 different types of state machines: client INVITE, client non-INVITE, server INVITE and server non-INVITE. We haven’t mentioned the INVITE method yet, and for a good reason. INVITE is a method used to generate a call, and these lower layers do not maintain the call state. However, this transaction is different because calls have a 3-way handshake that affects the state machine. Let’s start with a diagram of the client non-INVITE transaction state machine:
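To make the six response classes listed earlier concrete, here is a minimal Python sketch that maps a status code to its class by its first digit. The names `RESPONSE_CLASSES`, `response_class`, and `is_final` are made up for this illustration and do not come from any particular SIP library.

```python
# Class names follow the list of response types given earlier.
RESPONSE_CLASSES = {
    1: "Provisional response",
    2: "Successful response",
    3: "Redirect response",
    4: "Client error",
    5: "Server error",
    6: "Global error",
}


def response_class(status_code: int) -> str:
    """Return the class name for a 3-digit SIP status code (100-699)."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]  # first digit picks the class


def is_final(status_code: int) -> bool:
    """Everything except 1xx is a final response."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return status_code >= 200
```

For example, `response_class(180)` gives the provisional class and `is_final(180)` is false, while `response_class(404)` gives the client error class and `is_final(404)` is true.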
The Non-INVITE client transaction
Most of the timers are for retransmissions over UDP, and they are disabled over TCP. An additional timeout timer exists in case no response is received. Transactions normally exist for 32 seconds until they time out. The equivalent server state machine is quite similar; it receives a request, sends it to the TU, sends the response back, and handles retransmissions if required. It should be noted that some of the non-INVITE transaction definitions were updated by [url=http://tools.ietf.org/html/rfc4320]RFC 4320[/url].
Let’s cover the 3-way handshake. The UAC sending the INVITE waits for a response, but this time, to complete the handshake, it sends an ACK request back to the server. ACK has no response, as it’s the 3rd message in the handshake. This fact forces ACK to be an exception to many of the rules.
When a client receives a successful (2xx) response, it means a call was created, and the client sends the ACK in a new transaction. A failure response (300-699) means the ACK will be on the same transaction. The reason for this lies in the behavior of the upper layers. We will see that proxies are not aware of a call state, and those that are stateful maintain just the transaction state. There are scenarios in which a proxy would need to ACK a failed response, but it cannot ACK a successful response because that would require understanding call-related information. The INVITE client state machine is as follows:
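The 32-second figure follows from the default timer values in RFC 3261, which this article does not spell out: T1 is 500 ms, T2 is 4 seconds, and the timeout timer is 64 × T1. The sketch below computes when a client non-INVITE transaction would retransmit its request over UDP if no response of any kind arrives. It is a simplified model (the function name `retransmit_times` is hypothetical, and it leaves out the change in retransmission interval once a provisional response has been received).

```python
T1 = 0.5           # seconds; RFC 3261 default round-trip estimate
T2 = 4.0           # seconds; cap on the non-INVITE retransmission interval
TIMEOUT = 64 * T1  # seconds; the transaction gives up after this long


def retransmit_times(reliable_transport: bool = False) -> list[float]:
    """Seconds after the first send at which the request is retransmitted.

    Assumes no response arrives before the transaction times out.
    """
    if reliable_transport:
        return []  # retransmission timers are disabled over TCP

    times = []
    interval = T1
    elapsed = interval
    while elapsed < TIMEOUT:
        times.append(elapsed)
        interval = min(2 * interval, T2)  # double each time, capped at T2
        elapsed += interval
    return times
```

Under these defaults, the gaps between sends grow as 0.5, 1, 2, 4, 4, 4, ... seconds until the 32-second timeout ends the transaction.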
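One practical consequence of this rule shows up in the branch value that the ACK carries. The following Python sketch, with the hypothetical helper names `new_branch` and `ack_branch`, picks the branch for an ACK based on the final response to the INVITE. It illustrates only the rule described above and is not a full UAC implementation.

```python
import secrets

MAGIC_COOKIE = "z9hG4bK"  # fixed prefix of an RFC 3261 branch value


def new_branch() -> str:
    """Generate a fresh branch value: magic cookie plus a random string."""
    return MAGIC_COOKIE + secrets.token_hex(8)


def ack_branch(invite_branch: str, status_code: int) -> str:
    """Choose the branch for the ACK that answers a final INVITE response."""
    if not 200 <= status_code <= 699:
        raise ValueError("ACK only answers a final response (200-699)")
    if status_code < 300:
        # 2xx: the ACK is a new transaction, so it gets a new branch.
        return new_branch()
    # 300-699: the ACK stays on the INVITE transaction and reuses its branch.
    return invite_branch
```

Calling `ack_branch("z9hG4bKabc", 486)` returns the INVITE's own branch unchanged, while `ack_branch("z9hG4bKabc", 200)` returns a newly generated one.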
The INVITE client transaction