attempts to determine the client and server and, in certain circumstances, also attempts to identify the payload for a given Flow. For example, if a PcAnywhere client connects to one of the PcAnywhere servers, AppId will (hopefully) determine that the client is a PcAnywhere client using the PcAnywhere service. As another example, if you are using a Mozilla web browser, AppId will figure out that the client is a Mozilla browser.
(We'll talk about payload later when we discuss AppId's HTTP handler.)
(Recall that each TCP and UDP packet belongs to a Flow. In the case of a TCP session, all of the packets in the TCP session belong to the same Flow. In the case of a UDP session, all of the packets coming from and going to the same IP addresses/ports belong to the same Flow.)
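To make the idea of a Flow a little more concrete, here is a minimal Python sketch of how packets travelling in either direction can be mapped to one Flow. This is not Snort++ code. The names FlowKey, flow_key, and flow_for_packet, the dictionary layout, and the addresses are all hypothetical and only illustrate the grouping described above.

```python
from typing import NamedTuple, Tuple

Endpoint = Tuple[str, int]  # (IP address, port)


class FlowKey(NamedTuple):
    proto: str
    low: Endpoint
    high: Endpoint


def flow_key(proto: str, src: Endpoint, dst: Endpoint) -> FlowKey:
    """Return the same key for both directions of a session."""
    low, high = sorted([src, dst])
    return FlowKey(proto, low, high)


flows = {}  # FlowKey -> per-Flow state (hypothetical layout)


def flow_for_packet(proto: str, src: Endpoint, dst: Endpoint) -> dict:
    """Fetch the Flow a packet belongs to, creating it on first sight."""
    return flows.setdefault(flow_key(proto, src, dst), {"packets": 0})


client = ("10.0.0.5", 49152)   # made-up addresses
server = ("10.0.0.9", 5631)

request = flow_for_packet("tcp", client, server)
request["packets"] += 1
reply = flow_for_packet("tcp", server, client)   # opposite direction
reply["packets"] += 1
```

Both the request and the reply land in the same dictionary entry, which is the property that lets AppId store what it learns once per Flow rather than once per packet.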
AppId is used in rules. Let's look at a rule in which the appid option is used:
alert tcp any any -> any any (msg:"Someone is using PcAnywhere!"; appid: pcanywhere; sid:1000000; rev:1;)
This rule will cause an alert to be generated if AppId determines that PcAnywhere clients and servers are communicating. Let's look at another rule:
alert tcp any any -> any any (msg:"Someone is using a Mozilla browser!"; appid: mozilla; sid:1000001; rev:1;)
This rule will cause an alert to be generated if AppId determines that a packet was sent by a Mozilla browser. AppId tries to determine the client and/or the service so that the Rule Detection Engine can match a rule with a given packet.
AppId does whatever it can to determine a Flow's client and/or service. If AppId determines the client and/or service, it will store the client or service's unique id (its "appid") in the Flow's application_ids array.
(I use the term "AppId" to describe the component of Snort++ and I use appid (lowercase) to describe the unique integer assigned to each application. We'll see a little later where these unique integers are found.)
The Rule Detection Engine will then compare all of the elements of application_ids to each rule that contains the appid option. This means that if AppId determines that the client and/or the service is using the PcAnywhere protocol and a rule contains the appid: pcanywhere option, then the Flow matches the option (although the packet must also, of course, match all the other options of the rule for there to be a match between the packet and the rule).
So there are two questions that still need to be answered. First, where is the appid for each application defined and, secondly, how does AppId determine the appids for each Flow?
The first question is easy. The appids correspond to the first column of the appMapping.data file.
The other columns, if non-zero, are unique as well and correspond to the service, client, and payload (we'll talk about payload later). A zero value indicates that the application is not used as a service, client, or payload. For example, the Internet Explorer browser is (obviously) only a client so its service and payload are 0. The elements of application_ids are only set to the value of the first column, the application's appid.
If any of the appids for a given rule (there can be more than one appid in a rule) matches the appid of the service, client, or payload, then the option matches. For example, if AppId determines that the client is a PcAnywhere client, application_ids[APP_PROTOID_CLIENT] will be set to 781. If AppId determines that the service is a PcAnywhere service, application_ids[APP_PROTOID_SERVICE] will also be set to 781. If AppId determines either that the client is a PcAnywhere client or that the server is a PcAnywhere server, then the following rule:
alert tcp any any -> any any (msg:"Someone is using PcAnywhere!"; appid: pcanywhere; sid:1000000; rev:1;)
will match and an alert will be sent.
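The comparison that the appid option performs can be sketched in a few lines of Python. This is only an illustration of the logic described above, not the Rule Detection Engine's code. The index names SERVICE, CLIENT, PAYLOAD, and MISC, the function name, and the use of 0 for "not identified" are my assumptions; 781 is the PcAnywhere appid quoted above.

```python
# Hypothetical index names for the four slots of application_ids.
SERVICE, CLIENT, PAYLOAD, MISC = range(4)
APP_ID_NONE = 0  # assumption: 0 means "not identified"


def appid_option_matches(rule_appids, application_ids):
    """True if any identified appid on the Flow is listed in the rule."""
    return any(
        appid != APP_ID_NONE and appid in rule_appids
        for appid in application_ids
    )


PCANYWHERE = 781  # appid quoted in the text

application_ids = [APP_ID_NONE] * 4
application_ids[CLIENT] = PCANYWHERE   # AppId identified only the client

rule_appids = {PCANYWHERE}             # from "appid: pcanywhere"
matched = appid_option_matches(rule_appids, application_ids)
```

Note that a match on any one slot is enough, which is why the rule fires whether AppId identified the client, the service, or both.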
The second (and much more difficult) question is - how does AppId figure out the service, client, and (if the AppId HTTP handler comes into play) the payload? Before we answer that question, let's talk about how AppId is invoked. One of the things that I don't like about the figure from the manual (shown at the top) is that it only shows AppId as a subscriber (we'll talk about subscriber-publisher relationships soon). However, it can also be an Inspector (more specifically, a network Inspector).
(AppId also tries to determine the client and service for UDP packet Flows. I discuss TCP Flows because they are the more interesting case.)
that the network Inspector
s (including AppId
inspector is binder
. By calling Wizard
serves a similar purpose to AppId
. In a sense, AppId
compete with one is a substitute
compete with each other.
are invoked for all of the packets in a Flow until a service Inspector is identified by the Wizard. (Note that an appropriate service Inspector won't necessarily be found for a given Flow. For example, there is no service Inspector that corresponds to PcAnywhere.)
Once an appropriate service Inspector for the Flow is found (if one indeed exists), this service Inspector can request that AppId look at the various components (header, body, etc.) of the Flow. This is how a subscriber/publisher relationship works. AppId is, in fact, the most important example in Snort++ of a subscriber/publisher relationship. During Snort++'s initialization, AppId expresses an interest in (i.e., "subscribes to") HTTP header events. HttpInspect then publishes HTTP header events after it has identified an HTTP header in an HTTP session.
Note that only a handful of service Inspectors currently publish events - the most important being HttpInspect.
So, to recap, AppId can be invoked as a network Inspector and can also be invoked in response to events published by the service Inspectors (if AppId has registered itself as a subscriber to these events).
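If the subscriber/publisher idea is new to you, the following Python sketch shows the general pattern. It is a generic toy, not Snort++'s event mechanism. The EventBus class, the event name "http_request_header", and the dictionary used as the event are all made up for illustration.

```python
from collections import defaultdict


class EventBus:
    """Toy publish/subscribe hub keyed by event name."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, event):
        # Events with no subscribers are simply dropped.
        for handler in list(self._handlers.get(event_name, [])):
            handler(event)


bus = EventBus()
seen_by_appid = []

# Initialization: the subscriber (AppId's role) registers its interest.
bus.subscribe("http_request_header", seen_by_appid.append)

# Packet processing: the publisher (the HTTP service Inspector's role)
# announces what it parsed, without knowing who is listening.
bus.publish("http_request_header", {"Host": "facebook.com", "User-Agent": "Firefox"})
bus.publish("unrelated_event", {"ignored": True})
```

The point of the pattern is the decoupling: the publisher does not call AppId directly, so other subscribers could be added without touching the publisher.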
Returning to the second question - how does AppId identify services, clients, and payloads? It has several different approaches, some of which are used by AppId, the network Inspector, and some of which are used by AppId, the subscriber. Some (but not all) of these approaches rely on Detector objects. We'll talk about Detectors in a little bit.
AppId, the network Inspector
Let's talk first about AppId, the network Inspector. AppId uses various arrays, search engines, and hashes (which we'll loosely refer to as "databases") to make its decisions. (We'll talk a little later about how items are added to these databases - this process is complicated by the fact that these additions are often made using Lua code.) Some of these databases are used if the packet is coming from the server or client, some are only used if the packet is coming from the server, and some are only used if the packet is coming from the client.
(It's important to understand that AppId, the network Inspector, does not determine the payload - only the AppId HTTP handler can do that.)
Like the typical Inspector, AppId is invoked by its eval() method when used as a network Inspector. It then looks in these different databases. Here are the different databases that are used if a packet is coming from a server:
As you can see above, a TCP packet from the server (i.e., a packet with a lower source TCP port number than the destination TCP port number) is compared against 6 databases. If a match is found in a database, the client_app_id, serviceAppId, payload_app_id, portServiceAppId, client_service_app_id, and/or misc_id fields of the packet's Flow's AppIdFlowData object are set.
What is the AppIdFlowData class? AppIdFlowData is a FlowData subclass. A Flow can have multiple objects of FlowData subclasses but only one of each specific FlowData subclass. These FlowData objects are "used by various inspectors to store specific data on the Flow for later use". For example, a Flow could have an associated single (i.e., not multiple) ReputationFlowData object, a single AppIdFlowData object, and a single HttpFlowData object. As mentioned above, client_app_id, serviceAppId, payload_app_id, portServiceAppId, client_service_app_id, and/or misc_id are the "specific data" used by AppId.
I will explain later how these AppIdFlowData fields are used to set the packet's Flow's application_ids mentioned above. Let's see now which databases affect which field in AppIdFlowData.
1) The IP address, (TCP or UDP) port, and protocol (TCP or UDP) are looked up in the host_port_cache hash. If an entry is found matching these three parameters, the client_app_id, serviceAppId, and/or payload_app_id fields are set. (I will discuss later how this and other databases are populated.)
The first three databases (1-3) in the diagram above do not require validation. In other words, if an entry is found in one of the databases in the first three steps, the packet is not inspected to determine if there really is a match. For example, if an entry of "18.104.22.168", TCP port 443 exists in the host_port_cache hash (and it should if using the default AppId configuration) and AppId sees a packet from that same address and port, then serviceAppId is unconditionally set to 4116 (which corresponds to Telegram) without further validation. In other words, the packet is not scrutinized further to verify that the service is really Telegram.
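Here is a small Python sketch of what "no validation" means for the host_port_cache lookup. The dictionary layout and the function name apply_host_port_cache are hypothetical. The address, port, and appid are the ones from the example above.

```python
# (ip, port, protocol) -> AppIdFlowData fields to set. Entry taken from the text.
host_port_cache = {
    ("18.104.22.168", 443, "tcp"): {"serviceAppId": 4116},  # Telegram
}


def apply_host_port_cache(flow_data, ip, port, proto):
    """Copy the cached appids onto the Flow data; the payload is never read."""
    entry = host_port_cache.get((ip, port, proto))
    if entry is None:
        return False
    flow_data.update(entry)
    return True


flow_data = {}
hit = apply_host_port_cache(flow_data, "18.104.22.168", 443, "tcp")
miss = apply_host_port_cache({}, "18.104.22.168", 80, "tcp")
```

Notice that the function never receives the packet contents at all. The three lookup parameters are the whole decision.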
The next three databases (4-6) for packets from the server to the client do require validation. At this point, we need to discuss Detectors.
(Detectors are also used in steps 1-3. However, they are used in a somewhat kludgey manner to load multiple entries into various databases. In the typical scenario, a Detector is written for a single service or a single client. In the case of the Detectors for databases 1-3, the Detectors are written for multiple services or clients. For example, the payload_group_hootieandtheblowfish Detector was written for an unrelated collection of clients, servers, and payloads, hence the silly and meaningless name of the file.)
Detectors are typically written for a single client or a single service (e.g., the PcAnywhere client and service Detectors). Detectors can be written in C++ or Lua. The core service Detectors (e.g., ssh, sip, http, smtp) are written in C++ while the Detectors for the more obscure services (PcAnywhere, BitTorrent) are written in Lua. (The advantage of writing a Detector in C++ is speed. The advantage of writing a Detector in Lua is convenience: you don't need to recompile Snort++ if you change a Detector written in Lua. Most Detectors are written in Lua.) There are two important tasks of a Detector. The first important task of a Detector is to insert entries into the different databases.
The figure above shows the different C++ and Lua functions required to insert entries into the different databases. We'll talk more about this a little later.
The second important task is to validate that the Detector is indeed the appropriate Detector for a given Flow (as explained above, this is unnecessary in databases 1-3). This is typically done after a match with an entry in one of the databases that corresponds to the client or service for which the Detector is responsible. For example, the SSH service Detector during initialization inserts an entry of "SSH-" in the tcp_patterns search engine. Obviously, just finding "SSH-" in a packet isn't enough to be 100% certain that the packet is from an SSH service. So the SSH service Detector will be asked to validate that the packet is indeed from an SSH service.
Now that we understand a little about Detectors, let's look at databases 4 through 6. Here's the diagram again.
Aside from validation, databases 4-6 also differ from databases 1-3 in that databases 4-6 are found in the ServiceDiscovery object. In concrete terms, this means that the next three databases are specific to packets coming from the server; the first three databases are common both to packets coming from the client and packets coming from the server.
4) The packet's source port is looked up in the tcp_services array. The corresponding element, if non-NULL, will point to the Detector object that corresponds to the port. For example, the SshServiceDetector object corresponds to port 22. If a packet is from port 22, this object's validate() method is called to look at the packet and to verify that the packet is indeed originating from an SSH server. (Just because a packet is from port 22 doesn't mean it's an SSH session.) Upon validation, the Detector's validate() method typically sets the serviceAppId field. (A service Detector could, however, set any of the fields.)
5) The packet is compared to patterns in the tcp_patterns search engine that indicate a specific service. For example, the string "SSH-" indicates that a packet originates from an SSH service. Similar to step 4, if a match is found, the service Detector's validate() method is called to verify and (if appropriate) set the serviceAppId field. (Database 5 is not needed if a Detector is found and successfully validated in step 4.)
6) The last step is desperation. If no Detectors were found in Databases 1-5, the "brute force" approach is then taken. This involves invoking the validate() method for all of the service Detectors. These service Detectors are found in the tcp_detectors hash.
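Steps 4 through 6 can be summarized with a short Python sketch. This is my simplification, not the ServiceDiscovery code. The ServiceDetector class, the regular expressions used for validation, the "toy" service with appid 999999, and the list used in place of a real multi-pattern search engine are all hypothetical. Only the database names, the "SSH-" pattern, port 22, and SSH's appid (846) come from this article.

```python
import re


class ServiceDetector:
    """Toy service Detector: validate() inspects the payload itself."""

    def __init__(self, name, app_id, regex):
        self.name = name
        self.app_id = app_id
        self._regex = re.compile(regex)

    def validate(self, payload: bytes) -> bool:
        return self._regex.match(payload) is not None


ssh = ServiceDetector("ssh", 846, rb"SSH-\d+\.\d+-")
toy = ServiceDetector("toy", 999999, rb"TOY/1 HELLO\r\n")  # made-up service

tcp_services = {22: ssh}                   # database 4: port -> Detector
tcp_patterns = [(b"SSH-", ssh)]            # database 5: pattern -> Detector
tcp_detectors = {"ssh": ssh, "toy": toy}   # database 6: every service Detector


def discover_service(src_port, payload):
    """Return (appid, step) for a packet from the server, or (None, None)."""
    detector = tcp_services.get(src_port)
    if detector is not None and detector.validate(payload):
        return detector.app_id, 4
    for pattern, detector in tcp_patterns:
        if pattern in payload and detector.validate(payload):
            return detector.app_id, 5
    for detector in tcp_detectors.values():   # brute force
        if detector.validate(payload):
            return detector.app_id, 6
    return None, None


banner = b"SSH-2.0-OpenSSH_8.9\r\n"
by_port = discover_service(22, banner)         # found through the port array
by_pattern = discover_service(2222, banner)    # SSH on an unusual port
by_brute_force = discover_service(4000, b"TOY/1 HELLO\r\n")
no_match = discover_service(22, b"GET / HTTP/1.1\r\n")
```

In every step the database only nominates a candidate Detector. It is always validate() that makes the final decision, which is why a web server listening on port 22 is not reported as SSH here.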
If the packet is coming from the client, the approach is similar but abbreviated:
The first three databases are the same as for packets coming from the server. For a client-originated packet, only the tcp_patterns database is searched. If a match is found and the validation of a client Detector is successful, validate() sets the client_app_id field. The main task of a client Detector is to identify the client application used. However, the client Detector also occasionally attempts to determine the service as well. If the client Detector determines (perhaps incorrectly) that a certain service is being used, it will set the client_service_app_id field. Note that the client_service_app_id field will only be used if the serviceAppId field is not set. (The idea behind this precedence is that a service Detector is typically better able to determine the appropriate service for a Flow than a client Detector. We'll talk a little more about that in a moment.)
As mentioned above, AppId's goal is to set the four application_ids array elements of Flow so that the detection engine can try to find matches between the elements of this array and the appid options found in the rules. After the different databases are compared with a given packet (from the client or server), the client_app_id, serviceAppId, payload_app_id, portServiceAppId, client_service_app_id, and/or misc_id fields of the packet's Flow's AppIdFlowData object may be set. So how do these fields map to application_ids?
Choosing the appid for the service is the only tricky one. If serviceAppId was set after looking in the host_port_cache hash, the tcp_services array, the tcp_patterns search engine, or the tcp_detectors hash (brute force), then serviceAppId is chosen, regardless of the values of client_service_app_id and portServiceAppId. client_service_app_id is considered a weak candidate because it is set by the client Detector, not the service Detector, which is better suited to determining the service. All that is required for portServiceAppId to be set is for a given port to be used. For example, if one of the ports of a Flow is TCP port 22, then portServiceAppId is set to 846, SSH's appid. Since this is so unreliable (it's trivial to switch the service-side port number of SSH as well as most other services), portServiceAppId is considered a weak candidate.
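The precedence for the service slot can be written down as a small Python function. Treat this as a sketch under assumptions, not as AppId's actual selection code. The class and function names are hypothetical, I assume 0 means "not set", and the order in which the two weak candidates are tried relative to each other is my guess since the text only says that serviceAppId beats both of them.

```python
from dataclasses import dataclass

APP_ID_NONE = 0  # assumption: 0 means "not set"


@dataclass
class FlowDataSketch:
    """Only the three fields that compete for the service slot."""
    serviceAppId: int = APP_ID_NONE
    client_service_app_id: int = APP_ID_NONE
    portServiceAppId: int = APP_ID_NONE


def pick_service_app_id(fd: FlowDataSketch) -> int:
    # The strong candidate wins regardless of the other two.
    if fd.serviceAppId != APP_ID_NONE:
        return fd.serviceAppId
    # Weak candidates. Their relative order here is an assumption.
    if fd.client_service_app_id != APP_ID_NONE:
        return fd.client_service_app_id
    return fd.portServiceAppId


strong = pick_service_app_id(FlowDataSketch(serviceAppId=4116, portServiceAppId=846))
port_only = pick_service_app_id(FlowDataSketch(portServiceAppId=846))
nothing = pick_service_app_id(FlowDataSketch())
```

The first call models a Flow on port 22 whose service was nevertheless positively identified as something else: the port-based guess of 846 is ignored.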
The use of third party modules is still under development. For this reason, I do not discuss the tp_app_id or tp_payload_app_id fields ("tp" stands for "third party"). When this code is finalized, these fields will play a role and I will document them.
Adding Entries To The Databases
I've introduced several databases but I haven't yet explained in detail how the databases are populated. These databases are populated by the initialization routines of the Detectors. As I've already mentioned, Detectors can be written in the C++ language or the Lua language. Here are the functions that populate the different databases:
As you can see from the above diagram, the host_port_cache and lengthCache are populated only by Lua Detectors. There would be no advantage to writing a C++ Detector that would add entries to these two databases since these two databases do not require validation (validation functions scrutinize packets in real time so speed is critical).
When one of the Lua functions in the diagram above is called, the Lua function invokes a C++ function. This interaction requires the Lua/C++ API, which defines the interface between the Lua code and the C++ code. Calls to the Lua API are made during initialization and validation. During the initialization of a Lua Detector, C++ code calls the Lua Detector's initialization function and this function, in turn, calls C++ code to insert entries into the various databases that were described above.
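The back-and-forth during initialization (the host calls the Detector, the Detector calls back into the host) is easier to see in code. The sketch below uses Python as a stand-in for both sides. HostApi plays the C++ role and detector_init plays the role of a Lua Detector's initialization function. The method name add_host_port_app and everything else here is hypothetical and is not the real Lua/C++ API.

```python
class HostApi:
    """Stand-in for the C++ side, which owns the databases."""

    def __init__(self):
        self.host_port_cache = {}

    def add_host_port_app(self, ip, port, proto, app_id):  # hypothetical name
        self.host_port_cache[(ip, port, proto)] = app_id


def detector_init(api):
    """Stand-in for a Lua Detector's initialization function."""
    # The Detector calls back into the host to insert its entries.
    api.add_host_port_app("18.104.22.168", 443, "tcp", 4116)


def load_detectors(api, init_functions):
    """Host side: run every Detector's initialization function once."""
    for init in init_functions:
        init(api)


api = HostApi()
load_detectors(api, [detector_init])
```

After load_detectors() returns, the database is populated and the Detector code does not need to run again for lookups that require no validation.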
AppId Http Handler
We've already discussed the AppId network Inspector. Now it's time to talk about the AppId HTTP handler. As I mentioned earlier, the AppId network Inspector is called when the Binder/Wizard combination is unable to find an appropriate service Inspector for a Flow. If HttpInspect is chosen by the Wizard, it will ask AppId to look at the HTTP request headers to determine the client and payload (the service will be "HTTP").
(Don't worry. I'll talk about what "payload" means shortly.)
As discussed elsewhere, HttpInspect is able to break an HTTP session into its components (header, body, etc.). From an HTTP request header, AppId can determine the client and the payload. For example, if the User-Agent field contains "Firefox", the client is a Firefox browser. AppId is also able to determine the "payload". I've mentioned "payload" before but I haven't discussed it in detail because the AppId network Inspector is not able to determine the payload. The AppId HTTP handler, on the other hand, is able to determine the payload. It determines this information from the Host and (occasionally) from the path fields of the HTTP request header. For example, if the Host field is set to "facebook.com" and the path is set to "/notes", the application_ids[APP_PROTOID_PAYLOAD] element is set to 1360 (the appid for Facebook Notes).
(Since the AppId HTTP handler at this time is only interested in HTTP request headers, the name "payload" is a little misleading. An HTTP header is certainly in a TCP session's payload but the term "payload" is pretty vague. The payload can be derived from the path and Host fields.)
Let's talk about how HttpInspect gets this HTTP header information to AppId. During initialization, modules "subscribe" to specific events. Later, as Snort++ is processing packets, any Inspector (and - in theory at least - the Rule Detection Engine as well) can "publish" events.
Here's an HTTP header again:
All of these items (e.g., the host, the user-agent) are contained within the HttpEvent. More specifically, they are contained in the norm_heads field (which is a linked list containing all of the header fields) of the http_msg_header.
The CHP search engine group is preferred by AppId above all of its other search engines. In other words, if AppId finds a CHP match, AppId is content with the results and the packet is compared against no other search engines. There is a CHP search engine for the HTTP header fields as well as the body. The most important HTTP header fields are the host (HOST_PT) and uri (URI_PT) fields. These CHP search engines are populated by Lua Detectors.
First, the Lua Detector creates a CHPApp and then calls the Lua CHPAddAction() method to create CHPActions and populate the CHP search engines. CHPApps link the different CHPActions together. For example, there are 3 CHPActions created for the Facebook Detector:
The strings from each CHPAction are added to the appropriate search engine with pointers to their respective CHPActions. Later, when AppId is invoked by the HttpInspect inspector, in order for there to be a match with this Detector, all three strings must be found: "facebook.com" must be found in the HTTP header's host field; "/login.php" and "email=" must be found in the path.
Furthermore, the email address can be extracted from the path (it can be found immediately after the "email=" string).
(How is the extracted email used by AppId? At least for now, it doesn't appear to be used for anything.)
A CHPTallyAndActions struct keeps track of matches with strings in the CHP search engines.
Each time a string from one of the CHP search engines matches, its associated CHPAction is added to the appropriate element in the chp_matches array. For example, during initialization CHPAddAction() added "facebook.com" to the CHP HOST_PT search engine. If "facebook.com" exists in the Host field of an HTTP header of a pseudo packet that AppId is inspecting, then the CHPAction is added to the HOST_PT element's linked list. In this way, AppId can determine if all of the required elements exist.
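The "all of the required elements must exist" logic can be sketched in Python as follows. This is an illustration only. The list of (pattern type, string) pairs stands in for the CHPActions of one CHPApp, a plain substring test stands in for the CHP search engines, and the function names, the sample request, and the assumption that the extracted value ends at the next "&" are all mine.

```python
from collections import defaultdict

HOST_PT, URI_PT = "HOST_PT", "URI_PT"  # pattern types named in the text

# The three strings from the Facebook example. Every one must be found.
facebook_login_actions = [
    (HOST_PT, "facebook.com"),
    (URI_PT, "/login.php"),
    (URI_PT, "email="),
]


def tally_matches(actions, fields):
    """Group the actions whose string occurs in the field of the same type."""
    chp_matches = defaultdict(list)
    for pattern_type, pattern in actions:
        if pattern in fields.get(pattern_type, ""):
            chp_matches[pattern_type].append(pattern)
    return chp_matches


def app_matches(actions, fields):
    """True only if every action of the app was tallied."""
    tally = tally_matches(actions, fields)
    return sum(len(hits) for hits in tally.values()) == len(actions)


def extract_after(marker, text):
    """Return what follows marker, up to the next '&' (assumed delimiter)."""
    start = text.find(marker)
    if start < 0:
        return None
    start += len(marker)
    end = text.find("&", start)
    return text[start:] if end < 0 else text[start:end]


fields = {
    HOST_PT: "www.facebook.com",
    URI_PT: "/login.php?email=alice@example.com&next=home",  # made-up request
}
full_match = app_matches(facebook_login_actions, fields)
email = extract_after("email=", fields[URI_PT])
partial = app_matches(facebook_login_actions, {HOST_PT: "www.facebook.com", URI_PT: "/login.php"})
```

The last call finds only two of the three strings, so the app does not match even though the host and the login page are both present.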
If no match is found in the CHP search engines, the last thing checked is the URL, which will include the host and the path. The layout of the search engines allows for permutations in the path. For example, "disqus.com" has 2 possible payload_ids - "/" (payload_id=798) and "/woopra" (payload_id=1001). So if "disqus.com" is found in the search engine corresponding to the Host field, a second search engine will be searched for "/" and "/woopra".
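The two-stage host-then-path search can be sketched in Python like this. The dictionary layout and the function name are hypothetical, and plain substring and prefix tests stand in for the real search engines. The article does not say which payload_id wins when both "/" and "/woopra" match, so I assume here that the longest matching path is preferred. The host and the two payload_ids come from the example above.

```python
# host pattern -> list of (path pattern, payload_id). Values from the text.
host_url_patterns = {
    "disqus.com": [("/", 798), ("/woopra", 1001)],
}


def lookup_payload(host, path):
    """Stage 1: find the host. Stage 2: pick the best path for that host."""
    for host_pattern, path_patterns in host_url_patterns.items():
        if host_pattern not in host:
            continue
        candidates = [(p, pid) for p, pid in path_patterns if path.startswith(p)]
        if candidates:
            # Assumption: the longest matching path is the most specific one.
            return max(candidates, key=lambda c: len(c[0]))[1]
    return None


woopra = lookup_payload("disqus.com", "/woopra/stats")
generic = lookup_payload("disqus.com", "/home")
unknown = lookup_payload("example.org", "/")
```

Searching the path only after the host has matched keeps the second search small, since only the paths registered for that one host need to be considered.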