Investigating Privilege Elevation on Linux

Benoit Hamelin — Wed, 12 Apr 2017 20:23:47 +0000

Today we’re talking about ways to elevate privileges on the Linux platform, and the discussion will also be useful to anybody who’s worried about privilege elevation on other UNIX platforms. Whenever you’re worried about privilege elevation on UNIX, there are basically two paths you should be looking at.

Watch a video version instead

The first such path is a kernel exploit. In essence, your attacker will attempt to gain execution in the kernel by hacking some user-mode interface to a kernel service. Once the hacker has gained execution in the kernel, his goal will be to grant root privileges to a running shell or backdoor process by poking into the kernel’s data structures. Fortunately, this is very easy to detect. Once it has happened, we just need a scanner that inspects all running processes and looks at their user IDs. Whenever the user ID of a running process decreases, suggesting that the process has increased its own privileges, it breaks a golden rule of the UNIX security model: the privileges of a given process can be reduced but never augmented. That is easy to detect, it flashes as a big red light on our dashboards, and there is little a hacker can do in this space that will escape our surveillance.

The alternative to exploiting the kernel is to hack a highly privileged user-mode application. In this case, you want to gain execution in some daemon, either through a good old memory exploit or, more generally, by exploiting a fault in the way this daemon handles its inputs, whether that fault lies in the design or in the implementation of its input-handling protocol. Alternatively, on UNIX, you can hack the input parameters to a setuid program.

A setuid program is basically the tool of the devil. On the other hand, it’s the only way for a user-mode program to go from unprivileged to highly privileged. When setuid programs are started, they get the privileges of the owner of the program file rather than inheriting the privileges of the parent process. This means that if a setuid program belongs to root, it runs as root when you run it, even if it was run by good old me, who is not root. These things are evil. They are the only way to properly implement the normal privilege elevation programs such as su or sudo, but many systems administrators are hell-bent on delegating certain systems management tasks using setuid programs. This is a very, very bad idea, and most privilege elevations go through these setuid programs.
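Since most privilege elevations go through setuid programs, a first audit step is simply to list them. The following is a minimal Python sketch, assuming read access to the directories being walked; the function name `find_setuid_files` is hypothetical and not part of any Arcadia tooling. It walks a directory tree without following symlinks and reports every regular file carrying the set-UID bit, along with its owner.

```python
import os
import stat
from typing import Iterator, Tuple


def find_setuid_files(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (path, owner_uid) for each regular file under `root` with the set-UID bit."""
    # os.walk does not descend into symlinked directories by default,
    # and silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: inspect a symlink itself, not its target
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mode & stat.S_ISUID:
                yield path, st.st_uid


if __name__ == "__main__":
    # Example audit: set-UID programs owned by root in the hand-managed tree.
    for path, uid in find_setuid_files("/usr/local"):
        if uid == 0:
            print(path)
```

Anything this reports beyond well-known tools such as su, sudo or passwd deserves a conversation with whoever installed it.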


What do you do when you are hacking a privileged user-mode application? Essentially, you coax it into forking a shell or a backdoor for you. Surprisingly, this is hard to detect. Why? It’s not so much that it is difficult to detect when a certain program forks a child process that suddenly has a higher level of privileges; it’s just that this pattern of events is frequent. Many high-privilege applications fork high-privilege child processes on a regular basis, which generates many false positives for any test that checks for it. What do you do in this case? You try to whitelist all the legitimate stuff: you inspect and audit the typical privilege elevation pathways on a system. But you whitelist them only at your own peril, as the upcoming examples are about to show.

Here are two examples. In the first, we get the sense of a suspicious privilege elevation, but there is none; the second is an actual good old privilege elevation problem.

Let’s first take a look at this investigation. Here we have a process called polkit-agent-helper-1, which is running as root but whose parent, polkit-gnome-authentication-agent-1, was running as user Victor. In other words, user Victor was able to fork a new root process, polkit-agent-helper-1, from polkit-gnome-authentication-agent-1.

What’s at stake here? When you know how the GNOME interface works, you quickly learn that GNOME has its own counterpart to sudo, built on polkit. This agent handles the authentication check; once the user’s identity has been verified and found to match the set of users allowed to perform administrative tasks, it relies on its own setuid helper, in this case polkit-agent-helper-1, to elevate privileges on behalf of a running GNOME process.

When you go and look at what is happening here, you find the raw data stream sent by the machine on which this elevated process was found running.

When you go back in time from this event, you see that the user was running aptd, likely through the system update tools on his Ubuntu machine.

When you run aptd and request a system update, you of course need to elevate your privileges in order to update the system. This is what happens here. After that, the system starts downloading new packages. If we keep digging and exclude everything concerned with Chrome, we see the update start deleting an old version of libc, then decompress and unarchive new packages, in this case, apparently, go1.8.linux-amd64.tar.gz. We also see traces of other new processes related to the system update.


In this investigation, it is clear that the privilege elevation from the GNOME authentication agent, where the high-privilege process created is polkit-agent-helper-1, is legitimate: a legitimate setuid program granting a legitimate privilege elevation.

All right, onwards to our next example. Here we have an investigation whose label essentially reads: privileged process ifstat with unprivileged parent bash. Let’s take a look at this one.

Here we have a process at path /usr/local/bin/ifstat that was started from the good old shell of user hamelin. Yes, that was me: I prepared this example, even if it doesn’t look like it. This process is running as root.

What can we expect from this? What smell do I get? Well, this program ifstat has been put into /usr/local/bin. Typically, people deploy programs in that directory when those programs are not managed through the system’s packaging system. This means that this ifstat program is likely a homegrown piece of software, or something built and installed there by hand.

I go to the systems administrator and ask, “What is ifstat?” Playing the systems administrator, I answer, “It is a wrapper around ifconfig.” For context, ifconfig is the program with which you can query and modify the network configuration of a machine. So, still as the systems administrator, I explain: “I wanted ifconfig to be runnable only by root, so I wrote this wrapper around it, called ifstat, to let users use ifconfig only to query the status of network interfaces.” “Ah,” says I, “that is interesting. So how do you get it to run ifconfig if your wrapper is meant to be run by lowly unprivileged users?” As the systems administrator, I reply, “I have configured it as a setuid program.” Then I, the hunter, say to me, the systems administrator: “You are a fool, because look at what this user has done to you.”


The user called your ifstat tool to query the status of the eth0 interface. Up to now, that’s fine, but then what do you see? The typical signature of somebody hacking the inputs of a setuid program. In this case, the user has managed to slip the invocation of a shell into the program’s parameters. Basically, the input to this program is probably fed into the libc function system(), which starts a shell in order to run a child program. When you do so without properly sanitizing the input you pass to system(), you run the risk of users supplying inputs that do way more than you intended your wrapper to do.
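The likely root cause described above is passing user input to system(), which hands the whole string to /bin/sh -c. A minimal Python sketch of the safer pattern follows: validate the argument against a narrow whitelist and execute ifconfig directly with an argument vector, so no shell ever parses the input. The /sbin/ifconfig path, the regular expression, and the function names are assumptions for illustration. Note that Linux ignores the set-UID bit on interpreted scripts, so a real setuid wrapper would be a compiled program applying the same logic.

```python
import re
import subprocess
from typing import List

# Assumed location of ifconfig; it varies between distributions.
IFCONFIG = "/sbin/ifconfig"

# Conservative whitelist: letters, digits, '_', '.', '-', at most 15 characters
# (the Linux interface-name limit), and no leading '-' so the name cannot pose
# as a command-line option.
_IFACE_RE = re.compile(r"[A-Za-z0-9_.][A-Za-z0-9_.-]{0,14}")


def build_status_argv(iface: str) -> List[str]:
    """Validate the interface name and build an argument vector for ifconfig."""
    if not _IFACE_RE.fullmatch(iface):
        raise ValueError(f"refusing suspicious interface name: {iface!r}")
    return [IFCONFIG, iface]


def show_status(iface: str) -> str:
    # A list argument with the default shell=False goes straight to execve():
    # no shell parses it, so ';', '|' or '$(...)' have no special meaning.
    result = subprocess.run(
        build_status_argv(iface), capture_output=True, text=True, check=True
    )
    return result.stdout
```

With this design, an input such as `eth0; /bin/dash` is rejected outright, and even a name that slipped through would reach ifconfig as a single literal argument rather than as shell syntax.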

Let’s dig down a little from there to figure out whether the user actually managed to elevate his privileges in this instance. As I did in the first case, I go to the raw data stream for this machine around the time of the detected privilege elevation. This happened soon after the system started, so I scroll down looking for an invocation of the ifstat program. Here it is: ifstat.

This code identity telemetry record concerns the hashing of this program. And then, here it is: a new process whose parent was ifstat, which we initially saw run from a shell. Now this ifstat program is launching a shell in order to run ifconfig eth0 followed by /bin/dash.

This does not smell good, because of what happens next: the shell runs ifconfig and, after that, starts a dash process. Dash is a shell on this system, and this dash process is indeed running as root.

In this instance, there is indeed an actual privilege elevation through an illegitimate mechanism. This event requires immediate incident response: once this dash shell was launched, it had root privileges, so it could do anything on the system, including deploying a backdoor or a rootkit.

Thank you very much for your attention. We’ll make another show like this in about a month. Cheers, and goodbye.


Insider Threats

Vytenis Darulis — Sat, 08 Apr 2017 00:16:14 +0000

The increasing connectivity and openness of today’s information systems often lets cyber-attackers find their way into a system along many different paths. Data from the 2016/2017 Global Fraud and Risk Report by Kroll shows that more than 85% of executives experienced a cyber incident over the past year. It’s important to note that an “incident” is not necessarily synonymous with a breach. The report summed up the types of incidents this way:

38% experienced theft or loss of intellectual property
33% reported virus attacks
26% experienced phishing attacks in email

It is worth mentioning that many cyber incidents have the same origin: nearly half (44%) of respondents hold insiders responsible for cyber incidents, and more than half (56%) say insiders were the main reason for their security problems. For a long time, the primary objective for security teams has been to protect the perimeter: the focus was on keeping outsiders from gaining access and doing harm. But many reports show that more risk exists within the organization.

In a 2016 report from the Ponemon Institute, researchers found that attacks from malicious insiders were among the most costly. On average, these cost companies $4,000,000 per year. With many companies still failing to realize the full scope of this threat, this number is likely to increase in the future.

Threats from the inside are much more difficult to detect and prevent because insiders are already authenticated on the domain. External attackers, by contrast, must get past a sophisticated security system. Insiders have access to sensitive information and may know how that information is protected. If they want to steal or leak it, they can do so with far greater ease than outsiders.

Almost all big organizations have some employees who are unhappy at work. This means that there may be people who have access to sensitive data and a motive to sell it. Government agencies estimate that there is one insider threat for every 6,000 to 8,000 employees. Formerly, robbers came straight to the bank with a weapon to steal money, but now things work differently: the attacker hires employees for such tasks, so no one notices. Many insiders are actively recruited by criminals over the Dark Web, a popular venue for recruiting employees of financial institutions, hospitals, government agencies and other organizations for offensive services. Some companies even hire cyber security specialists to monitor and track Dark Web users who are planning various harmful acts.

Companies have to leverage tools to monitor and detect such threats. Probably the best solution is continuous monitoring of employee activity. Arcadia’s SNOW EDR platform uses complex algorithms to find anomalous or dangerous behavior. It proactively searches through networks 24/7 to detect and respond to various advanced threats. At the same time, it continuously collects data for analysis and lays it out on a timeline, which makes it easy to trace back what users have been doing. Investigators use the SNOWboard hunting platform, where all investigation leads are collected. This platform offers very detailed information about what happened on the host and network: execution of binaries, module loads, changes made to the file system and registry, and network connections.

You want to trust your employees, and you have probably done some verification to ensure that you can trust those who work with critical information. Unfortunately, this is not always enough. Advanced tools can help to find and stop insider threats before they wreak any meaningful damage.

Detection of Privilege Elevation by Malware on Linux

Benoit Hamelin — Tue, 04 Apr 2017 01:08:53 +0000

One of the hallmarks of targeted cyber attacks is that the attacker, from an execution toehold on a host, seeks to increase their computational privileges in order to assert greater control over the system. Once the attacker has attained this position, it may become tremendously difficult to detect them, especially if they act and persist through a kernel rootkit. Fortunately, the privilege elevation process tends to be noisy, and can be detected before it succeeds, if one looks for the proper clues. This article presents detection heuristics for privilege elevation on Linux systems.

There are multiple ways for a program to seek privilege elevation, but they are all based on one simple principle: high privileges are granted by other high-privilege programs. Such programs fall into two categories:

the operating system kernel;
highly privileged user-mode programs.

The signs of exploitation differ depending on whether the attacker is exploiting the kernel or a user-mode program. Let’s start with the latter.

Exploiting high-privilege user-mode programs

There are basically two categories of exploits at play here:

(A) forking off a running highly privileged program;
(B) starting a highly privileged program from a lowly privileged program.

Case (B) is the easiest to monitor for malware detection. The typical scenario considered here is that the program just started is a set-UID or set-GID program. Any process under UNIX runs as a certain user and group, and its set of privileges derives from them. The usual privilege transfer protocol stipulates that the user and group of a process are inherited from the parent process that forks it off; the child process then executes a new program, which does not alter this set of privileges. However, set-UID and set-GID executables change this protocol: after the child process has been forked off, when it executes such a program, the user (set-UID) or group (set-GID) it runs as becomes the one that owns the program file. Therefore, if the executable is owned by a user with a high level of privilege (typically the root user), the process runs with high privileges.
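The credential transfer rule can be written down as a small model. The sketch below is a simplification (it ignores nosuid mounts, file capabilities, and the Linux rule that set-ID bits are ignored on interpreted scripts), and `creds_after_exec` is a hypothetical name. It computes the effective user and group a process ends up with after executing a file, given the caller’s credentials and the file’s mode and ownership.

```python
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    euid: int  # effective user ID
    egid: int  # effective group ID


def creds_after_exec(caller: Creds, file_mode: int, file_uid: int, file_gid: int) -> Creds:
    """Effective credentials after execve() of a file (simplified model).

    fork() copies the parent's credentials unchanged; only executing a
    set-UID/set-GID file switches them to the file's owner/group.
    """
    euid = file_uid if file_mode & stat.S_ISUID else caller.euid
    # A set-GID bit without group-execute does not grant the group on exec.
    setgid = file_mode & stat.S_ISGID and file_mode & stat.S_IXGRP
    egid = file_gid if setgid else caller.egid
    return Creds(euid, egid)
```

For instance, user 1000 executing a root-owned file with mode 4755 ends up with effective UID 0, which is exactly the transition the detection heuristics below look for.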

Many common UNIX programs, such as su and sudo, necessarily leverage this privilege transfer rule. However, many systems administrators tolerate other set-UID or set-GID programs in order to facilitate the execution or delegation of certain tasks. This is a dangerous practice, especially if the set-UID/GID programs are relatively complex. Indeed, many hackers know how to supply parameter sets to such programs in order to trick them into running arbitrary shell commands.

Case (A) is quite a bit harder to monitor. In this case, the exploit consists of submitting specially crafted input to a highly privileged program so as to either take advantage of a design flaw or exploit an implementation bug. In both cases, the end result is that the attacker can trick the program into forking a process in which the exploit runs a program of the attacker’s choosing. The child process from the forking operation naturally inherits its parent’s high privileges. This case is harder to monitor because many service daemons run with a high level of privilege and naturally spawn off legitimate highly privileged child processes. A good example is the sshd daemon, which handles SSH connections to the machine: whenever the root user is permitted to log on to the machine through SSH, sshd is legitimately able to fork off root-privileged interactive shells and commands.

In both cases, the solution for detecting malware activity is to raise an alert whenever a child process is spawned with privileged credentials, whatever the privileges of its parent. We can consider privileged any process that runs with a UID smaller than or equal to N: on many servers, N can be 0 (root); certain specialized Linux systems (e.g. Android) attribute important privileges to non-root users up to UID 9999. This effectively captures any set-UID program execution (we whitelist the common legitimate set-UID programs su and sudo), as well as any privileged process spawned from a privileged daemon. We obviously advise IT administrators in customer organizations to minimize the number of uncommon set-UID programs and “root-spawns-root” processes, which result in false positive alerts. Our strategy of choice is to locally whitelist those “root-spawns-root” processes that the customer insists on deploying on his servers.
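The heuristic above can be sketched as a predicate over process-creation events. This is a minimal Python illustration, not SNOW’s implementation: the `ProcessEvent` fields, the su/sudo paths, and the `local_pairs` whitelist of customer-specific “root-spawns-root” chains are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple

# Threshold N: any UID/GID <= N counts as privileged (0 = root only, as on many servers).
PRIVILEGED_MAX_ID = 0

# Common legitimate set-UID programs (paths assumed; they vary by distribution).
SETUID_WHITELIST: FrozenSet[str] = frozenset({"/bin/su", "/usr/bin/su", "/usr/bin/sudo"})


@dataclass(frozen=True)
class ProcessEvent:
    exe: str         # executable of the new child process
    uid: int         # child's effective UID
    gid: int         # child's effective GID
    parent_exe: str  # executable of the parent process


def is_elevation_alert(
    ev: ProcessEvent,
    max_id: int = PRIVILEGED_MAX_ID,
    local_pairs: FrozenSet[Tuple[str, str]] = frozenset(),
) -> bool:
    """Alert on any new privileged child process unless it is whitelisted."""
    if ev.uid > max_id and ev.gid > max_id:
        return False  # child is not privileged: nothing to report
    if ev.exe in SETUID_WHITELIST:
        return False  # su/sudo: expected set-UID elevation
    if (ev.parent_exe, ev.exe) in local_pairs:
        return False  # customer-approved root-spawns-root chain
    return True
```

On an Android-like system, passing `max_id=9999` widens the net accordingly.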

Exploiting the operating system kernel

While many hacks leverage vulnerabilities of user-mode processes, others attack kernel bugs to get execution therein. Since user mode is a much easier place to implement common computing tasks, such as TCP/IP communications, kernel exploits often look to raise the privileges of a shell (or other program) already being run by the attacker. In the Linux kernel, this is rather simple, as the address of the list of task structures is made public to all user-mode processes. One simply has to walk this list and set the user and group of the attacker’s shell to 0 (root).

However, such an exploit breaks an important rule of the lifecycle of processes: a process can set its own UID in order to lower its privileges, but never to raise them. By setting a process’s UID or GID to a lower value than it already had, the attacker makes an uncommon move that is easy to look for. To catch it, SNOW keeps a running snapshot of all processes and of the user and group each one runs as. It updates this snapshot at randomized time intervals and, each time, compares it with the previous snapshot: if any process bears a lower UID than it did previously, this raises an alert. Since the lowering of a process’s UID is not a legitimate metadata shift, this detection heuristic generates no false positive alerts.
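A minimal sketch of this snapshot comparison for Linux, reading /proc, follows. It is an illustration under assumptions, not SNOW’s code: the function names are hypothetical, and two design choices are ours. First, it records the lowest of each process’s real, effective and saved IDs, because a process that temporarily dropped its effective UID with seteuid() may legitimately regain it, whereas an unprivileged process cannot move to an ID outside that set (setting aside capabilities such as CAP_SETUID). Second, it keys processes on PID, start time and command name, so that executing a new program, including a set-UID one, registers as a new process, which is the territory of the first heuristic.

```python
import os
from typing import Dict, List, Tuple

# Key: (pid, start time in clock ticks, command name). Start time guards against
# PID reuse; the command name changes on execve(), so running a new program
# shows up as a new process rather than as a UID change.
Key = Tuple[int, int, str]
Snapshot = Dict[Key, Tuple[int, int]]  # key -> (lowest UID, lowest GID)


def _read_process(pid: int) -> Tuple[Key, Tuple[int, int]]:
    with open(f"/proc/{pid}/stat") as f:
        raw = f.read()
    # comm sits between the first '(' and the last ')' and may contain spaces.
    comm = raw[raw.index("(") + 1 : raw.rindex(")")]
    after_comm = raw[raw.rindex(")") + 1 :].split()
    start_time = int(after_comm[19])  # field 22 of /proc/<pid>/stat
    uid = gid = -1
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("Uid:", "Gid:")):
                # Columns: real, effective, saved; keep the lowest (most privileged).
                lowest = min(int(x) for x in line.split()[1:4])
                if line.startswith("Uid:"):
                    uid = lowest
                else:
                    gid = lowest
    return (pid, start_time, comm), (uid, gid)


def take_snapshot() -> Snapshot:
    snapshot: Snapshot = {}
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                key, ids = _read_process(int(entry))
            except (OSError, ValueError, IndexError):
                continue  # process exited while we were reading it
            snapshot[key] = ids
    return snapshot


def lowered_ids(previous: Snapshot, current: Snapshot) -> List[Tuple[int, int, int]]:
    """Return (pid, old_uid, new_uid) for each process whose UID or GID decreased."""
    alerts = []
    for key, (old_uid, old_gid) in previous.items():
        if key in current:
            new_uid, new_gid = current[key]
            if new_uid < old_uid or new_gid < old_gid:
                alerts.append((key[0], old_uid, new_uid))
    return alerts
```

Calling `take_snapshot()` at randomized intervals, for example with `time.sleep(random.uniform(5, 30))` between rounds, and feeding consecutive snapshots to `lowered_ids()` reproduces the loop described above; the interval bounds are arbitrary.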

Summary

In essence, looking for privilege elevation requires tracking two things on a system:

the appearance of any child process with UID/GID <= N;
the lowering of the UID/GID of any running process.

Heuristic #1 is the only one of the two that can generate false positive alerts. Most of these may be eliminated through more security-conscious system configuration and administrative processes. The few instances that cannot be discarded can then easily be whitelisted, but should be clearly identified as known potential security holes.

Detecting Malware Through Process Chain Analysis

Benoit Hamelin — Thu, 23 Mar 2017 02:53:11 +0000

We’re recording the webinar right away! The webinar is being recorded, so if you lose your internet connection or need to step away from your keyboard, don’t worry: we will send out the recording once the webinar is finished.


I’m Benoit Hamelin, CTO at Arcadia, and I’m standing in for Justin Seitz today. The goal of this webinar is to discuss process chains and how we can use them to detect malware, so that we can respond to data breaches in good time. Let’s start with a short introduction to what I mean by process chains. It’s no secret to most users of modern computers... oh, but I’m getting ahead of myself. First, a quick plan for the meeting today: my intent is to chatter away like this for about 20 minutes, and then we will have 10 minutes or more allocated for questions and answers. I will also be available for an additional half hour, up until 2 pm EDT. So, just a bit of talk on my end, and then you drive the show by asking questions.

I was talking about process chains, and I was saying that it’s no secret to anybody familiar with computers or servers that programs are started by other programs. In more technical parlance, processes are started by other processes, and this who-starts-whom relationship is denoted in terms of parent-child relationships. There is an ancestral relationship between processes, and by walking these ancestral links we can identify process chains: a process together with all its ancestors, back to the initial user-mode process started by the kernel when the machine boots.
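Walking the ancestral links is a short loop over a process table. In the Python sketch below, the table is a hypothetical dictionary mapping each PID to its parent PID and program name (on Linux, it could be filled from the fourth field of /proc/<pid>/stat, which holds the parent PID); the walk stops at the oldest ancestor present in the table.

```python
from typing import Dict, List, Tuple

# Hypothetical process table: pid -> (parent pid, program name).
ProcessTable = Dict[int, Tuple[int, str]]


def process_chain(table: ProcessTable, pid: int) -> List[str]:
    """Program names from `pid` back to its oldest ancestor known in `table`."""
    chain: List[str] = []
    seen = set()
    while pid in table and pid not in seen:
        seen.add(pid)  # guard against cycles in inconsistent telemetry
        ppid, name = table[pid]
        chain.append(name)
        pid = ppid
    return chain


table = {1: (0, "systemd"), 812: (1, "sshd"), 4301: (812, "bash"), 4377: (4301, "ls")}
print(" <- ".join(process_chain(table, 4377)))  # ls <- bash <- sshd <- systemd
```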

How can process chains help us fight malware? Whether you’re using a workstation of your own or running a server that deploys a certain application or set of applications, programs are generally started in rather regular patterns. If you take a look at process chains on a given machine, you can identify common process chains. So what happens when this machine is infiltrated? Malware will try to leverage the spawning or forking of new processes. Why would it do that? Well, your user-mode processes will often be running inside sandboxes, so by spawning another process, malware can escape the sandbox. It will also do that to elevate its own privileges, by pushing a computation onto a privileged process. It can also use it to bootstrap code injection, or just to run its malicious activity away from the site of intrusion. These various malware activities involve process chains that do not follow the typical patterns on the host. They generate what we could consider anomalous process chains. This begets a detection heuristic: whenever we detect anomalous process chains, we should investigate the machine.
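One simple way to operationalize “anomalous” is frequency: learn which parent-child pairs (process chains of length two) occur on a host during a baseline period, then flag pairs that were rarely or never seen. The Python sketch below uses plain counting, a deliberately naive stand-in for more elaborate models; the `min_count` threshold and the example program names are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Pair = Tuple[str, str]  # (parent program, child program), lower-cased


def build_baseline(observed: Iterable[Pair]) -> Counter:
    """Count how often each parent-child pair occurred during the baseline period."""
    return Counter(observed)


def anomalous_pairs(baseline: Counter, new: Iterable[Pair], min_count: int = 3) -> List[Pair]:
    """Pairs seen fewer than `min_count` times in the baseline (unseen pairs count as 0)."""
    return [pair for pair in new if baseline[pair] < min_count]


baseline = build_baseline(
    [("explorer.exe", "winword.exe")] * 40 + [("explorer.exe", "chrome.exe")] * 25
)
print(anomalous_pairs(baseline, [("explorer.exe", "winword.exe"), ("winword.exe", "msword.exe")]))
# [('winword.exe', 'msword.exe')]
```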

In order to make sense of this principle let us look at two cases of anomalous process chains that I have set up on this demo account on SNOWboard.

The first case concerns a Windows machine, where we can see that malware has typical process chain patterns all its own, quite distinct from legitimate process chain patterns. Here, let me open the investigation; now that I think about it, I already had it open in a full-screen tab.

When we think about malware and the way it can show itself in terms of process chains, what do we think about? One very typical malware process chain pattern is the so-called drop-execute pattern, in which a piece of malware intrudes through, say, a browser or PDF reader, and its shellcode contacts a remote server, gets a binary, writes this binary to a file, and executes it right away. This is the drop-execute pattern.
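The drop-execute pattern can be expressed as a correlation between file-write and process-start events. The sketch below is a minimal Python illustration, assuming a hypothetical event format with timestamps in seconds: it flags a process started from a file that its own parent wrote shortly before. The 60-second window is an arbitrary assumption, and paths are compared case-insensitively, as Windows does.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Union


@dataclass(frozen=True)
class FileWrite:
    time: float
    pid: int   # process that wrote the file
    path: str


@dataclass(frozen=True)
class ProcessStart:
    time: float
    pid: int   # new process
    ppid: int  # its parent
    path: str  # executable image


def drop_execute_alerts(
    events: Iterable[Union[FileWrite, ProcessStart]], window: float = 60.0
) -> List[ProcessStart]:
    """Flag processes whose image was written by their parent within `window` seconds."""
    last_write: Dict[str, FileWrite] = {}
    alerts: List[ProcessStart] = []
    for ev in sorted(events, key=lambda e: e.time):
        key = ev.path.lower()  # Windows paths are case-insensitive
        if isinstance(ev, FileWrite):
            last_write[key] = ev
        else:
            w = last_write.get(key)
            if w is not None and w.pid == ev.ppid and ev.time - w.time <= window:
                alerts.append(ev)
    return alerts
```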


Another thing malware can do is infiltrate a common program and then spawn out of it, either by injecting a piece of code or by running an interpreted program through a shell of some sort. Very typical examples are, once again, the web browser, Acrobat Reader, Adobe Flash, or the dreaded Microsoft Office macros. In principle, these very common workstation programs are not supposed to be spawning shells. They’re not supposed to shell out.
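A rule capturing “document programs should not shell out” can walk a script host’s ancestry looking for a document-handling application. In the Python sketch below, the two program lists are illustrative and far from exhaustive, the process table format (PID to parent PID and image name) is assumed, and `max_depth` bounds the walk so that intermediate processes, such as the MSword.exe dropper in the investigation that follows, do not hide the document application.

```python
from typing import Dict, Optional, Tuple

ProcessTable = Dict[int, Tuple[int, str]]  # pid -> (parent pid, image name)

# Illustrative lists, not exhaustive.
DOCUMENT_APPS = frozenset({"winword.exe", "excel.exe", "powerpnt.exe", "acrord32.exe",
                           "iexplore.exe", "chrome.exe", "firefox.exe"})
SCRIPT_HOSTS = frozenset({"cmd.exe", "powershell.exe", "wscript.exe", "cscript.exe", "mshta.exe"})


def document_app_ancestor(table: ProcessTable, pid: int, max_depth: int = 5) -> Optional[str]:
    """If `pid` is a shell or script host descending from a document app, return that app."""
    if pid not in table:
        return None
    current, name = table[pid]
    if name.lower() not in SCRIPT_HOSTS:
        return None
    seen = {pid}
    for _ in range(max_depth):
        if current not in table or current in seen:
            return None
        seen.add(current)
        parent_of_current, current_name = table[current]
        if current_name.lower() in DOCUMENT_APPS:
            return current_name
        current = parent_of_current
    return None
```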

Let’s take a look at this investigation. A single lead has been generated in this investigation, for this Windows machine that I control. So what happens on it? The summary is that WINWORD.exe has a descendant process, a few generations down: WScript.exe. WScript.exe, as you may recall, is the Windows Script Host, a script interpreter (it runs JScript, Microsoft’s dialect of JavaScript, among other languages) that lives in the Windows System32 directory.

Let us go over the details. The root program of this process chain is Microsoft Word, the WINWORD.exe executable. Word has a parent, but we don’t really care about it; we care more about its child processes. Its first child is something called AppData\Roaming\Microsoft\Word\MSword.exe.

This spells out like a Microsoft Word executable, but it has been deployed in user Foobar’s AppData\Roaming directory. That does not look good at all. If we dig further, this MSWord.exe program has itself spawned a child process, the WScript.exe Windows Script Host binary, and we even see from the command line that it’s running a JavaScript file also deployed into the AppData\Roaming directory of the Foobar user. This is where things get really bizarre: it runs the script from the Xeyqyn directory, and the name of the script is fuarh.js. I don’t know about you, but I know a few foreign languages, and I’m pretty damn sure that Xeyqyn is not a word in any of them. If anything, it’s probably the result of generating a random directory name.

Same thing with the name fuarh.js. So the JavaScript file has very likely been written into a randomly generated directory within the AppData\Roaming directory of user Foobar. Now, random directory and file names are common in the temporary directory on Windows, but not quite that common under C:\users\Foobar\AppData, especially when the script is started by this weird MSWord.exe binary that has also been deployed into C:\users\Foobar\AppData. To me, this smells really, really bad. So, how do I go about investigating? Microsoft Word is not supposed to be running a JavaScript file, so it would be interesting to grab this JavaScript file and see what’s in it. The first thing we should do is open the command... actually, we don’t do it that way; rather, we select this path and indicate that we want to get this file from this host.
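The path-based smell described here is easy to encode. The following Python sketch, which uses the standard ntpath module so it runs on any OS, flags executable or script files located under a user profile’s AppData directory. The extension list is an assumption, and the rule is deliberately coarse: it would also match AppData\Local\Temp, so in practice it would be combined with other signals, such as an unusual parent process.

```python
import ntpath

# Illustrative, not exhaustive: extensions of files Windows can run or interpret.
RUNNABLE_EXTS = frozenset({".exe", ".dll", ".scr", ".js", ".jse", ".vbs", ".ps1", ".bat", ".cmd"})


def runs_from_user_appdata(path: str) -> bool:
    r"""True for C:\Users\<name>\AppData\... files with a runnable extension."""
    _drive, rest = ntpath.splitdrive(ntpath.normpath(path).lower())
    parts = [p for p in rest.split("\\") if p]
    ext = ntpath.splitext(parts[-1])[1] if parts else ""
    return (
        ext in RUNNABLE_EXTS
        and len(parts) >= 4
        and parts[0] == "users"
        and parts[2] == "appdata"
    )
```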


This takes a short while, because this machine is not in constant contact with my interface: it beacons only at randomized intervals, so it would take a little time if I were to send the command now. But as in all great cooking shows, I have already done it beforehand, and I actually got the file from a previous execution of this piece of malware. I’m then able to download this JavaScript file and try to reverse-engineer its purpose.

I could also do the same for the other weird process. If we go back to analyzing the lead, we saw two levels of weird processes going on here. The first one, spawned by WINWORD.exe, is called MSWord.exe, and I would also want to download it. In addition, when there is a strong smell, as there is here, of malware being deployed on this machine, I would get a full snapshot of everything happening on it. How do I do that? I go to commands and use the update info command. I just type in the investigation and host IDs here: the investigation is 677, and the host ID is this one. Then I apply, and the result is what we saw here: a bunch of fresh information about what’s going on on this machine.

So I get a full snapshot of all processes running on the machine, refreshed host information, a list of services, a list of drivers, and, what tickles me right now, a list of autoruns.

I’m not a big Windows expert, but what I can tell you is that every other autorun key on this machine concerns stuff deployed under C:\windows\something\something, except this one here, which is deployed into the AppData subdirectory of user Foobar. This again smells bad, especially given that it calls itself svchost update: I’m pretty sure that whenever Microsoft updates its svchost, it will not go through some user’s AppData\Local directory. I would likely grab this as well, reverse-engineer it, and find that it is actually a stage 0 component of the Turla malware. If I take a look at the other entries, I see something about VMware, something about our ISO, Userinit (that smells good), stuff in the Winlogon Shell, isolated commands, yada yada yada: very common stuff. Alternate shell: this is probably something legit.

To analyze these autorun dumps more quickly, we’re also working on a way to compute a diff between autorun snapshots: a previous snapshot, presumably taken before the malware deployed itself, and this one. This diff would likely show me this single added entry, and would also give me an idea of the time frame in which this new autorun was added. That completes my investigation of the first case.
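A diff of this kind reduces to set operations on two snapshots. In the Python sketch below, an autorun snapshot is assumed to be a dictionary from autorun location (for instance, a registry value name) to the command it launches; that format, and the svchost update entry in the example, are hypothetical. The timestamps of the two snapshots bound the window in which an added entry appeared.

```python
from typing import Dict, Tuple

Autoruns = Dict[str, str]  # hypothetical format: autorun location -> launched command


def diff_autoruns(
    before: Autoruns, after: Autoruns
) -> Tuple[Autoruns, Autoruns, Dict[str, Tuple[str, str]]]:
    """Return (added, removed, changed) entries between two autorun snapshots."""
    added = {k: after[k] for k in after.keys() - before.keys()}
    removed = {k: before[k] for k in before.keys() - after.keys()}
    changed = {
        k: (before[k], after[k])
        for k in before.keys() & after.keys()
        if before[k] != after[k]
    }
    return added, removed, changed


# Hypothetical snapshots taken before and after the intrusion.
before = {
    r"HKLM\SOFTWARE\Microsoft\Windows NT\CurrentVersion\Winlogon\Userinit": r"C:\Windows\system32\userinit.exe,",
}
after = {
    **before,
    r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run\svchost update": r"C:\Users\Foobar\AppData\Local\svchost.exe",
}
added, removed, changed = diff_autoruns(before, after)
print(added)
```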

We saw that malware in general, especially on Windows, has very common process chains of its own, so singling out those process chains helps in detecting the threat.

Now to the second case I wanted to go over. I see that I will overspend my time, so please feel free to tell me to shut up if you really want to ask questions right now. The second case is about the fact that, especially on UNIX and Linux systems, malware often forks a new process in order to elevate its privileges. Here we’re confronted with a much simpler example. On UNIX, there is such a thing as setuid programs. So what exactly are they? Typically, when you run a program on UNIX or Linux, this program runs with the privileges of the user who starts it. If I run an editor, it runs as Benoit Hamelin, with my limited privileges. Setuid programs are the exception to this rule: instead of starting with the privileges of the invoker, they start with the privileges of the owner of the program file. Since most program files on any given UNIX system are owned by root, setuid programs owned by root effectively start with root privileges even though they are not started by the root user. This means that when these programs start, they provide a free privilege elevation to the user who starts them. It’s very well known that setuid programs are a huge security hole, because many of them are more complex than analysts or programmers bother to notice, and this complexity can be exploited to run arbitrary computations with root privileges. To make sense of what happens with this setuid program, let’s now take a look at this investigation.


Let’s close those. What we see here is a parent process, a simple bash session that I was running, in which I just typed the name of a setuid program. Now put yourself in the shoes of a hunter who just got an alert of this type to investigate. It’s not me anymore: it’s somebody who says, okay, this user called hamelin has been running this program here, and this program has user ID 0, whereas its parent has user ID 1000.

If we look up the usernames, user ID 1000 means that this user named hamelin was doing something and, all of a sudden, his computation runs as root. There are two possibilities here. The first is that this guy is running a privilege elevation exploit, in which case I would want to check out the content of this test privilege program.

The other possibility is that he’s running a setuid program. The first thing I will do here is simply check the file’s permissions, and for that I send a simple ls -la command. We see that, yes, this file belongs to root and, yes, from this little s next to the rw, we see that it’s a setuid program.
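The little s that ls displays comes from the file’s mode bits. Python’s standard `stat.filemode` renders a mode the way `ls -l` does, which makes the convention easy to check; the `setuid_report` helper below is a hypothetical convenience wrapper. A lowercase s in the owner’s execute position means set-UID plus execute permission; an uppercase S means the set-UID bit is present without execute permission.

```python
import os
import stat


def setuid_report(path: str) -> str:
    """Describe a file the way a hunter reads the first columns of `ls -l`."""
    st = os.stat(path)
    mode = stat.filemode(st.st_mode)  # e.g. '-rwsr-xr-x'
    if st.st_mode & stat.S_ISUID:
        return f"{mode} set-UID: executes with the owner's UID {st.st_uid}"
    return f"{mode} not set-UID"


# The convention itself, independent of any file on disk:
print(stat.filemode(stat.S_IFREG | 0o4755))  # -rwsr-xr-x
print(stat.filemode(stat.S_IFREG | 0o4644))  # -rwSr--r--
```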

This is a program that could be used against the owner of the workstation, or of the machine, in order to elevate privileges if it can be exploited. I want to do two things here. First, I get a copy of this file to audit it, to see whether I can find an exploit for it myself. If so, I will sound the alarm to IT that they have to take it out right away. In any case, I will warn IT of this security hole and suggest a better approach for granting elevated privileges when running this program, only to certain users or under certain circumstances, using tools like sudo or something similar. That concludes the study of this second case.

In conclusion, before I start taking questions, I want to point out again that the typical usage of computers results in common process chain patterns. Malware tends to generate off-pattern process chains, and we must look for these. What do they look like? They will look like atypical parent processes, as in our MS Word example. They will look like, for instance, signed processes that spawn unsigned programs; we did not see an example of this. Another thing we can see is child processes having higher privileges than their parent processes. This should also raise your attention, and you should investigate it as soon as you can. Right now, we’re also working on an artificial intelligence engine to identify the most common process chains on all the machines that we monitor, and eventually to detect anomalous process chains more generally than we already do.
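The off-pattern signals listed above can be expressed as simple rules over parent/child pairs. This is a minimal sketch, not the AI engine mentioned in the talk. The `Proc` record and its fields are hypothetical, and "rare" simply means the (parent, child) name pair has been observed at most `max_count` times.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Proc:
    """Hypothetical process record; field names are illustrative, not SNOW's schema."""
    name: str
    uid: int
    signed: bool


def chain_flags(parent: Proc, child: Proc) -> List[str]:
    """Rule-based checks on a single parent -> child edge."""
    flags = []
    if child.uid == 0 and parent.uid != 0:
        flags.append("child runs as root, parent does not")
    if parent.signed and not child.signed:
        flags.append("signed parent spawned unsigned child")
    return flags


def rare_chains(observed: List[Tuple[str, str]], max_count: int = 1) -> Set[Tuple[str, str]]:
    """(parent name, child name) pairs seen at most `max_count` times across the fleet."""
    counts = Counter(observed)
    return {pair for pair, n in counts.items() if n <= max_count}
```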

All right, thank you and on this I will start taking your questions.


How To Test Malware Detection Capabilities

Charles Samson — Wed, 22 Mar 2017 01:29:56 +0000

From a software quality perspective comes the idea of verifying our system’s detection capabilities. More specifically, we aim at verifying that the overall system, from the SNOW sensor installed on a host to the SNOWboard Command System used by the hunters, is able to detect malware and hacker activities. It’s a simple idea, but not a trivial one.

First, a “detection capability” is not a feature of a single component, but an emergent property of the interaction of many components. This forces us to build an end-to-end infrastructure dedicated to testing. Second, some components show “unpredictable” behaviours, induced by deliberate randomness and by communication latencies between components. This limits how finely the testing environment can be tuned.

Even without these concerns, how can we test that the system is truly able to detect malware and cyber attacks?

In fact, it’s not the malware itself, nor the hacker’s presence, that is detected, but their behaviour. In simple cases, a behaviour is clearly suspicious: there are well-known malicious techniques, such as process hollowing, that are easily recognizable. In more complex cases, many clues must be combined to expose a malicious intent. In any case, to test our detection capability, we must recreate scenarios that we expect to stimulate our SNOW sensor and centralized detection algorithms.

To achieve this, we can use real malware samples, provided that we remain very cautious. The malware must be isolated so that it does not invade our testing infrastructure. We must also clean up the damage these samples wreak, using virtual machine snapshots, for example. Alternatively, we can use custom-made harmless malware. Since it is the behaviour that is detected, causing real damage is not necessary. Such pseudo-malware can be self-cleaning, which avoids the need for snapshots. In practice, both real and harmless malware are valuable.

On the central analytics side, we want to check whether an alert of the expected type has been raised within a reasonable time range. An easy way to proceed would be to search the database directly. However, that skips part of the chain, so it does not guarantee end-to-end detection. Instead, we communicate with the SNOWboard RESTful API and retrieve the same data as the SNOWboard Command System uses.
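The check "has an alert of the expected type been raised within a reasonable time?" comes down to polling with a deadline. This is a minimal sketch. The SNOWboard API endpoints are not described here, so `fetch_alerts` is a hypothetical callable standing in for the REST request. The alert fields `type` and `raised_at` are assumptions. The clock and sleep functions are injectable so that the logic can be tested without waiting.

```python
import time
from typing import Callable, Iterable


def wait_for_alert(
    fetch_alerts: Callable[[], Iterable[dict]],  # hypothetical wrapper around the REST API
    expected_type: str,
    launched_at: float,
    max_delay: float,
    poll_every: float = 10.0,
    clock: Callable[[], float] = time.time,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until an alert of `expected_type`, raised after `launched_at`, shows up,
    or give up once `max_delay` seconds have elapsed since the launch."""
    deadline = launched_at + max_delay
    while True:
        for alert in fetch_alerts():
            if alert["type"] == expected_type and launched_at <= alert["raised_at"] <= deadline:
                return True
        if clock() >= deadline:
            return False
        sleep(poll_every)
```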

The whole process is automated. Having malware and pseudo-malware in a pool, our automation platform can launch them and check if the alerts have been raised without any human intervention. A notification is sent when a problem is detected.
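The automation loop described above can be sketched as follows. Each scenario is launched, its alert is checked, and a single notification lists whatever was missed. `Scenario`, `detected` and `notify` are hypothetical names. `detected` stands for an alert check against the API, and `notify` stands for whatever channel the team uses. The scenarios run one after another here to keep the sketch simple; a real pool would likely run them in parallel.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Scenario:
    """One hypothetical test case: a (pseudo-)malware sample and the alert it should raise."""
    name: str
    launch: Callable[[], float]  # runs the sample on the isolated test bed, returns launch time
    expected_alert: str


def run_pool(
    scenarios: Iterable[Scenario],
    detected: Callable[[str, float], bool],
    notify: Callable[[List[str]], None],
) -> List[str]:
    """Launch every scenario, check its alert, and notify once with the list of misses."""
    missed = []
    for scenario in scenarios:
        launched_at = scenario.launch()
        if not detected(scenario.expected_alert, launched_at):
            missed.append(scenario.name)
    if missed:
        notify(missed)
    return missed
```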

In summary, in order to test our detection capabilities, we prepare scenarios that act suspiciously and execute them on a dedicated, isolated testing infrastructure. After the maximum reasonable detection delay, we parse the same data that hunters request to check if the expected alerts have been raised. That way, we manage the quality of our solution and we ensure hunters have the information they need to provide a high-quality service.

Defending From Endpoint Agent Disablement Cyber Attacks

Benoit Hamelin — Fri, 10 Mar 2017 01:34:17 +0000

When actively monitoring endpoints to detect signs of cyber attacks, preserving visibility through the endpoint sensor is crucial. A likely attack scheme is for malware to stop the sensor process, do its malware deeds, then restart the sensor process, or even leave it dead. However, losing connectivity with a sensor is also a likely event, due to various system actions and outside circumstances. This article discusses ways to distinguish the possible scenarios when connectivity with an endpoint sensor is lost, and to figure out when to sound the red alert.

Houston, we’ve lost contact

The first step when trying to figure out why contact was lost is to examine all possibilities. Here are the likely scenarios:

1. The agent process is attacked:
   a. it is killed;
   b. it is suspended.
2. The agent process crashed.
3. The machine shut down:
   a. the machine was properly turned off (using OS-specific tools);
   b. the machine had a software, hardware or power failure that brought it down;
   c. the machine was virtual, and was interrupted.
4. Network failure…
   a. … affecting only this one host or subnet;
   b. … affecting the whole organization being monitored, or the Internet route between the organization and the remote analytics database where events are monitored from.

A few simple measures can help distinguish between most of these scenarios. Let’s start from the bottom of the list and go up.

Communication failures

Scenario 4b is easy to distinguish from all others: connectivity is then lost for all machines within the same organization at once. A phone call to the IT staff is in order, but sometimes the Internet is just that tough to deal with. Otherwise, it comes down to diagnosing a single machine going incommunicado.
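Distinguishing scenario 4b from a single silent host amounts to checking how much of the organization went quiet at the same time. This is a minimal sketch. The input maps each host to the time its last beacon was received. The 300-second silence window and the 90% threshold are illustrative assumptions, not Arcadia’s values.

```python
from typing import Dict


def classify_outage(last_seen: Dict[str, float], host: str, now: float,
                    silence: float = 300.0, org_threshold: float = 0.9) -> str:
    """Classify a host's silence as normal, organization-wide (4b), or single-host."""
    if host not in last_seen:
        raise KeyError(host)
    silent = {h for h, t in last_seen.items() if now - t > silence}
    if host not in silent:
        return "connected"
    if len(silent) / len(last_seen) >= org_threshold:
        return "organization-wide outage (scenario 4b): call IT"
    return "single host silent: keep diagnosing"
```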

Arcadia’s endpoint agents will handle this case by staying in contact with one another across an ad hoc peer-to-peer network deployed on the customer’s network. In other words, each SNOW-defended endpoint within a LAN maintains a TCP connection with a subgroup of other endpoints (the size of this subgroup depends on the machine’s purpose and resource constraints). Should any of these TCP connections fail, the peers know it instantly and can report that fact back to the central analytics cloud. This set of peer-to-peer communication features is under development as this is written.
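The text does not say how peers are chosen. One simple way to guarantee that every endpoint is watched is a ring assignment: each host connects to the next k hosts in a fixed order. This is a hypothetical sketch and not the SNOW design. It also uses a single k for every host, whereas the text says that subgroup sizes vary with each machine’s purpose and resources.

```python
from typing import Dict, List


def assign_peers(hosts: List[str], k: int) -> Dict[str, List[str]]:
    """Ring assignment: each host keeps connections to the next k hosts in sorted order,
    so every host is also watched by exactly k others (when k < number of hosts)."""
    ordered = sorted(hosts)
    n = len(ordered)
    k = min(k, n - 1)
    return {h: [ordered[(i + j) % n] for j in range(1, k + 1)] for i, h in enumerate(ordered)}


def watchers(peers: Dict[str, List[str]], host: str) -> List[str]:
    """Hosts that would notice, and report, a dropped connection to `host`."""
    return sorted(h for h, ps in peers.items() if host in ps)
```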

In addition, the agent is aware of its host’s connectivity problems. When the machine regains connectivity, the agent reports how long it was offline, so that we can corroborate this information with what its peers reported. This way, if the attacker disguises an agent disablement as a network failure, the offline interval reported by the agent will not cover the whole outage seen by its peers: there will be a gap during which the agent was not alive. This is enough information to warrant a deeper investigation, and therefore to raise an alert.
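The corroboration step can be expressed as interval arithmetic. Take the part of the peer-observed outage that the agent’s own "alive but offline" record does not cover. Any such part is time during which the agent was not alive. This is a minimal sketch with hypothetical parameter names. The 5-second tolerance, meant to absorb clock skew, is an assumption.

```python
def unexplained_gap(peer_lost: float, peer_regained: float,
                    agent_offline_start: float, agent_offline_end: float,
                    tolerance: float = 5.0) -> float:
    """Seconds of the outage seen by peers that the agent's own offline record does not cover."""
    covered_start = max(peer_lost, agent_offline_start)
    covered_end = min(peer_regained, agent_offline_end)
    covered = max(0.0, covered_end - covered_start)
    gap = (peer_regained - peer_lost) - covered
    return gap if gap > tolerance else 0.0
```

A non-zero result means the agent was dead or frozen for part of the outage, which is the condition for raising an alert.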

Machine failures

Obviously, machine failures get detected as communications failures. However, many such failure modes can be documented using supplemental clues. With scenario 3a, the agent gets a message that the machine is being shut down, and is typically given time to stop of its own volition. It can take advantage of that moment to report a final telemetry beacon indicating the situation.

In scenario 3b, the agent is forcibly stopped when the machine goes down, and restarted when it comes back up. Additionally, the machine’s uptime counter gets reset. Therefore, upon restart, the agent reports the uptime counter: if it’s low enough, the diagnosis is complete. By contrast, interrupted virtual machines (scenario 3c) look like network failures until they are resumed. When this happens, there is a gap in the telemetry stream that corresponds to the duration of the interruption. So when the agent beacons again, without any indication of having been restarted and with no telemetry buffered for the interval, we understand that the machine was virtual and was interrupted. Further indications that the machine is virtual (such as the presence of VMware Tools or access to AWS local queries) support the diagnosis.
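The reasoning above can be condensed into a decision function that runs when an agent reconnects. This is a minimal sketch. All parameter names are hypothetical, and the 60-second slack is an assumption. A full telemetry gap without a restart on a machine with no sign of virtualization is deliberately left as "unexplained". That follows the heuristic discussed later: non-virtual machines cannot be interrupted and resumed.

```python
def diagnose_return(offline_seconds: float, host_uptime: float, agent_restarted: bool,
                    telemetry_gap: float, looks_virtual: bool, slack: float = 60.0) -> str:
    """Classify an outage when the agent beacons again after being offline."""
    # 3b: the agent restarted and the host booted during the outage (low uptime).
    if agent_restarted and host_uptime <= offline_seconds + slack:
        return "machine went down and rebooted (3b)"
    # 3c: no restart, but nothing was recorded for (almost) the whole outage.
    if not agent_restarted and telemetry_gap >= offline_seconds - slack:
        return "VM interrupted and resumed (3c)" if looks_virtual else "unexplained: investigate"
    return "unexplained: investigate"
```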

Agent crashes and attack scenarios

The SNOW endpoint agent is programmed by mere humans, so yes, certain situations get it to crash. There! I said it. Two measures facilitate the detection and remediation of crashes.

First, agent processes are registered with the operating system so as to be restarted as soon as they go down. This way, following a crash, the restarted agent reports immediately to the central analytics cloud, carrying the forensics information it accumulated before the crash event.

Second, the endpoint agent is actually composed of two processes (both registered with the OS, as described above) that track each other’s life cycle. If either of these processes goes down, the other reports the fact to the cloud and repeatedly attempts to bring it back up. Therefore, effectively disabling the agent requires bringing down both processes. For one of them to crash is odd, but possible; for both to crash simultaneously is highly unlikely, enough to raise an alert about possible deliberate termination.
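The rule "one crash is odd, two simultaneous crashes are suspicious" can be written as a small decision function over the times at which each of the two processes went down. This is a hypothetical sketch. The 2-second window that counts as "simultaneous" is an assumption.

```python
from typing import Optional


def assess_agent_pair(down_a: Optional[float], down_b: Optional[float], window: float = 2.0) -> str:
    """Decide what to report given when each agent process went down (None = still up)."""
    if down_a is None and down_b is None:
        return "healthy"
    if down_a is not None and down_b is not None:
        if abs(down_a - down_b) <= window:
            return "ALERT: both processes died together, possible deliberate termination"
        return "both down at different times: report and investigate"
    return "one process down: sibling reports it and restarts it"
```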

Attack scenarios

The last point makes it clear that scenario 1a cannot go undetected. The agent is restarted right away by the OS or by its watchdog, so either the malware’s actions get recorded despite the attacker’s intent, or the attacker generates so much noise that he draws the hunters’ attention. Scenario 1b is much more pernicious, as its fingerprint is very similar to that of scenario 3c. That said, some heuristics can work in our favor:

Non-virtual machines cannot be interrupted and resumed (a machine being put to sleep sends a message when it goes down and when it wakes up).
When a VM is interrupted, it is typically for a rather long period of time. If the attacker suspends the agent processes for mere seconds, this should raise an alert as an abnormal VM interruption pattern.
Peers within the local network, as discussed for mitigating communications failures, exchange ping-pongs at randomized intervals and keep statistics on how quickly the response comes back. A suspended agent process would not respond within the expected sub-second delay, unless the local network is extremely busy (an atypical condition).
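The ping-pong statistics can be kept incrementally with Welford’s running mean and variance. A reply is flagged when it is both far outside the usual spread and slower than an absolute floor. This is a minimal sketch. The 4-standard-deviation cut-off, the 1-second floor and the ±50% jitter on the ping interval are assumptions, not SNOW parameters.

```python
import math
import random


class LatencyStats:
    """Welford running mean/variance of a peer's ping-pong round-trip times (seconds)."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def add(self, rtt: float) -> None:
        self.n += 1
        delta = rtt - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (rtt - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_suspicious(self, rtt: float, z: float = 4.0, floor: float = 1.0) -> bool:
        """Too slow both relative to this peer's history and in absolute terms."""
        return rtt > floor and rtt > self.mean + z * self.std


def next_ping_delay(base: float = 5.0) -> float:
    """Randomized interval so an attacker cannot predict when the next ping arrives."""
    return random.uniform(0.5 * base, 1.5 * base)
```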

If any of these heuristic checks fails, an alert should be raised to conduct a full investigation on the target host.

Authentication of agents

The last frontier in agent disablement attacks is full agent communications spoofing: an attacker reverse-engineers the communications protocol between the agent, its peers and the central analytics cloud, then responds correctly to neighbor requests and plays dumb with the cloud. This is not an easy attack to pull off, as it takes lots of resources to perform this reverse engineering, and then to communicate without tripping any behavior normality heuristic. However, it underscores a fundamental weakness of endpoint protection: up to now, it has been assumed that agents would communicate without any external injection of trust. Therefore, agents can authenticate the cloud, but agents cannot authenticate each other, nor can the cloud authenticate an agent. In other words, the cloud is never sure whether it is really speaking to an agent, or to a dog. We Arcadians are hard at work on this problem as this is written… stay tuned.

Looking for Cyber Threats Through Statistical Outliers

Justin Seitz — Fri, 17 Feb 2017 23:00:08 +0000

Welcome to our webinar. Today we’re going to talk about looking for threats through statistical outliers and statistical analysis. First, a few housekeeping items, as some of you might have heard already. Number one, if you lose your connection or need to step away, don’t worry: this webinar is recorded, and you’re going to receive a copy of the recording at the end. Number two, you’re all on mute, so you won’t accidentally talk into your microphone for the whole world to hear, but that also means you should use the questions and answers box or the chat box.


I prefer the question-and-answer box if you need to ask anything or if you’re having trouble hearing, and I can address that; please feel free to use it both throughout the webinar and towards the end, when we get more into questions and answers. We are only going to run this for about 15 to 30 minutes; how long it actually runs usually depends on how much coffee I’ve had this morning. More than anything, we really want to thank you for taking the time to come today. We know you’re all busy people with things to do, so we really appreciate your time and we certainly intend to make the most of it.

As I mentioned earlier, we’re going to be talking about doing some statistical analysis using the SNOW platform from Arc4dia. We do this through the interface you should all be seeing now, which is called SNOWboard. SNOWboard is our window into all of the data that our SNOW agents collect every minute of every day across our customer networks; we also monitor our own networks and, of course, our test network, which you’re going to see today.

For those of you who are new, I’m going to walk you through a little bit of how SNOWboard works, how the data comes in and how we look at it, because that’s important. Then I’m going to show you how we move into statistical analysis, and how we can use it to look at the network at a higher level and to see how infections might be spreading.

Hopefully you can all see my screen at this point; if not, we have major technical problems. What you should be seeing is this list of investigations. This is exactly what I and the other hunters at Arc4dia, as well as some of our customers, do every single day: we get into SNOWboard and look at what’s going on in our customer networks. Each one of these entries is what’s called an investigation, which is pretty self-explanatory.

If we pick one here, we see that this investigation was triggered by something our agent identified as suspicious. This could be any number of things. In this case there’s process hollowing, which we covered in a previous webinar; I encourage you to go back and watch it. We also see some suspicious threads.

An investigation embodies a lot of information, and it also includes a lot of tools that are useful both for hunting as an individual and for hunting as a team. One of the things we’ll often do is use the logging mechanism to say: “Hey, this is Justin, I pulled down a binary for analysis in IDA.” That tells my co-workers: “There’s no point in pulling down this binary, Justin is taking a look at it.” I’ll put my notes back in, and we can have a conversation about what’s going on in this particular investigation.

This is also useful because customers can collaborate with us: they can see what we’re doing and maybe even provide input. Lots of our customers are really technical people who understand a lot of what we’re talking about. They might also be seeing things on other hosts, and they’ll pop in and say: “Hey, are you guys noticing this?”, or “I saw this on another host,” or “I saw something else related to this alert.” This is really useful.

The other thing that’s really powerful when working in this environment is the notion of clustering. As a team, we can look at a particular investigation and its data points, which we’re going to get into, and ask how we can save ourselves some time in the future. We do that by writing clustering rules that automatically say: if we see this particular process at this particular time on this host in the future, pull it into this investigation, so that this investigation becomes the main point for investigating similar pieces of information. This is really powerful, because it keeps us from duplicating work by investigating the same things over and over. That is exactly what this clustering tab is for, and again, we’ll probably do a webinar just on clustering at some point in the future.

You’re going to see the commands panel in action a bit later, but we have the ability to speak to any one of our agents in the network and get it to perform tasks for us. This could be anything from retrieving a file, to retrieving memory, to performing other forensic operations. This is really powerful because it lets us expand our intelligence collection: if something is going on on this host, we can pull back some evidence and collect some things. We also have the ability to trigger some of that automatically, which you’re going to see very shortly.

The next thing down is the leads list. When a police officer investigates a homicide, there is the investigation, which is about the murder itself, and then there are the leads coming in, which are all the pieces of information and people that contribute to that investigation. It’s much the same thing in SNOWboard: that is logically how we think as humans, and we’ve applied it here. We say, okay, we have these events that happened around this time on this host; let’s pull them all into this investigation so that we have a deeper view of what’s going on on this particular machine around this time.

We can look at these leads individually, and we can see that each one of them has slightly different information. We see process hollowing on the first three or four leads, and then a suspicious thread found at the bottom. By clicking on any one of these, we’re able to see highly detailed technical information about what is going on and why that lead was included in the investigation. In other words, the relevant information has already been picked out, and we can capture it and ask what else we can look at around this time, or from other tools.

At the top here, we also have the ability to do some automated forensic collection. When this particular lead was brought in, it automatically triggered some follow-up forensic work: in this case, reading some process memory to extract the particular thread that was injected. So we’ve actually extracted the code from memory, and we can now analyze it.

We’ve also triggered a process memory map, which tells us what the memory layout of this process looked like at the time this was triggered. This is really useful because, as you know, malware can clear memory or kill itself very quickly. So we want to collect as much evidence and intelligence as we can, as fast as we can, so that we have all the information we need for our analytical work. Once it’s destroyed, it’s destroyed, so that is really key. All of this is rolled into this lead.

Another powerful thing we can do is ask: given this particular event, what happened around the time it occurred? By clicking on these little arrows right here, we pull up a really neat thing called the event stream. These are all of the events the agent collected around the time that particular lead came in.

You can see some TCP connections, then the thread injection here, then some more TCP connections. We’re looking at the behavior both before and after this lead came in. That can potentially tell us the infection vector: are we seeing Internet Explorer being run, then some Russian malware domain being accessed, and then all of a sudden a thread injection and a bunch of suspicious stuff happening after that? It helps us work backwards to figure out where this happened and what occurred. Are we seeing a known vulnerability being used against one of our customers, or did they download a binary that they shouldn’t have? This is a really powerful and useful feature. Again, we could probably do an entire webinar just on the event stream, but we live and die by going through this stream in our investigations, to figure out what’s going on around the time of an infection or of suspicious behavior.
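Pulling "everything around the time of a lead" out of a time-ordered event log is a range query. With events sorted by timestamp, binary search finds the window without scanning the whole log. This is a minimal sketch. The `(timestamp, description)` event shape is hypothetical, and the five-minute default window is an assumption.

```python
import bisect
from typing import List, Tuple

Event = Tuple[float, str]  # hypothetical (timestamp, description) record


def event_window(events: List[Event], lead_time: float,
                 before: float = 300.0, after: float = 300.0) -> List[Event]:
    """Events within [lead_time - before, lead_time + after]; `events` must be sorted by time."""
    times = [t for t, _ in events]
    lo = bisect.bisect_left(times, lead_time - before)
    hi = bisect.bisect_right(times, lead_time + after)
    return events[lo:hi]
```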

This is the main investigation view, and it is where a lot of our investigations start. But today, of course, we are looking at how to examine the network or individual hosts at a larger level and ask what counts as a statistical outlier. Can we find suspicious things just from the fact that their reputation count is low, meaning we haven’t seen them around much, for instance a binary that has rarely been executed across the network?

We do that through the host view. I already have a host view loaded up here: this is an actual host view for one of our agents.

We can pull this information up for any one of our agents across the entire network. This is really useful because, when we’re looking at infections or potential outbreaks, we also want to stand back and ask: are they targeting a particular Windows version (in this case, it’s a Windows 7 machine)? Are they targeting a particular subnet inside the network, and can the threat be contained that way? This host information is granular, meaning we can drill down into one host, but it also lets us take the 30,000-foot view and look for commonalities or patterns, such as an exploit being used only against a certain version of Windows or a certain set of users. This is really useful.

The other thing to note is that, when our agents are deployed, our customers can segment their networks by purpose. Inside our SNOW system, we can say that anything in this particular subnet is the executive team (the CEO, the CFO, the COO), and we can segment those machines away from the development teams and away from the servers. That lets us look at what’s going on in a particular subnet or with a particular group of people, and those logical divisions are really important when you’re doing higher-level statistical analysis of threats.
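A segment map of this kind can be as simple as a table of named subnets. Each host is attributed to the first segment that contains its address. This is a minimal sketch using Python’s standard `ipaddress` module. The segment names and address ranges are hypothetical examples, not a SNOW configuration format.

```python
import ipaddress

# Hypothetical segment map; each customer would define their own names and ranges.
SEGMENTS = {
    "executives": ipaddress.ip_network("10.0.10.0/24"),
    "developers": ipaddress.ip_network("10.0.20.0/23"),
    "servers": ipaddress.ip_network("10.0.100.0/24"),
}


def segment_of(ip: str) -> str:
    """Name of the first segment containing `ip`, or "unassigned"."""
    addr = ipaddress.ip_address(ip)
    for name, net in SEGMENTS.items():
        if addr in net:
            return name
    return "unassigned"
```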

If we look at what’s available inside the host view, one of the cool things is that we have all the latest events from that host. It is literally a running tally, a running audit trail of everything that has been going on on that particular host, with the newest events at the top and the oldest at the bottom. We can see all kinds of events: unique DNS requests, commands that we ran, or code identity events, which means that our agent has looked at a binary and extracted its code signing certificate. This is really useful because it gives us a high-level view of what’s going on on that particular machine.

We can look at the event stream again, just as we did for an investigation, but this time for a host. Or we can drill into the host objects. One of the coolest features of the SNOW platform is its object database: all the processes, files and everything else we see across an entire network get rolled into this big database, so that we can do analytical work on it. As you’ll see later, it also means we can query for very specific pieces of information and dig into them. This object database is extremely useful: we hunters rely on it when we’re digging into a particular process and want to know where else we have seen it, or want to cross-reference a particular binary against VirusTotal. You’re also going to see that some of the statistical analysis is already done for us here.


You can already see this very handy outliers link; I have it loaded up in another tab, so let’s click on that. If you’ve never heard the term before, an outlier is basically anything that is statistically outside of the normal. If you’ve ever looked at those graphs with a whole bunch of little dots following a line, and then one lonely dot up in the top left corner, that dot is an outlier; if you Google “graph outlier”, that’s typically what you’ll see. It’s stuff that doesn’t fit the normal baseline, which is of course where a lot of us live and die in the security world: we want to know what the baseline is, what we see all the time, and what doesn’t fit within that baseline.

The beautiful thing about SNOWboard is that it boils this down into something that’s really easy to look at and understand. We can look at something as simple as the number of locations where we’ve seen a particular binary or DLL. If we have a thousand machines all being actively monitored in this particular subnet, and we only ever see this file once on all of those machines, we probably want to go take a look at it, because this is statistically unusual.

We can also reverse the sort order to see which things are common. The further we scroll down, the more common the DLLs and processes we see, things like svchost, a critical Windows process that runs on every machine. There are over 1,300 instances of this particular process running on the subnet, so it’s not suspicious. This is really critical, and it is often where we start when looking at a host from an outliers perspective: anything with a low number of locations is a potential hit for malware or another threat that should be analyzed.
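The "number of locations" view amounts to counting distinct hosts per binary path and sorting in ascending order. The rarest files then come first, and reversing the sort shows the most common ones. This is a minimal sketch over hypothetical `(host, path)` observations, not the SNOWboard implementation. A file executed many times on one host still counts as a single location.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def prevalence(observations: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """(path, number of distinct hosts) pairs, rarest first; ties broken by path."""
    hosts_by_path = defaultdict(set)
    for host, path in observations:
        hosts_by_path[path].add(host)
    return sorted(((p, len(h)) for p, h in hosts_by_path.items()), key=lambda item: (item[1], item[0]))


def outliers(observations: Iterable[Tuple[str, str]], max_locations: int = 1) -> List[str]:
    """Paths seen on at most `max_locations` hosts."""
    return [p for p, n in prevalence(observations) if n <= max_locations]
```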

We’re looking at things like, well, the Chrome updater: it was only executed twice. Could it be, for example, that this is just a brand-new version of Chrome being rolled out? We can very easily query the database, as you will see very shortly, to find out what this binary is, where we’ve seen it before and what’s relevant about it. Outliers are one of the very first, fundamental things we look at when doing statistical analysis, and this is really critical.

What else can we look at, even once we’ve identified that one of these things looks suspicious or has a low execution count? For example, we see malware demo windows; if you ever see malware demo windows running on one of your machines, you typically want to go look at it, unless you’re doing one of these webinars. Another thing we can look at is autoruns. Malware that establishes some form of persistence most commonly targets the autoruns keys in the registry, to make sure it keeps running every time this machine boots up or every time this user logs in. That’s really critical.

When we scroll down through the autoruns, we again see things that are really common, things that start up all the time. On this particular host, though, we also see some things that look pretty strange. When something in the autoruns is present on only four machines out of a network of a thousand-plus, that’s pretty strange. Again, if you have some logical separation in your network, this may not be abnormal: if a segment only has four machines, something present on four machines is present on a hundred percent of them, so it’s not a statistical outlier. But in this case we have over a thousand machines, and it absolutely stands out.
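The point about logical separation can be made concrete by measuring prevalence as a fraction of the segment’s population, not as a raw count. Four out of a thousand-plus machines is rare; four out of four is not. This is a minimal sketch with hypothetical names. The 1% threshold is an assumption.

```python
from typing import Set


def prevalence_ratio(hosts_with_entry: Set[str], segment_hosts: Set[str]) -> float:
    """Fraction of a segment's monitored hosts on which an autorun entry was seen."""
    if not segment_hosts:
        return 0.0
    return len(hosts_with_entry & segment_hosts) / len(segment_hosts)


def is_rare_in_segment(hosts_with_entry: Set[str], segment_hosts: Set[str],
                       max_ratio: float = 0.01) -> bool:
    """Rare = present somewhere in the segment, but on at most `max_ratio` of its hosts."""
    ratio = prevalence_ratio(hosts_with_entry, segment_hosts)
    return 0.0 < ratio <= max_ratio
```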

So what is this thing, and what’s going on? The cool thing is that we can just highlight this path, and you’ll see this little flyout come up down here; we can click on the bullseye and select search in objects.

I’ll close this little window here. What this has done is query the database for more information about this particular path.

A couple of things have come back from the database. One is a file path: the database recognizes this file path from somewhere. We can look at it and see four locations, which is interesting. We don’t have any associated parents or children for this particular file, but we can see the hosts where this particular path showed up. As hunters, we immediately look at this and say: uh oh, there are four machines listed in the locations. We should probably roll these into the investigation, because they too may have some suspicious behavior going on. Or they might already be in our investigation list, in which case we want to roll it all together into one larger investigation. This might mean we have a bit of an outbreak on our hands. If these were the executives’ machines, it could also mean that a spear-phishing campaign has succeeded against all four executives, and we want to go take a look. Being able to query this database is really useful: we find out what this thing is and where we’re finding it, which also tells us what else we may need to do to contain the threat.

Same thing with autoruns: we can click on that object in the database, look at the autoruns information and ask where else it shows up in the autoruns. We see it’s the same four hosts, so not only is the same file path showing up, the file has also persisted: it has inserted itself into the autoruns on these four hosts. We want to look at those four machines extra hard now, because this is really, really suspicious. Once we’ve reached this kind of consensus that something very strange is going on, how do we begin to react? What are the next steps?
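The cross-reference performed above can be sketched as an inverted index from (object kind, path) to hosts, intersected across kinds. The result is the hosts that have both the file and its autorun entry. This is a minimal, hypothetical sketch and not the SNOW object database. It lowercases paths because Windows paths are case-insensitive.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_index(records: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """records: (host, kind, path) with kind such as "file" or "autorun" (hypothetical schema)."""
    index = defaultdict(set)
    for host, kind, path in records:
        index[(kind, path.lower())].add(host)
    return index


def hosts_to_roll_in(index: Dict[Tuple[str, str], Set[str]], path: str) -> Set[str]:
    """Hosts where `path` exists as a file AND is persisted in the autoruns."""
    key = path.lower()
    return index.get(("file", key), set()) & index.get(("autorun", key), set())
```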

As I mentioned before, one of the great things is that we can highlight this path and retrieve the file just by using our commands box down here; I’m not going to do that today. Once we’ve grabbed that file, and depending on the customer of course, we can begin to work with them: hey, I think there’s an incident here, let’s start talking about implementing your incident response plan (assuming you have one; hopefully everyone listening here today does), among a number of other things. We can begin reverse-engineering that particular file, and we can also look at other things going on on that host.

This is where we also ask: have we seen any weird domains come up recently, or ever, that look suspicious or don’t seem to fit? There’s some weird stuff here; what is going on? We’ll often also see people trying to access Tor onion (hidden) services through a regular web browser. That alerts us: that’s really weird, what is this person up to at nine o’clock in the morning at their day job? These are the kinds of things we look at to decide what else contributes to the overall threat, and why we would raise the alert level for this particular organization or set of hosts.

The great thing is that we can dive down into any one of these things on a host, do some statistical analysis on it, and then bring that back into the investigative viewpoint. We can start from an investigation and work our way up, or start from a statistical standpoint and work our way down to an actual investigation. Often customers tell us which machines are critical, or they might even put them into a logical subnet, and we can go out and start looking at outliers. This is something I and other hunters on the team do. We look at the outliers on these machines and decide what’s interesting and what isn’t. Here’s a weird PowerShell execution that we don’t see anywhere else, so let’s look at the PowerShell script and see what it is. OK, it’s a homegrown script a system administrator wrote to purge a mail queue, or whatever it is. We can immediately look at those things, start the investigation from 30,000 feet and drive downwards: how do we acquire some evidence, begin some reverse engineering, or look at what else is going on on that machine?

That’s the beauty of having this data at your fingertips: you can work from both ends. From something that triggered an investigation, you can work your way up to the statistical analysis to see whether it has spread or whether other things are causing it. Or you can start from the other side and work down towards opening an investigation, rolling leads together to figure out what’s actually going on.


Responding to Cyber Incidents

Justin Seitz — Wed, 18 Jan 2017 03:00:08 +0000

Today the topic is responding to incidents, which of course could be a ten-day class or longer, but we are going to try to distill it down into about 30 minutes. I’m going to talk a bit about how we hunters at Arc4dia approach investigations, how we use the tools Arc4dia built to respond to and investigate incidents, and how we work with our customers to contain them.

SNOWboard is the interface to our agents deployed in our customers’ networks. This is where we, the hunters, spend all of our time looking at what’s going on in our customers’ networks, what investigations are happening, what kinds of activity we see, and of course anything that looks interesting.


Today we’re going to focus on a particular host that has had some interesting activity, and I’m going to walk you through why we would consider it interesting. Of course, this is a piece of malware that we custom built and deployed. It helps us train people and make sure we know exactly how to respond to these things. That’s also what you get with the Arc4dia team: not only defensive-minded people, but offensive-minded people with deep experience in offensive operations, who look at these things from the other side. It’s absolutely critical to understand how attackers think, how they write malware, and even what next steps they are likely to take.

This is our investigation list. Every morning, this is exactly what an Arc4dia hunter looks at, although theirs is much larger; this is a demo network. We start by looking at things that look potentially suspicious or out of place.

Clicking on the investigation for this host opens up this screen, which is our investigation view.

An investigation is made of leads. Just as a detective on TV builds a murder case from leads that together give context to the entire investigation, SNOWboard displays information to us as the leads that make up an investigation.

One of the cool things we can do is collaborate with one another via a log. We can write messages (say, that it’s time to reverse engineer this thing we found) and leave notes for one another and for our customers. This allows us all to work collaboratively as a team to investigate these incidents.

Let’s go further down into the lead list, starting at the bottom. The information on the left says that at this particular time, on this host, one suspicious thread was found. On its own, this could be malicious, or it could be completely benign. The important thing is to dive into the technical information. We are not interested in fancy pie graphs or other information that doesn’t give us context. We’re interested in the raw technical data that lets us make judgement calls about what this actually means, based on our experience as responders and as attackers.

We can look at the information on the right-hand side and say, OK, there’s a suspicious thread in vmtoolsd.exe.

VMware Tools (vmtoolsd.exe) is a process that helps your virtual machines run a little more smoothly. Seeing something injected into vmtoolsd.exe is suspicious.

We can see some pretty technical information here about what was injected: the size of the thread’s memory, the process, the thread ID, and where the thread actually started, namely its base address and start address.

What we can also do, and this is critical in an incident, is look at the context around when this thread injection happened. Clicking the fancy arrows brings up what’s called the event stream.

The event stream allows us to take a look at what’s going on before this event and what’s going on afterwards. You might see someone spawn Internet Explorer, then shortly thereafter there’s another process that gets executed and then you see a chain of events happening after that. This is really important because without context it’s extremely difficult to respond to an incident because you are kind of operating in the dark. This little tool here allows us to very quickly scroll through and see what’s going on. This is a very detailed view.

If we only wanted to see, for example, the new processes created around this time, we can easily apply a filter and see all the new processes in this time range, before and after. We can also expand or contract the time range based on how we want to investigate this particular incident.
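The filtering described above can be sketched in a few lines. This is a minimal illustration that assumes a hypothetical `Event` record holding a timestamp, an event kind and a detail string. It is not SNOWboard’s actual data model or API. The sketch keeps the events that fall within a window before and after a pivot event, optionally restricted to certain kinds such as new processes.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Event:
    """Hypothetical event record: when it happened, what kind, and a detail."""
    timestamp: datetime
    kind: str    # e.g. "NEW_PROCESS", "THREAD_INJECTION", "TCP_CONNECTION"
    detail: str  # e.g. the process image name


def event_window(events: Iterable[Event], pivot: datetime, before: timedelta,
                 after: timedelta, kinds: Optional[Set[str]] = None) -> List[Event]:
    """Events between pivot - before and pivot + after (inclusive), optionally
    restricted to the given kinds, in chronological order."""
    start, end = pivot - before, pivot + after
    selected = [e for e in events
                if start <= e.timestamp <= end and (kinds is None or e.kind in kinds)]
    return sorted(selected, key=lambda e: e.timestamp)
```

Widening or narrowing `before` and `after` corresponds to expanding or contracting the time range in the event stream. Passing `kinds=None` shows every event in the window.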

Scrolling through, we build up a picture of what’s actually going on and become really familiar with what is happening on this machine at this particular time. Without this information, it’s very difficult to make judgement calls. Sometimes things look suspicious initially, but after you dig in and look at what’s going on around them, they’re really not that suspicious. There’s a lot of perfectly legitimate software out there that sometimes behaves a little suspiciously.

Scrolling through, at some point we see this THREAD_INJECTION event occur.

We also see some other events, like a READPROCESSMEMORY, and we’ll get into that. It’s part of SNOW’s automatic forensics capabilities, which assist a hunter or an analyst by automatically collecting information.

We can see the injection happen here.

We can see that this triggers a SUSPICIOUS_FILE_COLLECTION, meaning we’ve pulled down this particular file.

We see some HOLLOWING after that.

If you attended our previous webinar, we talked about process hollowing, where a piece of malware hollows out a chunk of memory in a process, shoves its own code into it and executes it.

We can see some other events, like a TCP_CONNECTION.

We’re starting to build a picture around this time that something definitely looks suspicious, something is definitely up with this machine.

In SNOWboard, machines are referred to as hosts. In this case, we would also want to ask what else we can learn about this host to aid the investigation.

Clicking in the box pops up the host view.

For any host in the network where the SNOW agent is deployed, we’re able to dive in and see what’s installed on that machine and what investigations it has triggered. We can also look at more detailed things, like the running processes and the unique modules (DLLs or binaries actually executing on that machine). We can see the installed services, which is really useful when something uses a Windows service for persistence.

Often we’ll also see that there are things being installed in the autoruns of that particular machine.

This is where malware will use registry keys or other techniques to gain persistence so when you reboot the machine the malware is going to get restarted so that it can continue to speak back to its command and control or it can continue to suck up keyboard keystrokes or whatever it’s designed to do.

That view holds a lot of data that might not be apparent at first. The first thing we always look at is the column on the left-hand side: the number of locations where we’ve seen this particular file executed. A very low number, like four out of hundreds of thousands of deployed agents, tells us something is definitely weird. We rarely see this file, and now we’re seeing it on this particular machine.

As you scroll down, the reputation of these binaries goes up. That’s exactly what our number of locations is: effectively a reputational score. The more times you’ve seen a file, the more likely it’s OK, or that you can easily explain where it comes from. The lower the number, the more suspicious the file is on that particular host.
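Here is a minimal sketch of a “number of locations” style score. It assumes the agents report (host, object) observations, where the object is a file hash or path. Counting distinct hosts is our assumption of how such a score could be computed, not a description of SNOW’s internals, and the threshold is a hypothetical tuning parameter.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def prevalence(observations: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Map each object (file hash or path) to the number of distinct hosts
    on which it was observed; repeated sightings on one host count once."""
    hosts_by_object = defaultdict(set)
    for host, obj in observations:
        hosts_by_object[obj].add(host)
    return {obj: len(hosts) for obj, hosts in hosts_by_object.items()}


def rarest(observations: Iterable[Tuple[str, str]], threshold: int) -> List[str]:
    """Objects seen on at most `threshold` hosts, rarest first."""
    counts = prevalence(observations)
    rare = sorted((n, obj) for obj, n in counts.items() if n <= threshold)
    return [obj for _, obj in rare]
```

A low count does not prove malice. It only tells the hunter where to look first, as the discussion above stresses.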

Immediately we look at this and say: we have this strange thing in C:\ProgramData. ProgramData is not a standard place for VMware to install things, so this looks a little suspicious.

We can then query our own database to see what we know about this particular process and what we can learn from other hosts.


Another cool feature: we can highlight any binary and select “search in objects” from the context menu. This searches the vast database we keep of processes, modules, and file names, and tells us whether we have records for that particular object. We’ve only seen this process a couple of times, on a couple of hosts. We can also look specifically at the file name. This tells us that more than this one host may have been infected or may be running these suspicious processes, so there’s more context here. We would probably note in this investigation that these hosts also run the exact same process, and maybe hash the files to see what’s going on.

This object database is extremely powerful because it allows us to easily drill down into what’s happening on that machine and look at exactly every binary that’s involved in this investigation, every process and say where else is it located and have we seen some of these things before.

Looking at this particular file, we saw that it was pretty rare and not present on many other machines. We can also see that vmtoolsd.exe was part of the investigation, as we saw in the leads, and that it was executed in 252 different locations. This suggests that this piece of malware injects threads directly into vmtoolsd.exe and tries to pass itself off as a VMware product, both in a different directory and in the autoruns of that machine.

What can we do from a response perspective? One of the things we can do is to reach out to that machine and say we’re a little suspicious of this, what we want to do is grab that file and make sure we have a copy of it. By highlighting the file path and then selecting the hand, this gives us a small list of commands that we can run. We have a larger list of commands that I will show you shortly.

This lets us grab this file from this host in a couple of clicks. The investigation field here is blank because I’m working from the host view. From the investigation view, it is populated automatically.

When we task this agent to bring back this file, we want the file automatically assigned to the investigation we are working on. That way we keep all the information in one place, and hunters can work on tasks and share information with each other.

Click apply, and the command is immediately sent off to that agent. The agent pulls the file, sends it back to us and assigns it to that investigation. This is where, as hunters, we’re also reverse engineers. We pull that file down and run it through some of the tools we’ve developed, as well as traditional reverse engineering tools such as debuggers, IDA Pro, or hex editors. We try to determine whether it’s something we’ve seen before, and whether one of our customers is being explicitly targeted with malware designed for them. Or it may be part of a larger ransomware family, or some other piece of malware that has been repurposed so that the hash is different. This is another added value you get: as hunters, we’re also skilled at pulling apart this code and analyzing what’s going on behind the scenes.

We’ve done a little investigation into the host. Now we can go back and gather more information about what else is going on in this particular lead. We’ve only really looked at the one lead, and there are others we can look at.

One thing that really assists hunters and our customers is SNOW’s ability to automatically collect information when something weird has happened. We can easily expand the key at the top and see that a few things are already available to us as analysts or hunters.

We see a process memory map of the process in question. It tells us who owns what chunk of memory, and whether any pieces of memory are flagged as suspicious or look a little weird to us. If we want to pull a segment of memory back, analyze it and do a bit of forensics work on it, we can grab it with the click of a mouse. This is all done automatically; we didn’t have to interact with anything. It’s part of the intelligence of the SNOW system, which pulls this information back for us, saving us time and giving us highly contextual information. Malware keeps executing, and you may only get to it much later, or something may have impacted your ability to run tools and analyze it. Having very contextualized information captured at the moment something suspicious was detected is therefore really important.

Besides the process memory map, SNOW also automatically collected the little chunk of memory where the thread is executing. Down here, where we see this thread executing, it has reached out and grabbed that chunk of memory so that we can reverse engineer it. This is really critical because memory is not persistent. When something writes to disk, there’s a level of persistence that lets you retrieve it later. If something writes into memory and the process then exits, that memory is gone and you have no way to perform forensics or any analysis on it. With this system, we get that memory right away, so we can come back much later if needed, retrieve it and still look at what was going on in that chunk of memory. Being able to do this is really important when dealing with incidents.

This is critical stuff that allows us to build a much broader and more detailed investigation. We work with our customers to decide how they want us to proceed. Sometimes this is decided in advance; other times customers are very proactive and want to be involved.

Now that we’ve identified something weird going on, how do we stop what it’s doing for now, terminate it completely, or get rid of it altogether? This varies depending on the customer’s operating policies and on the particular piece of malware we’re dealing with.

We have a list of very detailed commands that allow us to interact with that machine. I’ve already pulled up this list of commands (accessible through our dashboard by clicking on commands). I have also populated the terminate thread command with the host and the thread ID we identified.

This is a subtle but powerful technique. For example, if we identify a suspicious thread running in a critical process, what we’ve shown and done in the past is isolate and suspend just that thread, which lets the critical process continue executing. Sometimes the malware has corrupted the process, and it will blow up and die. Other times, if the attackers have done it right, the thread sits in memory executing. We can then come along and suspend just that thread, so the process keeps running and neither the critical process nor any evidence is destroyed. In this case, if I want to terminate the thread completely, I can signal to our agent that the thread ID I’ve identified is malicious and should be terminated so that it no longer executes.

This is a really cool command. We can use it on any process on the machine, and we send it off for tasking.

We already demonstrated some of the other commands, such as getting files, and we can also get some information about a file. Suppose we’ve identified that the vmware update file in the autoruns is bad and we want to get rid of it. We can simply use “delete file” to tell the remote agent to delete it, which completely removes that file from the machine. Since we reached out to the machine and pulled the file back earlier, we already have a local copy to work from.

In some cases, you will do this if it’s a known piece of malware. In other cases, you might not want to do that; you might want to do some network isolation or other things. This is really driven by the customer. We don’t make decisions on our own; we work very closely with our customers during incidents to make sure they are as much a part of the team as anyone else.

Those are some of the file operations. There’s also the traditional stuff. We can suspend a process so that it hangs there while we investigate other pieces of the machine. We don’t want to kill the process, but we want to freeze it while we keep investigating around it. We might even want to resume the malware to identify where its command and control is. Or we might suspend it, then decide to kill the process altogether and perform clean-up on that machine.

This is also one of those things that there are many different best practices and no two incidents are the same, but having strong policies helps teams approach these things in a systematic way. Sometimes we’re able to clean up a Stage 0 infection very easily all on our own with a little input from the customer. Other times there’s much more detailed things that have to happen, and we work to write reports that say all the things we’ve discovered about it, and the reverse engineering report surrounding what happened with this particular piece of malware. This varies from incident to incident, but is a very collaborative approach between Arc4dia and our customers and that’s exactly how we handle these things.


Process Hollowing Analysis For Malware Detection

Benoit Hamelin — Wed, 28 Dec 2016 21:42:58 +0000

Following a webinar hosted by my colleague Justin Seitz two weeks ago, we discuss here the detection of process hollowing, and how this capability may help in detecting ongoing cyber attacks.

What is process hollowing?

Process hollowing is a trick malware uses to hide its running computation. In a nutshell, it involves replacing the code of a process with that of the malware. Consider its implementation on the Windows OS, where Arc4dia’s hunters typically observe it. A bootstrap program first starts a process in the suspended state. The OS sets up the process structure and loads the program’s code. Then, the bootstrap uses debugging API functions to open the suspended process and replace the content of one of its code segments with its malware payload. It then tweaks the hollowed process’ entry point so that execution starts in the malware payload. The latter may then fork a thread to run the malware computation, leaving the main program’s thread to run its usual entry point, or replace that main function outright. When done against a little-used system process, this trick can very effectively hide a malware computation.

So, how does a hunter detect hollowed-out processes? On Windows, program code is loaded from files stored on disk; it is difficult to modify or destroy these files while the program runs. Hence, one can scan through the code segments of all processes, comparing the loaded code to that expected to have been loaded from file. Because of dynamic relocation, the code in memory never matches exactly its stored origin, but strong positive correlation is always expected — except when the code in memory has been replaced.
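The comparison can be sketched as follows, under simplifying assumptions. We assume the in-memory bytes of a code segment and the matching section from the file on disk have already been read; obtaining them is OS-specific and not shown. We also assume the byte offsets patched by relocations may optionally be known. A plain fraction of matching bytes stands in here for the correlation measure mentioned above, and the 0.9 threshold is a hypothetical value, not one taken from SNOW.

```python
from typing import Iterable


def code_similarity(memory_bytes: bytes, file_bytes: bytes,
                    reloc_offsets: Iterable[int] = ()) -> float:
    """Fraction of bytes that match between an in-memory code segment and the
    corresponding section on disk, over their common length. Offsets known to
    be patched by relocations are excluded from the comparison."""
    length = min(len(memory_bytes), len(file_bytes))
    skip = set(reloc_offsets)
    compared = matched = 0
    for i in range(length):
        if i in skip:
            continue
        compared += 1
        if memory_bytes[i] == file_bytes[i]:
            matched += 1
    return matched / compared if compared else 0.0


def looks_modified(memory_bytes: bytes, file_bytes: bytes,
                   reloc_offsets: Iterable[int] = (), threshold: float = 0.9) -> bool:
    """Relocations only change a small fraction of bytes; a low match ratio
    suggests the code in memory was replaced."""
    return code_similarity(memory_bytes, file_bytes, reloc_offsets) < threshold
```

A relocated but otherwise intact segment stays close to 1.0, while a replaced segment drops toward the level of random agreement. As the next section explains, this flags any code modification, not only malicious hollowing.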

Pitfalls of process hollowing detection

The astute reader will have noticed that the approach described above actually detects modified processes, not specifically hollowed-out processes. The latter is a subset of the former, and program self-modification is unfortunately a very common phenomenon. Many legitimate programs leverage so-called packer systems, such as BoxedApp. Some of these packers decrypt and/or uncompress the actual program code within the same code segment. Such approaches help reduce the programs’ storage footprint, or conceal their functionality from reverse engineers (who are not fooled by the scheme, by the way). Many online update or self-updating programs exhibit similar code replacement behavior. And all of them trip the detection technique described above.

Therefore, any attempt at detecting process hollowing is bound to generate multiple false positives. As suggested, process code modification is infrequent enough to take interest when it happens, but also common enough not to be a surefire malware identifier. Process hollowing alerts must thus be analyzed **in context** of other events reported from and other alerts raised by the concerned host. Some context elements are of particular interest:

Malware that runs from a hollowed-out process requires a bootstrap program to deploy its code. This program is likely to be the one that provides persistence to the malware. Therefore, taking a close look at programs that start at machine boot or with user sessions may raise concerns.
Furthermore, the malware may leverage process hollowing on its own bootstrap in order to hamper a reverse engineer’s efforts to unravel its features. Taking a look at the original code that has been replaced may yield surprises. Typical packers have a signature that reverse engineers are good at identifying.
One may assume that malware will either execute from ever-running programs (such as a user’s base Explorer process, or as a service) or from programs run very often by the system or user (web browser, word processor for office machines, etc.). Process hollowing detected on an infrequent program, or on a program with a self-update feature, is likely to be a false positive.
Malware does not leverage process hollowing just for show: this code must be run by a thread within the process. Further detection heuristics that flag thread injection should therefore trip, as should, possibly, alerts for MZ-headered modules unaccounted for in the process’ module index.
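These context elements can be combined into a rough triage rule. The sketch below is purely illustrative. The field names, weights and cut-offs are hypothetical, and it models only some of the heuristics listed above; for instance, it does not inspect the replaced code for packer signatures.

```python
from dataclasses import dataclass


@dataclass
class HollowingContext:
    """Hypothetical context gathered around a process hollowing alert."""
    target_runs_often: bool    # service, Explorer, browser, office suite...
    target_self_updates: bool  # known self-updating or packed program
    suspicious_autorun: bool   # candidate bootstrap found among the autoruns
    thread_injection: bool     # a thread starts inside the modified code
    hidden_module: bool        # MZ-headered region missing from the module list


def triage(ctx: HollowingContext) -> str:
    """Rough triage of a process hollowing alert; weights are arbitrary."""
    score = 1 if ctx.target_runs_often else -1  # infrequent targets: likely benign
    if ctx.target_self_updates:
        score -= 1
    if ctx.suspicious_autorun:
        score += 1
    if ctx.thread_injection:
        score += 2
    if ctx.hidden_module:
        score += 2
    if score >= 4:
        return "investigate now"
    if score >= 1:
        return "review"
    return "likely false positive"
```

A rule like this can only prioritize alerts. A hunter still has to look at each case in context.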

Case study #1: `SC2_x64.exe`

Let us first consider a process hollowing alert generated by looking at a process loaded from the SC2_x64.exe program (namely, the main executable for the popular video game StarCraft II):

At first glance, this looks rather innocuous. Video games often download updated code from the Internet. That one would replace the code within the same process is quite 1337 and surprising, but not worrisome. Furthermore, the tactical interest of hollowing a video game for malware purposes is low, unless the user plays regularly and with dedication (traits that this game elicits, by the way, so it may become interesting to ask the user about his habits). We dig into SNOW’s object database to see how often we’ve seen this module:

So we’ve seen this executable only on this host, which is perplexing. However, a quick check of its hash on VirusTotal shows that this is the real StarCraft II client, a well-known signed executable. Therefore, it is very unlikely to be a self-hollowing bootstrap; the hypothesis of a self-modifying game code is reinforced. We then look at the list of autoruns (programs made to run when the host boots up), which reveals nothing out of the ordinary. Finally, we look at other alerts raised for the host under investigation:

The process hollowing alert was raised on December 27th; the previous alert for this box goes back two days before (simply an object never seen before on other systems), and nothing more has happened since (at the time of writing). There is thus no context clue for worrying about an attack. Conclusion: false positive.

Case study #2: `vmtoolsd.exe`

Let us now look in depth at how the malware examined in Justin Seitz’s webinar stacks up. The initial process hollowing alert shows up like this:

The target here is a VMware Tools daemon used by the VMware hypervisor for various data sharing features between the host and the virtual machine (e.g. clipboard, automatic resolution adjustment, etc.). As we are investigating a virtual machine, this makes sense. This daemon runs all the time and boots with the machine. Thus, from a tactical point of view, it would be a very good place for malware to hide. As we did with the StarCraft II case, let us check that this vmtoolsd.exe program is legit.
This time, we have seen many instances of the file vmtoolsd.exe across all the networks we defend. The hash observed on the host under investigation is this one:

It checks out good on VirusTotal, so if there is a bootstrap, it is elsewhere.

At this stage, I start to look out for further context on this alert. Other alerts generated on this host:

The initial process hollowing alert corresponds to the second entry from the bottom of this list. Over the following few hours, the process hollowing alert repeated, side by side with thread injection and hidden module alerts targeting the same vmtoolsd.exe process. This raises the alert to orange level: an execution thread has its entry point within the modified code segment.
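Checking whether a thread’s entry point lies within a modified code segment amounts to an address-range test. Here is a minimal sketch. It assumes the thread start addresses and the flagged segments’ base addresses and sizes have already been collected; that collection is OS-specific and not shown.

```python
from typing import Iterable, List, Tuple


def threads_in_segments(threads: Iterable[Tuple[int, int]],
                        segments: Iterable[Tuple[int, int]]) -> List[int]:
    """Thread IDs whose start address falls inside a flagged code segment.

    threads:  (thread_id, start_address) pairs
    segments: (base_address, size) pairs for segments flagged as modified
    """
    ranges = list(segments)  # materialize so it can be scanned for each thread
    return [tid for tid, start in threads
            if any(base <= start < base + size for base, size in ranges)]
```

A linear scan is enough for the handful of flagged segments in a single process. A sorted list with binary search would only matter at much larger scale.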

The hidden module alert:

The base address (0x7ff693880000) indicated for this hidden module (i.e., an MZ-headered code segment unaccounted for within the process’ list of modules) is exactly the same as that of the hollowed-out code segment. The alert level has now risen to scarlet. Hunters would team up to get a dump of this segment and start reverse engineering it, while somebody else looks for the persistent bootstrap. The first place to look is the autoruns list, the beginning of which is shown here:
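Here is a minimal sketch of the hidden module check. It assumes the process’ memory regions (base address plus their first bytes) and the base addresses in its module list are already available. On Windows these would typically come from APIs such as `VirtualQueryEx`, `ReadProcessMemory` and the module enumeration functions, none of which are shown. A region is flagged when it begins with the MZ signature but no module is registered at that base.

```python
from typing import Iterable, List, Set, Tuple


def hidden_modules(regions: Iterable[Tuple[int, bytes]],
                   module_bases: Set[int]) -> List[int]:
    """Base addresses of memory regions that start with an MZ header (the
    DOS/PE signature) but have no matching entry in the module list.

    regions:      (base_address, first_bytes) for each examined region
    module_bases: base addresses reported by the process' module list
    """
    return sorted(base for base, head in regions
                  if head[:2] == b"MZ" and base not in module_bases)
```

A hit whose base equals that of a hollowed-out segment, as in this case, corroborates both alerts.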

It takes close examination, but a good look at entry 3 shows something fishy: the vmware_update.exe made to run is in C:\ProgramData. This directory is normally hidden from Explorer on most computers, so it is a pretty good place to conceal files… A look at this file’s hash shows it is unknown to VirusTotal, which is very uncommon for files related to VMware. This is a likely candidate for the bootstrap! Reverse engineering of this file (obtained through SNOW) and of the dumped module quickly shows definitive malware features. Conclusion: very malware.
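Part of the autoruns check described above can be automated. The sketch assumes autorun entries are available as (path, hash) pairs, and that a set of known-good hashes exists (for instance, built from a reputation service). The list of unusual directories is a hypothetical starting point, not an exhaustive rule.

```python
import ntpath
from typing import Iterable, List, Set, Tuple

# Hypothetical starting list of directories where vendor software is not
# normally installed; stored in lower case for case-insensitive comparison.
UNUSUAL_DIRS = ("c:\\programdata", "c:\\users\\public", "c:\\windows\\temp")


def suspicious_autoruns(entries: Iterable[Tuple[str, str]],
                        known_good_hashes: Set[str]) -> List[str]:
    """Autorun paths located in (or under) an unusual directory whose hash is
    not known to be good."""
    flagged = []
    for path, digest in entries:
        directory = ntpath.dirname(path).lower()  # Windows path semantics
        unusual = any(directory == d or directory.startswith(d + "\\")
                      for d in UNUSUAL_DIRS)
        if unusual and digest not in known_good_hashes:
            flagged.append(path)
    return flagged
```

`ntpath` applies Windows path rules on any platform, so the check can run on an analysis server rather than on the monitored host.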

Summary

So, process hollowing detection is a nice feat of live memory forensics, but it has its caveats. The detection test actually identifies process code modification, which is a phenomenon too common to systematically label such events as malware. However, taken in proper context, process hollowing alerts do facilitate and accelerate the detection of an ongoing attack, enabling a quicker response and a more effective defense.

Obviously, sorting out all the false positives for process hollowing detection remains tedious. However, automation planned for implementation in SNOW can eliminate many common false positive cases. In particular, packed executables (i.e., executables set up using packer software, as described above) are easy to identify using tools such as PEiD. Automating this analysis, as well as other heuristics for flagging classes of legitimate process code modifications, should help the SNOW platform scale better.
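PEiD relies on signatures to recognize specific packers. A different, complementary heuristic, swapped in here for illustration, is byte entropy: compressed or encrypted data tends toward the maximum of 8 bits per byte. The sketch below computes the Shannon entropy of a section’s bytes. The 7.2 threshold is a hypothetical value that would need tuning, and high entropy alone does not prove that an executable is packed.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in Counter(data).values())


def likely_packed(section: bytes, threshold: float = 7.2) -> bool:
    """Heuristic: compressed or encrypted sections have near-maximal entropy."""
    return shannon_entropy(section) >= threshold
```

Applied per section of an executable, this could pre-classify some hollowing alerts as probable packer activity before a hunter looks at them.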

Thank you to Justin Seitz for his advice on contextual analysis and Marc Théberge for info on packers.

Some Freedom In Your Virtualization Solution, Using QEMU

George Trudeau — Sun, 18 Dec 2016 08:00:01 +0000

Virtual machines are a very common practice nowadays, for reasons ranging from emulation to sandboxing. But when it comes to virtualization platforms, which solutions are there? Basically the big players are VMware and VirtualBox. There is another one that deserves interest: QEMU.

QEMU is free software and a member project of the Software Freedom Conservancy, meaning it respects your freedom and your privacy. You can build it yourself, but you’ll likely use your distribution’s package manager to install it, e.g.:

$ sudo apt-get install qemu

Binaries for Windows are also available, but this discussion focuses on GNU/Linux as both host and guest operating system, and takes advantage of that setup. This article goes over the essentials of QEMU. We’ll also cover a practical setup configuration and how to seamlessly integrate the virtual machines into the host environment.

Basics of QEMU command line

First, create an image for your virtual machine:

$ qemu-img create -f qcow2 my_vm.qcow2 16G

Note that QEMU also supports the VMware and VirtualBox formats, vmdk and vdi respectively. However, you will likely want QEMU's own qcow2 format, which QEMU's qemu-img tool handles most fully when resizing, modifying and converting images. A qcow2 image also grows on demand, so the 16G above is a maximum size rather than space allocated up front. Now you can boot from an ISO image to install the OS of your choice:
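For instance, an existing VirtualBox or VMware image can be moved to qcow2 with qemu-img convert. The sketch below wraps that command in a hypothetical to_qcow2 function that picks the source format from the file extension; it assumes the file names end in .vdi or .vmdk.

```shell
# Hypothetical helper: convert a VirtualBox (.vdi) or VMware (.vmdk) image
# to qcow2 next to the original, choosing the source format from the
# file extension. Requires the qemu-img tool.
to_qcow2() {
  src=$1
  case "$src" in
    *.vdi)  fmt=vdi ;;
    *.vmdk) fmt=vmdk ;;
    *) echo "unsupported image type: $src" >&2; return 1 ;;
  esac
  # -f: input format, -O: output format; old_vm.vdi becomes old_vm.qcow2.
  qemu-img convert -f "$fmt" -O qcow2 "$src" "${src%.*}.qcow2"
}
```

A qcow2 image can later be grown with qemu-img resize my_vm.qcow2 +8G; the partitions and filesystem inside the guest must then be enlarged separately.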

$ qemu-system-x86_64 -drive file=my_vm.qcow2 -boot d -cdrom image.iso -m 512

Once the installation is complete, boot your machine with more specific configuration options:

$ qemu-system-x86_64 -drive file=my_vm.qcow2 -enable-kvm -smp 4 -m 8G

The options given here are the essential basics:

-drive: the virtual machine's disk image.

-enable-kvm: if your virtual machine has the same architecture as your host, this option greatly improves performance. It enables the Linux Kernel-based Virtual Machine (KVM), which turns your kernel into a hypervisor. To use it, your kernel needs the KVM module for your processor. To verify that it is available (the option should be set to y or m), check your kernel configuration:

$ grep KVM_INTEL /boot/config-$(uname -r)

if you have an Intel processor, or, for AMD:

$ grep KVM_AMD /boot/config-$(uname -r)

-smp: number of virtual CPUs to give the virtual machine, taken from the host's cores.

-m: amount of RAM to dedicate to the virtual machine.
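The kernel configuration check for -enable-kvm tells you whether KVM was built, not whether the processor exposes hardware virtualization. A minimal sketch that looks for the vmx (Intel VT-x) or svm (AMD-V) CPU flag follows; cpu_virt_ext is a hypothetical name, and its optional argument only exists so it can be pointed at a file other than /proc/cpuinfo.

```shell
# Hypothetical helper: print which hardware virtualization extension the
# CPU advertises ("intel" for vmx, "amd" for svm), or "none".
cpu_virt_ext() {
  cpuinfo=${1:-/proc/cpuinfo}
  if grep -qw vmx "$cpuinfo"; then
    echo intel
  elif grep -qw svm "$cpuinfo"; then
    echo amd
  else
    echo none
    return 1
  fi
}

# Once the kvm_intel or kvm_amd module is loaded, the device node
# /dev/kvm should exist (check with: ls -l /dev/kvm).
```

If the flag is missing, virtualization may be disabled in the firmware settings, or the host may itself be a VM without nested virtualization.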

Additionally, you may want to use the -nographic option to disable the virtual machine's graphical output. This makes for a lightweight setup if you only need a server that you interact with through SSH, for example.

If you need something more like RDP, the SPICE protocol is the best choice. It offers copy-paste, resolution adjustments and much more. Refer to the wiki page for a detailed explanation.

File sharing between the host and the VM

If you want file sharing, here are two options.

9p

-fsdev local,id=share,path=/path/to/share,security_model=none
-device virtio-9p-pci,fsdev=share,mount_tag=share

These switches provide support for the Plan 9 (9P) file sharing protocol between the GNU/Linux host and a suitable UNIX guest (we still assume GNU/Linux for simplicity). It is an efficient file sharing protocol. However, the guest kernel must support it; on the host side, the 9P server is implemented by QEMU itself. On a GNU/Linux guest, this support comes from specific kernel modules. Just make sure your distribution has them enabled, or build them yourself. You can check the kernel configuration again:

$ grep 9P /boot/config-$(uname -r)

You should have at least these enabled:

CONFIG_NET_9P
CONFIG_NET_9P_VIRTIO
CONFIG_9P_FS
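The check can be automated with a small sketch like the one below, run on the guest. It assumes the distribution installs its kernel configuration under /boot, as the grep command above does, and it looks for the filesystem option under its mainline kernel name, CONFIG_9P_FS. check_9p_config is a hypothetical helper name.

```shell
# Hypothetical helper: report the 9p options that are not enabled
# (built in as =y or as a module as =m) in a kernel config file.
# Returns 0 when all of them are present, 1 otherwise.
check_9p_config() {
  config=${1:-/boot/config-$(uname -r)}
  missing=0
  for opt in CONFIG_NET_9P CONFIG_NET_9P_VIRTIO CONFIG_9P_FS; do
    # Anchor on "=" so CONFIG_NET_9P does not match CONFIG_NET_9P_VIRTIO.
    if ! grep -Eq "^${opt}=[ym]\$" "$config"; then
      echo "missing: $opt"
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: check_9p_config && echo "9p support looks complete".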

On the guest, mount the shared path:

$ mount -t 9p -o version=9p2000.L,trans=virtio share /path/to/mount-point

Once it works, you can add it to the guest's /etc/fstab:

share /mnt/share 9p trans=virtio,version=9p2000.L 0 0

More details can be looked up in the QEMU wiki.

SSHFS

This is a pragmatic approach, as the guest's SSH server provides everything needed on the guest side; you only need to install SSHFS on the host. It is not as fast as the 9p protocol, but for transfers between a VM and its host, the difference is hardly noticeable.
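A minimal host-side sketch, assuming the guest's SSH port is forwarded to 127.0.0.1 on the host as described in the next section. The function name mount_vm_share and its four arguments are illustrative, not part of SSHFS.

```shell
# Hypothetical helper: mount a guest directory on the host through SSHFS,
# going through the port that QEMU forwards to the guest's SSH server.
mount_vm_share() {
  if [ "$#" -ne 4 ]; then
    echo "usage: mount_vm_share PORT GUEST_USER GUEST_DIR MOUNT_POINT" >&2
    return 2
  fi
  port=$1 user=$2 guest_dir=$3 mount_point=$4
  mkdir -p "$mount_point"
  sshfs -p "$port" "$user@127.0.0.1:$guest_dir" "$mount_point"
}

# Unmount when done with: fusermount -u MOUNT_POINT
```

Since sshfs runs ssh under the hood, an entry in your SSH config file (also covered in the next section) lets you shorten this to sshfs my_vm:/some/dir /path/to/mount-point.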

Interfacing with the virtual machine through SSH

Once your VM is set up, deploy an SSH server on it. We'll use QEMU's TCP port forwarding to access it easily:

-net user,hostfwd=tcp:127.0.0.1:[port]-:22
-net nic

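Replace [port] with a TCP port of your choice on the host (above 1023 if QEMU runs as an unprivileged user). QEMU then forwards connections made to 127.0.0.1:[port] on the host to port 22 of the guest. The -net nic option adds a network interface card to the VM; without it, the guest has no network access and port forwarding cannot work.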
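Putting the pieces together, the hypothetical helper below prints, without running it, a complete launch command that combines the basic options with the port forwarding above. The values -smp 4 and -m 8G are simply the article's examples, and the image path is assumed to contain no spaces.

```shell
# Hypothetical helper: print (not run) a QEMU command line for a VM whose
# SSH server is reachable from the host at 127.0.0.1:PORT.
vm_launch_cmd() {
  image=$1
  port=$2
  case "$port" in
    ''|*[!0-9]*) echo "port must be a number: '$port'" >&2; return 1 ;;
  esac
  # hostfwd syntax: tcp:HOSTADDR:HOSTPORT-GUESTADDR:GUESTPORT (empty guest
  # address means the guest's default address on the user-mode network).
  echo "qemu-system-x86_64 -drive file=$image -enable-kvm -smp 4 -m 8G" \
       "-net user,hostfwd=tcp:127.0.0.1:${port}-:22 -net nic"
}
```

Inspect the output first, then launch with eval "$(vm_launch_cmd my_vm.qcow2 2222)".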

A common practice is to authenticate to SSH with a key pair instead of a password. Once key-based login works, you can enforce it by setting this parameter in the guest's /etc/ssh/sshd_config (then restart the SSH server):

PasswordAuthentication no

Next, generate an RSA key pair for authenticating with this VM:

$ ssh-keygen -t rsa -b 4096 -f ~/.ssh/my_vm_key.priv  # Follow the on-screen instructions.

Then append the public key (the .pub file) to ~/.ssh/authorized_keys in the home directory of the guest user you will log in as. You can then add an entry for this VM to your SSH config file (~/.ssh/config) on the host:
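Permissions matter for the append step: with its default StrictModes setting, sshd ignores an authorized_keys file that other users could modify. A minimal sketch, to run on the guest, follows; authorize_key is a hypothetical function, and its optional second argument (the home directory) is there only to make it easy to test. While password logins are still allowed, the standard ssh-copy-id tool can do the same from the host.

```shell
# Hypothetical helper: append a public key to a user's authorized_keys
# with the permissions sshd expects, skipping it if already present.
authorize_key() {
  pubkey=$1
  home=${2:-$HOME}
  mkdir -p "$home/.ssh"
  chmod 700 "$home/.ssh"
  # -x: whole-line match, -F: fixed strings, -f: patterns read from the key file.
  if ! grep -qxF -f "$pubkey" "$home/.ssh/authorized_keys" 2>/dev/null; then
    cat "$pubkey" >> "$home/.ssh/authorized_keys"
  fi
  chmod 600 "$home/.ssh/authorized_keys"
}
```

Usage, after copying the .pub file to the guest: authorize_key my_vm_key.priv.pub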

Host my_vm
        HostName        localhost
        IdentityFile    %d/.ssh/my_vm_key.priv
        Port            [forwarded port]
        User            [guest user]

To log in:

$ ssh my_vm

A nice feature of X11, the system that runs mouse-and-keyboard graphical applications on most GNU/Linux systems, is its networking capability. With SSH-based X forwarding, you can run applications inside the guest operating system and display them on your host's X server. They are then directly integrated into your environment: you manage their windows as if they were part of your host, with native copy-paste, notifications on the host, and many other perks. To enable X forwarding for this connection, add:

ForwardX11 yes
ForwardX11Trusted yes

to the VM's entry in your SSH config file. Alternatively, to enable it from the command line, add this switch:

$ ssh -Y [...]

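From there, launch any graphical application and it will appear as if it were part of your system. This requires the guest's SSH server to allow X forwarding (X11Forwarding yes in its /etc/ssh/sshd_config).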

Conclusion

QEMU, unlike commercial solutions, requires additional configuration and understanding. There is an extensive man page that provides all the information you need, plus online examples to help with the basics. However, once past the initial learning curve, it provides a free and extensible solution that integrates easily into any environment. If this article has sparked your interest and shed light on a new solution, it has fulfilled its purpose.
