How To Test Malware Detection Capabilities

The idea of verifying our system's detection capabilities comes from a software quality perspective. More specifically, we want to verify that the overall system can detect malware and hacker activity end to end, from the SNOW sensor installed on a host to the SNOWboard Command System used by the hunters. The idea is simple, but putting it into practice is not trivial.

First, a “detection capability” is not a feature of a single component; it is an emergent property of the interaction between many components. This forces us to build an end-to-end infrastructure dedicated to testing. Second, some components behave “unpredictably” because of deliberate randomness and communication latencies between components. This limits how far the testing environment can be fine-tuned.

Even without these concerns, how can we test that the system is *truly* able to detect malware and cyber attacks?

In fact, it is neither the malware itself nor the hacker’s presence that is detected, but their behaviour. In simple cases, a behaviour is clearly suspicious: well-known malicious techniques, such as process hollowing, are easily recognizable. In more complex cases, many clues must be combined to expose malicious intent. In either case, to test our detection capability, we must recreate scenarios that we expect to stimulate our SNOW sensor and centralized detection algorithms.

To achieve this, we can use real malware samples, provided that we remain very cautious. The malware must be isolated so that it cannot spread into our testing infrastructure. We must also clean up the damage these samples cause, for example by restoring virtual machine snapshots. Alternatively, we can use custom-made harmless malware. Since it is the behaviour that is detected, the sample does not need to cause real damage. Such pseudo-malware can be self-cleaning, avoiding the need to use snapshots. In practice, both real and harmless malware are valuable.
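As a minimal sketch of the self-cleaning idea, the Python snippet below performs a harmless file operation in a scratch directory and removes every trace afterwards, even if the scenario fails midway. The function names and the chosen behaviour (writing and renaming a placeholder file) are hypothetical illustrations; which behaviours actually stimulate the sensor depends on the detection rules.

```python
import os
import shutil
import tempfile
from contextlib import contextmanager


@contextmanager
def self_cleaning_workspace(prefix="pseudo_malware_"):
    """Yield a scratch directory and delete it afterwards, even on error."""
    workdir = tempfile.mkdtemp(prefix=prefix)
    try:
        yield workdir
    finally:
        # Cleanup replaces the VM snapshot restore needed for real samples.
        shutil.rmtree(workdir, ignore_errors=True)


def run_harmless_scenario():
    """Hypothetical scenario: write a placeholder file, then rename it.

    Nothing here is harmful; the file content is a plain text marker.
    Returns the scratch path and whether the renamed file existed mid-run.
    """
    with self_cleaning_workspace() as workdir:
        dropped = os.path.join(workdir, "payload.tmp")
        with open(dropped, "w") as handle:
            handle.write("harmless placeholder\n")
        renamed = os.path.join(workdir, "payload.bin")
        os.rename(dropped, renamed)
        existed_during_run = os.path.exists(renamed)
    return workdir, existed_during_run
```

Because the cleanup sits in a `finally` clause, the host returns to its initial state whether or not the scenario completes, which is what makes snapshots unnecessary for this kind of sample.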

On the central analytics side, we want to check whether an alert of the expected type has been raised within a reasonable time range. An easy way to proceed would be to query the database directly. However, that would not guarantee end-to-end detection. Instead, we communicate with the SNOWboard RESTful API, retrieving the same data that the SNOWboard Command System uses.
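The check "has an alert of the expected type been raised within a reasonable time range" amounts to polling with a deadline. The sketch below is one possible shape for it, not the actual implementation: `fetch_alerts` is a hypothetical callable that would wrap an HTTP GET against the SNOWboard RESTful API, and the `"type"` field of each alert is an assumed name, since the real response format is not described here.

```python
import time
from typing import Callable, Iterable, Mapping


def wait_for_alert(
    fetch_alerts: Callable[[], Iterable[Mapping]],
    expected_type: str,
    timeout_s: float = 300.0,
    poll_interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until an alert of expected_type appears or the timeout expires.

    fetch_alerts is assumed to return the alerts currently visible through
    the API, each as a mapping with a "type" key (hypothetical field name).
    """
    deadline = clock() + timeout_s
    while True:
        for alert in fetch_alerts():
            if alert.get("type") == expected_type:
                return True
        if clock() >= deadline:
            # The maximum reasonable detection delay has elapsed.
            return False
        sleep(poll_interval_s)
```

Injecting `clock` and `sleep` keeps the function testable without real waiting, and polling rather than checking once accommodates the latencies and deliberate randomness mentioned earlier.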

The whole process is automated. With malware and pseudo-malware samples kept in a pool, our automation platform launches them and checks whether the expected alerts have been raised, all without human intervention. A notification is sent when a problem is detected.
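The automated loop can be pictured roughly as follows. This is a simplified sketch under stated assumptions: the `Scenario` class, `run_pool`, and the `alert_raised` and `notify` callables are hypothetical names, standing in for whatever the automation platform really provides to launch samples, query alerts, and send notifications.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Scenario:
    """One entry of the pool: a sample to launch and the alert it should raise."""
    name: str
    launch: Callable[[], None]
    expected_alert: str


def run_pool(
    scenarios: Iterable[Scenario],
    alert_raised: Callable[[str], bool],
    notify: Callable[[str], None],
) -> List[str]:
    """Launch every scenario and return the names of those that went undetected."""
    failures = []
    for scenario in scenarios:
        scenario.launch()
        # alert_raised is assumed to wait up to the maximum detection delay.
        if not alert_raised(scenario.expected_alert):
            failures.append(scenario.name)
            notify(
                f"No '{scenario.expected_alert}' alert "
                f"for scenario '{scenario.name}'"
            )
    return failures
```

Real and harmless samples fit the same interface; only the `launch` callable differs, for example by restoring a snapshot after a real sample has run.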

In summary, to test our detection capabilities, we prepare scenarios that act suspiciously and execute them on a dedicated, isolated testing infrastructure. After the maximum reasonable detection delay, we retrieve the same data that hunters request and check whether the expected alerts have been raised. That way, we manage the quality of our solution and ensure hunters have the information they need to provide a high-quality service.