November 18, 2018
We were recently invited to participate in the first round of NSS Labs Endpoint Detection and Response (EDR) testing, and we gladly accepted. We are very proud to have participated, and we are thankful to NSS Labs for the learning experience. The main lesson was that we, as a community, are still defining what an EDR is and what we want from one. Our assessment is that NSS Labs' testing was not in line with our strategy for our EDR, SNOW. Our goals are structured differently: we focus mainly on malware hunting and detection, while NSS Labs' testing emphasizes after-the-fact forensic reporting.
While we continuously improve our malware hunting platform, SNOW is already very close to where we want it to be. It achieves the results our clients expect: finding malware that the other defenses in place have missed.
NSS Labs' 2018 EDR testing was divided into three main categories: Detection (248 tests), Reporting (75), and False Positives (200). Testing detection is always a hard problem, and NSS Labs shined here with a great suite of tests that were unknown to the tested products beforehand. To the best of our knowledge, the testing did not include proprietary new malware, only "commodity" malware with limited known signatures. NSS Labs decided to put the emphasis on the Reporting category, assigning it a much higher weight than the Detection and False Positives categories when calculating the "security effectiveness" scores.
We performed very well in two of the three categories, and more specifically in the subcategories of exploit detection and evasion detection. These two subcategories cover code execution exploits, operating system security evasions, crafted documents, and blended threats. We also had great results by not triggering on the false positives thrown at SNOW. Some of the tests were simple cases of "download and execute" malware executables, a category we don't focus on because our clients block them by deploying application-whitelisting technologies such as AppLocker and Bit9. It is no surprise the industry does so, as whitelisting is part of many highly respected security guidelines, including the Canadian Communications Security Establishment's Top 10 IT Security Actions and the Australian Cyber Security Centre's Essential Eight mitigation strategies.
The category in which we did not do so well was forensic reporting, defined as follows by NSS Labs: "NSS Labs' testing of forensic reporting capabilities in EDR products included application programming interface (API) calls, data exfiltration, file system, network traffic, registry, and system and data integrity." This section focused on what they could measure as scoring metrics and is, in our opinion, still very limited. For example, rather than monitoring API calls, traffic, files, and registry, we prefer to focus on capturing the code that executed: hidden shellcode, thread injections, and code that does not belong. You can think of these as causes versus symptoms: the code segments are the root cause of all the after-the-fact artifacts, whereas recording everything that happens on the host only catalogs the symptoms. The Reporting tests attributed no points for saving in-memory code segments, yet attributed points for being able to go back in time and see whether registry keys were created or deleted. During malware hunting, the most valuable information is what helps decide whether a behavior is malicious or not, and quite often a captured memory region containing code is necessary. This is especially true with the rise of in-memory-only malware, which avoids triggering symptoms altogether.
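To make the cause-versus-symptom distinction concrete, here is a minimal, hypothetical sketch (in no way SNOW's actual implementation): instead of logging every registry or file event, a hunter can triage memory regions and capture only executable code that is not backed by any module on disk, the classic marker of injected shellcode. The `MemoryRegion` record layout is invented for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class MemoryRegion:
    base: int                      # start address of the region
    size: int                      # region size in bytes
    executable: bool               # page protection allows execution
    backing_module: Optional[str]  # module on disk, or None if anonymous

def regions_to_capture(regions: List[MemoryRegion]) -> List[MemoryRegion]:
    """Keep only executable regions with no on-disk backing module:
    code that 'does not belong' and is worth capturing for analysis."""
    return [r for r in regions if r.executable and r.backing_module is None]

# A tiny, made-up snapshot of a process's memory map.
snapshot = [
    MemoryRegion(0x7FF600000000, 0x1000, True, "kernel32.dll"),  # normal DLL code
    MemoryRegion(0x1F0A0000, 0x4000, False, None),               # plain heap data
    MemoryRegion(0x2B3C0000, 0x2000, True, None),                # likely injected
]

suspicious = regions_to_capture(snapshot)
for r in suspicious:
    print(hex(r.base))  # -> 0x2b3c0000
```

Only the third region is flagged; the normal DLL code and plain data pass silently, which is what keeps the approach lightweight compared with recording every host event.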
One of our primary goals is to have an EDR that is very lightweight, with little impact on CPU and resource usage such as battery power. Other goals are to keep the increase in attack surface to a strict minimum (to avoid attackers using our platform as a path into the network) and to respect the privacy of the users on a network as much as possible. For these reasons, we scope the monitoring to the system activity required to determine whether leads are actually malware or normal system behavior. It is therefore no surprise that we did not score many points in the Reporting category. From what we understand, NSS Labs chose to assign points for a set of artifacts they could extract from malware, mostly using sandbox testing, and many of these artifacts fall outside the scope of SNOW's monitoring. They did not assign points for keeping other information relevant to determining whether a detection is actually malware or not, something we would like to see in the next testing iteration.
The third category was about false positives. NSS Labs threw random executions into the mix to see if EDRs would bite and report them as threats or leads. Most platforms must have done well in that category, and we are happy with our score. In our opinion, the test sample was very light and easy, without any of the usual suspects such as packed executables or interpreters whose JIT pages flirt with the edge of shellcode signatures.
For Arc4dia, detection is of the utmost importance, for the simple reason that if a threat is not detected as fast as possible, it cannot be remediated and acted on before damage occurs, whether directly or through lateral movement. Evaluating the extent of an infection from a point in time when the signatures are well known is a comparatively easy problem to solve, and one that SNOW handles very well by simply searching for hashes, names, or other recorded information.
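Scoping an infection after the fact can be sketched as a simple lookup over already-recorded endpoint data. The record layout and indicator values below are hypothetical, invented purely for illustration; they are not SNOW's actual schema.

```python
# Hypothetical endpoint telemetry: one record per observed executable.
records = [
    {"host": "ws-01", "name": "update.exe",  "sha256": "deadbeef01"},
    {"host": "ws-02", "name": "notepad.exe", "sha256": "cafef00d02"},
    {"host": "ws-07", "name": "svch0st.exe", "sha256": "deadbeef01"},
]

def scope_infection(records, bad_hashes, bad_names):
    """Return every host where a known-bad hash or file name was recorded."""
    return sorted({r["host"] for r in records
                   if r["sha256"] in bad_hashes or r["name"] in bad_names})

affected = scope_infection(records, bad_hashes={"deadbeef01"}, bad_names=set())
print(affected)  # -> ['ws-01', 'ws-07']
```

Once an indicator is known, finding every affected host reduces to a set lookup over the recorded information; the hard part, and the part we optimize for, is the initial detection.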
Based on this round of tests, we have very different opinions on the weighting factors for Reporting versus Detection that NSS Labs applied. Moreover, the Reporting category was not mature enough to have so much influence on the final "security effectiveness" score. From our extrapolation (we were not given the formula used to compute the final score), it appears that Reporting weighed in much more than Detection. Although we speculate this would not change our competitors' scores by much, we believe that under the current testing methodology the weight should remain on Detection rather than Reporting, especially considering that a total of 248 malware samples were thrown at the EDRs, while only 75 Reporting artifacts, taken from a subset of only 20 malware samples, were accounted for.
Before diving into how the Operating Expense (OPEX) was calculated, we want to point out that, based on the pricing provided to NSS Labs for a 3-year, 500-host self-managed deployment, the self-hosted licensing was in the ~$250 range per host. Our OPEX score came out at $1,600+, based on a series of other factors such as the number of incident responses and the estimated time to solve problems, extrapolated from the amount of information the EDR catalogs. Essentially, our product's speculated OPEX derives from a different design philosophy, which led to low scores in the Reporting section and compounded into running costs. These calculations are available in the full NSS Labs "EDR Comparative Report".
Other aspects that we and our clients believe are really important were not evaluated:
We are asking ourselves the question: should we participate again or not?
We would participate again if we see an improvement in the testing methodology and more transparency in the scoring, for example by subdividing some of the tested categories. A little more information in the free "NSS Labs 2018 EDR Security Value Map™" is a must.
Positioning our competitors in the bottom-left corner, in what we call the "shame box", as "not evaluated" is provocative, and we are not convinced it helps improve industry relationships. Seeing more and more players publish their results would indicate a mutual benefit between independent testing and the security industry.
Put yourself in our shoes: if we don't participate, we risk ending up in the shame box; if we participate, we have to accept the philosophical view and technical level NSS Labs brings to the table. We love to compete, but it is a difficult situation after all.
In the end, EDR stands for Endpoint Detection and Response.
The complete report of Arc4dia's EDR evaluation and scores is available from NSS Labs at the following link:
The full comparative report is available under “Get Access” here: