Information Security Strategy, Part 1

The Problems of Security Testing and Unmanageable Reports

I’d like to talk a little bit about security testing, the problem of information overload, and issue prioritisation. To do this I intend to broadly discuss some of the problems with the various options for security testing that organisations have.

I’ve written about some related things before, if you’d like a warm up:

However, today I’d like to look at security a little more strategically and discuss the wider problems with security testing. I’ll centre this around the idea that there are three main problems with the way companies approach security testing:

  • How do you ensure that security issues are correctly prioritised?
  • How do you ensure that testing deliverables are manageable?
  • What is your mean-time-to-detection for new security issues?

What do you consider a “vulnerability”?

If you’re in charge of the overall security of an organisation, then you want to know about all of the security issues, so you can appropriately prioritise them, right? All of the security issues? Sure?

What I’d like to lead up to is that if I were to gather all of the information I could about your organisation’s security posture and present it to you in a report, that report would be several hundred pages long and entirely useless. Useless because it is unmanageable.

If you were to say, “well, grade those issues based on some standard measurement so that priority can be applied” (for example, if you were considering beginning your work at the “critical” issues and working your way down towards the mediums and lows), I’d still argue the report would be entirely useless. Useless because there is no standard measurement for all security issues.

So, let’s start there: “There is no standard measurement for all security issues”. The problem here of course is that I’m using the generic term “security issue” and not something conveniently more specific such as “Vulnerability”. If we were talking about vulnerabilities, then we could look at the Common Vulnerability Scoring System (CVSS)!

There are two problems here. The first is the tricky problem of what we consider a security vulnerability. There are multiple categories here, or at least a distinction between security vulnerabilities, policy issues, and hardening measures. The second problem is, once you’ve decided what you’re going to include within those categories, how do you appropriately score them to allow for a prioritised approach overall? (Not to get ahead of myself, but the fact that we’re now up to CVSS version 3 is an indicator that scoring is harder than it may initially seem.) If you’re completely unfamiliar with CVSS, to massively simplify, it is a means of calculating a risk score (out of 10, where 10 is the worst case) based on the impact a vulnerability has against the confidentiality, integrity, and availability of a system.

Consider the scenario where you are performing a penetration test for an organisation and they have asked you to document as many vulnerabilities as possible within the time frame for the assessment. You find a server which is missing several months’ worth of patches; however, those patches do not appear to be security related (“bug fixes”, “new features”, “improvements”). How do you report this issue? Consider also the fact that vendors have previously been known to hide serious security fixes within patches with change logs that simply state “bug fixes”.

In that context, do you highlight the issue as a lack of a sufficient patching policy? That is a very serious issue which can certainly be high impact. Alternatively, do you consider it from the context that, as far as is documented, these “bug fixes” aren’t related to security issues and therefore it’s an issue of limited severity? What about that CVSS system we talked about earlier? To simplify, the problem here is that CVSS considers vulnerabilities based on their impact against confidentiality, integrity, and availability – which doesn’t neatly fit this kind of issue.

Another example which highlights a problem with this kind of scoring is the fact that it considers the risk against all of confidentiality, integrity, and availability. For example, if you had a poorly written function on your company web application which would trivially allow an attacker to extract confidential data from your website database (something like insecure direct object reference), that would score a 7.5, not a 10. Why? It impacts only confidentiality. If you’re new to CVSS but want to play around with it a little, try this calculator:

For a good example of how “scoring risk out of 10” can be difficult, consider this vulnerability: CVE-2015-6668. In particular, note how the base, impact, and exploitability scores are wildly different, and compare the CVSS scores for version 2 versus version 3.
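To make the 7.5 figure above concrete, here is a minimal sketch of the CVSS v3 base score arithmetic. It assumes the metric weights published in the CVSS v3.x specification and only handles the Scope: Unchanged case. The function and dictionary names are my own, and the vector used for the data-extraction example (network attack vector, low complexity, no privileges or user interaction required, high confidentiality impact only) is an assumption about how such an issue would typically be scored.

```python
import math

# Metric weights as published for CVSS v3.x (Scope: Unchanged only).
ATTACK_VECTOR = {"N": 0.85, "A": 0.62, "L": 0.55, "P": 0.2}
ATTACK_COMPLEXITY = {"L": 0.77, "H": 0.44}
PRIVILEGES_REQUIRED = {"N": 0.85, "L": 0.62, "H": 0.27}
USER_INTERACTION = {"N": 0.85, "R": 0.62}
CIA_IMPACT = {"H": 0.56, "L": 0.22, "N": 0.0}


def roundup(value):
    """Round up to one decimal place, avoiding floating point artefacts."""
    scaled = round(value * 100000)
    if scaled % 10000 == 0:
        return scaled / 100000.0
    return (math.floor(scaled / 10000) + 1) / 10.0


def base_score(av, ac, pr, ui, c, i, a):
    """Return (base, impact, exploitability) for a Scope: Unchanged vector."""
    iss = 1 - (1 - CIA_IMPACT[c]) * (1 - CIA_IMPACT[i]) * (1 - CIA_IMPACT[a])
    impact = 6.42 * iss
    exploitability = (8.22 * ATTACK_VECTOR[av] * ATTACK_COMPLEXITY[ac]
                      * PRIVILEGES_REQUIRED[pr] * USER_INTERACTION[ui])
    if impact <= 0:
        return 0.0, 0.0, round(exploitability, 1)
    base = roundup(min(impact + exploitability, 10))
    return base, round(impact, 1), round(exploitability, 1)


# Trivial remote data extraction: confidentiality only.
confidentiality_only = base_score("N", "L", "N", "N", c="H", i="N", a="N")
# Same attack path, but with full loss of integrity and availability too.
full_compromise = base_score("N", "L", "N", "N", c="H", i="H", a="H")

print(confidentiality_only)  # (7.5, 3.6, 3.9)
print(full_compromise)       # (9.8, 5.9, 3.9)
```

Under these assumptions even a full loss of confidentiality, integrity, and availability scores 9.8 rather than 10 when scope is unchanged; reaching 10 needs a Scope: Changed vector, which this sketch does not implement.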

Finally, CVSS base scoring considers vulnerabilities independently, which may not give a fair score of the real-world impact of an issue where that issue can be chained together with other vulnerabilities to allow for a higher impact. This is the idea that “three lower impact vulnerabilities may become a high”. A real-world example of that? If I’m performing a penetration test and I find:

I can pretty much guarantee that I’m going to fully compromise that network (gain domain administration level access, etc, etc) based on two infos and a medium, because I can chain these issues together and experience tells me in this context I’m highly likely to cause a significant impact to this network.
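One way to picture the difference between independent and chained scoring is the rough sketch below. Everything in it is hypothetical: the finding names are placeholders (not the findings from the engagement described above), the severity labels are a generic five-step scale, and the "chain" is simply a rule written down by a tester who knows that a given combination leads somewhere worse.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["info", "low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Chain:
    """A combination of findings that a tester knows can be exploited together."""
    requires: frozenset
    combined_severity: str


def effective_severities(findings, chains):
    """Raise each finding to the severity of the worst chain it takes part in.

    findings maps a finding name to its independently scored severity.
    A chain only applies when every finding it requires is present.
    """
    rank = SEVERITY_ORDER.index
    effective = dict(findings)
    present = set(findings)
    for chain in chains:
        if chain.requires <= present:
            for name in chain.requires:
                if rank(chain.combined_severity) > rank(effective[name]):
                    effective[name] = chain.combined_severity
    return effective


# Placeholder findings: two infos, a medium, and an unrelated low.
findings = {
    "finding-a": "info",
    "finding-b": "info",
    "finding-c": "medium",
    "finding-d": "low",
}
chains = [
    Chain(frozenset({"finding-a", "finding-b", "finding-c"}), "critical"),
    # Not applicable: "finding-z" was not observed on this network.
    Chain(frozenset({"finding-d", "finding-z"}), "high"),
]

result = effective_severities(findings, chains)
print(result)
```

The point of the sketch is only that the chained view needs knowledge a base score does not carry: which findings sit on the same network and how they combine.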

So scoring, and by extension prioritising, vulnerabilities is hard.

How do you ensure that deliverables are manageable?

Finally, consider security hardening steps and how they should be reported within a security assessment. Organisations likely want to know if their servers and workstations are appropriately hardened, and to have details included within the report if they’re not. One benchmark for that would be the CIS Benchmarks: a great benchmark which gives real-world actionable hardening guidance including issue listing, rationale, and remediation guidance.

Take just the Windows 10 Enterprise CIS Benchmark, for example: it’s 1,089 pages long. Bear in mind that many companies deploy multiple operating systems, and don’t forget about your clouds!

So the number of possible security issues that we could address through security testing, for each of the systems deployed by a company, can be huge, across all categories such as vulnerability, policy, and hardening. Reporting them all will likely lead to a hugely lengthy report. Is that output useful? If it’s not, then which issues should we ignore?

I’m being slightly unfair here of course, because we’d break these findings down across multiple assessment types. So I’m talking about the findings of a vulnerability scan, penetration test, build review, firewall configuration review, and hardening assessment. Each of these would yield different and contextually specific results – but when put together there would be a huge amount of raw output.

How often do you test for security issues?

Additionally, consider how often you should perform testing. That’s not a simple thing to answer, and an answer such as “We perform annual Penetration Tests” is likely insufficient. Allow me to elaborate:

Firstly, consider that PCI DSS requires several types of testing on different schedules: quarterly internal scans, quarterly external ASV scans, biannual segmentation testing, and penetration testing annually and on significant network changes. If you’re not a PCI company you can still appreciate that this recommendation, from a well-established data security standard, clearly expects something more than “annual penetration testing”.

I’ve got two main problems with the idea of annual penetration testing:

Mean-time-to-detection. If a new issue is discovered shortly after your penetration test and you’re relying only on penetration testing activity, then it’ll be a long time before you’re informed of this issue. For an extreme example take a look at Heartbleed: it was introduced in code in 2012 but was not public knowledge until 2014. Many penetration tests were conducted during that period but they didn’t include Heartbleed until 2014.

Issues get missed, but when they become public knowledge there should be some way of dealing with them that doesn’t involve waiting 11 months until your next penetration test. For Heartbleed, many companies were aware of it because it was a “BBC News level vulnerability” (that’s one above “Critical” on the vulnerability scale, of course). What about those critical issues that get discovered but don’t get news coverage?
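The waiting time is easy to put numbers on. The sketch below assumes, purely for illustration, an organisation whose annual penetration test falls on 1 March, compares that with quarterly testing on the first of March, June, September, and December, and uses Heartbleed’s public disclosure date of 7 April 2014. It also makes the generous assumption that the first test after disclosure actually finds the issue. The function name and schedules are hypothetical.

```python
from bisect import bisect_left
from datetime import date


def detection_delay_days(disclosed, test_dates):
    """Days from public disclosure to the first scheduled test on or after it.

    Returns None if no test is scheduled after the disclosure date.
    """
    ordered = sorted(test_dates)
    index = bisect_left(ordered, disclosed)
    if index == len(ordered):
        return None
    return (ordered[index] - disclosed).days


heartbleed_public = date(2014, 4, 7)

# Hypothetical schedules.
annual = [date(2014, 3, 1), date(2015, 3, 1)]
quarterly = [date(2014, 3, 1), date(2014, 6, 1), date(2014, 9, 1),
             date(2014, 12, 1), date(2015, 3, 1)]

annual_delay = detection_delay_days(heartbleed_public, annual)
quarterly_delay = detection_delay_days(heartbleed_public, quarterly)

print(annual_delay)     # 328
print(quarterly_delay)  # 55
```

Even the quarterly figure is close to two months of exposure for an issue that was already public, which is the gap that testing on a fixed calendar leaves open.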

This idea of testing “After a significant change”. Have you documented for your organisation at what level a change is considered significant enough to warrant a penetration test? How do you deal with lots of little changes? Lots of little changes can be the equivalent in terms of the impact on security to one big change, but may not “trigger” a security test in the same way.
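One possible way to stop small changes slipping under the bar is to keep a running total since the last test rather than judging each change in isolation. The sketch below is only an illustration: the integer weights, the threshold of 10, and the function name are all made up, and how an organisation would actually weight a change is exactly the thing that needs documenting.

```python
def changes_until_retest(change_weights, threshold):
    """Return how many changes it takes for the running total to reach the
    threshold (counting from 1), or None if it is never reached."""
    total = 0
    for count, weight in enumerate(change_weights, start=1):
        total += weight
        if total >= threshold:
            return count
    return None


SIGNIFICANT = 10  # hypothetical weight at which a change is "significant"

one_big_change = [10]
many_small_changes = [2, 1, 3, 1, 2, 2]

# Judged one at a time, none of the small changes is significant...
none_significant_alone = all(w < SIGNIFICANT for w in many_small_changes)

# ...but taken together they cross the same bar as the single big change.
big_trigger = changes_until_retest(one_big_change, SIGNIFICANT)
small_trigger = changes_until_retest(many_small_changes, SIGNIFICANT)

print(none_significant_alone)  # True
print(big_trigger)             # 1
print(small_trigger)           # 6
```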

So here we have the idea that annual penetration testing is likely insufficient, that the triggers for testing are likely ill defined, and that it’s not simply as easy as “we’ll do penetration testing more frequently then!” as budgets are an ever-present restriction.

One method of achieving some degree of visibility whilst preventing a significant budget increase would be to perform more frequent vulnerability scanning (as is the way with PCI’s quarterly scans, for example), but an alternative approach would be to move away from “Quarterly Scanning” or “Annual Penetration Testing” towards continuous security testing.

Any movement a company can make towards treating security testing as an ongoing task, as opposed to a quarterly or annual hurdle, is for the better. I’d consider this in regard to three core issues:

  1. How can an organisation detect changes in the network layout, internally and externally, to ensure that all systems are appropriately included within security assessments? That is, if your attack surface changes, how do you ensure that is covered by testing?
  2. How can vulnerabilities be appropriately prioritised if scanning technology scores vulnerabilities independently without considering exploit chaining?
  3. How can we reduce the mean-time-to-detection for finding introduced vulnerabilities, especially given many small changes may occur over time and that new vulnerabilities may be publicly released between engagements?
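For the first of these questions, the core of the idea can be sketched as a comparison between what was in scope at the last assessment and what discovery finds today. This is a minimal illustration only: the function name is my own, the addresses come from a range reserved for documentation, and a real inventory would be fed by whatever discovery tooling the organisation already runs.

```python
def diff_attack_surface(previous, current):
    """Compare two sets of (host, port) pairs and report what changed."""
    previous, current = set(previous), set(current)
    return {
        "added": sorted(current - previous),    # exposed but never tested
        "removed": sorted(previous - current),  # tested but no longer exposed
    }


# Hypothetical inventories: what the last assessment covered versus
# what a discovery sweep sees today.
last_tested = {("192.0.2.10", 443), ("192.0.2.10", 22), ("192.0.2.20", 25)}
discovered_today = {("192.0.2.10", 443), ("192.0.2.20", 25), ("192.0.2.30", 8080)}

changes = diff_attack_surface(last_tested, discovered_today)

print(changes["added"])    # [('192.0.2.30', 8080)]
print(changes["removed"])  # [('192.0.2.10', 22)]
```

Anything in the "added" list is attack surface that no previous test has looked at, which makes it a natural trigger for testing outside of the usual schedule.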

Maybe a better approach would be continuous security testing. How could that look?

A balance can be struck between testing efficiently through the use of automated scanning technologies and ensuring depth of testing through human led testing – but I don’t think that is best left to quarterly automated scanning and annual penetration testing.

One way of achieving this would be to allow the scanners to gain coverage over areas and functionality that they’re good at testing, and to bring in the humans for the functionality that the scanner isn’t so great at testing. Testing doesn’t have to be entirely automated or entirely human. A scanner may be very good at discovering SQL injection vulnerabilities (for example, sqlmap has done a very good job of automating the discovery and testing of these issues, proving that it’s possible) but might struggle to find logic vulnerabilities, or to dynamically discover certain features in an application which could instead be tested by a human.

Further, automated scanning is useless if you don’t have the appropriate humans to interpret the output, and there are often large areas of systems which scanners simply cannot enumerate and test. On the other hand, human led assessments are costly.

What kind of areas do scanners often struggle to enumerate? Any functionality which requires specific or dynamic input. Consider a car insurance application form (you know, those multipage forms that require specific logical information to be input on each page before it’ll allow you to view the next page) and an automated vulnerability scanner which crawls an application looking for pages and then tests all the functionality on those pages in an entirely automated way. Would that scanner be able to complete the first page of that form with enough accurate detail to be able to get to the second page? If not, then functionality isn’t being tested. What about applications with complex authentication (“Please enter your username and password, and the third letter of your memorable word”)? Many scanners have no functionality to deal with this kind of hurdle.

To be clear, I’m not calling for the abolition of penetration testing. This type of continuous security testing can complement an existing penetration testing framework, whilst addressing shortcomings of that approach such as a lengthy mean-time-to-detection.


In short, a CISO who wants to know about all security issues, in all areas, in prioritised order, is unlikely to receive manageable and timely output from traditional security testing.

So let me leave you with my opening questions:

  • How do you ensure that security issues are correctly prioritised?
  • How do you ensure that testing deliverables are manageable?
  • What is your mean-time-to-detection for new security issues?

It’s established that relying solely on annual penetration testing is insufficient, but does adding quarterly scanning to that solve the problems raised here? I’m not convinced. It’s probably worth taking a look at your security testing plan and seeing how it measures up against the questions raised here.