[BSides Talk] Offensive Anti-Analysis

Brief: A talk about options advanced attackers can deploy to beat behavioural malware analysis through the detection (and subversion!) of the behaviour engines themselves, including a demonstration of how to beat modern engines with a working tool (demos!). This talk should be interesting to malware writers and analysts alike, as it shows working implementations of analysis evasion, but also includes enough inline explanation to make it accessible to beginners.

I wrote down rough speaker notes and made the slides available here:

Heya Guys, my name is @HollyGraceful and I’m the author of that pink information security blog GracefulSecurity.com. Today, I’d like to share with you a little about a topic that I don’t blog about. My site is filled with articles, tutorials and information about how to break in to computer systems (that’s my day job).

Today though, I want to talk about something that I call Offensive Anti-Analysis; fundamentally we’re going to talk about malicious software development. Yeeeeah, not analysis, but development. This talk will be centred around anti-malware evasion, concentrating on the most naive methods of evasion I could find. It seems that evasion is often a lot easier than we might expect.

A recurring issue throughout this talk is “What does malicious look like?” Define malicious. We’re going to talk about evading detection and manipulating scanning engines – but for the most part, don’t blame the vendors for that. Remember that the red team has the advantage, and that defence and detection are difficult.

So – How do you define malicious? What does malicious look like? Well if I wanted to know what a horse looked like I would say, “Okay Google, what does a horse look like?”

Perfect! Yes, definitely six good examples of horses. Well this seems effective.

So now I’m on stage talking about anti-virus evasion and computer hacking. I work as a Penetration Tester, a professional hacker. Let’s try another one. “Siri, what does a hacker look like?”

Oh…as it turns out, relying on stereotypes and generalisations isn’t so accurate. It gives a lot of scope for missing the things that don’t quite fit. I guess this is the same reason as to why social engineering is so simple: “Heya, my password isn’t working, can I borrow yours?” Then the person does a mental lookup to determine if I look like a hacker, or a burglar and since I don’t look like that, well…”Oh what’s the worst the tiny blonde girl can do? Sure, it’s Password1.”

Now, talking about anti-virus evasion means there are ethical concerns in this talk. For those less familiar with information security our industry is separated into the red team and the blue team. Terms that hail back to the military with BLUFOR and OPFOR. Blue team are the defenders and red team play the role of the enemy during training engagements.

As a member of the red team, although I play the part of the attacker, my purpose is still to improve the overall security of the system through the reports and recommendations I produce. That’s where the ethics line is: red team work must benefit the blue team more than it benefits malicious attackers. So I’ll be going into technical detail for a lot of areas here, and even releasing proof-of-concept code where appropriate – but no, I’m not going to drop a botnet implementation with full evasion capabilities on the last slide of this talk. I am going to talk a lot about some of the problems with information security as it currently stands, though.

Now I’m here to talk about malware development, but also about wider frustrations within InfoSec: where fixes and improvements exist but aren’t being implemented, and where certain tasks that malicious attackers must undertake aren’t as difficult as we might think they are – and maybe they should be.

But, starting at the beginning of my story, my relationship with malware is simple: I really like malware. I work as a penetration tester, so “malware” and “hacking tools” are a requirement for my job. I need them to be effective.

The job of anti-malware is a difficult one: to detect malicious you must be able to define malicious, and it’s not quite as simple as asking Google, or spotting the “red code” as the documentary CSI: Cyber might have you believe.

Anti-malware systems have had a long time to get good at this though. You know it’s not just going to fall foul of naive bypasses…

But, saying that, Ransomware is still a thing. A major problem.

Pause. 1971, the first computer virus, “Creeper”, is released. Creeper moves between PDP-10 mainframes on the ARPANET. With “Reaper” close behind it, cleaning it up. The first malware and anti-malware.

1986 brings us “Brain”, the first malware for PCs. 1989 brings us the “AIDS Trojan”. It encrypts files with symmetric key cryptography and then demands a ransom. 1989 – we have the first ransomware. What happens in 1990? I am born. Malware, and even ransomware, have been a problem for longer than I have.

1996 brings us ransomware utilising asymmetric key cryptography, with RSA and TEA being used to make a more potent ransomware (one that malware analysts can’t just trivially extract the decryption key from). Take malicious, add a dash of bitcoin and a little asymmetric cryptography and you’ve got a recipe for profit – with CryptoLocker netting a supposed $3,000,000 and CryptoWall gaining $18,000,000. How does 18 million sound? Yeah, I said this talk was ethically challenging.

Ransomware makes people money and it affects everyday users – and it’s 27 years old. Why doesn’t anti-malware just stop it? Because it’s a hard problem.

There are two types of scanning engine. Vendors might tell you that there are six or seven, but that’s like the seven signs of ageing: really there are just signature engines and behavioural engines. It’s fairly well known that signature scanners can be trivially bypassed. That doesn’t mean they’re useless though; if a signature scan is less computationally expensive than a behavioural scan then it’s favourable – it still has a place. A detection on the signature scan saves you from having to run the behavioural scan.

Now, I don’t want to skip over any steps, so I’ll just note that I’ve previously talked about bypassing anti-virus, and I’ll give the TL;DR here. Most people I talk to about developing evasion techniques think that it’s something crazy difficult – because anti-virus has a head start of several decades. Yes, anti-virus evasion can take some next-gen skills, and sometimes I’m up to my elbows in assembly working on a new evasion technique – but sometimes the most basic stuff works. Today, I want to talk about the naive stuff; I want to talk about how evasion can be effective without any real skill – because it’s that part that’s terrifying to me.

Sometimes it takes next gen skills and sometimes you just rename a file and your detections drop.

Now, I didn’t quite just rename the file, I renamed the file and also modified the executable with a hex editor to replace the “Original Filename” in the PE structure. As far as I’m concerned that’s basically the same thing, but surprisingly my detections dropped from 25 to 20, that’s a drop of 20%! On the journey to zero we’re off to a good start. Many of the evasions that I’ve played with really are that simple.

A behavioural engine would never fall for this; at this point we’re bypassing the signature engines. So why don’t the behaviour engines kick in and take over the detection? Well, that’s the thing with VirusTotal: the main scanner page doesn’t necessarily reflect the full capabilities of each named scanner. VirusTotal isn’t necessarily deploying every behavioural component of every scanner on its list.

Don’t believe me? Well they don’t hide it. It’s in their FAQ:

So at this point really we’re just bypassing signature detections, plus maybe a behavioural engine or two – but certainly don’t be misled into believing we’re bypassing all behavioural engines here; neither VirusTotal nor I are claiming that. However, signature evasion is a good place to start. A quick TL;DR of my previous post: encoding is a waste of time and will probably cause your detections to go up, auto-erotic exploitation is a joke and one of my favourites, but the one that I personally find the most effective is the well-known technique of “crypting”.

Crypting is the art of pulling out the meat of an executable, encrypting it, adding a decryption stub and then sticking it all back together. The idea is that by encrypting the payload you remove the possibility of being detected by a signature engine (other than in the stub itself, of course – but that’s a small piece of code to work on, rather than having to mess around with the whole EXE).

In my experience, if this is well written it’s effective in bypassing all scanners – however I said that we’re concentrating on naive evasion, it’d be unfair for me to stand up here and go – “Yeah just rename a file and then write 200 lines of Assembly”. Thankfully someone’s already done that for you.

These will get you started, the first is a working packer which uses compression, the second is a nine-part tutorial on writing a crypter and the third is a leaked crypter implementation. Why is this important? Because the barrier to entry is lowered significantly when effective implementations like this are available.

But are they really effective? These implementations are either old or well known, surely they’re detected! Well now, take a step back. Yes these methods are detected out of the box and in fact writing a crypter in the way that I’ve described would be eaten alive by a behavioural scanner – but surprisingly we can take these old and known implementations and make tiny nudges to them and suddenly we’ve got complete evasion.

One of the reasons that a behavioural scanner would eat us alive is that, although the tool is now encrypted at rest, as soon as it is executed it decrypts its payload – and the scanner can see its true intentions. One naive way around this is to introduce stalling code into your executable to delay the decryption until the scanner stops executing. Most scanners I worked with scanned for about 45 seconds before deeming an executable safe and letting it through.

Scanners aren’t stupid though: if you stick a “sleep 45” at the start of your code then they’re going to skip over it; that won’t work. Instead, what if you encrypted your payload but didn’t include the key? That way the scanner can’t simply mess with the execution environment to skip the stalling code. Effectively, you have your packer brute-force its own key at run time! I took one of the example implementations from the previous slide and implemented this technique – in about 10 lines of code – and got an effective reduction in detections.

However, naive evasion brings naive detections. If you encrypt your payload you’ll prevent the scanner from seeing its intentions, but an encrypted PE section has higher entropy – so the scanner can flag the fact that the entropy is higher than usual. The naive evasion for this? Add NULL bytes – two lines of code – and suddenly your detections drop further!

Just a couple of lines of C and we’re down to like 12 detections.

Well that’s down to less than 50% of the original detections without any “next gen” skills. However 12 detections? Take a look:

Now, this is custom code that I’m deploying here, yet six of the detections have somehow come up with the same name for the detection. How is this possible? Well, that’s because they’re all the same engine – so I don’t have 12 engines left to evade; the number is much smaller than that. Not convinced?

F-Secure doesn’t hide it, this is them acknowledging that they license and use technology provided by BitDefender.

So signature engines suck, we know that, but maybe you didn’t appreciate just how little effort you could put in to avoid these systems. However, at this point, instead of continuing on the current adventure – which was little more than making arbitrary changes and seeing if the detections dropped – I started to approach the engines in a different way.

Instead of operating blindly I wanted to map the internals of the scanning engines, so I could teach my EXE that if it executes and recognises the environment as a scanner then it should not perform a malicious action – thereby circumventing the scanner.

Now, there’s an argument that says malware won’t detect virtualised systems and avoid them in the name of avoiding detection, because so many servers are virtualised these days that the malware would miss too many targets. Secondly, people say that attempting to detect whether you’re in a virtual environment is itself reason enough to flag you as malicious – fair points.

However, in this case I’m not looking at detecting virtualisation and avoiding it; instead I’m looking at detecting scanners and avoiding them. Malware has been seen detecting scanners like this in the wild already!

The approach I took was simple: map the internals of each scanner engine and use any unique features as an indication that I’m in a scanner – using naive aspects of a system such as files that exist, registry keys, usernames, uptime, installation date, etc.

To extract the information I had a few options. If you approach the problem as a penetration tester would, then exfiltrating data comes down to two approaches: in-band, which is simply attempting to embed data within the standard interface of the application; and out-of-band, which is to use some other communication protocol to extract data, such as raw sockets, ICMP traffic, or DNS requests with data embedded within them.

I wrote two tools to achieve this, the first tool simply automated the act of exfiltrating data from the systems – allowing me a large amount of data to look through for unique features. The second tool was a proof-of-concept for determining if execution is occurring within a virtual environment or a scanner.

The TL;DR is that I could detect these environments accurately without utilising any difficult programming concepts, simply by checking things in the above list – files that existed and uptime were the two methods I mainly used, and they’re certainly simple methods.

Turns out I could not only detect if I was virtualised on real world engines, but what type of virtualisation these systems were using:

I could also not only detect if I was being scanned, but which scanner I was in:

Furthermore the naive detection was so accurate that I could even detect which node of the scanner I was in. I could differentiate between scanner nodes! (I even named them so that I could keep track of the small differences between nodes)

The end result of all of this work is that it does not take much technical skill to detect scanning engines. By detecting a scanner engine, a piece of malware can perform malicious actions only when it is “safe” to do so; when it is being scrutinised by a scanner it can perform some other benign activity to appear non-malicious. Effectively, as an attacker I can choose what the sysadmin sees when he tests my malicious software…

Here you can see my tool accurately detecting that it’s running within the VirusTotal scanner. Also note that it says “this is fear” – this line refers to the names I gave to the nodes, so it’s just letting me know that it knows which node it is in. So – instead of stealing passwords, since we’re being watched by a behavioural scanner, the tool just outputs some benign lyrics to entertain the sysadmin testing our payloads.

I wrote to Google (who own VirusTotal) and informed them of the problems that I’d found and the fact that I could arbitrarily bypass their scanner in this way – their response?

So the ability to bypass the behavioural scanner isn’t a security vulnerability according to Google – my opinion is that it’s certainly concerning. However this issue doesn’t only exist in this one scanner – the others that I tested all had the same issue.

So is anti-virus dead? Not at all, these scanning engines, just like signature analysis, still have a place – they still help out and give us indicators. However maybe treat the output of these tools just as an indicator, not a perfect and complete answer.

Be aware that malware is investing time and effort in to not only evading endpoint scanning engines but also looking at detecting systems like VirusTotal and the Cuckoo analysis system. Remember that ransomware (and ICS malware) can afford to detect and avoid virtualisation systems and are actively looking for scanners and evading those too.

Finally, if you’re shocked by the fact that malicious software can deploy such trivial evasion techniques, then join the arms race and take a look at beating some malware through malware analysis. It’s not something that I’ve personally covered much on this site, so maybe give http://www.malwaretech.com/ and http://blog.malwaremustdie.org/ a visit. There are some good books out there too, such as Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software.