AV vendors once again claim to be “surprised” by an attack vector they should have expected.
A new complaint is running rife in the antivirus community, this time about PDF and Adobe Reader, the new frontier for viruses, worms and other cyber creepy-crawlies.
Let’s unpack a paragraph from the Avast! blog post “Another nasty trick in malicious PDF”. Following an innocuous quotation from an out-of-date version of the PDF Reference (more on that below), the author writes:
“That’s another surprise from PDF, another surprise from Adobe, of course. Who would have thought that a pure image algorithm might be used as a standard filter on any object stream you want? And that’s the reason why our scanner wasn’t successful in decoding the original content – we hadn’t expected such behavior. To be fair, any data (text or binary) can be declared as an monochrome two-dimensional image – that’s the reason why JBIG2 algorithm works here.”
I’d like to offer two observations.
One – why is this a surprise? Encoding a stream with multiple filters has been common practice since PDF was released in 1993; filter chains are part of PDF and always have been. If virus-scanning software claims to scan PDF files, that claim implies the developer has read the PDF Reference and knows how to parse PDF.
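To make the point concrete, here is a minimal sketch of how a parser might undo a filter chain. It assumes the stream bytes and the names from the /Filter array have already been extracted; `decode_stream`, `ascii_hex_decode` and the `DECODERS` table are hypothetical names for illustration, and only two of the standard filters are handled.

```python
import binascii
import zlib


def ascii_hex_decode(data: bytes) -> bytes:
    """Decode ASCIIHexDecode data: hex digits, optional whitespace, '>' as end marker."""
    body = data.split(b">", 1)[0]                      # ignore anything after the end marker
    digits = bytes(c for c in body if c not in b" \t\r\n\f\x00")
    if len(digits) % 2:                                # an odd final digit is padded with 0
        digits += b"0"
    return binascii.unhexlify(digits)


# Hypothetical lookup table: filter name -> decoder. A real scanner would need
# an entry for every filter in the specification, image filters included.
DECODERS = {
    "ASCIIHexDecode": ascii_hex_decode,
    "FlateDecode": zlib.decompress,
}


def decode_stream(raw: bytes, filters: list) -> bytes:
    """Apply each filter in the order it is listed in the /Filter array."""
    for name in filters:
        decoder = DECODERS.get(name)
        if decoder is None:
            # Failing loudly is the point: an unknown filter means unscanned content.
            raise NotImplementedError(f"no decoder for /{name}")
        raw = decoder(raw)
    return raw


# Usage: a payload compressed with Flate, then hex-encoded, is declared as
# /Filter [/ASCIIHexDecode /FlateDecode] and decoded in that order.
payload = b"app.alert('hello');"
encoded = binascii.hexlify(zlib.compress(payload)) + b">"
assert decode_stream(encoded, ["ASCIIHexDecode", "FlateDecode"]) == payload
```

The design choice worth noting is the `NotImplementedError`: a scanner that silently skips a filter it does not recognise has not scanned the stream at all.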
After all, it’s not as if PDF files are unusual – they’re everywhere, and have been for years! Google counts almost 300 million PDF files online, and there are tens of billions more in banks, insurance companies, government agencies, and yes, on your hard drive too. Given the popularity of PDF for well over a decade, there’s nothing in the PDF Reference that should come as a “surprise”.
Two – I would expect antivirus software developers to consider the possibility that an image filter could be used to encode non-image objects for nefarious purposes. The programmers who write virus detection software need to think like virus writers. The fact that they “didn’t expect” this behavior does not mean there is anything wrong with the file format, but rather that the programmers writing virus detection software failed to anticipate this exploit, and thus failed in their chosen responsibility of protecting the public.
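As an illustration of thinking like a virus writer, the following is a rough heuristic sketch, not a real detector. It assumes the raw bytes of a stream dictionary are already in hand; the function names and the `IMAGE_FILTERS` set are hypothetical, and the “is this an image?” test is deliberately crude (it only looks for the name /Image anywhere in the dictionary).

```python
import re

# Filters designed for image data (assumed list, chosen for illustration).
IMAGE_FILTERS = {"JBIG2Decode", "CCITTFaxDecode", "DCTDecode", "JPXDecode"}


def unescape_name(name: bytes) -> str:
    """Resolve #xx hex escapes, so /JBIG#32Decode is read as /JBIG2Decode."""
    resolved = re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        name,
    )
    return resolved.decode("latin-1")


def names_in(dictionary: bytes) -> list:
    """Return every PDF name (token starting with '/') found in the dictionary bytes."""
    return [unescape_name(n) for n in re.findall(rb"/([^\s/<>\[\]()%{}]+)", dictionary)]


def suspicious_image_filter(dictionary: bytes) -> bool:
    """Flag a stream that uses an image filter but does not declare itself an image."""
    names = names_in(dictionary)
    uses_image_filter = any(n in IMAGE_FILTERS for n in names)
    looks_like_image = "Image" in names   # crude stand-in for checking /Subtype /Image
    return uses_image_filter and not looks_like_image


# Usage: an object stream run through JBIG2 is flagged; a real image is not.
print(suspicious_image_filter(b"<< /Length 120 /Filter /JBIG2Decode >>"))
print(suspicious_image_filter(b"<< /Subtype /Image /Filter /JBIG2Decode /Width 8 /Height 8 >>"))
```

A flag from a check like this is only a reason to decode and inspect the stream, not proof of malice; the filter still has to be implemented for the content to be scanned.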
If they haven’t thought of this type of exploit, what else haven’t they thought of? The author says as much in the last sentence quoted above.
This is only one recent occurrence of antivirus developers “discovering” something about PDF they should have known all along. The reason is simple: While PDF files are everywhere, meaningful awareness of PDF technology is not common amongst software developers. A PDF file is a free-form database file that can contain any type of data, and there are many perfectly legitimate ways to encode data in PDF.
For a start, it’s time the antivirus community bothered to read the most recent version of the PDF Reference. The document quoted in the Avast! blog post is an old Adobe version published in 2006. The current PDF Reference is ISO 32000-1:2008, available officially from ISO and free of charge from Adobe’s website.
When it comes to PDF, we need three things from antivirus people:
- Get familiar with the technology.
- Stop identifying PDF with Adobe! Yes, PDF was invented by Adobe, but as of 2008 PDF is an open International Standard. Lots of developers write software for PDF, not just Adobe.
- Stop referencing older, dated versions of PDF documentation. Read up on ISO 32000.
Antivirus software is a multibillion-dollar industry that thrives (in part) on pushing the latest threat. It’s true that hackers are increasingly attempting to leverage PDF files with evil intent. I applaud the industry’s strenuous efforts to keep us all safe – but that means they need to think a little harder and read a little deeper into PDF.
by Duff Johnson