Monday, October 22, 2007

Bashing-up a BACKLOG of Malware - ASGS (Automated Signature Generation System)

Dealing with a "backlog" of malware samples is nothing new for a typical AV-company. A backlog typically comprises malware samples that are considered "not-so-important" at the moment or have not made it into the priority sample set. Each AV-company typically assigns its own priority levels to incoming malware samples. It is also well known that AV-companies co-operate and exchange known malware samples with each other (personally, I think this type of co-operation is of utmost importance in the battle against today's malware). It is therefore not uncommon for one AV-company to treat a certain malware sample with higher priority while another treats the same sample with lower priority. As a result, samples already detected by one AV-company might still be waiting to be processed (as backlog) at another. For any AV-company, its current customer base and the prevalence of particular malware in its region typically determine its backlog collection.

Adding to the backlog

My recent visit to the Virus Bulletin 2007 conference in Austria helped establish relations with other AV-companies, who are now partnering with us in exchanging malware samples. These are reputable AV-companies based in India, China, Finland, Austria, Spain, etc., further diversifying our malware collection.

Apart from exchanging samples, we obtain malware samples each day from a multitude of trusted sources and from our customer base. The numbers have been growing constantly over the past two years, contributing to an enormous backlog of malware samples. On a typical day, we could receive anywhere from 2,000 to 3,000 samples.

The need for Automation

The rate at which malware is being "thrown at us" is much greater than the rate at which we (a few malware analysts) can manually analyze samples and add detection for them. Hence, instead of throwing more individuals at the problem, there is a definite need for "Automation", more so today.

The first step in the automation process is to identify which samples in the bulk (waiting to be processed) are "malware" (as reported by other scanners), and then to automate the "signature" generation for detection of these malware samples. This takes away a huge chunk of human interaction or manual work, speeding up the process. A huge challenge in such a procedure is generating "safe" signatures, i.e. signatures with a "close-to-zero" probability of false-positives.
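To make the two ideas above concrete, here is a minimal sketch in Python. It is not the ASGS code (which was written in 4NT and Perl), and everything in it is an assumption made for illustration: the function names, the "at least two scanners agree" threshold, and the idea of a signature as a plain byte string that must not occur in any known-clean file. Real signature formats and safety checks are considerably more involved.

```python
from typing import Dict, Iterable


def flagged_by_consensus(verdicts: Dict[str, str], min_detections: int = 2) -> bool:
    """Return True when enough scanners report the sample as malware.

    `verdicts` maps a (hypothetical) scanner name to the detection name it
    reported; an empty string means that scanner considered the file clean.
    The threshold of 2 is an illustrative assumption, not a known ASGS value.
    """
    detections = [name for name in verdicts.values() if name]
    return len(detections) >= min_detections


def is_safe_signature(signature: bytes, clean_files: Iterable[bytes]) -> bool:
    """Return True when the candidate signature matches no known-clean file.

    A signature is modelled here as a raw byte string; a candidate that
    appears in any clean file would be a false positive, so it is rejected.
    """
    return not any(signature in data for data in clean_files)


if __name__ == "__main__":
    sample_verdicts = {"scanner_a": "Trojan.Example", "scanner_b": "", "scanner_c": "W32/Example"}
    print(flagged_by_consensus(sample_verdicts))
    print(is_safe_signature(b"\xde\xad\xbe\xef", [b"harmless text", b"more clean data"]))
```

Requiring agreement between several scanners is one simple way to keep a single scanner's false positive from feeding straight into a generated signature.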

Enter ASGS (Automated Signature Generation System)

The ASGS took me a few months to implement, with quite a few iterations of improvements and testing. I implemented the system using two of my favorite scripting languages - 4NT and Perl. Most of the iterations were about improving efficiency and minimizing false positives by incorporating extra checks. The system is now fully functional and sits as a Windows XP Virtual Machine image, processing the backlog once a week. As of now, the ASGS automatically generates signatures only for Windows PE files (the most prevalent of malware types on today's Internet, and the larger chunk of the existing backlog), but the intent is to eventually automate signature generation for other file types as well. The system is completely automated and takes extreme care NOT to generate signatures that could cause potential false-positives. In a typical scenario, a malware analyst simply executes a single command line program and the rest is taken care of. Once the signatures are ready, the malware analyst is notified via e-mail and a complete false-positive test is carried out before the signatures are released.
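Since only Windows PE files are handled for now, some filter has to separate PE files from everything else in the backlog. The sketch below shows one common way to do that in Python, based on the documented PE layout: the file starts with "MZ", the 4-byte little-endian value at offset 0x3C points to the PE header, and that header starts with "PE\0\0". The function name is hypothetical, and this is not necessarily how the ASGS itself performs the check.

```python
import struct


def looks_like_pe(data: bytes) -> bool:
    """Return True when `data` has the basic layout of a Windows PE file."""
    # A DOS header is at least 0x40 bytes long and begins with the "MZ" magic.
    if len(data) < 0x40 or data[:2] != b"MZ":
        return False
    # e_lfanew: little-endian offset of the PE header, stored at 0x3C.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    # An out-of-range offset yields a short slice, which simply fails the test.
    return data[pe_offset:pe_offset + 4] == b"PE\x00\x00"


if __name__ == "__main__":
    # Smallest buffer that passes the check: DOS stub pointing at offset 0x40.
    minimal_pe = b"MZ" + b"\x00" * 58 + struct.pack("<I", 0x40) + b"PE\x00\x00"
    print(looks_like_pe(minimal_pe))
    print(looks_like_pe(b"#!/bin/sh\necho not a PE file\n"))
```

A check like this only looks at the headers; it says nothing about whether the file is malicious, packed, or even a valid executable.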

Bashing-up the backlog

Since my initial work developing the ASGS in Nov 2006, followed by several months of tweaking/improvements and testing, the results produced have been quite impressive and satisfying. Initially it was used sporadically to generate signatures for a few thousand samples each week, but it was still not fully automated. An initial "automated" first version of the ASGS (by April 2007) tackled about 20,000 malware samples. By June 2007 I had the second version of ASGS tackle another 25,000 malware samples, and by August 2007 the third version of ASGS was able to tackle about 27,000 more malware samples. It was exciting to be able to come in and have "safe" and "ready" signatures to be tested and released, detecting thousands of pieces of malware. By September 2007 the final version of ASGS went into production and automatically generated signatures for an astounding 35,000 malware samples. The backlog has since been declining. As of today, the backlog stands at a more manageable number, as most of the malware samples left are non-PE files (such as text, scripts, html, Microsoft Office documents, *NIX files, etc.).

Future work

1. Automating signature generation for non-PE files.
2. Automating the analysis process and generating an initial report for suspicious files (or those that are not detected by any other scanners).
3. Integrating "automated analysis" and "automated signature generation" with e-mail honeypots and high-priority alert systems to fight today's growing threats in real-time.
