Introduction
The collecting of malware has always been an interesting hobby, or fetish, as some have called it. It is part science, part voodoo magic, and a lot of sharing with friends and partners. It has been a long and involved process for us since we set out on the path to what Shadowserver has now become. Our initial reason for even collecting malware was as a means to find botnets and be able to track and report on them. It has grown into a need to map out criminal infrastructures, better understand campaign targeting, automatically extract configurations, and have sources to help drive attribution.
To gain that understanding we have developed a large amount of analysis technology, both static and dynamic. But to feed that engine we still need files – more specifically, malware, and lots of it.
History of Collecting
The growth in the rate of acquisition of new malware samples somewhat mirrors Shadowserver’s own growth as a non-profit foundation:
- 2005-05-25 – First Binary
- 2007-05-27 – One Million
- 2007-07-27 – Two Million
- 2007-09-24 – Three Million
- 2007-10-03 – Four Million
- 2007-10-31 – Five, Six, and Seven Million
- 2008-01-01 – 10 Million
- 2011-06-29 – 100 Million
- 2016-01-01 – 500 Million
- 2018-12-07 – One Billion
I remember very well the conversation that occurred in 2007:
- Chief Architect – Hmm, we just hit 1M binaries.
- freed0 – Cool beans! Think of everything we can do with that. 1M… soooo cool!
- Chief Architect – What do you think the growth rate will be? How will we store them?
- freed0 – Oh, I am sure it will take us another year to double that number. We should have plenty of growth time.
Two Months Later
- Chief Architect – Hmm, we just hit 2M samples. I thought it would take another year?
- freed0 – Nice! Growing better than I thought. So exciting.
- Chief Architect – Umm, how are we going to keep storing these? Can we throw them all away after we process them?
- freed0 – Noooooo – save them all!
- Chief Architect – And for how long?
- freed0 – FOREVER!
- Chief Architect – No
- freed0 – Yes!
- Chief Architect – No!
- freed0 – YES BABY!
- Chief Architect – Sigh… okay
Less than Two Months Later
- Chief Architect – Excuse me, it has gone up again
- freed0 – Wow, we better start figuring out how to store this better
- Chief Architect – Can I throw any away?
- freed0 – No way!
Less than One Month Later
- Chief Architect – Umm, we are now up to 7M samples….
- freed0 – [Insert your favorite profanity here] Yeah…. about that… we need faster analysis to keep up, and well, more storage
- Chief Architect – What is my budget?
- freed0 – [checks pocket] Lint?
- Chief Architect – Sigh
The Story Continues
So when we got to our first One Million Binaries milestone, the sound of Dr Evil’s voice rang in our ears and our pinkies twitched. We giggled for a long while and were very excited, that is, until the numbers did not stop growing. And they grew at a rate that became not only astounding, but also frightening. What were we going to do with them all? Even the thought of continuous storage was quickly getting painful, and then there was all the needed analysis too. How would we ever keep up? These were all questions that had to have an answer, and quickly. They also had to be answered more than once, as our malware collection quickly outgrew every growth plan we came up with.
I will admit that the same sort of excitement hit us when we recently hit the One Billion Binaries milestone. The same Dr Evil voice was in our heads, but the giggling did not last nearly as long, because we needed to think about the same questions all over again.
Some Answers
Well, we were still able to keep all of those billion binaries. But it was extremely painful and continues to be so, particularly for a constantly cash-short non-profit foundation. How did we deal with those One Billion Binaries? Well, that is another story….
And, if you want to share malware with us, please let me know. We still want more, MUCH more… MANY MORE
- Chief Architect – [smacks freed0]