SEC disclosure documents as a source of information security intelligence

The information security trade press and blogosphere were atwitter (oh yeah, them too) when Reuters reported in February 2012 that a Verisign SEC filing revealed that Verisign “[…]experienced security breaches in the corporate network in 2010 which were not sufficiently reported to Management”.

By and large, the subsequent focus was on the details, virtually none of which were definitively revealed: When exactly did these breaches occur? How many were there? What systems and data were involved? As I recall, the general assumption was that the admission was due to new SEC guidance requiring disclosure of breach-related risks. While the Reuters report did not ascribe causality, it did note that Verisign’s filing “followed new guidelines on reporting security breaches to investors”. Indeed, Reuters considered the change in SEC guidelines significant enough to undertake “a review […] of more than 2,000 documents mentioning breach risks since the SEC guidance was published”.

So Reuters found more than 2,000 documents mentioning breach risks, and at least one of them was judged newsworthy. The new guidelines were evidently considered quite important, too, since only documents filed after the guidelines were issued were examined.

This got me thinking. By comparing SEC disclosure documents filed before the new guidelines with those filed afterwards, one could, at least in principle, see whether there is any evidence that the guidelines changed what gets reported. One would also get a sense of how important “breach risks” are to the management of publicly traded companies: if they are mentioned in only 1% of filings, that signals far less perceived importance than if they are mentioned in 75%. Additionally, there are information security (or closely allied) risks beyond those relating to a breach, for example those around contingency planning, or the lack thereof. The more I pondered this, the more intrigued I became, especially since nobody seems to have noticed for three months that Verisign had reported significant breaches to the SEC. What other nuggets are hiding in plain sight?
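To make the before-and-after comparison concrete, here is a minimal Python sketch. It assumes each filing has already been reduced to a filing date plus a yes/no flag for whether it mentions breach or cyber risk. The cutoff date is an assumption: it takes the relevant guidance to be the SEC's CF Disclosure Guidance: Topic No. 2 (Cybersecurity) of 13 October 2011. The function name before_after is hypothetical.

```python
import math
from datetime import date

# Assumed cutoff: taken to be the SEC's CF Disclosure Guidance: Topic No. 2
# (Cybersecurity) of 13 October 2011; adjust if a different date is meant.
GUIDANCE_DATE = date(2011, 10, 13)


def before_after(filings, cutoff=GUIDANCE_DATE):
    """filings: iterable of (date_filed, mentions_cyber_risk) pairs.
    Filings dated on or after the cutoff count as 'after'. Returns
    (share_before, share_after, z), where z is the pooled two-proportion
    z statistic for share_after - share_before."""
    tallies = {False: [0, 0], True: [0, 0]}  # is_after -> [mentions, total]
    for filed, mentions in filings:
        tally = tallies[filed >= cutoff]
        tally[0] += int(bool(mentions))
        tally[1] += 1
    (x1, n1), (x2, n2) = tallies[False], tallies[True]
    if n1 == 0 or n2 == 0:
        raise ValueError("need filings on both sides of the cutoff")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    # If every filing (or none) mentions cyber risk, the test is undefined.
    z = (p2 - p1) / se if se > 0 else 0.0
    return p1, p2, z
```

A pooled two-proportion z statistic is only a rough guide here: successive 10-Qs from the same company are far from independent, so a company-level comparison would probably be easier to defend.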

I am going to try to find out, and it promises to be a fun ride. My “plan” (scare quotes used to suggest that it’s more like a “way I think I will proceed”) is to:

1. Obtain all SEC 10-Q documents filed in 2010, 2011 and the first quarter of 2012.
2. Parse them and yank out the section discussing risks.
3. Identify the number of “cyber-risks” and total risks mentioned in each filing.
4. Do some basic analysis of variance in the number of cyber-risks reported, comparing by industry, date, company size and so on.
5. Do some text mining of the corpus of reports. I have to do some of this to find those mentioning cyber-risks, so I might as well carry it further just to see what I come up with.
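The first three steps might look something like the following Python sketch, which rests on several assumptions. It assumes EDGAR's pipe-delimited master.idx layout (CIK|Company Name|Form Type|Date Filed|Filename). It also assumes that the risk discussion in a 10-Q sits under a Part II "Item 1A. Risk Factors" heading and ends at the next "Item 2" heading, and it treats each paragraph in that section as one risk. The keyword list used to flag cyber-risks is invented for illustration, as are the names ten_q_filings, risk_section, count_risks and CYBER_TERMS.

```python
import re
from datetime import date

# Hypothetical keyword list: nothing here defines how a "cyber-risk" is
# recognised, so these terms are illustrative only.
CYBER_TERMS = ("cyber", "security breach", "hacker", "malware",
               "unauthorized access", "denial of service")

# Heuristic headings for the risk section of a 10-Q (Part II, Item 1A).
RISK_START = re.compile(r"item\s+1a\.?\s*risk\s+factors", re.IGNORECASE)
NEXT_ITEM = re.compile(r"item\s+2\.?\s", re.IGNORECASE)


def ten_q_filings(master_idx_text, start, end):
    """Step 1: yield (cik, company, date_filed, path) for each 10-Q row of an
    EDGAR master.idx file filed between start and end (inclusive)."""
    for line in master_idx_text.splitlines():
        parts = line.split("|")
        # Skips header/separator lines and other forms, including 10-Q/A.
        if len(parts) != 5 or parts[2] != "10-Q":
            continue
        cik, company, _, filed, path = parts
        try:
            filed_on = date.fromisoformat(filed)
        except ValueError:
            continue  # malformed date field
        if start <= filed_on <= end:
            yield cik, company, filed_on, path


def risk_section(text):
    """Step 2: return the text between the last 'Item 1A. Risk Factors'
    heading and the next 'Item 2' heading, or '' if there is none.
    Using the last match skips the table of contents, but in-text
    cross-references to Item 1A can still fool it."""
    starts = list(RISK_START.finditer(text))
    if not starts:
        return ""
    begin = starts[-1].end()
    stop = NEXT_ITEM.search(text, begin)
    return text[begin:stop.start() if stop else len(text)]


def count_risks(section):
    """Step 3: treat each blank-line-separated paragraph as one risk
    factor (a crude assumption) and return (cyber_risks, total_risks)."""
    paragraphs = [p for p in re.split(r"\n\s*\n", section) if p.strip()]
    cyber = sum(1 for p in paragraphs
                if any(term in p.lower() for term in CYBER_TERMS))
    return cyber, len(paragraphs)
```

Downloading the index files and filings (they live under https://www.sec.gov/Archives/edgar/full-index/) is left out. Real filings are HTML or SGML and would need converting to plain text before regular expressions like these have much chance of working.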

So far I have done step 1, have been thinking hard about step 2, and have found an R package that somebody smart enough could use for step 5. I hope I’m that somebody; we’ll see.

I’ll be blogging more on this as things proceed, and fully intend to write it up more formally if I am able to obtain useful results.

By the way, using this data source for information security research is hardly a new idea. I quickly found references from 2005, 2007, 2008 and 2010, and that was just by searching where I vaguely remembered seeing stuff.
