Last week, news broke about “PRISM.” A whistleblower alleges that the US government has direct access to the servers of many tech companies; Microsoft, Google, Facebook, their subsidiaries, and more are listed. Over the next several days, the story softened from “direct access” to just “access.” Now it’s down to the point where they might be talking about National Security Letters, which is almost a non-story. But regardless of what’s true, what’s known, and what’s happening, I wanted to take a look at how big-data collection works.
Believe me, there are people more qualified than me to talk about this, but I wanted to give a breakdown readers would understand. So let’s take the program at what people think PRISM is most capable of: monitoring, in real time, every single phone call, text message, picture message, your cell phone’s GPS coordinates, email, Facebook post, internet search, every map you look at online, and every website you visit. That is a mountain of data, the vast majority of which is worthless. Everybody makes fun of Twitter and says “nobody cares what you had for breakfast.” The government cares even less. Think about it in terms of space. Twitter is a website that lets you post messages that are only 140 characters in length. A sentence, maybe two, at a time. Twitter alone generates about 12 terabytes of Tweets every single day. A terabyte is about 1,000 gigabytes.
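To put that 12 terabytes a day in perspective, here is a quick back-of-the-envelope calculation. It is only a sketch: it takes the 12 TB figure above at face value, uses the same round "1 terabyte is about 1,000 gigabytes" conversion, and the function name is just something I made up for illustration.

```python
# Back-of-the-envelope storage math for one data source.
# Assumes decimal units (1 TB = 1,000 GB, 1 PB = 1,000 TB), as in the post.

def storage_estimate(tb_per_day, days=365):
    """Return (GB per day, TB per year, PB per year) for a daily volume in TB."""
    gb_per_day = tb_per_day * 1000
    tb_per_year = tb_per_day * days
    pb_per_year = tb_per_year / 1000
    return gb_per_day, tb_per_year, pb_per_year

gb_day, tb_year, pb_year = storage_estimate(12)
print(f"{gb_day:,} GB per day")
print(f"{tb_year:,} TB per year")
print(f"{pb_year} PB per year")
```

That works out to 12,000 gigabytes a day and a little over 4 petabytes a year, and that is one website out of thousands.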
I can’t fathom the amount of data and pictures that Facebook is storing, or imgur, or Reddit, Flickr, Digg, WordPress, Blogger, or any of the other thousands of websites. Even comments on blogs and news sites begin to pile up. If you were capturing all of that data (UPDATE: which they’re likely doing by means of a fiber-optic tap, similar to what we called a vampire tap years ago), how could you ever read it all? You can’t. At least not today. That’s why the NSA is building a one-million-square-foot data center in Utah. For now, it will act as a repository. Some day, the world will have enough processing power to go back through and search every bit and byte of stored data for keywords and phrases. But right now, it simply isn’t possible.
So what could “PRISM” be doing, if it exists right now and is already in place at all of these phone companies, tech companies, and internet providers? Right now PRISM would be looking for patterns in “Meta-Data.” Meta-data is the basic stuff. To. From. Subject. Dates and times. It would be designed to highlight certain numbers or email addresses. For instance, let’s say we know the cell phone number of a suspected terrorist. Well, then we could punch that number into PRISM and see all of the calls made to and from that device. Are they listening to every single call? Extremely doubtful. But if they see that a particular number called the suspected terrorist’s number over and over, they can start correlating potential accomplices and other frequent contacts. Even General Michael Hayden says that’s how it works.
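Here is roughly what that kind of meta-data lookup could look like in code. To be clear, this is my own minimal sketch and not how PRISM or any real system works: the call records, the phone numbers, and the `frequent_contacts` function are all made up for illustration. Notice that nothing in it touches the content of a call, only who called whom and when.

```python
from collections import Counter

# Hypothetical call records: (caller, callee, timestamp). No call content at all.
CALLS = [
    ("555-0100", "555-0199", "2013-06-01T09:15"),
    ("555-0199", "555-0100", "2013-06-01T21:40"),
    ("555-0100", "555-0199", "2013-06-02T09:05"),
    ("555-0142", "555-0199", "2013-06-02T13:30"),
    ("555-0100", "555-0177", "2013-06-03T08:00"),
    ("555-0199", "555-0100", "2013-06-03T22:10"),
]

def frequent_contacts(calls, target, min_calls=3):
    """List numbers that exchanged at least min_calls calls with target."""
    counts = Counter()
    for caller, callee, _when in calls:
        if caller == target:
            counts[callee] += 1   # outgoing call from the target
        elif callee == target:
            counts[caller] += 1   # incoming call to the target
    # most_common() sorts by call count, highest first
    return [(number, n) for number, n in counts.most_common() if n >= min_calls]

# The number of interest here is 555-0199.
print(frequent_contacts(CALLS, "555-0199"))
```

With this sample data, only 555-0100 clears the threshold of three calls. The one-off caller, 555-0142, drops out, which is the whole point: the repeat contacts are the ones that stand out.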
I’m far from the tin-foil-hat-wearing conspiracy theorist. In fact, if the government wants to read all of my emails, I couldn’t care less! Have fun! Read away! What concerns me is whatever happens with them next. As I mentioned, the Utah data center’s goal would be to capture and archive all of the internet’s traffic. Right now, we can’t really parse and understand all of that data, which is why only meta-data would be checked. But if they’re storing all of the data, it doesn’t matter what they’re looking at. They have all of the data. What if some hacker group wanted to release all of my private emails to the public? Think about it: publicly traded companies with their business reputations at stake still get hacked from time to time. A government organization with nothing to lose probably isn’t going to secure our data as tightly as someone like eBay! Then there you have it: someone gets into the NSA’s system, and out leak all of your emails, the attached pictures, embarrassing stories, whatever the contents may be.
Or, what if the NSA decides that they have worked for these emails, that now they “own” them, and that they can turn a profit by selling them to insurance companies? Suddenly an email surfaces where I talk about pigging out on a triple cheeseburger, and my health insurance rates go up because I make unhealthy lifestyle choices. Or maybe, just maybe, a pattern that isn’t really there emerges by mistake. It looks suspicious, and this “predictive” system determines that I must be a threat to national security. In a very Minority Report sort of way, I could be charged with conspiring to commit a crime which I’m completely unaware of, simply because I match the criteria and profile of someone who would commit such a crime.
Having data literally warehoused in one sweet, sweet hacker target isn’t appealing to me. The system isn’t perfect. We have too many wrongful convictions, even people put to death for crimes they didn’t commit. Leaving it in the hands of a computer to draw conclusions based on correlation isn’t the best solution. No, the system isn’t perfect, but luckily, the system isn’t even online as we imagine it, at least not yet. As I set out to mention earlier, the only thing they can possibly be doing right now is picking and choosing the data points they want to monitor, and seeing how X interacts with A, B, C, D, and E. But what does the future hold? Maybe now is the time to stop the machine from becoming the behemoth that it is set to become. I’ve said it before and I’ll say it again: if the government wants to read my email, I genuinely don’t care. But I don’t want them storing copies of it haphazardly on external hard drives in the back seat of an employee’s car. I don’t know if the solution is data retention policies to ensure things are deleted, I don’t know if it’s opposing the entire project and calling it a fishing expedition, I don’t know if it’s bowing to our new robotic overlord, but I have started thinking about the future.