We’d all like to believe that anything we do in the privacy of our own homes is truly private. Sadly, this isn’t true. Search engines collect data about the queries we type, the sites we visit, and even our email. How can we possibly believe that any part of our lives is still private? Maybe it’s something we simply accept as part of life in an online world, but the recent AOL screw-up should make us all think again before simply accepting it.
If you haven’t already heard, this past weekend AOL released a list of over 20 million searches recorded between March and May by roughly 658,000 AOL users. Rather than rehash all the details, you can take a look at this article from TechCrunch, which has more and links to a number of other sources of information. And instead of the usual AOL bashing this story is bringing (their image is already pretty bad without me having to dump on them), I want to talk more about the privacy implications.
AOL claims the release was an innocent attempt to reach out to the academic community, and I’ll take them at their word. Still, it’s something that should never have happened, and it should make all of us think about what information is out there that can be used to build profiles about each of us. Do you really want every search you make to be made public? I know I don’t.
I’ve seen the argument that if you have nothing to hide, you shouldn’t care. Fair enough, except that even if I don’t have anything to hide, I don’t necessarily want everything about me made public. Forgetting search engines for a second, we’ve all done things in our lives we’re not always proud of. I have no problem sharing those things with friends and even family, but I don’t feel the need to share that information with my neighbor, or with my boss if I had one. We all have a fundamental right to privacy.
But let’s look at the “nothing to hide” argument in terms of search alone. Who other than the person typing a query into a search engine knows why they want to find that information? If you search for ‘how to make a bomb,’ are you a terrorist or a novelist? Maybe you’re just curious. The truth is there’s absolutely no way to know what your intent was when you typed those words.
The plentyoffish blog presents one of the more frightening search histories released, and admittedly it does seem like someone should be checking out user 17556639. But do we really have any idea why user 17556639 typed those words into a search engine? Maybe the user suspects someone else of the potential crime and is looking for information to confirm those suspicions. Maybe the user had just gotten into a fight with his wife and those searches were a way to let off steam. For all any of us know, some of those searches are the names of songs by an obscure band. I agree the searches look pretty incriminating, but it’s impossible to know for certain why those queries made their way into AOL’s logs.
What happens when searches we perform at completely different times and for completely different reasons are seen by someone else as connected? In a previous post on Google fighting the government’s request for search data, I pointed to an example from a commenter on the MSN Search WebLog. The commenter asked what happens when someone searches for news about the Israeli-Palestinian conflict and later searches for information about armaments in order to better program a video game. Someone might easily come to the wrong conclusion about this searcher’s intent.
Or consider someone unhappy with their current career who spends evenings looking for a job and researching companies where they think they might be happier working. They may not even be all that serious about changing jobs, just considering their options. There’s nothing at all wrong with that, until the search data is released or sold and the person is fired.
And what about the case where the searcher wasn’t even the same person? The searches are tied to an ID, but how many people might use the same computer and be logged into the same account when searching? That’s pretty common in a family or when guests are over. What about less-than-savvy users on a public computer who log in to check email but never log out? Any number of people might be searching through the same toolbar that’s associated with one person.
The only way our privacy could be absolutely guaranteed would be to never record any search data. We all know that’s never going to happen, given that it’s marketing data used to increase revenue. And there are good reasons to collect the information. My main issue is with it being tied to a particular user. AOL may not have released real names, but the IDs could no doubt be traced easily to specific people, and the searches themselves often reveal personally identifiable information.
Removing the IDs, and with them the ability to tie one search to a set of other searches by the same user, would go a long way toward retaining privacy. Aggregating the data to say that 4.78% of AOL users searched for a particular phrase doesn’t really bother me. And while I can see the marketing value of knowing that all of those 4.78% of AOL users also searched for product ‘x,’ that starts to cross into territory where the data can identify specific people. At the very least, though, remove the IDs.
Sadly, the collection of search data and the continued practice of recording it aren’t going to go away, since there’s too much money in it for the search engines. Still, something needs to be done to ensure the privacy of searchers. If not, it won’t be long before searches are regularly released or sold. Profiles will be built on each of us based on our search behavior, and the calendar will turn back to 1984.