The Internet is full of personal information, much of which we willingly hand over to the big technology companies.
There is also a great deal of data that has, in principle, been anonymized (stripped of any detail that could identify us) and that we have agreed to make public.
Examples include medical information used for research purposes and census information.
Now, researchers from Imperial College London and the Université Catholique de Louvain in Belgium have developed an algorithm that, by cross-referencing this kind of “anonymous” data from public databases, can identify 99.98% of Americans.
The research has been published in the journal Nature Communications, and its authors explain that the algorithm needs only fifteen attributes, such as gender, sociodemographic data, marital status or ZIP code, to achieve this level of accuracy.
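Why a handful of ordinary attributes is enough can be seen with a very small experiment. The sketch below is a simplified illustration, not the researchers' published algorithm: it only counts how many records in a hypothetical toy dataset have a combination of attributes that no other record shares. The function name `uniqueness` and all the data are invented for illustration.

```python
from collections import Counter

def uniqueness(records, attributes):
    """Fraction of records whose combination of the given attributes
    appears exactly once in the dataset (a naive, in-sample count)."""
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)

# Hypothetical toy dataset: no names, only "harmless" attributes.
records = [
    {"gender": "F", "zip": "02138", "birth_year": 1961, "marital": "married"},
    {"gender": "F", "zip": "02138", "birth_year": 1961, "marital": "single"},
    {"gender": "M", "zip": "02138", "birth_year": 1975, "marital": "married"},
    {"gender": "M", "zip": "02139", "birth_year": 1975, "marital": "married"},
    {"gender": "F", "zip": "02139", "birth_year": 1990, "marital": "single"},
    {"gender": "M", "zip": "02139", "birth_year": 1980, "marital": "married"},
]

# Add attributes one at a time and watch the share of unique records grow.
for attrs in (["gender"],
              ["gender", "zip"],
              ["gender", "zip", "birth_year"],
              ["gender", "zip", "birth_year", "marital"]):
    print(attrs, round(uniqueness(records, attrs), 2))
```

On this toy data the share of unique records rises from 0.0 with gender alone to 1.0 once all four attributes are combined: every record is singled out even though none contains a name.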
Beyond the scientific achievement, what the research shows is that anonymizing data is not enough to guarantee our privacy.
This is why, perhaps surprisingly, the researchers have made the algorithm's code public, so that anyone can verify its effectiveness and see for themselves that supposedly anonymous data ends up not being anonymous at all.
The usual way to protect privacy is to “de-identify” individuals by removing certain fields or replacing them with false values.
But this research has shown that these methods are inadequate, said Dr. Yves-Alexandre de Montjoye.
“We have to go beyond de-identification. Anonymity is not a property of a data set, but of how it is used.”
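The weakness of simply removing fields can be illustrated with a classic linkage attack: a dataset without names is joined to a public register that does contain names, using the fields that both share. The sketch below is a hypothetical illustration of that general idea, not the researchers' method; the datasets, the names, the `QUASI_IDS` tuple and the `link` function are all invented for the example.

```python
# Hypothetical "de-identified" medical records: names removed,
# but gender, ZIP code and birth date kept.
medical = [
    {"gender": "F", "zip": "02138", "birth": "1961-07-31", "diagnosis": "asthma"},
    {"gender": "M", "zip": "02139", "birth": "1975-03-02", "diagnosis": "diabetes"},
    {"gender": "M", "zip": "02139", "birth": "1980-11-15", "diagnosis": "hypertension"},
]

# Hypothetical public register (for example, a voter roll) that includes names.
public = [
    {"name": "Alice Smith", "gender": "F", "zip": "02138", "birth": "1961-07-31"},
    {"name": "Bob Jones", "gender": "M", "zip": "02139", "birth": "1975-03-02"},
    {"name": "Carl Brown", "gender": "M", "zip": "02139", "birth": "1975-03-02"},
]

# Fields present in both datasets that are used to join them.
QUASI_IDS = ("gender", "zip", "birth")

def link(deidentified, register, keys=QUASI_IDS):
    """Re-attach names to records whose shared fields match exactly
    one person in the public register."""
    index = {}
    for person in register:
        index.setdefault(tuple(person[k] for k in keys), []).append(person["name"])
    matches = {}
    for record in deidentified:
        names = index.get(tuple(record[k] for k in keys), [])
        if len(names) == 1:  # a single candidate: the record is re-identified
            matches[names[0]] = record["diagnosis"]
    return matches

print(link(medical, public))
```

Here the first record is re-identified as Alice Smith, the second stays ambiguous because two people share its attributes, and the third has no match in the register. With fifteen attributes instead of three, ambiguous cases like the second one become rare.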
Normally, when a security vulnerability is discovered, the standard procedure is to report it to the company or organization that holds the data. But because millions of supposedly anonymous records circulate around the world, the scientists chose on this occasion to make their method public, so that those who trade in this kind of information can make sure, in the future, that the data really is anonymous.
This is a tricky issue: the more thoroughly data is depersonalized, the less useful it becomes for the scientists who want to work with it, yet every piece of stored information, however small, is one more opportunity to be identified.
That is why one possible solution is to control access to this kind of data much more tightly, for example by keeping it in restricted rooms where it cannot be copied or accessed remotely, although in the 21st century such an arrangement seems an anachronism and perhaps a utopia.
Finally, the scientists who developed the algorithm believe that, as always, people should be more aware of the risks they take when they hand over personal information, even when they are assured that it will be used anonymously and only for certain purposes, and that they should read very carefully the clauses specifying what the data will be used for.
In 2016, for example, two people were identified from the web browsing histories of three million Germans, data that had been purchased from a supplier.
Put simply, there is no such thing as anonymous data.