Data: The New Pollution?

Jack O'Sullivan

November 24 2020

Computers naturally produce data. Also, computers are everywhere – what we still call ‘a car’ is a computer with wheels and an engine – even more so as older ‘non-digital’ vehicles are being removed from our roads.

We produce more and more data because there are strong incentives for doing so, and very little against. For example:

If I own a very popular website, there’s little gain in reducing the amount of data that I produce, store and then perhaps sell to third parties.
If I use a very popular website, I’ve likely signed up because I want to use its services – be it comparing car insurance providers, sharing photos or looking for a romantic partner – and that benefit far outweighs the hypothetical risk of my data being lost at some point in the future. On top of this, many users are unaware that websites are usually designed to track them and their habits.

The right to privacy is hard to enforce when it’s not convenient for all parties involved. Eventually, I hope governments will step in with regulations to enforce the long-term interests of their citizens, and draft international treaties to deal with the fact that data is moved across national borders. Meanwhile, data is produced faster than we can deal with it, while regulation lags behind, slowed by the considerable inertia of government bodies.

Thinking of data as pollution is a useful metaphor, as it helps us to see a repeating pattern in industrial history. Just because pollution is a free by-product of most industrial processes doesn’t mean it’s OK to dump it in the environment. Similarly, the fact that computers naturally produce data doesn’t mean it should all be stored forever or sold for profit.

Companies that kept polluting in the traditional sense, trying to circumvent the law for the sake of profits even when the risks were well known, are now perceived as bad. Eventually they lost value, some disappearing altogether or having to undergo costly PR campaigns. Public opinion shifted and gradually became more aware of the dangers. In all of this, government intervention was pivotal to regulate and enforce the new rules.

Data’s turn to be kept in check

Over the last decade or so, a similar attitude has been growing towards data as the new pollution, and as something in even greater need of scrutiny and control. Wake-up calls like the ones below are only a small sample of an increasing number of voices.

Bruce Schneier on Facebook

Facebook initially was aimed at university students. It opened to the public in September 2006; earlier that year (March) security guru Bruce Schneier wrote:

“The pervasiveness of computers has resulted in the almost constant surveillance of everyone, with profound implications for our society and our freedoms. Corporations and the police are both using this new trove of surveillance data. We as a society need to understand the technological trends and discuss their implications. If we ignore the problem and leave it to the “market,” we’ll all find that we have almost no privacy left.

The common thread here is computers. Computers are involved more and more in our transactions, and data are byproducts of these transactions. As computer memory becomes cheaper, more and more of these electronic footprints are being saved. And as processing becomes cheaper, more and more of it is being cross-indexed and correlated, and then used for secondary purposes.

Information about us has value. It has value to the police, but it also has value to corporations.”

Cory Doctorow on data as nuclear waste

In 2008, Cory Doctorow compared computer-generated data to nuclear waste:

“Every gram – sorry, byte – of personal information these feckless data-packrats collect on us should be as carefully accounted for as our weapons-grade radioisotopes, because once the seals have cracked, there is no going back. Once the local sandwich shop’s CCTV has been violated, once the HMRC has dumped another 25 million records, once London Underground has hiccoughed up a month’s worth of travelcard data, there will be no containing it.

And what’s worse is that we, as a society, are asked to shoulder the cost of the long-term care of business and government’s personal data stockpiles. When a database melts down, we absorb the crime, the personal misery, the chaos and terror.”

Having read these warning shots, and so many more since, why does it appear so little is being done?

The absence of any incentive

As I’ve already suggested, the answer might be that it’s easier and more beneficial to produce and keep more data. And this goes for pretty much anyone, from software manufacturers to end users. As users we don’t perceive an immediate threat, so it’s unlikely we’ll ever complain too loudly.

For software manufacturers

There are direct incentives to produce more data:

Raw data can be sold to brokers; the more data, the more money
Raw data comes basically for free out of any running software

There are incentives to keep as much data as possible:

Storing data is cheaper than any investment required to reduce the amount of data produced
Everything is stored “just in case we need it later”. The perceived value of keeping data far outweighs the perceived risk of losing it in a breach.

There are indirect incentives against reducing the amount of data:

Writing software to filter and anonymise data requires expertise, skill and care
For most scenarios, adding storage space is orders of magnitude cheaper than the man-hours required to reduce the amount of data stored (example: the hourly rate of an experienced database administrator vs. adding more resources to a server)
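To give a sense of what “filtering and anonymising” involves in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the field names, the `ALLOWED_FIELDS` allow-list and the `minimise` and `pseudonymise` helpers are hypothetical, and the timestamp is assumed to be an ISO 8601 string. It drops every field that isn’t explicitly needed, keeps only the date of each event, and replaces the e-mail address with a keyed hash.

```python
import hashlib
import hmac

# Hypothetical allow-list: the only fields this imaginary service needs to keep.
ALLOWED_FIELDS = {"page", "timestamp"}


def pseudonymise(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256), truncated to 16 hex digits."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def minimise(event: dict, key: bytes) -> dict:
    """Keep allow-listed fields, coarsen the timestamp and pseudonymise the user."""
    kept = {k: v for k, v in event.items() if k in ALLOWED_FIELDS}
    if "timestamp" in kept:
        # Assumes an ISO 8601 string; keep the date and drop the time of day.
        kept["timestamp"] = kept["timestamp"][:10]
    if "email" in event:
        # Normalise first so the same address always maps to the same token.
        kept["user"] = pseudonymise(event["email"].strip().lower(), key)
    return kept


if __name__ == "__main__":
    raw = {
        "email": "alice@example.com",
        "ip": "203.0.113.7",
        "page": "/quotes",
        "timestamp": "2020-11-24T10:15:00Z",
    }
    # The key is a placeholder; a real one would be kept secret and stored separately.
    print(minimise(raw, key=b"replace-with-a-secret-key"))
```

Note that this is pseudonymisation rather than true anonymisation: whoever holds the key can still link records back to a person, and the remaining fields may identify someone when combined. Even a sketch this small hides decisions (which fields to keep, how to manage the key) that take more care than simply adding another disk.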

Users

As a user, threats aren’t immediate. You might think:

Nobody is going to guess I’ve used the same password
It’s just my email
What is going to happen even if this data is lost?

Yet we fail to realise not only the immediate privacy implications, but also the consequences this might have in the medium and long term.

A simple example: many people signed up to the adult dating website Ashley Madison using their work or personal e-mail address. When its database was leaked, anyone on the internet could search it and find the names or e-mail addresses of spouses and partners. Imagine that scenario applied to forums or websites for discussing sexuality, gender, or mental health problems.

This is compounded by the fact that the perceived cost of online services is zero, as most draw their revenue from advertisers or from selling their users’ data. The saying goes: “If it’s free, then you’re the product” – meaning that as a product, we have no right to privacy. Ironically, we are expected to be security-conscious and privacy-aware – especially when it comes to minors. End users are often blamed for their poor security choices when, in reality, they are the victims of an overly complex system, combined with incentives against their privacy.

Criminals

There is a price tag associated with personally identifiable information (PII), as it can be leveraged to gain economic benefits illegally, for example by using someone’s PII to open credit lines, mortgages, bank accounts, etc. This makes data appealing to criminals. Reducing the amount of data produced, stored and sold is one way to reduce the risk that it falls into the wrong hands. This, however, is only one side of the coin; we need to design systems where personal data can’t be the only means to identify a person. Some form of identification or a ‘chain of trust’ needs to be in place as well.

Secondary intent

Lastly, data is at risk of being used for secondary purposes in an opaque manner. For example, insurers already offer ‘black boxes’ to put in cars, or phone apps, to reduce fraud by tracking people’s location and habits. With vague terms and conditions (which, sadly, nobody reads) and with the lure of lower premiums, how likely is it that in the near future they might use that information to increase your life insurance premium based on the number of visits to your GP?

While this scenario might sound far-fetched, let us not forget that until ten years ago the idea of willingly donating to a private entity our personal history, location, photos, relationships and political views would have sounded ridiculous. Now it’s called Facebook.

What’s next?

First, we haven’t yet seen the real consequences of a major privacy breach on a national scale, because such attacks are costly to perform and deploy. Elections have been a canary in the coal mine. For example, I’m willing to bet it’s not only possible but also inexpensive to buy the list of people undecided on a candidate, and then target them specifically with advertising designed to elicit an emotional response (also called ‘fake news’) to sway their vote in one direction.

The raw data (voters’ preferences) is available in the first place because the internet business model is surveillance and tracking. Every website we visit, every email we open, every conversation we have, every action we take online is stored, shared and sold in bulk. It’s only a matter of paying enough money to have personalised adverts shown only to the right people. Have you ever compared the adverts you see with the ones your friends see?

Second, given the security posture of most online services (very low), it’s only natural for attackers to go for the low-hanging fruit. All major breaches have been the result of a trivial compromise, or a lack of patches, or related to a human error. There will always be an insecure database somewhere; ironically, this means that most people’s personal data will never be breached with criminal intent, simply because there’s easier stuff to snatch.

Remedies are scarce, and demand effort and discipline: a password manager to avoid re-using the same password; a secondary e-mail address for unimportant services. Consider using different devices (laptops, tablets, phones) for sensitive or confidential activities, such as those involving medical records, finances or online banking.
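The core idea behind a password manager, one unrelated random password per service, can be sketched in a few lines of Python using the standard `secrets` module. The `ALPHABET`, `new_password` and `passwords_for` names are hypothetical and chosen for illustration; a real password manager also encrypts and syncs the vault, which this sketch makes no attempt to do.

```python
import secrets
import string

# Assumed character set; some sites restrict which symbols they accept.
ALPHABET = string.ascii_letters + string.digits + "-_!@#%"


def new_password(length: int = 20) -> str:
    """Build a random password from a cryptographically secure source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


def passwords_for(services: list[str]) -> dict[str, str]:
    """Generate an independent password for every service name."""
    return {name: new_password() for name in services}


if __name__ == "__main__":
    vault = passwords_for(["car-insurance", "photo-sharing", "dating"])
    for service, password in vault.items():
        print(f"{service}: {password}")
```

Because every password is generated independently, a breach at one service reveals nothing about the credentials used anywhere else, which is exactly what re-using a password gives away.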

However, as security awareness slowly improves, privacy breaches related to older, unprotected services will worsen. When people’s lives are directly affected, governments will step up and try to do something. Meanwhile we’re all on our own, fighting an uphill battle.

