On Controlling “Your Data”

For years now, we’ve discussed variations of the question “who owns my data?”

And alas, we didn’t get very far, for several reasons:

  • Most of the economic value of data rests not in individual data points, but in aggregated data and the data/insights generated from that aggregate data.
  • It’s not about data ownership. Since digital data can be copied without loss, for all practical purposes the very concept of data ownership takes a back seat to data usage (read: exploitation). You can use data without owning it. And owning a bit of data doesn’t come with a whole lot of control over how it’s used.
  • The data “marketplace” is a farce. There’s no level playing field in which all participants in the data “marketplace” interact at eye level.

Sidebar: I put “marketplace” in quotes because it is a marketplace in name only, totally dysfunctional for the data subjects who are impacted by the use of data about them. I’d go a step further and argue that even the framing as a marketplace is faulty, and should be abandoned entirely when it comes to data about individuals. We need to talk rights, not economics. But I digress. Back to the main argument.

So the real question here is who gets to collect, store, and process (in other words, financially benefit from) data about you, the individual. That, and how to give individuals a real, meaningful choice in the matter.

The other day, I learned about a new organization called Polypoly, and jumped on a call with them since they seem to be attempting something interesting. There are a lot of moving parts and open questions, but the gist of it is this: They aim to build the infrastructure to store individual data in a decentralized way, on users’ devices, rather than on the servers of the highly centralized big tech platforms. There are a lot of shenanigans in the background, both technologically (standards, encryption, permissions, etc.) and organizationally (a co-op, a foundation, a company…). It’s an interesting and ambitious project for sure. It’s also very early stage, so who knows.

The key building block from a strategic perspective is to get so much data into those collective user “pods”, which are built around user permissions first and foremost, that the big platforms would rather have limited access to this giant pool of data than full control of a smaller data pool on their own servers. It’s a giant game of arm wrestling with some of the wealthiest and most powerful companies in the history of the world.

Now, I can’t tell if they’re on to something or not: whether this is possible at all and, if so, whether theirs is the right team and approach to pull it off.

However, I did notice one thing that seems like it might be a problem: The path towards tipping the scales away from the platforms and towards this collective of users is to essentially let users export their data from, say, Facebook into their own pod. Over time, so the theory goes, exporting each user’s data from FB and Amazon and Google and Netflix and all the others will add up to a much bigger data pool, so that it becomes more financially interesting for the FBs of the world to ditch their own data centers and rely on this new infrastructure/data pool.

That part I have a hard time believing in. Big organizations don’t necessarily keep their own infrastructure because it’s cheaper, but because it’s a strategic asset that supports independence, resilience, control. So this new data pool would have to be so *enormously* superior for FB et al. that they would ditch one of their most valuable strategic assets for it. In the meantime, even if there were millions of Polypoly users exporting their data, they would only export copies of their data while the “original” data sets would remain on Facebook’s servers. The result would be redundant, slowly diverging data collections: two instead of one. That, to me, makes this approach a tough sell.

But this post isn’t about an infrastructure co-op. It’s about thinking about data ownership, or rather, meaningful control over data about the individual.

We already covered that data ownership isn’t a very useful framing, and that aggregate data and data generated from aggregate data is where the real action is. Thinking of “owning” “my data” isn’t particularly useful. Instead, we need to be thinking about “how can I make sure there are no negative impacts that stem from data about me”.

Which brings us to the much more structural, fascinating and salient points around externalized costs and societal impact.

Martin Tisné of Luminate and Marietje Schaake of the Stanford Cyber Policy Center are spot on when they write (highlights mine):

“As with CO2, data privacy goes far beyond the individual. We are prisoners of other people’s consent. If you compare the impact of data-driven harms to those of CO2, it becomes clear how impacts are societal, not individual. My neighbour’s car emissions, factory smoke from a different continent, affect me more than my own small carbon footprint ever will.” (The Data Delusion, Stanford Cyber Policy Center)

When it comes to data, the focus on the individual is counterproductive. It’s a lazy argument to avoid regulation, like blaming individuals rather than systemic issues for climate change. Yes, recycling is important, but you can’t recycle your way out of a climate crisis.

When it comes to controlling the impacts of data about us, we need collective bargaining rights; we need infinitely stricter rules for how companies collect and use data about us; and in the meantime we need to protect individuals and communities from being harmed by data collected about them and by permissions given by (or simply taken from) others.

Shutting down large swathes of the data “marketplace” might be the only way to even get to a real discussion of how to rebuild better. That, I imagine, will be a tough sell to a ridiculously powerful industry. But we know the current model isn’t working, and it’s actively damaging to society and individual autonomy in general, and to vulnerable communities particularly.

We need to come up with better models for how we can use data for important purposes without causing the data equivalent of the climate crisis. Thinking about data not in individual but in collective terms is an important step in that direction.

PS. For the Getting Tech Right podcast, I recently spoke with researcher Di Luong about algorithmic bias, public interest tech, and the impact of data-driven systems on vulnerable communities, and she touched upon some of the key aspects of this.

1 Comment

To draw an analogy with climate change: if you don’t cause pollution, you don’t need to clear up the mess. As has been pointed out in Shoshana Zuboff’s book Surveillance Capitalism, we talk too much about data ownership but not enough about data collection, and especially about whether we even SHOULD collect a lot of data about us. Most of the data collection in the digital world would cause an outcry if done in a similar way in the analogue world. So we need to include that part in the discussion as well.
