Data Digest № 004
Welcome to the fourth edition of the Datawallet Data Digest. Given the prevalence of Facebook in our weekly news roundup, we are considering a rebrand to ‘Zuckerberg Digest.’ Feedback is welcome, especially from Facebook staff members. Moving on, here are the main headlines from this week in the data industry.
Mark Zuckerberg proposes a legal framework for the next generation of the internet
This is a longer piece than usual, but I think it’s of immense importance.
Mark Zuckerberg took to the Washington Post with an op-ed that details his ideas for a legal fix for Facebook’s PR department’s, er, the internet’s four biggest problems: harmful content, election integrity, privacy, and data portability. While all four are important, I’d like to focus on the one we have a lot of first-hand experience with: data portability. The degree to which Facebook’s words and actions diverge on this particular subject, elaborated below, may be a good indicator of how sincere Mr. Zuckerberg is about the other three.
Facebook is one of several companies that have founded the Data Transfer Project (DTP), which also includes Google and Microsoft. Here is the official description of DTP as taken from their website:
“The Data Transfer Project was formed in 2017 to create an open-source, service-to-service data portability platform so that all individuals across the web could easily move their data between online service providers whenever they want.
“The contributors to the Data Transfer Project believe portability and interoperability are central to innovation. Making it easier for individuals to choose among services facilitates competition, empowers individuals to try new services and enables them to choose the offering that best suits their needs.”
Data portability is one of the core tenets we are working on at Datawallet. We allow users to extract data from platforms such as Facebook and Amazon, turn the raw data into a standardized output, and upload their existing data into new applications. This way you can use your Facebook data to get started on a new social network immediately instead of re-creating everything from scratch, or use your Amazon data to train the recommendation algorithms of a new eCommerce website without first spending $1,000 there so it can learn about you.
We have developed sophisticated tools that let users extract their data from these platforms without the platforms’ cooperation, but uploading it into new services depends on close coordination between us and the companies receiving the data. DTP’s standardized taxonomy should make it much easier to upload data, which is why we are very excited about it.
In reality, Facebook is saying one thing but doing another. Something we have had to deal with at Datawallet is that Facebook actively changes the structure of its personal data downloads. At the time of this writing, you can download your data from them twice and it will be formatted differently each time, a practice that intentionally hinders data portability. This is in direct opposition to the DTP goal of creating and maintaining standardized data structures to enable data portability.
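To make the cost of shifting formats concrete, here is a minimal Python sketch of the kind of normalization step a portability tool needs. The two export layouts and all field names below are hypothetical, invented for illustration; they are not Facebook’s actual download schema, and this is not Datawallet’s actual code.

```python
from typing import Any

# Hypothetical export layouts: field names are invented for illustration
# and do not reflect any real platform's download format.
EXPORT_A = {"profile": {"name": "Ada Example", "emails": ["ada@example.com"]}}
EXPORT_B = {"user": {"full_name": "Ada Example", "contact": {"email": "ada@example.com"}}}


def normalize_profile(export: dict[str, Any]) -> dict[str, str]:
    """Map either hypothetical layout onto one standardized record."""
    if "profile" in export:
        profile = export["profile"]
        return {"name": profile["name"], "email": profile["emails"][0]}
    if "user" in export:
        user = export["user"]
        return {"name": user["full_name"], "email": user["contact"]["email"]}
    # A layout nobody has written a mapping for yet cannot be ported.
    raise ValueError("unrecognized export layout")
```

Every new layout needs another branch like these, which is why a stable, shared taxonomy matters so much for portability.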
That being said, even with the DTP, companies still store data in central repositories outside of users’ control and can in practice do with it whatever they please, and that’s the real issue with data today. DTP is a good first step, but to truly fix the problem of data control, usage permissions need to be clearly defined by contract and any permissioned data flow needs to be auditable. Finally, computations should happen in users’ own trusted execution environments, not on companies’ servers. With these fixes in place, users can get all the upside of the services that companies deliver to them without exposing any of the underlying data. Unless Facebook is willing to commit to this level of data ownership, we know that we are in the realm of appeasing users and regulators rather than bringing about real change.
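As a rough illustration of what “permissioned and auditable” could mean in practice, the Python sketch below releases only the fields a grant covers and records every release in a hash-chained log, so later tampering with the record is detectable. The names `Grant`, `AuditLog`, and `release` are hypothetical, and this is one possible design under those assumptions, not a description of any existing system.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Grant:
    """Hypothetical usage permission: who may receive which fields, and why."""
    recipient: str
    fields: frozenset[str]
    purpose: str


@dataclass
class AuditLog:
    entries: list[dict] = field(default_factory=list)

    def record(self, event: dict) -> None:
        # Chain each entry to the previous hash so edits break the chain.
        prev = self.entries[-1]["hash"] if self.entries else ""
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self.entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        prev = ""
        for entry in self.entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True


def release(data: dict, grant: Grant, requested: set[str], log: AuditLog) -> dict:
    """Return only the requested fields the grant covers, and log the release."""
    allowed = requested & grant.fields
    log.record({
        "recipient": grant.recipient,
        "purpose": grant.purpose,
        "requested": sorted(requested),
        "released": sorted(allowed),
    })
    return {key: data[key] for key in allowed if key in data}
```

A log like this only proves that the record has not been altered after the fact; it does not by itself stop a company from misusing data it has already received, which is why the point about where computation happens still matters.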
Where in the world is your Facebook data? Probably lots of places
Cybersecurity researchers at UpGuard found that many people’s Facebook data was publicly available via two companies’ Amazon databases: Cultura Colectiva’s had 540 million records, and At the Pool’s had 220,000 records stored alongside the users’ plaintext passwords for At the Pool. These recent examples show how poor data stewardship can have long and far-reaching effects; UpGuard notes there’s a good chance that there are many more databases where people’s Facebook data is just waiting to be found.
Losing Face: Two More Cases of Third-Party Facebook App Data Exposure
FCC talks about protecting your location data
FCC Commissioner Geoffrey Starks wrote an editorial condemning the “pay-to-track” industry that has popped up and boomed with the advent of smartphones and apps. Whether FCC Chairman Ajit Pai agrees and actually does anything is another matter, but it’s nice to see that someone at the FCC is concerned and vocal about data privacy.
FCC commissioner calls for crackdown on sales of phone location data
Credit reports of the future will also use “alternative data”
The major US credit bureaus are trying to improve credit score accuracy and expand credit to lower-income Americans by using information not normally considered, for example: rental payments, asset ownership, public records, and consumer-permissioned data. Whether including this info will help or hurt (or not affect) your score depends on the data, but many believe that incorporating non-standard data gives people outside the traditional credit system a way in. Not everyone is excited about the idea though, as using broader datasets and targeting lower-income individuals raises concerns about potential discrimination and predatory lending — and there’s no guarantee that incorporating these data sources won’t make it harder for some to get credit. It’s a complex topic, so I definitely encourage you to read the Fast Company article if you have the time.
Now wanted by big credit bureaus like Equifax: Your ‘alternative’ data
Ad Industry is upset that California wants to respect data owners’ rights
California state senator Hannah-Beth Jackson is the leading force behind a recently introduced bill that would allow consumers to sue companies that fail to follow the California Consumer Privacy Act (CCPA). The CCPA, which goes into effect in January 2020, requires things such as providing a person’s data when they ask for it (up to twice a year) and not selling a person’s data without their approval. Sen. Jackson has now received a letter from the ad industry warning that the bill “would encourage frivolous lawsuits,” because plaintiffs would not need to prove harm. Wait a second though… harm?! The ad industry wants people to prove harm? Since when is harm required to exercise property rights? And who in the ad industry holds this opinion, you may wonder? Only the Association of National Advertisers, Interactive Advertising Bureau, American Association of Advertising Agencies, American Advertising Federation, and Network Advertising Initiative.
Proposed Change To California Privacy Law Would Encourage 'Frivolous' Suits, Ad Groups Argue
Are tech’s talks about data ethics just posturing?
Not so much news as opinion, but John Naughton shares his thoughts on the tech industry’s habit of talking about data ethics, which Gartner has named one of its top 10 strategic trends for 2019.
Are big tech’s efforts to show it cares about data ethics another diversion? | John Naughton
That’s all for this week. Stay off Facebook, friends!
Serafin