Steve Englehardt and I recently made available our draft paper Online tracking: A 1-million-site measurement and analysis, funded in part by a DTL grant. It is part of the Web Transparency & Accountability Project at Princeton, and it’s the most detailed look at online tracking conducted so far. Among our findings was that the Audio, Battery, and WebRTC APIs in HTML5 are all being abused by third-party scripts for fingerprinting. There’s been some press coverage here and here.
This research is part of a broader, emerging movement for data transparency. Key to the success of data transparency (and the related concept of algorithmic transparency) is combining cutting-edge research with community involvement. In our own work, we faced the research challenge of accurately identifying and attributing different types of tracking and fingerprinting on 1 million websites. But the next step is to use this research to benefit end users, which will require tool building, education, advocacy, and more.
To this end, we’ve made available the privacy measurement tool we built and the data we collected on 1 million sites. We encourage you to explore them and find new uses for them. We also plan to help DTL create visualizations of online tracking and a tool for users to see the tracking they have encountered in their own web browsing. We’re excited about this direction.
Research on data and algorithmic transparency requires a number of sub-communities of computer science to come together: web measurement, privacy & security, machine learning, systems, and human-computer interaction. None of the existing conferences or publication venues is well suited for nurturing this type of research. So Alan Mislove, Nikolaos Laoutaris, and I have been planning an initiative to build a cross-cutting community on transparency research. Expect an announcement shortly!