DTL TECHNICAL PROGRAM
The DTL Technical Program is intended to connect the key stakeholders in the Data Transparency Landscape: individuals, researchers, industry players, and policy makers. Our ultimate goal is to ensure that the right awareness-raising tools are available, so that an open and informed conversation about the use of online personal data can take place.
Many of the tools researchers create collect significant amounts of data (datasets), either directly from real users, in controlled experiments, or via automation/crawling. In any case, those datasets can be useful not only to the tools' makers but also to many other researchers.
DTL provides the infrastructure and support to foster a healthy data-sharing ecosystem that enables researchers to publish datasets, so that the entire scientific community can use them for further research.
DTL will take great care that no personal information is contained in any of these datasets and that the right measures are in place to prevent the re-identification of users. DTL is exploring technologies such as Aircloak to expose datasets to researchers in the most anonymous way possible.
Some of those datasets
ReCon Dataset – Northeastern University
ReCon is a system that inspects network traffic to identify personal information leaked by mobile apps. ReCon regularly publishes aggregated information about the personal information (PI) leaks it detects: which apps leak, which types of information, and where the information is sent.
Apart from the dataset, the Data Transparency Lab has built an API on top of it. The API is in alpha, so any feedback is very welcome. The documentation is available on GitHub, along with some introductory slides. The API allows developers to check the number of PI leaks detected for a specific mobile application or sent to a specific domain.
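The two lookups described above (leaks per application, leaks per destination domain) can be illustrated with a minimal local sketch. It does not call the DTL API; the record layout, field names, app identifiers, and domains below are all hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical aggregated leak records. The field names ("app", "pi_type",
# "domain") are assumptions and are not taken from the ReCon dataset or API.
LEAKS = [
    {"app": "com.example.weather", "pi_type": "location", "domain": "ads.example.net"},
    {"app": "com.example.weather", "pi_type": "device_id", "domain": "ads.example.net"},
    {"app": "com.example.chat", "pi_type": "email", "domain": "metrics.example.org"},
    {"app": "com.example.chat", "pi_type": "device_id", "domain": "ads.example.net"},
]


def count_leaks(records, key):
    """Count leak records grouped by the given field (e.g. 'app' or 'domain')."""
    return Counter(record[key] for record in records)


leaks_by_app = count_leaks(LEAKS, "app")
leaks_by_domain = count_leaks(LEAKS, "domain")

print(leaks_by_app["com.example.weather"])  # 2
print(leaks_by_domain["ads.example.net"])   # 3
```

A Counter returns 0 for an app or domain with no recorded leaks, which matches the "number of leaks" style of query without special-casing missing keys.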
Privacy Census – Princeton University
One of the DTL 2015 grantees is developing a 1-million-site measurement and analysis of tracking techniques. This is the largest and most detailed measurement of online tracking to date. It measures stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and “cookie syncing”. Apart from the source code of the tool, the data gathered is available as bzipped PostgreSQL dumps, and the DB schema is also available.
DTL wants to become the reference point for anyone willing to work on any project related to data transparency. DTL encourages cooperation on, and contributions to, any tool that aims to provide more transparency about how data is used.
DTL will host some projects at the DTL GitHub organisation site but will also work with projects hosted on alternative sites if they are aligned with DTL's goals.
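As a rough illustration of working with bzipped dumps, the sketch below streams a bzip2-compressed file line by line using Python's standard bz2 module, without decompressing it to disk first. It assumes the dump is plain-text SQL, which may not hold for the published files; the table name and the tiny sample file it creates are stand-ins, not part of the actual census data.

```python
import bz2
import os
import tempfile


def count_statements(path, keyword):
    """Stream a bzip2-compressed text dump and count lines starting with keyword."""
    count = 0
    with bz2.open(path, mode="rt", encoding="utf-8") as dump:
        for line in dump:
            if line.upper().startswith(keyword.upper()):
                count += 1
    return count


# Build a tiny stand-in dump so the example is self-contained.
# The table name "site_visits" is hypothetical.
sample_sql = (
    "CREATE TABLE site_visits (visit_id integer, site_url text);\n"
    "INSERT INTO site_visits VALUES (1, 'http://example.com');\n"
    "INSERT INTO site_visits VALUES (2, 'http://example.org');\n"
)
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample_dump.sql.bz2")
with bz2.open(sample_path, mode="wt", encoding="utf-8") as out:
    out.write(sample_sql)

insert_count = count_statements(sample_path, "INSERT")
print(insert_count)  # 2
```

Streaming keeps memory use flat regardless of dump size, which matters for a crawl covering a million sites. To query the data properly, the dump would normally be restored into a PostgreSQL instance instead.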
Some of the projects funded by DTL
OpenWPM is a web privacy measurement framework that makes it easy to collect data for privacy studies on a scale of thousands to millions of sites. OpenWPM is built on top of Firefox, with automation provided by Selenium. It includes several hooks for data collection, including a proxy, a Firefox extension, and access to Flash cookies. This open source project will be the basis for the work of one of the grantees, who will develop an easy-to-use tool intended to publish a monthly “web privacy census” on tracking and privacy, comprising 1 million sites.
Appu is a Chrome plugin that helps users keep track of password reuse across multiple websites. Appu notifies the user when a password should not be shared between an important website that might store valuable data and an unimportant website. This open source project is going to be extended by one of the DTL 2015 grantees to also detect any other use of personal information during desktop browsing and to provide ways for users to “clean up” their privacy footprint.
ReCon is software that analyzes network traffic to identify personal information that is being transmitted. It detects device/user identifiers used in tracking, geolocation leaks, unsafe password transmissions, and personal information such as name, address, gender, and relationship status.
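The idea of flagging a password shared between an important and an unimportant site can be sketched as follows. This is not Appu's implementation: the function names, the salted-fingerprint approach, and the sample sites are assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


def fingerprint(password, salt):
    """Derive a salted fingerprint so the raw password is never stored."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, 100_000
    ).hex()


def risky_reuse(entries, salt):
    """Return sorted (important_site, unimportant_site) pairs sharing a password.

    entries: iterable of (site, password, is_important) tuples.
    """
    sites_by_fp = defaultdict(list)
    for site, password, is_important in entries:
        sites_by_fp[fingerprint(password, salt)].append((site, is_important))

    warnings = []
    for sites in sites_by_fp.values():
        important = [s for s, flag in sites if flag]
        unimportant = [s for s, flag in sites if not flag]
        warnings.extend((i, u) for i in important for u in unimportant)
    return sorted(warnings)


# Hypothetical sample data; the salt is a fixed demo value only.
SALT = b"demo-salt-not-for-production"
entries = [
    ("bank.example", "correct horse", True),
    ("forum.example", "correct horse", False),
    ("mail.example", "battery staple", True),
    ("news.example", "hunter2", False),
]
warnings = risky_reuse(entries, SALT)
print(warnings)  # [('bank.example', 'forum.example')]
```

Comparing fingerprints instead of raw passwords means the tool only needs to retain derived values. A real extension would also need a per-user random salt and a way to classify sites as important, both of which are left out of this sketch.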
Tool Curation and Incubation
Many independent developers and research institutes have very good ideas that can be launched as small research projects. However, for those projects to reach the mass market, they need extra help in the form of infrastructure, tooling, or technical support. DTL aims to support some of these tools so they can move from small independent projects to commercial-grade ones.
Additionally, some companies and organisations are already working on tools that shed more light on how personal data is used. DTL welcomes those tools and is willing to support these companies to ensure that the tools reach as many users as possible and provide them with as much value as possible.
You can find the tools DTL is promoting in the tools section.