I am an assistant professor at Northeastern University in the College of Computer and Information Science, and I’m also a member of the Cyber Security and Privacy Institute there.
What is Recon?
Recon is a tool that allows individuals to see what kind of data is being collected about them as they use their mobile devices. Specifically, we record when information such as your GPS location, your name, or your email address is sent to other parties, like advertisers or other companies that collect data about you. We make this data available to users so they can see what kind of potential privacy problems exist, and we also give users the ability to control how that information is shared. They can set policies that block certain kinds of traffic, or even replace some personal information with other data. As an example, a policy could take a fine-grained GPS location that says you are here in Telefónica headquarters and translate it to say only that you are in Barcelona, so that you still get recommendations for somewhere in Barcelona without having to reveal exactly where you are.
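As a rough illustration of the "replace" policy described above, the following Python sketch rounds coordinate parameters in an outgoing request URL so that only an approximate area is revealed. It is not Recon's implementation: the parameter names `lat` and `lon`, the function `coarsen_location`, and the example URL are all hypothetical, and real traffic encodes location in many other ways.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical policy: query parameter names that are treated as coordinates.
LOCATION_KEYS = {"lat", "lon"}


def coarsen_location(url: str, decimals: int = 1) -> str:
    """Round coordinate parameters so only an approximate area is revealed."""
    parts = urlsplit(url)
    rewritten = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in LOCATION_KEYS:
            try:
                value = f"{round(float(value), decimals):.{decimals}f}"
            except ValueError:
                pass  # leave non-numeric values untouched
        rewritten.append((key, value))
    return urlunsplit(parts._replace(query=urlencode(rewritten)))


# Example: a precise position is reduced to one decimal place.
print(coarsen_location("https://ads.example.com/track?lat=41.38792&lon=2.16992&app=demo"))
```

One decimal place of a degree corresponds to roughly ten kilometres of latitude, which is about the scale of a city. A "block" policy would instead drop the request entirely.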
How did you come up with the idea of Recon?
I started experimenting with mobile devices all the way back in 2011, trying to understand what information was being sent over the network. I started by just recording network traffic traces from my own iPhone, and I immediately saw something that concerned me. I was typing searches into the search bar in the Safari browser, and I could see every letter being sent over the network unencrypted as I typed it. So essentially anybody who was sharing the same Wi-Fi as me at a cafe could see exactly what I was searching for, letter by letter, and of course what we search for tells a lot about us as individuals. For me, one of the main privacy concerns is how we understand what information is being exposed by apps, and then also how we control it. We started from the idea that we should not look for specific information about an individual, such as your name or whether your password is being exposed without encryption, because to do that we would have to get users to tell us what that information is, and we didn’t want users to have to tell us that. We wanted an approach that worked without knowing in advance what the personal information is for a given person, so we used machine learning to learn the context in which personal information is exposed. If we look at network traffic, we’ll see something like “name=” followed by someone’s name. Our system automatically learns these associations between certain content in network traffic and the likelihood that your personal information is in that traffic. Through refinement, we were able to get the system to be very accurate at detecting personal information, fortunately without users having to tell us anything about themselves.
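The idea of learning context rather than values can be sketched in a few lines of Python. This is a toy illustration only, not Recon's classifier: it assumes traffic has already been reduced to query strings, that a small hand-labeled training set exists, and the names `train`, `score`, and the example flows are all hypothetical. It estimates, for each parameter key, how often flows containing that key also contained personal information.

```python
from collections import Counter
from urllib.parse import parse_qsl


def flow_keys(query):
    """Return the set of parameter keys that appear in a query string."""
    return {key for key, _ in parse_qsl(query, keep_blank_values=True)}


def train(labeled_flows):
    """Estimate, per key, the fraction of flows with that key that leaked PII."""
    leak_counts, total_counts = Counter(), Counter()
    for query, has_pii in labeled_flows:
        for key in flow_keys(query):
            total_counts[key] += 1
            if has_pii:
                leak_counts[key] += 1
    return {key: leak_counts[key] / total_counts[key] for key in total_counts}


def score(model, query):
    """Highest leak likelihood among the keys in a flow (0.0 if none are known)."""
    return max((model.get(key, 0.0) for key in flow_keys(query)), default=0.0)


# Hypothetical hand-labeled flows: (query string, contains personal information?)
training = [
    ("name=Alice&v=2", True),
    ("name=Bob&sdk=1", True),
    ("v=2&sdk=1", False),
    ("v=3", False),
]
model = train(training)

# "Carol" never appears in the training data; the flow is flagged by its context.
print(score(model, "name=Carol&v=9"))
```

The point of the sketch is that the score depends on the surrounding key, so a value the system has never seen can still be flagged, which is why users would not need to disclose their own details in advance.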
Who is involved in this project?
This started as a collaboration between Northeastern University, INRIA, and the University of Helsinki. The research has mainly been done at Northeastern, where we’ve developed this tool and provided it to users, so anyone can sign up. Of course, the Data Transparency Lab has also been involved. They provided the initial funding for this project and really helped get Recon off the ground. They continue to provide resources so that we can run our Recon service for the users who are contributing to our experiments. They’re also hosting data that we generate that helps inform individuals about the kinds of privacy concerns that we’ve seen. More recently, the project has also been funded by the Department of Homeland Security in the U.S. to help protect consumers against privacy threats, not only on mobile devices, but also in their homes and in smart connected spaces where there are devices that are capturing information about individuals as well.

The question, then, is what resources are involved in supporting a system like Recon. It’s certainly a non-trivial problem to solve. We have a machine-learning implementation that needs constant attention: as advertisers and other data collection companies change the way that they collect data, we need to make sure that our systems are adapting properly. In addition to that, although our system doesn’t require an app, since we can run it entirely using a VPN proxy, we’re increasingly interested in running it in a number of other environments. We’d like to run it on the device itself as part of an app, so we’ve already collaborated with the Antmonitor team, also funded by DTL, and we’re in talks to collaborate with the Lumen team, which is another DTL-funded project.
And then in addition to that, we’ve implemented Recon on a router. I’ve been very fortunate to have a postdoc, Daniel Dubois, who’s been doing this implementation, and we used it at the Rooftop Film Festival this summer for an interactive where we monitored network traffic for individuals who were attending the event. This is where the Harvest documentary film that uses Recon appeared. We’ve also made this available to individuals and researchers at Telefónica, and our goal is to make it available to people in general.
What was your session at DTL Conference about?
The session was on mobile transparency: understanding what tools are available for improving transparency in the mobile space, which means for phones but also for any kind of Internet-connected device that gathers information about us as we move about the world, whether it’s in our home or in other environments that happen to have Internet-connected devices. One of the themes that came out of this discussion is a growing tension between the way those who are collecting data make it harder for us to understand what’s being collected, and how we might push those companies to be more open about how they’re doing data collection. It seems unlikely that this will happen naturally, so there’s potentially an opportunity for regulation that nudges companies in the direction of being more transparent about the data they’re collecting, or that essentially gives regulators or other parties the ability to analyse in detail what these devices and apps are collecting. So that’s certainly a grand challenge: the tools and technologies that these companies use to protect user information from eavesdroppers are hurting the ability of independent researchers and regulators to tell what’s being collected in the first place.
Another topic that came up is that there needs to be more of an effort towards advertising the tools that exist today, and potentially also having a sort of marketplace where those who need tools for transparency can put out suggestions or advertise their own needs, so that as researchers who work on this topic we can be better informed about the directions we need to take with our tools to meet the needs of those particular users. So I think one concrete outcome of this panel is an opportunity for an organization like DTL to provide this kind of marketplace: a one-stop-shop website that lists all of the data transparency tools, including those funded by DTL and those that have been developed independently, and that allows this interaction between those who develop the tools and those who need tools for transparency.
Is it important to have events like DTL Conference?
I think it’s a great way to bring together a lot of researchers and people from other areas of focus, such as technology and policy, and even journalism, as we did today with our panel. Bringing all of these parties together means that they first of all become more aware of what the new technologies are, and that they can interact and better understand each other’s needs, which can potentially lead to better outcomes in the future in terms of improving transparency, both for individual privacy and at larger scales for regulatory agencies and other organizations that can benefit from these tools. I think this problem of data transparency, particularly with respect to privacy, fairness, and accountability online, is only going to get worse over time. At Northeastern it’s something that we are very focused on. As I mentioned earlier, I’m part of the Cyber Security and Privacy Institute, so we have multiple faculty members working on different aspects of this problem. That ranges from the kind of work that I do with Recon, understanding how privacy is violated by apps and Internet of Things devices, to algorithmic transparency, where my colleagues Alan Mislove and Christo Wilson work, and to other aspects of privacy such as differential privacy, where we have experts such as Jonathan Ullman. We also interact with lawyers and policy experts. For example, we recently hired Woody Hartzog, who’s an expert on privacy law. The hope is that as we bring together this wide variety of experts across different aspects of privacy, we can build more powerful outcomes from this interdisciplinary approach.