My name is Arti Ramesh. I'm an assistant professor at the State University of New York at Binghamton, in New York. I'm from the United States, and I work in machine learning, creating relational models that care about predicting over users or objects that are connected. Since our world itself has become connected, with IoT devices, users, social networks, and so on, my models serve all these different kinds of data. There is a dimension of privacy, transparency, fairness, and also interpretability that I would like to add to my models, so this is the general overview of my research.
How would you define Transparency?
Transparency to me is being able to create machine learning models that offer a good explanation for why they are making certain predictions. I'm talking more from a machine learning point of view because I am a machine learning researcher. Machine learning has become very popular: many people use these models, and most of these models lack explainability and interpretability. They are used more as a black box, and we have seen at this conference as well that several people talked about how they are constantly used as a black box. So, we want to be able to use these models to generate predictions that are interpretable, that we can explain. And I think that would help make these models, and the data that these models use, more transparent to the user.
Do companies really care about data transparency? And users?
I think users definitely care about it in the ways that they understand, because most users do not really understand the implications of privacy. Fairness and transparency are things they should care about, but they have a limited understanding of what it means for a system to be unfair or not transparent. I think that if you explain it to them, if they know the implications, they will understand it better and they will care about it more. As far as companies are concerned, I think that so far there has been a great amount of emphasis on creating models or machine learning approaches that have the best predictive power, at some expense of transparency. But there has been some recent work moving toward models that are more transparent in nature, toward approaches in which users are part of the equation, so that they understand the implications and are then able to participate and make things better.
What are the last trends related to Data Transparency?
The last few years have seen machine learning grow in many dimensions. One is huge models that take advantage of computational advancements, such as big deep learning models, which require a lot of memory but, with recent advances in hardware and memory, have become possible to build. The other recent trend is being able to use machine learning across several disciplines, such as health, IoT, and smart cities. These are newer applications of machine learning, but I think what is definitely lacking is creating models that are fair and interpretable, and I think interpretability and fairness go hand in hand. If you can create interpretable models, models that have meaningful combinations of features, or, to put it more simply, meaningful combinations of data, that helps us verify whether the models are fair and transparent, and then change them accordingly to make them fair, transparent to the user, and privacy aware. All those things are possible only when we know what the model is doing, and interpretability is a really big part of that equation. I think the field is definitely progressing toward that, and we will see more of it in the next few years.
Which are your current projects involving data privacy and transparency?
We wrote a grant for DTL, which got a travel grant, which is why I'm here, on identifying privacy leakages in personal assistant devices. That project is about understanding data from personal assistants, such as Google Home or Alexa. These devices are mostly used for short lifestyle tasks: navigation, seeing whether a restaurant is open, checking the weather, smart home controls, finding dinner or takeout options nearby on a Friday night, things like that. And these commands are very revealing of a person's personal information, such as age, gender, and, as a big focus, location. These short commands can be more revealing than long Google searches or other search engine usage, because searches are more targeted, issued when people want to gather information on a certain subject, whereas personal assistant commands are more lifestyle oriented, for the simple, small things people want to do. The project involves identifying this personal information from these commands: how much personal information do these commands actually reveal? We won't know until we actually apply machine learning to see what kinds of commands there are and how they come together. For example, I could buy a stroller, and that could indicate that I have a kid; or I could be a woman buying something for a man, suggesting that I'm married; or the commands could reveal music preferences or dietary preferences, like that I'm a vegetarian or that I like certain foods. All of this is nuanced information about a user, beyond location, age, or gender, and it could lead to much more privacy invasion than the user would like. We saw in a talk today how vegetarianism or other dietary preferences, credit scores, and similar attributes can be very, very damaging. That is what I'm trying to do: see whether these commands come together and reveal information.
What do you think of the data transparency lab?
I think it's a really nice effort, very timely and very much needed indeed, because data transparency is of the utmost importance. There is a huge amount of data being collected, much of it without the users' knowledge, in many, many ways. Most of that data is stored in places where it could be hacked, and it could be accessed by apps or other parties, or even malicious entities, who would use it for the wrong purposes, and that should definitely be stopped. There are lots of different dimensions to data transparency, and I think the Data Transparency Lab brings all of them together in a nice way. While all this collected data is very important, there is also a very significant side to the models, which I was talking about earlier: how the models use the data. That is going to give us real insight into how to prevent this from happening.