DTL Talks: Anders Søgaard, DTL 2016 Grantee (Copenhagen University)

10 Apr 2017
Anders Søgaard, DTL 2016 Grantee, presents his project 'Finding Waldo in a haystack of informal writing styles'.

Copenhagen, Denmark


I am a professor at the University of Copenhagen and the Principal Investigator of the Waldo project, sponsored by the Data Transparency Lab.

The Waldo project is a collaboration between the University of Copenhagen, the Technical University of Denmark and Northeastern University. In this project, we are building a web service that lets users upload texts or Twitter handles and returns predictions of where those texts were written.

Geolocation is fairly private information for many people, and most people are not aware of how far their location can be identified from what they post on social media. So what we want to do in this project is build technology that can reliably identify where you come from based on what you write, and make that transparent to end users.

Generally, if you want to locate where people come from, you exploit whatever you can find in that person's social network. Basically, you base or condition your predictions on where their social network peers come from. But when privacy is a concern, people are often less likely to be connected to many people in the area where they live. In that case, text is the main predictor.

So we are doing two things in this project. First, we are writing research papers about which linguistic properties, which features of your language, are predictive of where you come from. Second, we are making this web service available so that people can try it out themselves. You can upload your Twitter handle and see to what extent state-of-the-art models can identify where you come from. The service will also give you an analysis of which words and which linguistic constructions in your writing are predictive of your location.

The web service will be launched in a couple of months, and the research paper should be out later this year.