ST3P: Store and Analysis of Web Trackers and Privacy Policy Pages
Component | Category |
---|---|
Alessandro Mei | Add reference tutor (Professor or Researcher affiliated with the same Department as the Proposer) |
Privacy policy pages and 3rd party trackers are a gold mine of information. They reveal what kind of data a website collects, which data it shares with other services, how it uses the collected data, and so on.
Privacy policy pages are updated frequently, either because a website changes its policy or because it must comply with new regulations. Since regulations differ from country to country, the same website often serves different privacy policy pages depending on the user's estimated country of origin.
As of now, retrieving and analyzing information from privacy policy pages requires heavy manual effort.
The goal of the St3p project is to build a framework that automatically retrieves, stores, and analyzes privacy policy pages and 3rd party trackers from over a million websites. For each website, St3p will retrieve privacy policy pages and 3rd party trackers at regular intervals and from different locations.
The retrieved data will be used to build a longitudinal dataset, useful for understanding how online companies react to regulatory changes and which personal data each specific business use case collects.
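As an illustration of how such a longitudinal dataset could expose policy updates, the following is a minimal sketch and not the St3p design. The `Snapshot` record, its fields, and the `policy_changes` helper are hypothetical names introduced here. It assumes each retrieval is stored with the website, the vantage-point country, and an ISO 8601 timestamp in a uniform format, and it flags a change whenever the whitespace-normalized text differs from the previous snapshot of the same website and country.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One retrieval of a privacy policy page (hypothetical record layout)."""
    site: str
    country: str      # vantage point the page was fetched from
    fetched_at: str   # ISO 8601 timestamp, assumed uniform so strings sort by time
    text: str         # extracted policy text

    @property
    def digest(self) -> str:
        # Collapse whitespace so purely cosmetic changes do not count as updates.
        normalized = " ".join(self.text.split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def policy_changes(snapshots):
    """Return snapshots whose content differs from the previous
    snapshot of the same (site, country) pair."""
    last_digest = {}
    changes = []
    for snap in sorted(snapshots, key=lambda s: s.fetched_at):
        key = (snap.site, snap.country)
        if key in last_digest and last_digest[key] != snap.digest:
            changes.append(snap)
        last_digest[key] = snap.digest
    return changes


snapshots = [
    Snapshot("example.com", "IT", "2021-01-01", "We collect email."),
    Snapshot("example.com", "IT", "2021-02-01", "We collect  email."),
    Snapshot("example.com", "IT", "2021-03-01", "We collect email and location."),
    Snapshot("example.com", "US", "2021-03-01", "We collect email."),
]
changes = policy_changes(snapshots)
```

Keying on the (site, country) pair keeps policies served to different countries from being compared with each other, which matters when the same website serves different pages by location.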
Carrying out these analyses on such a large volume of data requires automating the information extraction process. St3p's engine must therefore be powered by machine learning models able to extract meaningful information from the pages.
Finally, we plan to leverage the processed information to automatically verify two things: the compliance of the privacy policy pages with current regulations, and the correspondence between the 3rd party services declared in the privacy policy pages and the 3rd party services the website actually embeds.
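The correspondence check between declared and embedded 3rd party services can be sketched as a set comparison. This is a minimal illustration under stated assumptions, not the project's method. The function names are hypothetical, the declared services are assumed to have already been extracted from the policy text as hostnames, and 3rd party detection uses a naive hostname comparison where a real system would more likely compare registrable domains.

```python
from urllib.parse import urlparse


def third_party_hosts(page_url, resource_urls):
    """Return the hosts of requested resources that do not belong to the page's own host.

    Naive assumption: a host is first-party if it equals the page host
    or is a subdomain of it.
    """
    first_party = urlparse(page_url).hostname
    hosts = set()
    for url in resource_urls:
        host = urlparse(url).hostname
        if host and host != first_party and not host.endswith("." + first_party):
            hosts.add(host)
    return hosts


def compare_declared_vs_embedded(declared, embedded):
    """Split 3rd party hosts into undeclared, declared-but-not-observed, and consistent."""
    declared = {d.lower() for d in declared}
    embedded = {e.lower() for e in embedded}
    return {
        "undeclared": embedded - declared,
        "declared_not_observed": declared - embedded,
        "consistent": declared & embedded,
    }


embedded = third_party_hosts(
    "https://example.com/",
    [
        "https://example.com/app.js",
        "https://cdn.example.com/lib.js",
        "https://tracker.test/pixel.gif",
        "https://ads.test/banner.js",
    ],
)
report = compare_declared_vs_embedded({"tracker.test", "analytics.test"}, embedded)
```

Here the `undeclared` set is the most relevant for compliance, since it lists services the website embeds without mentioning them in its policy.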
Together, these are the goals of the St3p project proposal.