Haojian Jin, Minyi Liu, Kevan Dodhia, Yuanchun Li, Gaurav Srivastava, Matthew Fredrikson, Yuvraj Agarwal, and Jason I. Hong
Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies (IMWUT)
Many smartphone apps collect potentially sensitive personal data and send it to cloud servers. However, most mobile users have a poor understanding of why their data is being collected. We present MobiPurpose, a novel technique that can take a network request made by an Android app and then classify the data collection purposes, as one step towards making it possible to explain to non-experts the data disclosure contexts. Our purpose inference works by leveraging two observations: 1) developer naming conventions (e.g., URL paths) often offer hints as to data collection purposes, and 2) external knowledge, such as app metadata and information about the domain name, are meaningful cues that can be used to infer the behavior of different traffic requests. MobiPurpose parses each traffic request body into key-value pairs, and infers the data type and data collection purpose of each key-value pair using a combination of supervised learning and text pattern bootstrapping. We evaluated MobiPurpose’s effectiveness using a dataset cross-labeled by ten human experts. Our results show that MobiPurpose can predict the data collection purpose with an average precision of 84% (among 19 unique categories).