Machine Learning Models for Categorizing Privacy Policy Text

Aryasomayajula, Naga Srinivasa Baradwaj

2018, MS, University of Cincinnati, Engineering and Applied Science: Computer Science.
A privacy policy is a legal document that discloses a company's privacy practices to its customers, including how the company collects, uses, and manages their data. The privacy policies of many companies on the web are written in natural language, often with sophisticated vocabulary, and the documents themselves are lengthy. Because of this complexity, end users often skip reading privacy policies or miss vital information in them. As a result, they do not make informed decisions about whether to allow a company to collect their personal information. Making privacy policies more user-friendly would help address this problem. To that end, this thesis uses OPP-115, a privacy policy corpus containing 115 privacy policies annotated with different data practices. The analysis focuses on privacy policy text from the corpus's First Party Collection/Use category. The methods combine linguistic and machine learning techniques applied to the corpus. After observing how text fragments in the corpus behave, this thesis derives a set of features that includes noun phrases, verb phrases, and the relative positions of text, and these features are then used in various supervised learning algorithms. With a bag-of-words representation of the text as the baseline model, the performance of these algorithms using the extracted features is compared across various statistical measures. The supervised learning methods using the features extracted in this thesis outperform the baseline methods.
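The comparison described above, a bag-of-words baseline versus the same text augmented with derived features, can be sketched roughly as follows. This is a minimal, standard-library-only Python sketch under stated assumptions, not the thesis's implementation. The toy segments and the labels `collection` and `purpose` are hypothetical. Multinomial naive Bayes is an assumed stand-in for whichever supervised learners the thesis used. The relative position of a segment within its policy is reduced to three coarse buckets. Noun-phrase and verb-phrase features would require a part-of-speech tagger or chunker and are omitted here.

```python
import math
import re
from collections import Counter, defaultdict

# Sketch only: labels and data are hypothetical; naive Bayes is an assumed stand-in learner.


def tokenize(text):
    """Lowercase and split into alphabetic tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def position_bucket(index, total, buckets=3):
    """Map a segment's index within its policy to a coarse relative-position bucket."""
    relative = index / total  # 0.0 for the first segment, approaching 1.0 for the last
    return min(int(relative * buckets), buckets - 1)


def extract_features(text, index, total, use_position=True):
    """Bag-of-words counts, optionally plus one relative-position indicator feature."""
    feats = Counter(tokenize(text))
    if use_position:
        # Underscores cannot appear in tokens, so this key never collides with a word.
        feats[f"__pos_{position_bucket(index, total)}"] += 1
    return feats


class MultinomialNB:
    """Multinomial naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.class_counts = Counter(y)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, label in zip(X, y):
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        self.totals = {c: sum(self.feature_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, feats):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)  # log prior
            for f, count in feats.items():
                if f not in self.vocab:
                    continue  # ignore features never seen in training
                p = (self.feature_counts[label][f] + self.alpha) / (
                    self.totals[label] + self.alpha * v
                )
                score += count * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F1 for one class treated as positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical toy segments: (text, index within policy, number of segments, label).
TRAIN = [
    ("We collect your email address and name when you register.", 0, 4, "collection"),
    ("We collect device information such as your IP address.", 1, 4, "collection"),
    ("We use this information to improve our services.", 2, 4, "purpose"),
    ("Your data is used to send you marketing emails.", 3, 4, "purpose"),
]

X = [extract_features(t, i, n) for t, i, n, _ in TRAIN]
y = [label for *_, label in TRAIN]
model = MultinomialNB().fit(X, y)
print(model.predict(extract_features("We collect your phone number.", 0, 4)))  # → collection
```

Calling `extract_features(..., use_position=False)` yields the plain bag-of-words baseline, so both feature sets can be trained the same way and scored side by side with `precision_recall_f1` on held-out segments.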
Shomir Wilson, Ph.D. (Committee Chair)
Gowtham Atluri, Ph.D. (Committee Member)
Raj Bhatnagar, Ph.D. (Committee Member)
75 p.

Recommended Citations

APA Style (7th edition):

  • Aryasomayajula, N. S. B. (2018). Machine Learning Models for Categorizing Privacy Policy Text [Master's thesis, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535633397362514

MLA Style (8th edition):

  • Aryasomayajula, Naga Srinivasa Baradwaj. Machine Learning Models for Categorizing Privacy Policy Text. 2018. University of Cincinnati, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535633397362514.

Chicago Manual of Style (17th edition):

  • Aryasomayajula, Naga Srinivasa Baradwaj. "Machine Learning Models for Categorizing Privacy Policy Text." Master's thesis, University of Cincinnati, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1535633397362514