Files
SharmaDissertation.pdf (5.09 MB)
Towards Data and Model Confidentiality in Outsourced Machine Learning
Author Info
Sharma, Sagar
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=wright1567529092809275
Abstract Details
Year and Degree
2019, Doctor of Philosophy (PhD), Wright State University, Computer Science and Engineering PhD.
Abstract
With massive data collections and the need to build powerful predictive models, data owners may choose to outsource storage and expensive machine learning computations to public cloud providers (Cloud). They may do so because they lack in-house storage and computation resources or the expertise to build models. Similarly, users who subscribe to specialized services, such as movie streaming and social networking, voluntarily upload their data to the service providers' sites for storage, analytics, and better services. The service provider, in turn, may also choose to benefit from ubiquitous cloud computing. However, outsourcing to a public cloud provider raises privacy concerns when it comes to sensitive personal or corporate data. Cloud and its associates may misuse sensitive data and models internally. Moreover, if Cloud's resources are poorly secured, the confidential data and models become vulnerable to privacy attacks by external adversaries. Such potential threats are beyond the control of data owners or general users. One way to address these privacy concerns is confidential machine learning (CML). CML frameworks enable data owners to protect their data with encryption or other data-protection mechanisms before outsourcing and let Cloud train predictive models on the protected data. Existing cryptographic and privacy-protection methods do not immediately yield CML frameworks for outsourcing. Although theoretically sound, a naive adaptation of fully homomorphic encryption (FHE) or garbled circuits (GC), which can evaluate arbitrary functions in a privacy-preserving manner, is impractically expensive. Differential privacy (DP), on the other hand, cannot address the confidentiality issues and threat model of the outsourced setting, because DP generally aims to protect an individual's participation in a dataset from an adversarial model consumer.
Moreover, a practical CML framework must ensure a fair cost distribution between the data owner and Cloud by moving the expensive and scalable components to Cloud while keeping the data owner's costs minimal. Therefore, constructing novel CML solutions that maintain a good balance among privacy protection, costs, and model quality is necessary. In this dissertation, I present three confidential machine learning frameworks for the outsourcing setting: 1) PrivateGraph for unsupervised learning (e.g., graph spectral analysis), 2) SecureBoost for supervised learning (e.g., boosting), and 3) DisguisedNets for deep learning (e.g., convolutional neural networks). The first two frameworks provide semantic security and follow the decomposition-mapping-composition (DMC) process. The DMC process includes three critical steps: 1) decomposition of the target machine learning algorithm into its sub-components, 2) mapping of the selected sub-components to appropriate cryptographic and privacy primitives, and 3) composition of the CML protocols. A critical aspect of these frameworks is identifying the "crypto-unfriendly" sub-components and altering or replacing them with "crypto-friendly" sub-components before the final composition of the CML frameworks. The DisguisedNets framework, however, relies on a perturbation-based CML construction because of the intrinsically expensive nature of deep neural networks (DNNs) and the size of the training images. By relaxing the overall security guarantees and disguising the training images with cheaper transformations, DisguisedNets enables efficient training of confidential DNN models over the protected images. I present formal cost and security analyses for all three CML frameworks and back them with extensive experiments. The results show that the frameworks are practical in real-world scenarios and generate robust models comparable to models trained on unprotected data.
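The image-disguising idea behind DisguisedNets can be illustrated with a toy perturbation: a block permutation derived from a secret seed, applied by the data owner before outsourcing. This is only an intuition-building sketch under assumed names and parameters (`disguise_image`, `block`, a single-channel image); it is not the dissertation's actual disguising scheme:

```python
import numpy as np

def disguise_image(img, key_seed, block=4):
    """Permute block x block tiles of a 2-D image using a secret seed.

    The seed acts as the data owner's key: the same seed reproduces the
    same permutation, so the whole training set is disguised consistently.
    Assumes the image height and width are divisible by `block`.
    """
    rng = np.random.default_rng(key_seed)
    h, w = img.shape
    # Cut the image into tiles, row-major.
    tiles = [img[r:r + block, c:c + block]
             for r in range(0, h, block)
             for c in range(0, w, block)]
    # Secret permutation of tile positions.
    order = rng.permutation(len(tiles))
    cols = w // block
    out = np.empty_like(img)
    for i, j in enumerate(order):
        r, c = divmod(i, cols)
        out[r * block:(r + 1) * block, c * block:(c + 1) * block] = tiles[j]
    return out
```

Because the permutation is keyed by a secret seed, the data owner can transform every training image the same way cheaply (no homomorphic operations), while Cloud trains only on the permuted tiles; this trades semantic security for efficiency, matching the relaxed security model described above.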
Committee
Keke Chen, Ph.D. (Advisor)
Xiaoyu Lu, Ph.D. (Committee Member)
Krishnaprasad Thirunarayan, Ph.D. (Committee Member)
Junjie Zhang, Ph.D. (Committee Member)
Pages
173 p.
Subject Headings
Computer Engineering; Computer Science
Keywords
outsource storage; machine learning computations; public cloud providers; cloud outsourcing; Cloud; privacy concerns; privacy attacks; data confidentiality; model confidentiality; confidential machine learning; CML frameworks; semantic security
Recommended Citations
APA Style (7th edition)
Sharma, S. (2019). Towards Data and Model Confidentiality in Outsourced Machine Learning [Doctoral dissertation, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1567529092809275

MLA Style (8th edition)
Sharma, Sagar. Towards Data and Model Confidentiality in Outsourced Machine Learning. 2019. Wright State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1567529092809275.

Chicago Manual of Style (17th edition)
Sharma, Sagar. "Towards Data and Model Confidentiality in Outsourced Machine Learning." Doctoral dissertation, Wright State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1567529092809275
Document number:
wright1567529092809275
Download Count:
673
Copyright Info
© 2019, all rights reserved.
This open access ETD is published by Wright State University and OhioLINK.