Files
sixiang_dissertation_ohio_link.pdf (814.2 KB)
Mitigating Distributed Configuration Errors in Cloud Systems
Author Info
Ma, Sixiang
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=osu164912259816919
Year and Degree
2022, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Abstract
While many techniques have been proposed to find configuration errors in software systems, most of them focus on misconfigurations that occur on a single node. Unfortunately, the nature of distributed systems raises a more complex problem: some failures occur only when a system is configured inappropriately across multiple nodes, even though the configuration of each node is correct when considered individually. To distinguish these errors from local configuration errors, which have been widely studied, we call them distributed configuration errors. In this dissertation, we combat distributed configuration errors in two ways: 1) we redesign the system to reduce the chance that an administrator introduces an inappropriate distributed configuration; 2) we use traditional software testing to determine which distributed configurations are unsafe. In the first direction, we focus on timeouts, an important parameter that is hard to configure correctly. We propose SafeTimer, a mechanism that enhances existing timeout-based failure detection protocols to tolerate long delays in the OS and the application: at the heartbeat receiver, SafeTimer checks whether there are any pending heartbeats before reporting a failure; at the heartbeat sender, SafeTimer blocks the sender if it cannot send out heartbeats in time. As a result, as long as network delays are bounded, SafeTimer can guarantee the correctness of failure detection. We applied SafeTimer to HDFS and Ceph with little modification and found that the performance overhead is small. In the second direction, we propose ZebraConf, a testing framework that reuses existing unit tests and integration tests to test whether a parameter can be configured in a heterogeneous manner. To address the challenge of assigning different configurations to different nodes in unit tests, ZebraConf incorporates several heuristics to accurately map configuration objects to nodes. To reduce the large number of tests, ZebraConf profiles unit test suites so that it generates only effective tests, and it groups multiple tests into a single one. We applied ZebraConf to five cloud systems and found 47 heterogeneous-unsafe configuration parameters.
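The two SafeTimer checks described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the dissertation's implementation: the class and method names (`HeartbeatReceiver`, `HeartbeatSender`, `deliver`, `sender_failed`, `may_proceed`) are hypothetical, timestamps are passed in explicitly instead of being read from a clock, and the sender reports whether it would be blocked rather than actually blocking.

```python
import queue


class HeartbeatReceiver:
    """Receiver side: process pending heartbeats before declaring a failure."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.last_seen = 0.0
        # Heartbeats that have arrived but have not been processed yet,
        # for example because the application or the OS was stalled.
        self.pending = queue.Queue()

    def deliver(self, arrived_at):
        # Called by a (hypothetical) network layer when a heartbeat arrives.
        self.pending.put(arrived_at)

    def sender_failed(self, now):
        # Drain the queue first, so that a slow receiver does not mistake
        # its own processing delay for a failure of the sender.
        while True:
            try:
                self.last_seen = max(self.last_seen, self.pending.get_nowait())
            except queue.Empty:
                break
        return now - self.last_seen > self.timeout


class HeartbeatSender:
    """Sender side: stop making progress when heartbeats are overdue."""

    def __init__(self, interval):
        self.interval = interval
        self.last_sent = 0.0

    def send(self, now, receiver):
        self.last_sent = now
        receiver.deliver(now)

    def may_proceed(self, now):
        # A real system would block the sender here; this sketch only
        # reports whether the sender would be allowed to continue.
        return now - self.last_sent <= self.interval
```

In this sketch, a receiver that was stalled while heartbeats queued up still sees the most recent one when it finally checks, so it does not report a false failure; a sender whose last heartbeat is older than its interval is told not to proceed.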
Committee
Yang Wang, Dr (Advisor)
Michael Bond, Dr (Committee Member)
Xiaoyi Lu, Dr (Committee Member)
Kannan Srinivasan, Dr (Committee Member)
Feng Qin, Dr (Committee Member)
Subject Headings
Computer Science
Keywords
Distributed Systems; Cloud Systems; System Configuration
Recommended Citations
Ma, S. (2022). Mitigating Distributed Configuration Errors in Cloud Systems [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu164912259816919
APA Style (7th edition)
Ma, Sixiang. Mitigating Distributed Configuration Errors in Cloud Systems. 2022. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu164912259816919.
MLA Style (8th edition)
Ma, Sixiang. "Mitigating Distributed Configuration Errors in Cloud Systems." Doctoral dissertation, Ohio State University, 2022. http://rave.ohiolink.edu/etdc/view?acc_num=osu164912259816919
Chicago Manual of Style (17th edition)
Document number:
osu164912259816919
Download Count:
289
Copyright Info
© , all rights reserved.
This open access ETD is published by The Ohio State University and OhioLINK.