Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems

2018, Master of Science, Ohio State University, Industrial and Systems Engineering.
This thesis captures patterns and challenges in anomaly response in the domain of web engineering and operations by analyzing a corpus of four actual cases. Web production software systems operate at an unprecedented scale today, requiring extensive automation to develop and maintain services. The systems are designed to regularly adapt to dynamic load to avoid the consequences of overloading portions of the network. As the scale and complexity of the software systems grow, it becomes more difficult to observe, model, and track how the systems function and malfunction. Anomalies inevitably arise, challenging groups of responsible engineers to recognize and understand anomalous behaviors as they plan and execute interventions to mitigate or resolve the threat of service outages. The thesis applies process tracing techniques to the corpus and extends them to capture the interplay between the human and machine agents when anomalies arise in web operations. The cases were elicited from expert practitioners dealing with anomalies that risked saturating the capacity of the system to handle continuing load across multiple elements in the interconnected network. The analysis is based on the Above the Line / Below the Line (ABL) framework, which distinguishes the parts of the system above the line of representation (the human engineers and operators) from the automated processes and components below the computer interfaces. The analysis of the incidents directly links the cascade of disturbances below the line with the cognitive work of anomaly response above the line. Recorded digital text-based communications served as the artifacts used to construct several new representations of the cases from two perspectives: 1) tracing the evolving hypotheses from anomalous signs and interventions, and 2) charting the four basic coping strategies used to mitigate overload.
The hypothesis generation timelines for the cases supported findings about the importance of updating mental models during the incident response, and showed that this updating happened explicitly and frequently among the distributed engineers. Diverse perspectives expanded the hypotheses considered and beneficially broadened the scope of investigation. Strange loops and weak representations focusing on primitive event changes hindered observability for the responders. Effects at a distance from their driving source, a natural consequence of the network complexity, also complicated hypothesis exploration. Furthermore, the response timelines demonstrated the tendency of the automation to respond with tactical, local strategies, whereas the humans used a variety of mostly strategic responses to fill in the gaps left by the automation. New forms of tooling and monitoring could be designed to support diagnostic search across functional relationships, as well as broaden awareness of the hypothesis exploration space. The Above the Line / Below the Line (ABL) framework provided a beneficial frame of reference for analyzing the relevant parts of the system in each incident and could be the basis for future work in decision support tool design. The case study research demonstrated specific and general patterns in how autonomy in complex web operation systems complicates the cognitive work of anomaly response.
David Woods (Advisor)
Michael Rayo (Committee Member)
129 p.

Recommended Citations

  • Grayson, M. R. (2018). Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1543495231467142

    APA Style (7th edition)

  • Grayson, Marisa. Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems. 2018. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1543495231467142.

    MLA Style (8th edition)

  • Grayson, Marisa. "Approaching Overload: Diagnosis and Response to Anomalies in Complex and Automated Production Software Systems." Master's thesis, Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1543495231467142

    Chicago Manual of Style (17th edition)