Characterizing and Accelerating Deep Learning and Stream Processing Workloads using Roofline Trajectories

2019, Master of Science, Ohio State University, Computer Science and Engineering.
Over the last decade, Deep Learning applications, built on technologies derived from convolutional neural networks (CNNs), have revolutionized fields as diverse as cancer detection, self-driving cars, and virtual assistants. At the same time, organizations have become heavily reliant on providing near-instantaneous insights to end-users based on vast amounts of data collected from various sources in real time. The most common method of increasing the processing capability of these applications is to execute them in a distributed manner over large clusters. However, users of such applications are typically not experts in the nuances of distributed systems and find the already challenging task of scaling these applications quite difficult. Consequently, the community has limited knowledge of how to run such applications in an optimized manner. The performance question for these software stacks has typically been addressed by employing bespoke hardware better suited for such compute-intensive operations. However, such a degree of performance is only accessible at increasingly high financial cost, leaving only large corporations and governments with resources sufficient to employ it at scale. For other users to make effective use of the resources at their disposal, concerted efforts are necessary to identify optimal hardware and software configurations. This study is one step in that direction: we use the Roofline model to perform a systematic analysis of representative Deep Learning models and identify opportunities for black-box/application-aware optimizations. We also use the Roofline model to guide the architectural enhancements needed to design an accelerated message broker, called Frieda, necessary for optimized stream processing pipelines. Using the findings from our study, we obtain up to a 3.5X speedup compared to vanilla TensorFlow with default configurations. Moreover, compared with Kafka, Frieda exhibits a reduction of up to 98% in 99.9th-percentile latency for micro-benchmarks and up to 31% for a full-fledged stream processing pipeline constructed using the Yahoo! Streaming Benchmark.
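The Roofline model used throughout this thesis bounds a kernel's attainable throughput by the lesser of the machine's peak compute rate and its memory bandwidth multiplied by the kernel's arithmetic intensity (FLOPs per byte moved). A minimal sketch of that bound, with illustrative hardware numbers that are not taken from the thesis:

```python
def roofline_attainable_gflops(peak_gflops, bandwidth_gbs, arithmetic_intensity):
    """Attainable performance (GFLOP/s) under the Roofline model.

    arithmetic_intensity is measured in FLOPs per byte of memory traffic.
    """
    return min(peak_gflops, bandwidth_gbs * arithmetic_intensity)

# Hypothetical machine: 1000 GFLOP/s peak compute, 100 GB/s memory bandwidth.
peak = 1000.0
bw = 100.0

# The "ridge point" is the intensity at which the two bounds meet.
ridge = peak / bw  # 10.0 FLOPs/byte

low_ai = roofline_attainable_gflops(peak, bw, 2.0)    # memory-bound region
high_ai = roofline_attainable_gflops(peak, bw, 50.0)  # compute-bound region
```

Kernels whose intensity falls left of the ridge point (like `low_ai` above) are limited by memory bandwidth, while those to the right (like `high_ai`) are limited by peak compute; tracking how a workload's operating point moves on this plot over time is the idea behind the roofline trajectories in the title.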
Xiaoyi Lu (Advisor)
Gagan Agrawal (Committee Member)
84 p.

Recommended Citations

  • Javed, M. H. (2019). Characterizing and Accelerating Deep Learning and Stream Processing Workloads using Roofline Trajectories [Master's thesis, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574445196024129

    APA Style (7th edition)

  • Javed, Muhammad Haseeb. Characterizing and Accelerating Deep Learning and Stream Processing Workloads using Roofline Trajectories. 2019. Ohio State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1574445196024129.

    MLA Style (8th edition)

  • Javed, Muhammad Haseeb. "Characterizing and Accelerating Deep Learning and Stream Processing Workloads using Roofline Trajectories." Master's thesis, Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574445196024129

    Chicago Manual of Style (17th edition)