Files
main.pdf (1.05 MB)
Intelligent Caching to Mitigate the Impact of Web Robots on Web Servers
Author Info
Rude, Howard Nathan
Permalink: http://rave.ohiolink.edu/etdc/view?acc_num=wright1482416834896541
Year and Degree
2016, Master of Science (MS), Wright State University, Computer Science.
Abstract
With an ever-increasing amount of data shared and posted on the Web, the desire and necessity to automatically glean this information has led to an increase in the sophistication and volume of software agents called web robots or crawlers. Recent measurements, including our own across the entire logs of Wright State University Web servers over the past two years, suggest that at least 60% of all requests originate from robots rather than humans. Web robots display different statistical and behavioral patterns in their traffic compared to humans, yet present Web server optimizations presume that traffic exhibits predominantly human-like characteristics. Robots may thus be silently degrading the performance and scalability of our web systems. This thesis investigates a new take on a classic performance tool, namely web caches, to mitigate the impact of robot traffic on web server operations. It proposes a cache system architecture that: (i) services robot and human traffic in separate physical memory stores, with separate policies; (ii) uses an adaptable policy for admitting robot-related resources; (iii) combines a deep neural network with Bayesian models to improve request prediction. Experiments with real data demonstrate (i) significant reduction in bandwidth usage for prefetching and (ii) improvements in hit rate for human-driven traffic compared to a number of baselines, especially in configurations where web caches have limited size.
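The first two architectural points in the abstract (separate stores for robot and human traffic, and a policy for admitting robot-related resources) can be illustrated with a minimal sketch. This is not the thesis's implementation: the names `LRUStore`, `SplitCache`, and `robot_admit_after` are hypothetical, LRU eviction is an assumed stand-in for the unspecified per-store policies, and the fixed "admit after N robot misses" rule is an assumed, simplified placeholder for the adaptable admission policy. The request predictor (deep neural network plus Bayesian models) is not modeled.

```python
from collections import OrderedDict


class LRUStore:
    """Fixed-capacity store that evicts the least recently used entry.
    LRU is an assumption; the abstract does not name the eviction policy."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the oldest entry


class SplitCache:
    """Hypothetical cache that keeps robot and human traffic in separate stores."""

    def __init__(self, human_capacity, robot_capacity, robot_admit_after=2):
        self.human = LRUStore(human_capacity)
        self.robot = LRUStore(robot_capacity)
        # Assumed admission rule: cache a resource for robots only after
        # robots have missed on it this many times.
        self.robot_admit_after = robot_admit_after
        self.robot_misses_seen = {}  # url -> robot misses observed so far
        self.hits = {"human": 0, "robot": 0}
        self.misses = {"human": 0, "robot": 0}

    def request(self, url, is_robot, fetch):
        """Serve url from the store matching the traffic class, else fetch it."""
        kind = "robot" if is_robot else "human"
        store = self.robot if is_robot else self.human
        body = store.get(url)
        if body is not None:
            self.hits[kind] += 1
            return body
        self.misses[kind] += 1
        body = fetch(url)
        if is_robot:
            seen = self.robot_misses_seen.get(url, 0) + 1
            self.robot_misses_seen[url] = seen
            if seen >= self.robot_admit_after:
                store.put(url, body)
        else:
            store.put(url, body)  # human resources are always admitted here
        return body
```

In this sketch a burst of robot requests can only evict entries from the robot store, so resources cached for human visitors are unaffected, which is the intuition behind keeping the two stores separate. How requests are classified as robot or human is outside the sketch; the caller supplies `is_robot`.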
Committee
Derek Doran, Ph.D. (Committee Chair)
Tanvi Banerjee, Ph.D. (Committee Member)
John Gallagher, Ph.D. (Committee Member)
Pages
60 p.
Subject Headings
Computer Science
Keywords
web cache; web robots; crawlers; prefetching; prediction; LSTM
Recommended Citations
Rude, H. N. (2016). Intelligent Caching to Mitigate the Impact of Web Robots on Web Servers [Master's thesis, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1482416834896541
APA Style (7th edition)
Rude, Howard. Intelligent Caching to Mitigate the Impact of Web Robots on Web Servers. 2016. Wright State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1482416834896541.
MLA Style (8th edition)
Rude, Howard. "Intelligent Caching to Mitigate the Impact of Web Robots on Web Servers." Master's thesis, Wright State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=wright1482416834896541
Chicago Manual of Style (17th edition)
Document number: wright1482416834896541
Download Count: 734
Copyright Info
© 2016, all rights reserved.
This open access ETD is published by Wright State University and OhioLINK.