Efficient Online Learning with Bandit Feedback


2020, Doctor of Philosophy, Ohio State University, Electrical and Computer Engineering.
Online learning is widely used in modern machine learning systems. Despite recent progress, many challenges remain unsolved when online learning is applied to complex practical scenarios. The key challenges can be summarized as follows. First, the curse of dimensionality of the action space: to learn the optimal action, the learning algorithm must explore every action to some extent, so a large action space makes the algorithm converge slowly and incur a large cost in cold-start cases. In online recommendation systems, for example, the number of users or items is large compared to the finite exploration budget. Second, non-stationary environments: real-world learning systems are highly dynamic; users' preferences change over time due to various internal and external factors, and item popularity varies with fast-emerging events. Failing to adapt to a changing environment may lead to sub-optimal decisions. Third, computational resource constraints: online learning algorithms must be updated with the arriving data stream, which may require heavy computation for each new data point. These frequent updates consume substantial computational resources and keep online learning tools out of real-time applications.

In this dissertation, we aim to develop efficient online learning solutions, and more specifically multi-armed bandit solutions, that address these challenges. First, we exploit correlations between actions to reduce the action-space complexity of the online learning problem. We start with a special case of correlation, in which playing one action also generates side observations of other actions, and then generalize the result to an arbitrary form of correlation over actions. Second, we consider multi-armed bandits in a non-stationary environment, so that the learning algorithm can automatically detect potential changes and adapt its decision-making strategy accordingly. Finally, we turn to the trade-off between the computational complexity and the optimality of bandit algorithms: our goal is to design a class of bandit algorithms that can be computed efficiently while behaving close to the optimal algorithms.

Taken together, the proposed research makes an online learning system more efficient in large-scale applications, more adaptive to a changing world, and less demanding of computational resources. The proposed solutions apply to a wide spectrum of applications, including recommendation systems, viral marketing, communication networks, clinical trials in healthcare, sequential experiment design, and many more.
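
To make the first thrust concrete, here is a minimal Python sketch of UCB1 extended with graph-structured side observations, assuming Bernoulli rewards and a known feedback graph in which playing arm i also reveals the rewards of its neighbors. The function and variable names are illustrative assumptions, not the dissertation's actual algorithm.

```python
import math
import random

def ucb_with_side_observations(true_means, neighbors, horizon, seed=0):
    """Sketch: UCB1 where each play also observes neighboring arms."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # observations per arm (plays + side observations)
    sums = [0.0] * n        # cumulative observed reward per arm

    for t in range(1, horizon + 1):
        if 0 in counts:
            # Ensure every arm has at least one observation first.
            arm = counts.index(0)
        else:
            # Standard UCB1 index: empirical mean plus exploration bonus.
            arm = max(range(n), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(t) / counts[i]))
        # Playing `arm` yields its own reward plus free observations of
        # every neighbor, shrinking the effective exploration cost.
        for j in [arm] + sorted(neighbors[arm]):
            reward = 1.0 if rng.random() < true_means[j] else 0.0
            counts[j] += 1
            sums[j] += reward
    return counts, sums
```

For instance, on a line graph with neighbors = {0: [1], 1: [0, 2], 2: [1]}, each pull yields two or three observations per round, so exploration is cheaper than in the vanilla bandit where each pull yields exactly one.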
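For the second thrust, the following sketch shows one simple way a bandit can detect changes and adapt: keep a short window of recent rewards per arm and restart the learner when the window mean drifts far from the arm's long-run mean. This is in the spirit of actively adaptive, change-detection-based methods; the window length and threshold are illustrative placeholders, not tuned constants from the dissertation.

```python
import math
from collections import deque

def adaptive_ucb(reward_fn, n_arms, horizon, window=30, threshold=0.3):
    """Sketch: UCB1 with a restart triggered by a detected mean shift."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    recent = [deque(maxlen=window) for _ in range(n_arms)]
    since_reset = 0
    for t in range(1, horizon + 1):
        since_reset += 1
        if 0 in counts:
            arm = counts.index(0)
        else:
            arm = max(range(n_arms), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(since_reset) / counts[i]))
        r = reward_fn(t, arm)            # environment may drift with t
        counts[arm] += 1
        sums[arm] += r
        recent[arm].append(r)
        # Detector: a full recent window whose mean deviates from the
        # arm's long-run mean suggests the reward distribution changed.
        win = recent[arm]
        if (len(win) == window
                and abs(sum(win) / window - sums[arm] / counts[arm]) > threshold):
            counts = [0] * n_arms        # forget the stale statistics
            sums = [0.0] * n_arms
            recent = [deque(maxlen=window) for _ in range(n_arms)]
            since_reset = 0
    return counts, sums
```

A toy environment with one change point, e.g. reward_fn = lambda t, a: float(random.random() < (0.9 if (a == 0) == (t <= 500) else 0.1)), flips the best arm at t = 500; the restart lets the learner abandon the formerly best arm instead of exploiting it indefinitely.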
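For the third thrust, one hypothetical way to trade optimality for computation is to batch the expensive index updates: recompute the UCB indices only every `batch` rounds, so the heavy step runs O(T / batch) times rather than every round. This sketch is an assumption-laden illustration of the complexity-versus-optimality trade-off, not the class of algorithms actually proposed in the dissertation.

```python
import math
import random

def batched_ucb(true_means, horizon, batch=50, seed=0):
    """Sketch: UCB1 with index recomputation only once per batch."""
    rng = random.Random(seed)
    n = len(true_means)
    counts = [1] * n
    sums = [0.0] * n
    for i in range(n):                   # pull each arm once to initialize
        sums[i] = 1.0 if rng.random() < true_means[i] else 0.0
    best = 0
    for t in range(n + 1, horizon + 1):
        if (t - n) % batch == 1 or batch == 1:
            # The expensive step: recompute all UCB indices. With
            # batching it runs O(horizon / batch) times in total.
            best = max(range(n), key=lambda i: sums[i] / counts[i]
                       + math.sqrt(2 * math.log(t) / counts[i]))
        # Between refreshes the per-round work is O(1): replay `best`.
        reward = 1.0 if rng.random() < true_means[best] else 0.0
        counts[best] += 1
        sums[best] += reward
    return counts, sums
```

Larger batches cut computation but delay reaction to new evidence, so the regret degrades gracefully with the batch size; picking that schedule is exactly the kind of trade-off the dissertation studies.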
Ness Shroff (Advisor)
Eylem Ekici (Committee Member)
Yingbin Liang (Committee Member)
Abhishek Gupta (Committee Member)
219 p.

Recommended Citations


  • Liu, F. (2020). Efficient Online Learning with Bandit Feedback [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587680990430268

    APA Style (7th edition)

  • Liu, Fang. Efficient Online Learning with Bandit Feedback. 2020. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1587680990430268.

    MLA Style (8th edition)

  • Liu, Fang. "Efficient Online Learning with Bandit Feedback." Doctoral dissertation, Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587680990430268

    Chicago Manual of Style (17th edition)