DYNAMIC DECISION APPROXIMATE EMPIRICAL REWARD (DDAER) PROCESSES


2014, Doctor of Philosophy, Ohio State University, Industrial and Systems Engineering.
This dissertation concerns model uncertainty in the context of Markov Decision Processes (MDPs). We argue that, once data limitations are taken into account, the system is no longer Markovian, since the transition probabilities are not known; moreover, the amount of available historical data can itself be viewed as part of the system state description. This motivates our introduction of "Dynamic Decision Approximate Empirical Reward" (DDAER) processes as a generalization of MDPs. We offer a straightforward expected-reward formulation for finite-horizon problems, solve it with a generalized backward induction algorithm, and prove convergence together with solution-quality guarantees.

For infinite-horizon problems, we use a counterexample to prove that the generalized contraction mapping is not guaranteed to converge. For small problems, we establish that, assuming sample normality, enumeration combined with ranking-and-selection methods offers solution-quality guarantees. Further, we propose a formulation involving a new type of probabilistic constraint: the probability that the chosen policy would be found optimal under perfect information. We show that a so-called "two-phase" method, combining simultaneous multinomial intervals in phase I with ranking and selection in phase II, carries guarantees of solution quality.

Several numerical examples illustrate the application of the proposed methods. For the numerical test problems, we provide a comparison table illustrating the potentially important practical benefits of the proposed infinite-horizon methods relative to alternatives, including robust and Bayesian adaptive methods. The benefits derive from the fact that robust methods are conservative and naive methods fail to account for the amount of available data. Next, we formulate jobshop scheduling as a reward process and demonstrate the application of the proposed "two-phase" algorithm. The solutions provide insights into the advisability of the shortest processing time (SPT) heuristic for situations in which many jobs are late and the earliest due date (EDD) heuristic for cases in which few jobs are known to be late.

We also offer suggestions for future research. These include addressing larger problems, with more states and actions, using extensions of the proposed methods. We also conjecture that the infinite-horizon problem could be solved using a scenario-set approach, as was done for the finite-horizon problems. Further, viewing the proposed methods as a "big data" analysis tool, suitable aggregations of states or actions, yielding fewer of each, could offer benefits in terms of improved solutions and computational speed.
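
As a concrete companion to the finite-horizon formulation above, here is a minimal Python sketch of plug-in backward induction on an MDP whose transition probabilities are estimated empirically from historical counts. It is illustrative only: the array names (counts, rewards) are assumptions, and the dissertation's generalized algorithm additionally treats the amount of available data as part of the state, which this plug-in baseline does not.

    import numpy as np

    def empirical_backward_induction(counts, rewards, horizon):
        # counts[s, a, s2]: observed transitions from state s under action a
        # rewards[s, a]:    one-step expected reward
        n_states, n_actions, _ = counts.shape
        # Plug-in transition estimates; uniform fallback for state-action
        # pairs with no observations.
        totals = counts.sum(axis=2, keepdims=True)
        p_hat = np.where(totals > 0, counts / np.maximum(totals, 1), 1.0 / n_states)
        value = np.zeros(n_states)                  # terminal values
        policy = np.zeros((horizon, n_states), dtype=int)
        for t in reversed(range(horizon)):
            q = rewards + p_hat @ value             # Q-values at epoch t
            policy[t] = q.argmax(axis=1)
            value = q.max(axis=1)
        return policy, value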
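
For the small infinite-horizon problems, the enumeration-plus-ranking-and-selection idea can be sketched as below. This is a deliberately naive single-stage version under the abstract's sample-normality assumption: simulate is a hypothetical user-supplied function returning one noisy estimate of a policy's long-run reward, and a real ranking-and-selection procedure (e.g., a two-stage indifference-zone method) would add a second sampling stage to deliver the stated quality guarantee.

    import itertools
    import numpy as np

    def enumerate_and_select(simulate, n_states, n_actions, n0=30):
        # Enumerate every deterministic stationary policy (feasible only
        # for small state and action spaces).
        policies = list(itertools.product(range(n_actions), repeat=n_states))
        # First-stage sampling: n0 approximately normal replications each.
        means = [np.mean([simulate(pi) for _ in range(n0)]) for pi in policies]
        best = int(np.argmax(means))
        return policies[best], means[best]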
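
The jobshop discussion compares the shortest processing time (SPT) and earliest due date (EDD) dispatching rules. A single-machine sketch with made-up data follows; it shows only the two heuristics themselves, not the dissertation's reward-process formulation or the two-phase algorithm.

    def dispatch(jobs, rule):
        # jobs: list of (processing_time, due_date) pairs
        key = (lambda j: j[0]) if rule == 'SPT' else (lambda j: j[1])
        order = sorted(jobs, key=key)
        t, late = 0, 0
        for p, d in order:
            t += p
            late += t > d                 # count tardy jobs
        return order, late

    jobs = [(4, 6), (2, 9), (7, 8), (3, 5)]   # made-up instance
    for rule in ('SPT', 'EDD'):
        _, late = dispatch(jobs, rule)
        print(rule, 'late jobs:', late)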
Theodore Allen (Advisor)
Guzin Bayraksan (Committee Member)
Simge Kucukyavuz (Committee Member)
120 p.

Recommended Citations

  • APA Style (7th edition): Xie, C. (2014). DYNAMIC DECISION APPROXIMATE EMPIRICAL REWARD (DDAER) PROCESSES [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398991609

  • MLA Style (8th edition): Xie, Chen. DYNAMIC DECISION APPROXIMATE EMPIRICAL REWARD (DDAER) PROCESSES. 2014. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1398991609.

  • Chicago Manual of Style (17th edition): Xie, Chen. "DYNAMIC DECISION APPROXIMATE EMPIRICAL REWARD (DDAER) PROCESSES." Doctoral dissertation, Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1398991609