Learning to Rank Algorithms and Their Application in Machine Translation


2015, Doctor of Philosophy (PhD), Wright State University, Computer Science and Engineering PhD.
In this thesis, we discuss two issues in the learning to rank area, namely choosing an effective objective loss function and constructing effective regression trees in the gradient boosting framework, as well as a third issue, applying learning to rank models to statistical machine translation.

First, list-wise learning to rank methods either directly optimize performance measures or optimize surrogate functions with smaller gaps between the optimized loss and the performance measure, so it is generally believed that they should lead to better performance than point- and pair-wise methods. In real-world applications, however, state-of-the-art practical learning to rank systems, such as MART and LambdaMART, are not from the list-wise camp. One cause may be that several list-wise methods work well on the popular but very small LETOR datasets yet fail on the real-world datasets often used to train practical systems. We propose a list-wise learning to rank method based on a list-wise surrogate function, the Plackett-Luce (PL) model. The PL model has a convex loss, which guarantees a global optimum, and is proven to be consistent with certain performance measures such as the NDCG score. In our experiments, however, the PL model is unstable: when the data has rich enough features it gives very good results, but on data with scarce features it fails badly. For example, applying the PL model with a linear model on the Microsoft 30K dataset yields an NDCG@1 score 7.6 points worse than the average performance of several linear systems. This motivates our new ranking system, PLRank, which suits any dataset through a mapping from feature space into tree space that gains expressive power. PLRank is trained in the gradient boosting framework and is simple to implement; it has the same time complexity as LambdaMART and runs slightly faster in practice. Moreover, we extend three other list-wise surrogate functions within the gradient boosting framework for a fair and full comparison, and we find that the PL model has special advantages. Our experiments are conducted on the two largest publicly available real-world datasets, Yahoo Challenge 2010 and Microsoft 30K. The results show that, for the first time at the single-model level, a list-wise system matches or surpasses the state-of-the-art point- and pair-wise systems MART, LambdaMART, and McRank on real-world datasets.

Second, industry-level applications of learning to rank models have been dominated by the gradient boosting framework, which fits a tree using the least square error (SE) principle. Another tree-fitting principle, (robust) weighted least square error ((R)WSE), has been widely used in classification, for example in LogitBoost and its variants, but has not been reformulated for learning to rank tasks. For both principles, a deep analysis of their relationship in the learning to rank scenario has been lacking. Motivated by AdaBoost, we propose a new principle named least objective loss based error (OLE) that enables us to analyze several important learning to rank systems: we prove that (R)WSE is actually a special case of OLE for derivative-additive loss functions, and that OLE, (R)WSE, and SE are equivalent for the MART system. Under the guidance of the OLE principle, we implement three typical and strong systems and conduct experiments on two real-world datasets. Experimental results show that our proposed OLE principle improves most results over SE.
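For background, in standard gradient boosting notation (a textbook sketch, not the dissertation's OLE formulation: here $\ell$ is the objective loss and $s_i$ the current model score of item $i$), the SE principle fits each regression tree $f$ to the negative gradient,

\[ \min_f \sum_i \bigl(-g_i - f(x_i)\bigr)^2, \qquad g_i = \frac{\partial \ell}{\partial s_i}, \]

while a (R)WSE fit in the LogitBoost style solves a weighted least square problem whose weights are the second derivatives,

\[ \min_f \sum_i h_i \Bigl(-\frac{g_i}{h_i} - f(x_i)\Bigr)^2, \qquad h_i = \frac{\partial^2 \ell}{\partial s_i^2}. \]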
Third, margin infused relaxed algorithms (MIRAs) dominate model tuning in statistical machine translation when large-scale features are used, but they are also notorious for their implementation complexity. We introduce a new method that regards an N-best list as a permutation and minimizes the Plackett-Luce loss of ground-truth permutations. Experiments with large-scale features demonstrate that the new method is more robust than MERT; though it only matches MIRAs in performance, it has a comparative advantage: it is easier to implement.
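To make the shared ingredient concrete, here is a minimal sketch of the Plackett-Luce negative log-likelihood for a scored list given in ground-truth order, best first (the function name and the example scores are illustrative; the dissertation's systems produce scores from feature vectors and trees rather than fixed numbers):

    import math

    def plackett_luce_nll(scores):
        """Plackett-Luce negative log-likelihood of the identity permutation.

        `scores` holds model scores ordered by the ground-truth ranking,
        best first. At position i the PL model picks item i with probability
        exp(s_i) / sum_{j >= i} exp(s_j); the loss sums -log of these terms.
        """
        nll = 0.0
        for i in range(len(scores)):
            tail = scores[i:]
            m = max(tail)  # shift for a numerically stable log-sum-exp
            log_z = m + math.log(sum(math.exp(s - m) for s in tail))
            nll += log_z - scores[i]
        return nll

    # A 3-best list already in ground-truth order: lower loss means the
    # model's scores agree better with the reference ranking.
    print(plackett_luce_nll([2.0, 1.0, 0.5]))

Minimizing this loss over model parameters is, in outline, what both PLRank (through gradient boosting in tree space) and the proposed machine translation tuning method (through the permutation view of an N-best list) do.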
Shaojun Wang, Ph.D. (Advisor)
Keke Chen, Ph.D. (Committee Member)
Xinhui Zhang, Ph.D. (Committee Member)
Michael Raymer, Ph.D. (Committee Member)
103 p.

Recommended Citations


  • Xia, T. (2015). Learning to Rank Algorithms and Their Application in Machine Translation [Doctoral dissertation, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1451610555

    APA Style (7th edition)

  • Xia, Tian. Learning to Rank Algorithms and Their Application in Machine Translation. 2015. Wright State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1451610555.

    MLA Style (8th edition)

  • Xia, Tian. "Learning to Rank Algorithms and Their Application in Machine Translation." Doctoral dissertation, Wright State University, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=wright1451610555

    Chicago Manual of Style (17th edition)