In this thesis, we address two issues in the learning-to-rank area, namely choosing an effective objective loss function and constructing effective regression trees within the gradient boosting framework, as well as a third issue: applying learning-to-rank models to statistical machine translation.
First, list-wise learning-to-rank methods either directly optimize performance measures or optimize surrogate functions with smaller gaps between the optimized losses and the performance measures; it is therefore generally believed that they should outperform point-wise and pair-wise learning-to-rank methods. However, in real-world applications, state-of-the-art practical learning-to-rank systems, such as MART and LambdaMART, do not come from the list-wise camp. One cause may be that several list-wise methods work well on the popular but very small LETOR datasets yet fail on the real-world datasets that are typically used to train practical systems.
We propose a list-wise learning-to-rank method based on a list-wise surrogate function, the Plackett-Luce (PL) model. The PL model has a convex loss, which guarantees a global optimum, and it is provably consistent with certain performance measures such as the NDCG score. When we conduct experiments on the PL model, however, we observe that its performance is unstable: when the data has sufficiently rich features, it gives very good results, but on data with scarce features it fails badly. For example, when we apply the PL model with a linear scoring function to the Microsoft 30K dataset, its NDCG@1 score is 7.6 points worse than the average of several linear systems. This motivates our new ranking system, PLRank, which is suitable for any dataset because it maps the feature space into a tree space to gain more expressive power. PLRank is trained within the gradient boosting framework and is simple to implement. It has the same time complexity as LambdaMART and runs slightly faster in practice. Moreover, we extend three other list-wise surrogate functions within the gradient boosting framework for a fair and full comparison, and we find that the PL model has particular advantages.
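To make the surrogate concrete, the following is a minimal sketch of the Plackett-Luce negative log-likelihood for one query; the function name and interface are illustrative, not the thesis implementation:

```python
import math

def pl_neg_log_likelihood(scores):
    """Negative log-likelihood of the Plackett-Luce model.

    `scores` are the model scores of a query's documents, listed in
    ground-truth rank order (best first). The loss is convex in the
    scores, which is the property that guarantees a global optimum.
    """
    loss = 0.0
    for i in range(len(scores)):
        # Stable log-sum-exp over the documents not yet "drawn".
        m = max(scores[i:])
        log_z = m + math.log(sum(math.exp(s - m) for s in scores[i:]))
        loss += log_z - scores[i]
    return loss
```

As expected, scoring the documents in their ground-truth order yields a lower loss than scoring them in reverse.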
Our experiments are conducted on the two largest publicly available real-world datasets, the Yahoo challenge 2010 dataset and Microsoft 30K. The results show that, for the first time at the single-model level, a list-wise system matches or surpasses state-of-the-art point-wise and pair-wise systems, namely MART, LambdaMART, and McRank, on real-world datasets.
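The comparisons above are reported in NDCG. For reference, a minimal sketch of NDCG@k with the common exponential gain and logarithmic discount (the function name is ours):

```python
import math

def ndcg_at_k(relevances, k):
    """NDCG@k for documents listed in the order the system ranked them.

    `relevances` are graded labels (e.g. 0-4). Uses the common gain
    2^rel - 1 and the log2(position + 1) discount.
    """
    def dcg(rels):
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))

    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list scores 1.0; placing an irrelevant document first drives NDCG@1 to 0.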
Second, industry-level applications of learning-to-rank models have been dominated by the gradient boosting framework, which fits a tree using the least squared error (SE) principle. Another tree-fitting principle, (robust) weighted least squared error ((R)WSE), has been widely used in classification, for example in LogitBoost and its variants, but it has not been reformulated for learning-to-rank tasks. For both principles, a deep analysis of their relationship in the learning-to-rank scenario is lacking. Motivated by AdaBoost, we propose a new principle, least objective loss based error (OLE), that enables us to analyze several important learning-to-rank systems: \emph{we prove that (R)WSE is actually a special case of OLE for derivative-additive loss functions, and that OLE, (R)WSE, and SE are equivalent for the MART system}. Under the guidance of the OLE principle, we implement three typical, strong systems and conduct experiments on two real-world datasets. The results show that our proposed OLE principle improves most results over SE.
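The practical difference between the two classical principles shows up most clearly in how a leaf value is fitted. A minimal sketch, with our own naming rather than the thesis code: an SE leaf averages the negative gradients, while a WSE leaf takes a LogitBoost-style Newton step, weighting by second derivatives:

```python
def se_leaf_value(neg_gradients):
    # Least squared error (SE): the leaf predicts the mean residual,
    # i.e. the mean of the negative gradients falling into the leaf.
    return sum(neg_gradients) / len(neg_gradients)

def wse_leaf_value(neg_gradients, hessians, eps=1e-12):
    # Weighted least squared error (WSE): a Newton step, dividing the
    # summed first derivatives by the summed second derivatives; `eps`
    # guards against an all-zero hessian sum (a common robustness trick).
    return sum(neg_gradients) / (sum(hessians) + eps)
```

When every hessian equals 1, the two coincide, which hints at why the principles can become equivalent for particular losses, as the MART case above illustrates.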
Third, margin infused relaxed algorithms (MIRAs) dominate model tuning in statistical machine translation when large-scale features are used, but they are also notorious for their implementation complexity. We introduce a new method that regards an N-best list as a permutation and minimizes the Plackett-Luce loss of ground-truth permutations. Experiments with large-scale features demonstrate that the new method is more robust than MERT; although it only matches MIRAs in accuracy, it has a comparative advantage: it is easier to implement.