Algorithms and Models for Collaborative Filtering from Large Information Corpora

Strunjas, Svetlana

2008, PhD, University of Cincinnati, Engineering : Computer Science.

In this thesis we propose novel collaborative filtering approaches for large data sets. We also demonstrate how these approaches can be used to create item recommendations for users, based on the preferences for items that users demonstrated in the past.

We propose a framework called collaborative partitioning (CP), which seeks a partition of a given set of items that maximizes the number of partition-satisfied users. Two theoretical models for evaluating the quality of partitions are proposed. Both are formulated as bicriteria optimization problems whose two criteria are the percentage of satisfied users and the level of user satisfaction. Because both of these bicriteria optimization problems are NP-hard, we propose approaches based on hierarchical agglomerative clustering to approximate their solutions. Results from running these heuristics on a real data set show that they find item partitions very close to a human-based genre partition of the same set, that is, a partition of items according to a human-created classification. The results also show that the proposed heuristics are a very good starting point for building top-k recommendation algorithms.
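As a rough illustration of the idea, the sketch below clusters items agglomeratively and then counts satisfied users. The abstract does not define partition satisfaction, the similarity measure, or the linkage rule, so everything here is an assumption made for the example: items are compared by Jaccard similarity of the sets of users who liked them, clusters are merged by average linkage, and a user counts as satisfied when at least a share `theta` of their liked items falls in a single block. The names `agglomerate`, `satisfied_fraction`, and `theta`, and the toy data, are hypothetical and do not come from the thesis.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity between two sets of user ids."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def agglomerate(item_users, k):
    """Greedy average-linkage agglomerative clustering of items into k blocks.

    item_users maps each item to the set of users who liked it.
    The similarity and linkage choices are assumptions for this sketch.
    """
    clusters = [frozenset([item]) for item in sorted(item_users)]

    def linkage(c1, c2):
        pairs = [(i, j) for i in c1 for j in c2]
        total = sum(jaccard(item_users[i], item_users[j]) for i, j in pairs)
        return total / len(pairs)

    while len(clusters) > k:
        # Merge the two most similar clusters.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


def satisfied_fraction(partition, user_items, theta):
    """Share of users with at least a theta fraction of their items in one block.

    This definition of a satisfied user is hypothetical.
    """
    satisfied = 0
    for items in user_items.values():
        best = max(len(items & block) for block in partition)
        if items and best / len(items) >= theta:
            satisfied += 1
    return satisfied / len(user_items)


# Toy data: which items each user liked.
user_items = {
    "u1": {"A", "B"},
    "u2": {"A", "B"},
    "u3": {"C", "D"},
    "u4": {"C", "D", "A"},
}

# Invert to item -> users who liked it.
item_users = {}
for user, items in user_items.items():
    for item in items:
        item_users.setdefault(item, set()).add(user)

partition = agglomerate(item_users, k=2)
strict = satisfied_fraction(partition, user_items, theta=1.0)
relaxed = satisfied_fraction(partition, user_items, theta=0.6)
```

On this toy data the two blocks are {A, B} and {C, D}. Raising `theta` demands a higher level of satisfaction and lowers the share of satisfied users (0.75 at `theta=1.0` versus 1.0 at `theta=0.6`), which is the kind of trade-off a bicriteria formulation has to balance.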

The second part of this thesis proposes a collaborative filtering framework for finding seminal and seminally affected work for sets of items. Seminal work denotes items released in the past that are highly correlated, in terms of user preferences, with sets of items released later. Similarly, seminally affected work denotes items that are highly correlated, in terms of user preferences, with previously released (older) items. In this approach, we translate item-item correlation into a correlation directed acyclic graph (DAG), in which edge direction is determined by the chronological ordering of items. We demonstrate and validate the approach by applying it in a web-based system called MovieTrack, which uses seminal and seminally affected work in movies to give movie recommendations to users. MovieTrack is built by applying the approach to a real data set of movie reviews released by Netflix.
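The construction of a correlation DAG can be sketched as follows. The abstract does not say which correlation measure, threshold, or scoring rule the thesis uses, so this example makes its own assumptions: Pearson correlation over users who rated both items, a fixed cutoff `threshold`, and out-degree and in-degree as stand-ins for how seminal or seminally affected an item is. The function `correlation_dag`, the item names, the ratings, and the release years are all hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length rating lists (0.0 if undefined)."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def correlation_dag(ratings, release_year, threshold):
    """Return edges (older, newer) for item pairs correlated at or above threshold.

    Edges always point from a strictly earlier release year to a strictly
    later one, so the resulting graph cannot contain a cycle.
    """
    edges = []
    for a, b in combinations(sorted(ratings), 2):
        if release_year[a] == release_year[b]:
            continue  # no chronological order, so no direction
        common = sorted(ratings[a].keys() & ratings[b].keys())
        r = pearson([ratings[a][u] for u in common],
                    [ratings[b][u] for u in common])
        if r >= threshold:
            older, newer = (a, b) if release_year[a] < release_year[b] else (b, a)
            edges.append((older, newer))
    return edges


# Toy data: item -> {user: rating}, plus release years.
ratings = {
    "Old": {"u1": 5, "u2": 1, "u3": 4},
    "Mid": {"u1": 4, "u2": 2, "u3": 5},
    "New": {"u1": 5, "u2": 1, "u3": 5},
    "Other": {"u1": 1, "u2": 5, "u3": 2},
}
release_year = {"Old": 1970, "Mid": 1985, "New": 2000, "Other": 1990}

edges = correlation_dag(ratings, release_year, threshold=0.8)

# Assumed scoring: out-degree for seminal items, in-degree for affected items.
seminal = Counter(older for older, _ in edges)
affected = Counter(newer for _, newer in edges)
```

In this toy graph "Old" has the most outgoing edges and "New" the most incoming ones, while "Other" is negatively correlated with the rest and stays isolated. A recommender in the spirit of MovieTrack could then walk edges backward from a liked item to suggest older, correlated items, or forward to suggest newer ones.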

Fred Annexstein, PhD (Committee Chair)
Kenneth Berman, PhD (Committee Co-Chair)
Karen Davis, PhD (Committee Member)
John Schlipf, PhD (Committee Member)
Kevin Kirby, PhD (Committee Member)
114 p.

Recommended Citations

  • Strunjas, S. (2008). Algorithms and Models for Collaborative Filtering from Large Information Corpora [Doctoral dissertation, University of Cincinnati]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1220001182

    APA Style (7th edition)

  • Strunjas, Svetlana. Algorithms and Models for Collaborative Filtering from Large Information Corpora. 2008. University of Cincinnati, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ucin1220001182.

    MLA Style (8th edition)

  • Strunjas, Svetlana. "Algorithms and Models for Collaborative Filtering from Large Information Corpora." Doctoral dissertation, University of Cincinnati, 2008. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1220001182

    Chicago Manual of Style (17th edition)