Using Genetic Algorithms for Feature Set Selection in Text Mining

Rogers, Benjamin Charles

2014, Master of Science, Miami University, Computer Science and Software Engineering.
The rationale behind design decisions is often recorded in different project documentation. One way to extract this rationale is by using text mining. Text mining involves data mining over natural language documents. The performance of a text mining system depends on many factors, including the feature sets used. Exhaustive searching for optimal combinations of feature sets is rarely feasible, often leading researchers to make guesses as to which combinations to use. A genetic algorithm is used to find optimal combinations of feature sets for binary rationale, the argumentation subset, the arguments-all subset, decisions, and alternatives. The genetic algorithm uses GATE, WEKA, and a pipeline that allows the automatic passing of information from one to the other. This pipeline is also usable in other text mining contexts. The genetic algorithm produced medium-sized feature sets that, compared to random selection, tended to prefer unigrams and bigrams over 4-grams and 5-grams.
Janet Burge, PhD (Advisor)
Dhananjai Rao, PhD (Committee Member)
Michael Zmuda, PhD (Committee Member)
69 p.
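
The search described in the abstract can be illustrated with a minimal sketch of a genetic algorithm over combinations of feature sets. This is not the thesis's implementation: the feature-set names, the `toy_fitness` function, its weights, and all parameter values are hypothetical placeholders. In the thesis the fitness would presumably come from classifier performance obtained through the GATE and WEKA pipeline, which is not reproduced here.

```python
import random

# Hypothetical feature-set inventory; the abstract does not list the actual one.
FEATURE_SETS = ["unigrams", "bigrams", "trigrams", "4-grams", "5-grams", "pos_tags"]


def toy_fitness(mask):
    """Placeholder score for a combination of feature sets.

    The weights are invented for illustration only. A real run would train
    and evaluate a classifier using the feature sets switched on in `mask`.
    """
    weights = [0.30, 0.25, 0.10, 0.02, 0.01, 0.15]
    gain = sum(w for w, bit in zip(weights, mask) if bit)
    penalty = 0.03 * sum(mask)  # mild cost for each feature set included
    return gain - penalty


def tournament(population, fitness, rng):
    """Pick two distinct individuals at random and return the fitter one."""
    a, b = rng.sample(population, 2)
    return a if fitness(a) >= fitness(b) else b


def evolve(fitness, n_bits, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve bit masks (1 = feature set included) and return the best one."""
    rng = random.Random(seed)
    population = [tuple(rng.randint(0, 1) for _ in range(n_bits))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        children = [best]  # elitism: the best individual always survives
        while len(children) < pop_size:
            p1 = tournament(population, fitness, rng)
            p2 = tournament(population, fitness, rng)
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Flip each bit independently with probability mutation_rate.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            children.append(child)
        population = children
        best = max(population, key=fitness)
    return best


if __name__ == "__main__":
    best_mask = evolve(toy_fitness, len(FEATURE_SETS))
    chosen = [name for name, bit in zip(FEATURE_SETS, best_mask) if bit]
    print(chosen, round(toy_fitness(best_mask), 3))
```

Encoding each candidate as a bit mask keeps crossover and mutation simple, and elitism ensures the best score never decreases between generations. The expensive part in practice is the fitness call, so caching scores for masks already evaluated would be a natural extension.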

Recommended Citations

  • Rogers, B. C. (2014). Using Genetic Algorithms for Feature Set Selection in Text Mining [Master's thesis, Miami University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=miami1389811705

    APA Style (7th edition)

  • Rogers, Benjamin. Using Genetic Algorithms for Feature Set Selection in Text Mining. 2014. Miami University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=miami1389811705.

    MLA Style (8th edition)

  • Rogers, Benjamin. "Using Genetic Algorithms for Feature Set Selection in Text Mining." Master's thesis, Miami University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=miami1389811705

    Chicago Manual of Style (17th edition)