Files
Madeti Preetham PDF A APPROVED.pdf (440.45 KB)
Using Apache Spark's MLlib to Predict Closed Questions on Stack Overflow
Author Info
Madeti, Preetham
ORCID® Identifier
http://orcid.org/0000-0002-7130-009X
Permalink:
http://rave.ohiolink.edu/etdc/view?acc_num=ysu1463790062
Abstract Details
Year and Degree
2016, Master of Computing and Information Systems, Youngstown State University, Department of Computer Science and Information Systems.
Abstract
Monitoring the quality of posts on the Stack Overflow website is critical to keeping the experience smooth for its users. The site strongly discourages unproductive discussion and unrelated questions. Questions can be closed for several reasons, ranging from being unrelated to programming to being unlikely to lead to a productive answer. Manual moderation of the site's content is tedious, as approximately seventeen thousand new questions are posted every day. Leveraging machine learning algorithms to identify bad questions would therefore save the community considerable time. The goal of this thesis is to build a machine learning classifier that predicts whether a question will be closed, given various textual and post-related features. A model was trained using Apache Spark's machine learning library (MLlib). The model not only predicts closed questions with good accuracy but also computes the result in a very short time.
Committee
Alina Lazar, PhD (Advisor)
Bonita Sharif, PhD (Committee Member)
Yong Zhang, PhD (Committee Member)
Pages
37 p.
Subject Headings
Computer Science; Information Systems
Keywords
Machine learning; Feature Extraction; Apache Spark; Stack Overflow; Textual Analysis
Recommended Citations
APA Style (7th edition): Madeti, P. (2016). Using Apache Spark's MLlib to Predict Closed Questions on Stack Overflow [Master's thesis, Youngstown State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1463790062
MLA Style (8th edition): Madeti, Preetham. Using Apache Spark's MLlib to Predict Closed Questions on Stack Overflow. 2016. Youngstown State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=ysu1463790062.
Chicago Manual of Style (17th edition): Madeti, Preetham. "Using Apache Spark's MLlib to Predict Closed Questions on Stack Overflow." Master's thesis, Youngstown State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=ysu1463790062
Document number: ysu1463790062
Download Count: 694
Copyright Info
© 2016, some rights reserved.
Using Apache Spark's MLlib to Predict Closed Questions on Stack Overflow by Preetham Madeti is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. Based on a work at etd.ohiolink.edu.
This open access ETD is published by Youngstown State University and OhioLINK.