Abusive and Hate Speech Tweets Detection with Text Generation

Nalamothu, Abhishek

Permalink:

http://rave.ohiolink.edu/etdc/view?acc_num=wright1567510940365305

Year and Degree

2019, Master of Science (MS), Wright State University, Computer Science.

Abstract

According to a 2017 Pew Research study, 41% of Americans have personally experienced online harassment and two-thirds have witnessed it. Online harassment detection is therefore vital for securing and sustaining the popularity and viability of online social networks. Machine learning techniques play a crucial role in automatic harassment detection, but one challenge of supervised approaches is training data imbalance. Existing text generation techniques can help augment the training data, but they remain inadequate. This research explores the role of domain-specific knowledge in complementing the limited data available for training a text generator. We conduct domain-specific text generation by combining inverse reinforcement learning (IRL) with domain-specific knowledge. Our approach includes two adversarial networks: a text generator and a reward approximator (RA). The objective of the text generator is to generate domain-specific text that is hard to distinguish from real-world domain-specific text. The objective of the RA is to discriminate the generated text from the real-world text. During adversarial training, the generator and the RA play a minimax game and try to arrive at a win-win state. Ultimately, augmenting the existing dataset with diverse, semantically meaningful generated domain-specific data improves detection of domain-specific text. In addition to developing the Generative Adversarial Network-based framework, we present a novel evaluation that uses variants of the BLEU metric to measure the diversity of the generated text and uses perplexity and cosine similarity to measure its quality. Experimental results show that the proposed framework outperforms a previous baseline (IRL without domain knowledge) on harassment (i.e., abusive and hate speech) tweet generation. Additionally, the generated tweets effectively augment the training data for online abusive and hate speech detection (tweet classification), resulting in a 9% accuracy improvement in classification with the augmented training set compared to the existing training set.
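
The abstract names three kinds of evaluation measures: BLEU variants, perplexity, and cosine similarity. It does not say how the thesis computes them, so the following is only a minimal sketch of the textbook definitions, using the Python standard library. The function names (`perplexity`, `cosine_similarity`, `clipped_precision`), the whitespace tokenization, and the bag-of-words vectors are assumptions made for illustration, not the author's implementation.

```python
import math
from collections import Counter


def perplexity(token_probs):
    """Perplexity from the probability a language model assigned to each token.

    Computed as exp of the average negative log-probability.
    """
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))


def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors (an assumed representation)."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, references, n=2):
    """Clipped n-gram precision, the building block of BLEU.

    Each candidate n-gram is credited at most as many times as it
    appears in any single reference.
    """
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref.split(), n).items():
            max_ref[gram] = max(max_ref[gram], count)
    matched = sum(min(count, max_ref[gram]) for gram, count in cand.items())
    return matched / sum(cand.values())
```

One common way to turn an overlap score like this into a diversity measure is to score each generated tweet against the other generated tweets, so that lower overlap indicates more diverse output. Whether the thesis's BLEU variants work this way is not stated in the abstract.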

Committee

Amit Sheth, Ph.D. (Advisor)
Valerie L. Shalin, Ph.D. (Committee Member)
Keke Chen, Ph.D. (Committee Member)

Pages

104 p.

Subject Headings

Computer Science

Keywords

Text generation; Generative adversarial network; Inverse Reinforcement Learning; Online Harassment detection

Recommended Citations

APA Style (7th edition)
Nalamothu, A. (2019). Abusive and Hate Speech Tweets Detection with Text Generation [Master's thesis, Wright State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=wright1567510940365305

MLA Style (8th edition)
Nalamothu, Abhishek. Abusive and Hate Speech Tweets Detection with Text Generation. 2019. Wright State University, Master's thesis. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=wright1567510940365305.

Chicago Manual of Style (17th edition)
Nalamothu, Abhishek. "Abusive and Hate Speech Tweets Detection with Text Generation." Master's thesis, Wright State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=wright1567510940365305

Document number:

wright1567510940365305

Download Count:

922
