Spam Analysis and Detection for User Generated Content in Online Social Networks


2013, Doctor of Philosophy, Ohio State University, Computer Science and Engineering.
Recent years have witnessed the success of a number of online social networks (OSNs) and the explosive growth of social media. These social networking and social media sites have attracted a significant number of participants who contribute various types of content on the Internet, generally referred to as user generated content (UGC). A well-designed UGC network can use the wisdom of crowds to collect, organize, and vote on user-contributed content, generating high-quality knowledge at relatively low cost. However, the open environment of UGC systems also makes them easy for spammers and malicious users to pollute and attack. How users participate in UGC networks, especially how they contribute content and share it with their friends and other users, is fundamental to spam detection and high-quality knowledge discovery. In this dissertation, we investigate two important research issues: (1) discovering user content generation patterns in OSNs, focusing on publicly available content (knowledge sharing), and (2) detecting spam in user generated content based on the discovered patterns. With access to three large OSN user activity logs, including Yahoo! Blogs, Yahoo! Answers, and Yahoo! Del.icio.us, covering up to 4.5 years, we are able to analyze the content generation patterns of social network users in detail. Our analysis consistently shows that users' posting behavior in these networks exhibits strong daily and weekly patterns, but that user active time in these OSNs does not follow the commonly assumed exponential distributions. We also show that user posting behavior in these OSNs follows stretched exponential distributions instead of the widely accepted power law distributions. Our discovery lays a foundation for user behavior analysis in social networks and serves as a ground truth for anomaly detection and anti-spam.
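To make the contrast between the two models concrete, here is a minimal sketch in Python. It assumes one common rank form of the stretched exponential (SE) model, y_i^c = -a·log(i) + b, where y_i is the post count of the i-th most active user and c is the stretch factor, and compares it with a power law in rank form, log(y_i) = -a·log(i) + b (a straight line on a log-log plot). The function names, the grid of candidate c values, and the use of R² as the goodness-of-fit score are assumptions for illustration; the dissertation's own fitting procedure is not described here.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares y = slope*x + intercept; returns (slope, intercept, r2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((y - my) ** 2 for y in ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    r2 = 1 - ss_res / ss_tot if ss_tot > 0 else 1.0
    return slope, intercept, r2


def rank_counts(counts):
    """Sort per-user counts in descending order and pair them with log(rank)."""
    ys = sorted((c for c in counts if c > 0), reverse=True)
    xs = [math.log(i) for i in range(1, len(ys) + 1)]
    return xs, ys


def fit_power_law(counts):
    """R^2 of log(y_i) against log(i): a power law is linear on log-log axes."""
    xs, ys = rank_counts(counts)
    return linear_fit(xs, [math.log(y) for y in ys])[2]


def fit_stretched_exponential(counts, cs=None):
    """Grid-search the stretch factor c; y_i**c should be linear in log(i)."""
    if cs is None:
        cs = [k / 20 for k in range(1, 21)]  # hypothetical grid 0.05 .. 1.0
    xs, ys = rank_counts(counts)
    best_c, best_r2 = None, -math.inf
    for c in cs:
        r2 = linear_fit(xs, [y ** c for y in ys])[2]
        if r2 > best_r2:
            best_c, best_r2 = c, r2
    return best_c, best_r2


if __name__ == "__main__":
    # Synthetic counts generated exactly from an SE model with c = 0.5.
    counts = [(6 - math.log(i)) ** 2 for i in range(1, 101)]
    print("SE fit (c, R^2):", fit_stretched_exponential(counts))
    print("power-law R^2:", fit_power_law(counts))
```

On data generated from the SE model, the SE fit recovers c = 0.5 with R² close to 1, while the log-log line fits worse. Under the same assumptions, per-user counts that instead fit the straight log-log line better would be the kind of deviation the abstract later treats as a spam signal.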
Applying the user posting behavior distribution pattern, we further conducted a comprehensive analysis of spamming activities on a large commercial social blog UGC site over 325 days, covering over 6 million posts and nearly 400 thousand users. We observe a power law distribution of user contributions, instead of the stretched exponential distribution we discovered, and find that this deviation actually indicates serious UGC spam attack activity. Our analysis shows that UGC spammers exhibit unique non-textual patterns, such as posting activities, advertised spam link metrics, and spam hosting behaviors. Based on these non-textual features, we show with commonly used classification methods that a high detection rate can be achieved offline. These results further motivate us to develop a runtime scheme, BARS, to detect spam posts based on these spamming patterns. The experimental results demonstrate the effectiveness and robustness of BARS. To detect spam in large social network sites in a timely manner, it is desirable to develop self-tuned, unsupervised schemes that avoid the training cost of supervised classification schemes. Having identified the limitations of existing unsupervised detection schemes, which rely on assumptions about spammer behavior that no longer hold, we design an unsupervised spam detection scheme called UNIK. Instead of picking out spammers directly, UNIK leverages both the connection-based social graph and the content-based user-link graph to first remove non-spammers from the network, and then clusters spammers with the landing pages they are trying to advertise. Based on the highly accurate detection results of UNIK, we further analyze a number of spam campaigns. The results show that different spammer clusters demonstrate distinct characteristics, implying that UNIK can automatically extract spam signatures.
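The abstract describes UNIK only at a high level: first remove non-spammers using the social graph and the user-link graph, then cluster the remaining users by the landing pages they advertise. The following is a minimal, hypothetical Python sketch of that two-step shape, not the dissertation's algorithm. The non-spammer criterion (a user with at least `min_friends` social connections), the whitelisting of every link posted by such users, and all function names are assumptions made for illustration; clustering uses union-find over shared landing pages.

```python
from collections import defaultdict


def remove_non_spammers(users, social_edges, min_friends=2):
    """Return suspect users. HYPOTHETICAL rule: users with at least
    `min_friends` social connections are treated as non-spammers."""
    degree = defaultdict(int)
    for u, v in social_edges:
        degree[u] += 1
        degree[v] += 1
    return {u for u in users if degree[u] < min_friends}


def cluster_by_landing_pages(user_links, suspects, whitelist):
    """Union-find over suspects: two users join a cluster when they post
    the same non-whitelisted landing page."""
    parent = {u: u for u in suspects}

    def find(u):
        # Path halving keeps the union-find trees shallow.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    first_poster = {}  # landing page -> first suspect seen posting it
    for u in sorted(suspects):
        for link in sorted(user_links.get(u, ())):
            if link in whitelist:
                continue
            if link in first_poster:
                ru, rv = find(u), find(first_poster[link])
                if ru != rv:
                    parent[ru] = rv
            else:
                first_poster[link] = u

    clusters = defaultdict(list)
    for u in sorted(suspects):
        clusters[find(u)].append(u)
    # Largest clusters (candidate spam campaigns) first.
    return sorted(clusters.values(), key=lambda c: (-len(c), c))


def unik_like(users, social_edges, user_links, min_friends=2):
    """Hypothetical two-step pipeline: remove non-spammers, then cluster."""
    suspects = remove_non_spammers(users, social_edges, min_friends)
    # HYPOTHETICAL: links posted by non-spammers are whitelisted.
    whitelist = set()
    for u in users:
        if u not in suspects:
            whitelist.update(user_links.get(u, ()))
    return cluster_by_landing_pages(user_links, suspects, whitelist)


if __name__ == "__main__":
    users = {"alice", "bob", "carol", "s1", "s2", "s3", "s4"}
    social_edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol")]
    user_links = {
        "alice": {"news.example"}, "bob": {"news.example"},
        "carol": {"wiki.example"},
        "s1": {"pills.example", "news.example"}, "s2": {"pills.example"},
        "s3": {"casino.example"}, "s4": {"casino.example", "loans.example"},
    }
    print(unik_like(users, social_edges, user_links))
```

In this toy example the three socially connected users are removed and their links whitelisted, so the shared `news.example` link does not merge anyone, and the remaining users fall into two clusters keyed by `pills.example` and `casino.example`. A suspect who shares no landing page would still appear as a singleton cluster, so a practical scheme would need an additional rule to decide which clusters are actual campaigns.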
Xiaodong Zhang (Advisor)
Feng Qin (Committee Member)
Ten H. (Steve) Lai (Committee Member)
131 p.

Recommended Citations


    APA Style (7th edition)

  • Tan, E. (2013). Spam Analysis and Detection for User Generated Content in Online Social Networks [Doctoral dissertation, Ohio State University]. OhioLINK Electronic Theses and Dissertations Center. http://rave.ohiolink.edu/etdc/view?acc_num=osu1365520334

    MLA Style (8th edition)

  • Tan, Enhua. Spam Analysis and Detection for User Generated Content in Online Social Networks. 2013. Ohio State University, Doctoral dissertation. OhioLINK Electronic Theses and Dissertations Center, http://rave.ohiolink.edu/etdc/view?acc_num=osu1365520334.

    Chicago Manual of Style (17th edition)

  • Tan, Enhua. "Spam Analysis and Detection for User Generated Content in Online Social Networks." Doctoral dissertation, Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1365520334