Internet technology has made international communication easy and convenient. This convenience has led many people to rely on electronic mail in almost every sphere of life, both personal and business. Unscrupulous organizations and individuals have taken undue advantage of this convenience and populate users’ inboxes with unwanted messages, making email spam a menace. Even as anti-spam software producers think they have almost solved the problem, spammers come up with new techniques. One such tactic in the spammers’ toolbox is image spam: messages that contain little more than a link to an image rendered in an HTML mail reader. The image typically contains the spam message one hopes to avoid, yet it is able to bypass most filters because of the composition and format of these pictures.
This research focuses on identifying these images as spam by using an artificial neural network (ANN), a software program for recognizing patterns that is modeled on the biological neural networks in our brains. As information propagates through a neural network, the network “learns” about the data. A large collection of both spam and non-spam images has been used to train an ANN, and the effectiveness of the trained network has then been tested against sets of pictures that are either already identified or unidentified. This process involves formatting the images and adding the desired training values expected by the ANN. Several different ANNs have been trained using different configurations of hidden layers and nodes per layer. A detailed process for preprocessing spam image files is given, followed by a description of how to train an artificial neural network to distinguish between ham and spam. Finally, the trained network is tested against both known and unknown images.
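The workflow summarized above (labelled inputs, several hidden-layer configurations, then a test on held-out samples) can be illustrated with a minimal sketch. It is not the implementation used in this research: the feature vectors are synthetic stand-ins for preprocessed images, and every name in it (`make_synthetic_data`, `make_network`, `train_network`, `accuracy`), as well as the layer sizes, learning rate, and epoch count, is a hypothetical choice made for illustration only.

```python
import math
import random


def make_synthetic_data(n_samples, n_features, rng):
    """ASSUMPTION: stand-in for preprocessed images. Spam (label 1) gets
    feature values in [0.6, 1.0], ham (label 0) gets values in [0.0, 0.4]."""
    data = []
    for i in range(n_samples):
        label = i % 2
        low = 0.6 if label else 0.0
        features = [rng.uniform(low, low + 0.4) for _ in range(n_features)]
        data.append((features, label))
    rng.shuffle(data)
    return data


def make_network(n_inputs, hidden_sizes, rng):
    """Fully connected net with one sigmoid output node.
    Each neuron is a list of input weights with its bias stored last."""
    sizes = [n_inputs] + list(hidden_sizes) + [1]
    return [[[rng.uniform(-0.5, 0.5) for _ in range(sizes[i] + 1)]
             for _ in range(sizes[i + 1])]
            for i in range(len(sizes) - 1)]


def sigmoid(x):
    x = max(-60.0, min(60.0, x))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-x))


def forward(network, inputs):
    """Return the activations of every layer, input layer first."""
    activations = [list(inputs)]
    for layer in network:
        prev = activations[-1]
        activations.append(
            [sigmoid(sum(w * a for w, a in zip(neuron, prev)) + neuron[-1])
             for neuron in layer])
    return activations


def train_step(network, inputs, target, lr):
    """One backpropagation update for a single sample (squared error)."""
    acts = forward(network, inputs)
    deltas = [(a - target) * a * (1 - a) for a in acts[-1]]
    for li in range(len(network) - 1, -1, -1):
        layer, prev = network[li], acts[li]
        prev_deltas = []
        if li > 0:
            # Propagate the error backwards before changing this layer's weights.
            prev_deltas = [
                sum(neuron[j] * d for neuron, d in zip(layer, deltas))
                * prev[j] * (1 - prev[j])
                for j in range(len(prev))]
        for neuron, d in zip(layer, deltas):
            for j in range(len(prev)):
                neuron[j] -= lr * d * prev[j]
            neuron[-1] -= lr * d
        deltas = prev_deltas


def train_network(network, samples, epochs, lr, rng):
    samples = list(samples)
    for _ in range(epochs):
        rng.shuffle(samples)
        for features, label in samples:
            train_step(network, features, label, lr)


def accuracy(network, samples):
    """Fraction of samples whose thresholded output matches the label."""
    correct = sum((forward(network, f)[-1][0] >= 0.5) == bool(label)
                  for f, label in samples)
    return correct / len(samples)


if __name__ == "__main__":
    rng = random.Random(0)
    data = make_synthetic_data(80, 4, rng)
    train, test = data[:60], data[60:]
    # Hypothetical configurations of hidden layers and nodes per layer.
    for hidden in [(3,), (5,), (5, 3)]:
        net = make_network(4, hidden, rng)
        train_network(net, train, epochs=300, lr=0.5, rng=rng)
        print(hidden, "held-out accuracy:", accuracy(net, test))
```

With real data, the synthetic generator would be replaced by whatever fixed-length numeric representation the preprocessing step produces for each image, paired with its desired training value (here assumed to be 1 for spam and 0 for ham).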