Research Interests:


Image Spam Hunter

Spam has become a public hazard for email users around the world. Image spam is a type of email spam that embeds text in graphical images to bypass traditional spam filters, which rely on text statistics. While keeping the embedded text readable, image spammers use a range of image processing techniques to vary the visual content of individual messages, e.g., by changing foreground colors, backgrounds, or font types, or even by rotating the images and adding artifacts to them. Image spam thus poses a great challenge to conventional spam filters, because detecting it requires partly solving visual recognition problems, which are difficult in general.

To detect spam images effectively, it is desirable to apply image content analysis on both the server side and the client side. Because image spammers behave in a fundamentally adversarial way, we employ a range of machine learning techniques, from unsupervised cluster analysis through semi-supervised and supervised classification to more interactive active learning algorithms, to analyze the statistics of visual features. This lets us build a comprehensive spam filtering solution that meets different system and usage requirements. Whereas previous work mostly filters spam images on the client side, our solution combines server-side filtering with client-side detection to mitigate image spam.


The High-Performance Network Anomaly/Intrusion Detection and Mitigation (HPNAIDM) Systems

Existing intrusion detection systems (IDS) have three shortcomings: 1) they are mostly host-based and do not scale to high-speed networks, so they cannot prevent the rapid propagation of the latest viruses and worms, which can infect most vulnerable machines on the Internet in only ten minutes; 2) they are mostly signature-based and unable to recognize unknown anomalies; and 3) they are isolated or centralized systems. To address these limitations, we propose the HPNAIDM system, with the following features: 1) online traffic recording and analysis on high-speed networks, 2) online statistical anomaly detection, 3) an integrated approach to false positive reduction, 4) hardware speedup for real-time detection, and 5) scalable anomaly/intrusion alarm fusion from multiple sources.


Pollution Resilience for Internet Caches

We investigate and develop efficient methods to detect a class of pollution attacks that aim to degrade a proxy's caching capabilities, either by ruining the cache file locality, or by inducing false file locality.


Detecting Stealthy Spreaders Using Online Outdegree Histograms

We consider the problem of detecting, on high-speed networks, the presence of a sufficiently large number of hosts that each connect to more than a certain number of unique destinations within a given time window. Previous techniques have focused on detecting sources with an extremely large outdegree. Such techniques fail to detect spreaders such as bot scans, in which each scanning host scans only a moderate, fixed number of destinations. In contrast, our scheme uses a small, fixed amount of memory, yet it can still detect stealthy spreader scenarios by approximating outdegree histograms from continuous traffic.
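
To make the detection criterion concrete, the following is a minimal Python sketch of the quantity being estimated. It is not the scheme described above: it keeps an exact set of destinations per source, so its memory grows with the traffic, whereas the scheme approximates the histogram in fixed memory. The function names, the (timestamp, source, destination) flow format, and the two thresholds are all assumptions made for illustration.

```python
from collections import defaultdict


def outdegree_histogram(flows, window_start, window_len):
    """Return {outdegree: number of sources} for one time window.

    flows is an iterable of (timestamp, src, dst) tuples (assumed format).
    This is exact counting, shown only to define the target quantity;
    it does not have the fixed memory footprint of an approximation.
    """
    window_end = window_start + window_len
    dests = defaultdict(set)  # src -> set of unique destinations
    for ts, src, dst in flows:
        if window_start <= ts < window_end:
            dests[src].add(dst)

    hist = defaultdict(int)  # outdegree -> how many sources have it
    for unique_dsts in dests.values():
        hist[len(unique_dsts)] += 1
    return dict(hist)


def spreaders_present(hist, min_outdegree, min_hosts):
    """True if at least min_hosts sources exceed min_outdegree.

    Both thresholds are hypothetical parameters standing in for
    "a certain number of unique destinations" and
    "a sufficiently large number of hosts".
    """
    count = sum(n for degree, n in hist.items() if degree > min_outdegree)
    return count >= min_hosts
```

A detector built this way flags a window when many hosts each show a moderate outdegree, even though no single host stands out, which is the case that per-source heavy-hitter detection misses.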

Constructed 09/27/2005, Revised 03/10/2010, Copyright © 2010, Yan Gao