Abstract
Online communities are increasingly inundated with harmful comments.
In response to this growing challenge, we introduce a
two-stage ultra-low-cost multimodal harmful behavior detection method designed
to identify harmful comments and images with high precision and recall rates.
We first utilize the CLIP-ViT model to transform tweets and images into
embeddings, effectively capturing the intricate interplay of semantic meaning
and subtle contextual clues within texts and images. In the second stage,
the system feeds these embeddings into a conventional machine learning
classifier such as an SVM or logistic regression, enabling the system to be trained
rapidly and to perform inference at an ultra-low cost. By training
conventional machine learning classifiers on the CLIP-ViT embeddings of
tweets, our system not only detects harmful text with near-perfect
performance, achieving precision and recall above 99\%, but also detects
harmful images in a zero-shot manner, without additional training, thanks to
its multimodal embedding input. This capability allows our system to identify
unseen harmful images without requiring extensive and costly image datasets.
Additionally, our system quickly adapts to new harmful content; if a new
harmful content pattern is identified, we can fine-tune the classifier with the
corresponding tweets' embeddings to promptly update the system. This makes it
well suited to addressing the ever-evolving nature of online harmfulness,
providing online communities with a robust, generalizable, and cost-effective
tool for protecting their members.
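
The two-stage design described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the embeddings are synthetic stand-ins for CLIP-ViT outputs, and the names `fake_embedding`, `make_split`, `train_logreg`, and `precision_recall`, the dimension `DIM`, and the noise levels are all hypothetical. The sketch only shows the idea that a simple classifier trained on "tweet" embeddings can be applied unchanged to "image" embeddings when both live in one shared space.

```python
import math
import random

DIM = 16  # hypothetical size; real CLIP-ViT embeddings are much larger


def fake_embedding(harmful, rng, noise):
    """Synthetic stand-in for a stage-one CLIP-ViT embedding (assumption).

    Harmful and benign items point in opposite directions of a shared space,
    which mimics text and images being embedded into one multimodal space.
    """
    sign = 1.0 if harmful else -1.0
    vec = [sign * (1.0 if i < DIM // 2 else 0.0) + rng.gauss(0.0, noise)
           for i in range(DIM)]
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]


def make_split(n, rng, noise):
    """Build a balanced set of synthetic embeddings (1 = harmful, 0 = benign)."""
    labels = [i % 2 for i in range(n)]
    return [fake_embedding(bool(t), rng, noise) for t in labels], labels


def score(w, b, x):
    """Linear decision score; positive means 'harmful'."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_logreg(X, y, lr=0.5, epochs=200):
    """Stage two: logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * DIM, 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, t in zip(X, y):
            err = 1.0 / (1.0 + math.exp(-score(w, b, x))) - t
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def precision_recall(w, b, X, y):
    """Precision and recall of the 'harmful' class."""
    preds = [1 if score(w, b, x) > 0 else 0 for x in X]
    tp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, y) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, y) if p == 0 and t == 1)
    return tp / max(tp + fp, 1), tp / max(tp + fn, 1)


rng = random.Random(0)
train_X, train_y = make_split(200, rng, noise=0.3)  # "tweet" embeddings
text_X, text_y = make_split(100, rng, noise=0.3)    # held-out "tweet" embeddings
image_X, image_y = make_split(100, rng, noise=0.5)  # "image" embeddings, never trained on

w, b = train_logreg(train_X, train_y)
text_metrics = precision_recall(w, b, text_X, text_y)
image_metrics = precision_recall(w, b, image_X, image_y)
print("text  precision/recall:", text_metrics)
print("image precision/recall:", image_metrics)
```

In a real pipeline, `fake_embedding` would be replaced by the outputs of a CLIP-ViT text or image encoder, and the hand-written logistic regression could be swapped for any conventional classifier such as an SVM. Adapting to a newly identified harmful pattern would then amount to refitting the classifier with the new embeddings added to the training set, which is cheap because only the small second-stage model is retrained.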