The connectivity the world has achieved through social media has a flip side. It has been widely reported that spamming by bots and fake accounts on Twitter is increasing. Such accounts can help fake news and opinions trend and spread, creating confusion and potentially spreading rumours. Using bots and fake accounts to make a topic trend on Twitter or to generate artificial likes is also becoming commonplace. Twitter, in particular, is one of the worst sufferers of this “tragedy of the commons” because of its openness. The engineers at Twitter respond with regular product and policy changes to curb this menace, but the problem persists. While the arms race between spammers and Twitter rages on, we tried to determine whether AI can help an information seeker stay on top of this game.

Twitter is suffering from the age-old problem called the “tragedy of the commons”

To bust fake accounts using AI, we first need to define what constitutes a fake account. We had two hypotheses about the types of fake accounts that could exist. When we tested them with AI algorithms, both turned out to help us label accounts as “Fake” with a certain probability. At Karna Analytics, our Machine Learning research team ran multiple experiments to track these types of accounts and categorised them into two types, “Spammy Users” and “Bot Users”, based on their activity and the content of their posts.

In this blog post, we describe the approach we use to detect “Spammy Users” (also called “Fake Accounts” or “Spammers”) and how it can improve the quality of research performed on social media data. Our analysis is based on data we tracked for two trending hashtags: #Presidentielle (for the 2017 French presidential election, which we predicted correctly using AI) and #Jio (a popular telecom company in India).

The Hypothesis of Fake Accounts: Spammers are not that good at spamming

We observe that getting fake accounts to tweet about a #hashtag, increase its mentions and make it a trending topic is one of the most common spamming tricks (search for “twitter hashtag trending services” and you will see what we mean). From a spammer’s perspective, posting tweets from many fake accounts in quick succession is a challenging task. Ideally, a spammer should post tweets that are relevant yet different from each other so that the trend looks genuine. Our key hypothesis is that achieving this within the constraints of time and money is difficult, so spammers end up doing little to edit their tweets. As seen below, even celebrities who tweeted about Jio (probably as part of an influencer marketing strategy) ended up posting the same tweets.


Celebrities posting similar tweets support our hypothesis that spammers make little effort to edit their tweets before posting.

Based on this idea, we have found that spammers can be effectively identified if we look at all the tweets about a topic and pick out those that are contextually very close to each other and posted within a very short span of time (~15 minutes). For this, we use our proprietary text analytics algorithm, Semantic Similarity, to cluster contextually similar tweets. To take an analogy from the real world, we use AI to closely examine students’ answer sheets and identify who cheated during the exam. For those looking for some intuition on how this works, we have added below a visualization of how we cluster contextually similar tweets.


A cluster of tweets that contextually speak about the pricing for Jio prime membership
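
To give a rough feel for the idea, here is a minimal Python sketch. It is not the proprietary Semantic Similarity algorithm: it swaps in a plain bag-of-words cosine similarity as a stand-in. The 15-minute window comes from the description above, while the 0.8 threshold, the function names and the sample tweets are all assumptions made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timedelta
from math import sqrt

WINDOW = timedelta(minutes=15)  # time span mentioned in the post
THRESHOLD = 0.8                 # assumed similarity cut-off, not from the post


def tokenize(text):
    """Lowercase a tweet and keep words, hashtags and mentions as tokens."""
    return re.findall(r"[#@]?\w+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def cluster_tweets(tweets):
    """Greedy single-pass clustering of (user, timestamp, text) tuples.

    A tweet joins the first cluster whose seed tweet is similar enough and
    was posted at most WINDOW earlier; otherwise it starts a new cluster.
    Only clusters with more than one tweet are returned.
    """
    clusters = []
    for user, ts, text in sorted(tweets, key=lambda t: t[1]):
        vec = Counter(tokenize(text))
        for c in clusters:
            close_in_time = ts - c["seed_time"] <= WINDOW
            if close_in_time and cosine(vec, c["seed_vec"]) >= THRESHOLD:
                c["members"].append((user, ts, text))
                break
        else:
            clusters.append({"seed_vec": vec, "seed_time": ts,
                             "members": [(user, ts, text)]})
    return [c["members"] for c in clusters if len(c["members"]) > 1]


# Made-up sample tweets, for illustration only
t0 = datetime(2017, 4, 1, 12, 0)
sample = [
    ("user_a", t0, "Loving the new #Jio Prime offer, sign up today!"),
    ("user_b", t0 + timedelta(minutes=3), "Loving the new #Jio Prime offer sign up today"),
    ("user_c", t0 + timedelta(minutes=5), "My network has been down since morning #Jio"),
    ("user_d", t0 + timedelta(hours=2), "Loving the new #Jio Prime offer, sign up today!"),
]

for members in cluster_tweets(sample):
    print([user for user, _, _ in members])
```

In this toy data, only the first two tweets end up in a shared cluster: the third is on a different subject, and the fourth repeats the text but falls outside the 15-minute window. A production system would likely use richer text representations than word counts, so treat the similarity function here as a placeholder.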

We analysed more than 50,000 tweets for #Presidentielle and #Jio and used the Semantic Similarity technique to identify clusters of users who post very similar tweets multiple times. We produced the list of potential spammers below based on the contextual similarity and frequency of their tweets. If you search for these users on Twitter, you will notice that some accounts have already been deleted or don’t appear in search results, as Twitter classified them as spammers as well.


Potential Spammers for #Presidentielle (left) and Reliance Jio (right)

It is important to note that the user ‘@JioCare’ is the customer support handle for Reliance Jio. It is categorized as a potential spammer by our model because of its standard replies to user queries. For example, the handle might reply with a standard note for detailed assessment of the query:


Jio’s customer-support replies are generic, so the handle was classified as a potential spammer

As you can see from the lists, these users tweeted multiple times in the selected time period. Semantic Similarity clusters contextually similar tweets, and the handles behind them can then be identified.
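
The step from clusters to a list of handles could look roughly like the sketch below. It simply counts how many of each user’s tweets fall inside near-duplicate clusters. The `min_repeats` cut-off, the function name and the example handles (other than @JioCare) are hypothetical. The optional `allowlist` for known official handles is our own illustrative addition; the approach described above does not specify how such handles are treated.

```python
from collections import Counter


def rank_potential_spammers(clusters, min_repeats=2, allowlist=()):
    """Rank users by the number of their tweets found in near-duplicate clusters.

    clusters: list of clusters, each a list of (user, text) pairs.
    min_repeats: assumed cut-off; the post does not state one.
    allowlist: handles to skip (an illustrative assumption, e.g. support desks).
    Returns (user, count) pairs, highest count first.
    """
    counts = Counter(user for cluster in clusters for user, _ in cluster)
    skip = {handle.lower() for handle in allowlist}
    return [
        (user, n)
        for user, n in counts.most_common()
        if n >= min_repeats and user.lower() not in skip
    ]


# Made-up clusters with placeholder tweet text, for illustration only
example_clusters = [
    [("@promo_1", "offer text"), ("@promo_2", "offer text"), ("@promo_1", "offer text")],
    [("@JioCare", "standard reply"), ("@JioCare", "standard reply"), ("@JioCare", "standard reply")],
    [("@promo_2", "plan text"), ("@casual_user", "plan text")],
]

print(rank_potential_spammers(example_clusters))
print(rank_potential_spammers(example_clusters, allowlist=["@JioCare"]))
```

The first call reproduces the @JioCare situation: a handle that sends templated replies rises to the top of the list purely because of repetition. The second call shows one possible way to keep such handles out of the results.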

Why is spam filtering important?

Filtering out spam users lets you listen to users’ unbiased opinions about a topic and remove the noise created by spammers. Here are a few use cases for spam filtering:

    • Get unique and unbiased data for analysing what users are saying about your brand.
    • Assess the performance of a particular marketing campaign on Twitter to understand whether the tweets were generated organically or artificially by spammers.
    • Get a better understanding of your customer persona by weeding out spammers.
    • Political organisations and intelligence agencies can use spam filtering to analyse fake accounts that are spamming and pushing their ideological agenda.
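
As a small illustration of the first two use cases, the sketch below drops tweets from flagged handles before any further analysis and reports how much of the conversation they accounted for. The dictionary schema, the function name and the sample data are hypothetical.

```python
def drop_flagged_tweets(tweets, flagged_users):
    """Remove tweets posted by flagged handles.

    tweets: list of dicts with at least a "user" key (hypothetical schema).
    Returns the kept tweets and the share of tweets that was removed.
    """
    flagged = {user.lower() for user in flagged_users}
    kept = [t for t in tweets if t["user"].lower() not in flagged]
    removed_share = 1 - len(kept) / len(tweets) if tweets else 0.0
    return kept, removed_share


# Made-up sample data, for illustration only
campaign_tweets = [
    {"user": "@promo_1", "text": "offer text"},
    {"user": "@promo_1", "text": "offer text"},
    {"user": "@casual_user", "text": "Switched plans last week"},
    {"user": "@another_user", "text": "Is the offer still open?"},
]

kept, removed_share = drop_flagged_tweets(campaign_tweets, ["@promo_1"])
print(len(kept), removed_share)
```

A high removed share for a campaign hashtag would suggest that much of the volume was generated artificially rather than organically.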

This approach is one of the many that we have successfully tested for finding spammy users. In the next article, we will discuss how we successfully identified bot users using a similar approach.

If you are a marketing or research specialist interested in analyzing social media, we have flexible and cutting-edge AI-based solutions for you. Do let us know your thoughts about our approach in the comments below.

Karna.ai is a division of ParallelDots. Karna is a social media marketing research platform. We collect and analyse millions of mentions from News, Twitter, Facebook and Instagram and deliver AI-driven, in-depth insights through automated reports and custom analysis.

Want to be a king of social media? Click here to schedule a free demo.
