The original Twitter/X algorithm had a sound statistical background, but it shifted drastically throughout 2024. Here’s how it worked, what changed, and why disaffected users are flocking elsewhere
A couple of things happened on 6 November 2024. Most notably, it was Emma Stone’s 36th birthday. And you probably missed it, but that same day, the X platform, formerly known as Twitter, recorded more than 281,000 account deactivations through its website in a single day. In the US alone, more than 110,000 accounts were deactivated, roughly double the previous single-day record, at least since Elon Musk bought the company in 2022. It is easy to correlate this with the outcome of the US presidential election that very same day, but more than half of the deleted accounts were not even from the US. Moreover, the social network has been shrinking for a while now, with deactivations running at higher-than-average levels for most of 2024.
My personal take here is that the main reason X is falling apart is not that users in Brazil and the UK became activists for US democracy, but simply the push the platform has been making for the last couple of months. In particular, the new management appears to have implemented some odd changes to the algorithm, resulting in quite a drastic shift in the platform experience, and one that most users found unpleasant, to say the least.
How did the original Twitter algorithm work?
But what is this algorithm, and why is it so important? Well, as you’ve probably noticed by now, all social media feeds make internal decisions to prioritise some content, hide much of the rest, and highlight certain posts within a user’s personalised page. Much as the Google algorithm ranks pages in search results, social media sites spend a good amount of resources finding the optimal feed for each user, expecting that such a stream of content will keep their audience spending more time on their site or app. As you might expect, most companies do not share the specifics of how this ranking process works. But Twitter is (was?) different in this regard, as the original algorithm is open source and can easily be reviewed on GitHub.
Here’s a brief description of how the original Twitter algorithm worked, at least up to March 2023, when the code was originally shared and explained by the developers. Twitter’s algorithm is composed of a series of statistical models that rely on both user data and tweet-specific data to predict the likelihood of users interacting with the presented content. As discussed by the developers, the process is summarised in three stages:
- Finding an “optimal” set of candidate tweets to present to each user.
- Ranking that set of tweets based on how likely each user will like, retweet or respond to them.
- Cleaning up the results with some sanity checks to exclude tweets the user has already seen or blocked, among other things.
Let’s deep-dive into these one by one. The first step involves creating a subset of about 1,500 tweets from a pool of hundreds of millions. The company clearly explains that this initial subset considers both tweets within your network (people you follow or have interacted with) and those outside your network.
To rank the tweets within your network, Twitter uses logistic regression models to predict the probability of a target user interacting with each candidate tweet. The model relies on several features, or predictors. To name a few, it considers: the number of days since the last interaction between the target user and the creator of the target tweet; the number of tweets that creator posted in the last week; language; country; the creator’s number of followers; the number of people they follow; the target user’s mean interaction count; and so on. The data are also cleaned up a bit, to mitigate the effect that fluctuations in user activity might have on the model. The logistic model performs a binary classification: it predicts whether the target user will have any interaction with the target tweet. The AUC (area under the curve) metric is used to assess the fit and select the best models, and L2 regularisation, as in ridge regression, is applied.
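To make this concrete, here is a minimal sketch of that modelling setup in Python: a logistic regression with L2 (ridge-style) regularisation, evaluated by AUC. The features, data and coefficients below are invented stand-ins for the predictors described above, not Twitter’s actual inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
n = 5_000

# Hypothetical features echoing the predictors described above
X = np.column_stack([
    rng.integers(0, 365, n),   # days since user last interacted with the author
    rng.poisson(20, n),        # author's tweets in the past week
    rng.lognormal(6, 2, n),    # author's follower count
    rng.lognormal(5, 1.5, n),  # accounts the author follows
    rng.gamma(2, 3, n),        # target user's mean interaction count
])
# Synthetic binary label: did the user interact with the tweet at all?
logits = -2.0 - 0.004 * X[:, 0] + 0.05 * X[:, 4]
y = rng.random(n) < 1 / (1 + np.exp(-logits))

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# penalty="l2" gives ridge-style shrinkage, as described for Twitter's models
model = make_pipeline(StandardScaler(),
                      LogisticRegression(penalty="l2", C=1.0))
model.fit(X_train, y_train)

# AUC on held-out data is how the best candidate models would be selected
probs = model.predict_proba(X_test)[:, 1]
print(f"AUC: {roc_auc_score(y_test, probs):.3f}")
```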
For tweets outside your network, the company has been working on another, similar logistic regression strategy, but they admittedly use it sparingly. Most of their out-of-network recommendations come from similarity measures that quantify how similar you are to other users, in order to serve tweets that users similar to you engaged with.
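The sketch below illustrates the general idea, assuming users are represented as vectors of engagement counts and compared by cosine similarity; the representation Twitter actually uses is not spelled out here, so treat this as illustrative only.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two engagement vectors."""
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Rows: users; columns: how often each user engaged with some topic/account.
# All names and counts are made up for illustration.
engagements = {
    "you":    np.array([5, 0, 3, 1, 0]),
    "user_a": np.array([4, 0, 2, 2, 0]),   # similar tastes to yours
    "user_b": np.array([0, 6, 0, 0, 5]),   # very different tastes
}

for name in ("user_a", "user_b"):
    sim = cosine_similarity(engagements["you"], engagements[name])
    print(f"similarity(you, {name}) = {sim:.2f}")

# Tweets that high-similarity users engaged with become candidates for you.
```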
Another tool they use to recommend out-of-network content is an internal segmentation that bundles users into clusters called “communities”. Communities are groups of influencers (a subset of the network’s most prolific users) that are connected through their interactions, with the clustering of the interaction graph estimated through Metropolis-Hastings sampling methods. The developers mention that over 145,000 communities of influencers are found within the whole influencer network. All platform users are then assigned to the communities they have engaged with, and personalised recommendations can then surface trending posts from the influencers in those communities, including accounts the user has not engaged with before.
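As a toy illustration of the community mechanism (skipping the hard part, the Metropolis-Hastings clustering of the interaction graph, entirely), imagine communities as sets of influencers and assign each user to every community containing an influencer they have engaged with. All accounts here are invented.

```python
# Hard-coded communities; in reality these emerge from graph clustering
communities = {
    "data_science": {"@statsguru", "@ml_daily"},
    "football":     {"@goalcast", "@pitchside"},
}

# Which influencer accounts each user has engaged with (made up)
user_engagements = {
    "alice": {"@statsguru", "@goalcast"},
    "bob":   {"@ml_daily"},
}

def assign_communities(engaged_accounts):
    """Return every community containing an influencer the user engaged with."""
    return {name for name, members in communities.items()
            if members & engaged_accounts}

for user, accounts in user_engagements.items():
    print(user, "->", assign_communities(accounts))
# alice belongs to both communities; bob only to data_science
```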
Once the scope of possible recommendations has been reduced, we reach step two. From this initial list of about 1,500 tweets from the sources above, it is time for a neural network model to rank the tweets according to the likelihood of engagement. All candidate tweets are treated equally by the algorithm, regardless of their source: the model relies on the specific engagement features of each tweet to assign a rank that is specific to each user. By now, we have a list of optimal tweets for the user, ranked from best to worst.
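In spirit, the ranking stage boils down to something like the following sketch: predict a probability for each type of engagement action, combine the probabilities into a single score using weights, and sort. The actions and weights here are hypothetical placeholders, not the values from Twitter’s published code.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    tweet_id: int
    p_like: float      # predicted probability the user likes the tweet
    p_retweet: float   # ... retweets it
    p_reply: float     # ... replies to it

# Hypothetical weights: scarcer, higher-effort actions count for more
WEIGHTS = {"p_like": 1.0, "p_retweet": 2.0, "p_reply": 9.0}

def score(c: Candidate) -> float:
    """Weighted combination of predicted engagement probabilities."""
    return (WEIGHTS["p_like"] * c.p_like
            + WEIGHTS["p_retweet"] * c.p_retweet
            + WEIGHTS["p_reply"] * c.p_reply)

candidates = [
    Candidate(1, p_like=0.30, p_retweet=0.02, p_reply=0.01),
    Candidate(2, p_like=0.10, p_retweet=0.01, p_reply=0.08),
]
ranked = sorted(candidates, key=score, reverse=True)
print([c.tweet_id for c in ranked])  # tweet 2 outranks tweet 1 via replies
```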
For the final step, once the candidate tweets are ranked, some heuristics and filters are applied, which to me look like sanity checks to catch any weird results from the models. This step removes any candidate tweets from accounts you muted or blocked, implements some rules to avoid too many posts from the same authors, and lowers the scores of authors with negative feedback, among other things.
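A rough sketch of what such a heuristics pass might look like, with made-up rules and thresholds:

```python
def apply_heuristics(ranked_tweets, blocked_authors, negative_feedback_authors,
                     max_per_author=2, penalty=0.5):
    """Final sanity-check pass over an already-ranked list of tweets.

    Each tweet is a dict with at least "author" and "score" keys; all
    rules and numbers are illustrative, not Twitter's actual values.
    """
    result, seen_counts = [], {}
    for tweet in ranked_tweets:            # tweets arrive best-first
        author = tweet["author"]
        if author in blocked_authors:      # hard filter: muted/blocked accounts
            continue
        if seen_counts.get(author, 0) >= max_per_author:
            continue                       # author diversity rule
        if author in negative_feedback_authors:
            # soft filter: downweight rather than drop
            tweet = {**tweet, "score": tweet["score"] * penalty}
        seen_counts[author] = seen_counts.get(author, 0) + 1
        result.append(tweet)
    # Re-sort because penalties may have changed the ordering
    return sorted(result, key=lambda t: t["score"], reverse=True)
```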
What happened?
So far, so good. The algorithm has a sound statistical background, and it seems deliberately designed to ensure users receive a constant feed of relevant content. Sadly, we have ample evidence that this algorithm has shifted drastically during 2024, in not-so-transparent ways. As Twitter became X, it was widely reported that changes were made to the platform, likely for economic and ideological reasons.
For starters, just a few months after publishing the algorithm, the platform started to highlight some posts for seemingly financial reasons. Tweets with more replies became more relevant, as did video content, as these spaces became suitable for ad placement. Tweets from paying users with the blue tick also became more relevant on the platform. Conversely, posts with external links (as commonly seen from official news outlets and publications) were penalised by the algorithm, a change even Elon Musk himself described.
There were some ideological changes implemented as well, mostly in alignment with Elon Musk’s personal preferences. Moderation became scarce on the platform, with research pointing to paying users being less scrutinised for spreading hateful or false information. In the run-up to the 2024 US presidential election, The Washington Post reported that Republican politicians and candidates in the US got more views, more follows and a much higher probability of reaching more than 20 million views than their Democratic counterparts.
There’s another important driver that affects the feed algorithm, one not limited to X but common to social media in general. Perhaps the biggest issue with social media algorithms is that engagement is the company’s target metric, mostly for economic reasons. More engagement means more time spent on the platform, which translates into more eyes to sell ads to. X, Facebook, Instagram, TikTok and the rest of the social media sites focus their models and algorithms on showing posts that drive a reaction from users. And sadly, as has been widely reported and studied, those reactions are mostly triggered by controversial and inappropriate content: rage and animosity are the main drivers of engagement. That’s part of the reason why politicians like Trump have thrived in this new era, with politicians around the world now engaging in more uncivil discourse. To stay relevant to their audience, politicians need the approval of the algorithm just to get displayed at all. Reach is reserved for the well-targeted, explosive posts that will drive reactions.
This does not affect only politicians. By now, you have probably heard of a new type of freelancer that relies on creating content, expecting to profit from it. I think they are called ‘influencers’ or something. They have become a powerful industry by themselves, with some rough estimates claiming that around 2.4% of social media users qualify as influencers, which would mean 127 million influencers worldwide. Each one of them relies heavily on the algorithms: part of their full-time job involves testing and experimenting with different types of content to find the magic threshold that the algorithm flags as worthy of exposure. Because, realistically, we don’t get to decide which content goes viral; the social media algorithms do. And, in the end, that usually means more controversial and over-the-top postings to drive engagement metrics up. This trend has been around for a while and explains why we often have a feed full of enraging and hyperbolic posts. It also contributes to an exaggerated perception of reality and has undesired consequences for the way we communicate and engage as a society. If you ask me, instead of spending so much time prophesying the exact date at which artificial intelligence will become an existential threat to mankind, we really need to spend more time discussing how not-so-intelligent machines are currently messing with our social interactions.
All this has clearly produced user fatigue that is, in my opinion, the main factor driving people away from platforms such as X, which is overly reliant on interaction key performance indicators. Even The Guardian is now out. Because even if all the stats on the models behind X are sound and accurate, the moment you choose to reward engagement, all the good and bad things that go with it will flood the network.
A better place?
If you are one of the thousands of users leaving X, you have likely heard that more options are emerging. Bluesky, a not-so-new social media platform that’s basically a Twitter clone (even backed by ex-Twitter folks), has been in the news lately as it has grown massively to reach about 24.5 million users. Launched in 2021 and opened for public access in February 2024, Bluesky has seen a considerable influx recently, and a thriving data community forming within the platform.
Let’s enjoy it while it lasts. Because right now Bluesky exists as a non-profit, with some quite novel characteristics, particularly an openness to becoming a “marketplace for algorithms”. This is huge and unprecedented: Bluesky allows developers to build and share their own algorithms, so users don’t have to rely on the company’s models. This means users can actually refine and even build their own feeds, instead of letting the owners pick the content for them. This will obviously limit Bluesky’s ability to grow at some point and, as it faces more difficult decisions on the road ahead, let’s just hope the company does not pivot away from its origins.
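For a flavour of what this openness looks like in practice, here is a heavily simplified sketch of a custom feed. Bluesky feed generators are services that answer the app.bsky.feed.getFeedSkeleton call with an ordered list of post URIs; the real specification also involves authentication, publishing a feed record and serving identity documents, all omitted here, and the posts and ranking rule below are invented.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Stand-in store of candidate posts; a real feed generator would build
# this by consuming the network's firehose of new posts.
POSTS = [
    {"uri": "at://did:plc:example/app.bsky.feed.post/1", "likes": 42},
    {"uri": "at://did:plc:example/app.bsky.feed.post/2", "likes": 7},
]

@app.route("/xrpc/app.bsky.feed.getFeedSkeleton")
def get_feed_skeleton():
    limit = int(request.args.get("limit", 50))
    # Your algorithm goes here: this toy one just sorts by like count
    ranked = sorted(POSTS, key=lambda p: p["likes"], reverse=True)[:limit]
    return jsonify({"feed": [{"post": p["uri"]} for p in ranked]})

if __name__ == "__main__":
    app.run(port=8080)
```

The point is that the ranking logic sits in a service anyone can write and publish, rather than inside the platform owner’s black box.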
As in the case of X, decisions made by social media companies might simply prove too much for a lot of users: as the owners’ incentives start to deviate heavily from those of their users, the sites risk losing popularity. In theory, all social media platform owners should easily identify big shifts in user engagement and modify their algorithms accordingly, but with the increasing reliance on black-box machine learning models, I do not honestly believe even the developers truly know what is going on within their rankings, much less the consequences that their choices are having on their users. Hopefully, all this leads to novel options and ideas in the social media space, because the perfect “digital town square” still feels a long way away.
Carlos Grajales is a statistician and business analytics consultant based in Mexico. He is also a member of the Significance editorial board.