In September, after a two-year gap, X (formerly Twitter) finally released a transparency report covering the first half of 2024, a marked departure from its previous practice of more frequent disclosures. Transparency reports, which platforms like Facebook and Google also release, are designed to offer insight into how companies manage content, enforce policies and respond to government requests.
The latest data from X is both perplexing and illuminating: the number of reported accounts and tweets has surged, yet the platform's moderation decisions have shifted. The starkest finding: an alarming volume of reported child exploitation content, alongside a significant decline in actions taken against hateful conduct.
The report underscores a key trend in content moderation: X’s increasing reliance on artificial intelligence to identify and manage harmful behavior. According to the platform, moderation is enforced through “a combination of machine learning and human review,” with AI systems either taking direct action or flagging content for further examination. Can machines truly manage the responsibility of moderating sensitive issues — or are they exacerbating the problems they’re meant to solve?
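The report does not detail how that split between machines and people works in practice, but the pattern it describes, an automated classifier acting directly on high-confidence cases and routing uncertain ones to human reviewers, is easy to sketch. Everything in the snippet below (the thresholds, labels and scoring function) is a hypothetical illustration, not X's actual system.

```python
# Minimal sketch of a hybrid moderation pipeline: a machine-learning score
# decides whether to auto-action a post, queue it for human review, or leave
# it up. Thresholds, labels and the scoring function are hypothetical.
from dataclasses import dataclass

AUTO_ACTION_THRESHOLD = 0.95   # act without human input above this score
HUMAN_REVIEW_THRESHOLD = 0.60  # flag for a human reviewer above this score

@dataclass
class Decision:
    post_id: str
    score: float
    outcome: str  # "removed", "human_review", or "no_action"

def moderate(post_id: str, text: str, score_fn) -> Decision:
    """Route a post based on a model's estimated probability of violation."""
    score = score_fn(text)
    if score >= AUTO_ACTION_THRESHOLD:
        outcome = "removed"
    elif score >= HUMAN_REVIEW_THRESHOLD:
        outcome = "human_review"
    else:
        outcome = "no_action"
    return Decision(post_id, score, outcome)

# Example with a stand-in scoring function (a real system would call a model).
if __name__ == "__main__":
    fake_score = lambda text: 0.72  # placeholder probability of violation
    print(moderate("12345", "example post text", fake_score))
```

The questions raised in this article largely come down to where those thresholds sit and how much of the gray zone ever reaches a human.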
More Red Flags, Less Action
X’s transparency report reveals staggering numbers. In the first half of 2024, users reported over 224 million accounts and tweets, a massive surge compared with the 11.6 million accounts reported in the latter half of 2021. Despite this nearly 1,830% increase in reports, the number of account suspensions grew only modestly, rising from 1.3 million in the latter half of 2021 to 5.3 million in the first half of 2024, a roughly 300% increase. More strikingly, of the more than 8.9 million posts reported for child safety concerns in this period, X removed only 14,571.
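For readers who want to check the math, those percentages follow directly from the figures the report cites, treating the 2021 and 2024 counts as comparable even though the reporting categories changed between the two periods:

```python
# Reproduce the percentage increases from the figures cited above.
reports_2021, reports_2024 = 11.6e6, 224e6          # reported accounts/tweets
suspensions_2021, suspensions_2024 = 1.3e6, 5.3e6   # account suspensions

pct_increase = lambda old, new: (new - old) / old * 100
print(f"Reports:     +{pct_increase(reports_2021, reports_2024):.0f}%")      # ~1,831%
print(f"Suspensions: +{pct_increase(suspensions_2021, suspensions_2024):.0f}%")  # ~308%
```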
Part of the discrepancy in these numbers can be attributed to changing definitions and policies regarding hate speech and misinformation. Under previous management, X released comprehensive 50-page reports detailing takedowns and policy violations. In contrast, the latest report is 15 pages long and uses newer measurement methods.
Furthermore, the company rolled back its rules on COVID-19 misinformation and no longer classifies misgendering or deadnaming as hate speech, complicating comparisons of enforcement metrics. For instance, the report indicates that X suspended only 2,361 accounts for hateful conduct, compared with 104,565 in the second half of 2021.
Can AI Make Moral Judgments?
As AI becomes the backbone of content creation and moderation systems worldwide, including at platforms like Facebook, YouTube and X itself, questions remain about its effectiveness in addressing harmful behavior.
Automated reviewers have long been shown to be prone to error, struggling to accurately interpret hate speech and frequently misclassifying benign content as harmful. In 2020, for example, Facebook’s automated systems mistakenly blocked ads from struggling businesses, and in April of this year its algorithm flagged posts from the Auschwitz Museum as violating its community standards.
Moreover, many algorithms are trained on datasets drawn primarily from the Global North, which can leave them insensitive to other linguistic and cultural contexts.
A September memo from the Center for Democracy & Technology highlighted the pitfalls of this approach, noting that a lack of diversity in natural language processing teams can hurt the accuracy of automated content moderation, particularly for dialects such as Maghrebi Arabic.
As the Mozilla Foundation notes, “As it is woven into all aspects of our lives, AI has the potential to reinforce existing power hierarchies and societal inequalities. This raises questions about how to responsibly address potential risks for individuals in the design of AI.”
In practice, AI falters in more nuanced areas, such as sarcasm or coded language. This inconsistency may also explain the decline in actions against hate speech on X, where AI systems struggle to identify the full spectrum of harmful behaviors.
Despite advances in language AI, detecting hate speech remains a complex challenge. A 2021 study by researchers at Oxford and the Alan Turing Institute tested several AI hate speech detection models and revealed significant performance gaps. Their test suite, HateCheck, includes targeted tests for various types of hate speech as well as non-hateful scenarios that often confuse AI. Applying the suite to tools like Google Jigsaw’s Perspective API and Two Hat’s SiftNinja, the study found that Perspective over-flagged non-hateful content while SiftNinja under-detected hate speech.
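HateCheck’s core idea, pairing clearly hateful test cases with superficially similar but non-hateful ones (negation, counter-speech, quotation) and checking a model on each functionality, can be illustrated with a minimal sketch. The test cases and the naive keyword classifier below are placeholders for illustration, not items or models from the actual suite.

```python
# Sketch of a HateCheck-style functional test: run a classifier over labeled
# hateful and non-hateful cases, grouped by functionality, and report accuracy
# per group. Cases are illustrative placeholders, not drawn from HateCheck.
from collections import defaultdict

TEST_CASES = [
    # (functionality, text, is_hateful)
    ("direct_hate",    "I hate [group], they are disgusting.",        True),
    ("negation",       "I don't hate [group] at all.",                False),
    ("counter_speech", "Saying you hate [group] is unacceptable.",    False),
    ("quoted_hate",    'He shouted "[group] are vermin" at them.',    True),
]

def classify(text: str) -> bool:
    """Placeholder detector: a real evaluation would call a model or API."""
    return "hate" in text.lower()  # naive keyword match, for illustration only

def run_suite(cases):
    correct, total = defaultdict(int), defaultdict(int)
    for functionality, text, label in cases:
        total[functionality] += 1
        correct[functionality] += int(classify(text) == label)
    for functionality in total:
        print(f"{functionality:>14}: {correct[functionality]}/{total[functionality]} correct")

run_suite(TEST_CASES)
```

Even this toy detector reproduces the failure modes the study documents: it over-flags the negated and counter-speech examples while missing the quoted dehumanizing language.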
Over-reliance on AI moderation also risks infringing on free expression, particularly for marginalized communities that often rely on coded language, and researchers argue that even well-optimized algorithms may worsen existing problems within content policies.
A Wider Lens
X’s struggles are not unique. Other platforms, such as Meta (which owns Facebook, Instagram and Threads), have followed similar trajectories with their AI moderation systems. Meta has acknowledged that its algorithms often fail to adequately identify disinformation or hate speech, resulting in both false positives and missed instances of harmful behavior.
The troubling trends highlighted in X’s transparency report may pave the way for new regulatory policies, particularly as several U.S. states move forward with social media restrictions for minors. Experts at the AI Now Institute advocate greater accountability from platforms for their AI moderation systems, urging transparency and ethical standards. Lawmakers may need to consider regulations requiring a more effective combination of AI and human moderators to ensure fairer moderation.
As platforms increasingly shape political and social discourse, especially in light of key upcoming elections, the stakes are high. But the current landscape has made these trends harder to study: Meta shut down CrowdTangle in August, an analytics tool that helped researchers monitor social media posts and track misinformation, and Elon Musk ended free access to the X API in early 2023, further restricting researchers’ access to valuable data and raising concerns about the ability to examine these trends at all.
In the long run, social media platforms may have to confront the ethical challenges posed by AI-driven moderation. Can we trust machines to make moral judgments about the content we consume? Or will platforms need a more fundamental overhaul to deliver greater fairness and accountability?