AUB AI

Content moderation: The fine line between protection and restriction

In this special series, students from the SAIL Institute at AUB build on their academic papers with the L’Orient Today team to look beyond the buzz around AI and explain how it works, where it falls short and how it's already shaping our lives.

Illustration by Celine Bejjani

For more than two decades, online content moderation has stirred debate because of a fundamental tension: how to protect users from harm while safeguarding their right to free expression.

In Lebanon, in response to the 2019 anti-government protest movement, authorities increased enforcement of criminal defamation laws, summoning journalists and activists over online criticism. Rights advocates have pointed out that Lebanon’s vaguely worded laws make room for abuse of power and can be used to silence dissent.

Between 2017 and 2023, there were more than 800 violations against journalists in Lebanon. These include legal actions against the online media outlets Sharika Wa Laken, The Public Source and Megaphone News, and against journalist Dima Sadek, who received a prison sentence for social media posts.

Supporters of stricter moderation argue that clear limits are necessary to curb hate speech, harassment and incitement. Critics respond with a fundamental question: who decides where the line is drawn, and according to what standards?

Between 2019 and 2020, for instance, Instagram’s automated moderation bots flagged and took down breast cancer awareness campaigns that showed mastectomy scars and post-surgical breasts, on the grounds that they violated Meta’s sexual content policies. After a backlash, Meta added a layer of context-aware models to keep similar posts from being flagged.

AI steps in

Here, our growing global reliance on artificial intelligence (AI) intensifies the debate.

Technology companies increasingly rely on AI systems to review and filter vast amounts of user-generated content. These systems are trained on large datasets of labeled material and are evaluated based on “accuracy,” or how closely their decisions match those of human reviewers applying platform policies.
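To make that measure concrete, here is a minimal sketch of "accuracy" as an agreement rate between a system's decisions and human reviewers' decisions on the same posts. The function name and the five sample decisions are hypothetical, chosen only for illustration; real platforms use far larger samples and more detailed metrics.

```python
def agreement_rate(model_decisions, human_decisions):
    """Share of items where the model's decision matches the human reviewer's."""
    if len(model_decisions) != len(human_decisions):
        raise ValueError("both lists must cover the same items")
    if not model_decisions:
        raise ValueError("need at least one decision")
    matches = sum(m == h for m, h in zip(model_decisions, human_decisions))
    return matches / len(model_decisions)


# Hypothetical decisions on five posts (not real platform data).
human = ["remove", "keep", "keep", "remove", "keep"]
model = ["remove", "keep", "remove", "remove", "keep"]

print(agreement_rate(model, human))  # 0.8: the model matches on 4 of 5 posts
```

Note what the number does and does not say: it measures agreement with the reviewers, not whether the reviewers' own decisions, or the policy behind them, were right.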

For example, YouTube receives hundreds of hours of video uploads every minute and relies on a mix of AI for rapid detection and human reviewers for context and final decisions.

During the COVID-19 spike in traffic, however, it leaned more heavily on automation, removing a record 11 million videos in one quarter of 2020. The shift revealed AI’s limits: many videos were wrongly removed, triggering a spike in appeals and reinstatements and leading to the establishment of a larger team of human moderators.

But accuracy, as a measure, assumes that the policies themselves are clear and consistent. When definitions of hate speech, misinformation or incitement are vague or politically contested, AI systems do not resolve the ambiguity. Instead, they apply those same unclear standards at scale.

Experts note that machine learning systems do not truly understand language. They detect patterns based on probability. Context, such as satire, political dissent, reclaimed slurs or evolving slang, can be difficult for automated tools to interpret. Platforms could then remove lawful but controversial speech to minimize legal and political risk.
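A toy sketch can illustrate what "patterns based on probability" means in practice. It assumes a deliberately simplistic scorer with made-up word weights and a made-up threshold. It is not how any platform's system actually works, but it shows why a system that only scores word patterns treats an insult and friendly banter the same way.

```python
import math

# Hypothetical word weights, as if learned from labeled examples.
WEIGHTS = {"idiot": 2.0, "hate": 1.5, "love": -1.0}
BIAS = -1.0       # hypothetical baseline: most posts are not removed
THRESHOLD = 0.5   # hypothetical cut-off above which a post is flagged


def removal_probability(text):
    """Turn the summed word weights into a probability with a logistic function."""
    words = [w.strip(".,!?\"'") for w in text.lower().split()]
    score = BIAS + sum(WEIGHTS.get(w, 0.0) for w in words)
    return 1 / (1 + math.exp(-score))


def flagged(text):
    return removal_probability(text) >= THRESHOLD


insult = "You are an idiot"
banter = "Haha you idiot, I missed you"

# Both posts contain the same weighted word, so both get the same score
# and both are flagged, even though a human would read them differently.
print(flagged(insult), flagged(banter))
```

Production systems use much richer models than a word list, but the underlying limitation the experts describe is the same: the score reflects statistical patterns, not an understanding of intent.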

A Libyan academic told the Middle East Institute that Twitter flagged and removed a reply she had sent to a fellow Libyan, in which she had used a colloquial word that roughly translates to “idiot.”

In a more serious context, Human Rights Watch reviewed over 1,050 takedowns and other suppression of content posted on Instagram and Facebook by Palestinians and their supporters between October and November 2023, in the first month of Israel’s war in Gaza. While one case involved content in support of Israel, the other 1,049 involved “peaceful content in support of Palestine that was censored or otherwise unduly suppressed.”

This also points to inconsistency in the policies that determine what content is allowed.

Lessons from around the world

International examples highlight the stakes. In Morocco, a 2022 law requires network providers to remove content deemed a threat to national security or public consensus. Critics argue that the language is overly broad and grants wide discretion to authorities, raising concerns about its impact on freedom of expression.

Germany’s 2017 Network Enforcement Act, known as NetzDG, offers another model. The law imposed strict deadlines and heavy fines on major platforms that fail to remove content that is unlawful under German criminal law.

Studies indicate it contributed to a decline in online hate speech and a modest 1 percent reduction in crimes against refugees. However, the threat of large fines may encourage companies to remove more content than necessary to avoid penalties.

Now imagine these laws applied at AI scale: every vague rule gets amplified to millions of users instantly. The technical act of removal is the easy part.

The real difficulty lies in deciding, legitimately and transparently, what content should be removed. In the era of AI, that decision is increasingly automated. How societies define harm and how AI enforces those definitions will shape the future of online speech.
