Last week, the public got a peek at China’s best shot at competing in the chatbot craze when search giant Baidu unveiled its entry, Ernie Bot. It didn’t go well.
Baidu’s stock fell 10 percent shortly after the reveal, largely because CEO Robin Li demonstrated the chatbot with a pre-recorded video rather than a live demo, leaving many doubtful about its true capabilities. (Baidu’s stock has since rebounded.) While internet users are free to experiment with San Francisco-based OpenAI’s ChatGPT, Baidu’s product still isn’t fully available for public testing.
Part of the reason for Ernie’s delayed rollout may be Beijing’s apprehension about the technology.
Indeed, Chinese developers face one major hurdle that their Western competitors don’t: their government’s draconian information controls. As internet users in the West have shown, generative AI models like ChatGPT are hard to tame, spitting out unexpected and sometimes disturbing content, despite the best efforts of their makers to impose guardrails.
Some say Beijing’s intense need to censor politically sensitive content could prevent China’s AI companies from mounting a meaningful challenge to leading Western firms.
“I think there’s going to be a fundamental trade off between usefulness and control over content,” says Matt Sheehan, a fellow at the Carnegie Endowment for International Peace, where he studies Chinese AI regulation.
Chinese authorities responded to the debut of ChatGPT by blocking the chatbot’s website this month. Several third-party Chinese apps that provided access to ChatGPT have also been taken down, and regulators have told local tech firms not to offer access to ChatGPT’s services, according to reporting by Nikkei Asia.
But Beijing hasn’t completely shut the door on ChatGPT-like products either. AI is seen as a strategic national priority for China, and its policymakers are ahead of the curve in their efforts to craft AI-specific regulation. Wang Zhigang, China’s minister for science and technology, signaled a cautious openness to the technology last month, calling ChatGPT “really hot” and saying that promoting AI development should be balanced with strengthening “ethical norms.”
ChatGPT and Ernie Bot are part of a subfield of artificial intelligence known as “generative AI,” which focuses on generating content, from text to imagery or music. While ChatGPT has hogged the limelight in recent months, other notable generative AI models include DALL-E 2 (also by OpenAI) and Stable Diffusion, an open-source model developed by a research group at Ludwig Maximilian University of Munich together with the AI startup Runway. Both specialize in generating artwork in response to human prompts.
Even outside of China, regulators are concerned about the challenge of content moderation, and about how to prevent generative AI from producing the kind of harmful content and misinformation that has landed social media companies in hot water.
“It’s important to understand that ChatGPT also employs censorship,” says Jeffrey Ding, an assistant professor of political science at George Washington University and author of the ChinAI newsletter. “It’s trained not to discuss political or religious topics, and that was done in the training process.”
[Image: ChatGPT’s response to the question “Can you comment on politics?”]
Other developers have also introduced different kinds of “content moderation” to their models. Stable Diffusion’s developers, for example, block the model from generating artwork that contains nudity. OpenAI doesn’t allow DALL-E 2 to share output that violates its content policy, including violent and sexual imagery or “illegal activity.”
Exactly how a generative AI model moderates its content comes down to how the model is “trained.” A model learns to interpret and generate content by ingesting an enormous volume of data. Humans then help improve the model by telling it when its answers are right or wrong, steering its outputs toward accuracy and truthfulness.
What this means is that models can be censored at two points: the input stage and the output stage. Most AI firms do both, albeit with mixed success. For example, Stable Diffusion’s developers withheld any examples of nudity from its training data, so the model “basically has no concept of nudity,” explains Lennart Heim, a research scholar at the Centre for the Governance of AI, a research and advisory nonprofit. “Does this work? Not completely. Stable Diffusion is open source, so people created ‘Unstable Diffusion’ and showed it nudity. Then it did nudity just fine.”
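To make the input-stage approach concrete, here is a minimal sketch in Python, with an invented is_nsfw check standing in for a real safety classifier, of how flagged examples can be dropped from a training corpus so the model never learns the concept at all:

```python
# Minimal sketch of input-stage filtering: drop flagged examples from the
# training corpus before the model ever sees them.
# `is_nsfw` is an invented placeholder, not a real library call.

def is_nsfw(record: dict) -> bool:
    # In practice this would be a trained image or text safety classifier.
    return record.get("label") == "nsfw"

def filter_training_data(records: list) -> list:
    """Keep only records that pass the safety check."""
    return [r for r in records if not is_nsfw(r)]

corpus = [
    {"id": 1, "label": "landscape"},
    {"id": 2, "label": "nsfw"},
    {"id": 3, "label": "portrait"},
]

clean_corpus = filter_training_data(corpus)
print([r["id"] for r in clean_corpus])  # [1, 3] -- the model trains only on these
```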
Meanwhile, censorship at the output level allows the AI model to “understand” a prompt for illicit content while refusing to fulfill it, spitting out an error message instead. Social media platforms have employed this method of censorship for years, using keyword blacklists as well as sophisticated algorithms to replace human moderators. This works most of the time, but some users are able to find workarounds.
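A similarly stripped-down sketch of output-stage moderation, again in Python and with an invented blocklist and a stand-in generate() function, shows how a request can be recognized yet still refused before anything reaches the user:

```python
# Minimal sketch of output-stage moderation: the model could produce the
# content, but a post-hoc filter withholds it and returns an error instead.
# The blocklist terms and generate() stub are illustrative assumptions.

BLOCKLIST = {"sensitive topic a", "sensitive topic b"}

def generate(prompt: str) -> str:
    # Placeholder for the underlying generative model.
    return f"Model output for: {prompt}"

def moderated_generate(prompt: str) -> str:
    if any(term in prompt.lower() for term in BLOCKLIST):
        return "Error: this request violates the content policy."
    output = generate(prompt)
    if any(term in output.lower() for term in BLOCKLIST):
        return "Error: the generated content was withheld."
    return output

print(moderated_generate("Tell me about sensitive topic a"))
# -> Error: this request violates the content policy.
```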
It’s unlikely generative AI models will be able to stop harmful content 100 percent of the time, says Alex Engler, a fellow in governance studies at the Brookings Institution, a Washington D.C.-based think tank. “The classic example of this is you can’t ask for a bloody or a shot horse because it depicts gruesome violence,” he says. “But ask for a horse in a pool of red liquid, and it’s going to show you what looks like a murdered horse.”
These types of loopholes likely make Beijing uncomfortable. Early evidence from China’s generative AI models reveals few surprises about what gets censored. A recent test of four Chinese chatbots by The Wall Street Journal found that the bots refused to engage when asked questions about Xi Jinping or Chinese politics. In September, MIT Technology Review found that ERNIE-ViLG, a different Baidu AI model that generates artwork, refused to return images of Tiananmen Square.
But when it comes to thwarting crafty users looking to circumvent content controls, Chinese companies are arguably ahead of the curve. Big tech companies like Baidu are already adept at using algorithms to censor social media and search content. They also maintain expansive databases of sensitive search terms that frustrate even the most creative users. For example, a leaked list published by China Digital Times, a website that monitors Chinese internet controls, showed over 35,000 banned search terms related to Xi Jinping, covering all kinds of homonyms of his name as well as pairings with negative words and events.
Moreover, in order to train their algorithms to recognize harmful content, Western AI companies rely on outsourced labor to perform what is known as “reinforcement learning from human feedback” (RLHF), in which humans manually review and rate a model’s outputs so that those judgments can be used to fine-tune its behavior. Given their association with automation, the quiet reliance of firms like OpenAI on cheap outsourced labor has drawn controversy in the past. But Chinese tech firms are already experienced in managing huge workforces of censors.
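As a very rough illustration of that human-feedback loop (a toy sketch, not OpenAI’s actual pipeline), the Python snippet below turns hypothetical reviewer preferences into scores that decide which of two candidate outputs gets returned:

```python
# Toy sketch of RLHF-style feedback: human reviewers compare pairs of outputs,
# and their preferences become scores that steer which responses the system
# favors. All data and names here are invented for illustration.

from collections import defaultdict

# Hypothetical comparisons: (prompt, output the reviewer preferred, output rejected)
comparisons = [
    ("Describe a horse", "A calm, factual description.", "A graphic, violent description."),
    ("Describe a horse", "A neutral answer.", "A disturbing answer."),
]

reward = defaultdict(float)
for _, chosen, rejected in comparisons:
    reward[chosen] += 1.0    # preferred outputs accumulate positive score
    reward[rejected] -= 1.0  # rejected outputs accumulate negative score

def pick_response(candidates):
    """Return the candidate the learned scores favor."""
    return max(candidates, key=lambda c: reward[c])

print(pick_response(["A graphic, violent description.", "A calm, factual description."]))
# -> "A calm, factual description."
```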
In the end, the biggest challenge to China’s AI firms may be the changed political dynamic between China’s government and its tech companies. In the past, firms like Baidu and Alibaba had leeway to move quickly and experiment with relatively unregulated technologies. But after 2021’s tech crackdown, they all know the danger of getting ahead of the regulators; authorities such as the Cyberspace Administration of China (CAC), which regulates the internet and online algorithms, now set the pace.
That means Baidu is likely to tread very carefully, despite its desire to debut Ernie Bot quickly. Although Western regulators have been willing to overlook the occasional unwanted content that has gotten around ChatGPT’s guardrails, China’s probably won’t be as understanding.
Eliot Chen is a Toronto-based staff writer at The Wire. Previously, he was a researcher at the Center for Strategic and International Studies’ Human Rights Initiative and MacroPolo. @eliotcxchen