Content moderation has become an essential but often unsung job of the modern internet. Thousands of moderators review and remove user-generated content that violates the policies of large social networks such as Facebook. The work takes a toll on their mental health: they are routinely exposed to disturbing material, including child sexual abuse imagery and depictions of violent crime. OpenAI, an artificial intelligence research organization, aims to ease that burden by using its latest large language model, GPT-4, for content moderation. This article examines OpenAI’s approach and its potential impact on the future of the field.
OpenAI believes that AI can play a significant role in moderating online content. According to its research, a GPT-4 model trained for content moderation outperforms minimally trained human moderators, although highly trained human moderators still outperform both. OpenAI proposes a three-step framework for training GPT-4 to moderate content according to an organization’s policies.
The first step in OpenAI’s framework is drafting the content policy. The blog post does not say explicitly who writes it, but human input is clearly central to defining the guidelines. In the second step, GPT-4 reads the content policy and reviews a “golden set” of data labeled by human moderators; the set contains both content that violates the policy and examples that comply with it. GPT-4 then assigns its own labels to the same data based on its reading of the policy. Finally, human supervisors compare GPT-4’s labels with the human-created ones. Where the two disagree, supervisors can ask GPT-4 to explain its judgments, and those explanations help reveal ambiguities in the policy itself. This iterative process refines the content policy, yielding clearer instructions for GPT-4 in subsequent rounds.
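The comparison step above can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI’s implementation: the golden-set items, the labels, and the `model_label` function are all hypothetical, and the real system would replace the stub with a GPT-4 prompt containing the policy text and the item to classify.

```python
# Hypothetical "golden set": items labeled by human moderators.
golden_set = [
    {"text": "spam advertisement for pills", "human_label": "violates"},
    {"text": "photo of a sunset",            "human_label": "allowed"},
    {"text": "detailed fraud tutorial",      "human_label": "violates"},
]

def model_label(policy: str, text: str) -> str:
    """Stand-in for a GPT-4 call that reads the policy and labels text.
    A real implementation would send `policy` and `text` in a prompt."""
    # Naive keyword heuristic, purely for illustration.
    flagged = ("spam", "harassment")
    return "violates" if any(word in text for word in flagged) else "allowed"

def find_discrepancies(policy: str, items: list) -> list:
    """Step 3: compare the model's labels with the human golden labels.
    Each discrepancy is a candidate for asking the model to explain its
    reasoning and then clarifying the policy text for the next round."""
    discrepancies = []
    for item in items:
        predicted = model_label(policy, item["text"])
        if predicted != item["human_label"]:
            discrepancies.append({**item, "model_label": predicted})
    return discrepancies

policy_v1 = "Remove spam and harassment; allow everything else."
mismatches = find_discrepancies(policy_v1, golden_set)
```

Here the fraud example surfaces as a mismatch (humans flagged it, the model did not), which in OpenAI’s described workflow would prompt a request for the model’s explanation and a clarified policy for the next labeling pass.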
OpenAI claims that their AI-based approach offers several benefits over traditional moderation methods. Firstly, it promotes consistency in labeling by avoiding variations that may arise from human interpretation. AI models like GPT-4 can provide more uniform labels, thus reducing confusion and inconsistencies. Secondly, the framework enables a faster feedback loop for updating content policies. As new violations occur, human supervisors can promptly revise policies to address emerging issues, making content moderation more effective. Finally, incorporating AI in content moderation reduces the mental burden on human moderators. Rather than being responsible for all aspects of moderation, human supervisors may focus on training AI models, diagnosing issues, and providing guidance.
OpenAI’s emphasis on content moderation aligns with its recent investments and partnerships with media organizations like The Associated Press and the American Journalism Project. Media organizations struggle to effectively moderate reader comments while maintaining freedom of speech. By utilizing AI for content moderation, OpenAI aims to help media organizations strike a balance between engagement and responsible content management.
OpenAI’s blog post also takes the opportunity to differentiate its approach from that of rival organization Anthropic. Anthropic’s “Constitutional AI” framework relies on a model’s own internalized judgment, derived from a single ethical framework; OpenAI’s process instead prioritizes platform-specific policy iteration, which it argues takes less effort and moves faster. By encouraging trust and safety practitioners to try the process, OpenAI invites others to evaluate its approach to content moderation.
Nevertheless, OpenAI’s push for automated content moderation is not without irony. Investigative reports by Time magazine and The Wall Street Journal revealed that OpenAI itself employed human content moderators in Kenya through contractors and subcontractors. These moderators were tasked with reading and labeling content, including AI-generated content, often exposing them to traumatic experiences. The reports highlighted the low wages paid to these workers and the lasting mental health impact of their work. Consequently, OpenAI’s current endeavor in automated content moderation may be seen as an attempt to rectify past harms and prevent similar experiences in the future.
The introduction of AI into content moderation offers a promising way to ease the mental burden on human moderators while improving the consistency and efficiency of the process. OpenAI’s framework, centered on GPT-4, shows potential for moderating content according to platform-specific policies. By embracing AI, platforms and organizations can foster a safer and more reliable digital environment. It is crucial, however, that these advances are implemented ethically, with fair treatment and protection for everyone involved in moderation work. The future holds the promise of a safer digital era, in which humans and AI work together to build a more responsible online community.