A recent trove of documents leaked from Facebook demonstrated how the social network struggles to moderate dangerous content in places far from Silicon Valley. Internal discussions revealed worries that moderation algorithms for the languages spoken in Pakistan and Ethiopia were insufficient, and that the company lacked adequate training data to tune systems to different dialects of Arabic.
Meta Platforms, Facebook’s owner, now says it has deployed a new artificial intelligence moderation system for some tasks that can be adapted to new enforcement jobs more quickly than its predecessors because it requires much less training data. The company says the system, called Few-Shot Learner, works in more than 100 languages and can operate on images as well as text.
Facebook says Few-Shot Learner makes it possible to automate enforcement of a new moderation rule in about six weeks, down from around six months. The company says the system is helping to enforce a rule introduced in September banning posts likely to discourage people from getting Covid-19 vaccines—even if the posts don’t flatly lie. Facebook also says Few-Shot Learner, first deployed earlier this year, contributed to a decline it recorded in the worldwide prevalence of hate speech from mid-2020 through October this year. It has not, however, released details of the new system’s performance.
The new system won’t solve all of Facebook’s content challenges, but it’s an example of how deeply the company relies on AI to tackle them. Facebook grew to span the globe claiming it would bring people together—but its network has also incubated hate, harassment, and, according to the United Nations, contributed to genocide against Rohingya Muslims in Myanmar. The company has long said AI is the only practical way to monitor its vast network, but despite recent advances the technology is a long way short of being able to understand the nuances of human communication. Facebook said recently that it has automated systems to find hate speech and terrorism content in more than 50 languages—but the service is used in more than 100 languages.
Few-Shot Learner is an example of a new breed of much larger and more complex AI systems rapidly gaining currency among tech companies and AI researchers—but also raising concerns about unwanted side effects such as bias.
Models such as Few-Shot Learner can work with less example data carefully labeled by humans because their scale allows them to pick up some fundamentals of a problem by “pretraining” on large volumes of raw, unlabeled data. A relatively small amount of labeled data can then be used to fine-tune the system to a particular task.
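The pretrain-then-fine-tune recipe can be illustrated with a deliberately tiny sketch. The code below is not Facebook’s architecture—Few-Shot Learner is a large neural network, while this toy stands in word-frequency statistics for pretraining and a nearest-centroid classifier for fine-tuning. All function names and sample texts are invented for illustration.

```python
from collections import Counter
import math

def pretrain(unlabeled_texts):
    """'Pretraining' analog: learn document frequencies for every word
    from a large unlabeled corpus, so rarer words carry more weight."""
    df = Counter()
    for text in unlabeled_texts:
        df.update(set(text.lower().split()))
    n_docs = len(unlabeled_texts)
    # Inverse-document-frequency weights: common words get low weight.
    return {w: math.log(n_docs / c) for w, c in df.items()}

def vectorize(text, idf):
    """Weight each word in a post by its pretrained IDF score."""
    counts = Counter(text.lower().split())
    return {w: n * idf.get(w, 0.0) for w, n in counts.items()}

def cosine(a, b):
    dot = sum(a.get(w, 0.0) * b.get(w, 0.0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fine_tune(labeled_examples, idf):
    """Fine-tuning analog: average the vectors of a handful of labeled
    violating posts into a single 'policy centroid'."""
    centroid = Counter()
    for text in labeled_examples:
        for w, v in vectorize(text, idf).items():
            centroid[w] += v / len(labeled_examples)
    return dict(centroid)

def score(post, centroid, idf):
    """Higher score means more similar to the known violations."""
    return cosine(vectorize(post, idf), centroid)
```

The point of the analogy: the expensive, label-free step (`pretrain`) happens once over bulk data, while adapting to a new rule (`fine_tune`) needs only a small labeled set.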
Google improved its search engine using a system dubbed BERT after finding that pretraining it on billions of words from the web and books gave the system more power to process text. Two of Google’s top AI researchers were later forced out after a dispute over a paper urging caution with such systems. OpenAI, an AI company backed by Microsoft, has shown that its own large language model, GPT-3, can generate fluid text and programming code.
Few-Shot Learner is pretrained on a firehose of billions of Facebook posts and images in more than 100 languages. The system uses them to build up an internal sense of the statistical patterns of Facebook content. It is tuned for content moderation by additional training with posts or imagery labeled in previous moderation projects and simplified descriptions of the policies those posts breached.
After that preparation, the system can be directed to find new types of content, such as to enforce a new rule or expand into a new language, with much less effort than previous moderation models, says Cornelia Carapcea, a product manager on moderation AI at Facebook.
More conventional moderation systems might need hundreds of thousands or millions of example posts before they can be deployed, she says. Few-Shot Learner can be put to work using just dozens—the “few shots” in its name—combined with simplified descriptions or “prompts” of the new policy they relate to.
“Because it’s seen so much already, learning a new problem or policy can be faster,” Carapcea says. “There’s always a struggle to have enough labeled data across the huge variety of issues like violence, hate speech, and incitement; this allows us to react more quickly.”
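One way to picture combining a few labeled “shots” with a textual prompt is the toy scorer below. It bears no resemblance to Facebook’s internal model; the word-overlap similarity, the blending weights, and the sample wording are all assumptions made for the sketch.

```python
import math
from collections import Counter

def vec(text):
    """Bag-of-words vector for a short text."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def few_shot_score(post, exemplars, prompt):
    """Blend two signals: similarity to the handful of labeled example
    posts (the 'few shots') and similarity to the plain-text policy
    prompt. The 0.7/0.3 weights are arbitrary illustration values."""
    shot_sim = max(cosine(vec(post), vec(e)) for e in exemplars)
    prompt_sim = cosine(vec(post), vec(prompt))
    return 0.7 * shot_sim + 0.3 * prompt_sim
```

In a real system both signals would come from a learned embedding model rather than raw word counts, but the shape of the interface—dozens of examples plus one short policy description—matches what the article describes.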
Few-Shot Learner can also be directed to find categories of content without showing it any examples at all, just by giving the system a written description of a new policy—an unusually simple way of interacting with an AI system. Carapcea says results are less reliable this way, but the method can quickly suggest what would be swept up by a new policy, or identify posts that can be used to further train the system.
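That prompt-only mode can be caricatured in a few lines: score each post against nothing but the written policy text, with no labeled examples at all. Again, this is a toy word-overlap sketch rather than Facebook’s method; the threshold value and function names are invented.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector for a short text."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def zero_shot_flag(posts, policy_description, threshold=0.2):
    """Rank posts purely by overlap with the written policy text;
    no labeled training examples are used at all."""
    policy_vec = bow(policy_description)
    return [p for p in posts if cosine(bow(p), policy_vec) >= threshold]
```

As Carapcea notes, results from this mode are less reliable, which is why a realistic pipeline would use its output as candidate material for review and further training rather than for direct enforcement.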
The impressive capabilities of giant AI systems like Facebook’s, and the many unknowns about them, recently prompted Stanford researchers to launch a center to study such systems, which they call “foundation models” because they appear set to become an underpinning of many tech projects. Large machine-learning models are being developed for uses not only in social networks and search engines, but also in industries such as finance and health care.
Percy Liang, the Stanford center’s director, says Facebook’s system appears to show some of the impressive power of these new models, but will also exhibit some of their trade-offs. It’s exciting and useful to be able to direct an AI system to do what you want just with written text, as Facebook says it can with new content policies, Liang says, but this capacity is poorly understood. “It’s more of an art than a science,” he says.
Liang says that Few-Shot Learner’s speed also may have drawbacks. When engineers don’t have to curate as much training data, they sacrifice some control and knowledge of their system’s capabilities. “There’s a bigger leap of faith,” Liang says. “With more automation, you have less potential oversight.”
Carapcea of Facebook says that as Facebook develops new moderation systems it also develops ways to check their performance for accuracy or bias.