In 2012, researchers found that feeding thousands of images into an algorithm inspired loosely by the way neurons in a brain respond to input produced a huge leap in accuracy. The breakthrough sparked an explosion in academic research and commercial activity that is transforming some companies and industries.
Now a new trick, which involves training the same kind of AI algorithm to turn 2D images into a rich 3D view of a scene, is sparking excitement in the worlds of both computer graphics and AI. The technique has the potential to shake up video games, virtual reality, robotics, and autonomous driving. Some experts believe it might even help machines perceive and reason about the world in a more intelligent—or at least humanlike—way.
“It is ultra-hot, there is a huge buzz,” says Ken Goldberg, a roboticist at the University of California, Berkeley, who is using the technology to improve the ability of AI-enhanced robots to grasp unfamiliar shapes. Goldberg says the technology has “hundreds of applications,” in fields ranging from entertainment to architecture.
The new approach involves using a neural network to capture and generate 3D imagery from a few 2D snapshots, a technique dubbed “neural rendering.” It arose from the merging of ideas circulating in computer graphics and AI, but interest exploded in April 2020 when researchers at UC Berkeley, UC San Diego, and Google showed that a neural network could capture a scene photorealistically in 3D simply by viewing several 2D images of it.
That algorithm exploits the way light travels through the air, performing computations that estimate the density and color of points in 3D space. This makes it possible to convert a handful of 2D images into a photorealistic 3D representation that can be viewed from any angle. Its core is the same sort of neural network as the 2012 image-recognition algorithm, which analyzes the pixels in a 2D image. The new algorithms instead reason about points in a 3D volume, roughly trading 2D pixels for their 3D equivalent, voxels. Videos of the trick, which the researchers called Neural Radiance Fields, or NeRF, wowed the research community.
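The idea can be sketched in a few lines: a neural network answers "what color and how opaque is this point in space?", and a renderer marches a camera ray through the scene, asking that question at sampled points and compositing the answers into a single pixel. The sketch below, in NumPy, uses a random untrained network as a hypothetical stand-in for the learned one (a real NeRF trains its weights against the input photographs, and also conditions on viewing direction, which is omitted here).

```python
import numpy as np

def radiance_field(points):
    """Stand-in for NeRF's trained network: maps 3D points to (RGB, density).

    A real NeRF learns these weights from 2D photos; here fixed random
    weights merely give the renderer something to query.
    """
    rng = np.random.default_rng(0)
    w = rng.standard_normal((3, 4))
    out = np.tanh(points @ w)        # (n_samples, 4)
    rgb = (out[:, :3] + 1) / 2       # colors squashed into [0, 1]
    sigma = np.abs(out[:, 3])        # nonnegative volume density
    return rgb, sigma

def render_ray(origin, direction, near=0.0, far=4.0, n_samples=64):
    """Volume rendering: composite samples along one camera ray into a pixel."""
    t = np.linspace(near, far, n_samples)
    delta = t[1] - t[0]                           # spacing between samples
    points = origin + t[:, None] * direction      # (n_samples, 3) query points
    rgb, sigma = radiance_field(points)
    alpha = 1.0 - np.exp(-sigma * delta)          # opacity of each segment
    # Transmittance: how much light survives to reach each sample.
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1]]))
    weights = trans * alpha                       # each sample's contribution
    return (weights[:, None] * rgb).sum(axis=0)   # final pixel color

pixel = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]))
```

Training consists of rendering rays this way, comparing the resulting pixels to the real photographs, and nudging the network's weights until they match, which is why a few dozen snapshots suffice to pin down the whole scene.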
“I’ve been doing computer vision for 20 years, but when I saw this video, I was like ‘Wow, this is just incredible,’” says Frank Dellaert, a professor at Georgia Tech.
For anyone working on computer graphics, Dellaert explains, the approach is a breakthrough. Creating a detailed, realistic 3D scene normally requires hours of painstaking manual work. The new method makes it possible to generate these scenes from ordinary photographs in minutes. It also provides a new way to create and manipulate synthetic scenes. “It's seminal and important, which is something crazy to say for work that’s only two years old,” he says.
Dellaert says the speed and variety of ideas that have emerged since then have been breathtaking. Others have used the idea to create moving selfies (or “nerfies”), which let you pan around a person’s head based on a few stills; to create 3D avatars from a single headshot; and to develop a way to automatically relight scenes differently.
The work has gained industry traction with surprising speed. Ben Mildenhall, one of the researchers behind NeRF who is now at Google, describes the flourishing of research and development as “a slow tidal wave.”
Researchers at Nvidia, which makes computer chips for both AI and computer games, have published papers that use NeRF to generate 3D images from photo collections, to produce more realistic textures in animation, and to point the way toward advances in video games. Facebook (now Meta) has developed an approach similar to NeRF that could be used to flesh out scenes in Mark Zuckerberg's much-vaunted Metaverse. Yann LeCun, chief AI scientist at Meta and a pioneer of the approach that shook things up in 2012, calls the new work "fascinating" and the results "quite impressive."
NeRF may be especially useful for machines that operate in the real world. Goldberg, one of the world's leading experts on robotic grasping, and his colleagues used NeRF to train robots to make sense of transparent objects, which are normally a challenge because of the way they reflect light. The technique lets a robot infer the shape of such an object from a video image.
Makers of self-driving cars are also finding uses for the idea. During a presentation in August, Andrej Karpathy, director of AI at Tesla, said the company was using the technology to generate 3D scenes needed to train its self-driving algorithms to recognize and react to a wider range of on-road scenarios.
The ideas behind NeRF may well be important for AI itself. That’s because understanding the physical properties of the real world is crucial to making sense of it.
“These methods, which came out of computer graphics, are having a huge impact on AI,” says Josh Tenenbaum, a professor at MIT who studies the computational principles behind human learning and inference.
Tenenbaum points to the work of Vincent Sitzmann, a newly appointed assistant professor at MIT. In 2019, Sitzmann and others first introduced the idea of using neural rendering to generate 3D representations of objects based on a limited number of 2D images of them.
Sitzmann’s work doesn’t produce a complete photorealistic 3D picture—the algorithm infers an object’s approximate shape from an incomplete picture. This is something that humans routinely do, Tenenbaum notes. “If I want to pick something up, like the coffee cup in front of me, my perception system implicitly makes a guess about where the back of the cup is as I close my hand around it,” he says.
More recently, Sitzmann; Semon Rezchikov, a research fellow at Harvard; and others have shown a more computationally efficient way for a neural network to render a scene. The methods they are working on could let AI programs identify objects by their 3D shapes, recognizing a car or a cup even if the design is radically different from what it has seen before.
In other words, NeRF and related ideas could ultimately let AI learn about the world in a more sophisticated way, paving the path for robots to operate in complex, unfamiliar environments without making blunders.
Tenenbaum says evidence from cognitive science also suggests that the human brain does something similar when a person looks around. “It’s complicated,” he says of the computational steps involved. “But the brain is complicated too.”
Updated, 2-14-22, 3:05pm ET: An earlier version of this article omitted researchers from UC San Diego among collaborators on the April 2020 paper.