Saturday, June 22, 2024

A Face Recognition Site Crawled the Web for Dead People’s Photos

Finding out Taylor Swift was her 11th cousin twice-removed wasn’t even the most shocking discovery Cher Scarlett made while exploring her family history. “There’s a lot of stuff in my family that’s weird and strange that we wouldn’t know without Ancestry,” says Scarlett, a software engineer and writer based in Kirkland, Washington. “I didn’t even know who my mum’s paternal grandparents were.”

Ancestry.com isn’t the only site that Scarlett checks regularly. In February 2022, the facial recognition search engine PimEyes surfaced non-consensual explicit photos of her at age 19, reigniting decades-old trauma. She attempted to get the pictures removed from the platform, which uses images scraped from the internet to create biometric “faceprints” of individuals. Since then, she’s been monitoring the site to make sure the images don’t return.

In January, she noticed that PimEyes was returning pictures of children that looked like they came from Ancestry.com URLs. As an experiment, she searched for a grayscale version of one of her own baby photos. It came up with a picture of her own mother, as an infant, in the arms of her grandparents—taken, she thought, from an old family photo that her mother had posted on Ancestry. Searching deeper, Scarlett found other images of her relatives, also apparently sourced from the site. They included a black-and-white photo of her great-great-great-grandmother from the 1800s, and a picture of Scarlett’s own sister, who died at age 30 in 2018. The images seemed to come from her digital memorial, Ancestry, and Find a Grave, a cemetery directory owned by Ancestry.

PimEyes, Scarlett says, has scraped images of the dead to populate its database. Because the site’s algorithms index their facial features, those images can be used to help identify living people through their ancestral connections, raising privacy and data protection concerns, as well as ethical ones.

“My sister is dead,” Scarlett says. “She can’t consent or revoke consent for being enrolled in this.”  

Ancestry spokesperson Katherine Wylie tells WIRED that the site’s customers maintain ownership and control over their data, including family trees, and that its terms and conditions “prohibit scraping data, including photos, from Ancestry’s sites and services as well as reselling, reproducing, or publishing any content or information found on Ancestry.”

Giorgi Gobronidze, PimEyes’ director, tells WIRED: “PimEyes only crawls websites who officially allow us to do so. It was … very unpleasant news that our crawlers have somehow broken the rule.” PimEyes is now blocking Ancestry’s domain and indexes related to it are being erased, he says.

Ancestry’s database is the largest in the rapidly expanding genealogy industry, with more than 30 billion records (including photos and documents from public records) covering 20 million people. Users can access these records to make family trees. Whenever a user makes a family tree public on the site, deceased people’s photos can be seen by any registered user. Living people aren’t viewable in family trees unless tree creators authorize specific accounts to see them. Users can decide what is private or public on their profiles, which are searchable in Ancestry’s member directory.

PimEyes positions itself as a tool for people to monitor their online presence. The company charges users $20 to find the websites where their photos have been found, upwards of $30 a month for multiple searches, and $80 to exclude specific photos from future search results.

The company, which has trawled social media for images but now says it scrapes only publicly available sources, has been criticized for collecting images of children and accused of facilitating stalking and abuse. (Gobronidze, who took over PimEyes in January 2022, says that this criticism predates his tenure at PimEyes, and that the company’s policies have since changed.)

“They are clearly crawling all sorts of random websites,” says Daniel Leufer, a senior policy analyst at digital rights group Access Now. “There’s something very grim, especially about the obituary ones.”

The dead aren’t generally protected under privacy laws, but processing their image and data isn’t automatically fair game, says Sandra Wachter, a professor of technology and regulation at the Oxford Internet Institute. “Just because the data doesn’t belong to a person anymore does not automatically mean you are allowed to take it. If it’s a person who has died we have to figure out who has rights over it.” 

The European Court of Human Rights has ruled that pictures of dead people can have a privacy interest for the living, according to Lilian Edwards, professor of law, innovation, and society at Newcastle University in the UK. She says that using photos of the living mined from the web without consent can also be a potential violation of the EU’s General Data Protection Regulation (GDPR), which prohibits the processing of biometric data to identify people without their consent.

“If in some way the picture of the dead person … could lead to someone living being likely to be identified, then it could be protected under the GDPR,” says Edwards. This can be done by putting two bits of information together, she adds, such as a photo from PimEyes and information from Ancestry. PimEyes makes itself available in Europe, so it is subject to the legislation.

Scarlett worries that PimEyes’ technology could be used to identify people and then dox, harass, or abuse them—a concern shared by human rights organizations. She says her mom’s name, address, and phone number were just a reverse image search and three clicks away from the family photo scraped from Ancestry.

While it positions itself as a privacy tool, there are few barriers stopping PimEyes users from searching any face. Its home screen gives little indication that it’s intended for people to search only for themselves.

Gobronidze tells WIRED that PimEyes launched a “multistep security protocol” on January 9 to prevent people from searching multiple faces or children; PimEyes’ partners, however, including certain NGOs, are “whitelisted” to perform unlimited searches. PimEyes has so far blocked 201 accounts, Gobronidze says.

However, a WIRED search for Scarlett and her mother—conducted with their permission—retrieved matches unchallenged. WIRED also found evidence of online message-board users with subscriptions taking requests from others to identify women with pictures found online.

Gobronidze says the system is still in the “training process.”

In Washington state, Scarlett filed a consumer complaint about PimEyes with the state’s attorney general. She has also used PimEyes’ “opt out” form, which promises to remove people’s data from its system, twice: once in March 2022 and again in October after her face reappeared.

Scarlett’s mother also opted out in January 2023, she says. However, searches by WIRED revealed both their faces were still surfaced by the platform on March 1.

Gobronidze says that PimEyes erased over 22 results of Scarlett after her first request and 400 after the second, and performed a search using the photo she opted out with, failing to locate any images of her in its database. However, if a user opts out with a specific photo, other images may still appear. The “opt out engine will not work with 100 percent efficiency always,” Gobronidze said.

Legal scrutiny is intensifying for AI companies that populate their databases by crawling the web for faces. Clearview AI, a company that mainly sells facial recognition services to law enforcement, is facing a class action in Illinois and fines for breaking data protection laws across Europe, including the UK, where the Information Commissioner’s Office (ICO), an independent watchdog, ordered the company to delete all residents’ data. Clearview has denied misconduct and argued that it shouldn’t be subject to either European data protection laws or the jurisdiction of the ICO.

In November, UK-based rights group Big Brother Watch submitted a legal complaint about PimEyes to the ICO for “unlawful processing” of people’s data. Germany’s state commissioner for data protection has opened proceedings against the site for processing biometric data. Gobronidze says the company has been “proactively submitting” information to the ICO.

Gobronidze says it is “absolutely impossible” to establish identities using its database. “We gather index data, which connects photographs not with human individuals but to the URL addresses which publish those photographs.” PimEyes says it indexes photos but does not store the images themselves. Gobronidze adds that PimEyes does not process photos to establish identities but to find website addresses. “PimEyes does not identify human beings but only URLs,” he said.

Leufer, however, says PimEyes is “significantly enabling and facilitating the process of identification of people” based on photos. “While I think he’s correct to say that PimEyes won’t directly through their website give you that person’s identity, you’re a click away from a website that does have their name on it,” he says. “It’s going to give you a load of URLs which in many cases will allow you to identify that person.”

Scarlett fears those links could expose entire families to privacy violations.

“I used [Ancestry] for what it’s intended for—to find out where I come from. It was really exciting until it wasn’t,” she says. “Nobody is uploading photos into Ancestry thinking that they’re going to be enrolled into a biometric identifier for facial recognition software without their knowledge or consent … It just feels incredibly violating.”
