Scientist-Citizen Effort Takes Vast Trove of Wildlife Photos Beyond the Animal Selfie
Using 225 automated cameras set up across several hundred miles of protected habitat and running continuously for three years, the Snapshot Serengeti project has gathered more than 300,000 photographs of Tanzanian wildlife, from lions, cheetahs, elephants, and zebras to honey badgers, porcupines, and wildebeests. Even rare aardwolves and zorillas have occasionally tripped the project’s lenses.
Then the real work began: turning this enormous trove of images into useful information for conservationists and ecologists seeking to understand how these species coexist in the same environment.
“The Serengeti National Park is a great ecosystem for answering this kind of question, because there are several apex predators,” said Margaret Kosmala, an ecological biologist at Harvard University who works on the project. “Lions, cheetahs, hyenas, and leopards all exist together. Do they use different parts of the landscape? Maybe they divide up time, day versus night?”
To convert visual images into verifiable data, Kosmala and her colleagues invited volunteers to view photos on the project’s website and tag the different species of animals in the images, as well as what time they appeared, what they were doing, and what other species were around at the same time.
More than 28,000 people have contributed identifications, Kosmala said, many of them avid participants in the site’s online forums, where the scientists also spend time. What do they get out of it? “The volunteers see the pictures first. That means the really cool, awesome images first get seen by whoever pops in to do some work,” she said. “The good ones always end up on the forum, so we always get to see them! And they get that excitement of discovery.”
Kosmala has coauthored a new article explaining the project’s methods, published Wednesday in the journal Scientific Data.
As for the predators on the Serengeti, the photo data reveal that their relatively peaceful coexistence “has more to do with dividing up time than space,” she said. “They don’t seem to be avoiding each other as much as you might expect,” instead using the same habitat at different times of the day and night.
“Citizen science and informatics has a long history,” Kosmala said, but scientists have usually thought of amateur data gathering in terms of local or small-scale projects: “stream quality, or how many eggs are laid in bluebird boxes.” Snapshot Serengeti has shown “that citizen science can be used on a large scale, and more importantly, the data quality can be good if it’s well done.”
“We show that you can get very high quality by getting multiple people to look at the same image, and having both trained and untrained people look at an image,” she explained.
At least 10 different people looked at each image included in the newly released data set. If there was a lot of disagreement about which species appeared in an image, it was redistributed to as many as 30 volunteers for tagging and was sometimes evaluated by an expert as well, “which makes efficient use of their time,” Kosmala noted.
Then the researchers used a simple vote-counting algorithm to total up the different tags associated with each image.
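For readers curious what a simple vote count of this kind might look like, here is a minimal sketch in Python. It is not the project's actual code. The function names and the 75 percent agreement threshold are assumptions made for illustration; only the figures of 10 and 30 viewers come from the description above.

```python
from collections import Counter


def plurality_vote(tags):
    """Return (winning_label, share_of_votes) for one image's volunteer tags.

    Ties go to whichever label was seen first, which is how
    Counter.most_common orders equal counts.
    """
    if not tags:
        raise ValueError("need at least one tag")
    counts = Counter(tags)
    label, votes = counts.most_common(1)[0]
    return label, votes / len(tags)


def needs_more_views(tags, min_views=10, max_views=30, min_share=0.75):
    """Decide whether an image should go out to more volunteers.

    min_views and max_views follow the 10 and 30 viewers mentioned in the
    article; min_share is a hypothetical agreement threshold.
    """
    if len(tags) < min_views:
        return True   # not enough people have looked yet
    if len(tags) >= max_views:
        return False  # cap reached; stop redistributing
    _, share = plurality_vote(tags)
    return share < min_share


# Illustrative tags, not real project data.
tags = ["zebra"] * 8 + ["wildebeest"] * 2
print(plurality_vote(tags))
print(needs_more_views(tags))
```

In this sketch an image with eight "zebra" tags out of ten would be settled, while one split evenly between two species would be sent to more volunteers.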
To be sure the process was creating the most accurate data possible, “we tested this against a data set of about 4,000 images, where we manually classified all the images,” Kosmala said. “We did it in two stages to reduce mistakes, and even then the expert only got to 99 percent” accuracy, she said, compared with the volunteer accuracy rate of 96.6 percent—a statistical dead heat.
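The comparison Kosmala describes amounts to checking consensus labels against expert labels for the same images. A rough sketch, assuming both are stored as dictionaries keyed by an image ID (the IDs and labels below are invented for illustration and are not project data):

```python
def accuracy(consensus, expert):
    """Fraction of images where the consensus label matches the expert label.

    Only images present in both dictionaries are compared.
    """
    shared = consensus.keys() & expert.keys()
    if not shared:
        raise ValueError("no images in common")
    correct = sum(consensus[image_id] == expert[image_id] for image_id in shared)
    return correct / len(shared)


# Hypothetical labels for four images.
volunteer_labels = {"img1": "lion", "img2": "zebra", "img3": "hyena", "img4": "aardwolf"}
expert_labels = {"img1": "lion", "img2": "zebra", "img3": "hyena", "img4": "zorilla"}
print(accuracy(volunteer_labels, expert_labels))
```

With these made-up labels, three of the four images agree, so the function reports 0.75.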
The Snapshot Serengeti project is unusually broad for a wildlife study, encompassing 40 mammal species across 700 miles of landscape. “As far as we know, it’s unique,” said Kosmala. “This is the largest single camera trapping effort in the world.”
“The majority of our images are not National Geographic quality,” she said. “They’re just part of an animal, very close or very far away, sometimes with bad lighting.” Computers typically have not been able to get much information out of such photographs, but “people are very good at identifying what’s in these sorts of images.”
Snapshot Serengeti has already been contacted by computer vision researchers who want to use the data to create better automated processing of wildlife images, said Kosmala. “Our set creates a large, labeled image set so that others can use it to create species-specific data processing.”
“Our research shows you can use citizen science methods to get high-quality data out of archives of camera trap images,” she said.
The results will have a big impact on saving threatened and endangered species, she believes, because they can aid computer vision experts in improving automated image analysis. “That would really speed up the ability of conservation projects to go through their image backlogs and get useful information from them,” perhaps in time to stop habitat loss or protect animals in the wild.
Snapshot Serengeti’s data set is helping conservation education as well. “It’s already being used at the University of Minnesota so that students can ask their own questions and get the answers from the data,” rather than solving abstract ecological problems with example data, Kosmala said.
While Serengeti National Park is a well-protected area, she said, “we’re hoping that by using the data that we have in this paper, we’ll be able to learn things about how the species interact with each other and the landscape, to inform conservation projects elsewhere in Africa.”
The Web interface for Snapshot Serengeti was developed by a nonprofit called Zooniverse that specializes in citizen-science projects. “We were looking for a way to do citizen science at the same time they were looking for a project,” Kosmala said. Zooniverse’s design makes it “as fast and painless as possible to get started” looking at and tagging images.