Can citizen science analysis of camera trap data be used to study reproduction? Lessons from the Snapshot Serengeti program

Ecologists increasingly rely on camera-trap data to estimate biological parameters such as population abundance. Because of the huge amount of data generated, the assistance of non-scientists is often sought, but an assessment of data quality is necessary. We tested whether volunteers' data from one of the largest citizen science projects, Snapshot Serengeti, could be used to study breeding phenology. We tested whether the presence of juveniles (less than one or 12 months old) of three species of the Serengeti (topi, kongoni and Grant's gazelle) could be reliably detected by "naive" volunteers compared with trained observers. We expected a positive correlation between the proportion of volunteers identifying juveniles and their actual presence within photographs, as assessed by the trained observers. Agreement between the trained observers was good (Fleiss' κ > 0.61 for juveniles of less than one and 12 month(s) old), suggesting that morphological criteria can be used to determine age. The relationship between the proportion of volunteers detecting juveniles less than one month old and their actual presence plateaued at 0.45 for Grant's gazelle, and reached 0.70 for topi and 0.56 for kongoni. The same relationships were much stronger for juveniles younger than 12 months, reaching 1 for topi and kongoni. The absence of individuals < one month old and the presence of juveniles < 12 months old could be reliably assumed when no volunteer, and when all volunteers, respectively, reported the presence of a young. In contrast, the presence of very young individuals and the absence of juveniles appeared more difficult to ascertain from volunteers' classifications, given how the classification task was presented to them. Volunteers' classifications allow a moderately accurate but quick sorting of photographs with/without juveniles. We discuss the limitations of using citizen science camera-trap data to study breeding phenology, and options to improve the detection of juveniles.


Introduction
Camera trapping is increasingly used for ecological monitoring due to its low cost, relative ease of use, and the variety of data it can supply (O'Connell et al. 2010). For instance, camera trap data are used to study species' occupancy and co-occurrence (Anderson et al. 2016), population dynamics (Karanth et al. 2006), or individual behaviour (e.g. vigilance behaviour: Chamaillé-Jammes et al. 2014, or diel activity patterns: Luo et al. 2019). A potential drawback of camera traps is the huge amount of data generated (> 7 million photographs for the Snapshot Serengeti initiative alone). Ecologists have realized that the benefits of continuously collecting data in the field can quickly be negated by the burden of database management, and of visual inspection and analysis of photographs to record the desired data (Wearn and Glover-Kapfer 2017).
To process such a massive amount of information in a reasonable time, scientists have sought the help of non-specialists, who perform diverse tasks such as counting objects in photographs, describing picture content, or identifying animal and plant species (e.g. McShea et al. 2015). Initially, part of the scientific community was sceptical about citizen science, in particular questioning data quality (Riesch and Potter 2014). However, volunteers have sometimes proven to be as efficient as experts, for instance in the identification of large herbivore species in savanna ecosystems (Swanson et al. 2016). More recently, advances in deep learning have made computers as efficient as humans at identifying species and, sometimes, at classifying behaviour (Norouzzadeh et al. 2018). However, human judgment remains valuable in cases where too little data are available to train models (e.g. active learning, Joshi et al. 2009), or when differences among the objects to be classified are subtle and classification requires some subjectivity (Miele et al. 2020, in prep.).
We believe this is the case for age classification problems, for which, to the best of our knowledge, the number of precisely labelled pictures is currently too small to allow an efficient and reliable automation of the process.
Under the assumption that the detectability of the juvenile and adult female segments of the population is not biased by the camera trap methodology, classifying individuals into age classes such as juveniles and adults would allow estimates of key demographic parameters (e.g. reproductive rates) or life-history traits (e.g. breeding phenology), which are essential in estimating population growth rates (Sibly and Hone 2002). For instance, Ogutu and colleagues (2008) highlighted that rainfall influences the abundance of several large herbivore species of the Mara-Serengeti ecosystem, by acting differently on each segment of the populations at specific periods of the year. Furthermore, it would facilitate the study of the relationships between population characteristics and their environments, such as between birth phenology, diet and food resource availability (Sinclair et al. 2000), or their potential evolution in the context of climate change (Visser and Both 2005). Until now, the study of these key demographic parameters has been mainly conducted by direct field observations (e.g. Côté and Festa-Bianchet 2001 in mountain goats, Plard et al. 2013 in roe deer).
However, this methodology still requires an intensive and often costly field effort. Identifying and counting juveniles from camera traps could reduce this field effort, or allow larger-scale or longer-term studies, as suggested by Hofmeester et al. (2019), but could also be time-consuming because of tedious data processing. With the help of citizen science, data handling time could be substantially reduced, but the accuracy of non-specialists in detecting juveniles of large mammals from photographs has not yet been explored.
Here, we evaluate the usefulness of camera trap data annotated online by citizen scientists to assess the presence of juveniles of large herbivores in photographs. We use data from Snapshot Serengeti (Swanson et al. 2015), one of the world's largest citizen science programs, focusing on a subset of the data. We focus on the detection of juveniles in three species found in the Serengeti National Park, Tanzania, with contrasting social and neonatal behaviours: topi (Damaliscus jimela), kongoni (Alcelaphus cokii) and Grant's gazelle (Nanger granti). We first evaluate the agreement between trained observers from our research team, and then test the ability of the volunteers to detect juveniles by comparing their classifications with ours. We predict a better agreement between trained observers for the youngest age class, because determination criteria are clearer and easier to identify than for older juveniles (e.g. absence of horns). Consequently, the level of agreement should decrease for age classes that are based on more subjective or difficult-to-assess criteria (e.g. shape of the horns). Under the hypothesis that volunteers can generally identify juveniles correctly, we expect a positive relationship between the proportion of volunteers reporting a juvenile on a photograph and the probability of the actual presence of a juvenile, as determined by the trained observers. Again, we expect the correlation to be stronger for the youngest age class of juveniles because they are easier to differentiate from adults. Across species, we expect a higher agreement and correlation for topi and kongoni than for Grant's gazelle, because the former are larger, live in smaller groups and have similar body growth rates between males and females (Wilson and Mittermeier 2011), hence reducing the risk of confusion between young males and older females.
Overall, our study details the strengths and weaknesses of camera trap data, in particular when classified by citizen scientists, for the study of reproductive traits such as reproductive rates or breeding phenology.

Study site
The surveyed area within the Serengeti National Park, Tanzania, is composed of open plains and savanna woodlands. Rain mostly occurs between November and June (wet season), with mean annual rainfall increasing from 500 mm in the southeast to 1,100 mm in the northwest.

Camera trap data
The Snapshot Serengeti camera trap grid was deployed in 2010 in Serengeti National Park, Tanzania, to monitor lions and their prey, though the bycatch of numerous other species has proven useful as well. Running continuously since 2010, the grid spans 1,125 km² in the centre of the park. We used data provided by the Snapshot Serengeti camera survey recorded between July 2010 and April 2013 (Supporting information 1). The camera traps were set ~ 50 cm above ground in the centre of a 5 km² grid cell. The detection radius was approximately 45° and the field of view about 14 m (Swanson et al. 2015). When triggered by their motion and heat sensors, cameras took a rapid series of three pictures within a few seconds ("capture event" in Swanson et al. 2015, hereafter called a "sequence" following Meek et al. 2014), with a one-minute delay between sequences.

Choice of studied species and sorting steps of the dataset
Among the many large herbivore species present in the study site, we selected topi, kongoni and Grant's gazelle due to their contrasting biology and characteristics useful for assessing the age classes of individuals. The criteria considered were (1) number of available sequences, (2) relatively small group size, (3) presence of horns in males and females, (4) relatively large size of the young (young of larger species are larger, and therefore criteria like horns are easier to detect), (5) contrasting anti-predator behaviour of the young (Supporting information 2).
We selected the final dataset (n = 2,359 sequences) for the analyses through several sorting steps, based on the volunteers' detection of the species of interest and of the presence of juveniles in the initial complete dataset (n = 1,184,657 sequences). We then corrected this dataset using the trained observers' reclassifications (see details in Table 1).

Assessing the presence or absence of juveniles in photographs

All Snapshot Serengeti photographs have been uploaded to the online citizen science platform "The Zooniverse" (www.zooniverse.org) to be classified by volunteers. We did not review the sequences for which no volunteer had reported a juvenile, as these were too many (n = 2,018, 11,141 and 6,628 for topi, kongoni and Grant's gazelle respectively) to be reviewed individually. However, we checked a subset of them (n = 1,000 for each species), and the chance that a trained observer observed a "true" juvenile (i.e. of less than 12 months old, see age class definitions below) was < 6.5% for all three species studied. We did not correct observations for recaptures of the same individuals, as we were only interested in the ability of volunteers to detect the presence of juveniles in the sequences, not in the actual number of juveniles.
Three of us (LT, LK and MC), considered here as trained observers, searched all retained sequences for juveniles, which were assigned to an age class when detected. We used previously published morphological descriptions of the studied species (e.g. shape and size of horns, size relative to adults; Supporting information 3) to identify and age individuals. We distinguished between (1) juveniles < one month old, (2) between one and six months, (3) between six and 12 months, (4) between 12 and 24 months, termed yearlings hereafter, and (5) individuals over two years old, termed adults hereafter. We defined age classes according to biological characteristics relevant to juvenile identification for each species (e.g. very young individuals for birth phenology identification, juveniles under one year for recruitment estimation). We recorded observers' classifications with the Aardwolf software (Krishnappa and Turner 2014). Ultimately, we produced a dataset recording the presence or absence (M_i,s,j) in each sequence of individuals of each of the five age categories i, for each species s, by each trained observer j.
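Such a presence/absence dataset can be represented very simply. The sketch below is purely illustrative (the record format, sequence identifiers and helper names are ours, not part of the Snapshot Serengeti pipeline), assuming one record per individual detected by an observer:

```python
from collections import defaultdict

# Hypothetical records: (sequence_id, species, observer, age_class),
# one per individual aged by a trained observer.
observations = [
    ("seq001", "topi", "LT", 1),
    ("seq001", "topi", "LK", 1),
    ("seq001", "topi", "MC", 2),
    ("seq002", "kongoni", "LT", 5),
]

# M[(age_class i, species s, observer j)] -> set of sequences in which
# observer j reported at least one individual of class i for species s.
M = defaultdict(set)
for seq, species, obs, age in observations:
    M[(age, species, obs)].add(seq)

def present(age_class, species, observer, sequence):
    """Presence/absence M_i,s,j for a given sequence."""
    return sequence in M[(age_class, species, observer)]
```

Pooling age classes (e.g. classes 1-3 to define "juveniles < 12 months") then amounts to a union of the corresponding sets.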

Statistical analyses
We first evaluated the agreement between the three trained observers on the detection of individuals assigned to each age class for each species. We measured this agreement with Fleiss' κ, implemented in the "raters" R package (Quatto and Ripamonti 2014). Fleiss' κ (Fleiss 1971) compares the observed agreement between two or more judges with the level of agreement expected by chance alone. It takes values between -1 and 1: values < 0 indicate an agreement lower than expected by chance, values > 0 an agreement greater than expected by chance, and values close to 0 an agreement no better than random.
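For illustration, Fleiss' κ can be computed directly from a subjects × categories table of rating counts. This is a minimal sketch of the statistic itself, not the "raters" R package implementation used in the study:

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories matrix of rating counts.

    Each row is one subject (here, a camera-trap sequence) and each column
    one category (here, an age class); each row sums to n, the constant
    number of raters per subject.
    """
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    n_raters = counts[0].sum()
    # proportion of all ratings falling in each category
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    # per-subject agreement: fraction of rater pairs that agree
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_bar = P_i.mean()          # observed agreement
    P_e = (p_j ** 2).sum()      # agreement expected by chance
    if P_e == 1.0:              # degenerate case: all ratings in one category
        return 1.0
    return (P_bar - P_e) / (1 - P_e)
```

With three raters all agreeing on every subject, `fleiss_kappa` returns 1; systematic 2-vs-1 splits drive it below 0.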
We tested for the significance of differences between Fleiss' κ values using a bootstrap procedure following Vanbelle and Albert (2008) (Supporting information 4).
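The bootstrap comparison can be sketched as follows. This is an illustrative resampling of sequences under our own simplifications, not Vanbelle and Albert's (2008) exact procedure, and the function names are ours; a small Fleiss' κ helper is included so the sketch is self-contained:

```python
import numpy as np

def _fleiss_kappa(counts):
    """Fleiss' kappa for a subjects x categories matrix of rating counts."""
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    n_raters = counts[0].sum()
    p_j = counts.sum(axis=0) / (n_subjects * n_raters)
    P_i = ((counts ** 2).sum(axis=1) - n_raters) / (n_raters * (n_raters - 1))
    P_e = (p_j ** 2).sum()
    if P_e == 1.0:  # degenerate case: all ratings in one category
        return 1.0
    return (P_i.mean() - P_e) / (1 - P_e)

def bootstrap_kappa_diff(counts_a, counts_b, n_boot=2000, seed=0):
    """Bootstrap the difference between two Fleiss' kappas.

    Resamples subjects (camera-trap sequences) with replacement in each
    group; the difference is deemed significant if the 95% percentile
    interval excludes zero.
    """
    rng = np.random.default_rng(seed)
    a, b = np.asarray(counts_a), np.asarray(counts_b)
    diffs = np.empty(n_boot)
    for k in range(n_boot):
        diffs[k] = (_fleiss_kappa(a[rng.integers(0, len(a), len(a))])
                    - _fleiss_kappa(b[rng.integers(0, len(b), len(b))]))
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return diffs, (lo, hi)
```

Comparing an age class against itself should, by construction, yield an interval containing zero.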
We tested the relationship between the proportion of volunteers identifying at least one juvenile (Pv) and the probability that the trained observers had identified at least one juvenile < 1 month old (category i = 1). We also explored the same relationship for juveniles < 12 months old (therefore including juveniles of categories 1 to 3 above). We fitted three generalized estimating equation models: one linear (equation 1) and two piecewise models. The first piecewise model was characterized by a slope on both sides of the threshold (equation 2), the second by a slope before and a plateau after the threshold (equation 3). We fitted piecewise models to search for a potential "saturation" phenomenon, whereby beyond a specific proportion of volunteers the probability of actually observing a juvenile no longer increases. We also fitted the null model for comparison. All models were fitted for the two age classes and for each species individually, using the wgee function implemented in the "wgeesel" R package (Xu et al. 2018). We selected the best model using
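The three model shapes can be illustrated with an ordinary logistic likelihood. This is a simplified stand-in for the GEE fits of the "wgeesel" R package: it ignores within-camera correlation, the threshold is profiled over a grid rather than estimated jointly, and all function names are ours:

```python
import numpy as np
from scipy.optimize import minimize

def fit_piecewise_logistic(pv, y, thresholds=np.linspace(0.1, 0.9, 17),
                           form="two_slope"):
    """Logistic models of juvenile presence (y in {0,1}) against the
    proportion of volunteers reporting a juvenile (pv in [0,1]).

    form: 'linear'    logit(p) = b0 + b1*pv
          'two_slope' logit(p) = b0 + b1*pv + b2*max(pv - t, 0)
          'plateau'   logit(p) = b0 + b1*min(pv, t)
    The threshold t is profiled over a grid. Returns (params, t, loglik).
    """
    pv, y = np.asarray(pv, float), np.asarray(y, float)

    def nll(beta, X):
        # numerically stable Bernoulli negative log-likelihood
        eta = X @ beta
        return np.sum(np.logaddexp(0.0, eta) - y * eta)

    def design(t):
        if form == "linear":
            return np.column_stack([np.ones_like(pv), pv])
        if form == "two_slope":
            return np.column_stack([np.ones_like(pv), pv,
                                    np.maximum(pv - t, 0.0)])
        return np.column_stack([np.ones_like(pv), np.minimum(pv, t)])

    best = None
    grid = [0.5] if form == "linear" else thresholds  # t unused when linear
    for t in grid:
        X = design(t)
        res = minimize(nll, np.zeros(X.shape[1]), args=(X,), method="BFGS")
        if best is None or res.fun < best[2]:
            best = (res.x, t, res.fun)
    params, t, nll_val = best
    return params, t, -nll_val
```

Because the two-slope model nests the linear one (b2 = 0), its maximized log-likelihood is at least as high, mirroring why a model-selection criterion is needed to choose between them.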

Results
From the Snapshot Serengeti monitoring program, we obtained 281, 1,290 and 1,095 sequences for topi, kongoni and Grant's gazelle respectively (Fig. 1a, sorting step "Young sorting 1", and Table 1). Our selection process led to a dramatic drop in the number of usable sequences (i.e. those containing at least one < 1 month old juvenile), with a tenfold reduction at the species level: only 7.7% to 18.1% of the complete dataset available for each species remained (n = 59 ± 9 SD, 137 ± 33 SD and 58 ± 3 SD respectively; Fig. 1a, sorting step "Young sorting 3", and Table 1). Most of the loss of sequences happened during the phases of selection of sequences containing the studied species and then juveniles (Fig. 1a and Table 1). Fig. 1b suggests that a similar loss of sequences after selecting for sequences containing juveniles would be obtained for any large herbivore species recorded in the Snapshot Serengeti program.

The three trained observers identified 993 ± 49 SD, 3,020 ± 180 SD and 2,128 ± 224 SD individuals of any age class for topi, kongoni and Grant's gazelle respectively. On average, they could not assign an age class to ~11% (n = 118 ± 68 SD), ~20% (n = 775 ± 256 SD) and ~32% (n = 1,026 ± 492 SD) of the individuals for the three species respectively, a significant between-species difference (χ² = 262.83, df = 2, p < 0.001).
For all species, and as expected, the agreement between trained observers was highest for the youngest age class and then declined with age until the yearling class. Yearlings were reasonably well classified in kongoni, but often misclassified in topi and Grant's gazelle (Fig. 2). Agreement among trained observers was also high for the second age class and for adults, with Fleiss' κ values almost always > 0.61 (denoting substantial agreement, following Landis and Koch 1977), except for Grant's gazelle aged between one and six months and for adult topi.
Agreement among observers was highest for topi aged < 1 month old (Fleiss' κ = 0.78 [0.72; 0.84]). In support of our prediction, we observed the lowest agreement for juveniles aged six months and older, most obviously for Grant's gazelle. Agreement among observers for the first three age classes pooled together (representing the juveniles) was very good, with Fleiss' κ values well above 0.61 for all species; the same held true for juveniles between one and 12 months old when pooled together. Our results were globally consistent among the three species, with the highest agreement for topi and the lowest for Grant's gazelle (Fig. 2). All estimated Fleiss' κ values differed significantly from the agreement expected by chance (Supporting information 4).

The model best describing our data for the age class < 1 month was the linear model for topi, the piecewise model with slopes on both sides of the threshold for kongoni, and the piecewise model with a slope before and a plateau after the threshold for Grant's gazelle (Table 2). The model best describing our data for the age class < 12 months was the piecewise model with slopes on both sides of the threshold for topi, kongoni and Grant's gazelle (Table 2). Between the estimated threshold and 100% of volunteers identifying young in the sequences, the probability decreased (Fig. 3 d, e, f). When investigating the presence of juveniles < 1 month old, the probability that a juvenile was actually present was never greater than 0.697 (Fig. 3 a, b, c). On the other hand, the models predicted that when no volunteer reported the presence of juveniles, the probability that a juvenile < 1 month old was present was under 1.8%, but there was at least a 9.6% chance that a juvenile < 12 months old was present. Contrary to our expectations, the piecewise model characterized by two slopes was almost always selected as the best model for the three species and both age classes.
This denotes a sudden change in the detection rate beyond a certain percentage of volunteers. For young < 12 months old, the detection rate decreased for kongoni and Grant's gazelle beyond 12% and 41% of volunteers voting for the presence of young, respectively. The number of volunteers who classified a photograph was independent of the probability of detecting a young.

Discussion
Our study reveals the strengths and weaknesses of using citizen-based assessment of age classes on camera trap pictures. Clearly, citizen involvement through an online platform has been critical for the classification of the millions of photographs collected by the Snapshot Serengeti initiative. Also, despite leaving it to the volunteers to decide what a juvenile looks like, volunteers' classifications allow a rough, moderately accurate, but quick sorting of sequences with/without juveniles.
Our study makes clear that, for the species studied and given the minimal guidelines given to volunteers, the absence of very young (< 1 month old) individuals in pictures can be reliably assumed when no volunteer reports a presence. The sequences almost never labelled by volunteers as containing "young" were very unlikely to contain young under one month of age according to the trained observers. This suggests that volunteers rarely miss very young juveniles. This likely occurs because very young juveniles are distinctively smaller than adults, and possibly also because very young mammals share physical characteristics such as relatively large eyes, long legs, and a short, rounded nose, all belonging to the Kindchenschema, known to be very attractive stimuli for humans (Brosch et al. 2007, Golle et al. 2013). By contrast, the presence of very young individuals appears more difficult to ascertain from volunteers' data, and this apparently stems from the lack of guidelines given to volunteers. Indeed, consistent with the idea that volunteers easily identify very young individuals, a large consensus among volunteers around the presence of a very young juvenile could be a reliable indication of actual presence, but only when no older juveniles are present (compare Fig. 3 a, b, c and Supporting information 4, Fig. A). Unfortunately, the absence or presence of older juveniles cannot currently be known without a reassessment of the pictures, because volunteers were not asked to differentiate between juvenile age classes. Therefore, the presence of very young juveniles remains difficult to ascertain. On the other hand, the presence of juveniles, irrespective of their age, can be reliably assumed when all volunteers agree about a presence (Fig. 3 a, b, c), especially for topi and kongoni: the models predicted that when all the volunteers reported the presence of juveniles, the probability that a juvenile < 12 months old was present was at least 99.8%.
In contrast, the absence of juveniles, regardless of their age, cannot be assumed as reliably, meaning that juveniles are missed quite often. As they grow, juveniles become increasingly similar to adults, and are more likely to be mistaken for the latter by volunteers. We emphasize, however, that the detectability of juveniles is not equal between species (Fig. 3).
Among the three trained observers, we found the best agreement in the detection of age classes for a given sequence for topi and kongoni, suggesting that they are the easiest species to age. … individuals of Grant's gazelle in the 1970s, whereas the numbers of wildebeest and zebra were 720,000 and 240,000 individuals respectively (Sinclair and Norton-Griffiths 1995). Finally, the anti-predator strategy of juvenile large herbivores, known as the hider-follower gradient (Lent 1974, Rutberg 1987), could influence the number of sequences containing very young individuals. While followers become active and stay with their mother just a few hours after birth, hiders remain concealed in dense vegetation during their first weeks of life. The detection probability of hiders from camera traps should then be much lower than that of followers, consistent with our observation of very young topi in a greater proportion than very young kongoni and Grant's gazelle.

Volunteer classifications provide information that can reliably be used to infer the presence of juveniles < 12 months old or the absence of juveniles < 1 month old, as these annotations appear robust. However, because volunteers seem able to discriminate between individuals of less than one month old and the rest of the juveniles even in the absence of any stated criteria, more precise results could be achieved by asking them to differentiate between two age classes of juveniles, such as "juvenile" and "newborn". The level of agreement between trained observers in the classification of age classes across species is also a good indication of what kinds of tasks could be successfully conducted by volunteers. When this agreement is low (e.g. in the identification of topi and Grant's gazelle yearlings), one cannot expect volunteers to properly identify such an age class. We advise limiting the number of classes the volunteers are asked to identify, and focusing on the most recognisable ones.
Another way to improve results generated via citizen science platforms could be the inclusion of detailed information, as is done for species identification (Swanson et al. 2015). When volunteers detect a young in a photograph, they could be prompted with comprehensive keys to age the juvenile from its morphology, along with a set of reference pictures or drawings. In general, however, we would recommend asking citizen scientists to identify newborns, i.e. individuals under one month of age, vs. other juveniles. To identify newborns of horned bovid species like those in our study, volunteers would have to look for the smallest individuals (whose head does not reach above the back of the female; Ogutu et al. 2008), with no evidence of horn buds, with a specific coat colour (e.g. a darker coat in Grant's gazelle in our study, or a lighter coat in wildebeest), or even with umbilical cord remnants. One difficulty is adapting these criteria to every species studied.
We finally suggest evaluating volunteers' classification skills by presenting them with images of individuals of known age (captive or tagged animals, for instance) and assessing their accuracy against labels provided by experts. Volunteers could then be assigned classification tasks adapted to their skills (e.g. species identification would belong to the easier tasks). Platforms like the Zooniverse continue to create new modes of annotation that best leverage the public's interest in contributing to research, and this is a logical next step for the Snapshot Safari initiative.
Overall, we find that by closely investigating the data already collected by volunteer-based programs, data collection procedures can be adjusted to enhance the contributions of citizen scientists to scientific research and conservation efforts.