Butterfly Survey does not Warrant the Equity of Sampling-completeness among Butterfly Families: A Case Study with Tropical Butterfly Fauna in Bhutan

Complete or even near-complete inventories of biodiversity often remain well out of reach, especially for speciose taxonomic groups such as insects in general and butterflies in particular. Moreover, it is uncertain whether similar levels of completeness are reached among the taxonomic subsets of a surveyed group. For example, one may wonder whether sampling butterfly fauna with the commonly implemented “Pollard walk” consistently ensures a similar recording efficiency (i.e. similar sampling completeness) across the different butterfly families. To address this issue and quantify […] These findings revealed that a “Pollard walk” survey may lead to appreciable differences in sampling completeness among butterfly families. In addition, this study provides a reliable estimate of the total species richness of the butterfly fauna in the surveyed ecosystem, amounting to no less than 280 species concentrated in a rather small area along the Sankosh River.


INTRODUCTION
Incomplete inventories of biodiversity are becoming increasingly frequent as surveys progressively address taxonomic groups that form highly species-rich assemblages, as is often the case with invertebrate faunas. Accordingly, most published inventories of biodiversity are admittedly more or less incomplete [1], at least at local and regional scales. High costs and the shortage of time available for such studies are arguably the main reasons for the current scarcity of quasi-exhaustive inventories. As a consequence, reliably assessing the level of sampling completeness actually reached is a first major issue for most biodiversity surveys. Fortunately, this issue can now be addressed properly, using reasonably accurate numerical extrapolations of the species accumulation process.
A second issue, also directly related to sampling incompleteness, is whether the achieved level of completeness differs substantially between taxonomic subsets within the sampled taxonomic group. For example, after a partial inventory of the butterfly fauna of a given ecosystem, does substantial inequity exist between the levels of sampling completeness of the different butterfly families? This second issue is also of great importance, since fairly equal levels of sampling completeness among taxonomic subsets are a necessary condition for reliable comparisons between the species richness of these subsets.
In case of substantial inequity in this respect, a third issue is to reliably estimate, for each subset, the additional sampling effort that would be required to finally reach a similar level of sampling completeness among all the taxonomic subsets involved.
Hereafter, we address these three successive questions through an extrapolative analysis of the extensive survey of lowland forest butterflies of the Sankosh River Catchment (Bhutan) previously published by Arun P. SINGH [2]. The resulting estimates are compared with those obtained using the empirical extrapolative model developed by Singh & Pandey [3] for the butterfly fauna of the Indian subcontinent. In particular, using the field data recorded by Arun P. SINGH, we try to highlight to what extent the Pollard walk monitoring procedure actually yields fairly even levels of sampling completeness among the different butterfly families (Papilionidae, Pieridae, Lycaenidae, Nymphalidae, Hesperiidae).

MATERIALS AND METHODS
As just mentioned, Arun P. SINGH published a detailed assessment of part of the butterfly fauna in lowland forest in the vicinity of the Sankosh River Catchment (south-west Bhutan), as part of a biodiversity assessment prior to a planned hydroelectric power project in this area. The five butterfly families (Papilionidae, Pieridae, Lycaenidae, Nymphalidae, Hesperiidae) that co-occur in the vicinity of the Sankosh River Catchment were surveyed simultaneously according to the commonly implemented "Pollard walk" procedure [4,5]. Further details regarding the sampling method and the recorded data are available in the author's online publication [2] and are therefore not repeated here.
The survey (1731 observed individuals) records the occurrence of 213 species of butterflies. It nevertheless remains incomplete, considering the substantial proportion of singletons (species recorded only once in the course of the survey). Accordingly, this survey, like most of its kind, is subject to the three questions raised in the Introduction.
In this perspective, an accurate estimation of the number of still unrecorded species and reliable extrapolations of the species accumulation curves beyond the sampling size actually achieved were implemented (i) for the surveyed butterfly group as a whole and (ii) for each of the families represented within it.

Numerical Extrapolation of Species Accumulation beyond the Achieved Sampling Size
As sampling size increases, the number of recorded species grows monotonically, at first rapidly and then more and more slowly. The so-called 'Species Accumulation Curve' R(N) describes the growth of the number of recorded species R with increasing sampling size N (typically, the number of individuals observed during sampling). The mathematical expression, and thus the detailed shape, of the Species Accumulation Curve depends upon both the total species richness of the sampled assemblage and the degree of heterogeneity of the species abundance distribution within that assemblage [1]. This would apparently make the extrapolation of the Species Accumulation Curve rather difficult to compute, since both factors are unknown a priori. Yet the numbers f_1, f_2, f_3, f_4, …, f_x, … of species recorded respectively 1, 2, 3, 4, …, x times during sampling also depend directly upon the total species richness and the heterogeneity of species abundances. This explains why the numbers f_x may serve as an appropriate basis from which to extrapolate the Species Accumulation Curve beyond the actual size of the sample under consideration. In particular, the most commonly used estimators of the number of unrecorded species (i.e. nonparametric estimators such as 'Chao' and the 'Jackknife' series) are computed from the recorded values of the first few f_x [6]. In practice, however, a problem remains: each of these types of estimators provides a substantially distinct estimate, and none of them is consistently the most appropriate. Accordingly, the traditional practice has been to consider all of them together without making any choice [7], an admittedly frustrating situation.
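As an illustration of how the numbers f_x feed the usual nonparametric estimators, the following minimal Python sketch derives the f_x from a list of per-species counts and computes the classical Chao and Jackknife estimates of the number of unrecorded species. The formulas are the standard large-sample forms found in the general literature, not necessarily the exact expressions of Appendix 1; the function names and the sample data are hypothetical.

```python
from collections import Counter


def frequency_counts(abundances):
    """Return {x: f_x}, the number of species recorded exactly x times."""
    return Counter(a for a in abundances if a > 0)


def missing_species_estimates(f):
    """Classical nonparametric estimates of the number of unrecorded species.

    Standard large-sample forms (an assumption of this sketch):
    Chao = f1^2 / (2 f2); Jackknife orders 1 to 3 as written below.
    """
    f1, f2, f3 = f.get(1, 0), f.get(2, 0), f.get(3, 0)
    estimates = {
        "Jackknife-1": f1,
        "Jackknife-2": 2 * f1 - f2,
        "Jackknife-3": 3 * f1 - 3 * f2 + f3,
    }
    if f2 > 0:  # Chao is undefined when no species was recorded twice
        estimates["Chao"] = f1 ** 2 / (2 * f2)
    return estimates


# Hypothetical sample: number of individuals recorded for each of 10 species
abundances = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13]
f = frequency_counts(abundances)
estimates = missing_species_estimates(f)
for name, delta in sorted(estimates.items()):
    print(f"{name}: {delta}")
```

Even on this small hypothetical sample the estimators disagree, which is the practical difficulty described above.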
Yet it has been shown recently that, although none of the available estimators is consistently the most accurate [8], each of them may in turn prove the least biased, depending on the value taken by f_1 compared with the other f_x (x > 1) [9]. Accordingly, in practice, the most appropriate (i.e. least biased) estimator of the number of unrecorded species may be selected by comparing the value of f_1 with the values of the other f_x for x > 1 [9,10]. Selecting the least-biased type of estimator in this way provides the best possible estimate of the number ∆ of "missing" species and, in turn, the best estimate of the total species richness S_t of the partially sampled assemblage (the number of recorded species plus ∆). In addition, the least-biased expression for the extrapolation of the species accumulation curve R(N) is straightforwardly derived.
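The selection step can be sketched as follows. The actual selection key compares f_1 with the other f_x and is given in Appendix 1; this sketch instead relies on the property noted in the appendix remarks that the selected estimator always provides the highest estimate among the candidates. The function name and the candidate values are hypothetical.

```python
def select_least_biased(candidates, recorded_species):
    """Pick the candidate with the highest estimate of missing species.

    Assumption of this sketch: the least-biased estimator coincides with
    the one giving the highest estimate, as stated in the appendix remarks.
    Returns (estimator name, delta, estimated total richness S_t).
    """
    name = max(candidates, key=candidates.get)
    delta = candidates[name]
    return name, delta, recorded_species + delta


# Hypothetical candidate estimates of the number of unrecorded species
candidates = {"Chao": 4.0, "Jackknife-1": 4, "Jackknife-2": 6, "Jackknife-3": 7}
name, delta, s_t = select_least_biased(candidates, recorded_species=10)
print(name, delta, s_t)
```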
In practice, the formulations summarised in Appendix 1 provide (i) the expressions of ∆, S_t and R(N) according to each of the most commonly used types of nonparametric estimators and (ii) the key to select, among them, the least-biased estimator and, thereby, the least-biased expressions for ∆, S_t and R(N). Also, in order to reduce the influence of drawing stochasticity, which affects the as-recorded values of the f_x, it is advisable to regress the as-recorded distribution of the numbers f_x versus x.

Table 1 details, for each butterfly family (Papilionidae, Pieridae, Lycaenidae, Nymphalidae, Hesperiidae) and for all these families together: the number of recorded species R_0, the selected least-biased type of nonparametric estimator, the estimated number ∆ of still unrecorded species and the resulting estimate of the "true" total species richness S_t. From these estimates of S_t are subsequently derived the estimated level of sampling completeness R_0/S_t and the respective contributions (%) of each family to the overall butterfly species richness. In turn, the respective contributions (%) of each family to the overall butterfly species richness (Table 1) were compared with the results of the Singh & Pandey empirical model [3] for North-East India (including Bhutan) and for two regions close to Bhutan (Sikkim and Darjeeling districts) (Table 2 and Fig. 4).

Table 1. For each butterfly family (Papilionidae, Pieridae, Lycaenidae, Nymphalidae, Hesperiidae) and all families together: the number of recorded species R_0, the selected least-biased type of nonparametric estimator, the estimated number ∆ of unrecorded species, the resulting estimate of the "true" total species richness S_t, the estimated level of sampling completeness R_0/S_t, and the respective contributions (%) of each family to the overall butterfly species richness, according to the estimated true species richness S_t of each family.
NB: the estimated species richness for all five families together (281 species) differs slightly from the sum of the estimated species richness of each of the five families (283), because the estimators involved, although selected as least-biased, are not entirely unbiased.

Table 3 and Fig. 5 compare the crude field data (recorded numbers of species per family) with the estimated total species richness per family, obtained either (i) by selecting the "least-biased" nonparametric estimator for each family [9] or (ii) by using the empirical model proposed by Singh & Pandey [3] for North-East India. Clearly, crude field data provide unreliable appreciations of the true respective contributions of each family to the butterfly species richness, not only in absolute terms but also in relative terms. On the other hand, the empirical model already represents significant progress, as it comes closer to the values provided by the selected least-biased estimates, taken as the best available data source.

[…] in particular regarding the uneven degree of detectability of butterfly species encountered along transect walks (see for example [11]). In this respect, the aim of the present case study is no more than to quantify, as accurately as possible, the expected consequence of the limitation above, in terms of unevenness of sampling completeness among butterfly families after a Pollard walk inventory. Moreover, the objective of this contribution was not to discuss the reasons behind the uneven levels of completeness highlighted here but, more practically, to draw attention to the quantitative consequences of unequal degrees of representativeness of butterfly inventories across the surveyed families.

Estimations of Total Species Richness and Sampling Completeness per Family
Reliably estimating sampling completeness per family first requires appropriately extrapolating the process of species accumulation, so as to estimate the true species richness as accurately as possible. This was done using the recently developed procedure for selecting the "least-biased" nonparametric estimator of the number of still unrecorded species. The computed levels of sampling completeness were in partial agreement with the empirical expectations from the Singh & Pandey model [3] for North-East India. Yet some discrepancy occurs for the two families Pieridae and Hesperiidae, which were substantially underestimated and overestimated, respectively, by the empirical model when compared with the least-biased extrapolations. Overall, however, the model is hereby supported and may serve as a convenient approximate surrogate to extrapolations for the Indian subcontinent. The worst solution, indeed, would be to rely only upon crude recorded data, which proved rather unreliable (Table 3 and Fig. 5). Incidentally, Singh & Pandey emphasise that species of the family Papilionidae are especially easy to observe, identify and sample [3] (p. 85), which is consistent with the exceptionally high level of exhaustiveness computed here for the inventory of this family.

[…] each family to the overall butterfly species richness of the surveyed area may well be considered as a first step only. Continuing the inventory further, so that the less well-sampled families finally attain the same desirably high level of completeness, would be a worthwhile second step. This requires, however, planning the additional sampling efforts respectively required for each family. These additional efforts can be reliably predicted only by extrapolating the species accumulation curve for each family specifically.
This, in turn, is made possible by the procedure for selecting the "least-biased" extrapolation ([9,10]; Appendix 1) already implemented for the estimation of true species richness. The additional sampling efforts, differentiated by family, may then be planned accordingly. Fig. 6 provides, for each family, the corresponding extrapolation and, thereby, the expected sampling effort required to reach any given level of inventory completeness. For example, reaching 90% sampling completeness would require approximately 2000, 4200, 4500 and 6400 additional butterfly observations for the families Nymphalidae, Pieridae, Hesperiidae and Lycaenidae, respectively.
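The way such an additional effort can be read off an extrapolated accumulation curve may be sketched as below. The search routine works with any increasing curve R(N); the curve used in the example is only a placeholder saturating form chosen for illustration, not the least-biased expression of Appendix 1, and all numerical values and names are hypothetical.

```python
def required_sample_size(curve, s_total, n0, target=0.90, n_max=10**9):
    """Smallest integer N >= n0 such that curve(N) / s_total >= target.

    `curve` must be an increasing function of the sampling size N.
    """
    if curve(n0) / s_total >= target:
        return n0
    lo, hi = n0, n0
    # Double the upper bound until the target completeness is reached
    while curve(hi) / s_total < target:
        hi *= 2
        if hi > n_max:
            raise ValueError("target completeness not reached below n_max")
    # Bisection: curve(lo) is below the target, curve(hi) reaches it
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if curve(mid) / s_total >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical family: 80 species recorded among 1000 individuals,
# with 20 species estimated as still unrecorded.
n0, r0, delta = 1000, 80, 20
s_total = r0 + delta


def placeholder_curve(n):
    """Illustrative saturating curve only (assumption, not Appendix 1)."""
    return s_total - delta * n0 / n


n_target = required_sample_size(placeholder_curve, s_total, n0, target=0.90)
additional = n_target - n0
print(n_target, additional)
```

Substituting the least-biased expression of R(N) for the placeholder curve would give the family-specific efforts discussed above.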
Finally, coming back to the first aim of the survey conducted by A. P. SINGH, that is, the butterfly species richness at the Sankosh River Catchment (Bhutan), the least-biased estimates (Table 1) […]

CONCLUSION
Whether or not the commonly implemented "Pollard walk" method ensures fairly even degrees of sampling completeness across the different butterfly families was a question that had so far remained pending. The recently developed "least-biased" procedure suggests that the "Pollard walk" method may actually lead to substantially unequal levels of sampling completeness among butterfly families. This, at least, was shown above for the (partial) survey of the butterfly fauna at the Sankosh River Catchment (Bhutan). The estimated levels of sampling completeness ranged from 65% (Lycaenidae) to 99% (Papilionidae), with the families Hesperiidae, Pieridae and Nymphalidae sampled at 69%, 75% and 80% completeness, respectively. In addition, this study provided the opportunity to estimate the overall species richness of the butterfly fauna around the Sankosh River Catchment at no less than 280 species, of which 22, 30, 79, 113 and 39 species belong to the families Papilionidae, Pieridae, Lycaenidae, Nymphalidae and Hesperiidae, respectively.
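The arithmetic behind these summary figures can be checked with a short sketch. It uses only numbers quoted in the text (estimated richness per family, 213 recorded species, 281 estimated species for all families together); taking the sum of the family estimates as the base for the percentage contributions is an assumption of this sketch, and Table 1 may use a slightly different base.

```python
# Estimated true species richness S_t per family, as quoted in the text
estimated_richness = {
    "Papilionidae": 22,
    "Pieridae": 30,
    "Lycaenidae": 79,
    "Nymphalidae": 113,
    "Hesperiidae": 39,
}

# Sum of the per-family estimates (assumed base for the contributions)
total = sum(estimated_richness.values())

# Contribution (%) of each family to the overall estimated richness
contributions = {
    family: 100 * s_t / total for family, s_t in estimated_richness.items()
}

# Overall sampling completeness R_0 / S_t for all families together
recorded_all, estimated_all = 213, 281
overall_completeness = recorded_all / estimated_all

for family, share in contributions.items():
    print(f"{family}: {share:.1f}%")
print(f"Overall completeness: {100 * overall_completeness:.1f}%")
```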
These are the respective ranges within which each estimator benefits from minimal bias for the predicted number of missing species.
Besides, it is easy to verify that another consequence of these preferred ranges is that the selected estimator will always provide the highest estimate, as compared with the other estimators. Interestingly, this mathematical consequence, of general relevance, is in line with the already accepted opinion that all nonparametric estimators provide underestimates of the true number of missing species [1,6]. Also, this shows that the approach initially proposed by Brose et al. [8], which has regrettably suffered from its somewhat difficult implementation in practice, might now be advantageously reconsidered in light of the very simple selection key above, which is far easier to use in practice.

N.B. 2:
In order to reduce the influence of drawing stochasticity on the values of the f_x, the as-recorded distribution of the f_x should preferably be smoothed: this may be obtained either by rarefaction processing or by regression of the as-recorded distribution of the f_x versus x.