Least-biased Estimations of True Species Richness of Butterfly Fauna in Sub-urban Sites around Jhansi (India) and the Range of Inter-annual Variation of Species Richness

Original Research Article. Béguinot; AJEE, 2(1): 1-12, 2017; Article no.AJEE.32040

As a rule, most biodiversity inventories at local scales remain more or less incomplete when dealing with relatively speciose taxonomic groups, such as butterflies in tropical regions. Yet it remains possible to take maximum advantage of partial inventories and to develop reliable predictions by extrapolating the species accumulation curves beyond the samplings already achieved. Moreover, given the wide diversity of available estimators of total species richness, it is desirable to select the least-biased estimator and the associated expression of the species accumulation curve. The “least-biased extrapolation procedure” is recommended for this purpose. This procedure was applied to nine inventories carried out by Ashok Kumar in (sub-)urban sites in the vicinity of Jhansi (Uttar Pradesh, India), thus providing more accurate evaluations of the remnant butterfly species richness at these sites. The estimated sampling completeness of the inventories ranged from 65% to 99%, depending on sites and years, and the estimated true species richness ranged from 25 species (along Highway in 2010) to 44 species (Jhansi Fort in 2011). Importantly, the levels of sampling completeness prove to be poorly correlated with sampling size. This highlights the fact that, contrary to a still widespread opinion, comparisons between levels of species richness may well remain irrelevant even when made at the same sampling size (for example by using an appropriate “rarefaction” procedure). Four of the nine inventories were conducted at the same two sites in two successive years (2010-2011) and thus provide an opportunity to evaluate the range of inter-annual variation of the true species richness of butterfly fauna in this sub-urban context. Inter-annual variations of 24% to 48% were registered, depending on the site.


INTRODUCTION
Incomplete inventories of biodiversity are likely to become increasingly frequent as surveys progressively address new taxonomic groups that are more difficult to cope with, in particular those groups giving rise to species assemblages with high numbers of species [1,2,3]. In addition, more commonly investigated taxonomic groups are also likely to remain more or less incompletely surveyed at the local scale, because sampling efforts are often far less intensive at these small scales than they usually are across wider areas.
Accordingly, the vast majority of currently published inventories are admittedly more or less incomplete. This incompleteness may be partially compensated (in numerical terms only) by estimating the number ∆ of "missed" (i.e. unrecorded) species, thereby leading to an evaluation of the total species richness S_t of the sampled assemblage. Many different (nonparametric) estimators of the number ∆ of "missed" species have been proposed in recent decades (reviewed in [1,2]). As expected, these different types of estimators provide divergent evaluations of ∆, and no consensus has been reached as to which of them is the most accurate. The commonly accepted suggestion to consider all these divergent estimates without being able to choose between them [3] remains frustrating. This, in turn, probably helps to explain why many partial inventories are still not extrapolated numerically, as would be highly desirable in order to derive a reliable estimate of the total species richness. Indeed, evaluating the richness of species assemblages is highly desirable, at least in relative if not in absolute terms. Note that, even in relative terms, a relevant comparison of species richness between two or more assemblages requires that inventories be compared at the same level of completeness. This is a mandatory condition that neither standardised sampling procedures nor rarefaction to the same sampling size can actually secure, contrary to what is still too often asserted in the literature, simply because the level of completeness reached at a given sample size depends tightly on the degree of heterogeneity of the species abundance distribution, which usually differs between sampled assemblages.
Now, a rational method for selecting the least-biased estimator (among the most commonly referenced ones) has recently been developed [4,5], extending the path initiated by BROSE et al. [6]. This newly derived procedure avoids the above-mentioned frustration of having to deal with divergent estimates without knowing how to choose the most accurate of them.
Hereafter, this procedure is used to extrapolate a series of inventories of butterflies in and around the city of Jhansi (Uttar Pradesh, India), carried out by Ashok KUMAR of Lucknow University, making use of the recorded data published by this author [7,8]. Reliable estimates of the total species richness of each of the nine inventories are thereby expected, thus providing a more accurate appreciation of the true local diversity of butterfly fauna. Moreover, reliable predictions of the additional sampling effort required to improve the completeness of inventories are derived from the least-biased extrapolation of samplings. Finally, these extrapolations are also used to address several questions of more general interest, in particular the evaluation of the degree of inter-annual variability of true species richness in butterfly faunas.

MATERIALS AND METHODS
Nine inventories of butterfly fauna in Jhansi and its vicinity (Uttar Pradesh, India) were conducted during the years 2010 and 2011, and the results were published in detail by A. KUMAR, including the respective abundances of each recorded species [7,8]. Accounting for species abundances is of prime interest for the extrapolation of partial samplings, since abundance data provide the numbers f_1, f_2, f_3, …, f_x, … of those species recorded respectively 1-, 2-, 3-, …, x-times during the realised partial sampling. These numbers are required, in turn, to reliably extrapolate the species accumulation curve, as explained below.
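As an illustration of how the numbers f_x are obtained from abundance data, the following minimal Python sketch tallies them from a list of per-species abundances. The function name `frequency_counts` and the abundance values are hypothetical and are not taken from the inventories published in [7,8].

```python
from collections import Counter

def frequency_counts(abundances, max_x=5):
    """Return [f_1, ..., f_max_x], where f_x is the number of
    species recorded exactly x times in the sample."""
    tally = Counter(a for a in abundances if a > 0)
    return [tally.get(x, 0) for x in range(1, max_x + 1)]

# Hypothetical abundances (individuals per recorded species).
abundances = [1, 1, 1, 2, 2, 3, 5, 8, 13, 40]

f = frequency_counts(abundances)
N0 = sum(abundances)   # achieved sample size (number of individuals)
R0 = len(abundances)   # number of recorded species

print(f)       # [3, 2, 1, 0, 1]
print(N0, R0)  # 76 10
```

Here three species are singletons (f_1 = 3), two are doubletons (f_2 = 2), and so on; species recorded more than five times do not contribute to f_1 to f_5.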
All details relating to the environmental context of each of the nine inventories, and the lists of species with their respective abundances, are freely available on-line [7,8] and, accordingly, will not be recalled here.

Numerical Extrapolation beyond Achieved Sampling Size
As sampling size increases, the number R of recorded species grows monotonically, at first rapidly and then less and less quickly. The so-called 'Species Accumulation Curve' R(N) accounts for the growth kinetics of the recorded species richness R with increasing sampling size N (N: typically, the number of observed individuals). The mathematical expression (and thus the detailed shape) of the Species Accumulation Curve depends upon both the total species richness of the sampled assemblage and the degree of heterogeneity of the species abundance distribution within that assemblage. This would apparently make the extrapolation of the Species Accumulation Curve rather difficult to compute, since both factors are unknown a priori. Yet the numbers f_1, f_2, f_3, …, f_x, … of those species recorded respectively 1-, 2-, 3-, …, x-times during sampling also depend directly upon the total species richness and the degree of heterogeneity of the species abundances. This explains why these numbers may serve as an appropriate numerical basis from which to extrapolate the Species Accumulation Curve beyond the actual size of the sample under consideration. In particular, the most commonly used estimators of the number of unrecorded species (i.e. 'Chao' and the series of 'Jackknife') are computed from the recorded values of the numbers f_x [1]. In practice, however, a problem remains: as already mentioned, each of these different types of estimators provides a substantially distinct estimate, and none of them proves to be consistently more appropriate. Accordingly, the traditional practice has become to consider all of them together without making any choice [3], an admittedly frustrating situation.
Yet it has been shown recently that, although none of the available estimators consistently remains the most accurate, each of them may in turn prove to be the least biased, depending on the value taken by f_1 as compared to the other f_x (x > 1) [4]. Accordingly, in practice, the most appropriate (i.e. the least-biased) estimator of the number of unrecorded species may be selected by comparing the value of f_1 to the values of the other f_x (x > 1) [4,5]. Selecting the least-biased type of estimator in this way provides the best possible estimate of the number ∆ of "missing" species and, in turn, the best estimate of the total species richness S_t of the partially sampled assemblage. In addition, the least-biased expression for the extrapolation of the species accumulation curve is straightforwardly derived.
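To make the divergence between estimators concrete, the sketch below computes ∆ from the f_x for the classic Chao estimator and for Jackknife estimators of orders 1 to 5. It assumes the common large-sample forms of these estimators (the small-sample correction factors are omitted); the exact expressions and the selection key of Appendix 2 and [4] are not reproduced here, and the f_x values and the function name are hypothetical.

```python
def missing_species_estimates(f):
    """Candidate estimates of the number of unrecorded species,
    computed from f = [f_1, f_2, f_3, f_4, f_5].

    Assumption: large-sample forms of the Jackknife series and the
    classic (uncorrected) Chao estimator."""
    f1, f2, f3, f4, f5 = f
    estimates = {
        "JK-1": f1,
        "JK-2": 2 * f1 - f2,
        "JK-3": 3 * f1 - 3 * f2 + f3,
        "JK-4": 4 * f1 - 6 * f2 + 4 * f3 - f4,
        "JK-5": 5 * f1 - 10 * f2 + 10 * f3 - 5 * f4 + f5,
    }
    if f2 > 0:  # the classic Chao form is undefined when f_2 = 0
        estimates["Chao"] = f1 ** 2 / (2 * f2)
    return estimates

# Hypothetical values, for illustration only.
f = [8, 4, 3, 2, 1]
R0 = 30  # hypothetical number of recorded species

for name, delta in missing_species_estimates(f).items():
    print(f"{name}: missing = {delta}, total richness = {R0 + delta}")
```

With these hypothetical counts the estimated number of missing species ranges from 8 (JK-1, Chao) to 21 (JK-5), which illustrates why a rational selection among estimators is needed before any extrapolation.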
In practice, Appendix 2 provides (i) the expressions of ∆, S_t and R(N) according to each of the most commonly used types of estimators and (ii) the key to select the least-biased estimator and, thereby, the least-biased expressions for ∆, S_t and R(N). Also, in order to reduce the influence of drawing stochasticity, which affects the as-recorded values of the f_x, it is advisable to regress the as-recorded distribution of the f_x versus x (cf. Appendix 1).

Least-biased Estimations of the Total Species Richness and of the Extra-sampling Effort Required for Improving Sampling Completeness
For the nine inventories of butterfly fauna carried out within and around Jhansi during years 2010 and/or 2011, Table 1 provides: the achieved sample size N_0, the number of recorded species R_0 (= R(N_0)), the type of least-biased estimator selected according to Appendix 2, the estimated number ∆ of missing species, the estimated true (total) species richness S_t and the resulting estimate of sampling completeness. Figs. 1 and 2 provide a convenient graphic overview of the main results.
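The last two quantities listed above follow directly from R_0 and ∆: the total richness is S_t = R_0 + ∆ and the sampling completeness is the ratio R_0/S_t. A minimal Python sketch, with hypothetical input values (not a row of Table 1) and a hypothetical function name:

```python
def richness_summary(R0, delta):
    """Return (S_t, completeness) from the number of recorded
    species R0 and the estimated number of missing species delta."""
    S_t = R0 + delta
    return S_t, R0 / S_t

# Hypothetical inventory: 30 recorded species, 10 estimated missing.
S_t, completeness = richness_summary(R0=30, delta=10)
print(S_t, f"{completeness:.0%}")  # 40 75%
```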
A few examples of extrapolations of the Species Accumulation Curves are also presented in Fig. 3 (where the extrapolations associated with six different types of estimators are compared for the same inventory) and in Fig. 4.
These extrapolations make it possible to gauge the predicted extra-sampling effort that would be required to obtain any given increment in recorded species richness. This is of particular practical interest for making a rationally informed decision as to whether or not to extend the inventory further. Fig. 3 exemplifies the importance of selecting the least-biased extrapolation among the set of possible extrapolations. Not only may the predicted number ∆ of "missing" species vary by a factor of two according to the extrapolation considered, but the predicted extra-sampling effort required to reach a given level of completeness may vary over still larger ranges, as exemplified in Fig. 3.
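As a hedged illustration of how such an extra-sampling effort can be derived, the sketch below assumes one particular extrapolation, the first-order Jackknife form R(N) = (R_0 + f_1) − f_1·N_0/N. This form satisfies R(N_0) = R_0 and has slope f_1/N_0 at N = N_0, but it is an assumption made here for illustration: the expressions of Appendix 2 are not reproduced in this section, and the least-biased estimator for a given inventory may be a different one. All numerical values and function names are hypothetical.

```python
def jk1_curve(N, N0, R0, f1):
    """Assumed first-order Jackknife extrapolation of the
    Species Accumulation Curve, valid for N >= N0."""
    return (R0 + f1) - f1 * N0 / N

def sample_size_for_completeness(target, N0, R0, f1):
    """Sample size N at which the assumed JK-1 curve reaches the
    target completeness R(N)/S_t, with S_t = R0 + f1."""
    S_t = R0 + f1
    if not (R0 / S_t <= target < 1):
        raise ValueError("target must lie between current completeness and 1")
    # Solve (R0 + f1) - f1*N0/N = target * S_t for N.
    return f1 * N0 / (S_t * (1 - target))

# Hypothetical inventory: 200 individuals, 30 species, 10 singletons.
N0, R0, f1 = 200, 30, 10
N_needed = sample_size_for_completeness(0.90, N0, R0, f1)
print(round(N_needed))       # 500
print(round(N_needed) - N0)  # 300 additional individuals
```

Under these assumptions, raising completeness from 75% to 90% would require sampling 2.5 times the number of individuals already observed, which is the kind of figure that informs the decision to extend an inventory or not.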

Evolution of the Numbers of Species Recorded 1-2-3-4-5-times with Increasing Sampling Completeness
The series of nine inventories of butterfly diversity conducted in and around Jhansi also offers the opportunity to address a rather theoretical but nevertheless interesting question: how does each of the numbers f_1, f_2, f_3, …, f_x, … of species recorded respectively 1-, 2-, 3-, …, x-times vary with increasing level of sampling completeness? The most straightforward way to address the question would be, of course, to monitor sampling all along its actual progression, thus directly registering the variations of the f_x with increasing sampling size. This, however, has rarely been achieved. An alternative, indirect procedure may be envisaged as a possible surrogate: taking advantage of a series of inventories that address a similar type of fauna, each conducted at a different level of completeness. The nine inventories of butterfly fauna conducted in Jhansi provide precisely such a series. Accordingly, the dependence of each of the f_x upon the level of completeness was computed directly from the regressed values of the f_x (Appendix 1) and the values of R_0, S_t and R_0/S_t given in Table 1. For completeness levels up to ≈ 80%, f_1 remains higher than all the other f_x but then falls successively below f_2, f_3, f_4, f_5 as completeness increases further. The same happens in turn to f_2, which, although still in its ascending phase, is surpassed by f_3 at ≈ 94% completeness. Later on, after sampling has reached exhaustivity, the same process will, of course, happen sequentially for the successive f_x. While the threshold values of sampling completeness mentioned above are case-specific, the global trends have general relevance and, indeed, conform to intuitive expectation. In particular, clearly highlighted here is the tight dependence between the level of sampling completeness and

Estimations of the True (Total) Species Richness of the Nine Sampled Assemblages
Depending on site locations and years, the recorded species richness ranges from 21 to 38 species and the estimated true (total) species richness ranges from 25 to 44 species (Figs. 1 and 2, Table 1). Thus, most inventories prove to be more or less incomplete (as was already expected from the persistence of various numbers of "singletons" among the recorded species), with the levels of completeness varying substantially according to sites and years (Table 1 and Figs. 1 and 2). The as-recorded species richness thus does not allow any reliable prediction of the true (total) richness. This holds not only in terms of absolute value but also in terms of relative value, i.e. when trying to compare several samples.

Fig. 5. The numbers f_1, f_2, f_3, f_4, f_5 of species recorded 1-, 2-, 3-, 4-, 5-times according to the estimated level of sampling completeness. As the nine inventories involved in the computation have different species richness S_t, it is the ratio f_x/S_t, rather than f_x itself, which makes sense and is relevantly plotted in this Figure

Equality of Sampling Sizes Does Not Mean Equality of Sampling Completeness and, thus, Does Not Allow Any Reliable Comparison Between True Species Richness
Moreover, even when made at the same sampling size, the comparison between inventories does not, in general, allow any reliable prediction of total species richness. This is simply because, as a rule, completeness and sampling size are very poorly correlated, as demonstrated in Fig. 6. Thus, in general, the equality of sampling sizes does not guarantee the equality of the levels of sampling completeness. As equal levels of sampling completeness are required to make meaningful comparisons between partial inventories, it follows that extrapolation is mandatory in any case, prior to any further speculation. And, of course, least-biased extrapolations are especially desirable in this perspective. This caveat applies to comparisons between inventories having the same sampling sizes (or with sampling sizes brought back to the same value using the classical procedure of "rarefaction" [1,3]). Yet equalising sampling sizes by means of a "rarefaction" procedure regrettably remains a common practice. For example, DE VRIES & WALLA [9] still implement "rarefaction" to compare inventories of butterfly fauna carried out at different heights, areas and periods of investigation in an Ecuadorian tropical forest, although the authors actually recognized that Species Accumulation Curves may well intersect (their Fig. 3). This, indeed, reflects no lack of rigour on the part of the authors but, as mentioned above, an understandable reluctance to consider extrapolation methods as long as no reliable procedure was available to select the appropriate, minimum-bias solution among the wide series of nonparametric estimators of total species richness described in the literature. Now that such a selection procedure is available, this reluctance is no longer justified.
This clearly demonstrates the pitfalls of systematically trusting the validity of comparisons between inventories, even those having the same sampling sizes (or with sampling sizes brought back to the same value by the classical procedure of "rarefaction"): see reference [10] in particular. Only a reliable extrapolation procedure makes it possible to draw a relevant conclusion (here, that Jhansi Fort actually has the larger total species richness).
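The reason why equal sampling sizes do not secure equal completeness can be illustrated with the classical individual-based rarefaction formula, which gives the expected number of species in a random subsample of n individuals. The sketch below applies it to two hypothetical assemblages of identical total richness (10 species, 100 individuals) but contrasting evenness; the assemblages and the function name are invented for illustration and do not correspond to any of the Jhansi inventories.

```python
from math import comb

def rarefied_richness(abundances, n):
    """Expected number of species in a random subsample of n
    individuals drawn without replacement (classical rarefaction)."""
    N = sum(abundances)
    return sum(1 - comb(N - a, n) / comb(N, n) for a in abundances)

# Two hypothetical assemblages: same richness, same total abundance.
even = [10] * 10          # perfectly even abundances
uneven = [91] + [1] * 9   # one dominant species, nine rare ones

n = 20  # identical sampling size for both assemblages
for name, assemblage in (("even", even), ("uneven", uneven)):
    expected = rarefied_richness(assemblage, n)
    completeness = expected / len(assemblage)
    print(f"{name}: expected species = {expected:.2f}, "
          f"completeness = {completeness:.0%}")
```

At the same sampling size of 20 individuals, about nine of the ten species are expected to be recorded in the even assemblage, against fewer than three in the uneven one, although both have exactly the same true richness.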

Variations of True Species Richness between Two Successive Years
Least-biased extrapolations show that the levels of sampling completeness are globally higher in 2011 than in 2010 (Figs. 1 and 2 and Table 1). This contributes to the higher levels of recorded species richness in 2011 as compared to 2010, but a higher true (total) species richness in 2011 may also be involved. This, at least, is the case for the two sites, "University Campus" and "Jhansi Fort", for which inter-annual comparisons are possible: the estimated total species richness S_t proves higher in 2011 than in 2010 (Fig. 9). More precisely:
* at "Jhansi Fort", the relative increment in total species richness in 2011 reaches 48% (incidentally, this is the same relative increment as for recorded species richness, simply because sampling completeness happened to be equal in 2010 and 2011; as a rule, sampling completeness differs more or less substantially between inventories at different dates or different locations).
* at "University Campus", the estimated increment of total species richness in 2011 is 24%, while the increment of recorded species is far larger, 52%, because the 2011 inventory is far more complete (90%, as compared to 74% in 2010). Accordingly, the 52% increment of recorded species combines the true increase in total species richness and the consequence of more complete sampling in 2011.
Thus, in conclusion, year 2011 actually shows an appreciable enrichment in true butterfly diversity (24% or 48%, as compared to 2010), at least for the two investigated sites: Fig. 9.
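The way the increment of recorded richness combines a true increase with a gain in completeness can be checked arithmetically. Since recorded richness equals total richness multiplied by completeness, the ratio of recorded richness between two years is the product of the ratio of total richness and the ratio of completeness. The short Python sketch below applies this to the rounded percentages quoted above for "University Campus"; the function name is hypothetical.

```python
def recorded_increment(total_increment, completeness_before, completeness_after):
    """Relative increment of recorded richness implied by a relative
    increment of total richness and a change in sampling completeness
    (recorded richness = total richness x completeness)."""
    ratio = (1 + total_increment) * completeness_after / completeness_before
    return ratio - 1

# "University Campus": +24% total richness, completeness 74% -> 90%.
print(f"{recorded_increment(0.24, 0.74, 0.90):.0%}")  # 51%
```

The result, about 51%, agrees with the reported 52% to within the rounding of the published percentages. With equal completeness in both years, as at "Jhansi Fort", the function returns the increment of total richness unchanged.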
While monthly or seasonal variations of species richness within a year have been studied and reported rather often, yearly (inter-annual) variations of species richness have rarely been addressed. Yet a continuous five-year study of the variations of butterfly species richness at Mount Gariwang-San (S-Korea) has been reported [11]. After the crude, as-recorded data had been subjected to least-biased extrapolation (BÉGUINOT unpublished data), the inter-annual variations of estimated true richness were quantified as +101%, -41%, +25%, -107% and +57%, successively, during the period 2010 to 2015. On the other hand, a much more limited range of inter-annual variations (5% to 15%, yet based on crude, as-recorded richness only) is reported in [9] for the butterfly fauna of an undisturbed tropical forest in Ecuador.
The yearly variations of the butterfly fauna around Jhansi, between 2010 and 2011, thus fall in an intermediate range.

CONCLUSION
Incomplete inventories of local biodiversity, which are the ordinary rule in practice, at least for speciose taxonomic groups, may yet provide much more information than the crude recorded data would suggest. Releasing this additional information requires, however, that inventories also include the respective abundances of the recorded species. Under this condition, the Species Accumulation Curve may easily be extrapolated beyond the actually realised inventory. Reliable extrapolation, however, depends on the rational selection, for each inventory, of the least-biased estimator of total species richness among the series of estimators available in the literature. This selection may now be implemented using the procedure described in [4] and summarised in Appendix 2.
In turn, such reliable extrapolations make it possible to address a series of issues that could not have been answered properly otherwise, as shown above with a few examples.

Consider the survey of an assemblage of species of size N_0 (with sampling effort N_0 typically identified either with the number of recorded individuals or with the number of sampled sites, according to whether the inventory is expressed in terms of species abundances or species incidences), including R(N_0) species, among which f_1, f_2, f_3, f_4, f_5 are recorded 1, 2, 3, 4, 5 times respectively. The following procedure, designed to select the least-biased solution, results from a general mathematical relationship that constrains the expression of any theoretical Species Accumulation Curve R(N) [4,12,13,14]: