Inter-annual Variations of True Species Richness in a Subtropical Butterfly Assemblage: An Estimation Based on Least-biased Extrapolations of Species Accumulation Curves

Marked inter-annual variations of species richness are well known in insects in general and in butterflies in particular. Yet reports of such variations generally rely on more or less partial samplings and are rarely documented in the relevant terms of “true” total species richness (instead of the “simply” observed number of species). In addition, equity of sampling completeness between annual inventories proves hard to ensure in practice and has long remained difficult to check properly. Thus, as neither standardized sampling procedures nor rarefaction to a common sampling size can reliably warrant equity of sampling completeness, asymptotic estimations of total species richness should be implemented for each annual inventory. This implies that, among the different estimators of the number of missing species available in the literature, the least-biased one should first be selected for each annual inventory. As such a selection procedure is now available, it becomes possible to address the issue presented above. Applying this procedure to the field data recorded (and already published) by Lee et al. for butterfly assemblages at Mount Gariwang-san (South Korea), surveyed during several years, I show that (i) “true” species richness may indeed vary over a wide range, from simple to double, across successive years; (ii) annual total species richness represents only a limited part, between a little less than ≈ 40% and at most ≈ 80%, of the potential species richness of the site (i.e. the true species richness of several years pooled together).

Original Research Article. Béguinot; AJOB, 2(4): 1-16, 2017; Article no.AJOB.33876


INTRODUCTION
Most insect assemblages undergo both intra-annual (seasonal) and inter-annual variations of species richness, in temperate as well as tropical regions worldwide [1][2][3][4][5][6][7][8]. Owing to their attractiveness and the relative ease of identification at the species level, butterfly assemblages have been particularly considered as a convenient model for studying temporal variations of insect diversity [3][4][5][6][7][8]. Under tropical climates, the seasonal fluctuations of species diversity may be of variable amplitude, while inter-annual variations are sometimes considered as clearly prominent [6]. Yet the temporal fluctuations of species richness have been addressed much more often at the short time scale, i.e. seasonal variations [1][2][3][4][7], than over longer periods, i.e. time series involving several successive years (see [8] however). Higher costs and the shortage of time available for such studies are arguably the main reasons for the current scarcity of long-term investigations. For the same reasons, the successively scheduled inventories are usually far from exhaustive, which seriously limits the significance of the as-recorded results in terms of "true" variations of total species richness. Appropriate extrapolations of species accumulation beyond the actually achieved sampling sizes are thus needed to derive reliable estimates of the true total species richness at any stage of the time series. The current difficulty of obtaining relevant extrapolations that provide reliable estimates therefore calls for procedures involving least-biased extrapolations (following the path initiated by Brose and coworkers [9] and more recently improved by Béguinot [10][11]). Hereafter, I consider the field data from a long-term study (seven years) of butterfly assemblages at Mount Gariwang-san (South Korea) carried out by Lee et al. [12]. As mentioned by the authors, each of the successive samplings performed during the seven investigated years remains more or less substantially incomplete. Yet they offer valuable raw data from which to extrapolate species accumulation and estimate total species richness each year, so as to address the following three questions:
(1) how large are the inter-annual variations of the "true" (i.e. total) species richness of the butterfly assemblages at Gariwang-san?
(2) what proportion of the "overall potential butterfly richness" at the studied site (equated, as a first approximation, to the true species richness cumulated over the seven years) actually occurs in any given year?
Now, as the currently achieved inventories remain substantially incomplete, answering these two questions requires either (i) continuing the sampling effort until sampling exhaustivity is closely approached or (ii) extrapolating the species accumulation process, with minimized bias, so as to provide reliable estimates. Option (i) would, of course, be ideal and is thus to be preferred insofar as it proves compatible with the available resources in terms of time and costs. Yet, in common practice, option (ii) offers an economical, convenient and straightforward solution, to be considered first. Then, before possibly considering the ideal option (i), a third question is to be addressed:
(3) what additional sampling effort would be required to closely approach exhaustivity (say, for example, to reach 95% sampling completeness), so as to predict, on a rational basis, whether option (i), continuing the sampling effort, is practicable?

MATERIALS AND METHODS
Lee and coworkers [12] conducted a series of samplings of the butterfly fauna at Mount Gariwang-san (South Korea) in 1987 and from 2010 to 2015. All details regarding the sampling procedure, the environmental context and the list of recorded species with their respective abundances are provided in [12], with free access, and will therefore not be repeated here. Accounting for species abundances is of prime interest for the extrapolation of partial samplings, since abundance data provide the numbers f_1, f_2, f_3, f_4, …, f_x, … of species recorded respectively 1, 2, 3, 4, …, x times in the realised partial sampling. These numbers are required, in turn, to reliably extrapolate the Species Accumulation Curve, as explained below. As substantial numbers of singletons (i.e. species recorded only once) remain in the inventories performed during each of the seven years (as well as in the seven inventories pooled together), all these inventories remain substantially incomplete. Extrapolating the species accumulation beyond the actually achieved sampling sizes is thus necessary to best predict the true total species richness of the butterfly assemblage for each year.
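As an illustration of how the numbers f_x are obtained from an abundance list, here is a minimal Python sketch. The function name `frequency_counts` and the abundance values are hypothetical and serve only as an example; they are not data from [12].

```python
from collections import Counter

def frequency_counts(abundances):
    """Return {x: f_x}, the number of species recorded exactly x times."""
    # Species with zero recorded individuals are unrecorded and are skipped.
    return dict(Counter(a for a in abundances if a > 0))

# Hypothetical abundances (individuals per recorded species), for illustration only.
abundances = [1, 1, 1, 2, 2, 3, 5, 8, 1, 4]

f = frequency_counts(abundances)
R0 = sum(f.values())   # number of recorded species
N0 = sum(abundances)   # sampling size (number of recorded individuals)

print("f_1 =", f.get(1, 0), "f_2 =", f.get(2, 0), "R_0 =", R0, "N_0 =", N0)
```

Here f_1 counts the singletons, whose persistence in an inventory signals that sampling remains incomplete.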

Numerical Extrapolation of Species Accumulation beyond the Achieved Sampling Size
As sampling size increases, the number of recorded species grows monotonically, at first rapidly and then more and more slowly. The so-called 'Species Accumulation Curve' R(N) describes the growth of the number of recorded species R with increasing sampling size N (N: typically, the number of individuals observed during sampling). The mathematical expression (and thus the detailed shape) of the Species Accumulation Curve depends upon both the total species richness of the sampled assemblage and the degree of heterogeneity of the species abundance distribution within this assemblage [13]. This would apparently make the extrapolation of the Species Accumulation Curve rather difficult to compute, since both factors are unknown a priori. Yet the numbers f_1, f_2, f_3, f_4, …, f_x, … of species recorded respectively 1, 2, 3, 4, …, x times during sampling also depend directly upon the total species richness and the degree of heterogeneity of the species abundances. This explains why these numbers f_1, f_2, f_3, f_4, … may serve as an appropriate basis from which to extrapolate the Species Accumulation Curve beyond the actual size of the sample under consideration. In particular, the most commonly used estimators of the number of unrecorded species (i.e. nonparametric estimators such as 'Chao' and the series of 'Jackknife') are all computed from the recorded values of the first numbers f_x [14]. In practice, however, a problem remains: each of these different types of estimators provides a substantially distinct estimate, and none of them remains consistently the most appropriate. Accordingly, the traditional practice has become to consider all of them together without making any choice [15], an admittedly frustrating situation!
Yet it has been shown recently that, although none of the available estimators consistently remains the most accurate [9], each of them may prove, in turn, to be the least biased, depending on the value taken by f_1 as compared to the other f_x (x > 1) [10]. Accordingly, in practice, the most appropriate (i.e. the least biased) estimator of the number of unrecorded species may be selected by comparing the value of f_1 to the values of the other f_x for x > 1 [10][11]. Selecting the least-biased type of estimator in this way provides the best possible estimate of the number Δ of "missing" species and, in turn, the best estimate of the total species richness S_t of a partially sampled assemblage. In addition, the least-biased expression for the extrapolation of the species accumulation curve R(N) is straightforwardly derived.
In practice, the formulations summarized in Appendix 1 provide: (i) the expressions of Δ, S_t and R(N) according to each of the most commonly used types of nonparametric estimators and (ii) the key to select, among these estimators, the one that proves the least biased and, thereby, the least-biased expressions for Δ, S_t and R(N). The selection of the least-biased estimator operates within a limited but rather large range of nonparametric estimators, including not only three commonly used estimators (Chao and Jackknife at orders 1 and 2) but also the Jackknife at orders 3, 4 and 5 (see reference [10] and also Appendix 1 for more details). Here, for the seven investigated years, the estimators selected as the least biased for each year were either Jack-3, Jack-4 or Jack-5. That is, seeking bias reduction here required relying only on those kinds of estimators which remain uncommonly used. In turn, this highlights that conventional estimation practices, although still in current use, may occasionally lead to substantial bias, as already cautioned, in particular, by Brose et al. [9].
Also, in order to reduce the dispersive influence of drawing stochasticity (which inevitably affects the as-recorded values of the f_x), it is advisable to regress the as-recorded distribution of the numbers f_x versus x.
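The estimation and selection steps described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the formulation of Appendix 1: the Jackknife estimates use the usual large-sample forms (order k: Δ = Σ (−1)^(x+1) C(k, x) f_x for x = 1..k), the Chao estimate uses the bias-corrected form f_1(f_1 − 1)/(2(f_2 + 1)), and the selection rule (Chao when f_1 < 0.6 f_2, otherwise raising the Jackknife order as long as the estimate keeps increasing) is my reading of the key in [10][11]. All function names and the f_x values are hypothetical.

```python
from math import comb

def jackknife_missing(f, k):
    """Missing species by Jackknife of order k (assumed large-sample form)."""
    # Delta_k = sum over x = 1..k of (-1)^(x+1) * C(k, x) * f_x
    return sum((-1) ** (x + 1) * comb(k, x) * f.get(x, 0) for x in range(1, k + 1))

def chao_missing(f):
    """Missing species by the bias-corrected Chao form (assumed)."""
    f1, f2 = f.get(1, 0), f.get(2, 0)
    return f1 * (f1 - 1) / (2 * (f2 + 1))

def select_estimator(f, max_order=5):
    """Assumed selection rule: Chao if f_1 < 0.6 * f_2, otherwise raise the
    Jackknife order while the next order still increases the estimate."""
    if f.get(1, 0) < 0.6 * f.get(2, 0):
        return "Chao", chao_missing(f)
    k = 1
    while k < max_order and jackknife_missing(f, k + 1) > jackknife_missing(f, k):
        k += 1
    return f"Jack-{k}", jackknife_missing(f, k)

# Hypothetical f_x values, for illustration only.
f = {1: 12, 2: 7, 3: 4, 4: 4, 5: 2}
for k in range(1, 6):
    print(f"Jack-{k}:", jackknife_missing(f, k))
print("selected:", select_estimator(f))
```

With these hypothetical counts the estimates rise up to order 3 and then fall, so the sketch stops at Jack-3; the actual ranges for f_1 should be taken from Appendix 1 and reference [10].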

Extrapolations and Asymptotic Estimates for Each Sampled Year
The estimated numbers Δ of missing (= unrecorded) species, according to each of six types of nonparametric estimators (Jackknife at orders 1 to 5 and Chao), are provided in Table 1 for the seven investigated years. Note that the selected least-biased estimator may differ from year to year: Jackknife estimators at orders 3, 5, 3, 5, 4, 3, 5 are selected for years 1987, 2010, 2011, 2012, 2013, 2014, 2015 respectively, while Chao and Jackknife 1 and 2 were never selected. Selecting the least-biased estimator in each case is important, since the estimated number of missing species may vary over a wide range for the same sample, typically from simple to double according to the type of estimator, as shown in Table 1. The least-biased estimations of the 'true' (total) species richness, S_t = R_0 + Δ, and of the level of sampling completeness (= R_0/S_t) are derived immediately (Table 2). Note, incidentally, that the levels of sampling completeness achieved for each of the seven studied years are very weakly related to either sampling size or total species richness (Figs. 1 and 2).
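The two derived quantities, S_t = R_0 + Δ and the completeness R_0/S_t, can be computed as in the short sketch below. The function names and the input values are hypothetical and are not taken from Tables 1 or 2; the point is only to show how strongly the choice of estimator affects the resulting completeness.

```python
def total_richness(R0, delta):
    """Estimated total species richness S_t = R_0 + Delta."""
    return R0 + delta

def sampling_completeness(R0, delta):
    """Sampling completeness R_0 / S_t, between 0 and 1."""
    return R0 / total_richness(R0, delta)

# Hypothetical inputs for illustration (not values from Tables 1 or 2).
R0 = 40                                     # recorded species
candidates = {"Jack-1": 12, "Jack-3": 19}   # hypothetical estimates of Delta

for name, delta in candidates.items():
    S_t = total_richness(R0, delta)
    c = sampling_completeness(R0, delta)
    print(f"{name}: S_t = {S_t}, completeness = {c:.2f}")
```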
In addition to the asymptotic estimates of Δ, S_t and R_0/S_t, the full range of extrapolation of the species accumulation curve was computed for each type of estimator (the least-biased extrapolation of the species accumulation curve being, of course, the one associated with the least-biased estimator).
… [16] and more extensively argued by Béguinot [17].

Estimation of the Additional Sampling Efforts Required to Improve Completeness
One major interest of extrapolating species accumulation curves is the possibility of predicting the additional sampling effort that would be required to reach any given level of sampling completeness beyond the one already achieved. This prediction provides, in turn, a rational basis for deciding whether it is worth continuing the sampling operations, by weighing the additional effort required against the expected benefit in terms of newly recorded species. In this respect, as for the asymptotic estimates above, selecting the least-biased extrapolation is very important to derive reliable predictions, as shown in Table 3, which exemplifies the large scatter of predictions according to the type of estimator involved. Accordingly, Fig. 7 and Table 4 highlight the least-biased estimates of the additional sampling efforts that would have been required to reach different higher levels of sampling completeness, for each of the seven investigated years.
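The prediction of the additional effort can be sketched numerically as follows: given any increasing extrapolated curve R(N), one searches for the sampling size at which R(N)/S_t reaches the target completeness. The solver `effort_multiplier` and the curve `placeholder_curve` below are hypothetical. In particular, the hyperbolic curve R(N) = S_t − Δ·N_0/N is only a simple stand-in that passes through (N_0, R_0) and tends to S_t; it is not the least-biased expression of Appendix 1, and all input values are invented for illustration.

```python
def effort_multiplier(accumulation, N0, S_t, target, max_factor=1e6):
    """Smallest factor N/N0 such that accumulation(N) / S_t >= target,
    found by bracketing then bisection (accumulation must be increasing)."""
    if accumulation(N0) / S_t >= target:
        return 1.0
    lo, hi = 1.0, 2.0
    while accumulation(hi * N0) / S_t < target:
        lo, hi = hi, hi * 2
        if hi > max_factor:
            raise ValueError("target completeness not reached within max_factor")
    for _ in range(60):
        mid = (lo + hi) / 2
        if accumulation(mid * N0) / S_t < target:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical inventory: 40 recorded species, 19 estimated missing, 300 individuals.
R0, delta, N0 = 40, 19, 300
S_t = R0 + delta

def placeholder_curve(N):
    """Stand-in accumulation curve (assumption, not the Appendix 1 expression)."""
    return S_t - delta * N0 / N

factor = effort_multiplier(placeholder_curve, N0, S_t, target=0.95)
print(f"sampling size to multiply by about {factor:.2f} to reach 95% completeness")
```

Passing the curve as a function keeps the solver independent of the estimator, so the same search could be run with whichever extrapolation is selected as the least biased.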

Inter-annual Variations of the Total Species Richness of the Butterfly Assemblage at Mount Gariwang-san
Figs. 8, 9, 10 and 11 provide graphical representations of the data given in Table 2, which help visualize the range of inter-annual variations of the true species richness of the butterfly fauna at Mount Gariwang-san. The estimated total species richness per year may vary substantially (up to a factor of 2) and apparently erratically from year to year (Fig. 10).
Still more interesting is the estimated proportion of the "potential species richness" of the site which actually occurs each year (Fig. 11). Here, the "potential species richness" is approximately equated, as a surrogate, to the total species richness estimated for the seven years (1987 and 2010 to 2015) taken together, i.e. 120 species. The proportion of the "potential species richness" occurring each year lies between a little less than 40% and a little more than 80%. In fact, this proportion might be slightly overestimated, since the cumulated number of species over seven years might well remain a slight underestimate of the real "potential species richness" of the site. Therefore, a proportion between one third and two thirds of the "potential species richness" might perhaps be more realistic.
Accordingly, closely approaching sampling exhaustivity would require considerably higher sampling efforts than those actually performed. For example, reaching 95% completeness would have required multiplying the actually achieved sampling sizes by a factor ranging from 8 …
A final remark regarding the results above: no estimates of error or confidence intervals are provided for the estimated total species richness of each year. This is admittedly regrettable but results from the current lack, to my knowledge, of a formulation of the standard deviation for Jackknife estimators at higher orders (> 3). This being said, I emphasise that reducing bias (the main object of the procedure implemented here) takes priority over trying to estimate the confidence interval.
Accordingly, any attempt to directly interpret the inter-annual variations of as-recorded species richness in terms of "true" inter-annual variations of total species richness would be at least questionable. Specifically, the issue would be: which part of the recorded inter-annual variations is really attributable to true inter-annual variations of total species richness, and which part may (artificially!) result from inter-annual inequity of sampling completeness between successive annual inventories?

Fig. 10. Histogram of the estimated total species richness S_t for the seven investigated years and for all seven years pooled together
Fig. 11. The estimated proportion (%) of the "potential species richness" of the site (estimated at 120 species [at least]) which actually occurs each year. Data directly derived from Fig. 8, i.e. on the basis of least-biased extrapolations of total species richness for each year. Over these seven years, the proportion of the "potential species richness" of the site which actually occurs annually varies between 38% and 82% (average: 59%)
One alternative solution to this issue would be, of course, to continue sampling operations each year so as to closely approach sampling exhaustivity. While not strictly impossible, this ideal procedure would have involved huge sampling efforts, year after year (Figs. 3, 4, 7 and Table 4), which, in practice, would probably have exceeded the available resources in terms of both time and costs.
It is therefore necessary to consider the less satisfying, but practically unavoidable, alternative solution: to extrapolate the species accumulation curves numerically up to their asymptotic levels. Nonparametric estimators may help in this respect, provided the least-biased type of estimator is selected separately for each annual survey. This precaution proves indispensable, as highlighted by the large scatter between the estimates obtained from the different types of available nonparametric estimators (Figs. 5 and 6 and Tables 1 and 3).
Thus, implementing the selection procedure in favour of the least-biased extrapolation ([10][11], see also Appendix 1) provides more reliable estimates of the "true" (total) species richness, year by year, for the surveyed butterfly assemblages of Mount Gariwang-san.
As expected, the estimated sampling completeness actually achieved varies from year to year: from 54% (in 2015) to 72% (in 2012) (Table 4). Incidentally, the level of sampling completeness proves weakly (and non-significantly) related to the sampling size (Fig. 1) and independent of the estimated total species richness (Fig. 2). The selected least-biased estimator also differs from year to year: Jackknife estimators at orders 3, 4 and 5 were selected depending on the year, and Jackknife at order 2 was selected for the pooled inventories of the seven studied years, while Jackknife 1 and Chao were never retained (Table 2). The appropriate selection of the type of Jackknife estimator proves important, since the estimated number of species missed by the inventories most often varies from simple to double (Table 1, Fig. 8) according to the type of estimator. In particular, this seriously questions the traditional approaches that consist in either choosing one given type of estimator a priori, on the basis of its alleged particular appropriateness, or considering all types of estimators together without choosing among them [15].
The levels of total species richness, derived from the estimates of the number of missed species, vary substantially from year to year: from 46 (in 2014) to 99 (in 2011), that is, more than from simple to double (Table 2 and Figs. 8, 10).
The degree of inter-annual variability of true species richness may be quantified by the "inter-annual species richness ratio", defined as the ratio (>1) of the estimated total species richness between two years, successive or not. The histogram of the values taken by this ratio at Mount Gariwang-san over the years 2010 to 2015 (Fig. 12) provides an estimate of the probability that the inter-annual variability of total species richness exceeds a given level. Thus, the estimated probabilities that the ratio exceeds 1.2, 1.4, 1.6, 1.8, 2.0 are 73%, 46%, 26%, 13%, 7%, respectively (Fig. 12).
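The computation behind such a histogram can be sketched as follows: the ratio is formed for every pair of years (six years give 15 pairs), and the share of pairs exceeding each threshold is taken as the estimated probability. The function names and the richness values below are hypothetical placeholders, not the estimates of Table 2.

```python
from itertools import combinations

def richness_ratios(S_by_year):
    """Ratio (larger / smaller) of total richness for every pair of years."""
    return [max(a, b) / min(a, b) for a, b in combinations(S_by_year.values(), 2)]

def exceedance(ratios, threshold):
    """Share of year pairs whose ratio strictly exceeds the threshold."""
    return sum(r > threshold for r in ratios) / len(ratios)

# Hypothetical richness estimates for four fictitious years (illustration only).
S_by_year = {"year A": 50, "year B": 60, "year C": 75, "year D": 100}

ratios = richness_ratios(S_by_year)
for t in (1.4, 1.8):
    print(f"share of pairs with ratio > {t}: {exceedance(ratios, t):.2f}")
```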
Lastly, one additional question of interest is what proportion of the "overall potential butterfly richness" of the site (equated, as a first approximation, to the total species richness of the seven years pooled together) actually occurs in any given year. From the estimations above, the proportion of the "potential species richness" at the sampled site which actually occurs annually varies between 38% and 82% (average: 59%) over these seven years of field survey (Fig. 11). This differs appreciably from what would be (inappropriately!) deduced from the raw (non-extrapolated) inventories: based on non-extrapolated data, the proportion of the "potential species richness" would vary from 28% to 58% (average: 42%) (Table 2 and Figs. 8, 9). Now, from a strictly local point of view, the effective representation of only a limited part (38% to 82%) of the "potential species richness" during each year might raise the paradox of the local persistence of those species that are not locally represented during one or several successive years. The likely answer lies, of course, in the concept of meta-population, according to which more or less regular exchanges actually take place between adjacent localities.
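As a quick check of these bounds, the extreme annual estimates quoted above (46 species in 2014 and 99 species in 2011) can be related to the pooled estimate of about 120 species. The small gap between 0.825 and the reported 82% is presumably due to rounding of the richness estimates, which is an assumption on my part.

```latex
\[
\frac{S_t(2014)}{S_t(\text{pooled})} \approx \frac{46}{120} \approx 0.38,
\qquad
\frac{S_t(2011)}{S_t(\text{pooled})} \approx \frac{99}{120} \approx 0.825 .
\]
```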

CONCLUSION
While the as-recorded annual species richness of the butterfly fauna at the sampled locality of Mount Gariwang-san ranged between 29 and 61 species, the actual annual species richness was estimated to range from 46 to 99 species, using the "least-biased" procedure of extrapolation of species accumulation curves. The total species richness for the seven years pooled together is estimated at around 120 species. This figure may arguably be considered as approaching the "potential species richness" of the studied locality.
Besides, it is easy to verify that another consequence of these preferred ranges is that the selected estimator always provides the highest estimate, as compared to the other estimators. Interestingly, this mathematical consequence, of general relevance, is in line with the already accepted view that all nonparametric estimators provide underestimates of the true number of missing species [13,14]. This also shows that the approach initially proposed by Brose et al. [9], which has regrettably suffered from its somewhat difficult implementation in practice, might be advantageously reconsidered now, in light of the very simple selection key above, which is far easier to use in practice.
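This verification can be sketched as follows, assuming the usual large-sample forms of the Jackknife estimates of order k (written here Δ_{Jk}); the exact expressions and the preferred ranges for f_1 are those of the selection key and of reference [10].

```latex
\begin{align*}
\Delta_{J1} &= f_1, &
\Delta_{J2} &= 2f_1 - f_2, \\
\Delta_{J3} &= 3f_1 - 3f_2 + f_3, &
\Delta_{J4} &= 4f_1 - 6f_2 + 4f_3 - f_4, \\
\Delta_{J5} &= 5f_1 - 10f_2 + 10f_3 - 5f_4 + f_5, \\[4pt]
\Delta_{J2} - \Delta_{J1} &= f_1 - f_2, &
\Delta_{J3} - \Delta_{J2} &= f_1 - (2f_2 - f_3), \\
\Delta_{J4} - \Delta_{J3} &= f_1 - (3f_2 - 3f_3 + f_4), &
\Delta_{J5} - \Delta_{J4} &= f_1 - (4f_2 - 6f_3 + 4f_4 - f_5).
\end{align*}
```

Under these assumed forms, moving to the next order raises the estimate exactly when f_1 exceeds a threshold built from f_2, …, f_5. Provided these thresholds increase with the order, an estimator selected by comparing f_1 to them is therefore also the one that gives the largest Δ among the Jackknife series.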

N.B.2:
In order to reduce the influence of drawing stochasticity on the values of the f_x, the as-recorded distribution of the f_x should preferably be smoothed: this may be achieved either by rarefaction or by regression of the as-recorded distribution of the f_x versus x.

N.B.3:
For f_1 falling below 0.6 × f_2 (that is, when sampling completeness closely approaches exhaustivity), the Chao estimator may be selected: see reference [11].