Extrapolation of total species richness from incomplete inventories: application to the gastropod fauna associated to coral reefs in ‘Mannar Gulf Biosphere Reserve’, India.

Tropical coral reefs are known to harbor considerable biodiversity, especially among invertebrates. The gastropod fauna, although an important component of this biodiversity, remains poorly surveyed across most tropical reefs. Moreover, the few published inventories are generally far from exhaustive, as is almost inevitable in practice with species-rich faunas. Hence the necessity of implementing a numerical extrapolation of species accumulation, providing both (i) estimates of the total species richness of the partially sampled sites and (ii) a way to predict the […] This, in turn, calls for further effective sampling but also immediately raises the question of how far to extend the extra effort while retaining a reasonable expected benefit, in terms of the ratio between the expected number of newly recorded species and the corresponding additional sampling effort required. The least-biased extrapolation of species accumulation curves proves a convenient tool for rationally addressing this important question.


INTRODUCTION
Marine gastropods not only constitute a major component of marine biodiversity per se; they also have significant economic importance: their flesh is commercially exploited for food, and their shells are used for ornamental items or, more pragmatically, for lime extraction. Some species (in particular among Conidae) also show emerging promise for medical applications. For all these reasons, addressing gastropod species diversity, especially at local scales, is considered important [1][2][3][4].
A major basic element to be considered when quantifying biodiversity at a site is the corresponding level of true species richness, that is, the total number of species that would be recorded by an ideally exhaustive inventory of the species assemblage under consideration [5][6][7][8]. In particular, the level of total species richness at a given site usually serves as a positive predictor for places of higher conservation value [9]. Yet, practical evaluation of the total species richness at a site is usually problematic. This is because, in most circumstances, local samplings are doomed to remain far from exhaustive, owing to the limited time that can be devoted to each field investigation when concurrent investigations are becoming increasingly numerous. Hence the practical necessity of dealing with "quick surveys" of biodiversity in the urgent context of ecological monitoring and conservation planning, with, consequently, limited levels of sampling completeness [10]. The trend towards incomplete sampling is even more pronounced, as expected, when dealing with invertebrate groups comprising very large numbers of species, including a significant proportion of more or less rare taxa.
In practice, incomplete inventories can, however, benefit from a kind of compensation, by implementing a "numerical extrapolation" of the species accumulation curve until reaching ideal exhaustivity. As a result, an estimate of the number of still unrecorded species and, in turn, the evaluation of true (total) species richness are derived. This is usually achieved by implementing one or the other among a series of nonparametric estimators [11][12].
Regrettably, however, many published accounts of local biodiversity still provide only the as-recorded data from incomplete inventories, without further estimation of the number of unrecorded species and, accordingly, without any evaluation of the total species richness of the sampled species assemblage. This restriction, which deprives inventories of an essential piece of information, likely results from the formerly confusing situation arising from the multiplicity of nonparametric estimators, which provide divergent estimates. Yet this confusing situation is now over, as a new procedure has at last become available, making it possible to select rationally the least-biased type among all these estimators.
Hereafter, we implement this new procedure to estimate, as accurately as possible, the total species richness of the gastropod fauna associated with coral reefs at three sites in the 'Mannar Gulf Biosphere Reserve' (south India), based on the recorded data reported by MOHANRAJ et al. [1]. These authors carried out samplings along coral reefs around three small islands in Mannar Gulf, taking care to note the abundance of each sampled species, which is necessary to compute nonparametric estimators. The proportions of singletons (species recorded only once) amount to around 20% at each investigated site, thus indicating substantially incomplete samplings [12], which justifies the implementation of a numerical extrapolation procedure.

Field Data
The coral reefs a short distance from the shores of three small islands located in the "Mannar Gulf Biosphere Reserve", namely 'Hare', 'Vaan' and 'Koswari', were sampled for their associated gastropod fauna. The recorded data from these partial surveys (on which the numerical extrapolations are based) are reported in detail by MOHANRAJ et al. [1], and the reader is referred to that publication for contextual information regarding the sites and the sampling methods. The number N_0 of collected individuals, the number R_0 of recorded species and the number f_1 of singletons (species recorded only once) are given in Table 1 for each investigated site. As already mentioned, the proportions of singletons (f_1/R_0) recorded at each site are close to 20%, which denotes substantial incompleteness of each of the three samplings, a common situation in practice. The species lists reported by the authors include the respective abundances of the recorded species, which makes it possible to implement the extrapolation of species accumulation and to derive least-biased estimates of total species richness.

Procedure of Selection of the Least-biased Estimator
A series of nonparametric estimators of the number of still unrecorded species at the end of partial inventories have been proposed, all of them based on the numbers f_x of species already recorded x times (namely, the numbers f_1, f_2, f_3, … of singletons, doubletons, tripletons, etc.). The most commonly used types of nonparametric estimators are Chao's estimator and the series of Jackknife estimators of increasing order (JK-1, JK-2, JK-3, etc.). A serious problem arises in practice, however: each of these types of estimator, being formulated differently, provides a substantially distinct estimate. No consensus has ever been reached as to which of these estimators is the most accurate [13][14][15]. Hence, the traditional practice has been to consider all of them together without making any choice [16], an admittedly rather frustrating situation. This unsatisfactory situation has probably contributed largely to a persistent reluctance to use these estimators.
Yet, more recently, BROSE et al. [13,17] rightly suggested that, although none of the available estimators consistently remains the most accurate, each of them may prove, in turn, to be the most appropriate. They further argued that the criteria for selecting among these estimators might be related to the estimated degree of sampling completeness. Yet, as emphasised by these authors themselves, this quantitative relationship holds only under the explicit restriction of a given theoretical type of species abundance distribution or, at least, under the condition of a given degree of unevenness of the species abundance distribution in the sampled assemblage [18]. Owing to this explicit restriction, the (partially distinct) keys successively proposed by BROSE et al. [13,17] cannot claim general reliability and undifferentiated applicability, which severely limits their range of practical use. This limitation applies as well to other subsequently reported procedures that remain subordinated to particular types of species abundance distributions, for example [18,19]. Nevertheless, this explicit restriction of applicability appears to have been often overlooked by end-users of these procedures.
In fact, it can be demonstrated (see Appendix 1) that, without any particular restriction, the least-biased type within the set of available nonparametric estimators is simply the estimator that provides the highest estimate. This is the straightforward consequence of the fact that all nonparametric estimators admittedly provide under-estimates of the number of unrecorded species [11,12,18,20], so that the estimator which provides the highest estimate is necessarily the least-biased one among them all.
Selecting the least-biased type of estimator in this way thereby provides the best available evaluation of the number Δ of still unrecorded species and, in turn, the best evaluation of the total species richness of the partially sampled assemblage, S_t (= R_0 + Δ). In addition, the least-biased expression for the extrapolation of the species accumulation curve is, in turn, straightforwardly derived (Appendix 2).
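The selection rule described in this section (retain the estimator that gives the highest estimate) can be sketched in a few lines of Python. This is only an illustrative sketch, not the author's implementation: the function names and the example counts are hypothetical and are not the survey data, and the general Jackknife expression assumes the binomial form that is consistent with the JK-2 to JK-4 expressions quoted in Appendix 1.

```python
from math import comb, isnan

def jackknife(f, order):
    """Jackknife estimate (given order) of the number of unrecorded species.

    f[x - 1] is f_x, the number of species recorded exactly x times;
    f must hold at least `order` values.
    """
    return sum((-1) ** (x + 1) * comb(order, x) * f[x - 1]
               for x in range(1, order + 1))

def chao(f):
    """Chao estimate f_1^2 / (2 f_2); NaN when no doubleton was recorded."""
    f1, f2 = f[0], f[1]
    return f1 * f1 / (2 * f2) if f2 > 0 else float("nan")

def least_biased(f, max_order=5):
    """Return (name, estimate) of the estimator giving the highest estimate."""
    estimates = {f"JK-{i}": jackknife(f, i) for i in range(1, max_order + 1)}
    c = chao(f)
    if not isnan(c):
        estimates["Chao"] = c
    name = max(estimates, key=estimates.get)
    return name, estimates[name]

# Hypothetical counts f_1..f_5 (NOT the Mannar Gulf data)
f_counts = [8, 3, 2, 2, 1]
r0 = 40  # hypothetical number of recorded species
name, delta = least_biased(f_counts)
print(name, delta, r0 + delta)  # JK-5 21 61
```

With these made-up counts the highest estimate is given by JK-5, so the sketch would report 21 unrecorded species and a total richness of 61.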

Estimation of the Total Species Richness at Each Site
The least-biased type of estimator, selected according to the key in Appendix 1, was JK-5 for both Hare and Vaan islands and JK-4 for Koswari island (Fig. 1). The resulting least-biased estimates of (i) the number Δ of still unrecorded species, (ii) the total species richness S_t and (iii) the sampling completeness R_0/S_t are provided in Table 1 for each of the three locations. Table 2 and Fig. 2 allow comparison of the least-biased estimates of the numbers of unrecorded species, as derived here, with the corresponding estimates obtained with the procedure proposed by BROSE et al. [13]. The discrepancy between the two methods is substantial, with gaps ranging from 45% to 71%.
Table 1. The number of collected individuals N_0, the number of recorded species R_0, the selected least-biased type of nonparametric estimator, the estimated number Δ of unrecorded species, the resulting estimate of the "true" total species richness S_t, and the resulting estimated level of sampling completeness R_0/S_t. Estimations according to the least-biased procedure: selection key in Appendix 1.
Fig. 3 predicts the sampling efforts that would be necessary to reach, say, 90% or 95% sampling completeness for Hare, Vaan and Koswari islands (instead of the levels actually achieved: 71%, 78% and 72%, respectively; Table 1). Clearly, the additional sampling effort required to detect one new species increases very quickly as sampling proceeds, which is logically expected since (i) fewer and fewer species remain unrecorded as sampling progresses and (ii) those species still unrecorded are, statistically, among the less abundant. Besides, Fig. 4 (a zoom of Fig. 3 focused on the beginning of the extrapolations) shows that the slopes of the species accumulation curves may differ notably between sites, so that the curves may intersect.
This is the case here for the accumulation curve at Koswari reef, which increases more rapidly than the other two and thus comes to intersect the accumulation curve of Vaan at a sample size of around N = 800. Accordingly, the recorded richness of Koswari, which is lower than that of Vaan as long as the sample size N remains below 800, exceeds that of Vaan beyond N = 800.
Finally, Fig. 5 provides a quantitative prediction of the marginal additional sampling effort needed to increase the number of recorded species by one.
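As a quick consistency check, the number of unrecorded species and the completeness level follow directly from the recorded and estimated richness of each site. The short sketch below uses the R_0 and S_t values quoted in the Discussion; the function name `summarize` is purely illustrative.

```python
# (R_0, S_t) per site, as quoted in the text of the Discussion
sites = {"Hare": (35, 49), "Vaan": (40, 51), "Koswari": (38, 53)}

def summarize(r0, st):
    """Return (unrecorded species, sampling completeness R_0 / S_t)."""
    return st - r0, r0 / st

for site, (r0, st) in sites.items():
    delta, completeness = summarize(r0, st)
    print(f"{site}: unrecorded = {delta}, completeness = {completeness:.0%}")
# Hare: unrecorded = 14, completeness = 71%
# Vaan: unrecorded = 11, completeness = 78%
# Koswari: unrecorded = 15, completeness = 72%
```

These figures agree with the completeness levels given above and with the range of 11 to 15 unrecorded species mentioned in the Conclusion.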

DISCUSSION
Pristine tropical coral reefs are well known for hosting an exceptionally high level of associated biodiversity. Reefs provide an unequalled variety of resources, in terms of food and shelter, for an incredibly large number of animals across all size and taxonomic ranges. Shelled macro-gastropods are just a subset of this wide variety of animal forms, yet one of great biological and commercial importance. This justifies special extra efforts in investigating the gastropod fauna of tropical coral reefs along the Indian coast, as elsewhere, because too few reports are yet available on the subject.

The Estimated Total Gastropod Species Richness of Each of the Three Investigated Reefs
The selected least-biased types of estimator of the number of unrecorded species were Jackknife-4 (for Koswari) and Jackknife-5 (for Vaan and Hare). Accordingly, the total species richness amounts to 53, 51 and 49 species, respectively, for the shelled macro-gastropods associated with the coral reefs along Koswari, Vaan and Hare islands (instead of 38, 40 and 35 recorded species, respectively). Thus, the three investigated reefs have fairly similar total species richness, around fifty. In turn, this confirms that the three inventories were indeed incomplete, with completeness levels of 72%, 78% and 71%, respectively (Table 1). This local account from coral reefs fringing three small islands should be compared with the more extensive and ecologically diverse coral reefs of Mannar Gulf as a whole, which also encompasses other types of marine habitats and ecosystems, in particular seagrass and mangrove habitats [2]. According to the general survey by Melkani et al. [21], the entire Gulf of Mannar hosts some 260 species of gastropods, a figure which, however, also includes the non-shelled species. On the other hand, a survey of coral reefs fringing two other neighbouring islands, in addition to Koswari and Vaan islands [2], recorded a total of 34 species of shelled macro-gastropods, of which no fewer than 23 were not listed among the 40 species recorded in the inventories by Mohanraj et al. [1]. Thus, combining the two surveys by Mohanraj et al. [1,2] leads to a total recorded species richness of 63. This figure substantially exceeds the average of 38 recorded species per individual reef (Table 1), suggesting a rather patchy distribution of the gastropod fauna, which differs appreciably from one coral reef to another.
Accordingly, with (i) 38 recorded species per individual reef on average and (ii) 63 species for 4 reefs pooled together, a potential of 70 to 80 recorded species seems likely for Mannar Gulf as a whole. Finally, accounting for the level of completeness of the inventories (around 75%, Table 1), this suggests a true total richness of around 100 species for the gastropod fauna associated with coral reefs all across the Gulf of Mannar. This highlights the biodiversity conservation value of the "Gulf of Mannar Biosphere Reserve" for this specific fauna as well.
Now, from a methodological viewpoint, it has been noted that the kinetics of species accumulation differ notably between sites, so that the species accumulation curves of Koswari and Vaan come to intersect at a given sample size (around N = 800). The occurrence of such situations, which are far from uncommon, highlights the fact that comparisons between non-exhaustive inventories cannot be reliably extrapolated to total species richness, even when the sample sizes are equal [22]. In particular, this invalidates the still widely held opinion that the rarefaction procedure guarantees reliable predictions of the total species richness of the compared species assemblages; see also other similar examples [23][24][25].
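The last step of this reasoning can be written out as a simple worked calculation. It assumes, as the text implicitly does, that the completeness level of about 75% observed at the three reefs also applies to the pooled Gulf-wide figure; the symbols R_Gulf and c are introduced here only for illustration.

```latex
\[
\hat{S}_{\text{Gulf}} \approx \frac{R_{\text{Gulf}}}{c},
\qquad c \approx 0.75:
\qquad \frac{70}{0.75} \approx 93,
\qquad \frac{80}{0.75} \approx 107 ,
\]
```

that is, a range of roughly 93 to 107 species, in line with the rounded value of about 100 species.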

Predicting the Additional Sampling Effort Required for a Given Improvement of Sampling Completeness
Both (i) the total sampling effort necessary to reach a given improvement of completeness and (ii) the marginal sampling effort needed to increase the number of recorded species by one (Figs. 3 and 5, respectively) increase dramatically when ever higher levels of completeness are targeted. The practical interest of numerical extrapolation is precisely to quantify this dramatic increase accurately and, thus, to address rationally the inevitable question: when should additional sampling effort "reasonably" stop?

When to Reasonably Stop Sampling Effort?
Clearly, the answer to this question is a matter of compromise and, as such, is debatable, especially when the determinants involved can be appreciated only qualitatively. Hence the valuable contribution to be expected from quantifying the terms of the balance between the gain (i.e. the expected increase in the number of recorded species) and the cost (i.e. the corresponding additional sampling effort required). As just mentioned, the question can be handled in two ways, according to either the total or the marginal cost (i.e. the 'integral' or the 'derivative').
In the first case, the criterion to be considered is the ratio, N/R(N), of sampling-size N to the corresponding number R(N) of recorded species, which should not exceed an "acceptable" maximum threshold value, above which the "sampling yield" would be judged insufficient.
In the second option, the criterion to be considered is the derivative ∂N/∂R (= 1/(∂R(N)/∂N)). This derivative is easily evaluated in practice since, according to equation [A.1] (see Appendix 1), it is equal to the ratio N/f_1 of the sample size N to the corresponding number f_1 of recorded singletons. Here again, the criterion ∂N/∂R (= N/f_1) should not exceed an "acceptable" maximum threshold value, above which the "sampling yield" would be judged insufficient.
Let us consider, for example, the case of Vaan Island. Figs. 6 and 7 (both derived from Fig. 3) show the variations of the criteria N/R and ∂N/∂R with sampling completeness. In practice, these figures make it possible to predict the maximum level of sampling completeness that may be reached as a function of the maximum "reasonably acceptable" value for one or the other of these two criteria, that is, the level of completeness obtained when sampling has to be stopped on the rational basis of the minimum "sampling yield" considered acceptable in practice, in the context of the study.
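The two stopping criteria can be illustrated with a minimal Python sketch. The function names, the thresholds and the numerical values below are all hypothetical (the actual N_0 and f_1 of Table 1 are not reproduced in the text); only the two ratios N/R and N/f_1 come from the section above.

```python
def total_cost(n, r):
    """'Integral' criterion N / R(N): individuals collected per recorded species."""
    return n / r

def marginal_cost(n, f1):
    """'Derivative' criterion dN/dR = N / f_1: individuals per newly recorded species."""
    return float("inf") if f1 == 0 else n / f1

def should_stop(n, r, f1, max_total=None, max_marginal=None):
    """True when a criterion exceeds its user-chosen 'acceptable' threshold."""
    if max_total is not None and total_cost(n, r) > max_total:
        return True
    if max_marginal is not None and marginal_cost(n, f1) > max_marginal:
        return True
    return False

# Hypothetical inventory: 600 individuals, 40 species, 8 singletons
n, r, f1 = 600, 40, 8
print(total_cost(n, r))                         # 15.0
print(marginal_cost(n, f1))                     # 75.0
print(should_stop(n, r, f1, max_marginal=50))   # True
print(should_stop(n, r, f1, max_marginal=100))  # False
```

In this made-up case, about 75 additional individuals would be expected per newly recorded species, so sampling would be stopped under a threshold of 50 but continued under a threshold of 100.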

CONCLUSION
An accurate procedure of numerical extrapolation of the partial inventories carried out on the gastropod fauna associated with the three investigated coral reefs in the "Mannar Gulf Biosphere Reserve" [1] provided similar least-biased estimates of the true (total) species richness for each site, around fifty species, an appreciably higher figure than suggested by the numbers of recorded species alone. The 11 to 15 still unrecorded species (depending on the site) call for further sampling at each site in order to establish the identities of these species. Yet, since the additional sampling effort required to detect new species unfortunately increases exponentially with further sampling, it is essential to be able to quantify predictively the additional sampling effort or cost that would be needed as a function of the targeted increase in sampling completeness or, conversely, to predict the maximum completeness to be expected from a given additional sampling effort. Here also, the implementation of an accurate procedure of numerical extrapolation of species accumulation has proved to be an essential tool.
More generally, the accurate estimation of true (total) species richness, using least-biased extrapolation of the species accumulation during progressive sampling [26][27][28][29], is all the more necessary as species richness tends to be recognized, once again, as the best numerical parameter to qualify local biodiversity [30].
[…] and, more generally, for the Jackknife estimator at order i:
\[ \text{JK-}i = \sum_{x=1}^{i} (-1)^{x+1}\, C(i, x)\, f_x \]
with C(i, x) as the number of combinations of x items among i (see BÉGUINOT [26] for a demonstration).

* first demonstration
Let us first take as axiomatic the generally recognized fact that nonparametric estimators all provide under-estimates of the true number of unrecorded species, whatever the type of estimator considered [11,12,20]. It immediately follows that the least-biased estimator is the one which provides the highest estimate among them all. In turn, this makes it possible to define explicitly, and in all generality, the respective domains within which each estimator performs best. For example, the domain associated with Jackknife JK-3, where the latter provides the highest estimate, is such that, within this domain, JK-3 > JK-2 and JK-3 > JK-4. Given the respective expressions of JK-2, JK-3 and JK-4, it immediately follows that this domain is defined as follows in terms of the values of the f_x: 3f_1 - 3f_2 + f_3 > 2f_1 - f_2 and 3f_1 - 3f_2 + f_3 > 4f_1 - 6f_2 + 4f_3 - f_4; that is:
\[ 2f_2 - f_3 < f_1 < 3f_2 - 3f_3 + f_4 \]
which defines the boundaries of the domain where JK-3 provides the least-biased estimate, since the latter exceeds the estimates of the neighbouring estimators.
Extending the same reasoning to the other types of nonparametric estimators leads to the following general key for selecting the least-biased estimator, based on the value of the recorded number f_1 of singletons as compared to the recorded numbers f_2, f_3, f_4, … of doubletons, tripletons, quadrupletons, etc.: […]
This selection key (in terms of the values of f_1 compared to f_2, f_3, f_4, f_5) is thus strictly equivalent to the selection procedure according to which the estimator that provides the highest estimate is the least-biased one.
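The printed key itself is not reproduced in this text. As a hedged reconstruction only, the stated rule (the highest estimate wins) can be applied to successive members of the Jackknife series, using the expressions of JK-2 to JK-4 quoted above and assuming JK-1 = f_1 and JK-5 = 5f_1 - 10f_2 + 10f_3 - 5f_4 + f_5 from the same binomial pattern. The Chao branch is deliberately left out, since its boundary cannot be recovered from this section alone.

```latex
\begin{align*}
\text{JK-2} - \text{JK-1} &= f_1 - f_2 \\
\text{JK-3} - \text{JK-2} &= f_1 - 2f_2 + f_3 \\
\text{JK-4} - \text{JK-3} &= f_1 - 3f_2 + 3f_3 - f_4 \\
\text{JK-5} - \text{JK-4} &= f_1 - 4f_2 + 6f_3 - 4f_4 + f_5
\end{align*}
so that, comparing each Jackknife with its two neighbours,
\begin{align*}
\text{JK-2 is highest for}\quad & f_2 < f_1 < 2f_2 - f_3 \\
\text{JK-3 is highest for}\quad & 2f_2 - f_3 < f_1 < 3f_2 - 3f_3 + f_4 \\
\text{JK-4 is highest for}\quad & 3f_2 - 3f_3 + f_4 < f_1 < 4f_2 - 6f_3 + 4f_4 - f_5 \\
\text{JK-5 is highest for}\quad & f_1 > 4f_2 - 6f_3 + 4f_4 - f_5
\end{align*}
```

These intervals follow from the neighbour-to-neighbour comparisons only; they should be read as an illustration of the reasoning, not as the author's published key.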

* second, alternative demonstration
It is not even necessary to call upon the axiom above (according to which nonparametric estimators all provide under-estimates). The same domains of optimality as those just defined may be derived, independently of this axiom, by simply accounting for the constraint of continuity at the borders of the domains of optimality associated with each type of estimator. This rule of continuity implies that the values taken by two successive Jackknifes (say JK-[i] and JK-[i+1]) should be equal at the boundary between their respective domains. For example, at the boundary between JK-2 and JK-3, both Jackknifes should be equal: 2f_1 - f_2 = 3f_1 - 3f_2 + f_3, that is, finally, f_1 = 2f_2 - f_3. Similarly, at the boundary between Chao and JK-1, both estimators should provide the same estimate: f_1^2/(2f_2) = f_1, that is, finally, f_1 = 2f_2. Extending the same reasoning to the other types of estimators leads to the same domains, as defined above, for the values of f_1 compared to f_2, f_3, f_4, f_5.
Thus, according to this second demonstration, the same key of optimality (in terms of the values of f_1 compared to f_2, f_3, f_4, f_5) is derived as above, without resorting to the axiomatic preliminary. In fact, this second approach, based on nothing more than the simple rule of continuity, actually provides a demonstration of the "axiom" that was taken as such in the first approach. It thus provides an alternative, independent demonstration that the least-biased nonparametric estimator is indeed the one which provides the highest estimate.
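The continuity rule between successive Jackknifes can be checked numerically. The sketch below uses the JK-2, JK-3 and JK-4 expressions quoted in the first demonstration; the counts f_2, f_3, f_4 are arbitrary illustrative values, not survey data.

```python
def jk2(f1, f2):
    return 2 * f1 - f2

def jk3(f1, f2, f3):
    return 3 * f1 - 3 * f2 + f3

def jk4(f1, f2, f3, f4):
    return 4 * f1 - 6 * f2 + 4 * f3 - f4

# Arbitrary illustrative counts of doubletons, tripletons, quadrupletons
f2, f3, f4 = 6, 2, 1

# Boundary between JK-2 and JK-3: f_1 = 2 f_2 - f_3
f1 = 2 * f2 - f3
print(f1, jk2(f1, f2), jk3(f1, f2, f3))  # 10 14 14

# One singleton more or less tips the balance to one side or the other
print(jk3(f1 + 1, f2, f3) > jk2(f1 + 1, f2))  # True
print(jk2(f1 - 1, f2) > jk3(f1 - 1, f2, f3))  # True

# Boundary between JK-3 and JK-4: f_1 = 3 f_2 - 3 f_3 + f_4
f1 = 3 * f2 - 3 * f3 + f4
print(f1, jk3(f1, f2, f3), jk4(f1, f2, f3, f4))  # 13 23 23
```

At each boundary value of f_1 the two neighbouring estimators coincide, and on either side of the boundary the estimator of the corresponding domain gives the higher estimate.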

* third demonstration
A third, less straightforward demonstration was derived previously [26], leading to the same key for selecting the least-biased type of nonparametric estimator. In compensation for its substantially longer development, this third demonstration offers the advantage of also addressing two additional points of interest: (i) it proves that the Jackknife series (and Chao only in very specific circumstances) are the only nonparametric estimators, expressed in terms of the f_x, which comply with the compulsory rule of additivity, according to which, in an assemblage of species that encompasses several mutually exclusive categories (that is, categories that share no species; for example taxonomic subsets such as genera, families, orders, etc.), the estimated number of unrecorded species for the whole assemblage should equal the sum of the estimated numbers of unrecorded species in each of the member categories; (ii) it provides the expressions for the extrapolation of the Species Accumulation Curves respectively associated with each type of nonparametric estimator [26][27], based on the general mathematical relationship that constrains the theoretical expression of any Species Accumulation Curve R(N) [26][28][29]: