How to Extrapolate Species Abundance Distributions with Minimum Bias When Dealing with Incomplete Species Inventories

Method Article. Béguinot; AIR, 13(4): 1-24, 2018; Article no.AIR.39002

The total number of co-occurring species (“true species richness”) and the way their respective abundances are distributed (“species abundance distribution”) are two major descriptive traits of species assemblages, in numerical terms. Moreover, beyond mere description, the species abundance distribution may help to infer how ecological factors and constraints are currently shaping the hierarchical structure of species assemblages and may thereby help shed light on general traits of the functional organisation within communities of species. Unfortunately, neither total species richness nor exhaustive abundance distributions are available when dealing with more or less incomplete species inventories, a situation which is becoming increasingly frequent with the generalisation of the so-called “quick surveys” and “rapid biodiversity assessments”, which are almost unavoidable when addressing very species-rich assemblages such as invertebrate communities. Hence the necessity of extrapolating with minimum bias (i) the species accumulation curve, thereby deriving reliable estimates of the total species richness of the sampled assemblage, and (ii) the species abundance distribution, to get an exhaustive pattern including the full set of co-occurring species. Previous reports from the same author already dealt with the least-biased extrapolation of species accumulation and the associated derivation of total species richness. An appropriate method is now proposed to extrapolate with minimum bias the Species Abundance Distribution itself, when having to deal with only partial species inventories. The method relies in part on theoretical results that had already served to support the extrapolation of the species accumulation process.
The procedure leading to the extrapolation of the Species Abundance Distribution is first detailed in principle and then put into practice utilising a few examples. Improvements as compared to an earlier attempt at the same goal are discussed.


INTRODUCTION
Numerical characterisation of species communities requires, first of all, the evaluation of the total (true) species richness, as accurately as possible. As important as the evaluation of total species richness may be [1,2], it remains insufficient to thoroughly characterise a species assemblage numerically. Assessing the more or less uneven distribution of species abundances within species communities is an essential complement to the estimation of total species richness [3-10]. This, in turn, can help shed light on the processes at work in shaping the internal structuring of species communities. Hence the long-standing interest devoted to the specific shape of the Species Abundance Distribution (the "S.A.D."), also designated as "Ranked Abundance Distribution" when species abundances are conveniently ranked in decreasing order of values, as will be the case throughout the text below.
Thorough representations of "S.A.D.s", including the whole set of species that occur in the studied assemblage, are indeed required to get a deeper understanding of the processes involved in the hierarchical structuration of species assemblages [11]. This, however, would imply, first of all, achieving a (sub-) exhaustive inventory of the whole set of co-occurring species within the community of interest. Unfortunately, sampling can rarely be carried out exhaustively in practice, especially when dealing with highly species-rich communities, as is often the case, for example, with invertebrate faunas. Hence the increasingly frequent recourse to the so-called "quick surveys" or "rapid biodiversity assessments", imposed as a frustrating but inevitable compromise between the desire for thorough investigations on the one hand and the limiting practical constraints on the other hand. Thus, time-limited sampling scenarios often result in sample sizes that are far too small to capture the (sub-) total species richness of the sampled communities [1,7,12].
Numerical extrapolation is a suitable surrogate to compensate for the lack of completeness of sampling, taking account of unrecorded species, at least numerically.
Various numerical extrapolation procedures have been implemented for the estimation of the number of unrecorded species and, thereby, for the evaluation of the total species richness of the studied community, using nonparametric estimators of the number of unrecorded species [13,14]. Several attempts to improve the accuracy of these estimations have been proposed more recently: see [15-20].
By contrast, the numerical extrapolation of "S.A.D.s" has hardly been addressed so far. A significant attempt in this direction has recently been proposed by Chao et al. [21], but with some substantial limitations, resulting in particular from the sole recourse to the "Chao-1" estimator, known to strongly underestimate the number of still unrecorded species in most cases [15-20].
Hereafter, I report a new procedure to extrapolate Species Abundance Distributions with minimised bias, based on the prior, least-biased extrapolation of the corresponding "Species Accumulation Curve" [19,20]. The practical application of this new procedure is subsequently illustrated using a series of examples that, moreover, highlight the interest of considering fully extrapolated rather than incomplete Species Abundance Distributions.

A BRIEF DESCRIPTION OF THE NEW PROCEDURE OF LEAST-BIASED EXTRAPOLATION OF "SPECIES ABUNDANCE DISTRIBUTIONS"
As already underlined by Chao et al. [21], two main steps are to be considered in order to elaborate a relevant and complete representation of the Species Abundance Distribution: 1) first, it is recommended to provide relevant bias corrections to the as-recorded part of the "S.A.D.", by inferring, as accurately as possible, the true abundances of species on the basis of their crude recorded frequencies; 2) then, comes the extrapolation of the missing part of the "S.A.D." (i) by estimating the number of the still unrecorded species and then, (ii) by extrapolating their expected abundance distribution.

Conversion of the as-recorded frequencies into true species abundances
The frequencies of occurrence of recorded species within a sample of finite size provide a biased evaluation of the true abundances of species within the sampled assemblage. This is already intuitive and may be confirmed by numerical simulations. Consider, for the sake of argument, a community in which all species have ideally equal levels of abundance. A finite sample extracted from this community will inevitably lead to some scatter among the recorded frequencies of occurrence of these species, with the range of scatter increasing as sample size decreases. Crude recorded frequencies of species occurrence will thus provide biased evaluations of the real proportional abundances of these species in the community. Typically, this bias (i) will show apparent differences between abundances even when no such differences actually exist and (ii) will tend to exaggerate the magnitude of differences between abundances when differences truly exist. Converting frequencies into true abundances therefore requires bias corrections.
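The scatter described above can be checked with a short simulation. The sketch below is only illustrative: the function name, the community size (20 species) and the sample sizes are arbitrary choices, not values from the study. It draws individuals at random from a community of equally abundant species and returns the recorded frequencies, which typically depart more from the common true abundance 1/S as the sample gets smaller.

```python
import random
from collections import Counter


def recorded_frequencies(n_species, sample_size, seed=0):
    """Sample `sample_size` individuals from `n_species` equally abundant
    species and return the recorded frequency of each species
    (0.0 for a species that was never drawn)."""
    rng = random.Random(seed)
    draws = [rng.randrange(n_species) for _ in range(sample_size)]
    counts = Counter(draws)
    return [counts.get(s, 0) / sample_size for s in range(n_species)]


if __name__ == "__main__":
    n_species = 20  # illustrative value; true abundance of each species is 1/20
    for size in (50, 500, 5000):
        freqs = recorded_frequencies(n_species, size, seed=1)
        # Range of recorded frequencies around the true value 1/n_species
        print(size, min(freqs), max(freqs), 1 / n_species)
```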
According to Appendix 1 (equation A1.14), the estimated true abundance $\tilde{a}_i$ of species 'i', having a recorded frequency $p_i = n_i/N_0$ in a sample of size $N_0$, is given by:

$$\tilde{a}_i = p_i \,\frac{1 + 1/n_i}{1 + R_0/N_0}\,\left(1 - \frac{f_1}{N_0}\right) \qquad (1)$$

where $N_0$ is the achieved sample size, $R_0$ (= $R(N_0)$) the number of recorded species, among which a number $f_1$ are singletons (species recorded only once), and $n_i$ is the number of recorded individuals of species 'i'.
The crude recorded part of the "S.A.D." (expressed in terms of the series of as-recorded frequencies $p_i = n_i/N_0$) should then be replaced by the corresponding series of expected true abundances, $\tilde{a}_i$, estimated according to equation (1).
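As a minimal sketch of this conversion, the function below applies equation (1) in the form ã_i = p_i (1 + 1/n_i)/(1 + R_0/N_0) · (1 - f_1/N_0), which is how the correction is written out further on in the text for the significance tests. The function name and the count vector used in the demonstration are hypothetical.

```python
def corrected_abundances(counts):
    """Convert recorded counts n_i into bias-corrected abundance estimates.

    counts: number of recorded individuals for each recorded species.
    Returns one estimate per species, following equation (1).
    """
    n0 = sum(counts)                       # achieved sample size N_0
    r0 = len(counts)                       # number of recorded species R_0
    f1 = sum(1 for n in counts if n == 1)  # number of singletons f_1
    scale = (1 - f1 / n0) / (1 + r0 / n0)
    return [(n / n0) * (1 + 1 / n) * scale for n in counts]


if __name__ == "__main__":
    # Hypothetical sample: 8 recorded species, 100 individuals, 3 singletons
    counts = [50, 30, 10, 5, 2, 1, 1, 1]
    for n, a in zip(counts, corrected_abundances(counts)):
        print(n, n / sum(counts), a)
```

With this form of the correction, the estimates of the recorded species sum to 1 - f_1/N_0, the remaining share f_1/N_0 being left to the unrecorded species; dominant species are corrected downwards and rare species upwards.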

Extrapolation of the missing (unrecorded) part of the "S.A.D."
The extrapolation of the "S.A.D." beyond the (previously bias corrected) recorded part, i.e. for ranks i > R(N 0 ), involves two complementary aspects: i. the least-biased estimation of the number Δ of still unrecorded species, ii. the estimation of their abundance distribution, thereby extrapolating the "S.A.D." beyond its recorded part until reaching its full completeness.
As regards point (i), a relevant approach is now available for the extrapolation of the species accumulation curve. Applying this method makes it possible, in particular, to select the least-biased type of estimator of the number Δ of still unrecorded species, among the large set of estimators now available [19,20].
As regards point (ii), several options are possible:
** Option n° 1: this is the simplest option, which consists, following Chao et al. [21], in simply assuming a uniformly log-linear shape of the "S.A.D." all along its extrapolated part. Given the number Δ of unrecorded species and the fact that the cumulated abundance of the unrecorded species is equal to $A_u(N) = f_1/N_0$ (Appendix 1, equation (A1.5)), the uniform slope of the log-linear extrapolation of the "S.A.D." is thereby entirely determined.
** Option n° 2: the extrapolation of the "S.A.D." is now "articulated" in two successive parts; this offers better opportunities to approach a little more realistically the various possible shapes of "S.A.D.s" towards their end. Indeed, beyond their archetypal, roughly log-linear shape [8], "S.A.D.s" most often display, in detail, a large range of variations in shape. In the conventional, log-transformed abundance representation, one may roughly distinguish three main categories [11]: (i) more or less symmetric sigmoidal shapes, as in Preston "log-normal" or MacArthur "broken-stick" distributions, (ii) shapes remaining more or less (sub-) log-linear, as in geometric series or log-series, and (iii) shapes consistently retaining a positive curvature all along (power law models). In any case, an articulated, two-part extrapolation may comply more easily with such variations of shape towards the end of the "S.A.D.".
In practice, let $\Delta_1$ and $\Delta_2$ (with $\Delta_1 + \Delta_2 = \Delta$) be the respective extents of the two successive stages of the extrapolation (respective ranks [$R_0+1$ to $R_0+\Delta_1$] and [$R_0+\Delta_1+1$ to $R_0+\Delta$]). These two successive stages may be either:
* option n° 2.1: both log-linear, with different slopes $s_1$ and $s_2$, that is $a_i/a_{i+1} = s_1$ and then $a_i/a_{i+1} = s_2$, the first slope, $s_1$, being chosen to match the slope at the end of the recorded part of the "S.A.D.";
* option n° 2.2: a log-linear first part, with the slope $s_1$ (once again chosen to match the slope at the end of the recorded part of the "S.A.D."), followed by a curved second part along which the slope is consistently increasing in absolute value (as in log-normal or broken-stick models) or decreasing (as in Zipf's models). For this second part, an appropriate expression may be of the type $a_i/a_{i+1} = s_1\,(1 + (i - i_0)^a)$, with $i_0$ as the species rank at the beginning of the second stage ($i_0 = R_0+\Delta_1+1$) and the exponent 'a' being positive, for example a = 3.
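Option n° 1 can be sketched numerically as follows. This is an illustration under an added assumption, not the procedure of Chao et al. [21]: the log-linear tail is taken to start from the abundance of the last recorded species (`a_last`, a hypothetical anchor), so that the common ratio q between successive abundances is the only unknown and is fixed by requiring the Δ extrapolated abundances to sum to f_1/N_0. The ratio is found by bisection.

```python
def loglinear_extrapolation(a_last, delta, unrecorded_total, tol=1e-12):
    """Return `delta` abundances a_last*q, a_last*q**2, ..., a_last*q**delta
    whose sum equals `unrecorded_total` (e.g. f_1/N_0), with 0 < q < 1."""
    if not 0 < unrecorded_total < a_last * delta:
        raise ValueError("no decreasing geometric tail fits these constraints")

    def tail_sum(q):
        return a_last * sum(q ** k for k in range(1, delta + 1))

    low, high = 0.0, 1.0  # tail_sum increases monotonically with q
    while high - low > tol:
        mid = (low + high) / 2
        if tail_sum(mid) < unrecorded_total:
            low = mid
        else:
            high = mid
    q = (low + high) / 2
    return [a_last * q ** k for k in range(1, delta + 1)]


if __name__ == "__main__":
    # delta = 28 and f_1/N_0 = 17/1319 are the Manas Range values quoted in
    # the text; a_last = 0.0014 is an illustrative anchor value.
    tail = loglinear_extrapolation(0.0014, 28, 17 / 1319)
    print(tail[0], tail[-1], sum(tail))
```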
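The recurrence of option n° 2.2 can be made concrete with a short sketch. The function below only generates the sequence of abundances from given values of s_1, Δ_1, Δ_2 and 'a'; it does not tune these parameters against the constraints on total richness, cumulated abundance or minimum abundance. The function name is hypothetical, and the ratio applied when entering the second stage (from rank i_0 - 1 to i_0) is assumed to remain s_1, by continuity.

```python
def two_stage_tail(a_start, s1, delta1, delta2, a=3.0):
    """Extrapolated abundances for ranks R0+1 .. R0+delta1+delta2.

    a_start: abundance at rank R0 (end of the recorded part).
    s1:      ratio a_i / a_(i+1) of the log-linear first stage (s1 > 1).
    Second stage: a_i / a_(i+1) = s1 * (1 + (i - i0)**a), i0 = R0+delta1+1.
    """
    if s1 <= 1:
        raise ValueError("s1 must exceed 1 for abundances to decrease")
    tail = []
    current = a_start
    # Stage 1: constant ratio s1 (log-linear part)
    for _ in range(delta1):
        current /= s1
        tail.append(current)
    # Stage 2: step 0 enters rank i0 with ratio s1 (assumed, by continuity);
    # step k >= 1 goes from rank i0+k-1 to i0+k, so (i - i0) = k - 1.
    for step in range(delta2):
        offset = max(step - 1, 0)
        current /= s1 * (1 + offset ** a)
        tail.append(current)
    return tail


if __name__ == "__main__":
    # Illustrative values only
    for rank_offset, value in enumerate(two_stage_tail(0.0014, 1.05, 3, 4), 1):
        print(rank_offset, value)
```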
Let us focus a little further upon this more flexible and adaptable option. Four parameters characterise this type of extrapolation, namely: the numbers $\Delta_1$ and $\Delta_2$ of species involved in each of the two successive stages, the slope $s_1$ of the log-linear first part and the rate of variation of the slope during the second stage (related to 'a'). In turn, four pieces of information are thus necessary to determine the shape of the extrapolation; these will be:
i. the least-biased estimate of Δ, which constrains the sum $\Delta_1 + \Delta_2$;
ii. the recorded ratio $f_1/N_0$, which constrains the cumulated abundance of the Δ unrecorded species, according to the Turing relationship (see equation (A1.5) in Appendix 1);
iii. the slope at the end of the recorded part of the "S.A.D.", to which the slope $s_1$ of the first part of the extrapolation is expected to conform in order to respect the continuity of the first derivative in theoretical "S.A.D.s";
iv. the estimated abundance $a_{\min}$ (= $a_{R_0+\Delta}$) of the last, rarest unrecorded species.
The estimation of this last parameter proceeds from equation (1), with $N_f$ as the sample size at which the last species is recorded for the first time. $N_f$ is obtained from the least-biased extrapolation of the species accumulation curve R(N) ([19,20]; see also Appendix 1 for the expression of this extrapolation). In practice, as the species accumulation curve reaches the last species asymptotically, we follow the convention suggested by Chao and co-workers [22]: $N_f$ is the computed sample size at which the total species richness minus 1 or minus 0.5 is reached (i.e. $R(N_f) = R_0 + \Delta - 1$ or $R_0 + \Delta - 0.5$).
In practice, as $N_f \gg R_0$ and $N_f \gg f_1$, equation (1) applied to a species recorded once ($n_i = 1$, $p_i = 1/N_f$) gives:

$$a_{\min} \approx \frac{2}{N_f}$$

** Option n° 3: a third (and preferred) alternative to relevantly extrapolate "S.A.D.s" takes advantage of the prior, least-biased extrapolation of the species accumulation curve itself [19,20]. This approach is particularly relevant since the rate of species accumulation along progressive sampling is directly dependent upon the distribution of species abundances in the sampled assemblage of species [12,21]. Indeed, consider the species 'i' of rank 'i' in the "S.A.D." and let $N_i$ be the sample size at which this species is first detected during progressive sampling. At sample size $N_i$, the number $n_i$ of individuals of species 'i' is thus $n_i = 1$ and species 'i' is then assigned a frequency $p_i = 1/N_i$.
Then, according to equation (A1.14) in Appendix 1, it follows that:

$$a_i = \frac{1}{N_i}\,\frac{1 + 1/1}{1 + R(N_i)/N_i}\,\left(1 - \frac{f_1(N_i)}{N_i}\right)$$

that is, since the expected proportion of singletons $f_1(N)/N$ is the slope $\partial R(N)/\partial N$ of the species accumulation curve:

$$a_i = \frac{2/N_i}{1 + R(N_i)/N_i}\,\left(1 - \left.\frac{\partial R(N)}{\partial N}\right|_{N_i}\right) \qquad (4)$$

with $N_i$ defined by $R(N_i) = i$. This equation provides the extrapolated distribution of the species abundances $a_i$ (for $i > R(N_0)$) as a function of the extrapolated species accumulation curve R(N) (for $N > N_0$), with 'i' being equal to $R(N_i)$. The expression of R(N) to be selected is provided in Appendix 2.
Nota: In actual "S.A.D.s", the abundances $a_i$ are expressed as ratios of integers, thus giving "S.A.D.s" a discontinuous, "staircase-like" shape. In particular, the lowest level of recorded abundance (that is, when $n_i = 1$) is represented by a step often comprising not only one species but several, since the number $f_1$ of singletons often exceeds 1. The connection in continuity between the end of the recorded part of the "S.A.D." and the beginning of the extrapolated part is thus located at $i = R_0 - f_1/2$ (rather than $i = R_0$, see Figs. 5 and 8); therefore, equation (4) is standardised accordingly.
At last, from a more "heuristic" point of view, it should be noted that equation (4) clearly highlights the tight articulation that exists between: 1) the "Species Abundance Distribution" [i.e. the species abundance a i as a function of the species rank i : a i = a(i) ] and 2) the "Species Accumulation Process" [i.e. the sampling size N i when the species of rank i is expected to be first detected : i = R(N i )]; finally leading to the linkage a i = a(i) = a(R(N i )), detailed by equation (4).
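The linkage a_i = a(R(N_i)) can be sketched in a few lines. Two assumptions are made loudly here. First, equation (4) is taken in the form a_i = (2/N_i)/(1 + i/N_i) · (1 - ∂R/∂N at N_i), obtained from equation (1) with n_i = 1, N_0 replaced by N_i and f_1(N)/N identified with the slope of the accumulation curve. Second, the accumulation curve used below is a hypothetical saturating curve chosen only to make the example runnable; it is not the Appendix 2 expression, and its two constants are arbitrary. The standardisation at i = R_0 - f_1/2 mentioned in the Nota is not applied.

```python
import math

S_TOTAL = 119.0    # hypothetical asymptote of the toy accumulation curve
HALF_SAT = 400.0   # hypothetical half-saturation sample size


def toy_accumulation(n):
    """Toy species accumulation curve R(N); stand-in for Appendix 2."""
    return S_TOTAL * n / (HALF_SAT + n)


def toy_slope(n):
    """First derivative dR/dN of the toy curve."""
    return S_TOTAL * HALF_SAT / (HALF_SAT + n) ** 2


def sample_size_for_rank(rank, accumulation, n_low=1.0, n_high=1e12):
    """Solve R(N_i) = rank by bisection on a log scale (R increasing)."""
    if not accumulation(n_low) <= rank <= accumulation(n_high):
        raise ValueError("rank is outside the range reached by R(N)")
    for _ in range(200):
        mid = math.sqrt(n_low * n_high)
        if accumulation(mid) < rank:
            n_low = mid
        else:
            n_high = mid
    return math.sqrt(n_low * n_high)


def extrapolated_abundance(rank, accumulation, slope):
    """Abundance of the species of given rank, from the assumed equation (4)."""
    n_i = sample_size_for_rank(rank, accumulation)
    return (2.0 / n_i) / (1.0 + rank / n_i) * (1.0 - slope(n_i))


if __name__ == "__main__":
    for rank in (92, 100, 110, 118):
        print(rank, extrapolated_abundance(rank, toy_accumulation, toy_slope))
```

Because the curve reaches its asymptote only for infinite N, the last rank is handled as described earlier in the text, by solving for total richness minus 1 or minus 0.5 rather than for the asymptote itself.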

What can be reasonably learned from Species Abundance Distributions
Beyond their purely descriptive contribution, "S.A.D.s" are usually expected to provide additional insights into the processes that make the corresponding assemblage of species hierarchically structured as it actually is, in terms of abundance distribution.

* comparison to classical "S.A.D.s" models
A common practice consists in trying to select, among a series of referential models, the one that looks closest to the studied "S.A.D.". A large and steadily increasing number of referential models is currently available (see [23,24]).
Most of these models often seem, however, more or less equally appropriate for the bulk of empirical "S.A.D.s" [23]. For example, similarly high correlation coefficients, ranging from 0.90 to 0.94, are reported by Alroy [24] when 1055 empirical "S.A.D.s" are tested against each of four classic models, namely: geometric series, double-geometric series, log-series and log-normal (although these models respectively refer to quite different causal mechanisms!). Strong disagreements may thus occur among the resulting interpretations; see, for example, the sharply different points of view that oppose Baldridge et al. [25] to either McGill et al. [23] or Alroy [24]. This somewhat confusing situation has already been emphasised previously [8]. It especially holds true when having to deal with incomplete surveys.
An alternative or complementary avenue would consist in comparing in quantitative terms (rather than trying to identify) the studied "S.A.D." with an appropriate model that may serve as a "null" model with an explicit and simple meaning. In this perspective, models referring to either strictly or statistically even abundance distributions are particularly adequate. The purpose here is (i) to characterise the degree of unevenness of the abundance distribution as a whole and (ii) to evaluate the respective contributions of particular species to this unevenness. Two types of "null" models can fairly well match these objectives:
- a basically deterministic model, the trivial "ideally strictly even abundance distribution" {$e_i$}, with the abundance $e_i$ of the i-th species (labelled 'e' for 'even') defined, independently of rank 'i', as:

$$e_i = \frac{1}{S_t} \qquad (5)$$

with $S_t$ as the total species richness of the assemblage of species (previously derived by least-biased extrapolation);
- a basically stochastic model, the MacArthur "broken-stick" distribution {$r_i$}, expected to provide the stochastic outcome of the randomisation of an ideally even distribution of abundances (that is, in practice, the statistically random apportionment of abundance values between all the co-occurring species [26]):

$$r_i = \frac{1}{S_t}\sum_{n=i}^{S_t}\frac{1}{n} \qquad (6)$$

with $r_i$ as the abundance of the i-th species (labelled 'r' for 'random') and the summation Σ being extended from n = i to n = $S_t$. The "broken-stick" distribution had already been suggested as an appropriate "null" model by Wilson [27].
The degree of unevenness of any empirical "S.A.D." is, of course, always greater than the zero unevenness of the deterministic "strictly even" model, while it may be larger than, equal to or smaller than the unevenness level of the corresponding "broken-stick" model, depending on whether the structuring process at work in the species assemblage has a stronger, equal or weaker influence than the randomisation process involved in the "broken-stick" model. In this respect, the "broken-stick" model may be considered a more suggestive and interesting referential model than the deterministic "ideally even" model (although both approaches are in fact complementary).
Note that both "null" models require prior knowledge of the total species richness $S_t$ (equations (5) and (6)). This is a second strong reason to implement extrapolations of "S.A.D.s".
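Both "null" models are straightforward to compute once the total species richness S_t is known. The sketch below (function names are illustrative) implements e_i = 1/S_t and r_i = (1/S_t) Σ(1/n), the summation running from n = i to n = S_t, and returns the abundances ranked from the most to the least abundant species.

```python
def even_model(s_total):
    """Ideally even distribution: every species has abundance 1/S_t."""
    return [1 / s_total] * s_total


def broken_stick_model(s_total):
    """MacArthur broken-stick abundances r_i for ranks i = 1 .. S_t,
    with r_i = (1/S_t) * sum of 1/n for n = i .. S_t."""
    abundances = []
    partial = 0.0
    # Accumulate 1/n starting from the rarest rank (n = S_t) upwards
    for n in range(s_total, 0, -1):
        partial += 1 / n
        abundances.append(partial / s_total)
    abundances.reverse()  # rank 1 (most abundant) first
    return abundances


if __name__ == "__main__":
    print(even_model(4))
    print(broken_stick_model(4))
```

In both models the abundances sum to 1, so that they can be compared directly with a completed "S.A.D." of the same richness.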

*comparison between two or several "S.A.D.s"
Two or several "S.A.D.s" may also be compared directly, by considering the abundances of species having the same rank 'i' (or the same range of ranks) in each of the compared "S.A.D.s". Yet, if the "S.A.D.s" to be compared come from species assemblages that substantially differ in their respective species richness, the direct comparison between abundances may become somewhat irrelevant and can suggest "misleading conclusions" [11]. This is because, in such a case, a trivial, purely numerical contribution from species richness adds to the direct influence of ecological factors upon the distribution of species abundances. Indeed, there is an unavoidable trend for species dominance to decrease when total species richness increases, the dominance tending to be somewhat "diluted" by the number of co-occurring species [11,28,29]. This trend, and its essentially numerical rather than biological origin, are clearly exemplified (i) by the inversely proportional decrease of the flat level of abundances in the "ideally even abundance distribution" and (ii) by the decrease of the average steepness of the "broken-stick" distribution, when species richness $S_t$ increases (see equations (5) and (6)). This is precisely why both "null" models can serve to cancel the influence of this non-biological trend, when "S.A.D.s" issued from communities having substantially different species richness are to be compared.
Therefore, to ensure relevant comparisons in practice, the respective species abundances of the compared "S.A.D.s" should be rationalised by reference to one or the other of the two "null" models: the accordingly "rationalised" abundance at rank 'i' is then defined as the ratio $a_i/e_i$ or $a_i/r_i$ (see below).

*synthetic indices to reflect the intensity of structuration within species assemblages
The distribution of species abundances in a community may be understood in terms of either:
- pattern: the "S.A.D." being, by itself, the complete and detailed description of the internal structuring of the assemblage;
- process: the relative abundance of each species being then considered as reflecting the relative "performance" of that species in the particular context of the assemblage. "Performance" is understood here sensu latissimo, that is, encompassing the factors of all kinds which together contribute to increase (or decrease) the relative abundance of each species. Some of these factors are intrinsic to the species (its own capacities in the face of the ecological and syn-ecological context within the assemblage), while others are opportunistic or stochastic (depending, in particular, upon the historical and environmental context, which also contributes to the actual structuration): see the schematic sketch in Fig. 1.

Fig. 1. Schematic sketch showing how the relative "performance" (sensu latissimo) of each species within an assemblage of species depends upon both the historical and ecological contexts which are peculiar to this assemblage

Accordingly, representative indices may address either each species in particular or the assemblage as a whole.

> Indexation per species: quantifying the relative "performance" of a species in particular
The degree of "performance" of each species makes full sense when compared to either the "ideally even distribution" {$e_i$} or the randomly apportioned abundance distribution ("broken-stick") {$r_i$}, giving rise to the two following indices, respectively:

$$IPe = a_i/e_i \qquad (7)$$

$$IPr = a_i/r_i \qquad (8)$$

Testing the statistical significance of the index IPe comes down to testing the significance of the gap between the true abundance, $a_i = p_i\,(1+1/n_i)/(1+R_0/N_0)\cdot(1-f_1/N_0)$ (see equation (1)), and the reference value $e_i = 1/S_t$ (equation (5)). This finally amounts to comparing the recorded frequency of occurrence $p_i = n_i/N_0$ to the threshold $(1/S_t)/[(1+1/n_i)/(1+R_0/N_0)\cdot(1-f_1/N_0)]$.
Similarly, testing the statistical significance of the index IPr comes down to testing the significance of the gap between the true abundance, $a_i = p_i\,(1+1/n_i)/(1+R_0/N_0)\cdot(1-f_1/N_0)$, and the reference value $r_i = (1/S_t)\,\Sigma(1/n)$ (equation (6)). This finally amounts to comparing the recorded frequency of occurrence $p_i = n_i/N_0$ to the threshold $r_i/[(1+1/n_i)/(1+R_0/N_0)\cdot(1-f_1/N_0)]$.
The two kinds of composite indices (equations (9) and (10)) are either > 1 or < 1, depending on whether the species abundance $a_i$ is larger or smaller than the corresponding abundance $r_i$ in the "broken-stick" model. Moreover, the second composite index,

$$IPC2 = \frac{a_i/e_i - 1}{r_i/e_i - 1} \qquad (10)$$

equals zero if the species abundance $a_i$ is equal to the ideally even abundance $e_i$ (= $1/S_t$). This index is thus scaled both by a "zero" threshold level and by the definition of a "unit", equal to the difference ($r_i - e_i$) between the randomly apportioned abundance distribution and the ideally even abundance distribution.

> Indexation relative to the whole assemblage: quantifying its relative degree of structuration
Once again, this indexation makes full sense when compared to either the "ideally even distribution" {$e_i$} or the "broken-stick distribution" {$r_i$}. Two complementary factors may be considered:
i. the number of species, $s_e$ or $s_r$, whose abundances exceed the corresponding abundance in either the "ideally even" or the "broken-stick" model respectively;
ii. the average values, IPe* or IPr*, of the indices of performance IPe or IPr (defined above), for these $s_e$ or $s_r$ species respectively.
A composite index may be derived accordingly:

$$Ir = s_r \cdot IPr^*$$

The steepness of the decreasing abundance slope, along all or part of the "S.A.D.", also offers a complementary synthetic characterisation.
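The whole-assemblage index can be sketched as follows, taking the "broken-stick" model as reference. The interpretation coded here is an assumption stated plainly: s_r counts the species whose abundance exceeds r_i at the same rank, IPr* is the arithmetic mean of IPr = a_i/r_i over those species, and Ir is their product. The function names and the four-species abundance vector are hypothetical.

```python
def broken_stick_model(s_total):
    """r_i = (1/S_t) * sum of 1/n for n = i .. S_t, rank 1 first."""
    abundances, partial = [], 0.0
    for n in range(s_total, 0, -1):
        partial += 1 / n
        abundances.append(partial / s_total)
    return abundances[::-1]


def assemblage_index(abundances, reference):
    """Return (s, mean_ip, index) where s is the number of species whose
    abundance exceeds the reference at the same rank, mean_ip is the
    average of a_i / reference_i over those species, and index = s * mean_ip."""
    ips = [a / ref for a, ref in zip(abundances, reference) if a > ref]
    s = len(ips)
    if s == 0:
        return 0, 0.0, 0.0
    mean_ip = sum(ips) / s
    return s, mean_ip, s * mean_ip


if __name__ == "__main__":
    # Hypothetical completed S.A.D. with S_t = 4, ranked by decreasing abundance
    abundances = [0.70, 0.20, 0.06, 0.04]
    print(assemblage_index(abundances, broken_stick_model(len(abundances))))
```

The same function gives the index relative to the "ideally even" model when `reference` is replaced by a list of values 1/S_t.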
Comparisons may be simply carried out between slopes as such. But, as already emphasised, slopes may advantageously be previously standardised to the slope of the corresponding "broken-stick" (or "ideally even") distribution, so as to cancel the direct contribution of total species richness and, thereby, highlight separately the influence of all the other parameters determining the species assemblage structuration.

Testing the statistical significance of differences between "S.A.D.s" or between "S.A.D." and a referential model
The statistical significance of differences between recorded abundances, as well as the statistical significance of the indices derived above, can be tested using conventional statistical methods. Yet, the Bayesian inference approach (see equation A1.7 in Appendix 1) now offers an improved way to conduct accurate tests.

PRACTICAL IMPLEMENTATION OF THE NUMERICAL EXTRAPOLATION OF THE SPECIES ABUNDANCE DISTRIBUTION
In accordance with the mainly methodological objective of this contribution, the following examples are primarily intended to illustrate the practical implementation of the procedure of extrapolation of "S.A.D.s" described above. The resulting ecological and biological implications pertaining to each of the two examples below, interesting as they are, will therefore be treated elsewhere.
Among the different options provided above, option n° 3 is selected in these examples for its better accuracy. Indeed, option n° 3 takes advantage of the least-biased extrapolation of the corresponding species accumulation curve, which serves as the steering guide to the extrapolation of the "S.A.D." itself.

Example 1: partial inventory of butterfly fauna at "Manas Range Park" (Bhutan)
The first example relates to a subtropical butterfly community at "Royal Manas Range National Park" (Bhutan), partially surveyed by Nidup et al. [30]. Based on the reported field data (a list of R 0 = 91 recorded species including their respective abundances issued from sampling of N 0 = 1319 individuals), an extrapolation of the Species Accumulation Curve was computed, after selection of the least-biased type of estimator of the number of still unrecorded species: in this case, the 'Jackknife-5' estimator, leading to an estimated number of 28 unrecorded species. The total species richness of butterfly fauna in the sampled ecosystem at "Manas Range" is thus evaluated at 119 species: 91 recorded + 28 unrecorded (resulting completeness level of the inventory: 76%). The results above, as well as the extrapolated Species Accumulation Curve were derived in [31].
Based on this prior extrapolation of the Species Accumulation Curve, the extrapolation of the "S.A.D." is subsequently obtained, by applying equation (4).
The completed "S.A.D.", including the least-biased extrapolation (ranks i = 92 to 119), is provided in Figs. 2 to 5. While Fig. 2 follows the classical representation, using log-transformed abundances, the following figures comply with the convention of representation originally adopted by MacArthur [26], involving untransformed (rather than log-transformed) species abundances, a representation which provides a more straightforward visual appreciation of the relative abundances.
Note that, restricted to its as-recorded part, the shape of the "S.A.D." would likely comply with a "log-series" distribution (Fig. 2), leading to the assumption that only one (or, at most, very few) major factor(s) are at work in shaping the abundance structuration of the butterfly assemblage. Considering now the whole "S.A.D.", fully completed thanks to extrapolation, the pattern of distribution and the associated conclusion strongly differ, hence the importance of implementing reliable numerical extrapolation. The completed distribution (Fig. 2) shows a slightly asymmetric sigmoidal shape (as does, for example, the "broken-stick" distribution), which now looks closer to a slightly skewed "log-normal" distribution. This rather suggests a multiplicity of mutually independent factors involved together in the process of hierarchical structuration of the community [3].
Interestingly, Ulrich et al. [11] also emphasised the fundamental importance of distinguishing between fully censused and incompletely sampled communities when trying to provide a relevant interpretation of "S.A.D.s". They also reported that "S.A.D.s" issued from completely censused animal communities often tend to follow the "log-normal" model.

the practical computation procedure briefly reviewed step by step
The prior least-biased extrapolation of the Species Accumulation Curve for the butterfly assemblage at Manas Range Park, reported in detail in [31], provides all the numerical data necessary [$N_0 = 1319$, $R_0 = R(N_0) = 91$, $f_1 = 17$ and the least-biased expressions of the extrapolated species accumulation $R_5(N)$ and its first derivative $\partial R_5(N)/\partial N$] to proceed, in turn, with the corresponding least-biased extrapolation of the Species Abundance Distribution. These data are subsequently introduced:
- in equation (1), which provides the bias-corrected estimates of abundances for the already recorded part of the "S.A.D.";
- in equation (4), which provides the least-biased extrapolation of the abundance distribution of the still unrecorded species.
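As a worked illustration of the first step, the summary figures quoted for Manas Range (N_0 = 1319, R_0 = 91, f_1 = 17, 28 unrecorded species) are enough to evaluate equation (1) for any recorded count n_i, taking the equation in the form ã_i = p_i (1 + 1/n_i)/(1 + R_0/N_0) · (1 - f_1/N_0). The variable and function names below are illustrative; the individual species counts of the inventory are not reproduced here, so only the generic correction is shown.

```python
N0 = 1319    # achieved sample size (Manas Range)
R0 = 91      # number of recorded species
F1 = 17      # number of singletons
DELTA = 28   # estimated number of unrecorded species


def corrected_abundance(n_i, n0=N0, r0=R0, f1=F1):
    """Bias-corrected abundance of a species recorded n_i times (equation (1))."""
    p_i = n_i / n0
    return p_i * (1 + 1 / n_i) / (1 + r0 / n0) * (1 - f1 / n0)


unrecorded_share = F1 / N0             # cumulated abundance of unrecorded species
total_richness = R0 + DELTA            # estimated total species richness
completeness = R0 / total_richness     # completeness of the inventory

if __name__ == "__main__":
    print("unrecorded share:", unrecorded_share)
    print("total richness:", total_richness, "completeness:", completeness)
    for n_i in (1, 2, 10, 100):
        print(n_i, n_i / N0, corrected_abundance(n_i))
```

For a singleton, the corrected abundance is about 0.0014, against a crude recorded frequency of about 0.00076, while the unrecorded species together account for roughly 1.3% of the individuals.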
Figs. 2 to 5 provide the graphical expressions of the results derived from equations (1) and (4).

Example 2: partial inventory of butterfly fauna at "Sankosh River catchment" (Bhutan)
This second example relates to a tropical butterfly inventory at "Sankosh River catchment", partially surveyed by Singh [32]. The reported field data consist of a list of $R_0 = 213$ recorded species, including their respective abundances, issued from the sampling of $N_0 = 1731$ individuals. The least-biased extrapolation leads to a total species richness $S_t = 281$, the extrapolated part of the "S.A.D." covering ranks 214 to 281.

[Figure caption: as Fig. 6, with expanded scale for abundances, providing an easier reading of the following part of the "S.A.D.". Note that the extrapolated part (ranks 214 to 281) definitely supports the expectation that the abundance distribution of the unrecorded species remains slightly lower than the "broken-stick" model, a trend initiated from rank ≈ 20.]

Making relevant comparisons between species abundance distributions, issued from different species assemblages: comparing "Manas Range" and "Sankosh River"
The butterfly assemblages considered above, at "Manas Range" and "Sankosh River", markedly differ as regards their total species richness, with respectively $S_t = 119$ and $S_t = 281$. The (recorded) abundances of, say, the ten most abundant species ($i$ = 1 to 10) in each assemblage are plotted in Fig. 9. On average, apart from some differences rank by rank, the abundances in both assemblages are relatively similar. In fact, however, the comparison is appreciably biased by the substantial difference in true species richness $S_t$ between the two assemblages, as already underlined above. Thus, in the comparison, the abundances at "Sankosh River" (twice as species-rich as "Manas Range") might be considered as "disfavoured" due to the higher number of co-occurring species at "Sankosh". Accordingly, to set aside the effect of the difference in total species richness, abundances can be rationalised by reference to an appropriate "null" model, which accounts for the specific contribution of total species richness alone. The rationalisation of abundances thus makes it possible to identify and evaluate separately the respective contributions to the hierarchical structuring of species abundances of (i) the total species richness and (ii) the bulk of the other ecological factors, which are those of specific interest.
The rationalised abundance values $I_{pr}$ (see equation (8)), computed by reference to the "broken-stick" model, are plotted in Fig. 10: $I_{pr} = a_i / r_i = a_i / [(1/S_t) \sum_{n=i}^{S_t} (1/n)]$, with $S_t = 281$ for Sankosh and $S_t = 119$ for Manas. This rationalisation of abundances reveals that the processes at work in both assemblages, leading to the hierarchical structuring of abundances, are considerably stronger (≈ 3 times) at "Sankosh" than at "Manas", once the specific effect of the difference in species richness has been deducted. This, indeed, could not have been suspected from looking at the crude data of Fig. 9 alone. Hence the interest of this rationalisation with respect to "null" models.
The rationalised abundance values $I_{pe}$ (see equation (7)), computed by reference to the other "null" model, the "ideally even abundance distribution" ($I_{pe} = a_i / e_i = a_i / (1/S_t)$), are plotted in Fig. 11. The highlighted trend remains quite similar to that derived from rationalisation to the "broken-stick" model.

Fig. 10. The normalised abundances of the ten first (most frequent) species in the butterfly assemblages at "Manas Range" and at "Sankosh River" (Bhutan), after rationalisation of the corresponding abundances by reference to the "broken-stick" model. The rationalisation relevantly cancels the specific contribution of the difference of species richness (twice larger at "Sankosh River"). Axes: "rationalised" species abundance against species rank $i$; series: Manas, Sankosh.

Fig. 11. The normalised abundances of the ten first (most frequent) species in the butterfly assemblages at "Manas Range" and at "Sankosh River" (Bhutan), after rationalisation of the corresponding abundances by reference to the "ideally even abundance distribution" model. The rationalisation relevantly cancels the specific contribution of the difference of species richness (twice larger at "Sankosh River").
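The two rationalisations described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `broken_stick` and `rationalise` are hypothetical, and the abundances passed in the usage example are made-up values, not the recorded data of Fig. 9. The broken-stick reference is taken in its usual form, $r_i = (1/S_t) \sum_{n=i}^{S_t} (1/n)$, and the even reference as $e_i = 1/S_t$.

```python
from typing import List, Sequence, Tuple


def broken_stick(s_t: int) -> List[float]:
    """Expected relative abundances r_i (ranks 1..s_t) under the broken-stick model."""
    r = [0.0] * s_t
    tail = 0.0
    # Accumulate the partial sums of 1/n from the rarest rank (n = s_t) upward,
    # so that r_i = (1/s_t) * sum_{n=i}^{s_t} 1/n.
    for i in range(s_t, 0, -1):
        tail += 1.0 / i
        r[i - 1] = tail / s_t
    return r


def rationalise(abundances: Sequence[float], s_t: int) -> Tuple[List[float], List[float]]:
    """Return (Ipr, Ipe) for abundances given in decreasing rank order (rank 1 first)."""
    r = broken_stick(s_t)
    ipr = [a / r[i] for i, a in enumerate(abundances)]  # reference: broken-stick
    ipe = [a * s_t for a in abundances]                 # reference: even model, e_i = 1/s_t
    return ipr, ipe


# Hypothetical relative abundances of the three top-ranked species (illustration only).
example = [0.09, 0.07, 0.05]
for s_t in (119, 281):
    ipr, ipe = rationalise(example, s_t)
    print(s_t, [round(x, 2) for x in ipr], [round(x, 2) for x in ipe])
```

With identical raw abundances, both indices come out larger for the richer assemblage, which is the effect the rationalisation is meant to expose.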

DISCUSSION
Species richness is often considered as the major numerical parameter dedicated to characterize a community of species [34][35][36][37]. Yet, an ecological community is not simply a collection of species, the number of which would suffice to summarize all that can be said about this community. The particular pattern of the species abundance distribution within the community admittedly conveys a great deal of additional information about its internal structure and functionality [3][4][5][8][9][10]38]. This is why incomplete "S.A.D.s", resulting from only partial inventories, would remain deprived of a significant part of valuable data, unless they are properly extrapolated. And, first of all, a reliable evaluation of the number of still unrecorded species and, thereby, a reliable estimation of the total species richness of the focused assemblage of species is needed.

A relevant procedure of estimation of the number of unrecorded species is a prerequisite to the appropriate extrapolation of "S.A.D.s"
As mentioned in the Introduction, Chao et al. [21] have already derived a method to extrapolate Species Abundance Distributions. The procedure advocated by these authors relies upon two main assumptions:
1) that the Chao 1 estimator of the number of still unrecorded species provides relevant estimates of the true value, which supposes either (i) unusually even distributions of species abundances [18,39], which, unfortunately, almost never occur in practice, and/or (ii) species inventories having already reached a level of completeness close to exhaustivity [15][16][17]. This second requirement is also quite difficult to satisfy in practice, at least with highly species-rich assemblages which, most often, are the subject of only "rapid surveys" and thus remain substantially incomplete. Table 1 shows more precisely that the "Chao" estimator becomes appropriate only when sampling completeness reaches at least 95%, a level seldom reached in common practice. Brose et al. [15,16] even went so far as to discard the "Chao" estimator whatever the level of sampling completeness, substituting the Jackknife-1 estimator at the highest completeness levels. Improper selection of the type of estimator generally results in considerable bias in the estimation of the number of unrecorded species, as shown, for example, in Fig. 12.
As the reliable estimation of the number of unrecorded species is crucial for the relevant computation of the extrapolation of "S.A.D.s", the arguments above should draw attention to what could well be a severe limitation of the range of application of the method proposed by Chao et al. [21], as indeed suggested by the authors themselves, in a somewhat ambiguous manner: only "when sample size is large enough, this lower bound approaches the true number of undetected species" [21, p. 1195].
2) that the extrapolated part of the "S.A.D.s" follows a log-linear trend which, although described as "natural", is far from being adequate in most circumstances [3,23]. Indeed, the models commonly taken as reference, such as the lognormal, broken-stick and double-geometric series, all end with a more or less pronounced downward curvature in the log-transformed representation.
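To make the dependence on the choice of estimator concrete, the sketch below computes the number of unrecorded species according to two of the nonparametric estimators named above. The formulas are the textbook forms (Chao 1 as $f_1^2/(2 f_2)$, first-order Jackknife as $f_1 (N-1)/N$), which are assumed here and not quoted from this article; the function names are hypothetical, and the doubleton count $f_2 = 9$ is an invented value, since only $f_1 = 17$ and $N_0 = 1319$ are reported for "Manas Range".

```python
def chao1_unrecorded(f1: int, f2: int) -> float:
    """Chao 1 estimate of the number of unrecorded species (textbook form, assumed)."""
    if f2 > 0:
        return f1 * f1 / (2.0 * f2)
    # Usual bias-corrected fallback when no doubleton was recorded.
    return f1 * (f1 - 1) / 2.0


def jackknife1_unrecorded(f1: int, n: int) -> float:
    """First-order Jackknife estimate of the number of unrecorded species (assumed form)."""
    return f1 * (n - 1) / n


f1, n0 = 17, 1319   # reported for "Manas Range"
f2 = 9              # hypothetical doubleton count, for illustration only

print("Chao 1     :", round(chao1_unrecorded(f1, f2), 2))
print("Jackknife-1:", round(jackknife1_unrecorded(f1, n0), 2))
```

Both estimators rely on the same handful of rare-species counts, so that a different choice of estimator translates directly into a different length of the extrapolated part of the "S.A.D.".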

CONCLUSION
When dealing with partial inventories, as is most often the case in practice, it is highly desirable to try to extrapolate the Species Abundance Distribution beyond its recorded part alone. As discussed and illustrated above, completing Species Abundance Distributions by proper extrapolation has, indeed, major implications in both descriptive and functional perspectives (pattern and process).
A first attempt in this direction was achieved by Chao et al. [21], but it likely remains reserved to the all too scarce species inventories already enjoying "large enough sampling size", that is, in fact, a completeness level of at least 95%.
Otherwise, when sampling completeness is less than ≈ 95%, which indeed encompasses the great majority of cases in practice, an alternative approach should be considered, implying:
i. to select first of all, in each case, the least-biased type among the classically available types of nonparametric estimators of the number of unrecorded species;
ii. then, to compute the least-biased estimation of the number of still unrecorded species and derive, accordingly, the corresponding least-biased extrapolation of the Species Accumulation Curve, associated to the selected estimator;
iii. at last, to derive, from the latter, the related extrapolation of the Species Abundance Distribution, which will thereby benefit from a minimised level of bias.

Preliminary: the sum of the abundances of the unrecorded species
According to the two first equations of Appendix 1 in [19,20,42]:
- the expected number $\Delta(N)$ of still unrecorded species in a sample of size $N$ is:
$$\Delta(N) = \sum_i (1 - p_i)^N$$
where $p_i$ is the proportional abundance (identified to the probability of drawing during sampling) of species 'i' and $\sum_i$ is the summation extended to the totality of the $S_t$ species 'i' present in the sampled assemblage;
- the expected number $f_x$ of species recorded $x$ times in a sample of size $N$ is:
$$f_x = \binom{N}{x} \sum_i p_i^{x} (1 - p_i)^{N-x}$$
and accordingly, for $x = 1$, the expected number $f_1$ of singletons is:
$$f_1 = N \sum_i p_i (1 - p_i)^{N-1}$$
Now, the expected value $A_{u(N)}$ of the cumulated abundances of the $\Delta(N)$ still unrecorded species in a sample of size $N$ is:
$$A_{u(N)} = \sum_i p_i (1 - p_i)^{N}$$
Accordingly,
$$A_{u(N-1)} = \sum_i p_i (1 - p_i)^{N-1} = f_1 / N$$
As, in practice, samplings of interest have sizes $N$ ($N$ recorded individuals) considerably larger than the species richness of the sampled assemblage, the difference between $\Delta(N)$ and $\Delta(N-1)$ is quite negligible, as is the difference between $A_{u(N)}$ and $A_{u(N-1)}$, so that, with a very good approximation:
$$A_{u(N)} = f_1 / N \quad \text{(A1.5)}$$
Accordingly, the cumulated abundances $A_{r(N)}$ of the already recorded species is the complement to 1 of $A_{u(N)}$, that is:
$$A_{r(N)} = 1 - f_1 / N$$
This general relationship was originally derived by A. TURING [43].
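The approximation (A1.5) can be checked numerically. The sketch below assumes the usual multinomial sampling expectations, namely $N \sum_i p_i (1-p_i)^{N-1}$ for the expected number of singletons and $\sum_i p_i (1-p_i)^{N}$ for the expected cumulated abundance of the unrecorded species. The function names and the test assemblage (200 species with abundances proportional to 1/rank) are hypothetical choices made for illustration.

```python
from typing import Sequence


def expected_singletons(p: Sequence[float], n: int) -> float:
    """Expected number f1 of species recorded exactly once in a sample of size n."""
    return n * sum(pi * (1.0 - pi) ** (n - 1) for pi in p)


def expected_unrecorded_mass(p: Sequence[float], n: int) -> float:
    """Expected cumulated abundance A_u(n) of the species absent from a sample of size n."""
    return sum(pi * (1.0 - pi) ** n for pi in p)


def turing_unrecorded_mass(f1: float, n: int) -> float:
    """Approximation (A1.5): A_u(N) = f1 / N."""
    return f1 / n


# Hypothetical assemblage: 200 species, abundances proportional to 1/rank.
weights = [1.0 / k for k in range(1, 201)]
total = sum(weights)
p = [w / total for w in weights]
n = 500

exact = expected_unrecorded_mass(p, n)
approx = turing_unrecorded_mass(expected_singletons(p, n), n)
print("A_u(N) exact :", exact)
print("f1/N         :", approx)

# With the values reported for "Manas Range" (f1 = 17, N0 = 1319):
print("Manas A_u    :", turing_unrecorded_mass(17, 1319))
```

For "Manas Range", the ratio 17/1319 places the cumulated abundance of the unrecorded species at roughly 1.3% of the assemblage.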

Correction to be applied when estimating the true abundance of species, based on the corresponding recorded frequency of occurrence, in a sample of finite size
Consider a sample of size $N$ ($N$ recorded individuals) with $R(N)$ recorded species, among which a number $f_1$ are singletons (species recorded only once). Let $p_i = n_i/N$ be the frequency of occurrence of species 'i', and let $a_i$ be the true proportional abundance of this species in the sampled community. A way to evaluate the bias of the recorded frequency $p_i$ relative to the corresponding true abundance $a_i$ is to consider a Bayesian inference based on the binomial distribution. Accordingly, the probability $\partial\pi_i$ that the abundance of species 'i' is comprised between $a$ and $a + \partial a$ is:
$$\partial\pi_i \propto a^{n_i} (1 - a)^{N - n_i} \, \partial a$$
The probability $\pi_i(a)$ reaches its maximum (modal) value for $a = p_i$ ($= n_i/N$), as is easily demonstrated. The average value $\tilde{a}_i$ of $a_i$, which will be considered as providing the least-biased evaluation of the true abundance $a_i$, is computed as follows.
In a first step, and accounting for equation (A1.7), $\tilde{a}_i$ is identified, for large $N$, to:
$$\tilde{a}_i \approx (n_i + 1)/N = p_i (1 + 1/n_i)$$
Summed over the $R$ recorded species, these first-step values amount to $1 + R/N$, whereas the cumulated abundances of the recorded species should equal $1 - f_1/N$. It then follows that a standardisation coefficient $(1 - f_1/N)/(1 + R/N)$ is to be applied to the preceding first-step evaluation of $\tilde{a}_i$. In the frame of the Bayesian approach, this standardisation coefficient corresponds to the initial setting of the so-called "a priori probabilities". The bias correction applied to $p_i$ to obtain the true abundance estimates $\tilde{a}_i$ thus includes: (i) the correction $(1 + 1/n_i)/(1 + R/N)$ for the bias resulting from the finite size $N$ of the sample, a bias which cancels, as expected, when $N$ (and thus also $n_i = N p_i$) tends to infinity; (ii) the correction $(1 - f_1/N)$ resulting from the existence of the set of still unrecorded species, which cancels, as expected, when sampling reaches exhaustivity, that is, when $f_1$ falls to zero.
Note that the estimated true abundances are less scattered than are the recorded frequencies.
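The combined correction can be illustrated as follows. This is a minimal sketch, assuming that the two corrections listed above multiply the recorded frequency, i.e. $\tilde{a}_i = p_i (1 + 1/n_i)(1 - f_1/N)/(1 + R/N)$. The function names are hypothetical and the list of counts is an invented toy sample, not field data.

```python
from typing import List, Sequence


def standardisation_coefficient(n_total: int, r: int, f1: int) -> float:
    """Coefficient (1 - f1/N) / (1 + R/N) applied to the first-step estimates."""
    return (1.0 - f1 / n_total) / (1.0 + r / n_total)


def corrected_abundances(counts: Sequence[int]) -> List[float]:
    """Bias-corrected abundance estimates from the recorded counts n_i (all >= 1)."""
    n_total = sum(counts)                      # N, number of recorded individuals
    r = len(counts)                            # R, number of recorded species
    f1 = sum(1 for c in counts if c == 1)      # number of singletons
    coeff = standardisation_coefficient(n_total, r, f1)
    # p_i * (1 + 1/n_i) equals (n_i + 1) / N
    return [(c / n_total) * (1.0 + 1.0 / c) * coeff for c in counts]


# Toy sample (hypothetical): six species, three of them singletons.
counts = [10, 5, 2, 1, 1, 1]
estimates = corrected_abundances(counts)
print([round(a, 4) for a in estimates])
print("sum of estimates:", sum(estimates))   # equals 1 - f1/N up to rounding

# Coefficient for "Manas Range" (N0 = 1319, R0 = 91, f1 = 17):
print(standardisation_coefficient(1319, 91, 17))
```

In the toy sample, the ratio between the most and the least abundant recorded species drops from 10 (recorded frequencies) to 5.5 (corrected estimates), consistent with the reduced scatter noted above, and the estimates sum to $1 - f_1/N$, leaving $f_1/N$ for the unrecorded species.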