On general mathematical constraints applying to the kinetics of species discovery during progressive sampling: consequences on the theoretical expression of the Species Accumulation Curve.

Hereafter, I first derive the correlative relationships that constrain the theoretical expression of the Species Accumulation Curve, and I then show how they link together the variations of the numbers of species respectively recorded 1, 2, 3, ..., x times and their cumulative contributions to the Species Accumulation Curve. This, in turn, provides suggestive insights into the remarkably regulated mechanism of species discovery and accumulation during progressive sampling effort.


INTRODUCTION
The process of continuous discovery of new species during progressive sampling of an assemblage of species is expressed graphically in terms of the so-called "Species Accumulation Curve", formerly also designated as "Discovery Curve" or "Collector Curve" [1,2]. The Species Accumulation Curve is the basic tool that is systematically referred to when dealing with inventories of biodiversity [2][3][4][5][6][7][8].
Species Accumulation Curves are quite polymorphic, apart from some basic and intuitive traits shared by all of them (monotonic increase of the number of recorded species with sampling size, at a consistently decreasing rate; see Fig. 1 for an example). This polymorphism of the detailed shape of Species Accumulation Curves results from their close dependence upon the particular species abundance distribution within the sampled assemblage of species. Accordingly, there are virtually as many different shapes of Species Accumulation Curves as there are species assemblages differing from each other in their species richness and/or their distribution of species abundances. In spite of these causes of polymorphism, the theoretical expressions of all Species Accumulation Curves are compelled to satisfy a common constraining mathematical relationship, which applies to the whole series of their successive derivatives. This constraining relationship explicitly sets the boundaries of the otherwise wide range of polymorphism mentioned above. From a more practical point of view, accounting for this constraining relationship is also of major importance for improving the accuracy of extrapolations of the species accumulation process beyond the sampling actually achieved. More precise estimates of total species richness, and more reliable predictions of the additional sampling effort needed to achieve a given increase in sample completeness, are thereby made possible (details in reference [9]). Returning to more theoretical ground, several corollaries derived from this fundamental relationship also provide useful insights into the details of the complex process of species discovery during progressive sampling.
Let R(N) be the number of recorded species after sampling N individuals (N thus quantifies the sampling size). R(N) results from the additive contributions of the numbers f_1(N), f_2(N), f_3(N), ..., f_x(N), ... of those species respectively recorded 1, 2, 3, ..., x times at the end of this sampling of size N:

$$R(N) = \sum_{x \ge 1} f_x(N) \tag{1}$$

Thereby, the Species Accumulation Curve reveals its "composite" dependence upon the whole series of the f_x(N). This composite dependence is made still more complex by the fact that each function f_x(N) has its own dependence upon N. Yet the mutual independence of the f_x(N) is not total: a kind of regulation links, step by step, the respective variations of the successive functions f_x(N), as will be shown later. This regulation is at the heart of the mechanism of progressive species discovery and accumulation, and it plays a decisive role in shaping the Species Accumulation Curve.
The main purpose of this article is precisely to highlight the mathematics underlying this regulation by mutual linkage between the successive f_x(N). This, in turn, will provide a deeper understanding of the fundamentals of the species accumulation process during progressive sampling.
Deriving the mathematical constraints that regulate the theoretical expression of any Species Accumulation Curve along progressive sampling is of prime importance, not only at the theoretical level but also from more practical points of view. In particular, accounting for these mathematical constraints is necessary to reliably extrapolate the Species Accumulation Curve beyond the actually achieved sampling size of incomplete species inventories. Extrapolation makes it possible to estimate accurately the total species richness of only partially sampled species assemblages and to predict properly the level of additional sampling effort needed to improve the degree of sampling completeness. This is all the more important in practice because dealing with incomplete inventories is fast becoming a general issue for an increasing share of local or regional biodiversity surveys worldwide, as more and more speciose and complex taxonomic groups are progressively addressed.

The Fundamental Mathematical Relationship Constraining the Theoretical Expression of all Species Accumulation Curves
The successive derivatives ∂^x R(N)/∂N^x of the Species Accumulation Curve R(N) satisfy the following equation:

$$\frac{\partial^x R(N)}{\partial N^x} = (-1)^{x-1}\, \frac{f_x(N)}{C_{N,x}} \tag{2}$$

where f_x(N) is the number of species recorded x-times in the sample of size N and C_{N,x} = N!/x!/(N-x)! is the number of combinations of x items among N. A detailed proof of this general theorem is provided in Appendix A.1.
Leaving aside the very beginning of sampling (of no practical relevance here), the sampling size N rapidly and widely exceeds the numbers x of practical concern, so that, in practice, the preceding equation simplifies to:

$$\frac{\partial^x R(N)}{\partial N^x} \approx (-1)^{x-1}\, \frac{x!}{N^x}\, f_x(N) \tag{3}$$

This relation has general relevance because its derivation does not require any specific assumption about the particular shape of the distribution of species abundances in the sampled assemblage of species. Accordingly, equations (2) and (3) constrain the theoretical expression of any kind of Species Accumulation Curve.
One particular consequence of this relationship is that the successive derivatives of the Species Accumulation Curve have alternating signs, since the numbers f_x(N) of species recorded x-times are necessarily positive or zero. More precisely, the derivatives of even order are negative (or zero) and those of odd order are positive (or zero).
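As a rough numerical illustration of equation (3) and of the alternation of signs, the sketch below uses the binomial expectation of f_x(N) recalled in Appendix A.1, with R(N) = S - Σ_i (1 - p_i)^N. The assemblage (60 species with geometrically decreasing abundances), the sample size N = 300 and the function names are assumptions chosen only for the example.

```python
import math

# Hypothetical assemblage: 60 species, geometrically decreasing relative abundances.
weights = [0.97 ** i for i in range(60)]
p = [w / sum(weights) for w in weights]

def expected_f(x, n):
    """Expected number of species recorded x times among n individuals (binomial model)."""
    return math.comb(n, x) * sum(pi ** x * (1 - pi) ** (n - x) for pi in p)

def derivative_of_R(x, n):
    """x-th derivative of R(N) = S - sum_i (1 - p_i)^N, evaluated analytically at N = n."""
    return -sum(math.log(1 - pi) ** x * (1 - pi) ** n for pi in p)

N = 300
for x in (1, 2, 3, 4):
    lhs = derivative_of_R(x, N)                                            # left side of (3)
    rhs = (-1) ** (x - 1) * math.factorial(x) / N ** x * expected_f(x, N)  # right side of (3)
    print(x, f"{lhs:+.3e}", f"{rhs:+.3e}")
```

The two printed columns should share the same sign at each order and agree to within a few percent, which is the level of agreement expected from an approximation that holds for N much larger than x.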

THE MATHEMATICS UNDERLYING THE REGULATION PROCESS APPLYING TO THE NUMBERS f_x OF SPECIES RECORDED x-TIMES
From equation (3) it follows that:

$$f_x(N) = (-1)^{x-1}\, \frac{N^x}{x!}\, \frac{\partial^x R(N)}{\partial N^x} \tag{4}$$

Differentiating equation (4) with respect to sample size N then gives:

$$\frac{\partial f_x(N)}{\partial N} = (-1)^{x-1} \left[ \frac{N^{x-1}}{(x-1)!}\, \frac{\partial^x R(N)}{\partial N^x} + \frac{N^x}{x!}\, \frac{\partial^{x+1} R(N)}{\partial N^{x+1}} \right] \tag{5}$$

that is, using equation (3) at orders x and x+1:

$$\frac{\partial f_x(N)}{\partial N} = \frac{x\, f_x(N) - (x+1)\, f_{x+1}(N)}{N} \tag{6}$$

Note that an alternative, independent demonstration of equation (6) is provided in Appendix A.2, equation A2.1.
Being a corollary of relationship (3) above, equation (6) benefits from the same general relevance and is thus valid for all kinds of Species Accumulation Curves. Equation (6) establishes a mathematical linkage between f_{x+1}(N) and the variations of f_x(N) with N. Thereby, all the f_x(N) are ultimately linked together by this "iterative chaining". In other words, although each function f_x(N) has its own dependence upon sampling size N, the series of the f_x(N) nevertheless admits a kind of connection which, one may say, "propagates" from each f_x(N) to the next one, f_{x+1}(N).
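The chaining expressed by equation (6) can be checked numerically. A minimal sketch follows, assuming the binomial expectation of f_x(N) from Appendix A.1 and a hypothetical five-species assemblage. Under that model the relation holds exactly when the derivative is replaced by the backward difference f_x(N) - f_x(N-1); this discrete form is a property of the binomial expectations used here, not a statement taken from the text.

```python
import math

# Hypothetical relative abundances of a five-species assemblage (they sum to 1).
p = [0.4, 0.3, 0.15, 0.1, 0.05]

def expected_f(x, n):
    """Expected number of species recorded x times among n individuals (binomial model)."""
    return math.comb(n, x) * sum(pi ** x * (1 - pi) ** (n - x) for pi in p)

N = 25
for x in (1, 2, 3):
    # Change of f_x caused by the N-th observation (discrete stand-in for the derivative).
    increment = expected_f(x, N) - expected_f(x, N - 1)
    # Right-hand side of equation (6).
    chained = (x * expected_f(x, N) - (x + 1) * expected_f(x + 1, N)) / N
    print(x, increment, chained)
```

For each x the two printed values should coincide up to floating-point rounding.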

Mathematical "Chaining" between the Successive Numbers f_x(N)
The consequence of this regulation may be grasped more easily graphically, by considering how the maximum of each f_x(N) is linked to the value taken by f_{x+1}(N) at the same sample size N. When f_x(N) reaches its maximum value, its first derivative ∂f_x(N)/∂N falls to zero and, accordingly, equation (6) gives:

$$f_{x+1}(N) = \frac{x}{x+1}\, f_x(N) \tag{7}$$

Thus, when f_x(N) reaches its maximum in the course of progressive sampling, the corresponding value taken by f_{x+1}(N) is exactly [x/(x+1)] times the (maximum) value taken by f_x(N). By iterating this relationship, a kind of "linkage pattern" is generated that constrains the relative locations of the successive curves f_x(N).
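The link between the maximum of f_x(N) and the value of f_{x+1}(N) at the same sample size can be illustrated as follows. This is a sketch under assumptions: the binomial expectation of Appendix A.1, a hypothetical assemblage of 60 species, and x = 1. Because N only takes integer values here, the ratio is expected to be close to x/(x+1), not exactly equal to it.

```python
import math

# Hypothetical assemblage: 60 species, geometrically decreasing relative abundances.
weights = [0.97 ** i for i in range(60)]
p = [w / sum(weights) for w in weights]

def expected_f(x, n):
    """Expected number of species recorded x times among n individuals (binomial model)."""
    return math.comb(n, x) * sum(pi ** x * (1 - pi) ** (n - x) for pi in p)

x = 1
sizes = range(2, 1001)
# Sample size at which the expected number of species recorded x times peaks.
n_star = max(sizes, key=lambda n: expected_f(x, n))
ratio = expected_f(x + 1, n_star) / expected_f(x, n_star)
print(n_star, ratio, x / (x + 1))
```

Changing `x` to 2 or 3 lets one follow the same pattern along the successive curves.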

Mathematical "Chaining" between the Successive Numbers x.f_x(N)
Alternatively, equation (7) may be written equivalently as:

$$(x+1)\, f_{x+1}(N) = x\, f_x(N) \tag{8}$$

Equation (8), like equation (7), holds when ∂f_x(N)/∂N = 0, and thus equally when ∂(x.f_x(N))/∂N = 0. It follows that the curve (x+1).f_{x+1}(N) intersects the curve x.f_x(N) exactly when the latter reaches its maximum value (i.e. when ∂(x.f_x(N))/∂N = 0): Fig. 4. Recall the significance of x.f_x(N), which is the total number of recorded individuals belonging to any of those species recorded x-times. The regularly repeated shift from each curve x.f_x(N) to the next one, (x+1).f_{x+1}(N), resulting from this regulating process (Fig. 4), is particularly demonstrative. It likely offers the best visual evidence of the sequential linkage between the successive numbers f_x(N).
Note, incidentally, that while the cumulative addition of all the f_x(N) leads to the number R(N) of recorded species (cf. equation (1)), the addition of all the x.f_x(N) leads "symmetrically" to the number N of recorded individuals:

$$\sum_{x \ge 1} x\, f_x(N) = N \tag{9}$$
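These two "symmetric" sums are easy to verify on any list of records. The sketch below uses a made-up sample of ten individuals; the species labels and the helper name `frequency_counts` are illustrative only.

```python
from collections import Counter

def frequency_counts(sample):
    """Map each x to f_x, the number of species recorded exactly x times."""
    records_per_species = Counter(sample)           # species -> number of records
    return Counter(records_per_species.values())    # x -> f_x

# Hypothetical sample of N = 10 individuals, each identified to species.
sample = ["a", "b", "a", "c", "d", "a", "b", "e", "f", "c"]
f = frequency_counts(sample)

recorded_species = sum(f.values())                          # sum of the f_x
recorded_individuals = sum(x * fx for x, fx in f.items())   # sum of the x.f_x

print(dict(f))
print(recorded_species, len(set(sample)))    # both equal R(N)
print(recorded_individuals, len(sample))     # both equal N
```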

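Mathematical "Chaining" between Each f_x(N) and the Series of the First Derivatives of All the Preceding f_x(N)
This is a third way to express the inter-relationship within the series of the f_x(N). Referring once more to equation (6), that is:

$$(x+1)\, f_{x+1}(N) - x\, f_x(N) = -N\, \frac{\partial f_x(N)}{\partial N}$$

let us now consider the successive forms taken by this equation for increasing values of x, starting from x = 0, where f_0(N) denotes the number of species not yet recorded (so that ∂f_0(N)/∂N = -∂R(N)/∂N).
It follows that:

$$\begin{aligned}
f_1(N) &= -N\, \frac{\partial f_0(N)}{\partial N} \\
2 f_2(N) - f_1(N) &= -N\, \frac{\partial f_1(N)}{\partial N} \\
3 f_3(N) - 2 f_2(N) &= -N\, \frac{\partial f_2(N)}{\partial N} \\
&\;\;\vdots \\
x f_x(N) - (x-1) f_{x-1}(N) &= -N\, \frac{\partial f_{x-1}(N)}{\partial N}
\end{aligned}$$

By summing these equations, the following relationship is immediately derived:

$$f_x(N) = -\frac{N}{x} \sum_{i=0}^{x-1} \frac{\partial f_i(N)}{\partial N} \tag{10}$$

with the summation Σ_i extended from i = 0 to i = (x - 1).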
That is, the number f_x(N) of species recorded x-times in a sample of size N is proportional [via the factor -(N/x)] to the sum of the first derivatives (with respect to N) of all the preceding f_i(N). In more practical terms, this means that the number f_x(N) is determined by the rates of change, with sampling size, of all the preceding f_i(N) (i < x). Another way to understand relation (10) results from rewriting it as follows:

$$\frac{x\, f_x(N)}{N} = -\sum_{i=0}^{x-1} \frac{\partial f_i(N)}{\partial N} \tag{11}$$

with the summation Σ_i extended from i = 0 to i = (x - 1).
This means that the proportion, among all the sampled individuals, of those that belong to any species recorded x-times [= (x.f_x(N))/N] is equal to minus the sum of the variations of all the preceding f_i(N) (i < x) when sampling size increases by one observation.
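This reading in terms of "variations when sampling size increases by one observation" can be tested directly. The sketch below assumes the binomial expectation of Appendix A.1, a hypothetical five-species assemblage, and takes f_0(N) to be the expected number of species still unrecorded. Each variation is computed as f_i(N) - f_i(N-1).

```python
import math

# Hypothetical relative abundances of a five-species assemblage (they sum to 1).
p = [0.4, 0.3, 0.15, 0.1, 0.05]

def expected_f(x, n):
    """Expected number of species recorded x times; x = 0 counts unrecorded species."""
    return math.comb(n, x) * sum(pi ** x * (1 - pi) ** (n - x) for pi in p)

N = 25
for x in (1, 2, 3, 4):
    # Proportion of sampled individuals belonging to species recorded x times.
    share = x * expected_f(x, N) / N
    # Sum of the variations of f_0 ... f_(x-1) caused by the N-th observation.
    variations = sum(expected_f(i, N) - expected_f(i, N - 1) for i in range(x))
    print(x, share, -variations)
```

Under this model the two printed columns agree up to rounding, because the sum over i telescopes term by term.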
Accordingly, relationships (10) and (11) both express, once again but in another way, the continuous linkage that exists between each f_x(N) and the whole series of its predecessors. They thereby highlight still more clearly the strong "chaining" between the successive numbers f_x(N), which together rule the kinetics of species accumulation during progressive sampling.
Still another remarkable relationship may be derived from equation (10), which this time involves only the first derivatives of the f_x(N).
Let X be the recorded number of individuals belonging to the species most frequently met in the sample under consideration. In other words, X is the largest value of x for which f_x(N) ≠ 0 in this particular sample. The sum of the numbers x.f_x(N) of sampled individuals that belong to any of those species recorded x-times, for x up to its maximum value X, is equal to N. Accordingly, the summation of equation (11) for x up to its maximum value X yields:

$$\sum_{x=1}^{X} \sum_{i=0}^{x-1} \frac{\partial f_i(N)}{\partial N} = -1 \tag{12}$$

with the summation Σ_x extended from x = 1 to x = X and the summation Σ_i extended from i = 0 to i = (x - 1). This finally leads to:

$$\sum_{i=0}^{X-1} (X - i)\, \frac{\partial f_i(N)}{\partial N} = -1$$

with the summation Σ_i extended from i = 0 to i = (X - 1).
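The passage from the double summation to the single one is an exchange of the order of summation. As a sketch, with a_i used here only as a shorthand for ∂f_i(N)/∂N: each a_i appears in the inner sum for every x from i + 1 to X, that is, (X - i) times.

```latex
\sum_{x=1}^{X} \sum_{i=0}^{x-1} a_i
  \;=\; \sum_{i=0}^{X-1} \; \sum_{x=i+1}^{X} a_i
  \;=\; \sum_{i=0}^{X-1} (X - i)\, a_i ,
\qquad a_i \equiv \frac{\partial f_i(N)}{\partial N}.
```

For X = 2, for instance, both sides reduce to 2a_0 + a_1.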

Butterfly Inventory on the Slopes of Mount Gariwang-san (S-Korea)
Field data from reference [10].

Butterfly Inventories at Bifeng Valley (Ghansu, China)
Field data from reference [11].

Now, let us consider, alternatively, a more usual and pragmatic approach, which pays attention only to those observations giving rise to the detection of a new species and accordingly neglects all the other observations (in spite of their equal role in the analytical approach considered above). In this purely "accounting" approach, the focus is put on the proportion p(N) = R(N)/N of those observations which have provided records of new species. In other words, instead of paying attention to R(N) = Σ_x f_x(N), as was the case previously, the focus is now placed upon:

$$p(N) = \frac{R(N)}{N} = \frac{1}{N} \sum_{x} f_x(N)$$

This proportion p(N) is of pragmatic interest in that it quantifies the gradual weakening of sampling efficiency, i.e. the ever-slowing rate of detection of newly recorded species as sampling proceeds.
As for the Species Accumulation Curve, the proportion p(N) of those observations providing positive records of new species is highly polymorphic and this polymorphism, here also, is limited by a constraining relationship applying to the expression of p(N).
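The gradual weakening of sampling efficiency can be visualised with a few lines of code. This is only a sketch: it assumes the expected accumulation curve R(N) = S - Σ_i (1 - p_i)^N of Appendix A.1 and a hypothetical assemblage of 60 species; `discovery_proportion` is an illustrative name for p(N).

```python
# Hypothetical assemblage: 60 species, geometrically decreasing relative abundances.
weights = [0.97 ** i for i in range(60)]
p = [w / sum(weights) for w in weights]

def expected_R(n):
    """Expected number of species recorded among n individuals."""
    return sum(1 - (1 - pi) ** n for pi in p)

def discovery_proportion(n):
    """p(N) = R(N)/N: share of the observations that yielded a new species."""
    return expected_R(n) / n

values = [discovery_proportion(n) for n in range(1, 501)]
print(values[0], values[99], values[499])   # p(1), p(100), p(500)
```

The first observation always yields a new species, so p(1) = 1, and the proportion then declines steadily as N grows.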
I derive below this general relationship which constrains the proportion p(N).
The derivation of R(N) yields, accounting for equation (13) and then equation (3) and more generally: At last, from equations (1)

DISCUSSION
Five main features emerge from the theoretical treatment (and the corresponding illustrative examples) regarding the variations, with sampling size N, of the numbers f_x(N) of species respectively recorded x-times during sampling. It should be well understood that these features, all derived on a theoretical basis, are focal tendencies towards which empirical data from real samplings converge, although such data may deviate more or less slightly from them owing to sampling stochasticity.
Two of these trends were expected, being in accordance with intuition: 1) All the numbers f_x(N) of species recorded x-times first increase, then pass through a maximum and finally decrease to zero. In addition, the curves describing the variations of each f_x(N) (and the positions of their respective maxima) are regularly shifted towards higher values of sampling size N as x takes increasing values (Fig. 2); 2) The same holds true, mutatis mutandis, for the numbers x.f_x(N) of those individuals belonging to any species recorded x-times, whatever the value of x. Now, three other trends, by no means intuitive, were newly derived above, as a consequence of the general mathematical relationship (6). The three latter trends are of major importance in that they determine the "chaining linkage" between the successive numbers f_x(N) of species recorded x-times. This matters because the successive numbers f_x(N) regulate the process of cumulative species discovery during progressive sampling.
As already stressed, the general mathematical relationship (6) x

CONCLUSION
The increasing number of newly recorded species (i.e. the "species accumulation") during progressive sampling gives rise to a rather simply shaped "Species Accumulation Curve". Paradoxically, this apparent simplicity gives no hint of the underlying complexity of the process of species discovery and accumulation detailed above. In fact, each new individual observation may result in one or another of a series of different consequences. More precisely, each observation of a new individual (i.e. N → N + 1) will increase by one unit either f_1(N), or f_2(N), f_3(N), ..., f_x(N), ... Now, although each of the numbers f_x(N) of species recorded x-times varies with N at its own pace and out of phase with the others (Fig. 2), the process of species accumulation nevertheless proves to be regulated, owing to the above-mentioned "chaining linkage" between the successive f_x(N) (Figs. 4, 5, 6). This linkage is at the very heart of the detailed process of species discovery and accumulation during progressive sampling. The process is of major practical importance since it is involved in all biodiversity surveys and, more specifically, in the accurate extrapolation of the Species Accumulation Curve. Accurate extrapolation, in turn, determines the precise estimate of the total species richness of a partially sampled assemblage of species and allows the reliable prediction of the additional sampling effort required to obtain a given increase in sample completeness.
The constraining mathematical relationships highlighted above are summarized as follows:

ACKNOWLEDGEMENTS
To Nick GOTELLI for his stimulating appreciation regarding my derivation of the general relationship constraining the successive derivatives (and thereby the shape) of the theoretical expression of all kinds of Species Accumulation Curves. Four anonymous reviewers are also acknowledged for their relevant comments.

A.1 - Derivation of the constraining relationship between ∂^x R(N)/∂N^x and f_x(N)
The shape of the theoretical Species Accumulation Curve is directly dependent upon the particular Species Abundance Distribution (the "S.A.D.") within the sampled assemblage of species. That means that, beyond the general traits shared by all Species Accumulation Curves, each particular species assemblage gives rise to a specific Species Accumulation Curve with its own unique shape when considered in detail. Now, it turns out that, in spite of this diversity of particular shapes, all Species Accumulation Curves are nevertheless constrained by one and the same mathematical relationship that rules their successive derivatives (and thereby rules the details of the curve shape, since the successive derivatives together define the local shape of the curve in every detail). Moreover, it turns out that this general mathematical constraint relates bi-univocally each derivative at order x, ∂^x R(N)/∂N^x, to the number f_x(N) of species recorded x-times. This fundamental relationship may be derived as follows.
Let us consider an assemblage of species containing an unknown total number 'S' of species. Let R be the number of recorded species in a partial sampling of this assemblage comprising N individuals. Let p_i be the probability of occurrence of species 'i' in the sample. This probability is assimilated to the relative abundance of species 'i' within this assemblage or to the relative incidence of species 'i' (its proportion of occurrences) within a set of sampled sites. The number Δ of missed species (unrecorded in the sample) is Δ = S - R.
The estimated number Δ of those species that escape recording during sampling of the assemblage is a decreasing function Δ(N) of the sample size N, which depends on the particular distribution of species abundances p_i:

$$\Delta(N) = \sum_i (1 - p_i)^N$$

with Σ_i denoting summation over all the 'S' species 'i' in the sampled assemblage (either recorded or not). The expected number f_x of species recorded x times in the sample is then, according to the binomial distribution:

$$f_x = C_{N,x} \sum_i p_i^{\,x}\, (1 - p_i)^{N-x}$$

We shall now derive the relationship between the successive derivatives of R(N), the theoretical Species Accumulation Curve, and the expected values for the series of 'f_x'.
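The two expectations above translate directly into code. The following minimal sketch uses a hypothetical five-species assemblage and illustrative function names. It checks that the expected numbers of species recorded 1, 2, ..., N times add up to the expected number of recorded species, R = S - Δ(N).

```python
import math

# Hypothetical relative abundances p_i of a five-species assemblage (they sum to 1).
p = [0.4, 0.3, 0.15, 0.1, 0.05]
S = len(p)

def expected_missed(n):
    """Delta(N): expected number of species left unrecorded after n individuals."""
    return sum((1 - pi) ** n for pi in p)

def expected_f(x, n):
    """Expected number of species recorded exactly x times (binomial distribution)."""
    return math.comb(n, x) * sum(pi ** x * (1 - pi) ** (n - x) for pi in p)

N = 20
R = S - expected_missed(N)                               # expected recorded species
total = sum(expected_f(x, N) for x in range(1, N + 1))   # f_1 + f_2 + ... + f_N
print(R, total)
```

The agreement simply reflects the fact that, for each species, the binomial probabilities of being recorded 0, 1, ..., N times sum to one.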
According to equation (A1. Equation (A1.13) makes quantitatively explicit the dependence of the shape of the species accumulation curve (expressed by the series of the successive derivatives [∂^x R(N)/∂N^x] of R(N)) upon the shape of the distribution of species abundances in the sampled assemblage of species.

A.2 - An alternative derivation of the relationship between ∂^x R(N)/∂N^x and f_x(N)
Consider a sample of size N (N individuals collected) extracted from an assemblage of S species, and let G_i be the group comprising those species collected i-times and f_i(N) their number in G_i. The number of collected individuals in group G_i is thus i.f_i(N), that is, a proportion i.f_i(N)/N of all individuals collected in the sample. Now, each newly collected individual will belong either to a new species (probability 1.f_1/N = f_1/N) or to an already collected species (probability 1 - f_1/N), according to reference [12]. In the latter case, the proportion i.f_i(N)/N of individuals within the group G_i accounts for the probability that the newly collected individual will increase by one the number of species that belong to the group G_i (that is, will generate a transition [i-1 → i] under which the species to which it belongs leaves the group G_{i-1} to join the group G_i). Likewise, the probability that the newly collected individual will reduce by one the number of species that belong to the group G_i (that is, will generate a transition [i → i+1] under which the species leaves the group G_i to join the group G_{i+1}) is (i+1).f_{i+1}(N)/N. Accordingly, for i > 1: