Finding confidence limits on population growth rates: Bootstrap and analytic methods

When predicting population dynamics, the value of the prediction is not enough and should be accompanied by a conﬁdence interval that integrates the whole chain of errors, from observations to predictions via the estimates of the parameters of the model. Matrix models are often used to predict the dynamics of age-or size-structured populations. Their parameters are vital rates. This study aims (1) at assessing the impact of the variability of observations on vital rates, and then on model’s predictions, and (2) at comparing three methods for computing conﬁdence intervals for values predicted from the models. The ﬁrst method is the bootstrap. The second method is analytic and approximates the standard error of predictions by their asymptotic variance as the sample size tends to inﬁnity. The third method combines use of the bootstrap to estimate the standard errors of vital rates with the analytical method to then estimate the errors of predictions from the model. Computations are done for an Usher matrix models that predicts the asymptotic (as time goes to inﬁnity) stock recovery rate for three timber species in French Guiana. Little difference is found between the hybrid and the analytic method. Their estimates of bias and standard error converge towards the bootstrap estimates when the error on vital rates becomes small enough, which corresponds in the present case to a number of observations greater than 5000 trees.


Introduction
Matrix models are population dynamics models for structured populations [1].They have been widely used to address the conservation and management of animal or plant species [2][3][4][5][6].Structured populations are described by the number of individuals in each category of the structuring variable (denoted by vector NðtÞ).Transition rates between categories are gathered into a transition matrix A. Time is discrete, and the state of the population at time t is related to that at time t þ 1 by the recurrence relationship: Nðt þ 1Þ¼ANðtÞ.Given an initial composition of the population, Nð0Þ, the composition any time later can be predicted as: NðtÞ¼A t Nð0Þð 1Þ As t tends to infinity, the proportions of individuals in the different categories will grow at an exponential rate, k, which is the dominant eigenvalue of A [1].The population growth rate k is a model's prediction that is often use to assess if the population is decaying ðk < 1Þ or increasing ðk > 1Þ.
Two types of matrix models are most often distinguished by biologists depending on whether the structuring variable is age or ontogenetic stage [7].The former corresponds to Leslie models whereas the latter corresponds to Lefkovitch models [8,9].The Lefkovitch matrix is the most general.The Leslie matrix has non null elements on its first row and on its sub-diagonal.A third type of matrix model that is familiar to foresters is the Usher model, for size-structured populations [10,11].The Usher model has non null elements on its first row, its main diagonal and its sub-diagonal.
The parameters of the transition matrix A are composed of individual growth, survival or fecundity rates, and are thus known as vital rates.As vital rates are estimated from experimental or census data, their estimates are subject to uncertainties.Then any prediction of the matrix model is also subject to uncertainty.This sampling uncertainty has often been disregarded [12] in preference to other sources of uncertainty such as environmental or demographic variability [3,5,13].Yet assessing sampling uncertainty is essential to the statistical reasoning, including statistical tests and confidence intervals.
From a statistical point of view, the estimates of the vital rates are realized values of random variables whose distribution depends on the distribution of experimental or census data.The prediction of the matrix model is a function of the vital rates, and its estimator is obtained by plugging the estimator of the vital rates into this function (Fig. 1).Difficulties arise because the function is often non-linear, thus leading to complicated relationships between the distribution of the vital rates and the distribution of the predicted quantity.
Alvarez and Slatkin [14] have reviewed three methods for computing sampling uncertainty on matrix models predictions: the analytic method, the Monte Carlo method, and resampling methods (including bootstrap, jackknife and related methods).The analytic method relies on a second order Taylor expansion of the predicted quantity with respect to the vital rates, and is approximate.Furthermore it requires that the moments of the vital rates are known.Monte Carlo and resampling methods substitute intensive computer simulations to approximation.The Monte Carlo method is based on the assumption that the distribution of vital rates is known, whereas it is supposed for resampling methods that a reference dataset is available.
Comparisons between the Monte Carlo method and the analytic method have been performed [15][16][17] and it was concluded that the analytic method is reliable if errors on vital rates remain small.These studies focused on the last arrow of the diagram shown in Fig. 1: they supposed that the uncertainty on the vital rates was known and assessed how this uncertainty contributed to the model's prediction.No link was made with the sampling variability of observations (the first two arrows of the diagram in Fig. 1).Comparisons between the bootstrap and the jackknife have also been performed (see references in [1,14]) and it was concluded that bootstrap should be preferred.However, as far as we know, no comparison has been made between the bootstrap method and the analytic method, integrating the whole chain of uncertainty from observations to predictions as shown in Fig. 1.
This study aims at comparing the analytic method and bootstrap to infer the bias, standard error and confidence interval of the population growth rate predicted by a matrix model for a size-structured population.A hybrid method, using the bootstrap to infer the uncertainty on vital rates from observations, and then the analytical method to incorporate this uncertainty into the model's prediction, will also be compared.These methods will be used to assess the sustainability of logging scenarios for the three major timber species in French Guiana.

Study site and focus species
Data for this study comes from the Paracou experimental site (5°18 0 N, 52°23 0 W) in French Guyana.Paracou is an experimental site dedicated to studying the effects of logging damage on stock recovery.The site lies in a terra firme rain forest on the coastal plain with equatorial climate.A dry season occurs from August to mid-November.From March to April, a short drier period interrupts the rainy season.The physiography of the site shows smooth slopes incised by minor streams.Part of the site is covered by permanently waterlogged areas.The experimental design of the site consists of three blocks of four 300 Â 300 m permanent sample plots with a 25 m inner buffer zone.In each central 250 Â 250 m square, all trees over 10 cm dbh (diameter at breast height) were identified and georeferenced.Since 1984, girth at breast height, standing deaths, treefalls and newly recruited trees over 10 cm dbh have been monitored annually [18].Three species were selected: angelique (Dicorynia guianensis Amshoff, Caesalpiniace), pink gonfolo (Qualea rosea Aublet, Vochysiace) and grey gonfolo (Ruizterania albiflora (Warming) Marcano-Berti, Vochysiace).These three species are the most important timber species in French Guiana.Angelique alone represents 34% of the total wood production whereas the two gonfolos represent 32% of it [19].Angelique is a large canopy tree species endemic to the Guiana shield.Its ecological characteristics are described in [20].Angelique trees are usually felled above a diameter at breast height (dbh) of 60 cm and natural populations are exploited with a 40-year felling cycle.Pink and grey gonfolos are also large canopy trees that are found on the Atlantic coast of South America.Their ecological characteristics are described in [21,22].
Data from 1992 to 1994 were used for this study.The time interval for subsequent computations thus is 2 years, and the length of the felling cycle corresponds to 20 time intervals.The number of observations was 467 for angelique, 484 for pink gonfolo, and 53 for grey gonfolo.Trees were distributed in diameter classes with equal width and highest bound 60 cm (which is the felling threshold).The number of classes was computed using Sturges' [23] formula.Hence angelique and pink gonfolo trees were distributed in m ¼ 8 diameter classes with breakpoints 10-17.1, 17. 1-24.3, 24.3-31.4, 31.4-38.6, 38.6-45.7,45.7-52.9,52.9-60 cm and P60 cm, whereas grey gonfolo trees were distributed in m ¼ 5 diameter classes with breakpoints 10-22.5, 22.5-35.0,35.0-47.5, 47.5-60 and P60 cm.

From observations to population growth rate
In this section, all the elements that compose the diagram shown in Fig. 1 are specified, starting from the matrix model up to the distribution of observations.

Usher model
The Usher model is a matrix model for size-structured populations [10,11].Size is divided into m size classes.The special feature of Usher models is Usher's hypothesis, that states that the dynamics of an individual between times t and t þ 1 comprises three, and only three, options: it stays alive in the same class, it stays alive and moves up to the next class, or its dies; moving up by more than one class or moving backwards is not allowed.The Usher hypothesis is appropriate when individual growth is slow and linear on short time intervals, which is the case for trees.It implies that the m Â m transition matrix, U, has non-null elements only on its first row, its main diagonal and its sub-diagonal.
It is convenient to break down U into: U ¼ GS þ R, where G (the growth matrix) is a stochastic m Â m matrix with non null elements on its diagonal and its sub-diagonal, S (the survival matrix) is a diagonal m Â m matrix, and R (the recruitment matrix) is a m Â m matrix with non-null elements on its first row.The ith element of the diagonal of G is 1 À p i and the ith element of its subdiagonal is p i , where p i is the conditional probability that an individual moves from class i to i þ 1 knowing that it stays alive.As no transition can occur beyond the last class, p m ¼ 0. The ith element of the diagonal of S is 1 À d i , where d i is the death probability for an individual in class i.The ith element of the first row of R is f i , where f i is the fecundity rate of the individuals in class i. Estimating f i requires to know which individual is the mother of any newly recruited individual.In many situations, and in particular in forestry, this is not possible.Then, only an average fecundity rate f is used, that is to say: f 1 ¼ ... ¼ f m ¼ f .It represents the ratio of the number of recruited individuals over the number of individuals in the population.Notice that if p i ¼ 1 for all i ¼ 1; ...; m À 1, the Usher matrix simplifies to a Leslie matrix.Hence the Leslie model is a special case of the Usher model.In summary, the Usher matrix has 2m vital rates ðf ; d 1 ; ...; d m ; p 1 ; ...p mÀ1 Þ, that will be denoted in short h 2ðR þ Þ 2m .

Model's prediction: stock recovery rate
The model's prediction that motivated this study is the stock recovery rate, that is a forestry concept used to assess the sustainability of forest logging.The management of tropical natural forest is based on a periodic selective harvest of all trees with a diameter greater than a threshold called the ''minimum harvest diameter" (MHD).The time interval between two successive logging operations, denoted T, is the length of the felling cycle.During this time, the forest is left so that the wood stock naturally regenerates.The exploitable wood stock is defined as the number of stems with a diameter greater than a threshold called the ''minimum diameter for exploitation" (MDE).The MDE is legally fixed by national forest services whereas the MHD (which is necessarily greater than the MDE) and the felling cycle T are chosen by forest managers to ensure sustainability.
We assume that logging is instantaneous with respect to forest dynamics.Harvest is defined by a m Â m diagonal matrix, L, such that the ith element of the diagonal of L equals zero if the corresponding diameter class is harvested and one otherwise.Under a periodic felling regime with a felling every T time steps starting from 0, the composition of the population at the end of the kth felling cycle is given by: NðkTÞ¼A k Nð0Þð 2Þ where A ¼ U T L. A is a transition matrix that relates the composition of the population at the end of a felling cycle to its composition at the end of the previous felling cycle.Eq. ( 2) is similar to (1) up to a change of time scale.Then, as k tends to infinity, NðkTÞ will grow at an exponential rate, k, that is the dominant eigenvalue of A.
By definition, the stock recovery rate at the end of the kth felling cycle, X k , is the ratio of the exploitable stock at time kT over the exploitable stock at time ðk À 1ÞT.Given the asymptotic behaviour of (2), the asymptotic stock recovery rate, that is the limit of X k as k goes to infinity, identifies with the dominant eigenvalue of A: In summary, the asymptotic stock recovery rate of the Usher model with transition matrix U identifies with the population growth rate of the matrix model with transition matrix A ¼ U T L. The population growth rate k depends on the parameters of A but is independent of the initial composition Nð0Þ.The asymptotic stock recovery rate thus depends on the 2m vital rates h of U. To highlight this dependency, we shall write kðhÞ, where k here denotes a function from The stock recovery rate is used by forest managers to assess the sustainability of logging scenarios.A logging scenario is defined by the felling cycle T and the MHD (that is L).The asymptotic stock recovery rate corresponds to a long-term assessment of sustainability: k ¼ 1 indicates that logging is sustainable; k < 1 indicates that the population is over-logged (then T or MHD should be increased); k > 1 indicates that logging does not remove the natural ingrowth.However, the estimate of k is not an end in itself: it is necessary to see if one belongs to the confidence interval of k.

Distribution of observations
In the context of size-structured populations, an observation consists of a pair ðS 1 ; S 2 Þ giving the size S 1 of an individual at initial measurement time t 0 , and the size S 2 of the same individual at remeasurement time t 0 þ 1.By convention, S 1 ¼ 0 when the individual is newly recruited at t 0 þ 1, and S 2 ¼ 0 when the individual dies between t 0 and t 0 þ 1 (but ð0; 0Þ is not a valid observation).A dataset of size n, denoted s n , consists of n observations fðS 1k ; S 2k Þg k¼1;...;n .
Size classes are defined by their breakpoints u i .The ith size class is the interval ½u i ; u iþ1 ½, where u mþ1 ¼1.An individual is recruited when its size reaches the minimum size for inventory that is u The unknown distribution of observations is denoted F. It can be broken down into: is the probability for an observation to be a newly recruited individual, p 1 ¼ PrðS 2 ¼ 0Þ is the probability for an observation to be a dead remeasured individual, p 2 ¼ 1À p 0 À p 1 ¼ PrðS 1 -0; S 2 -0Þ is the probability for an observation to be a live individual at both measurement times, d o is the Dirac mass at observation o, F 1 ðxÞ¼PrðS 1 < xjS 2 ¼ 0Þ is the univariate conditional size distribution at t 0 knowing that the individuals are dead at t 0 þ 1, and F 2 ðx; yÞ¼PrðS 1 < x; S 2 < yjS 1 -0; S 2 -0Þ is the bivariate conditional distribution of sizes at t 0 and size increments between t 0 and t 0 þ 1 knowing that individuals are alive at t 0 and t 0 þ 1.

Vital rates estimators
The parameters h of the Usher model are estimated from a dataset s n .The estimate of h is the realized value of its estimator ĥ, which is a random variable whose distribution follows from F. Many estimators of vital rates exist, with contrasted performances in terms of statistical efficiency [24,25].We selected a biased estimator with a low standard error [26].The average fecundity rate was estimated using the classical proportion estimator [27]: where #fS 1 ¼ 0g¼ P n k¼1 1ðS 1k ¼ 0Þ is the number of newly recruited individuals at remeasurement time.Mortality rates were taken equal in all classes, which introduces a small bias but considerably reduces the standard error.The common mortality rate was estimated by the ratio-of-means estimator [27]: Finally, the conditional upgrowth rate from class i to i þ 1 was estimated using a revised version of the bias-corrected increment estimator [24][25][26]: where âi is an estimator of the average size increment in class i, and G 2i ð:; jÞ is a parametric expression depending on parameter j for the conditional diameter distribution at t 0 knowing that individuals are neither dead nor recruits and that they belong to size class i at t 0 .The truncated exponential distribution on ½u i ; u iþ1 ½ was used for G 2i [28].Its cumulative distribution function is: The average size increment was simply estimated using the empirical mean of size increments in class i: whereas the maximum likelihood estimator of the j parameter was used.Its expression is: is the empirical mean of sizes at t 0 in class i excluding dead trees, and h is a continuous monotone decreasing function (and thus invertible) from R onto À1; 0½, defined as: hðxÞ¼x À1 À½1À expðÀxÞ À1 if x-0 and hð0Þ¼À0:5.Estimator (6) for growth rates makes use of the continuous information on size rather than simply counting the number of individuals that grow up from class i to i þ 1 [29][30][31].It is a biased estimator but is still more efficient than the classical proportion estimator in terms of quadratic error [25].

Computing sampling uncertainty
The estimator ĥ of vital rates h is a random vector whose distribution depends on the distribution F of observations.Then any prediction of the Usher model, and in particular the asymptotic stock recovery rate kð ĥÞ, is also a random variable whose distribution depends on F. Now that all components of the diagram in Fig. 1 have been specified, we clarify the methods for computing the uncertainty on model's predictions from the uncertainty on observations.

Bootstrap method
Bootstrap methods basically consist in replacing in the diagram of Fig. 1 the unknown distribution F of observations by the empirical distribution of observations in dataset s n : b F n ¼ð1=nÞÂ P n k¼1 d ðS 1k ;S 2k Þ (where d o , for any observation o, is the Dirac mass at o). Drawing a sample of n independent and identically distributed observations according to b F n is then equivalent with drawing randomly n observations with replacement from s n .Bootstrap algorithms for matrix models have been described in [1] and are a direct application of the algorithms presented by Efron and Tibshirani [32].The only point to clarify to comply with Efron and Tibshirani's [32] notations is to prove that there exists a functional H such that h ¼ HðFÞ.The plug-in estimator of h then can be defined as: Given a set of B bootstrap replications kð ĥÃ 1 Þ, ..., kð ĥÃ B Þ, the bias of kð ĥÞ was estimated by [32, p. 125]: where kÃ is the empirical mean of the B replications: The standard error of kð ĥÞ was estimated using the usual bootstrap estimator of standard error (see [1, Eq. (12.15), p. 306] or [32, algorithm 6.1 p.47]).The confidence interval of kð ĥÞ was estimated using the percentile method [32, § 13.3, p.170].All computations were done using B ¼ 50; 000 replications.Bootstrap algorithms were implemented in C language interfaced with R software [33].The code is available at http://agents.cirad.fr/index.php/Nicolas+Picard.

Analytic method
Analytic methods approximate the variance of an estimator by r 2 =n, where n is sample size and r 2 is the asymptotic variance of the estimator (obtained as a limit when n tends to infinity).Assuming that there exists a functional G with suitable properties for differentiability, such that ĥ ¼ Gð b F n Þ where b F n is the empirical distribution of the observations, the asymptotic covariance matrix R of the estimator ĥ is the 2m Â 2m matrix: where IC G;F ðxÞ is the influence curve of G at observation x relative to distribution F [34,35], and X > for any matrix or vector denotes the transpose of X.Given an observation x, the influence curve is the vector of length 2m given by: where d x is a Dirac mass at x. Assuming that G exists, then functional k G relates the estimator of the asymptotic stock recovery rate to the empirical distribution of the observations: The asymptotic variance of kð ĥÞ can then be computed in the same way, using the influence curve.
The computation of asymptotic variances using the influence curve provides a rigorous mathematical justification of the second-order Taylor expansions (sometimes also referred as the dmethod) previously used to compute the variances in an analytic way [14][15][16][17].In particular it can be used to prove that the asymptotic variance of kð ĥÞ, denoted r 2 k , is related to the asymptotic covariance matrix of ĥ by: r 2 k ¼ðD GðFÞ kÞRðD GðFÞ kÞ > ð10Þ where D h k is the Jacobian matrix of k at h, that is a 1 Â 2m matrix whose ith element is equal to the partial derivative of k with respect to the ith element of h.The influence curve can also be used to state that kð ĥÞ has an asymptotic normal distribution with mean k½GðFÞ.
Then the bias of kð ĥÞ in the analytic approach is k½GðFÞ À k½HðFÞ, whereas its confidence interval are computed using the percentiles of the normal distribution.
The estimates of the asymptotic moments of kð ĥÞ were obtained by replacing the unknown distribution F by the empirical distribution b F n .Hence the bias was estimated as: R is obtained by replacing F by b F n in the expression of R.

Hybrid method
The hybrid method makes a compromise between the accuracy of the bootstrap approach (whereas the analytic method is approximate) and the speed of computation of the analytic method (whereas the bootstrap method requires potentially long simulations).It is based on the fact that, once the covariance matrix of the vital rates estimator is known, the variance of any prediction of the matrix model can be computed from them (see last arrow of the diagram in Fig. 1).Hence the hybrid method uses the bootstrap approach to compute the moments of the vital rates estimator with good accuracy, and then uses the analytic method to compute the moments of kð ĥÞ.Notice that with this approach, bootstrap simulations are needed once for all, whatever the prediction of the model to be subsequently studied is.The bias was estimated as: where ĥÃ is the empirical mean of the B bootstrap replications: The variance of kð ĥÞ was computed using the same formula as (10) but replacing GðFÞ by ĥÃ , and replacing the asymptotic covariance matrix R of ĥ by its bootstrap estimate.
The confidence interval was again computed using the percentiles of the normal distribution.

Plug-in estimator of vital rates
The relationship between the distribution of observations F and the vital rates h of the Usher model is given by the following Proposition.
Proposition 1.There exists a functional H such that h ¼ HðFÞ, where h are the parameters of the Usher model and F is the distribution of observations given by (3).The functional H is defined by: The first element of HðFÞ, that corresponds to the mean fecundity rate, is: The ði þ 1Þth element of HðFÞ for i ¼ 1; ...; m, that corresponds to the mortality rate in class i, is: is the size distribution at measurement time t 0 (including a Dirac mass at 0 for individuals that will be recruited).
The ðm þ 1 þ iÞth element of HðFÞ for i ¼ 1; ...; m À 1, that corresponds to the conditional upgrowth rate from class i to i þ 1 is: The proof is given in Appendix A. This Proposition then yields: Proposition 2. The plug-in estimator of h is given by: The plug-in estimator of the mean fecundity rate is: The plug-in estimator of the mortality rate in class i is: The plug-in estimator of the conditional upgrowth rate from class i to i þ 1 is: Hence the plug-in estimator of h is the classical proportion estimator of the transition rates [24,27].The estimator of f that we used is identical to its plug-in estimator.The estimator of d i that we used is a weighted mean of the plug-in estimators of the d i s.However the estimator of p i that we used is completely different from its plug-in estimator.This means that di and pi are biased.One may also notice that formula (7) for the bootstrap estimate of the bias differs from that given by Caswell [1, Eq.(12.23),p. 318].The formula given by Caswell is valid only if the estimators of the vital rates are unbiased, and is thus inexact in the present case where di and pi are biased.

Asymptotic covariance matrix of vital rates
Let M be the functional that maps any distribution F on its expectation:

Z
x dFðxÞ The relationship between the empirical distribution of observations b F n and the vital rates estimator ĥ is given by the following Proposition.
Proposition 3.There exists a functional G such that ĥ ¼ Gð b F n Þ, where ĥ is the estimator of vital rates that was previously defined and b F n is the empirical distribution of observations.Given a distribution F that can be written as ( 3), the functional G is defined by: The first element of GðFÞ, that corresponds to the mean fecundity rate, is: The ði þ 1Þth element of GðFÞ for i ¼ 1; ...; m, that corresponds to the mortality rate in class i, is:

.; mÞ
The ðm þ 1 þ iÞth element of GðFÞ for i ¼ 1; ...; m À 1, that corresponds to the conditional upgrowth rate from class i to i þ 1 is: is the conditional size distribution at t 0 knowing that the individuals are alive at both measurement times and belong to class i at t 0 , 1ðpÞ is the indicator function of proposition p, and x@y ða; bÞ db no da is the conditional distribution of size increments between t 0 and t 0 þ 1 knowing that the individuals are alive at both measurement times and belong to class i at t 0 .
The proof is similar to that of Propostion 1 and is not given here to save space.Carrying forward this expression of G in (9) gives the influence curve IC G;F ðxÞ for any observation x.Carrying forward this expression of the influence curve in (8) gives the asymptotic covariance matrix of ĥ.Computations are long and are not shown here to save space.They give the following expression for the asymptotic covariance matrix R of the vital rates estimator ĥ (where R ij is the element on the ith row and jth column of R): The asymptotic variance of the estimator of the mean fecundity rate is: The asymptotic covariance matrix of the estimators of mortality rates is:

; mÞ
The asymptotic variance of the estimator of the conditional upgrowth rate from class i to i þ 1 is: where 2i is the conditional variance of size increments in class i, r 2 1i is the conditional variance of sizes in class i, and c i is the conditional covariance between size and size increment in class i.
All other elements of R are null.
To complement the latter expression, we give the expressions for the derivatives of h À1 and G 2i : @G 2i @j ðx; jÞ¼

Asymptotic variance of the stock recovery rate
Given the expression of the asymptotic covariance matrix of vital rates given in the previous Section and Eq. ( 10), the expression of the Jacobian matrix D h k will give the asymptotic variance of the stock recovery rate.Function k that maps the vital rates h onto the asymptotic stock recovery rate can be broken down into: kðhÞ¼ðA B C DÞðhÞ where D is the function that maps the vector of vital rates h onto the Usher matrix, C is the function that maps any m Â m matrix onto its Tth power, B is the function that maps any m Â m matrix onto its right product with L, and A is the function that maps any m Â m matrix onto its dominant eigenvalue.Using chain rule for Jacobian [36, p.91

],
D h k ¼ðD BCDðhÞ AÞðD CDðhÞ BÞðD DðhÞ CÞðD h DÞ Let Z be the vector of length m whose elements are all equal to zero.Let U i be the vector of length m whose elements are all equal to zero except the ith that is equal to one.Let where, by convention, p m ¼ 0 and U mþ1 Z. Then the Jacobian of D is the m 2 Â 2m matrix: The Jacobian of C is the m 2 Â m 2 matrix: where denotes the Kronecker product [36].Let c be the last size class that is not harvested, that is to say: L is the diagonal matrix whose ith element on the diagonal equals one if i 6 c and zero if i > c.Then the Jacobian of B is the m 2 Â m 2 diagonal matrix whose ith element on the diagonal equals one if i 6 mc and zero if i > mc.

Comparison between the three methods
Table 1 shows the estimate of the asymptotic (as time tends to infinity) stock recovery rate using the bootstrap, the analytic or the hybrid method, for the three species at Paracou.For all three species, one is included in the 95% confidence interval.However the confidence interval is centred on a value higher than one for angelique and grey gonfolo, whereas it is centred on one for pink gonfolo.Moreover this estimate is an overestimation for angelique and pink gonfolo whereas it is an underestimation for grey gonfolo.Estimates for angelique and pink gonfolo rely on a similar number of observations (n ¼ 467 for angelique and n ¼ 484 for pink gonfolo), whereas sample size ðn ¼ 53Þ is lower for grey gonfolo.The standard error is correspondingly higher for grey gonfolo than for angelique or pink gonfolo.
Reference values in Table 1 are those given by the bootstrap method since this method does not make any approximation.On the contrary the analytic and the hybrid methods approximate the true values by their asymptotic values as the sample size tends to infinity.The analytic method underestimates the asymptotic stock recovery rate and its standard error.It either underestimates (for angelique and pink gonfolo) or overestimates (for grey gonfolo) the bias.The hybrid method slightly outperforms the analytic method (angelique, grey gonfolo) or does as well (pink gonfolo) in estimating the standard error.However the hybrid method does as well (angelique) or worse (pink and grey gonfolo) than the analytic method in estimating the bias.
Table 1 is complemented by Figs. 2 and 3 that show the speed of convergence of the bootstrap and hybrid estimates of the bias and Table 1 Estimate of the asymptotic stock recovery rate, and of the bias, standard error and 95% confidence interval of its estimator using the bootstrap, analytic or hybrid method, for the three species angelique (Dicorynia guianensis), pink gonfolo (Qualea rosea) and grey gonfolo (Ruizterania albiflora) at Paracou. standard error of the stock recovery rate estimator as sample size tends to infinity.For all three species, convergence is reached for sample sizes greater than 5000 observations.Again, the reference values are those given by the bootstrap method.Both the analytic and the hybrid method underestimate the bias and the standard error of the stock recovery rate estimator.The hybrid method does slightly better than the analytic method in estimating the standard error, whereas the analytic method does better than the hybrid method in estimating the bias.

Discussion
With the bootstrap method we can estimate the bias, standard error and confidence interval of the stock recovery rate estimator without any approximation.This method requires potentially long computing time.The analytic method, which relies on the asymptotic values of these quantities as the sample size tends to infinity, is comparatively very fast but is approximate if the sample size is small.The convergence between the bootstrap and the analytic method was obtained for sample size greater than 5000 for the three species at Paracou.Although the hybrid method makes a compromise between the bootstrap and the analytic method by using the former for the second arrow of Fig. 1 and the latter for the last arrow of Fig. 1, its estimates are very close to those of the analytic method, with slightly better results for standard error and slightly worse results for bias.This means that most of the discrepancy between the bootstrap and the analytic method for small samples is due to the last arrow of Fig. 1.As a first conclusion, the hybrid method has little interest as compared to the analytic method.As a second conclusion, the analytic method in unreliable unless sample size is large enough.Bootstrap then should be preferred.The threshold for the analytic method to be reliable (about 5000 observations on the basis of the three species at Paracou) is beyond most sample sizes commonly found in tropical rainforests.The value of this threshold depends on the type of estimator used for vital rates and on the prediction of interest.For instance, Zetlaoui et al. [37] and Chagneau [38] found a threshold of about 10 000 observations to predict the stock recovery rate at the end of the first felling cycle or the population growth rate, using the proportion estimator.Due to the high species richness of tropical rainforests and the corresponding low densities of their species, capturing 5000 trees (a fortiori 10 000) most often require several hundreds (if not thousands) of hectares.
Applications of these methods to assess sampling variability are numerous [39][40][41][42][43][44][45].In the present case they allow us to test whether the asymptotic stock recovery rate differs from one.Although the three species at Paracou have contrasted expected values for k, none of them is significantly different from one at level 5%.More observations would be required to estimate k with greater accuracy, and to be able to decide whether it is significantly different from one or not.The accuracy (sensu [46]) of the estimate of k presently is 38% for angelique, 46% for pink gonfolo, and 98% for grey gonfolo at confidence level 95%.Taking b F n for the theoretical distribution F of observations (see first arrow of Fig. 1), it would require 6870 observations for angelique (to compare to 467 presently available), 9860 observations for pink gonfolo (to compare to 484), and 5060 observations for grey gonfolo (to compare to 53) to get an accuracy of 10% at confidence level 95%, which is generally considered as an acceptable accuracy.For these sample sizes, and assuming again that the theoretical distribution of observations is b F n , k is significantly greater than one at level 5% for angelique (p-value < 10 À4 ) and grey gonfolo (p-value = 10 À4 ), and is not significantly different from one at level 5% for pink gonfolo (pvalue = 0.67).
Another application would consist in comparing sampling variability with other sources of variability, in particular environmental variability.For instance we used data from 1992 to 1994, but it would be interesting to assess how the variability generated by the use of another time step compares to sampling variability.Depending on this comparison, it will be possible or not to detect significant differences in vital rates between years.This is left for future work.
The approximate expression for bias corresponds to a Taylor expansion to order zero with respect to 1= ffiffiffi n p ÀÁ (where n is sample size), whereas the approximate expression (10)

&' À k½HðFÞ
where H h k is the Hessian matrix of k at h, that is a 2m Â 2m matrix whose term on the ith row and jth column is the second derivative of k with respect to the ith and jth elements of h, and vec is the vec operator, that is the operator that vectorizes a matrix by stacking its columns.In this equation, the term between braces corresponds to a Taylor expansion up to order ð1= ffiffiffi n p Þ 2 .Similarly, the approximate expression for standard error could be expanded up to higher order with respect to 1= ffiffiffi n p ÀÁ , which would entail computing the successive derivatives of k with respect to h.These computations are not complicated in the principle, but are tedious in practice.Motivation for computation speed is then needed to prefer the analytic method to bootstrap.The results of the present study are consistent with those of previous studies [15][16][17] in the sense that the analytic method is reliable only if errors on vital rates remain small.However these previous studies focused on the last arrow of Fig. 1 and ignored sampling variability.The present study relates sample size to the acceptable errors on vital rates.As sample size increases, the errors on vital rates vanished and the conclusions of the previous studies are found.
The present study also extends previous results that were obtained for other estimators of vital rates and other predicted quantities.Zetlaoui et al. [37,47] and Chagneau [38] used for the statistical model for observations (i.e. the first arrow of Fig. 1)a multinomial distribution with the vital rates as parameters.The corresponding maximum likelihood estimator of vital rates was the proportion estimator of Michie and Buongiorno [27], which also corresponds to the plug-in estimator of the present study.The asymptotic distribution for the maximum likelihood estimator can be computed using the d-method.The expression for the asymptotic variance is basically the same as the one given by Eq. ( 10), but R then designates the Fisher information matrix.Computations are noticeably more complicated with any estimator as the one used in the present study than with the maximum likelihood estimator.Zetlaoui et al. [37,47] focused on the population growth rate (i.e. the dominant eigenvalue of the Usher matrix) and the corresponding stable diameter distribution, while Chagneau [38] focused on the stock recovery rate at the end of the first felling cycle.The present study shows that these results can easily be extended to other quantities predicted by the matrix model, as long as these quantities are differentiable functions of the vital rates.This simply entails replacing Dk in Eq. ( 10) by the appropriate Jacobian matrix.Notice that the expression of the Jacobian Dk can often be found as part of a sensitivity analysis [1,Chapter 9].Nevertheless, the bootstrap method is comparatively easier to extend to other vital rate estimators and other predictions of the matrix model than the analytic method.It can also deal with an explicit distribution for the observations (first arrow of Fig. 1), by performing parametric bootstrap [32,48].

Fig. 1 .
Fig. 1.Schematic diagram of the uncertainty chain showing how any prediction of the matrix model is a random variable whose distribution depends on the distribution of observations.
and the estimate of r k was: r2 k ¼ðDĥkÞ b RðDĥkÞ > , where b

Fig. 3 .
Fig. 3. Convergence speed of the bootstrap and hybrid estimates of standard error toward its asymptotic (as sample size tends to infinity) value, for the three species at Paracou: solid symbols represent the bootstrap estimate; open symbols represent the hybrid estimate; lines represent the asymptotic value; squares or solid line are for angelique (Dicorynia guianensis); circles or dashed line are for pink gonfolo (Qualea rosea); triangles or dotted line are for grey gonfolo (Ruizterania albiflora).
Convergence speed of the bootstrap and hybrid estimates of bias toward its asymptotic (as sample size tends to infinity) value, for the three species at Paracou: solid symbols represent the bootstrap estimate; open symbols represent the hybrid estimate; lines represent the asymptotic value; squares or solid line are for angelique (Dicorynia guianensis); circles or dashed line are for pink gonfolo (Qualea rosea); triangles or dotted line are for grey gonfolo (Ruizterania albiflora).