Potential of Sentinel-2 and SPOT5 (Take5) time series for the estimation of grasslands biodiversity indices

The aim of this study is to assess the potential of satellite image time series with high spatial and high temporal resolutions for the prediction of grasslands plant biodiversity. The grasslands are modeled at the object scale to be consistent with ecological measurements (one biodiversity index per grassland). A kernel regression is used to predict the biodiversity index of a grassland from its spectro-temporal reflectance. The method is applied using two intra-annual multispectral or NDVI time series of SPOT5 Take5 (18 dates) and Sentinel-2 (7 dates) to predict the Shannon and the Simpson indices of about 200 grasslands in south-west France. The best coefficient of determination for the prediction of the Shannon index is 0.13 and it is 0.17 for the Simpson index prediction. The unsatisfactory results suggest that a high temporal resolution combined with a high spatial resolution and multispectral bands are not sufficient to estimate grassland biodiversity at the grassland scale.


I. INTRODUCTION
Grasslands are one of the most largest land covers on Earth. They represent a significant source of biodiversity in farmed landscapes, because of their plant and animal composition [1], [2]. Thanks to this diversity, they provide many ecosystem services such as carbon and erosion regulation, pest control, crop pollination [3]. However, global grassland surface area is decreasing and grassland diversity is declining because of agriculture intensification and urbanization [3]. To understand these effects, it is of utmost matter to determine and monitor grassland diversity and composition at a large extent.
In ecology, grasslands are usually monitored through ground surveys. But they are limited in time and space, since they are timeconsuming and require important human and materials resources [4]. Remote sensing is a tool which has already proven its ability for habitat mapping [5], [6]. However, the study of grasslands in fragmented landscapes, such as found in Europe, has been limited because of sensors resolutions. Indeed, grasslands are rather small elements in the landscape which require a high spatial resolution to be detectable [7]. Moreover, for biodiversity related applications, very high spatial resolution (less than 1 meter) is more relevant to discriminate the species communities [7]- [9].
For biodiversity application, most of the remote sensing research is based on the Spectral Variation Hypothesis [9]- [11]. It supposes that the spatio-spectral variability in the image is related to the spatial heterogeneity in the environment, and therefore it can be used as a proxy for species diversity. Hence, the study of grasslands biodiversity is usually performed with hyperspectral data issued from a field spectroradiometer or an airborne sensor (around 1-m spatial resolution) [11]- [14]. Although these works showed good results, they were limited to a very local scale, because of the costs involved by such a mission. When this type of data is not accessible and when a larger extent is required, a tradeoff can be considered by using time series with high temporal resolution. Indeed, species communities differ in their temporal behavior, i.e., their phenology. In addition, new satellite missions for continuous vegetation monitoring, such as Sentinel-2, provide freely multispectral time series with high spatial and high temporal resolutions.
In this context, this study aims at evaluating the potential of multispectral satellite image time series (SITS) to determine grassland plant biodiversity at the grassland scale. Two intra-annual SITS with high spatial and high temporal resolutions are compared.
In this experiment, grasslands are modeled at the object scale to be consistent with ecological studies which usually characterize grasslands at the parcel scale. A non-parametric regression method to predict grasslands biodiversity index is used. The method is experimented to predict the Shannon index and the Simpson index of grasslands in south-west France using a SPOT5 (Take5) and a Sentinel-2 time series. Results are analyzed in terms of prediction accuracy and stability.

A. Study site
The study site is part of a Long-Term Ecological Research site located in Gascony ("Coteaux et Vallées de Gascogne", LTER_EU_FR_003), in south-west France near the city of Toulouse (43 • 17 N, 0 • 54 E). This hilly area of around 900km 2 is characterized by a mosaic of crops, small woods and grasslands. It is dominated by mixed crop-livestock farming. Grasslands provide food for cattle by grazing and/or producing hay or silage. They are mainly located on steep slopes whereas annual crops are in the valleys on the most productive lands. The climate is sub-Atlantic with sub-Mediterranean and mountain influences (mean annual temperature, 12.5 • C; mean annual precipitation, 750 mm) [15].

B. Dataset
The dataset is composed of more than 200 grasslands. A botanical survey was conducted in the Springs 2015 (on 171 grasslands) and 2016 (on 45 grasslands), after the flowering and before the mowing (April-May), to record the botanical composition of all these grasslands. From this floristic record, several biodiversity indices can be computed. They represent the biodiversity in the grassland at the grassland scale, and not at the plot scale. The Shannon index (H) and the Simpson index (D) were chosen because of their wide-spread utilization in ecology: where pi is the proportion of the i th species and R is the total number of species in the grassland (species richness). H values are usually between 0 and 4, and D always ranges between 0 and 1. H increases and D decreases as the diversity increases. The statistics of the dataset for each variable are presented in Table I. Three examples of grasslands temporal profiles along the H and D axis, from very poor to very rich in biodiversity, are shown in Fig. 1.
The grasslands were digitalized in a GIS from aerial photographs (BD Ortho database, IGN). For this study, a negative buffer of 10m was applied to all the grassland polygons to avoid edge effects due to mixed pixels at the edges. Only the grasslands composed of at least 10 pixels of 10-m resolution, i.e. having an area higher than 1000m 2 , were kept, to ensure a minimum number of pixels in a grassland. In the end, there were 192 grasslands.

C. Satellite images
The SPOT5 (Take5) time series was used in this experiment (www.spot-take5.org). 18 images were available over the present study site from April to September 2015 (Fig. 2). SPOT5 has a spatial resolution of 10 meters in four spectral channels (visible to near infrared (NIR)).
The Sentinel-2 [16] time series acquired over the year of 2016 was also used for comparison in this experiment. We used the available images from April to September 2016 (Fig. 3). The four 10-m spectral bands were used as well as the four 20-m spectral bands corresponding to the red edge and NIR bands. The 20-m bands were resampled at 10 meters with a bilinear resampling algorithm, using the gdalwarp function of GDAL (http://www.gdal.org/gdalwarp.html). It resulted in a time series of seven dates with eight spectral bands. Therefore, this time series is characterized by less dates than SPOT5 series but with a higher number of spectral bands.

A. Grassland modeling
In this work, each grassland gi is composed of a given number ni of pixels represented by a spectro-temporal vector x ik ∈ R d , where k is the pixel index such as k ∈ {1, ..., ni}, i ∈ {1, . . . , G}, G is the total number of grasslands, d = nBnT is the number of spectrotemporal variables, nB is the number of spectral bands and nT is the number of temporal acquisitions (here, nB = 4 and nT = 18 for SPOT5, nB = 8 and nT = 7 for Sentinel-2, but nB = 1 if using a single vegetation index). Two grassland representations are proposed, at the pixel and at the object scales.
At the pixel scale, with each grassland gi are associated a matrix Xi = [xi1, . . . , xin i ] of size (ni × d) and a response variable yi ∈ R (its biodiversity index).
At the object scale, the mean spectro-temporal vector µ i of the pixels belonging to gi is used to represent gi. It is estimated empirically byμ i = 1 n i n i k=1 x ik . In this case, a vectorμ i ∈ R d and a response variable yi ∈ R are associated with each grassland.

B. Kernel least mean square regression
In order to predict the response variable (H or D) for each grassland represented by its reflectance, a kernel least mean square (KLMS) regression was used. The KLMS regression [17] consists in solving: with yi is the response variable associated with grassland gi, f is the regression function such as f (gi) =ŷi,ŷi is the predicted variable of gi, f (gi) = G j=1 βjK(gi, gj) + b, K is the kernel function, βj's are the parameters of f , b is the intercept and θ is the regularization hyperparameter.
The solution to this problem is given by: where β is the vector of linear coefficients βj, K is the kernel matrix issued from the kernel function K applied between each pair of grasslands gi, I is the identity matrix, y is the response vector made of yi andȳ is the mean value of yi.
In this experiment, two kernels were tested. For the representation at the object scale, the conventional RBF kernel was used between two grasslands gi and gj modeled by their mean vectors µ i and µ j : with σ > 0. This method is denoted by "µ-KLMS" in the following. The second kernel tested is the empirical mean kernel between two distributions pi and pj [18] for the grassland representation at the pixel scale. It corresponds to the average of all pairwise RBF kernel evaluations over the realizations of the two distributions (i.e., pixels that belong to grasslands gi or gj): where ni and nj are the number of pixels associated with gi and gj respectively and x ik is the k th realization (pixel) of gi. This method is denoted by "EMK-KLMS" in the following. In these two cases, two hyperparameters must be tuned during the regression process: the kernel parameter σ and the regression regularization parameter θ.

C. Regression protocol
The regression was investigated for two reflectance configurations: the Normalized Difference Vegetation Index (NDVI) and the four (SPOT5) or eight (Sentinel-2) concatenated multispectral bands (MS).
For each configuration, the regression process was repeated 10 times through a Monte Carlo procedure. For each repetition, the dataset was split randomly into two subsets: 80% for training and 20% for testing. The optimal hyperparameters (σ and θ) were tuned during a 5-fold cross-validation based on the highest coefficient of determination (r 2 ) to minimize the prediction error: The efficiency of each configuration (i.e., kernel, spectral information and sensor) to predict the responses variables was compared in terms of regression accuracy, stability and processing time.
The kernels and the regression were implemented in Python through the Scikit library (http://scikit-learn.org).

IV. EXPERIMENTAL RESULTS
The mean coefficient of determination and its standard deviation over the 10 repetitions during training and testing phases of the regression model for each method to predict the Shannon index and the Simpson index, respectively, are synthesized in Tables II and III. With SPOT5, results were improved when using the MS data (four bands) instead of the NDVI, regardless of the method. It increased thē r 2 of about up to 0.04. For H prediction, the EMK-KLMS method was better than µ-KLMS in all configurations. The best results were obtained with the EMK-KLMS method with MS data (r 2 = 0.13).  For D prediction, µ-KLMS was always better than EMK-KLMS. The best result was achieved with µ-KLMS on MS data (r 2 = 0.17).
With Sentinel-2, the results were worse than with SPOT5 for H prediction (best score isr 2 = 0.07, µ-KLMS with NDVI). However, the regression was better for D prediction using Sentinel-2 NDVI (bestr 2 = 0.15, µ-KLMS) than SPOT5 NDVI (r 2 = 0.13). Sentinel-2 showed opposite results to SPOT5: results are worse using MS data than using NDVI, regardless of the method and the variable. It seems that accuracies are decreased when using the 20-m bands. Indeed, results were better when using only the four Sentinel-2 10-m bands with µ-KLMS. It could explain why the MS data did not perform better than NDVI.
No conclusion can be drawn about the performance of the µ-KLMS method against EMK-KLMS. However, EMK-KLMS did not provide the most stable results (high standard deviation compared to mean value). It is also more time consuming than the mean modeling.
Globally, the Simpson index D was more accurately predicted than the Shannon index H. Nevertheless, the prediction results for both indices are not satisfying.

V. DISCUSSION AND CONCLUSION
A scatter plot of observed vs. predicted D value for the best run of the µ-KLMS method using SPOT5 MS data is plotted in Fig. 4. There is a lack of variance in the predicted dataset, either for H or D. The model predicts always the same range of values. Extreme, and especially higher D values/lower H values (they correspond to grasslands very poor in biodiversity, almost monospecific) are less represented in the dataset (only three grasslands with D > 0.8), and are less represented in the training dataset. Thus, they are badly learned during the training phase, compared to richer grasslands. The regression quality could be improved if having more grasslands with high D values (low H values), making the dataset more balanced along the biodiversity indices gradients. There might also be more variability inter-grasslands than along the H or D gradients. Grasslands can be managed differently (one or two mowings, grazing, mowing and grazing, or no utilization) with a different use intensity. These disturbances might have a higher impact on the signal than the species composition. This was confirmed by Feilhauer et al. [19] who assessed the floristic composition with simulated Sentinel-2 data from field hyperspectral data. They showed that multiseasonal data decreased the model fit compared to monotemporal data. Therefore, it would be of interest to separate grasslands depending on their management.
Accuracies of prediction were much lower than those found using hyperspectral imagery at the plot scale (r 2 of 0.4 and 0.45 for inverse Simpson's diversity in [20], and up to 0.62 for Shannon index in [11] with spectral variability measure). However, these studies were conducted at the plot scale for the floristic record and the associated spectral information. They used the pixels corresponding only to the sampling unit. Our protocol was different, since the botanical survey was conducted at the grassland scale by a random walk strategy and only one biodiversity index was computed from it. In this case, there is no direct correspondence between a given pixel and the floristic record. Indeed, even if one grassland is a homogeneous unit from an agronomic viewpoint (one field, same practices), because of topography, soil depth, presence of a stream, it can present very different ecological sub-units with different plant compositions.
Despite the unsatisfactory results, the Simpson index (D) was always better predicted than the Shannon index (H). D is a measure of the dominance in a community, while H is more sensible to rare species [21]. Thus, one can suppose that dominance in a plant community is more predictable by satellite remote sensing than the presence of rare species, and dominance-based indices should be favored.
The lack of a balanced dataset along the biodiversity gradients does not allow a formal conclusion on the actual potential of multispectral SITS with a high temporal and a high spatial resolutions to predict the biodiversity indices of grasslands. However, current results suggest that this type of data is not suitable to predict such indices measured at the scale of the grassland. Indeed, the species composition of grasslands does typically fall in the hyperspectral domain. As future prospects, the use of spectral heterogeneity [4] as a proxy for species diversity should be considered [11]. It could be adapted to temporal data.