Phosphorus in agricultural soils: drivers of its distribution at the global scale

Abstract Phosphorus (P) availability in soils limits crop yields in many regions of the World, while excess of soil P triggers aquatic eutrophication in other regions. Numerous processes drive the global spatial distribution of P in agricultural soils, but their relative roles remain unclear. Here, we combined several global data sets describing these drivers with a soil P dynamics model to simulate the distribution of P in agricultural soils and to assess the contributions of the different drivers at the global scale. We analysed both the labile inorganic P (PILAB), a proxy of the pool involved in plant nutrition and the total soil P (PTOT). We found that the soil biogeochemical background corresponding to P inherited from natural soils at the conversion to agriculture (BIOG) and farming practices (FARM) were the main drivers of the spatial variability in cropland soil P content but that their contribution varied between PTOT vs. PILAB. When the spatial variability was computed between grid cells at half‐degree resolution, we found that almost all of the PTOT spatial variability could be explained by BIOG, while BIOG and FARM explained 38% and 63% of PILAB spatial variability, respectively. Our work also showed that the driver contribution was sensitive to the spatial scale characterizing the variability (grid cell vs. continent) and to the region of interest (global vs. tropics for instance). In particular, the heterogeneity of farming practices between continents was large enough to make FARM contribute to the variability in PTOT at that scale. We thus demonstrated how the different drivers were combined to explain the global distribution of agricultural soil P. Our study is also a promising approach to investigate the potential effect of P as a limiting factor for agroecosystems at the global scale. &NA; Numerous processes drive the global spatial distribution of phosphorus (P) in agricultural soils, but their relative roles remain unclear. Thanks to a modelling approach, we found that almost all of the global spatial variability in total soil P in cropland soils (PTOT) could be explained by the distribution of the soil biogeochemical background (that determines the P content of soils at the conversion to agriculture, BIOG), while both BIOG and farming practices (FARM) explained the spatial variability in inorganic labile P (PILAB) (˜40% and ˜60%, respectively). Figure. No caption available.


Introduction
Global nutrient management is a key question for agronomists working on global food security (Mueller et al., 2012;Makowski et al., 2013). This is crucial for phosphorus (P), which limits net primary production, and for which access to its sources is becoming increasingly uncertain due to limited rock phosphate supplies and potential geopolitical issues (Peñuelas et al., 2013). In undisturbed terrestrial ecosystems, soil P is often limiting (Elser et al., 2007), either because the amount of total P is low or because P is in forms that are not available for plants (Walker & Syers, 1976). In agroecosystems, the export of P due to harvesting tends to increase the soil P limitation and to maintain high crop yields, P is often supplied in the form

Accepted Article
This article is protected by copyright. All rights reserved. of chemical fertilizers or animal manure. In the 2000s, chemical P fertilizer application averaged 10kgP/ha/yr at the global scale, but showed a large heterogeneity, with a supply of 25 kgP/ha/yr in Europe vs 3 kgP/ha/yr in Africa (Liu et al., 2008). A deficit in available soil P leads to a loss of attainable yield (Mueller et al., 2012) while an excess of total P in soils could trigger aquatic eutrophication in surrounding water bodies through runoff and water erosion (Schindler et al., 2008). The P content of agricultural soils is thus an index for agricultural soil fertility, as well as an important variable affecting water quality. High-resolution assessment of the current distribution of P in agricultural soils at the global scale is critical for the identification of areas where P is in a state of imbalance and mitigation is needed. However, such an assessment has not been possible up to now due to limited data availability and poor understanding of soil P dynamics at large scales.
The P-content of soils in currently unmanaged ecosystems can be considered to be the result of transformations that occur on a geological time-scale. While P in unmanaged soils can be predicted from local soil properties (Yang et al., 2013), this approach cannot be used for agricultural systems because humans have altered the P cycle in these ecosystems (Elser & Bennett, 2011) and changed the spatial distribution of P in soils. In fact, numerous long-term, interacting and highly spatially variable processes drive the P content in agricultural soils across the world. For instance, farming practices (e.g. the amount of fertilizer applied) have the potential to re-shape both total and available soil P (Sattari et al., 2012). Agricultural soils are subjected to erosion rates, which are typically one order of magnitude higher than those occurring in unmanaged soils, potentially leading to high P losses (Quinton et al., 2010). Atmospheric P deposition has increased with anthropogenic emissions and is likely to play an increasing role in soil P budgets (Wang et al., 2014). Internal soil dynamics processes, which are known to have small effects on total soil P, have the potential to modify the redistribution

Accepted Article
This article is protected by copyright. All rights reserved. of P within the different soil pools and thus, to modulate plant available P. Such soil dynamics depend on the soil buffering capacity (Vitousek et al., 2010) and soil meteorological conditions (Buendía et al., 2010), both varying spatially. Finally, land use and land cover change tend to modify the P content of agroecosystems due to the conversion of natural lands with their original soil P contents, into cultivated areas (Mackenzie et al., 2002;MacDonald et al., 2012). While in a given context some drivers could a priori be neglected as compared to others (e.g. local ratio of deposition to chemical fertilizer equal to 0.03 in countries with intensive agriculture), this could vary from one location to the other. In addition, many of these drivers have varied over time (Ringeval et al., 2014), thereby making their relative contribution to the current global distribution of P in agricultural soils extremely difficult to decipher.
Several studies have attempted to model part of the P cycle at the global scale. Some of them focused on a given driver, taken separately (e.g. (MacDonald et al., 2011) focused on farming practices) and showed that the different drivers are characterized by a large spatial heterogeneity across the world. However, no study has provided an integrated assessment of the combination of these drivers. Here, we built a simple, process-based soil P dynamics model that enabled the combination of various recent spatially explicit global datasets to reconstruct the temporal evolution of P in agricultural soils during the 20 th century. These global datasets enabled us to represent the drivers that are known to play a role in the current global distribution of P in agricultural soils. Our analyses focused on the distribution of total soil P and more labile P involved in plant nutrition. We first calibrated the soil P dynamics model against observations on sites. Simulated global maps of agricultural soil P were then analysed, particular attention being paid to uncertainty in the simulated soil P resulting from uncertainty in the drivers. Finally, as the core of our study, a sensitivity analysis based on a full factorial design where each driver was either variable or constant in space, was performed to estimate the contribution of each driver to the current distribution of agricultural soil P.

Accepted Article
This article is protected by copyright. All rights reserved.

Modelling framework
The system that was modelled comprised P in the top 0-0.3m horizon of agricultural soils. In agricultural soils, P availability and root uptake decline substantially below the plough layer (Lynch & Brown, 2001) and long term fertilization trials show that contrasted fertilization regimes mainly affect the P availability in the plough layer with very little effect in deeper layers (Messiga et al., 2012;Li et al., 2017). Thus, the system modelled should ideally be the plough layer. However, the thickness of the plough layer shows strong variations in time and space as a function of the farming practices (Brandsaeter et al., 2011;Gronle et al., 2015;Alcántara et al., 2016) and no spatially explicit dataset providing plough depth was available at the global scale. We thus chose to represent the soil horizon corresponding to a fixed thickness of 0.3m which, at the global scale, is considered as encompassing a major proportion of crop roots (Jackson et al., 1996). The sensitivity of our results to the thickness of the soil horizon modelled is assessed and discussed in the following sections. No vertical discretisation was considered. Our modelling framework worked at half-degree fractional resolution (0.5° latitude x 0.5° longitude). Cropland and pasture soils were distinguished in each gridcell. Soil P pools were expressed in kgP/ha while fluxes were given in kgP/ha/yr. The modelling framework, resulting from the combination of a soil P dynamics model and global datasets, was run at 1-year time intervals.
The spatially explicit global datasets used in this study enabled us to represent the drivers that are known to play a role in shaping the P in agricultural soils over the last centuries and explain the current global distribution: the soil biogeochemical background corresponding to P inherited from natural soils at the conversion to agriculture (BIOG), soil P input/output corresponding to farming practices (FARM), land use and land cover change (LUCC), soil water

Accepted Article
This article is protected by copyright. All rights reserved. content and temperature affecting weathering and mineralization of organic matter (CLIM), losses of P through soil erosion (LOSS), atmospheric P deposition (DEPO) and soil buffering capacity (BUFF). The different drivers and their representations are described in Table 1. The datasets resulted from the combination of different measurements (satellites, on site measurements, etc.), and/or model simulations. All datasets were re-gridded to a half-degree resolution. More details about each dataset are provided in the Supporting Information.
While we focused on soil P of cropland and pasture, four land fractions were considered in each grid-cell to represent the land use and land cover change. These land fractions were: cropland (C), pasture (P), non-agricultural vegetated areas (NA) and urban areas (U). NA actually included both undisturbed and secondary vegetation re-growing in its natural state. For a given grid-cell, at the beginning of each time-step y (y represents a specific year in the model), the effect of land use and land cover change on the evolution of each soil P pool was represented by equation (1): where P x represented a given soil P pool (apatite, occluded, etc.; see equation (3)) of the land fraction x (in kgP/ha); f x was the grid-cell fraction covered by x (no unit) andΔ was the land fraction covered by x1 converted into x2 from y-1 to y. f C , f P and the different landconversions (Δ) were provided by the Land Use Harmonization dataset (LUHa.v1) (Hurtt et al., 2011) and correspond to the LUCC driver. Note that transition from urban areas to cropland/pasture (Δ and Δ ) were considered as null. P NA is the P inherited from natural soils at the conversion to agriculture and correspond to the driver BIOG. P NA was also used as

Accepted Article
This article is protected by copyright. All rights reserved.
initial conditions for agricultural soils (i.e. in 1700, see below). For each grid-cell and at any moment, P NA was set to the current P in unmanaged soil estimated by (Yang et al., 2013), i.e.: for any y, P NA (y) = P Yang (2) Through this equation, we assumed that the P inherited from natural soils at the conversion to agriculture could be represented by prescribing the current P in unmanaged soils to all soils converted to agricultural soils over the last 300 years. This was a key-assumption of our approach. This could be limiting, in particular in regions where shifting cultivation occurs, leading to modify the P content of soils covered by natural vegetation. We also neglected soil P input corresponding to forest biomass left on soil at the time of conversion (Beck & Sanchez, 1996).
In each time-step, once equation (1) had been applied, P C and P G evolved for all soil pools according to other following drivers: soil input/output resulting from farming practices (FARM), P losses due to soil erosion (LOSS), and atmospheric P deposition (DEPO). In addition, the soil P dynamics model allowed us to redistribute P within the different soil P pools according to soil buffering capacity (BUFF) and soil conditions such as soil temperature and humidity (CLIM) (Figure 1).

Soil P dynamics model
The design of the soil P dynamics model (i.e. which pools are represented, Figure 1) was chosen to be consistent with the Hedley fractionation method (Hedley & Stewart, 1982;Tiessen et al., 1984). The Hedley method fractionates soil P into different forms of P that are removed sequentially with successively stronger extraction agents. While relating the fractions re-

Accepted Article
This article is protected by copyright. All rights reserved. moved by chemical products to soil mechanisms is prone to difficulties (Tiessen et al., 1984), these fractions are usually merged into various inorganic and organic pools and classified according to their plant availability, which gives a comprehensive picture of the different forms of P in soils (Cross & Schlesinger, 1995) (Table S1). In addition, to our knowledge, Yang et al. (2013) is the only published database providing information about the global distribution of P in unmanaged soils and it was based on the concepts developed in the Hedley method.
Total P (P TOT ) was defined as the sum of all represented soil P pools: where P OCC , P APA , P SEC , P ILAB , P OSTA , P OLAB represented occluded P, P in apatite, P bound on secondary minerals, labile inorganic P, stable organic P and labile organic P, respectively. Our analysis focused on two variables: total soil P (P TOT ) and inorganic labile P (P ILAB ). We chose P TOT because it represents the total amount of P and is therefore an integrative variable for several processes, whereas P ILAB is used as a proxy for soil P available to plants on a short time-scale (Cross & Schlesinger, 1995).
As a first approach, we chose to represent as few soil P dynamics processes as possible. Processes like root-induced processes to overcome P limitation or P immobilization/release by microbial biomass were not taken into account in our model in particular because their representation at the global scale remains uncertain (see Discussion). The flux parameterizations used in our study were based on studies performed with Dynamic Global Vegetation Models (DGVMs) (Wang et al., 2010;Goll et al., 2012) because DGVMs have been used at the global scale and are process-based, which were two aims of our approach. Parameterizations are described in details below where defines the P flux from pool X to pool Y.

Accepted Article
This article is protected by copyright. All rights reserved.
Weathering, occlusion and mineralization are described thanks to a first order kinetic. Apatite weathering is represented by: where g 1 and g 2 (unitless) described the sensitivity to soil temperature (T, in °C) and relative soil water content (W, unitless), respectively. Before calibration, k W = 0.0001 yr -1 following Buendía et al. (2010). k w is the base rate at a temperature of ~15°C (Table 3 of Buendía et al. (2010)) thus this value is used as reference temperature in g 1 (T  (Hartmann et al., 2014)). We used a Q 10 formulation to be consistent with equation (6). A Q 10 of 2.4 gave the same analytical result as the Arrhenius equation described in Hartmann et al. (2014) for common temperature ranges and was thus chosen. Following Buendía et al. (2010), g 2 (W) = W.
The occlusion is parameterized as follows: = . (5) The pre-calibrated k occ is 1.2e -5 yr -1 (Yang et al., 2014). In this first attempt, we did not consider de-occlusion processes. Thus, once a molecule of P belongs to the P OCC pool, it cannot be involved in the plant nutrition any more.
The mineralization of organic matter is described thanks to: where h 1 and h 2 represents the sensitivity to soil temperature and soil water content and k m1 , k m2 are turnover rates for stable and labile pools (in yr -1 ). These pools are considered to be biologically available over different time scales ("short", "intermediate") (Cross & Schlesinger, 1995) but the literature about the Hedley method did not provide any accurate estimates of these time scales. Models of soil carbon usually consider that only few percent of

Accepted Article
This article is protected by copyright. All rights reserved.
the recalcitrant pool could be mineralized each year. On the other hand, the comparison between labile organic P measured using a Hedley fractionation method (fraction NaHCO 3 in (Oberson et al., 1993)) and a kinetic of mineralization (Oehl et al., 2004) on the same site suggests that the labile organic pool is mineralized very quickly. Following this information, we prescribed k m1 =0.02yr -1 and k m2 =0.60yr -1 as default values before the calibration step.
Following (Wang et al., 2010;Goll et al., 2012), we used a Langmuir equation to describe the equilibrium between labile inorganic P and P bound on secondary minerals equilibrium: where S max is the maximum capacity of secondary minerals to bind P (in kgP/ha) and K s is a coefficient (in ha/kgP). The suitability of this equation to describe the equilibrium between P ILAB and P SEC is discussed in the last section. (K s ,S max ) vary as function of soil order following (Wang et al., 2010) (Table 1 and Table S2).

Temporal variations of the drivers and initial conditions
Our analyses mainly focused on the soil P simulated for farming practices (Bouwman et al., 2011) are available (from 1500 and 1900 respectively).
We prescribed the values provided by (Yang et al., 2013) for unmanaged soils (P Yang ) to agri-

Accepted Article
This article is protected by copyright. All rights reserved.
cultural soil P pools in 1700 (hereafter referred to as the initial conditions), i.e.: P C (1700) = P P (1700) = P NA (1700) = P Yang (8) where P represented a given soil P pool (apatite, occluded, etc.; see equation (3)). We assumed that farming practices before 1700 led to negligible modifications of the soil P pools of agricultural soil in 2005 since most P was supposed to be recycled locally; the effect of this assumption on the results was tested by varying the starting year (Supporting Information).

Calibration of the soil P dynamics model
The calibration focused on base rates used in equations 4-6: k w , k occ , k m1 ,k m2 . The parameters used in equation 7 correspond to the driver BUFF and are considered in next sections. To calibrate our soil P dynamics model, we compiled Hedley measurements recorded in long-term experiments on cropland/pasture sites from the literature. We found 11 references providing local and observed variables to force our model instead of the ones provided by global datasets. This allowed us to calibrate the intrinsic abilities of our model to reproduce the observed temporal soil P dynamics. Simulations performed in that case are called "simulations on sites" hereafter. The 11 references found correspond to 12 locations and 49 "treatments" including variation in fertilizers applied (frequency, amount and form), crop rotation and soil type (Table S3). The 12 locations encompass a relative diversity in latitude ( Figure S2). The different treatments in each location were considered independently. Few treatments on the same site (e.g. different crop rotations in Ref 9 in Table S3 while crop rotation is not represented in the model) could be considered as pseudo-replicates, but removing them did not change the calibration or final match between observations and the model (not shown).
Simulations were performed on the period of available measurements. Measurements of ei-

Accepted Article
This article is protected by copyright. All rights reserved.
ther soil P in uncultivated sites at the end of the experiments or soil P under fertilization treatments at the beginning of the experiments have been used as initial conditions (BIOG).
Observed farming practices (fertilizer rate application, etc.) were used as input into our model to prescribe FARM. In simulations on sites, soil erosion, CLIM, DEPO were provided by global datasets while no change in land use and land cover change (LUCC) was prescribed.
The observed USDA soil order was used to define the parameters in the Langmuir equation (BUFF). For simulations on sites, the model was run on the soil horizon characterized by a bottom depth closest to -0.3m (i.e. 0.3m below the soil surface) for which measurements were available and which encompasses the plough layer. For a few sites, we have no information about the plough depth and measurements were available only for a very thin soil layer (e.g. 0-0.1m; Table S3). The bulk density required to translate observations from mgP/kg to kgP/ha and to compute losses were derived from observations, either directly or indirectly (i.e. as a function of the available information, by using carbon content , soil organic matter or the bulk density measured at different depths or in different treatments for the same location).
For each treatment, we computed the Root Mean Square Error (RMSE) between observed and modelled soil P pools for the last year of available observations. Then, we summed the RMSE of all treatments, expressed it in percent of averaged observations and used this value as criteria for calibration. The calibration was made in two stages: first, we focused on each parameter independently, taken alone and searched for a value that minimized the criteria for one given soil P pool. For each parameter, the pool chosen is the pool whose size is sensitive (almost) only to that parameter: P APA for k w , P OCC for k occ , P OSTA for k m1 , P OLAB for k m2 . In a second stage, we made all parameters vary together and searched for parameter values minimizing the criteria for P ILAB . If necessary, a compromise between the calibrated values of these two stages was made.

Accepted Article
This article is protected by copyright. All rights reserved.

Estimation of the contribution of each driver
The current spatial variability of both P TOT and P ILAB was quantified thanks to the variance (named VAR hereafter) of both variables simulated for 2005. The variance was computed at two spatial scales: either at the "grid-cell scale" (i.e. using all grid-cells at half degree resolution with a non-zero cropland (or pasture) fraction in our simulation for 2005) or at "continental scale" (i.e. using 7 large World regions, namely: North America, Central and South America, Africa, Oceania, Western Europe, Asia, Russia; Figure S2). Note that VAR was weighted by the cropland/pasture area of each grid-cell in the computation at grid-cell scale.
Cropland/pasture area of each grid-cell was also used to compute averaged P TOT and P ILAB for each World region.
Each driver encompassed several variables and drivers were potentially correlated spatially, which made the method of estimating their contributions complicated. To measure the contribution of each driver to the current spatial variability of P TOT and P ILAB , we assessed the effect of removing the spatial variability of a given driver on the simulated VAR of P TOT and P ILAB . A full factorial design was generated with six factors at two levels, resulting in n runs =2 6 simulation runs. The six factors (called F hereafter) corresponded to all drivers except land use and land cover change (i.e. BIOG, LOSS, BUFF, CLIM, FARM, DEPO). For each factor F, level (F=+1) meant that the spatial variability of F was taken into account in the simulation run, while level (F=-1) meant that its spatial variability was suppressed by prescribing a constant value to the cropland (or pasture) fractions of all grid-cells. As it was not possible to remove the spatial variability of the land use and land cover change artificially (i.e. LUCC=-1), we restricted the factors of the full factorial design to all drivers except LUCC. Indeed, using a constant value for the cropland/pasture fractions of all grid-cells at a given time-step did not allow us to maintain a temporal consistency between fractions and fraction conversions.
A table of the full factorial design is provided in Table S4. The first-order effect of factor F on

Accepted Article
This article is protected by copyright. All rights reserved.
VAR was estimated by (Kobilinsky & Monod, 1991): where denotes the value +1/-1 of F at run j and denotes the VAR at run j. the thickness of the soil horizon modelled) or analysis setting (e.g. the value prescribed to all grid-cells to set a factor to -1) was evaluated (Supporting Information).

Uncertainty in the representation of the drivers
At the global scale, each driver is prone to uncertainty and we assessed how this uncertainty propagated to the simulation of P ILAB and P TOT and to the contribution of the drivers to their spatial variability. First, we estimated a range of uncertainty for each driver defined by bottom and top boundaries (estimates 1 and 2 in Table S5) following available information. No information about the uncertainty was available for FARM and BUFF, and we arbitrarily assumed that this uncertainty reached ±30%. Then, the full factorial design described in the previous section was replicated 30 times. For a given replication and a given run within a replication, the value of each driver was chosen randomly within the range between the two estimates by assuming a uniform distribution (i.e. assuming that all values between estimates 1 and 2 were equally likely). This was done independently for each grid-cell.
Each replication of the full factorial design was used to compute the contribution of the different drivers to the spatial variability in P TOT and P ILAB (see equation 9). The 30 replications

Accepted Article
This article is protected by copyright. All rights reserved.
of the full factorial design were used to compute an average and standard deviation of the contributions. This standard deviation was used as an estimate of the uncertainty in the contribution.
How the uncertainty in the representation of the drivers affected our confidence in the simulated P TOT and P ILAB was assessed as follows. First, from the full factorial design we selected the 30 simulations where all drivers=+1. Then, we computed for each grid-cell, a coefficient of variation of these 30 simulations. Finally, a global average of these coefficients of variation (the average was weighted by the cropland/pasture area of each grid-cell) was computed and used as a global indicator of uncertainty in P TOT and P ILAB . The contribution of the uncertainty of a given driver (e.g. BIOG) on the uncertainty in simulated P TOT and P ILAB was estimated by using the same estimate for that driver in the 30 simulations and by measuring the resulting decrease in the global indicator of uncertainty.

Calibration of the soil P dynamics model
Varying k w and k occ did not improve the agreement between model and observations for any soil P pools (P APA , P OCC or P ILAB ) ( Figures S3 and S4). This could be explained by the fact that observations made on relatively few years lead to low constraints on pools characterized by long time-scales. Pre-optimized values were kept for these parameters (k W = 1.e -4 yr -1 ; k occ = 1.2e -5 yr -1 ). Organic pools and P ILAB were more sensitive to k m1 and to a larger extent to k m2 . Calibrated k m1 and k m2 were 1e -2 yr -1 and 1e -1 yr -1 , respectively. These values correspond to turnover times of 100 and 10 yr for P OSTA and P OLAB , respectively. An observation vs. simulation comparison of the change in P TOT and P ILAB between final and first years of observations for all treatments (n=49) after calibration is displayed on Figure 2. A comparison detailed for each soil P pool and each treatment is plotted in Figure S5. We found a fairly good

Accepted Article
This article is protected by copyright. All rights reserved. agreement between the simulated and observed temporal evolution of P TOT and P ILAB , despite some discrepancies at a few sites. Different reasons could be put forward to explain such mismatches, such as: an inconsistency between observed soil horizon thickness and the processes represented in our model, or missing mechanisms in our model (such as fallow periods or plant processes to overcome limitations) (Table S3). However, we considered the agreement to be reasonable enough to use the model as a tool to assess the drivers of the current global P distribution.

Distribution of P TOT and P ILAB at the global scale and associated uncertainty
In this section, we focused primarily on cropland (see Figure S7 for pasture). Global averages of P TOT and P ILAB for cropland amounted to 2211±13 and 246±1 kgP/ha for the top 0-0.3m soil, respectively (average ± std over the simulations performed for the uncertainty analysis, values weighted by the cropland area). Using an averaged global bulk density of 1.3g.cm -3 , these numbers correspond to concentration of 567 and 63 mgP per kg of soil for P TOT and P ILAB , respectively. Highest P TOT was found in northern America and western India, while lowest P TOT was found in Argentina (Figure 3a), where simulated soil P content cannot sustain the prescribed uptake ( Figure S8). The distribution of P TOT was heterogeneous over the surface of the world with a close proximity of areas with the highest and lowest P TOTS . Overall, the areas of highest (or lowest) P TOT corresponded to areas of highest (or lowest) P ILAB (Figure 3b), but some discrepancies existed between the distribution of the two variables.
This was visible using the percentile distribution of both variables ( Figure S9), and the P ILAB /P TOT ratio (Figure 3c). The global average of the P ILAB /P TOT ratio amounted to 9±5% but reached values close to 0 in northern America and 30% in north India. The coefficient of correlation between both variables computed using all grid-cells was 0.73 for cropland, which confirmed that P ILAB could not simply be approached by a constant fraction of P TOT .

Accepted Article
This article is protected by copyright. All rights reserved.
As each driver was prone to strong uncertainty, we investigated how this uncertainty propagated to simulated P TOT and P ILAB . We found a 47 and 61% global uncertainty for cropland P TOT and P ILAB , respectively (Figure 3; right column). The uncertainty in the simulation of P TOT was high in North America and western Russia (coefficient of variation of the simulations performed with drivers varying within their range of uncertainty ~60%) and low in China and Europe (~20%) (Figure 3d). A similar spatial pattern was found for P ILAB despite higher uncertainty in North America, Africa and Argentina (Figure 3e). We found that the uncertainty in BIOG played a key role in the uncertainty in simulated P TOT and P ILAB , explaining more than half of the uncertainty in P ILAB and almost all the uncertainty in P TOT (Table S6).
We found that the uncertainty in FARM had a small effect on the uncertainty in P TOT and P ILAB but this could be related to a potential underestimation of the uncertainty of FARM.
Nevertheless, at the current stage, our results stress the need for more accurate estimates of P in natural soils at the global scale. The increasing resolution of soil datasets (e.g. GlobalSoilMap (Mulder et al., 2016), SoilGrid1km (Hengl et al., 2014)) could help to increase our confidence in simulated soil P maps. Despite the large uncertainty in the distribution of P TOT and P ILAB , we were able to robustly decipher the relative contribution of the drivers of global soil P distribution (see following and small error-bars in Figure 4).

Contributions of 'natural' and 'anthropogenic' drivers
In this section, we focused on cropland and only the last paragraph is related to pasture. We found that the simulated spatial variability of P ILAB was larger than that of P TOT : for instance, the coefficient of variation of all cropland grid-cells computed when all drivers varied spatially was 1.67 for P ILAB vs 0.81 for P TOT . Thanks to the full factorial design performed, we were able to estimate the relative contributions of the different drivers to the spatial variability computed at either grid-cell or continental scale. Positive contributions were interpreted as

Accepted Article
This article is protected by copyright. All rights reserved.
leading to an increase in the spatial variability of soil P (Supporting Information). Each driver made a first-order contribution to the spatial variability of soil P (left bar of each panel in Figure 4). At the grid-cell scale (first line of Figure 4), we found that BIOG alone explained almost all of the variability of P TOT (with a first order contribution of 103%) while FARM made a preponderant contribution as compared to BIOG for P ILAB (first-order contribution of 63 and 38% for FARM and BIOG, respectively). Thus, BIOG explained the differences between P ILAB and P TOT maps (Figure 3). Other drivers made only minor first-order contributions: LOSS for P TOT (-8%) and LOSS (-7%), BUFF (4%) and CLIM (3%) for P ILAB . All other drivers (including LUCC) made no significant first order contribution at the grid-cell scale.
The behaviour of P TOT was explained by P APA , P OCC and P OSTA : the sum of these pools corresponded to 74% of P TOT and their spatial variability was totally driven by BIOG (not shown).
At continental scales (second line of Figure 4), we found that BIOG and FARM were the main drivers for both P TOT and P ILAB . We found that FARM made a larger contribution to the spatial variability of P TOT than at the grid-cell scale (first-order contribution of 33%). Many drivers were involved at the continental scale for P ILAB . In particular, BUFF, CLIM and LUCC had a significant role. When shifting from grid-cells to continents, the larger contribution of FARM could be explained by a lower decrease in the variability of FARM than that of BIOG (Table S7). We also found that the larger contribution of FARM to P TOT variability when shift to continental scale was mainly explained by soil organic P pools (P OLAB +P OSTA ) (not shown), underlying the role of farming practices affecting these pools (application of manure, residues).
Drivers could contribute to the spatial variability of soil P, not only through first-order effects but, also as a result of interaction effects (middle bar of each panel in Figure 4). Overall, the interactions between two drivers made only small contributions to the spatial variability of soil P at grid-cell scale. The largest interaction was found between BIOG and LOSS, contrib-

Accepted Article
This article is protected by copyright. All rights reserved.
uting for -9% and -5% to the variability of P TOT and P ILAB , respectively. Such negative contributions could be interpreted as LOSS tending to decrease the soil P where BIOG tends to increase it, and vice-versa. At the continental scale, a positive BIOGxFARM interaction contributing for 11% and 6% to the variability of P TOT and P ILAB , respectively, was found.
In addition to the variability computed by using all cropland grid-cells around the World, we computed the driver contributions (at the grid-cell scale) in different latitude bands ( Figure   5). We found that the spatial variability of P TOT did not change as function of the latitude band contrary to P ILAB whose the variability within the tropical band was lower than the one within the >30°N band (not shown). We also found higher CLIM, BUFF and DEPO contributions and a lower FARM contribution in the tropics than in >30°N or at the global scale. Later access to chemical fertilizer in South America than in Europe or North America ( Figure 6) and specificities of tropical soils (e.g. richness in iron oxides) could explain the lower FARM and larger BUFF contributions found, respectively.
Finally, we found that P TOT and P ILAB had increased in cropland during the 20 th century (+14 and +61 %, respectively) due to past variations of some drivers and to soil P dynamics. The increase in P TOT was mainly explained by LUCC in 1900-1950, but interestingly, was then driven by FARM, while FARM explained the increase in P ILAB almost totally, confirming the key role of fertilization practices in driving soil P availability at the global scale (Supporting Information).
For pasture (Figure 4e-h), we found that BIOG was the primary driver of soil P content. At the grid-cell scale, the first-order contribution of BIOG reached 104% for P TOT and 101% for P ILAB ; the latter variable being also moderately driven by LOSS and BUFF. Interestingly, we found that the contribution of the FARM driver to the spatial variability of soil P was much lower in pasture than in cropland which suggests a much lower anthropogenic perturbation of

Accepted Article
This article is protected by copyright. All rights reserved.
these ecosystems from a biogeochemical point of view. However, FARM made a large contribution to P TOT if focused on the continental scale.
Our results showed little sensitivity to the model assumptions and the analysis settings that were tested, except for the thickness of the soil horizon considered ( Figure S11). Reducing the thickness of the soil horizon that was modelled (from 0-0.3m to 0-0.2m) led to significantly increases in the contribution of FARM (e.g. from ~70 to 90% for the contribution to the variability of P ILAB at the grid-cell scale) to the detriment of BIOG.

Main drivers and interactions
Previous studies showed that the possible different drivers of soil P content have strong spatial heterogeneity at the global scale. However, all these studies focused only on a single driver, thus providing a partial picture of the soil P content and preventing any analysis of the relative contribution of the different potential drivers. Our study is the first to show how the different drivers contribute interactively to the soil P content at the global scale. The fact that BIOG and FARM appeared as the main drivers of soil P content in croplands is a critical result for further modelling efforts, and increases our understanding of the distribution of P in agricultural soils at the global scale. Even if the importance of BIOG and FARM was intuitive in a given local context, our study has demonstrated that this also applies at the global scale.
BIOG plays a key role for both P TOT and P ILAB whatever the agroecosystems (cropland or pasture) and the spatial scale of the variability (grid-cell or continent) considered. The role of FARM is different, making a major contribution to the variability of P ILAB but having a minor role in the variability of P TOT . The contribution of FARM also varies between cropland vs

Accepted Article
This article is protected by copyright. All rights reserved.
pasture, as a function of the latitude and as a function of the spatial scale of the variability.
Total soil P is partly driven by pools with long residence times (stable organic, apatite and occluded) which show small sensitivity to human perturbation. This explains the smaller sensitivity of P TOT to FARM when considering all grid-cells. However, the heterogeneity between farming practices affecting soil organic pools (i.e. manure application, residues) found between continents is large enough to make FARM a major driver of the P TOT variability at this scale. The greater contribution of FARM to labile P in croplands than in pastures confirmed that differences in fertilization strategies on croplands vs. pastures had strong consequences on soil P spatial distribution (Sattari et al., 2016).
Overall, our results underline the complex spatial patterns of the drivers and ultimately of the soil P. The lack of negative interaction between BIOG and FARM suggested that fertilization practices did not tend to reduce the spatial variability in soil P content due to the biogeochemical background. At the continental scale, we even found a positive interaction. The easier access to the P resources in rich countries, independently from the natural soil P content, could be invoked to explain this decoupling between BIOG and FARM at the global scale.
Finally, the fact that FARM and LUCC were identified as key drivers of the temporal evolution of P in croplands demonstrated the long-term legacy effect of human activities on the current soil P, as previously found at the country scale (Ringeval et al., 2014).

Limitations of the model framework
Our modelling framework has some limitations, which have the potential to modify our results slightly. The main limitations concerns are i) the lack of representation of some soil P pools or dynamics processes, ii) a relatively arbitrary definition of the drivers represented in our study and iii) the sensitivity of our results to the thickness of the modelled soil horizon.

Accepted Article
This article is protected by copyright. All rights reserved.
First, some pools are not explicitly represented (the pool of orthophosphate ions) or not represented at all (the P in microbial biomass) in our soil P dynamic model. Following the approach commonly used at the global scale (Wang et al., 2010;Goll et al., 2012 et al., 2014)) because soluble P is not commonly measured by the Hedley method. While this not straightforward, we think that such representation would be more process-based. It could lead to a higher contribution of BUFF to the global distribution of available P than the one found here. The P in microbial biomass that may function as a buffer by immobilizing orthophosphate ions from soil solution and potentially preventing them from bonding to oxides of iron, aluminum and manganese (Achat et al., 2010) is missing in our approach. Representing microbes in global models started only recently and many uncertainties remain (Wieder et al., 2015). Processes used by plants to overcome limitation (such as root exudates and their potential effects on mycorrhizae, de-occlusion or weathering; (Buendía et al., 2014)) were not represented in our model. A representation of root exudates would require a coupling with the carbon cycle (Buendía et al., 2014) which is challenging (Reed et al., 2015) and beyond the scope of our paper. In our approach, grid-cells where our simulations predicted an important P limitation (i.e. where simulated P ILAB +P SEC was smaller than the plant uptake prescribed by FARM dataset) were few ( Figure S8) and thus, implementing processes to overcome limitation would probably not lead to a serious modification of the contribution of drivers found in our study.

Accepted Article
This article is protected by copyright. All rights reserved.
Second, how the drivers were distinguished in our study is somewhat arbitrary and was constrained by the availability of global datasets. The availability and quality of databases are often limiting for studies performed at the global scale (Makowski et al., 2013). Despite this constraint, we considered that the drivers represented enabled us to take into account all processes known to play a role in shaping the P in agricultural soils over the last centuries and explaining the current global distribution. For instance, we recognize that the soil biogeochemical background corresponding to P inherited from natural soils at the conversion to agriculture (BIOG) results from a huge diversity of processes that are considered in other drivers, such as P losses (LOSS), soil buffering capacity (BUFF) or sensitivity of some soil processes to soil water and temperature (CLIM). However, because temporal scales characterizing BIOG and other drivers are different, it makes sense to distinguish them.
Finally, the contributions of drivers found in our study were sensitive to the thickness of the soil horizon that was modelled. The 0-0.3m soil horizon was chosen because it encompasses a major proportion of roots. A better representation would consist of representing the plough layer in a spatially explicit way. However, no information was currently available at the global scale and our work highlighted the need for such a dataset in the modelling of nutrients in agricultural soils.

A perspective for assessing P limitation on global yields
Whereas previous studies (Elser & Bennett, 2011) have only described major changes in global biogeochemical cycles arising from agricultural practices, we are able to provide actual estimates of these perturbations by quantifying and mapping soil nutrient status at the global scale. Our modelling framework is a critical step towards the assessment of the P limitation in agricultural ecosystems and for global food production. A combination of our global estimates of soil P status with global crop models (Elliott et al., 2015;Reed et al., 2015) Accepted Article This article is protected by copyright. All rights reserved.
would permit a more accurate exploration of the consequences of farming practices on the carbon cycle in agroecosystems. Such combinations would require different modelling developments. First, the P uptake by plants should be represented in a mechanistic way (while prescribed from global dataset in our current study). Then, the effects of P limitation on crop development and yield have to be incorporated. Finally, the confidence in simulated maps of P ILAB has to be increased by reducing the uncertainty of some drivers (in particular in BIOG). While such uncertainty has no effect on the contributions found in our study, this limits the future use of simulated maps of P TOT and P ILAB . Once these developments achieved, our approach could be combined with global crop models. Such coupled model could be used under different farming scenarios to investigate, besides consideration of local nuances of farm operations (Sharpley et al., 2016), how policies could be oriented towards more sustainable P resource management and yield gap closure.

Accepted Article
This article is protected by copyright. All rights reserved. uptake and residue for each grid-cell based on variables provided in Bouwman et al., 2011", "Computation of some parameters involved in the computation of uptake and residue: r U/H and r R for cropland, r U/W for pasture". Figure 4: Contributions of the different drivers to the current spatial variability of agricultural soil P. The contributions are given for both P TOT (left panels) and P ILAB (right panels) for cropland (top) and pasture (bottom). Variability was computed either at grid-cell (1 st and 3 rd lines) or continental scale (2 nd and 4 th lines). For each panel, the first-order effect (left bar) and the effect of interaction between two factors (middle bar) were computed thanks to equation (4) for BIOG, LOSS, BUFF, CLIM, FARM and DEPO. The contribution of LUCC (right bar) was estimated from the remaining spatial variability of P TOT and P ILAB when the 6 drivers above were set to -1 (Supporting Information). Errorbars represent 1 std of the 30 repetitions of the full factorial design used to take the uncertainty in the representation of the drivers into account.

Table and Figures captions
Figure 3: Simulated agricultural soil P for cropland. Simulated P TOT (a-d), P ILAB (b-e) and P ILAB /P TOT (c-f) for cropland: mean (left panels) and coefficient of variations (CV, right panels) computed using the 30 simulations performed to take into account the uncertainty in the global datasets used. An irregular colour pallet corresponding to 0 and the 20,40,60,80,99 th percentiles was chosen for panels a and b.
Figure 2: Change in soil P between the first and the last dates of observations on sites: comparison between observations (y-axis) and simulations (x-axis). Both changes in P TOT (triangle) and P ILAB (circles) are given and each symbol corresponds to one treatment. The right panel displays a zoom of the area of the left panel defined with [-600,2000] boundaries. The colour pallet corresponds to the number of the reference reporting the observations (Table S3). Linear regressions provided in the right panel were computed excluding one treatment of Reference 9. Influence plot of the regression for P ILAB is given in Figure S6. Simulations were performed with calibrated k m1 and k m2 .
Figure 1: Modelling framework: the soil P dynamics model and drivers of the spatial distribution of P in agricultural soils. The model allowed us to represent the evolution in time of different soil P pools (inorganic -blue -and organic -orange -pools) as function of simulated soil P dynamics fluxes (green arrows) and soil P input/output (black arrows). The name of the drivers and place where they are involved in the modelling framework are given in magenta (boxes and arrows, respectively). The effects of BIOG and LUCC do not appear on the figure and are described thanks to equations 1, 2 and 8. Table 1: Drivers: name, description of its effect on soil P and representation in our approach. Variables given in the 3 rd column correspond to variables whose spatial variability was removed in the full factorial design to compute the driver contribution to the spatial variability of P TOT and P ILAB . More information about the different datasets (last column) is provided in Supporting Information.

Accepted Article
This article is protected by copyright. All rights reserved.
Driver name Description of the effect on soil P

Corresponding variables
Datasets used to estimate the corresponding variables and their variation in space and time BIOG (natural soil biogeochemical background) P inherited from natural soils at the time of conversion to agriculture. P in natural soils was also used to approach P in agricultural soils at the beginning of the simulation (in 1700, initial conditions). Soil P input/output corresponding to farming practices P in: chemical fertilizer, manure applied on agricultural soils, residues, and plant uptake. Here, 'residues' refer to plant biomass that remains on/within the soil after the harvest and include root biomass.
Derived from P withdrawal and P fertilizer estimated in (Bouwman et al., 2011). Basic assumptions were made to compute residues and uptake from withdrawal. Effect of soil temperature and soil water content on P weathering and P mineralization Relative soil water content and soil temperature of top 0-0.3m soil horizon (equations 4 and 6) Simulated by two Dynamic Global Vegetation Models (ISBA (Decharme et al., 2013) and ORCHIDEE (Krinner et al., 2005)). The annual average of the climatology computed for the 1979-2010 period (i.e. no year-to-year variability) was used. LOSS (P losses through soil erosion) Losses of P due to water erosion and runoff processes Losses of P for each soil P pool (fP X out in Supp. equation 1).
Losses were computed using the soil P content simulated in our approach, the sediment fluxes resulting from erosion provided by (Van Oost et al., 2007) and the weight of top 0-0.3m soil computed from bulk density of the same horizon provided by Soilgrids (ISRIC -World Soil Information, 2016).  : Driver contributions to the current spatial variability of soil P at the grid-cell scale for different latitude bands. Only first-order contributions are displayed and they were made for cropland P TOT (panel a) and P ILAB (panel b). In each panel, the bars correspond (from left to right) to the contributions found for the global scale, the tropical band (30°S-30°N) and the northern band (>30°N), respectively. Note that no uncertainty in the drivers was taken into account here.

Accepted Article
This article is protected by copyright. All rights reserved.

DEPO (atmospheric deposition)
Soil P input resulting from deposition of atmospheric P P deposition falling within labile P (P ILAB ) and apatite (P APA ) Derived from the (Wang et al., 2014) dataset that provides atmospheric P deposition resulting from mineral dust, primary biogenic aerosol particles, sea salt and combustion, averaged over different time-periods.

BUFF (soil buffering capacity)
Properties of soil describing its ability to replenish the P ILAB pool and to adsorb labile P on its particles (P SEC ) Parameters involved in the Langmuir equation used to describe the equilibrium between P ILAB and P SEC (K S and S max ) (equation 7) (K S ,S max ) estimated for each soil order by (Wang et al., 2010). The global distribution of