Using numerical plant models and phenotypic correlation space to design achievable ideotypes

Numerical plant models can predict the outcome of plant traits modifications resulting from genetic variations, on plant performance, by simulating physiological processes and their interaction with the environment. Optimization methods complement those models to design ideotypes, i.e. ideal values of a set of plant traits resulting in optimal adaptation for given combinations of environment and management, mainly through the maximization of a performance criteria (e.g. yield, light interception). As use of simulation models gains momentum in plant breeding, numerical experiments must be carefully engineered to provide accurate and attainable results, rooting them in biological reality. Here, we propose a multi-objective optimization formulation that includes a metric of performance, returned by the numerical model, and a metric of feasibility, accounting for correlations between traits based on field observations. We applied this approach to two contrasting models: a process-based crop model of sunflower and a functional-structural plant model of apple trees. In both cases, the method successfully characterized key plant traits and identified a continuum of optimal solutions, ranging from the most feasible to the most efficient. The present study thus provides successful proof of concept for this enhanced modeling approach, which identified paths for desirable trait modification, including direction and intensity. 1This is the post-print version of the manuscript published in Plant, Cell, and Environment (10.1111/pce.13001) 1 ar X iv :1 60 3. 03 23 8v 3 [ qbi o. Q M ] 1 9 Ju n 20 17 Introduction Using simulation models to optimize phenotype Global demand for agricultural products to supply food, feed, and fuel is rapidly increasing (Edgerton, 2009). One option to meet this growing agriculture need is to continue improving plant productivity per unit of cultivated land area. However, after decades of increase, many major crops have recently shown slower rates of yield improvement, stagnation, or even loss of productivity (Ray et al., 2012). This situation likely results from the negative effects of global climate change and societal prejudice that perceives environmental costs and limits agricultural resources, for example, public policies to reduce the use of chemicals in disease management programs (Sutton, 1996). Overcoming these challenges will require that breeders continue the genetic improvement of major crops accounting for changing agricultural practices towards sustainable production systems (Vanloqueren and Baret, 2009). In general, selecting plant traits associated with improvements in crop yield is difficult. Improvements at the organ or plant level must scale to the field level and these increases must not be gained at the expense of other traits that may ultimately offset yield advances (Sinclair et al., 2004). In addition, the specific environment where the crop is grown has a significant effect on yield. These genotype × environment (G × E) interactions require considerable experimental effort to identify the persistent traits that contribute to yield increases across many environments. The effort required is redoubled in pluri-annual crops where data, collected over changing environments, are difficult to interpret for complex phenotypes or at a fine scale. Computer-based modeling approaches have recently emerged as a method to save time, labor, and resources and to infer traits value beyond field experiments (Casadebaig, Zheng, et al., 2016 ; David Da Silva et al., 2014; Martre, He, et al., 2015). This computer-based strategy has been developed over the past two decades using a number of simulation models. These software models derive from mathematical equations that represent the biological processes that influence plant growth and development as a function of time, environment (climate, soil, and management), and genotype-dependent parameters. Those parameters are expected to be more heritable than complex traits, less prone to G x E interactions and to have a less complex genetic architecture (Heslot et al., 2014). These parameters also represent a range of functional traits, i.e any morphological, physiological, phenological or behavioral feature that is measurable at the individual level (Violle et al., 2007). In this case, simulation is used as a tool to predict trait × trait and trait × environment interactions when scaling from the individual plant to the field level. Depending on their structure, simulation models can be referred to as process-based crop models or functional-structural plant models. Process-based models are defined at the plot scale, often without an explicit representation of individual plants and focus on predicting crop performance, mainly harvestable organ yield or quality. Differences in the physiological framework (e.g radiation-based or water-based biomass production) lead to distinct families of process-based model that have been used worldwide (Brisson et al., 2003; Keating et al., 2003; Ritchie and Otter, 1985; Stockle et al., 2003). Such models are used to explore the G × E landscape and assist breeding programs by taking advantage of genetic and environmental resources (e.g. Chapman et al., 2003; Hammer et al., 2006; Jeuffroy et al., 2014). By contrast, functional-structural plant models particularly represent the plant structure (i.e. its topology and geometry) linked with the functions that allow the plant to interact with its environment, i.e. light interception, photosynthesis, carbon allocation, etc. (DeJong et al., 2011). Depending on the objectives, these models focus on specific plant structure such as trunk and wood (Nikinmaa et al., 2003) vegetative development (Costes et al., 2008; Fournier and Andrieu, 1999; Lopez et al., 2008), or fruit development (Allen et al., 2005). The concept of ideotype describes the idealized realization of a plant phenotype, “a biological model which is expected to perform or behave in a predictable manner within a defined environment” (Donald, 1968). This definition provides the breeder with a guide for characterization of cultivar by identifying and selecting for particular features in progenies. Model-assisted phenotyping and ideotype design is a research domain with recent developments (Martre, Bénédicte Quilot-Turion, et al., 2015). These advances formulates ideotype design as an optimization of model inputs related to plant traits (David Da Silva et al., 2014; Ding et al., 2016; Paleari et al., 2015; Quilot-Turion et al., 2012; Semenov et al., 2014; Semenov and Stratonovitch, 2013), allelic combinations (e.g. Letort et al., 2008; Quilot-Turion et al., 2016), or management options (Grechi et al., 2012; Wu et al., 2012). These approaches have been used for different goals, including understanding crop adaptation to climate change (Paleari et al., 2015;


Introduction
Using simulation models to optimize phenotype Global demand for agricultural products to supply food, feed, and fuel is rapidly increasing (Edgerton 2009). One option to meet this growing agriculture need is to continue improving plant productivity per unit of cultivated land area. However, after decades of increase, many major crops have recently shown slower rates of yield improvement, stagnation, or even loss of productivity (Ray et al. 2012). This situation likely results from the negative effects of global climate change and societal prejudice that perceives environmental costs and limits agricultural resources, for example, public policies to reduce the use of chemicals in disease management programs (Sutton 1996). Overcoming these challenges will require that breeders continue the genetic improvement of major crops accounting for changing agricultural practices towards sustainable production systems (Vanloqueren & Baret 2009).
In general, selecting plant traits associated with improvements in crop yield is difficult. Improvements at the organ or plant level must scale to the field level and these increases must not be gained at the expense of other traits that may ultimately offset yield advances (Sinclair et al. 2004). In addition, the specific environment where the crop is grown has a significant effect on yield. These genotype environment (G E) interactions require considerable experimental effort to identify the persistent traits that contribute to yield increases across many environments. The effort required is redoubled in pluri-annual crops where data, collected over changing environments, are difficult to interpret for complex phenotypes or at a fine scale. Computer-based modeling approaches have recently emerged as a method to save time, labor, and resources and to infer traits value beyond field experiments (Da Silva et al. 2014b;Martre et al. 2015a;Casadebaig et al. 2016b ).
This computer-based strategy has been developed over the past two decades using a number of simulation models. These software models derive from mathematical equations that represent the biological processes that influence plant growth and development as a function of time, environment (climate, soil, and management), and genotype-dependent parameters. Those parameters are expected to be more heritable than complex traits, less prone to G x E interactions and to have a less complex genetic architecture (Heslot et al. 2014). These parameters also represent a range of functional traits, i.e any morphological, physiological, phenological or behavioral feature that is measurable at the individual level (Violle et al. 2007). In this case, simulation is used as a tool to predict trait trait and trait environment interactions when scaling from the individual plant to the field level.
Depending on their structure, simulation models can be referred to as process-based crop models or functional-structural plant models. Process-based models are defined at the plot scale, often without an explicit representation of individual plants and focus on predicting crop performance, mainly harvestable organ yield or quality. Differences in the physiological framework (e.g radiation-based or water-based biomass production) lead to distinct families of process-based model that have been used worldwide (Ritchie & Otter 1985;Keating et al. 2003;Brisson et al. 2003;Stockle et al. 2003). Such models are used to explore the G E landscape and assist breeding programs by taking advantage of genetic and environmental resources (e.g. Chapman et al. 2003;Hammer et al. 2006;Jeuffroy et al. 2014). By contrast, functional-structural plant models particularly represent the plant structure (i.e. its topology and geometry) linked with the functions that allow the plant to interact with its environment, i.e. light interception, photosynthesis, carbon allocation, etc. (DeJong et al. 2011). Depending on the objectives, these models focus on specific plant structure such as trunk and wood (Nikinmaa et al. 2003) vegetative development (Fournier & Andrieu 1999;Costes et al. 2008;Lopez et al. 2008), or fruit development (Allen et al. 2005).
The concept of ideotype describes the idealized realization of a plant phenotype, "a biological model which is expected to perform or behave in a predictable manner within a defined environment" (Donald 1968). This definition provides the breeder with a guide for characterization of cultivar by identifying and selecting for particular features in progenies. Model-assisted phenotyping and ideotype design is a research domain with recent developments (Martre et al. 2015b). These advances formulates ideotype design as an optimization of model inputs related to plant traits (Quilot-Turion et al. 2012;Semenov & Stratonovitch 2013;Da Silva et al. 2014b;Semenov et al. 2014;Paleari et al. Comment citer ce document : Picheny, V., Casadebaig, P., Trepos, R., Faivre, R., Da Silva, D., Vincourt, P., Costes This article is protected by copyright. All rights reserved. 2015; Ding et al. 2016), allelic combinations (e.g. Letort et al. 2008;Quilot-Turion et al. 2016), or management options (Grechi et al. 2012;Wu et al. 2012). These approaches have been used for different goals, including understanding crop adaptation to climate change (Semenov & Stratonovitch 2013;Semenov et al. 2014;Paleari et al. 2015), sustainable production (Quilot-Turion et al. 2012) or integrated pest management (Grechi et al. 2012).

Research objectives and challenges
Despite the observation of trait correlation, i.e. traits covary among individuals within a population, current model-based ideotype designs do not formally consider these correlations. This failure to account for trait correlations can lead to numerical experiments that are not well rooted in biological reality and ultimately fail to provide meaningful data to the breeder. These correlations are observed at the phenotypic level and result from differences in the variances and covariances at the genetic (G) and environmental (E) level (and their interactions, GxE), or between consecutive years or branching orders in perennial plants (Segura et al. 2008). Multi-environmental trials allow separation of the variance and covariance components for G, E and GxE. However, this process is generally based on the assumption that covariance structure for E does not depend on the genotype, and vice versa.
Here, we propose a notion of feasibility, that can be viewed as the probability that the breeder will be able to find progenies having the expected ideotype from a cross between given parents, based on meiosis. The distribution of a trait reflects its dispersion and the possibility of its selection. Thus, many individuals will have values close to the mean value of the progeny while some individuals will have a phenotype more extreme than the parents (heterotic or transgressive traits). In this perspective, some trait associations may be unlikely due to negative correlations between the traits, possibly from physiological antagonisms. Other traits may be difficult to dissociate due to close proximity of their respective key genes (linkage), which can result in a lack of genetic recombinants. In addition, several genes may interact and have a combined effect on a given trait (epistasis) or one gene may simultaneously affect several traits (pleiotropy). Thus, the architecture of variances and correlations between traits reveals the feasibility of jointly selecting traits to bolster genetic gain.
In our approach, we define a feasibility criterion, based on the observed joint distribution of the traits used as model parameters, as a possible means of improving the validity of ideotype design. Introducing real data is a means to control the optimization process and can be used to improve predictions similar to data assimilation for dynamic models. From an operational point of view, we focus on empirical correlations observed at the plant level (in agronomic plant populations); the aforementioned distinctions between genetic and environmental variances are not considered in this work. Indeed, due to the sample size, these empirical, phenotypic correlations were expected to be more robust than inferred genetic correlations.
The present work evaluated an apple tree orchard and a sunflower crop to design realistic and efficient plant ideotypes. The study relied on two numerical models: a functional-structural plant model for the apple tree (MAppleT, Costes et al. 2008) and a process-based model for the sunflower crop (SUNFLO, Casadebaig et al. 2011;Lecoeur et al. 2011). These examples targeted different steps in the breeding process and considered different sources for genetic variability:(1) upstream sources, with a segregating population of apple tree, and (2) downstream sources, with a collection of sunflower cultivars. In both cases, we used a unified multi-objective optimization formulation to solve the ideotyping problem. We used a metric of performance returned by the numerical model, and the introduced metric of feasibility based on observations. The main objective of our approach was to improve the realism of model-based ideotype design, and no modification of the simulation models was required: the approach we propose is therefore suitable for different types of simulation models (process-based, functional structural, etc.) or input data.

Apple tree orchards
Breeding programs for apple have primarily focused on major traits, such as disease resistance and fruit quality, despite the desirability of additional traits, such as bearing regularity or an optimized fruit distribution in the canopy (Lespinasse et al. 1992 ;Laurens et al. 2000). More recently, new selection criteria have been proposed that allow adaptation to climate change. Such traits include tree architecture (leaf area, branching), phenology (flowering, vegetative shoot, and fruit maturation), and the ability to tolerate periods of water deprivation during the growing season.
The architecture of a tree determines the three-dimensional foliage distribution and consequently the efficiency of light interception. It thereby impacts water transport and transpiration as well as carbon acquisition and allocation . Thus improvement of light penetration within tree canopies has been an objective of management options for physically constraining plant growth (Lauri 2002). The optimization of tree architecture could also be achieved through genetics and breeding to complement and eventually reduce human intervention with training systems. Ideally, the withinspecies genetic variability could be used for defining ideotypes in plant breeding. However, it remains difficult to integrate architectural traits in breeding programs due to the complex changes in trait values during tree development (Laurens et al. 2000).

Model description
There exist few methods to quantify and objectively compare the impact of training systems and cultivars on light interception efficiency. Moreover, the complexity of fruit tree structure, the large number of trees required for experiments in quantitative genetics, and the long growth period makes it difficult to use real trees to explore the link between the genetic variation of tree architecture and light interception throughout tree development.
In this context, MAppleT was developed to explore wide ranges of tree geometries and topologies in silico. MAppleT is a functional-structural plant model that simulates apple tree development over years, considering both topology and geometry in interaction with the environment (including gravity in regard to branch bending) . The growth and branching processes are simulated with Markov chains and hidden semi-Markov chains, respectively, estimated based on previously collected data (Costes & Guédon 2002;Costes et al. 2003;Renton et al. 2006). For geometrical development, branch bending is supported by a biomechanical model (Alméras 2001;Taylor-Hell 2005), taking into account the intra-year dynamics of primary, secondary and fruit growth. A detailed documentation on the model variables, parameters, and implementation can be found online at the MAppleT project repository (Cokelaer 2017).
Initially parameterized for the Fuji cultivar, MAppleT outputs represent the progression of tree form and topology over time and were assessed by comparing descriptors between simulated and digitized trees . The model was used in virtual experiments in which apple trees were simulated and coupled with SLIM, i.e. MultiScale Light Interception Model, which estimates radiation attenuation from statistical description of foliage at different scales (Da Silva et al. 2008). In silico experiments explored different combinations of geometrical and topological traits and their effects on light interception, considered as one of the key parameters to optimize fruit tree production (Han et al. 2012;Da Silva et al. 2014a b).

Design of experiments
Our work focused on four key geometrical traits: branching angle (BA), internode length (IL), top shoot diameter (TSD), and leaf area (LA). These traits were selected based on previous investigations by (Han et al. 2012;Da Silva et al. 2014a) because of the assumed influence on the three-dimensional distribution of leaves and therefore on light interception. The performance metric in this study is the integrated projected leaf area (iPLA) for five-year-old trees, a global descriptor of the light interception efficiency of a tree. The iPLA is the result of a weighted mean of 46 directional projected leaf areas. The 46 used directions are the central directions of 46 solid angle sectors of equal area discretizing the sky hemisphere according to the 'Turtle sky' proposed by (Dulk 1989 This article is protected by copyright. All rights reserved.
is associated with a weighting coefficient related to its elevation, the lower the elevation, the smaller the coefficient as is the sun radiance.
Since petiole angles are constant in the current version of MAppleT, the leaf orientation was assumed to be primarily influenced by branching angle and branch bending. The latter depends, for a given wood elasticity, on the weights imposed by leaves and internode widths along an axis. The internode widths are recursively accumulated from the shoot top to the shoot base. Thus, the top shoot diameter is expected to have an impact on branch bending and consequently on leaf spatial distribution. Internode length, which determines the distance between leaves, may impact leaf density and branching. Finally, the area of each individual leaf is, along with the number of leaves, a major component for total interception surface of the tree. However, not all leaf area captures light and the projected leaf area accounts for overlaps and mutual shading of leaves in the canopy. is thus considered as a proxy of the intercepted light by the whole tree.
Topological variables were not considered in this study. However, because we used Markov chains, the topology was not constant over all geometries. To avoid drawing conclusions specific to a particular configuration, the performance is taken as the mean over three distinct topologies, obtained from given values of the four input parameters. In summary, denoting the topologies, the performance can be defined as: with $\x = \{BA, IL, TSD, LA \}$ being the set of values of input parameters.
The genotype-dependent parameters were obtained by measuring 123 apple tree hybrids developed from the biparental cross between Starkrimson and Granny Smith. These trees were planted and measured during several years in an experiment reported by Segura et al. (2008). Briefly, 246 trees (two replicates) were planted in March 2004 at the Melgueil INRA Montpellier experimental station 5 x 1.5 m apart in an east-west orientation in six-tree microplots randomly scattered throughout the field. All the trees were grown with minimal training, that is, they were not pruned and the trunks were staked up to 1 m. They were regularly irrigated using a microjet system to avoid soil water deficits. Pests and diseases were controlled by conventional means in line with professional practices throughout this study. This dataset provides 6,150 average measurements from the individual trees for the four geometrical traits. The interval of variation for each trait is reported in Table 1. The minimum and maximum values for the four studied parameters correspond to the minimum and maximum values observed within the progeny. For reference, those extremes can be compared to the parameters for the Fuji cultivar: 45 for BA, 0.03 for IL, 0.003 for TSD and LA.

Sunflower
Sunflower is adapted to a wide range of environmental conditions (de la Vega & Hall 2002) and breeders mainly focus on productivity, disease resistance and fatty acid composition, while targeting large climatic zones within geographical Europe (including Ukraine and Russia) or Argentina (de la Vega & Chapman 2006). Variability in the duration of vegetative growth and grain filling is the main lever to adapt the cultivar to farming environments, primarily by growing earlier maturing cultivars in colder zones. Overall, this broad-adaptation strategy is also more economical for the breeding industry since it reduces the number of cultivars to be handled in seed production and distribution networks.
Sunflower grown in Southern Europe is mostly cultivated in low rainfall areas, without irrigation, and on shallow soils (Tuck et al. 2006) that result in frequent exposure to water deficits (Olesen & Bindi 2002). Drought tolerance involves a wide range of component processes and their spatial and temporal combination (Jones 2007). Consequently, different phenotypes can support diverse drought adaptation strategies. For example, minimization of water loss can be achieved by lowering either leaf area, transpiration per unit leaf area (stomatal conductance) or reducing the energy load of the plant (extinction coefficient) (Sadras et al. 1993). The regulation of stomatal conductance, as a function of water deficit, was found to present genotypic variability (Casadebaig et al. 2008)  This article is protected by copyright. All rights reserved.

Model description
SUNFLO is a process-based model for sunflower that was developed to simulate the grain yield and oil concentration as a function of time, environment (soil and climate), management practice and genetic diversity (Debaeke et al. 2010;Casadebaig et al. 2011;Lecoeur et al. 2011). This model is based on a conceptual framework initially proposed by Monteith (1977) and now shared by a large family of crop models. In this framework, the daily crop dry biomass is calculated as a difference equation function of incident photosynthetically active radiation, radiation interception efficiency (RIE) and radiation use efficiency (RUE, g MJ -1 ). The radiation interception efficiency is a function of leaf area index (LAI) and light extinction coefficient (k), based on Beer-Lambert's law ( ). The RUE concept (Monteith 1994) is used to represent photosynthesis at the crop scale.
Broad scale developmental or physiological processes of this framework, the dynamics of LAI, photosynthesis (RUE) and biomass allocation to grains were split into finer processes (e.g leaf expansion and senescence, response functions to environmental stresses) to account for genotypic specificity, thus exhibiting G E interactions. Globally, the SUNFLO crop model has about 50 equations and 64 parameters (43 plant-related traits, among which eight are genotype-dependent and 21 environment-related). In cropping conditions, these physiological processes are affected by numerous abiotic or biotic factors. Therefore, predictions with the SUNFLO model are restricted to obtainable yield (Van Ittersum & Rabbinge 1997): only the main limiting abiotic factors (temperature, light, water and nitrogen) were considered. A report that summarizes the equations and parameters used in the model is available as supplementary information. The source code is available on INRA software repository (git, tag 1.4, commit SHA 4d5c30d3) and the VLE-RECORD environment (Quesnel et al. 2009;Bergez et al. 2013) is used as simulation platform.
SUNFLO was evaluated on both specific research trials (40 trials, 110 plots) and agricultural extension trials that were representative of its targeted use (96 trials, 888 plots). Over these two datasets, the model was able to simulate significant G G interaction and rank genotypes (Casadebaig et al. 2011(Casadebaig et al. , 2016a. The prediction error for grain yield was 15.7% when estimated over all data (9% -30% in individual trials). From these two evaluations, it was determined that SUNFLO is accurate enough to support optimization methods, i.e. allows discrimination between two given cultivars.

Design of experiments
Previous studies suggested that high yields could be obtained by several specific trait combinations (Casadebaig & Debaeke 2011;Lecoeur et al. 2011). SUNFLO inputs include genotype, environment (soil and climate), management actions, and initial conditions. In this study eight genotype-dependent input parameters were included for optimization (Table 2) whereas management actions (sowing on 2012-04-16, 7 plants m -2 , no irrigation nor fertilization), soil characteristics (150 mm soil water capacity), and initial conditions (full initial soil water capacity, 30 kg ha -1 residual mineral nitrogen) were fixed. These conditions are representative of the average low-input farming operations for the sunflower crop in France (Barbet-Massin 2011).
Performance was defined as the mean grain yield over five model evaluations. To capture basic climate variability in these models, data was collected from five diverse sunflower production regions in France in 2012 (Avignon, Toulouse, Reims, Poitiers, and Dijon) using neighboring stations (< 5km) of the French meteorological network (Meteo-France).
The corresponding model parameters are presented in Table 2. The minimal and maximal bounds were determined for 89 hybrids and assumed to represent the cultivated genetic diversity.
The values of the genotype-dependent parameters were obtained by measuring those eight phenotypic traits in dedicated field platforms and controlled conditions. Our aim was to measure potential trait values, so different environmental conditions were targeted depending on the set on traits: field nonlimiting conditions (deep soil) for phenological and architectural traits, field limiting conditions (shallow soil) for allocation traits, and a range of controlled water deficit (greenhouse) for response traits. For field experiments, 89 hybrids were phenotyped on ten trials (two locations, five years: 2008-2012), using randomized complete block designs with three repetitions of 30 m 2 plots (6-7 plant m -2 ), see Debaeke et al. (2010) and Casadebaig et al. (2016a) for additional details. For controlled conditions, 82 hybrids were phenotyped in 10 liters pots, during six greenhouse experiments to determine the response of leaf expansion and transpiration at the plant scale after stopping watering and leaving the soil progressively drying (dry-down design). We used randomized complete block designs with two water treatments (control, stress) and six repetitions (7 pots m -2 ), see Casadebaig et al. (2008) for additional details.

Phenotype optimization
Phenotype optimization tends to solely focus on performance. Here, we propose to add a feasibility metric, and to solve a bi-objective problem.

Performance
Phenotypes with the best performance can be identified by solving the following optimization problem (Semenov & Stratonovitch 2013;e.g. Martre et al. 2015b).
with $\x$ representing the phenotype traits and $P(\x)$ the performance (respectively, and grain yield). $\Xset \subset \Rset^d$ defines the ensemble of potential phenotypes and $\x_P^*$ the optimal phenotype (ideotype). The dimension of the problem is equal to the number of input parameters, four for MAppleT and eight for SUNFLO (see Tables 1 and 2).
Choosing $\Xset$ is, in itself, a difficult task. Typically, one may define realistic lower and upper bounds for each trait; allowing the traits to take any value between the bounds. However, solving equation without any further consideration may lead to unrealistic (and eventually useless) solutions, as some trait combinations are very unlikely to be obtained. In particular, extreme values (or "corners") of $\Xset$, when all traits take their upper or lower bound values, are highly improbable. Hence, accounting for correlation between traits, and more generally for the capacity to obtain a given phenotype, introduces a second objective function, called feasibility.

Feasibility
Datasets of observed phenotypes were available for both models and used to define a function that indicates the likelihood of a new phenotype. These measures provide us with a rough indication of the domain of potential existence of the traits combinations (both in terms of bounds and co-occurrence), and a simulated phenotype may be considered less unrealistic as it is far from the cloud of observations. Therefore, we require a function that measures the proximity of a simulated phenotype to the observed phenotypes.
A simple yet sensible solution consists in fitting a multivariate probability density function to the observations. Such a function would be maximal at the center of the observation cloud, decrease when moving away from the center, and naturally take into account the correlations between traits. Hence, this function could be used as the feasibility function.
As shown in the Results section, Estimation of feasibility on the parameter space, the multivariate Gaussian distribution is a reasonable choice for our data. Then, using the maximum likelihood estimates for the distribution moments, the density function is: with [Please refer to LaTeX source] and [Please refer to LaTeX source], being the number of observations (respectively, 123 and 89 in our use cases) and the observed phenotypes. This article is protected by copyright. All rights reserved.
Finally, for numerical convenience the feasibility function is defined as the logarithm of the density, and the feasibility optimization problem as:

A multi-objective formulation for phenotype optimization
To identify the most efficient phenotypes, while also favoring those more likely to be obtained, the initial optimization (equation ) was reformulated as a bi-objective problem: These two objectives are likely to be conflicting, as the most feasible phenotypes are not necessarily the most efficient ones. So, there will not exist a common maximizer $\x^*$ for the two objectives. The goal is then to identify the set of optimal solutions, called a Pareto set (Collette & Siarry 2003), which relies on the concept of Pareto dominance. A point dominates another if both its objectives are better. Hence, an ensemble of solutions, ranging from the most feasible to the most efficient is sought.

Optimization algorithms
The two models differed substantially in terms of computational need, as the performance function $P(\x)$ computation took 0.5s for the SUNFLO model (annual crop) while it required 135 min for MAppleT (5 year-old tree), using a 2.90GHz quadcore processor with 8Go RAM. Hence, different algorithmic families were used to solve the two optimization problems.
For MAppleT, the time cost was prohibitive for using standard multi-objective optimization algorithms. In a previous study, metamodel-based optimization strategies were found well-adapted for this problem with the ability to return a good approximation of the Pareto set using a very reasonable number of calls to the simulator (Picheny 2014). In short, this strategy is based on the use of a Gaussian process approximation model (called metamodel, Rasmussen & Williams (2006)), built from a small number of well-chosen simulations (the experimental design). This metamodel is then used as a guide to sequentially choose the most interesting simulations to run (as in the classical EGO algorithm for single objective problems of Jones et al. 1998). The optimization process relied on the statistical computing environment (R Core Team 2016) and package (v1.5.5, Roustant et al. 2012) and (

Estimation of feasibility on the parameter space
Graphical representations of the feasibility functions for the MAppleT and SUNFLO models, along with empirical correlations between traits (Figure 1) are reported here for the first time. For both models, the normality hypothesis was observed to be reasonable (except for K and LE traits in SUNFLO, p < 0.1), since contour lines match observations on the projected spaces. The observed variability was important in all traits, which leads to positive expectations for selection. The orientation of the ellipsoids shows how correlation between traits is taken into account in the feasibility estimation.
In apple tree, correlations between the four variables were relatively low. Despite the usual consideration of allometric relationships between organ dimensions, especially between internode length and leaf area, these relationships appeared moderately conserved after genome recombinations. In sunflower, correlations between traits were globally low, with the exception of four cases, specifically TDF1 and TDM3; K and TR; TLN and LLS; TLN and K. Among these exceptions, some observed correlations were expected, such as the positive correlation in development stages (earliness at flowering, TDF1 and at maturity, TDM3) or the positive correlation in plant architecture (high leaf number, TLN and potential leaf area, LLS). The strong negative correlation between light extinction coefficient (K) and control of stomatal conductance (TR, i.e. how strongly plant transpiration is reduced with water deficit) was not expected, as these two traits were measured by different methods in different environments (field versus greenhouse). This correlation suggests that cultivars that are more efficient at intercepting light for a given leaf area (high extinction coefficient) are also maintaining their stomatal conductance under water deficit (high response parameter).

Optimization of input traits
The two Pareto fronts obtained by the optimization algorithms and the objective values (performance) of (1) the phenotypes corresponding to the initial experimental design for MAppleT and (2) 50 randomly selected phenotypes in [Please refer to LaTeX source] for SUNFLO are given in Figure 2. In both cases, the Pareto fronts were relatively smooth and the first point on the front (top left) corresponded to the center of the cloud of phenotypes (i.e. to the most feasible phenotype). The performance varied from 60 to 120 for MAppleT, doubling from the most feasible to a very "atypical" phenotype. The range of variation in the optimal set is smaller for SUNFLO, from 2.49 to 2.72 t ha -1 , yet the values of the 50 random phenotypes indicated that the Pareto-optimal phenotypes were substantially better than average: the mean performance of the optimal set was around 2.62 t ha -1 compared to 2.48 t ha -1 for the random set. The most efficient phenotype (far right on the Pareto front) reached a performance level of 2.72 t ha -1 .

Figure 2: Performance and feasibility of sampled and optimized individuals for the apple and sunflower models.
The relationship between the feasibility and performance criteria was illustrated by displaying the Pareto sets (input values) using two-dimensional subspace projections (Figure 3). The most central points are the most feasible, and the points closest to the boundaries are the most efficient. In both models, the Pareto sets took the form of a 'path', going from the center of the ellipsoids to the bounds of the hypercube, with the exception of a few traits that remained constant over the Pareto set (BA in apple tree, TDF1 and LLS in sunflower).
For the apple tree ( Figure 3A), the path notably extended outside the ellipsoids for LA suggesting that efficient phenotypes could exist, but with very low feasibility. Such phenotypes would rely on a broken correlation between LA and the three other variables. For sunflower ( Figure 3B), the path followed the lowest feasibility gradient, as indicated by contour lines, where the optimization process globally followed the correlation between traits. The only notable exception was for phenology, where the correlation between earliness at flowering (TDF1) and at maturity (TDM3) was broken in order to reach an increase in the performance criterion.

Figure 3: Position of optimal phenotypes as a function of observed traits correlations in sampled cultivars.
To better characterize those paths and the performance:feasibility trade-off, the value of each trait was plotted for the optimal phenotype as a function of performance (Figure 4). Since the mean and variance of each trait were different, we standardized the trait values (i.e. substract the mean and divide by the standard deviation of average individual). Based on this visualization, values exceeding were considered as difficult to access based on the sampled diversity. In addition, we approximated each curve using a linear model (or piecewise linear model for the IL trait) to improve readability. This article is protected by copyright. All rights reserved.
The resultant graph allowed identification of traits whose value was almost constant in the optimal set of solutions, such as BA for apple tree ( Figure 4A), indicating minimal importance in the performance:feasibility trade-off. In contrast, lines with larger slopes corresponded to traits for which the "price" to pay for feasibility was the highest, yet changing their value had the strongest impact on performance.
For the apple tree ( Figure 4A), BA clearly stayed at its mean value, while IL and LA increased rapidly with performance, reaching atypical values early. In comparison, the TSD also increased, but at a lower rate. For sunflower ( Figure 4B), the traits could be grouped into sets of high slopes (TDM3, TR, TLN, K, LLH) or low slopes (TDF1, LE and LLS). Interestingly, each group contained traits associated with different plant characteristics, e.g architectural traits controlling leaf profile (LLH and LLS). These results suggest that the breeding process released genotypes that perform using diverse physiological strategies, and opportunities exist for new plant types. Finally, we reported four optimal plant phenotypes evenly distributed along the Pareto front for apple tree and sunflower (Table 3). Table 3: Eight optimal phenotypes identified with the MappleT and SUNFLO models.

Discussion
The use of phenotypic measurements of existing cultivar populations and simulation-based optimization allowed us to virtually recombine plant traits to identify a set of optimal phenotypes that maximize performance and feasibility (Figure 2, Tables 3-4). This result is first discussed from a biological perspective, i.e the morphological and physiological characteristics in the optimal set of phenotypes for apple tree and sunflower. We then discuss the influence of the input dataset on the outcome of our approach and how the feasibility criterion could be defined in relation to the type of genetic material (e.g recombinant inbred lines or hybrids). Finally, we discuss how this multiobjective optimization method is a less data-intensive alternative than emergent molecular breeding methods such as genomic selection or gene-based modeling to estimate the breeding value of plant phenotypes.

Morphological and physiological characteristics of the optimized phenotypes
Considering both apple and sunflower use cases, the performance:feasibility optimization approach resulted in paths of improvement that can be used in trait-based breeding ( Figure 4A and 4B). Such paths could be used to follow target values for the most promising traits. These paths also use trait covariance to indicate when modifications would yield an unlikely plant phenotype.

Apple tree use case: longer internodes and higher leaf areas improve light interception
Branching angle (BA) in apple trees can range from 20 to 120 degrees (Table 1), but values approximately 80° from the vertical (open branches) were identified as both the most feasible and most optimal (Table 3). Morever, due to branch bending, this basal branching angle will result in higher top branching angles. However, no significant improvement in this trait can be expected, since there was no increase in the performance value associated with its variation (Figures 4A and 4B). This result is consistent with observations in more erect trees with low branch angles where light capture efficiency is reduced, likely from higher leaf overlap (Da Silva et al. 2014b).
In contrast, significant increases in performance could be expected by increasing internode length, leaf area, and to a lesser extent top shoot diameter (Table 3, Figure 4A). Interestingly, the highest value in the explored range of internode length was reached prior to maximum performance suggesting that internodes longer than five cm could increase and therefore light interception. However, this extreme phenotype is less feasible than other phenotypes, as well as unstable from a biomechanics point of view; the corresponding shoots would likely be unable to support the weight of the apple fruit (Alméras & Fournier 2009 This article is protected by copyright. All rights reserved. The existence of such tradeoffs among traits has been shown in other species, especially long internodes increasing light interception but being detrimental to biomechanics (Pearcy et al. 2005). These tradeoffs are likely to result from developmental, biomechanical and hydraulic constraints in plants and to be responsible for the general rules observed across species between stem diameter and leaf area (Corner 1949), as well as for limits in plant plasticity (Valladares et al. 2007).
No limitation was observed for leaf area or top shoot diameter where increases continued to be associated with higher performance ( Figure 4B).These results suggest that values beyond the scope of this work could be explored for increased light interception, with the caveat that such phenotypes would be associated with low feasibility (Table 3). Performance advantages from increased top shoot diameter were less rapid than with leaf area. This result further suggests that exploring values beyond those tested here present additional opportunities for improvement of performance, at least in terms of light interception. As previously mentioned for internode, tradeoffs may exist between leaf structure and its functions. Leaf morphology and specific weight, which depend on the intercepted light during organogenesis, are tightly linked to leaf functions, photosynthesis and transpiration. Moreover, whatever the trait considered, dimensions are correlated to biomass and organ construction and maintenance have a cost that may counter-balance the expected advantages (Poorter et al. 2006a). Even though the space of correlation values explored here was less than in ecological studies (e.g. Niklas 1994;Poorter et al. 2006b), both constraints and costs are likely to limit plant plasticity and the domain of possible phenotypes (Valladares et al. 2007). Accounting for leaf respiration and transpiration would certainly limit the performance of plants with large leaf areas. The fact that no plants were observed with such large leaf areas could mean they are sub-optimal. Making models more complex, including more constraints among traits and criteria to define optimality would help detecting such trade-off but remain challenging.
The optimal phenotypes designed in the present approach exhibit characteristics quite different from those of the Fuji variety used in the initial version of MAppleT Da Silva et al. 2014b). Indeed, longer internodes, and higher leaf areas would be preferable for optimizing light interception.

Sunflower crop use case: late maturation, lower leaf area and adaptive traits improve performance
The values for two sunflower leaf traits, specifically LLS and LE, remained relatively constant across the optimal set of solutions, indicating that once fixed to their optimal values, these traits had no impact in the performance:feasibility trade-off ( Figure 4B). The designed sunflower ideotype, for the considered growth environments, had a less than average potential individual leaf area (low LLS in Table 3) and leaf expansion was more sensitive to water deficit (high LE in Table 3), compared to the overall population of phenotyped hybrids (Table 2). For these two traits, the convergence towards a fixed value indicated that they were central for cultivar adaptation in the sampled environments: any modifications in these traits would lead to sub-optimal solutions (either in performance or feasibility).
In contrast, modification of the other six traits had an impact on the performance:feasibility trade-off of the ideotypes (i.e. important slopes in Figure 4B; Table 3). The most efficient ideotype had a late maturity date (TDM3) and mid-late flowering date (TDF1). Its aerial architecture was defined by a low leaf number (TLN), a symmetric leaf profile (LLH), and a low extinction coefficient (K). Sensitive control of the stomatal conductance (TR), where stomatal conductance starts to decline at moderate water deficits, was also found to be an important characteristic. These optimal characteristics were difficult to obtain, at least for maturity and control of stomatal conductance, as the values were out of the feasibility range.
Interestingly, the characteristics of the designed ideotype portrayed a plant well-adapted to water deficit: moderate leaf area, low light extinction, and possessing adaptive traits (LE, TR). These characteristics lead to water-efficient plants, with a lower water loss due to transpiration, because the leaf area is lower or because transpiration rate is reduced under water deficit (higher LE value).

Improving genotype-to-phenotype prediction in simulation-based ideotype design Genericity and limits of the results depend on model complexity and targeted cropping environments
In our experiments, the input phenotypic space was bounded by observed values, thereby avoiding the question of extrapolating plant performances for unseen-before traits values. Although our approach indicates "directions" for ideotypes, extrapolation should only be done with caution. For example, Haile et al. (1998) and Srinivasan et al. (2017) report empirical evidences that leaf area is already supra-optimal for the soybean crop (see also Conley et al. 2008Conley et al. , 2009), whereas our approach indicates a positive value of increasing leaf area-related inputs (Figure 4, strong value for apple trees, lesser for the sunflower crop).
This discrepancy can first result from the model content and validity domain. Plant models are a simplified description of the real biological system, and some physiological processes might not be represented (e.g respiration, biotic stresses) leading to an erroneous simulated cost for biomass production and thus plant performance. We argue that making models complex enough to detect such trade-offs might be much more difficult than operating optimization in a bounded phenotypic input space and incorporating a measure of feasibility.
The definition of cropping environments in the model-based approach can also lead to differences in trait value. In our approach, average cropping conditions only were used. Considering additional diverse environmental and management factors (for instance, drought-prone environments) led to negative value for leaf area increase (Victor Picheny 2017). However, considering additional diverse environmental and management factors (for instance, drought-prone environments) would likely affect the characteristics of the optimal set of ideotypes (e.g negative value for leaf area increase as observed in Victor Picheny (2017)) and eventually lead to ideotypes for specific growth conditions. We suggest that a few key traits are thus responsible for cultivar global adaptation capacity. Secondary traits are needed to adapt to uncertain environmental conditions and support alternative resource use strategies.

Is the feasibility function a reasonable representation of the biologic reality?
The feasibility function is entirely dependent on the dataset at hand. This implies that the diversity of optimal phenotypes selected by the optimization process results from the trait variance present in the considered population. This variance corresponds to heterosis (biparental progeny), in which case it also depends on chosen parents, or to genetic variability (collection of hybrids). When more polymorphisms are available, with multiple alleles at a locus, more diverse phenotypic traits can be obtained. The genetic distance between the parents will also impact this capacity; more distantly related parents increase the chance of different alleles at a locus.
In this study, the variation observed in the apple tree case resulted from a biparental segregating population with low parental relatedness (Allard et al. 2016). A larger distribution of the traits, and possibly different correlations could be expected from different crosses or in populations with increased genetic diversity, such as multi-parental populations (Bink et al. 2002;Blanc et al. 2006) or core collections (Lassois et al. 2016). In the sunflower case, the observed variability in the traits resulted from the upstream selection of the commercial hybrids. The variability observed in this material likely covers only a small portion of the diversity present in core collections. The lack of kinship information on this material assumes average values for each trait are most feasible. This is reflected by our definition of the feasibility function: maximal at the center of the cloud and lowest at the edges of the domain. Using plant material with a more complex genetic structure would necessitate accounting for kinship matrices comparable to pedigree-based (Bink et al. 2002) or genome-wide association study (GWAS) analyses. This would account for genetic correlations between individuals and between traits as proposed herewith.
More specifically, this approach could consider alternative feasibility functions depending on the type of genetic material screened in the optimization process. In the proposed function (Equation 5), the proximity of a virtual phenotype to a particular existing individual will only indirectly be accounted for. In the sunflower case, where hybrids were produced by breeding, an alternative feasibility function could be defined that would reach its maximum at those particular trait combinations ( Figure  This article is protected by copyright. All rights reserved.

5).
In this case, virtual phenotypes matching existing phenotypes would be more present in optimal solutions, even if they are farther from the population mean. We did not conduct these tests because it was not central to our proof of concept study. However, the global approach and optimization algorithm would support alternative feasibility functions without modifications.

Figure 5: Feasibility function used in the study (blue) and alternative function (black).
How does our approach fit into gene-to-phenotype prediction strategies ? The efficiency of the breeding process benefits from gene-to-phenotype predictions to assign an accurate breeding value to genotypes. Hybrid modeling approaches have recently emerged between statistical modeling (forward, gene to phenotype) and process-based modeling (reverse, phenotype to gene) (Bustos-Korts et al. 2016).
On the one hand, gene-to-phenotype predictions are generally undertaken from the molecular level where genomic selection (Meuwissen et al. 2001) is used to predict the breeding value from genomewide informations and statistical modeling. The difficulty with this method lies in the prediction of non-additive gene effect and gene environment interactions. These predictions can stem from environmental covariables (Heslot et al. 2014) or computed from simulation modeling (Technow et al. 2015). Additionally, as in our approach, phenotypes measured in the training population for genomic selection could also be used to estimate feasibility function based on correlations observed between traits. This function will likely improve the accuracy of the genomic selection model for estimating breeding values in the next generation (e.g. Marulanda et al. 2015).
On the other hand, gene-based modeling (White & Hoogenboom 2003;Yin et al. 2004), where gene or QTL action is represented through linear effects of specific alleles on crop simulation model parameters (Messina et al. 2006;Chenu et al. 2008), intrinsically accounts for feasibility at the expense of an intensive phenotyping (i.e. on large populations) and modeling process (i.e. to develop genetic models for all influent genotype-dependent parameters of the simulation model).
Consequently, gene-to-phenotype predictions can be improved to account for gene environment interactions by (1) augmenting genomic selection frameworks with new types of predictors or (2) relying on numerical models and optimization methods to identify optimal combinations of traits (or alleles, depending on the input type). Our approach fits the second option, and can improve the realism of ideotypes designed with model-based optimization methods. This requires less phenotyping work as compared to the gene-based modeling options, by estimating the feasibility criteria using the values of genotype-dependent parameters of the simulation model.

Conclusion
Plant numerical models are powerful tools to facilitate ideotype design. These models offer the ability to perform large-scale experiments and to explore a large variety of trait combinations. However, as stated by Martre et al. (2015b), it is essential to integrate genetic constraints on virtual cultivars during trait optimization to avoid impossible phenotypes that breeders cannot produce. We argue that an efficient way to increase realism in model-based ideotype design approaches is by introducing correlations between genotype-dependent parameters into a feasibility criterion, which is not formally considered in current approaches. Multi-objective optimization allowed exploration of potential phenotypes that successfully target realistic trait combinations and avoid misleading solutions. This feasibility criterion approach successfully demonstrated that it could provide paths for desirable and realistic trait modifications (both in direction and intensity), when coupled with trait-based breeding methods, for apple trees and sunflower crop in our proof of concept study.
Comment citer ce document : Picheny, V., Casadebaig, P., Trepos, R., Faivre, R., Da Silva, D., Vincourt, P., Costes This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.  This article is protected by copyright. All rights reserved.  This article is protected by copyright. All rights reserved.  The red contour lines represent the feasibility function: the smaller the ellipse, the larger the feasibility (cf. Figure 1). The points show all the Pareto-optimal phenotypes (optimized individuals). Their intensity of blue color corresponds to the performance value (in t ha -1 for panel B), with less intense coloration attributed to higher performance.
Comment citer ce document : Picheny, V., Casadebaig, P., Trepos, R., Faivre, R., Da Silva, D., Vincourt, P., Costes This article is protected by copyright. All rights reserved.  This article is protected by copyright. All rights reserved.  The assumption behind the current feasibility function is that it is easier to produce a phenotype with average traits, while the alternative assumption would be that it is easier to produce a phenotype close to a single existing one.