Convergence of knowledge in a stochastic cultural evolution model with population structure, social learning and credibility biases

Understanding how knowledge emerges and propagates within groups is crucial to explaining the evolution of human populations. In this work, we introduce a mathematically oriented model that draws on individual-based approaches, inhomogeneous Markov chains and learning algorithms, such as those introduced in [F. Cucker, S. Smale, Bull. Amer. Math. Soc., 39 (1), 2002] and [F. Cucker, S. Smale and D. X. Zhou, Found. Comput. Math., 2004]. After deriving the model, we study some of its mathematical properties and establish theoretical and quantitative results in a simplified case. Finally, we run numerical simulations to illustrate some properties of the model.

1. Introduction

1.1. On social learning. Computers, spaceships and scientific theories have not been invented by single, isolated individuals. Instead, they result from a collective process by which innovations are gradually added to an existing pool of knowledge, most often over multiple generations [4,28]. The ability to learn from others (social learning) is pivotal to this process because it allows innovations to be passed from individual to individual and from generation to generation.
Since the beginning of the twentieth century, much energy has been devoted to understanding the mechanics of learning and cultural innovation. Among the oldest seminal papers, [14,23] used the so-called "stimulus sampling" approach, which formalizes the effects of repetition on the learning process following a 'trial and error' procedure. Stochastic models were subsequently used to take the effect of randomness on the learning process into account. Despite their simplicity, these models correctly describe simple repetitive experiments [6,7]. These two approaches were then extended and generalized as learning theory in the early 2000s in the seminal papers by Cucker and Smale [8,10]. These papers introduced and extended the modern mathematical theory of learning by splitting the learning process into two fundamental mechanisms: cognitive learning, whereby an individual increases their knowledge through their own experience, and social learning, whereby an individual learns from interactions with other individuals. Since these works, many new models have been published, either stochastic or deterministic, using approaches ranging from microscopic agent-based models [27,9,24,19], to mesoscopic kinetic theory of active particles [3,15,26,11], to macroscopic fluid models [18,17] (see [5] for a detailed review).
In this work, we develop a new mathematical model that aims to describe the dynamics of knowledge creation and propagation among interacting individuals. Our approach generalizes the Cucker-Smale theory of learning by introducing different possible biases and asymmetries in knowledge transmission due to population structure, individuals' status, or individuals' credibility. As a result, the influence of individuals over each other varies through time and various hypotheses from [10] are relaxed.
1.2. Outline of the paper. The model is properly introduced and simple applications are given in Section 2. In Section 3, we study some of the mathematical properties of the model and establish theoretical, quantitative results in a simplified case describing the evolution of knowledge among interacting individuals. Finally, we develop a numerical method to simulate our model in Section 4. This method allows us to run numerical analyses of the model in cases where we do not have analytical results, and to present numerical illustrations of the classical model [10] on the evolution of language, which is contained in our model. LM would like to thank Dorian Ni for his feedback on the model.

Presentation of the mathematical model
In this section, we shall present the model describing the evolution of knowledge within a finite population. Many different definitions of knowledge have been proposed. Here, we consider that knowledge results from conceptualizations that appropriately reflect the structure of the world. Conceptualizations are defined as functions linking a set of possible experiences to a set of possible concepts. We call these functions knowledge-like functions.
Time is supposed discrete. At each time step, the knowledge-like functions of individuals change according to a learning dynamic that depends on their interactions with others (social learning) and random exploration (individual learning). Our model is an extension of the model of Cucker, Smale and Zhou describing the evolution of language [10], and can be seen as a hybrid between a learning algorithm [8] and an individual-based model [2,1].
We suppose that individuals influence each other according to a social learning matrix Λ ∈ M_N(R). This matrix depends on individuals' perception of each other's credibility through a credibility matrix C ∈ M_N(R). As a consequence, individuals' credibility and how knowledge is transmitted between individuals change at each time step. The social learning matrix Λ also depends on the structure of the population, described by a structure matrix Γ ∈ M_N(R) (e.g. a professor has a strong impact on her students, while students have a lesser impact on their professor, or individuals learn preferentially from spatially closer individuals). Knowledge-like functions also evolve by individual learning, which is described as a stochastic process that we will detail in the following. The learning algorithm then takes both social and individual learning into account. Let us first start with some useful notations that we shall use in the following:
• The space of square matrices of size N > 0 with coefficients in K will be denoted by M_N(K).
• The vector of R^N composed of 1s will be denoted by e.
• The distance from a function f to a set X will be denoted by d(f, X).
We make the following assumptions:
(1) E is a closed and bounded subset of R^n.
(2) C ⊂ E^l with l ∈ N*, E a Euclidean space, and 0 ∈ C.
(3) F is a subset of the set of functions from E to C.
The set E represents all the possible experiences, and C represents all the concepts (an illustration is presented in Fig. 1).

Definition 2.
A knowledge-like function f ∈ F is a function linking the experience set E to the concept set C.
Each knowledge-like function represents the knowledge of one individual. Let e ∈ E; when there is a c ∈ C such that f(e) = c and c ≠ 0, we say that the knowledge-like function conceptualizes e. We assume that individuals conceptualize all experiences they go through. Elements that are not conceptualized (i.e. not experienced) by individuals are sent to the zero element of the set C by their knowledge-like function.
Example. The knowledge-like function associated with colors. Let E be [0, 1000] ⊂ R, representing the set of wavelengths in nanometers, and let C = {0, red, yellow, green, blue, purple} contain color names and 0. In this case E is a continuous space and C is a discrete space. A knowledge-like function assigns a color to each wavelength, or 0 if the individual has not conceptualized this color. For example, f defined below is a knowledge-like function (Fig. 1).

Remark 1.
In Japanese, the color word ao includes what Western people would call green and blue. As with other languages, Japanese did not initially differentiate between these two colors. This language could be modeled by the knowledge-like function f′:

Learning algorithm.
Let N be the number of individuals in the population. Each individual i is associated with a knowledge-like function k_i ∈ F. We consider a dynamical, discrete-time model: knowledge-like functions evolve over time because of social and individual learning. Let us denote by k^t := (k^t_1, ..., k^t_N) ∈ F^N the state of the population at time t > 0. As time evolves, individuals modify their conceptualization through the learning algorithm presented in [8]:

Definition 3.
The learning algorithm computes the knowledge-like function at the next time step using a least-squares procedure. For each individual i, the elements of S^t_i represent what will shape the knowledge of i at the next step. The sampling of S^t_i is done randomly using the probability measure ρ^{i,t} defined in (2.4). The elements of S^t_i result from social or individual learning. The relative importance of individual vs. social learning is controlled by a parameter τ mixing ρ^{i,t}_Λ and ρ^{i,t}_I, two probability measures representing the effects of social and individual learning respectively, defined below.
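The mixture just described presumably takes the standard convex-combination form (a reconstruction consistent with the surrounding text and with the role of τ as the rate of individual learning, not the paper's verbatim equation):

```latex
\rho^{i,t} \;=\; \tau\,\rho^{i,t}_{I} \;+\; (1-\tau)\,\rho^{i,t}_{\Lambda},
\qquad \tau \in [0,1].
```

With τ = 0 the sample is drawn purely from social learning, which is the regime studied analytically in Section 3; with τ > 0 individual exploration enters the dynamics.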
We assume that the extent to which individuals are influenced by each other is determined by the social learning matrix Λ^t = (λ^t_{i,j})_{1≤i,j≤N} ∈ M_N(R) (see (2.7) below for the calculation).

Definition 5.
The probability measure ρ^{i,t}_Λ representing social learning is defined by: where λ^t_{ij} describes the influence of individual j on individual i at time t (see (2.7) below for the calculation of the social learning matrix Λ^t).
According to (2.5), drawing an element (e, k^t_j(e)) using the probability measure ρ^{i,t}_Λ is equivalent to randomly drawing an individual j with weights (λ^t_{ij})_{1≤j≤N}, and randomly choosing an experience e in E.

Definition 6.
The probability measure ρ^{i,t}_I representing individual learning is given by an expression in which ρ(c|e) denotes the conditional probability measure on C, defined for every (e, c) ∈ E × C and every integrable function ϕ, and ρ_E denotes the marginal probability measure on E. Thus, the individual learning phase is equivalent, for each individual i, to drawing an element e′ experienced by i and then drawing an experience e following a normal law centered and concentrated on e′. Because of the shape of this probability law, individuals tend to explore the set E close to the elements they have already explored. The concept c is drawn following a probability law centered and concentrated on k^t_i(e).
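The sampling step can be sketched as follows, under the mixture interpretation of τ above: each of the m sample elements comes from individual learning with probability τ (a Gaussian perturbation of one of i's own experiences, with an illustrative width σ) and from social learning otherwise (copying the conceptualization of a partner j drawn with weights λ_{ij}). The function name and data layout are our assumptions, not the paper's notation.

```python
import random

def draw_sample(i, k, lam, experiences, m, tau, sigma=0.1, rng=None):
    """Draw a learning sample S_i^t of size m for individual i.

    k: list of knowledge-like functions, one per individual.
    lam: N x N social learning matrix (row i = weights of partners).
    experiences: per-individual lists of already-explored elements of E.
    tau: probability that an element comes from individual learning.
    """
    rng = rng or random.Random()
    N = len(k)
    sample = []
    for _ in range(m):
        if rng.random() < tau:
            # individual learning: explore near an already-known experience
            e_prime = rng.choice(experiences[i])
            e = e_prime + rng.gauss(0.0, sigma)
            sample.append((e, k[i](e_prime)))
        else:
            # social learning: adopt the conceptualization of a partner j
            j = rng.choices(range(N), weights=lam[i])[0]
            e = rng.choice(experiences[j])
            sample.append((e, k[j](e)))
    return sample
```

With τ = 0 (the setting of Section 3), every element of the sample is a pair (e, k_j(e)) copied from a socially drawn partner.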

Social learning.
Social learning depends on the influence of individuals over each other, described by the social learning matrix Λ ∈ M_N(R). This matrix captures the effect of the structure of the population and reflects, for instance, spatial structure or differences in social status. Social learning also depends on the credibility that individuals assign to each other based on their own knowledge-like function (as described in Section 2.4). Population structure and individuals' credibility are described by the matrices Γ = (γ^t_{i,j})_{1≤i,j≤N} and C = (c^t_{i,j})_{1≤i,j≤N} respectively.
• We consider a population of N individuals structured by age, sorted such that individual 1 is the youngest and N the oldest. It has been shown that older individuals tend to have a higher inertia [13], which can be modelled by the condition γ_{11} < ... < γ_{NN}.
• Let us consider the relationship between a parent and her offspring. The offspring learns a lot from her parent, but the situation is not symmetric. Let s ∈ (0, 1) describe the influence of a parent on her offspring.
• We now consider the relationship between two students and their teacher. Because of her status, the teacher has a high influence on her students but is influenced by them to a lesser extent. We assume that the relationship between the two students is symmetric.
Thus the influence of individual j on i depends on the structural influence γ_{ij} of j on i, and on the credibility c_{ij} that i attributes to j. We shall assume in this work that these phenomena are multiplicative.

Definition 9.
The social learning matrix Λ = (λ_{ij})_{1≤i,j≤N} is a square matrix of size N defined by:
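Since the displayed formula of Definition 9 is not reproduced above, here is a sketch of the multiplicative combination described in the text: the entrywise product of the structure and credibility matrices, renormalized so that each row sums to 1. The row normalization is our assumption, consistent with Λ being a stochastic matrix as required in Section 3.

```python
def social_learning_matrix(gamma, cred):
    """Entrywise product of structure (gamma) and credibility (cred),
    row-normalized so each row of the result sums to 1.

    gamma, cred: N x N nested lists of nonnegative floats.
    """
    N = len(gamma)
    lam = [[gamma[i][j] * cred[i][j] for j in range(N)] for i in range(N)]
    for i in range(N):
        s = sum(lam[i])
        lam[i] = [x / s for x in lam[i]]
    return lam
```

With this convention, a high structural weight γ_{ij} only translates into high influence λ_{ij} if i also judges j credible, and vice versa.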

Likelihood landscapes and how they affect credibility.
In our model, some conceptualizations (i.e. knowledge-like functions) appropriately reflect the structure of the world, while others do not. For instance, in an environment in which blue berries are safe to eat while green berries are unsafe, color categorizations that discriminate between blue and green are superior because they appropriately capture the structure of the environment. Individuals do not know a priori how to categorize their environment. An individual who, by chance, only ever ate blue/safe berries might consider that discriminating between blue and green makes no sense. Yet an individual who got sick after eating green/unsafe berries is likely to refine her color conceptualization to avoid being sick again. Sometimes, alternative and irreconcilable conceptualizations are equally likely. As an illustration, consider the shape in Figure 2: one might consider that it represents (1) a pair of faces (in black) or (2) a cup (in white). Additional observations will not allow individuals to decide whether one conceptualization is more likely than the other. In our model, we assume that individuals evaluate the likelihood of their conceptualization according to their own experience. To do so, we define a likelihood landscape as follows. The map L is called the likelihood landscape.

Examples of likelihood landscapes.
• Let us consider the evolution of two different concepts in a population: flat Earth (F) and round Earth (R). Individuals can have experiences where the Earth seems flat (f) and others where the Earth seems round (r) (seeing a picture of the Earth, a boat vanishing behind the horizon, etc.). In this case E = {f, r} and C = {0, F, R}. We define the likelihood landscape accordingly: when the Earth seems flat (f), the Earth could be flat or round (because round surfaces can appear flat when observed up close), so both concepts (F and R) are likely. However, when the Earth seems round, only the concept that the Earth is round is likely.
• Let us consider again the example of colors developed above (Fig. 1), where E is the set of all wavelengths of the visible spectrum and C is the set of colors. Moreover, let us consider that it is not useful for individuals to discriminate between colors. In that case, we would define the likelihood landscape as L(e, c) = 1 for all (e, c) ∈ E × (C \ {0}).
As introduced earlier (Definition 8), the influence individuals have on each other depends on their credibility through the credibility matrix C. The level of credibility attributed to one individual by another depends on both the knowledge-like functions and the likelihood landscape.
More precisely, c_{ij} describes the credibility individual i attributes to individual j. If the credibility given to individual j by individual i is high compared to those attributed to other individuals (including herself), then individual i is more prone to adopt individual j's conceptualization. We described in Section 2.2 how this adoption changes one's knowledge-like function. We also consider that the credibility c_{ii} that an individual i gives to her own categorization can be affected by her own experiences. In other words, individuals are capable of self-criticism. The lower the self-credibility, the more likely an individual is to be influenced by other individuals (self-credibility directly affects an individual's inertia).

Definition 11.
A credibility matrix C = (c_{ij})_{1≤i,j≤N} is a square matrix of size N defined by the following, where c_min ≥ 0 is a fixed parameter:

Remark 4. If the set E contains a finite number of elements, the credibility formula reduces to:
namely, the second part of the formula is similar to a measure of likelihood in probability theory [22]. In this formula, associating few experiences with unlikely concepts penalizes credibility a lot.
Application to the round vs. flat earth example.
together with c_min := 0. For any i ∈ {1, 2, 3}, the f_i are experiences where the Earth is as likely to be flat as round (e.g. a human watching the horizon), and r_1, r_2 are experiences where the Earth is unlikely to be flat and likely to be round. We consider a population of four individuals with different knowledge-like functions k_1, k_2, k_3 and k_4. Let us normalize C such that its rows sum up to 1, in order to easily grasp the influences on an individual: row i gives the credibilities attributed by individual i. Individual #1 has only experienced f_1; she judges the other individuals (and herself) based on her sole experience. All individuals associate f_1 with an appropriate conceptualization, so individual #1 gives the same credibility to all individuals. Individuals #2, #3 and #4 have all experienced f_1 and r_1. They evaluate individual #1 as less credible than themselves because #1 has not experienced r_1. Individuals #3 and #4 evaluate #2 as not credible at all because she associates r_1 with a concept that is unlikely to them (i.e. the Earth is flat while their experience shows it is round). #2 judges herself not credible because she associates r_1 with an unlikely concept. Individual #3 evaluates #4 as less credible than herself because #4 uses several concepts (in our model, less parsimonious conceptualizations are penalized).
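A minimal sketch of this computation, assuming (one possible reading of Remark 4) that the raw credibility i attributes to j is the product of the likelihoods L(e, k_j(e)) over i's experiences, floored at c_min and then row-normalized:

```python
def credibility_row(i_experiences, knowledge, L, c_min=0.0):
    """Credibility an individual with experiences `i_experiences`
    attributes to each individual in `knowledge` (list of functions).

    Assumed form: product of likelihoods over i's experiences,
    floored at c_min, then normalized to sum to 1.
    """
    raw = []
    for k_j in knowledge:
        p = 1.0
        for e in i_experiences:
            p *= L(e, k_j(e))
        raw.append(max(p, c_min))
    s = sum(raw)
    return [x / s for x in raw]

# Flat/round Earth landscape: when the Earth seems flat ("f") both
# concepts are likely; when it seems round ("r") only "R" is likely.
def L_earth(e, c):
    if e == "f":
        return 1.0
    return 1.0 if c == "R" else 0.0
```

Under these assumptions, an individual who has experienced both f and r attributes zero credibility to anyone who conceptualizes r as "flat", reproducing the qualitative behavior described in the example.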
3. The case of globally shared knowledge: convergence without individual learning

In this section, we are interested in the convergence of the learning dynamics, with high probability, towards a common shared conceptualization among individuals (i.e. when everybody carries the same knowledge-like function k). This result is obtained assuming no individual learning: throughout this section, we assume that the rate of individual learning is τ = 0. The more realistic case where individuals also learn individually is explored below using numerical simulations.
One-step idealistic processes. In order to establish the main result, let us decompose the stochastic process k^t into two distinct processes that we shall analyze separately.

Definition 12. Let us define the application
We can then define the one-step deterministic idealistic process. Using these idealistic processes, the time evolution of the knowledge-like function follows. We first prove the contraction of the idealized process K_T^t from (3.1) in the space M_F, using some algebraic properties of primitive matrices as well as results about inhomogeneous Markov chains. Secondly, we prove the convergence of the process ∆k^t with high probability using learning theory. Then, under certain hypotheses (such as τ = 0), we prove the convergence of k^t towards the set M_F with high probability.

Primitive matrices and their applications.
The behavior of the idealistic process K_T^t is mainly driven by the social learning matrix Λ. In this section, we study the relationship between the properties of the influence matrix and the interactions taking place within the population. If i does not communicate with j, we write i ↛ j.
• We say that i communicates with j with k intermediates if there exist i_1, ..., i_k ∈ {1, ..., N} such that

Examples. Let us consider a matrix A defined by
As we can see on the graph of the matrix A (Fig. 3), every individual communicates with every other with at most 3 intermediates. Then A is a primitive matrix. We can now establish results relating graphs and eigenvalues of matrices: if this communication property holds, then 1 is an eigenvalue of A, and its multiplicity is 1.
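Primitivity can also be checked numerically: a nonnegative matrix A is primitive exactly when some power A^k is entrywise positive, and it suffices to check powers up to Wielandt's bound N² − 2N + 2. The 4-individual matrix below is a hypothetical stand-in (a communication cycle with one self-loop), not the paper's example A.

```python
def is_primitive(A, max_power=None):
    """Return True iff some power A^k of the nonnegative matrix A
    is entrywise positive (checked up to Wielandt's bound)."""
    N = len(A)
    max_power = max_power or N * N - 2 * N + 2

    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(N))
                 for j in range(N)] for i in range(N)]

    P = A
    for _ in range(max_power):
        if all(x > 0 for row in P for x in row):
            return True
        P = matmul(P, A)
    return False

# Hypothetical example: a 4-cycle with a self-loop at individual 1 is
# primitive (aperiodic and strongly connected).
ring_with_loop = [[0.5, 0.5, 0.0, 0.0],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [1.0, 0.0, 0.0, 0.0]]
```

By contrast, a pure cycle (no self-loop) is irreducible but periodic, hence not primitive: no power of it is entrywise positive.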
Proof. We suppose that 1 is not an eigenvalue of multiplicity 1 of A. Since A is a stochastic matrix, one has Ae = e with e given by (2.1), so 1 is an eigenvalue of A. In particular, its multiplicity is strictly greater than 1, and there exists X ∈ R^N \ Re such that AX = X.
Let P be a permutation matrix such that the coordinates of the vector PX are ranked from lowest to highest. Let A′ = P^T AP and X′ = P^T X. Let n_l and n_h be respectively the number of coordinates equal to the lowest and to the highest coordinate values. We have n_l ≥ 1, n_h ≥ 1 and n_l + n_h ≤ N.
As shown in Figure 5, G_1 ↛ G_3 and G_3 ↛ G_1. We conclude as before. □

Figure 5. Illustration of the influence relationship between the three clusters G_1, G_2 and G_3.

3.2. Eigenvalues of the matrix of influence Λ. In this section we study the quantitative properties of the eigenvalues of the influence matrix, in order to understand the dynamics of the idealized process K_T^t. Let Γ and C be respectively the structure and credibility matrices defined in Def. 7 and equation (2.9). By construction, C is a stochastic matrix. The social learning matrix Λ is defined according to (2.7).
Proof. Let us set m_γ = min_{i,j} γ_{ij}. Since γ_{ij} ≥ m_γ for all i, j ∈ {1, ..., N}, one has that

Lemma 3. Let Λ be a stochastic matrix of size N that is bounded from below by m_λ as in Lemma 2. Let us consider the sorted collection (λ_i)_{i=1,...,N} of its eigenvalues. Then there exists a universal constant 0 < M_λ < 1 such that
Proof. Being composed of stochastic matrices, the set A is bounded (by 1) for the norms induced by both the 1 and ∞ vector norms on R^N. Moreover, it is closed by construction. In particular, A is a compact subset of M_N(R). Let L be the application that returns the eigenvalues of a matrix, sorted in nonincreasing order of modulus. According to Theorem II.5.1 of [20], L is a continuous function on the set of stochastic matrices. In particular, A being compact, the range of L is also compact. Continuous functions reach their bounds on compact sets, so one can take

Contraction of a stochastic primitive matrix.
In this section we study the properties of stochastic primitive matrices, in order to understand the behavior of the process K_T^t.

Lemma 4. Let A ∈ M_N(R) be a stochastic matrix that is bounded from below by m_A as in Lemma 2. Let M := Re be the eigenspace associated with the eigenvalue 1 and W the eigenspace associated with the remaining eigenvalues. One has:
(1) R^N = M ⊕ W, both spaces being stable by A;
(2) there is a norm ∥·∥ on W and a distance d on R^N such that, for all (x_M, x_W) ∈ M ⊕ W,
Proof.
with ∥·∥_2 being the Euclidean norm on R^N. (1) Then F^N = M ⊕ W, both spaces being stable by A;

Remark 5. This result is inspired by Lemma 1 of [10], and has similar conclusions. Nevertheless, one has to bear in mind that, with its set of hypotheses, the original result from
(2) There is a norm ∥·∥ on W and a distance d on R^N such that, for all f_M ∈ M and all f_W ∈ W,
Proof.
There exists a distance d on F^N such that
Proof. Consequence of the previous corollary. □
We recall that the application T : F^N × N → F^N and the process K_T^t are defined by T(f, t) = Λ^t f and K_T^t = T(k^t, t). This may be interpreted as an idealistic step of learning. Moreover, we assumed that τ = 0, namely that no individual learning occurs in the model.

Theorem 1.
If Γ > 0, and the minimum credibility c_min > 0 is fixed, then for all times t there exist a distance d and m > 0 independent of t such that

Proof. Consequence of Lemmas 1 and 2, and Corollary 2. □
We define an idealistic deterministic process K^t by K^{t+1} = Λ^t K^t and K^0 = k^0. Theorem 1 implies that this idealistic, deterministic process converges to a common shared knowledge:
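A minimal sketch of this contraction, with a fixed primitive stochastic matrix standing in for the time-dependent Λ^t and scalar knowledge values standing in for knowledge-like functions: the spread max(K) − min(K) contracts geometrically, so the population reaches consensus.

```python
def iterate_consensus(Lam, K, steps):
    """Apply K <- Lam K repeatedly; return the final state and the
    trajectory of the spread max(K) - min(K), which contracts for a
    primitive stochastic matrix Lam."""
    N = len(K)
    spread = [max(K) - min(K)]
    for _ in range(steps):
        K = [sum(Lam[i][j] * K[j] for j in range(N)) for i in range(N)]
        spread.append(max(K) - min(K))
    return K, spread

# Illustrative 2-individual example (our choice of matrix and values)
Lam = [[0.8, 0.2], [0.3, 0.7]]
K, spread = iterate_consensus(Lam, [2.0, 6.0], 50)
```

For this matrix the spread contracts by the second eigenvalue 0.5 at each step, illustrating the exponential convergence predicted by the corollary.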

Corollary 3. Under the hypotheses of Theorem 1, there exist
3.4. Learning theory. The results presented in this part are inspired by [8]. That article deals with inferring functions from random samples. In our case, the functions are knowledge-like functions and the samples come from social and individual learning. Nevertheless, our theoretical results only deal with the case where individuals learn from social sources (under the hypothesis τ = 0). The results of this theory imply the convergence of the process ∆k^t with high probability.
3.4.1. Sample error. We study the learning process from random samples governed by the probability measure ρ on Z = E × C. We recall that E is a compact subset of R^n, and that C is a subset of a Euclidean space containing zero.

Definition 16.
We define the least-squares error of f as follows, where ∥·∥_C is a norm on C associated with the inner product ⟨·, ·⟩_C of the ambient Euclidean space E^l.
Proof. Adding and subtracting f_ρ and expanding the square yields the result. □
As a consequence of Proposition 4, the regression function f_ρ minimizes the mean square error ε.
Definition 17. Let f_F be the target function that minimizes ε. During the learning phase, the probability measure ρ is not assumed to be known. The learning process is a minimization procedure on a sample S = ((e_1, c_1), ..., (e_m, c_m)), m ∈ N*.

Definition 18. We define the empirical error ε_S of f on the sample S by
and f_S the empirical target function, namely a minimizer of ε_S. This minimizer is of course not unique. Nevertheless, when the size m of the sample is large enough, the empirical target function approximates the target function. More precisely, one has the following classical concentration inequality from [25]:
Proposition 5. We assume that:
(1) F is a compact and convex set.
Then, for all η > 0, a concentration bound holds in which N(F, s) denotes the so-called covering number, namely the minimal l ∈ N such that there exist l disks in F with radius s covering F. Since F is compact, this number is finite.
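Definition 18 can be illustrated over a small finite class of candidate functions (constant functions here, an assumption for illustration): the empirical target function is any candidate minimizing the mean squared error on the sample.

```python
def empirical_error(f, sample):
    """Empirical least-squares error of f on ((e_1,c_1),...,(e_m,c_m))."""
    return sum((f(e) - c) ** 2 for e, c in sample) / len(sample)

def empirical_target(candidates, sample):
    """A minimizer f_S of the empirical error over a finite class."""
    return min(candidates, key=lambda f: empirical_error(f, sample))

# Illustrative class: constant knowledge-like functions e -> v
candidates = [lambda e, v=v: v for v in range(-10, 11)]
sample = [(1, 3.2), (2, 2.8), (3, 3.1)]
f_S = empirical_target(candidates, sample)
```

Here the minimizer over constants is the candidate closest to the sample mean of the c values, which is how least squares behaves in general.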
We can now get back to our model. We recall that the probability measure ρ^{i,t} allows the sample for the learning of individual i at time t to be drawn, and that τ = 0. The probability measure ρ^{i,t} then depends only on social learning. During the learning phase of our model we have: since F is convex, f_{ρ^{i,t}} ∈ F. If E is finite, or otherwise if F is a set of continuous functions, we have f^{i,t}_F = f_{ρ^{i,t}}, with f^{i,t}_F being the minimizer of the error ε with ρ = ρ^{i,t}.

Main result.
Combining our results on the one-step idealistic process with those on learning theory, we are able to study the convergence of the full process k^t with high probability. We recall that M_F = {(k, ..., k), k ∈ F}.
Proof. Let d be the distance defined in Corollary 1.
We recall that the application T is defined by T(f, t) = Λ^t f, and that K_T^t = T(k^t, t). Notice in particular that the process K_T^t is different from k^t. By the triangle inequality, using (3.3) and (3.2), the distance splits into two terms. The contractivity of the second term is yielded by Theorem 1: there exists α_t < 1 controlling it. Now we need to estimate the other term. We recall that E is compact in R^n. By the compactness of E and F, we obtain a uniform bound. Let us now define the norm ∥·∥_{F^N,ρ} on F^N; for all 1 ≤ i ≤ N one has a coordinatewise estimate, and in particular, gathering (3.7) and (3.8) and using the convexity of the exponential, a concentration bound follows. Let d_{F^N,ρ} be the distance on F^N defined by the norm ∥·∥_{F^N,ρ}. All norms being equivalent on (R^l)^N, there exist constants C′_A and C_A comparing these distances. Iterating over the discrete times, one obtains the estimate with confidence at least the product of the stepwise confidences. Let α_* = max_{i=0,...,N} max_t α_i(t). According to Lemma 3, one has α_* < 1. Thus, for any 0 < δ < 1, choosing the parameter m large enough yields the estimate with confidence at least 1 − δ. Taking η = α_*^{2t} d(k^0, M_F)²/N finishes the proof. □

Remark 6. When time t goes to infinity, so does the number of samples m_t needed for the convergence in Theorem 2 to occur. Indeed, using Section 7.1 of [10], there exist C_F > 0 and a > 0 bounding the covering number. Plugging this into (3.10) and choosing δ appropriately as a function of t, one can show that f^t tends to M_F almost surely. One can then define the minimal sampling size m(t), which tends to +∞ when t → +∞.

Proof. Let ϵ > 0. For all t large enough, one has

Numerical simulations
Let us now both illustrate the mathematical results of the paper, such as Theorem 2, and show that some generalizations also hold when individual learning is possible (τ > 0). Individual learning allows for new experiences, observations and original conceptualizations, which make the evolution of knowledge possible for both individuals and populations.

Illustration of the main theorem.
Our model aims to be used by theoretical anthropologists. To show its usefulness, we illustrate the results of our main theorem in specific cases.

Test 1. Impact of self-inertia. As a first numerical test, we aim to illustrate Theorem 2. Let E = {1, ..., 5} and C = [−10, 10]. We consider a relationship between two individuals (labeled 1 and 2). The structure matrix Γ (Definition 7) is given in terms of a parameter α, which can be interpreted as cognitive inertia (see Remark 2): the higher α, the less individuals' knowledge-like functions change along the dynamics. As likelihood landscape (Definition 10), we take L(e, c) = 1 for all e ∈ E and c ∈ C \ {0}, and we take c_min = 0.1. We define F as the set of continuous functions from E to C, so F is convex. As E contains a finite number of elements and C is compact, F is compact. We set τ = 0, so the dynamics is driven by social learning only.
When α varies in (0, 1), all hypotheses of Theorem 2 are met (even though, strictly speaking, we do not illustrate the theorem exactly, because we cannot compute m_t). Our numerical simulations show that the population converges to a common shared knowledge, which is consistent with our mathematical results.
At the initial state, the knowledge-like functions of individuals 1 and 2 are k^0_1 and k^0_2, respectively, given by k^0_1(e) = 2 for all e ∈ E and k^0_2(e) = 6 for all e ∈ E. Let d be the distance defined in Corollary 1. Using numerical simulations, we follow the evolution of d(k^t, M_F) through time for different values of the parameter α. We ran 100 simulations; the average dynamics is presented in Figure 6. When α < 1, the population rapidly converges to a common shared knowledge (Fig. 6), as predicted by Theorem 2. This convergence is exponential, which is expected given that the process is driven by an inhomogeneous Markov chain, and it is faster when α = 0.5.
When α = 1, the matrix does not satisfy the hypotheses of Theorem 2, since γ_{12} = γ_{21} = 0. This corresponds to the case where individuals do not communicate with each other. Individuals' knowledge-like functions then do not vary through time, and the process does not converge towards a common shared knowledge.
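A sketch of Test 1 under our assumed form of the structure matrix, Γ = [[α, 1−α], [1−α, α]], with a flat likelihood landscape and τ = 0, so that the social learning matrix equals Γ at every step and each knowledge value is replaced by a Γ-weighted average: for α < 1 the two individuals converge, while for α = 1 they never interact.

```python
def run_test1(alpha, k0=(2.0, 6.0), steps=200):
    """Two individuals, structure matrix [[a, 1-a], [1-a, a]] (our
    assumed form), flat likelihood landscape, tau = 0.
    Returns the final distance between the two knowledge values."""
    Lam = [[alpha, 1.0 - alpha], [1.0 - alpha, alpha]]
    k = list(k0)
    for _ in range(steps):
        k = [Lam[i][0] * k[0] + Lam[i][1] * k[1] for i in range(2)]
    return abs(k[0] - k[1])
```

The gap contracts by a factor |2α − 1| per step, so α = 0.5 gives consensus in one step, intermediate α gives exponential convergence, and α = 1 freezes the initial disagreement.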

Test 2. A professor and her audience.
Now let us consider a population of 5 individuals: 1 professor (individual 1) and 4 students (individuals 2, 3, 4, 5). We keep the same setting as previously, namely no individual learning (τ = 0), since this is a pure teaching situation.
Let the structure matrix be as follows; the professor has a knowledge-like function that is more likely than that of the students. We call k_eq the common shared knowledge at equilibrium. We define ∆_i as the distance between the initial knowledge of individual i and the common shared knowledge, with d_C the distance induced by the inner product on C.
We ran 100 numerical simulations as previously. Results are shown in Figures 7(a) and 7(b). Figure 7(a) shows the evolution of the distance to the set M_F with time. In both cases, whether the likelihood landscape is fixed or concave, the population rapidly converges towards a common shared knowledge. When the likelihood landscape is concave, the professor has a strong influence on her students and the convergence towards a common shared knowledge is faster. Figure 7(b) shows the values of ∆_i. In both cases, the common shared knowledge is farther from the students' initial knowledge than from the professor's.

Creation of knowledge.
We now consider the case where individual learning is present, namely τ > 0. Although we could not prove a convergence result for this case, we can still use numerical approaches when the parameters of the model do not allow analytical resolution.
Test 3. We take the likelihood landscape L(e, c) = 0 if c = 0 and L(e, c) = exp(−(c − 1)²) otherwise, such that the function 1_F, constant and equal to 1, is the likeliest function. We consider a population of ten individuals and set an initial state k^0 = (0_F, ..., 0_F), with 0_F the zero function. Let Γ be the square matrix of size N whose entries all equal 1. Thus, at the initial state, individuals are "newborn", that is, they have not conceptualized any experience. We investigate the convergence of knowledge towards the function 1_F and its dynamics by numerical simulations.
We define the relative entropy (RE) of the population using, for all f, g in F, the quantity below. When each individual in the population has 1_F as knowledge-like function, the relative entropy is maximal and equals 0. We use the relative entropy as a measure of knowledge in the population, i.e. the higher the relative entropy, the likelier the individuals' knowledge. This allows us to quantify the effect of parameters on the evolution of knowledge. Figure 8 shows that the relative entropy increases with time. In our simulations, the individual learning rate was fixed at τ = 0.02. Individual learning results in new experiences and observations, while social learning promotes the spread of adequate conceptualizations. The combined effect of individual and social learning allows the population to evolve towards better solutions (Figure 9).
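Since the displayed formula for the relative entropy is not reproduced above, the sketch below uses an assumed form: the average log-likelihood of the population's conceptualizations relative to the likeliest function 1_F (whose likelihood is 1 everywhere), which is 0 exactly when everyone carries 1_F and negative otherwise.

```python
import math

def likelihood(e, c):
    """Likelihood landscape of Test 3: 0 for the zero concept,
    exp(-(c-1)^2) otherwise, maximal at c = 1."""
    return 0.0 if c == 0 else math.exp(-(c - 1) ** 2)

def relative_entropy(population, E):
    """Assumed form of the population relative entropy: average
    log-likelihood over individuals and experiences. Equals 0 iff
    every individual carries 1_F; strictly negative otherwise.
    (Individuals mapping some e to the zero concept would give -inf,
    reflecting that 'newborn' states are maximally unlikely.)"""
    total = 0.0
    for k_i in population:
        for e in E:
            total += math.log(likelihood(e, k_i(e)))
    return total / (len(population) * len(E))
```

For instance, a population of constant functions equal to 1 scores exactly 0, while a constant function equal to 2 scores −1 per experience under this landscape.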

4.3.
Comparison with a language evolution model. Cucker, Smale and Zhou developed a model to describe the evolution of language [10]. In their model, a language-like function is a function that links a space of objects to a space of signals. Our work is strongly inspired by their model and extends it by introducing individual learning and a credibility matrix that allows influences between individuals to vary through time. Although the interpretation of our variables differs, our results are consistent with those of [10], which proved the convergence of the languages of different individuals to a common shared language (although under different hypotheses, see Remark 5).
Test 4. On the evolution of language. We modified our numerical method in order to simulate the model of language evolution developed in [10]. We consider two different linguistic communities of two individuals each, with few interactions between them. We take E = {1, ..., 5} and C = [−10, 10]. The individuals of the first and second communities have the language-like functions k_1 and k_2, respectively, where k_1 and k_2 correspond to two different languages, defined by k_1(e) = 5 and k_2(e) = 7 for all e ∈ E.
We take a structure matrix such that the two linguistic communities hardly interact. Numerical simulations show that the two communities nevertheless converge to a common shared language: Figure 10 shows that the distance between the process k^t and the set M_F tends to 0.

Conclusion
The aim of this work was to develop a more general mathematical model of knowledge evolution than the existing ones, e.g. [12,21,16]. Existing models have been widely used to investigate the impact of population size on the evolution of knowledge. However, they rely on strong assumptions and omit important aspects of social dynamics. Here, we developed a hybrid model, between an individual-based stochastic model and a learning algorithm, that relaxes these hypotheses and incorporates various forms of social interaction.
Analytical results show that interacting individuals converge with high probability towards a common shared knowledge when no innovation occurs (i.e. no individual learning). Numerical simulations show that these results hold when individuals combine individual and social learning, and that conceptualizations that appropriately reflect the structure of the world emerge over time. This model can be used to investigate knowledge evolution in hierarchically or spatially structured populations of variable sizes.