A study of connectivity on dynamic graphs: computing persistent connected components

This work focuses on connectivity in a dynamic graph. An undirected graph is defined on a finite and discrete time interval, and edges can appear and disappear over time. The first objective of this work is to extend the notion of connected component to dynamic graphs in a new way. Persistent connected components are defined by their size, corresponding to their number of vertices, and their length, corresponding to the number of consecutive time steps on which they are present. The second objective of this work is to develop an algorithm computing the persistent connected components that are largest in terms of size and length. The PICCNIC algorithm (PersIstent Connected CompoNent InCremental Algorithm) is a polynomial time algorithm of minimal complexity. Another advantage of this algorithm is that it works online: knowing the evolution of the dynamic graph in advance is not necessary to execute it. The PICCNIC algorithm is implemented using the GraphStream library and tested experimentally in order to carefully study its outcome according to different input graph types, as well as on real data networks, to verify the theoretical complexity, and to confirm its feasibility for graphs of large size.


Introduction
In static graphs, connectivity is measured thanks to the computation of connected components. The problem of graph connectivity is relevant to many applications and contexts, such as communication networks, logistic networks or social networks. Furthermore, when a graph can be decomposed into several connected components, many problems can be decomposed too and solved separately on the different components. For instance, coloring problems, matching problems or vehicle routing problems can be decomposed.
In some cases, connectivity is also a necessary condition that needs to be checked before solving a problem. Flows, for example, cannot be computed if the source and sink do not belong to the same connected component.
Furthermore, time is an important issue that needs to be taken into account in many fields. Indeed, the interactions between entities are not necessarily static, and their nature might not be constant over time either. Static graphs do not allow the modeling of interactions which are evolving over time. The logical extension of graphs allowing this is then dynamic graphs. In a dynamic graph, vertices and edges can be present or absent depending on time. Every piece of information carried by vertices or edges can also be time-dependent.
The issue addressed in this paper is connectivity in a dynamic graph. Several questions are answered. First, what does connected component mean in a dynamic graph? And second, how can connectivity be measured in a dynamic context? We propose an extension of connected components to dynamic graphs, called Persistent Connected Components (PCC). This new definition takes into account the temporal dimension of the graph, space and time being considered simultaneously. PCCs are defined by their number of vertices, similarly to connected components in static graphs, but also by the number of consecutive time steps on which they are present. We propose a polynomial time algorithm computing the non-dominated PCCs of a dynamic graph and the associated Pareto front. This algorithm is studied together with experiments that show tractability even for large graphs on a long time horizon. The experiments were carried out on different graph types in order to study the impact of the graph structure on the results. Experiments were also made on large real instances in order to verify the applicability of our algorithm on real data.
Section 2 presents the main concepts of dynamic graphs necessary for this work, and related works. Section 3 presents the persistent connected components (PCC). Section 4 introduces the algorithm designed to detect PCCs in a dynamic graph. The experimental study is presented in Sect. 5 and concluding remarks are given in Sect. 6.

Main concepts and state of the art
Dynamic graphs, also known in the literature as dynamic networks, time varying graphs (Casteigts et al. 2012), evolving graphs (Bui-Xuan et al. 2003), temporal graphs (Michail 2016) or temporal networks (Holme 2015), have been studied mostly in the past 20 years. Holme (2015) made an extended survey.
When we consider a graph and its evolution over time, we work on a dynamic graph. The dynamicity can be on vertices, edges or both. The presence of vertices or edges can be modified during the interval on which the graph is studied. All the information carried by vertices or edges can also be time-dependent (costs, capacities, storage, etc.).
Definition 1 A study interval T = {1, . . . , T} is a discrete set of T time steps. The end of this interval, noted T, is called the time horizon.

Definition 2 A t-graph, also called snapshot, noted G_i, i ∈ T, is a static graph corresponding to the dynamic graph G at a given time step i.

Definition 3 A dynamic graph G is simply noted G = (G_i), i ∈ T, and is defined on a study interval T = {1, . . . , T}. It is a succession of t-graphs G_i = (V, E_i), i ∈ T, such that all t-graphs are defined over the same vertex set V. We denote by n the cardinality of V.
Note that these definitions, close to the literature, make it possible to isolate a vertex by removing its adjacent edges. In terms of connectivity, this is equivalent to removing the vertex itself.
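Definitions 1 to 3 translate directly into a data structure: a fixed vertex set plus one edge set per time step. The following minimal sketch is illustrative only (the names are not taken from the paper's implementation):

```python
# Definition 3 as a data structure: a fixed vertex set V = {1, ..., n}
# plus one edge set E_i per time step (all names are illustrative).
class DynamicGraph:
    def __init__(self, n, edge_sets):
        self.vertices = set(range(1, n + 1))  # the same vertex set at every step
        self.edge_sets = edge_sets            # edge_sets[i - 1] is E_i

    @property
    def horizon(self):
        return len(self.edge_sets)            # the time horizon T

# A small instance on 4 vertices and 3 time steps: edge (2, 4) is present
# at step 2 only, edge (3, 4) at step 3 only.
g = DynamicGraph(4, [{(1, 2)}, {(2, 4)}, {(3, 4)}])
print(g.horizon)  # 3
```

Isolating a vertex, as in the note above, amounts to removing its adjacent pairs from every `edge_sets[i]` in which they appear.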
A compact representation of a dynamic graph can be given: see for example Fig. 1a, where the labels on edges represent their times of presence. Figure 1b-e shows the succession of static graphs. Bui-Xuan et al. (2003) extend the definition of paths to dynamic graphs: the equivalent, in a dynamic graph, of a path in a static graph is a journey. A journey from a vertex u to a vertex v starts from u at time step i_start and ends on v at time step i_end. It is a succession of paths P_i in the static graphs G_i, i_start ≤ i ≤ i_end: P_{i_start} starts on vertex u in G_{i_start}, P_{i_end} ends on vertex v in G_{i_end}, and a path P_i ending on a vertex w forces path P_{i+1} to start on the same vertex w in G_{i+1}. A similar definition is used in the work of Kempe et al. (2002), in which each edge of the graph appears exactly once. In the remainder of this section, we focus on connectivity issues for dynamic graphs.
In static graphs, a connected component is a maximal set of vertices that are connected through edges in the graph. In other words, for two vertices u and v in the component, there exists a path between u and v in the graph. In directed graphs, the definition can be extended in two different ways: strongly and weakly connected components, whether there exists a directed path from u to v and from v to u, or a path between u and v when edge directions are ignored.
In dynamic undirected graphs, the existence of a journey from a vertex u to another vertex v does not imply the existence of a journey from v to u: because of the edges' times of presence, journeys are directed in dynamic graphs. In Fig. 1, there is a journey from vertex 2 to vertex 3 going through edge (2, 4) at time step 2 and edge (4, 3) at time step 3, but there is no journey from vertex 3 to vertex 2.
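Journey existence can be tested by propagating a reachable set forward in time: at each step, the set is closed under the paths available in the current snapshot. This is a sketch under the model above (waiting at a vertex between steps is allowed); all names are illustrative:

```python
def _closure(seed, edges):
    """Vertices reachable from `seed` through undirected edges of one snapshot."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    reach, stack = set(seed), list(seed)
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in reach:
                reach.add(w)
                stack.append(w)
    return reach

def has_journey(edge_sets, src, dst):
    """True if a journey from src to dst exists within the study interval."""
    reach = {src}
    for edges in edge_sets:             # snapshots G_1 ... G_T in order
        reach = _closure(reach, edges)  # extend along paths present at this step
        if dst in reach:
            return True
    return dst in reach

# The situation described for Fig. 1: edge (2, 4) present at step 2,
# edge (4, 3) present at step 3.
snapshots = [set(), {(2, 4)}, {(4, 3)}, set()]
print(has_journey(snapshots, 2, 3))  # True
print(has_journey(snapshots, 3, 2))  # False
```

The asymmetric output illustrates the directedness of journeys discussed above.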
Based on the definition of journeys from Bui-Xuan et al. (2003), Bhadra and Ferreira (2003) give a definition of strongly connected components in a dynamic directed graph. Their definition can also be applied to undirected graphs. Such a component is a maximal set of vertices such that, for all vertices u and v in the component, there exists a journey from u to v and a journey from v to u in the graph. A distinction is made between closed strongly connected components and open strongly connected components: in the former, the journeys must cross vertices inside the component only, whereas in the latter, journeys can cross vertices outside the component. In Fig. 1, {1, 2, 4} is an open strongly connected component. There exists a journey, both ways, between each pair of vertices. The journey from vertex 4 to vertex 1 goes through vertex 3, which is not in the component because there exists no journey from vertex 3 to vertex 2. Bhadra and Ferreira also prove that the problem of finding a connected component (open or closed) of size k, for a given value k, is NP-complete.
This definition has an interesting consequence. Unlike in static graphs, in dynamic graphs, connected components do not partition the vertices. Strongly connected components in dynamic graphs can overlap, as a vertex can be part of two distinct components. In Fig. 2, {1, 2, 3, 4} is a closed strongly connected component. There is a journey both ways between each pair of vertices of the component going only through vertices of the component. For the same reasons, {4, 5, 6, 7} is also a closed strongly connected component. Vertex 4 is part of both components. Jarry and Lotker (2004) use Bhadra and Ferreira's definition and show that asking whether a graph is connected or not is NP-hard even for two-layer grids but is polynomial in the case of trees. They propose an algorithm for this particular case. Nicosia et al. (2012) work on connectivity in dynamic graphs using a definition corresponding to the open strongly connected components of Bhadra and Ferreira (2003). They propose a way to solve the problem of finding such components in a graph using a clique search in a static undirected graph, which is not polynomial.
All those definitions are based on journeys in the dynamic graph. Some vertices can be in the same component and never be connected at any time step of the graph. Vertices 1 and 4 from the example in Fig. 1 are never directly connected by an edge or a path in any t-graph.
The main usage of such definitions is message transmission. Casteigts et al. (2015) work on connectivity in dynamic graphs and define τ-interval connectivity (which they call T-interval connectivity). A dynamic graph is τ-interval connected when, for every i ∈ {1, . . . , T − τ + 1}, the intersection graph G_i ∩ G_{i+1} ∩ · · · ∩ G_{i+τ−1}, where T is the time horizon, is a connected graph in the static sense. They propose algorithms needing O(T) operations (binary intersection and connectivity test) to solve this problem. They do not propose a definition of connected component based on their definition of τ-interval connectivity.
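Under the definition recalled above, the τ-interval connectivity test amounts to intersecting every window of τ consecutive snapshots and checking static connectivity. A sketch with illustrative names and edge sets:

```python
def is_connected(n, edges):
    """Static connectivity test on vertices 1..n (undirected edges)."""
    adj = {v: set() for v in range(1, n + 1)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {1}, [1]
    while stack:
        u = stack.pop()
        for w in adj[u] - seen:
            seen.add(w)
            stack.append(w)
    return len(seen) == n

def is_tau_interval_connected(n, edge_sets, tau):
    """Every window of tau consecutive snapshots must share a connected
    intersection graph."""
    T = len(edge_sets)
    return all(
        is_connected(n, set.intersection(*edge_sets[i:i + tau]))
        for i in range(T - tau + 1)
    )

# Connected at every single step, but the two snapshots only share edge
# (2, 3): the graph is 1-interval connected yet not 2-interval connected.
snaps = [{(1, 2), (2, 3)}, {(1, 3), (2, 3)}]
print(is_tau_interval_connected(3, snaps, 1))  # True
print(is_tau_interval_connected(3, snaps, 2))  # False
```

The toy instance also previews the remark made later in the paper: a graph can stay connected at every step without being τ-interval connected for τ > 1, because the connection need not use the same edges.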
Akrida and Spirakis (2019) present a continuous time model. They define interval temporal networks as graphs for which a set of availability intervals is defined on each edge: an edge is present during the defined intervals. They propose a polynomial time algorithm able to give the longest time interval starting at a given time x and ending before a given time y on which the graph remains connected. Unlike in the work of Casteigts et al. (2015), the connection is not necessarily made using the same edges. They present a second algorithm computing the sets of vertices of cardinality larger than a given bound that remain connected for the longest period of time starting at a given time x. This gives connected components that do not overlap, unlike other definitions found in the literature. For both algorithms, the choice of parameter x determines the outcome: if the graph starts to remain connected at a later time than x, or if the graph has large components that start being connected later than x, then the algorithms do not detect it.

Persistent connected components
This section defines the persistent connected components. We propose a point of view of connectivity in dynamic graphs, which is not based on journeys, unlike most of the definitions found in the literature. An example is given and applications are discussed.

Definitions and notations
Informally, a persistent connected component represents a set of vertices in V that are connected in the graph for several consecutive time steps.
Definition 4 A persistent connected component (PCC) p of G is a quadruplet p = (K, k, l, f) where K is a set of vertices, k is the size of this set, l, called the length, is the number of consecutive time steps on which K is connected (either directly or through other vertices of the graph), and f is the last of those time steps.

In other words, PCCs are characterized by their size (number of vertices) and length (number of consecutive time steps). Analogously to the static definition, which implies maximality with regard to inclusion, we define maximality with regard to inclusion of both vertices and time steps.

Definition 5 A maximal persistent connected component is a PCC which is maximal regarding its size and length. For a given maximal PCC p = (K, k, l, f) with K = {u_1, . . . , u_k}:
(1) there is no vertex set K' strictly containing K that is connected on the same time steps;
(2) K is not connected on the time step preceding those l time steps;
(3) K is not connected on the time step following them (i.e., on time step f + 1).
A maximal PCC is thus a PCC such that its vertex set is not included in a bigger vertex set connected on the same time steps (Condition 1), and the same vertex set is connected neither on the previous time step (Condition 2) nor on the next time step (Condition 3). In the following, we only consider maximal PCCs; therefore, by slight abuse of notation, the term PCC will be used to refer to a maximal PCC.
It should be noted that both directed and undirected graphs can be considered. In the case of undirected graphs, we consider, for PCCs, a set of k vertices simply connected; in the case of directed graphs, we consider a set of k vertices strongly connected. All the definitions hold in both cases.
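For a fixed vertex set K, the length of Definition 4 can be read snapshot by snapshot: K counts as present at step i when it lies inside a single connected component of G_i (connections through outside vertices are allowed). A sketch with illustrative names; the edge sets below are one realization consistent with the components described later for the Fig. 3 example, not the paper's actual figure data:

```python
def components(n, edges):
    """Connected components of one undirected snapshot on vertices 1..n."""
    parent = list(range(n + 1))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in edges:
        parent[find(u)] = find(v)
    comps = {}
    for v in range(1, n + 1):
        comps.setdefault(find(v), set()).add(v)
    return list(comps.values())

def persistence_runs(n, edge_sets, K):
    """Maximal runs of consecutive steps where K is connected, as (l, f) pairs."""
    runs, length = [], 0
    for i, edges in enumerate(edge_sets, start=1):
        if any(K <= c for c in components(n, edges)):
            length += 1
        else:
            if length:
                runs.append((length, i - 1))
            length = 0
    if length:
        runs.append((length, len(edge_sets)))
    return runs

fig3_snapshots = [
    {(1, 2), (2, 3), (4, 5)},          # components {1,2,3} and {4,5}
    {(1, 4), (2, 4), (3, 5), (4, 5)},  # connected; {1,2,3} is independent
    {(1, 5), (2, 3), (3, 4), (4, 5)},  # connected
    {(1, 5), (2, 3), (3, 4)},          # components {1,5} and {2,3,4}
]
print(persistence_runs(5, fig3_snapshots, {1, 2, 3}))  # [(3, 3)]
print(persistence_runs(5, fig3_snapshots, {2, 3}))     # [(4, 4)]
```

Each (l, f) pair is the length and finish date of a maximal run for the given K, matching the (l, f) entries of Definition 4.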
In static graphs, we can look for the largest connected component. Similarly, in dynamic graphs, we aim at finding the biggest persistent connected components in terms of size and length. Hence there are two criteria to optimize, which is why we look for the Pareto front formed by all non-dominated PCCs.
Definition 6 p = (K, k, l, f) is a non-dominated PCC at a given time step i if and only if there exists no PCC p' = (K', k', l', f') ≠ p, with f ≤ i and f' ≤ i, such that one of the following conditions holds:
(4) k' > k and l' ≥ l;
(5) k' ≥ k and l' > l;
(6) k' = k, l' = l and f' < f;
(7) k' = k, l' = l, f' = f and K' precedes K in a fixed total order on the vertex subsets.
Conversely, if such a PCC p' exists, it is said that p is dominated by p'.

Note that if a PCC is dominated at time step i, it remains dominated at each time step larger than i. Condition 6 from Definition 6 implies that, in the case of two components of the same size and the same length, the earliest one is considered non-dominated. If, furthermore, two components have the same size and length and finish at the same time step, then an arbitrary total order on the vertex subsets shall be used (for instance, the lexicographic order); this is ensured by Condition 7.
One may question the extension of the chosen dominance, particularly considering Conditions 6 and 7. The first motivation is the wish to compute some "large" components that have a meaning for the applications (e.g., a subset of nodes considered as "safe" when connectivity is considered). Computing all such large components may not be useful, and it may not be practical either. The second motivation is to provide the Pareto curve itself, considering size and length: indeed, this curve provides a characterization of the network, as our experiments will show. The third motivation is practicality. Thanks to our definition, a set of at most min(n, T) PCCs is obtained. If Conditions 6 and 7 are relaxed, the size of the obtained set might grow exponentially with n. A dynamic graph with this characteristic is easy to build, considering for instance that each G_t connects exactly one different vertex subset of size n/2. This entails complexity issues, both in time and in space, to compute and to keep the set.
One may object that to get such a large number of PCCs, a number of time steps exponential in n is needed, which is true. Still, increasing the time horizon mechanically increases the set size, which is not the case with our definition. Furthermore, another dynamic graph can be built for which the number of PCCs is O(n^2) when T = O(n). Suppose n = 2k and that, at each time step t ≤ k, the graph G_t is composed of k disjoint edges, different at each time step. To achieve this, it suffices to take as edges the node pairs (i, j), i ∈ {1, . . . , k}, with j taken by circular permutation from the set of integers {k + i, k + i + 1, . . . , k + i + k − 1}. Hence, after k time steps, exactly k × k PCCs are obtained, all with size 2 and length 1: a number of PCCs quadratic in n is obtained in a number of time steps linear in n.
Hence even for small values of T , the number of PCCs might increase rapidly.
These two examples show that our dominance definition allows to avoid an explosion on the number of solutions. The experiments presented later in the paper show that computing these solutions, with the appropriate algorithm, is possible for large values of n and T .
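The dominance test between two finished PCCs can be sketched as follows, following the size/length comparison and the tie-breaking rules described above (the representation and names are illustrative):

```python
from collections import namedtuple

# A finished PCC: vertex set K, size k, length l, finish time step f.
PCC = namedtuple("PCC", "K k l f")

def dominates(p2, p1):
    """True if p2 dominates p1: p2 is at least as large and as long, with
    ties broken by earlier finish date, then by a total order on K
    (lexicographic order on the sorted vertex lists here)."""
    if p2.k >= p1.k and p2.l >= p1.l and (p2.k > p1.k or p2.l > p1.l):
        return True
    if (p2.k, p2.l) == (p1.k, p1.l):
        if p2.f < p1.f:
            return True
        if p2.f == p1.f and sorted(p2.K) < sorted(p1.K):
            return True
    return False

def pareto_front(pccs):
    """Keep only the non-dominated PCCs."""
    return [p for p in pccs
            if not any(dominates(q, p) for q in pccs if q != p)]

# Three of the PCCs of the Fig. 3 example: p3 dominates p1 (larger size,
# equal length) and p4 (same size and length, earlier finish).
p1 = PCC(frozenset({1, 5}), 2, 3, 4)
p3 = PCC(frozenset({1, 2, 3}), 3, 3, 3)
p4 = PCC(frozenset({2, 3, 4}), 3, 3, 4)
print(dominates(p3, p1), dominates(p3, p4))  # True True
print(pareto_front([p1, p3, p4]))            # only p3 remains
```

Because dominance is a total tie-breaking relation on equal (size, length) pairs, the front keeps at most one PCC per length, which is the bound exploited later in Lemma 2.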
A PCC (as per Definitions 4 and 5) is a maximal set of k vertices that stays connected for l consecutive time steps. Therefore, on each of those time steps, for any two vertices in the PCC, there exists a path between them. A PCC is then always part of an open strongly connected component (as defined in Sect. 2). Our definition of connectivity, like those of Casteigts et al. (2015) and Akrida and Spirakis (2019), is not based on journeys. Nevertheless, it differs from their definitions. If a graph is τ-interval connected according to the definition of Casteigts et al. (2015), then this graph is necessarily connected over T and it has a persistent connected component composed of the n vertices of the graph and lasting for the whole study interval T. The reciprocal implication does not hold: even if a graph remains connected over T, the connection might not be achieved with the same edges, so the graph is not necessarily τ-interval connected for τ > 1. In Akrida and Spirakis (2019), as they work with a continuous time model, the dynamic graph cannot be described as a succession of static graphs.
Unlike most problems described in Sect. 2, the problem of finding non-dominated PCCs can be solved polynomially (as proved in Sect. 4.4).

Applications
Connectivity in a dynamic graph finds applications in many fields. Remember first that travel times associated with edges are not considered in this model. This is appropriate when the network's dynamics are slow compared to the time necessary to cross an edge.
In communication networks such as ad hoc networks or sensor networks, the transmission is almost instantaneous. In such networks, a PCC is a subnetwork that remains connected, which is essential when communications are considered, see for instance (Koster and Muñoz 2009).
In transportation networks, road availability can be time-dependent. The unavailability of a road can be temporary, new roads can be built and existing roads can be closed. Travel times on edges are often negligible compared to the network's dynamics. A PCC would measure in this case the reachability of different locations. Démare et al. (2017), for instance, use dynamic graphs to model the transportation network of the Seine valley.
In social networks, there is no travel time on edges because they represent a relationship; see the seminal work of Newman (2003). Most works on community detection use local edge density, and some of them also consider the time dimension (Nguyen et al. 2011). If we consider that a community must verify the connectivity condition between its members during some time interval, then detecting PCCs, according to our model, will help identify these communities.

Example

Figure 3 presents a dynamic graph on 4 time steps and 5 vertices. This graph is not connected through the whole study interval. The t-graphs G_1 and G_4 are disconnected and both have two connected components ({1, 2, 3} and {4, 5} in G_1; {1, 5} and {2, 3, 4} in G_4). The t-graphs G_2 and G_3 have only one connected component containing all vertices. There are 6 maximal persistent connected components in this graph.

Vertices 1 and 5 are connected from time step 2 to time step 4, even though they are directly connected with an edge only on time steps 3 and 4. They form a persistent connected component p_1 = ({1, 5}, 2, 3, 4). Similarly, vertices 4 and 5 are connected from time step 1 to time step 3 and therefore form a persistent connected component p_2 = ({4, 5}, 2, 3, 3). Vertices 1, 2 and 3 are connected from time step 1 to time step 3. It can be noticed that in G_2, even though they are connected, they form an independent set. Those vertices form a persistent connected component p_3 = ({1, 2, 3}, 3, 3, 3). Vertices 2, 3 and 4 are connected from time step 2 to time step 4 and form a persistent connected component p_4 = ({2, 3, 4}, 3, 3, 4). As the graph is connected from time step 2 to time step 3, vertices 1, 2, 3, 4 and 5 form a persistent connected component p_5 = ({1, 2, 3, 4, 5}, 5, 2, 3). It can be noticed that vertices 2 and 3 stay connected over the whole study interval, even though they are directly connected only in G_3. Therefore vertices 2 and 3 form a persistent connected component p_6 = ({2, 3}, 2, 4, 4). In the sense of Definition 4, p_1, . . . , p_6 are all persistent connected components, and they are all maximal in the sense of Definition 5. In the sense of Definition 7, components p_1 and p_2 are dominated by both p_3 and p_4 because of condition (4) of Definition 6. Component p_4 is dominated by p_3 under condition (6). Components p_3, p_5 and p_6 are non-dominated. Those last components are the ones that we want to find. Figure 4 shows a representation of the PCCs from the graph of Fig. 3: p_3 is represented in orange, p_5 in blue and p_6 in green. PCCs are clearly not disjoint, because a vertex can belong to several PCCs.

PICCNIC algorithm
This section presents the PICCNIC Algorithm (PersIstent Connected CompoNent InCremental Algorithm), whose goal is to find the Pareto front containing all non-dominated solutions, that is, every non-dominated PCC.

The algorithm is presented together with an execution example. Correctness and complexity of the algorithm are then proved.

Presentation
PICCNIC is presented in Algorithm 1. Its objective is, for a given dynamic graph, to find all non-dominated persistent connected components, in the sense of Definition 7, that have size at least k_min and length at least l_min. The default value of k_min is 2, because a component of size 1 is not relevant: any vertex is connected to itself for the whole study interval of the graph. The default value of l_min is 1, because we consider any set of connected vertices to be relevant. The algorithm works incrementally on the time steps and can therefore be used online.
We access the vertex set of a PCC p with K(p); similarly, we access its size with k(p), its length with l(p) and its finish date with f(p).

Several sets of components are used to compute the persistent connected components. PCC_n contains, at the end of iteration i, the components existing on G_i. To be built, PCC_n needs PCC_t, a temporary set containing the PCCs whose vertex set is contained in one connected component of G_i. At the beginning of iteration i, PCC_c keeps the components that existed on G_{i-1}. PCC_o contains, at each iteration, the components that have just finished. At the end of iteration i, PCC_f contains the non-dominated persistent connected components at time step i.

Note that the sets PCC_t, PCC_n and PCC_c contain ongoing PCCs, which are therefore not maximal PCCs. The sets PCC_o and PCC_f only contain finished, and therefore maximal, PCCs.
Each iteration of the algorithm starts by retrieving all connected components of the t-graph G_i (line 3). Those components are strongly connected components in the case of directed graphs, and simple connected components in the case of undirected graphs. Components of size lower than k_min are discarded. The default value of k_min is 2 because a vertex is necessarily connected to itself for the whole study interval, so each dynamic graph trivially has n PCCs of size 1 and length T.
The first step of the algorithm (given in Algorithm 2) aims at finding the new persistent connected components beginning at i and keeping the components that are still going on at i. It uses the function AddPCC (lines 20 and 26) to add a PCC to a PCC set while checking that the set does not already contain another PCC with the same vertex set; if it does, the PCC with the largest length is kept. Simple unions are used when considering PCCs with disjoint vertex sets (line 28).
The second step of the algorithm (given in Algorithm 3) works on the persistent components that are over at the current time step (meaning that their finish date is the previous time step) and keeps the non-dominated components. It uses Algorithms 4 and 5 to check whether one component p_1 dominates another component p_2, depending on which one finishes first. Those domination algorithms check the conditions given in Definition 6. The order on the vertex subsets of 2^V can be, for example, the lexicographic order.
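The incremental step can be sketched as follows: each ongoing PCC is intersected with the connected components of the current snapshot; intersections that are still large enough continue (keeping the earliest start, as AddPCC does), and the others are finished. This is a simplified reconstruction from the description above, not the paper's exact pseudocode, and the dominance filtering of Step 2 would then be applied to the finished components; all names are illustrative:

```python
def step(ongoing, comps, i, k_min=2):
    """One PICCNIC-style iteration at time step i. `ongoing` maps each
    tracked vertex set to its starting time step. Returns the new ongoing
    map and the PCCs (K, k, l, f) that finished at step i - 1."""
    nxt = {}
    for c in comps:
        if len(c) < k_min:
            continue
        fc = frozenset(c)
        nxt[fc] = min(nxt.get(fc, i), i)  # the component itself, starting now
        for K, start in ongoing.items():
            inter = frozenset(K & c)
            if len(inter) >= k_min:
                # keep the earliest start for a given vertex set (AddPCC)
                nxt[inter] = min(nxt.get(inter, start), start)
    finished = [(K, len(K), i - start, i - 1)
                for K, start in ongoing.items() if K not in nxt]
    return nxt, finished

# Connected components of the snapshots of the Fig. 3 example; the extra
# empty step plays the role of iteration T + 1 and flushes remaining PCCs.
per_step = [
    [{1, 2, 3}, {4, 5}],
    [{1, 2, 3, 4, 5}],
    [{1, 2, 3, 4, 5}],
    [{1, 5}, {2, 3, 4}],
    [],
]
ongoing, finished = {}, []
for i, comps in enumerate(per_step, start=1):
    ongoing, done = step(ongoing, comps, i)
    finished += done
print(len(finished))  # 6, the maximal PCCs p_1, ..., p_6 of the example
```

Note how ({2, 3}, 2, 4, 4) emerges only at step 4, as the intersection of the finishing superset {1, 2, 3} with the component {2, 3, 4}, while inheriting the start date 1; this is exactly the case handled in the correctness proof below.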

Example
Let us look at the execution of the PICCNIC Algorithm on the example from Fig. 3. We consider persistent connected components of size at least 2 and length at least 1 (k_min = 2, l_min = 1). In the first iteration, the current t-graph is G_1 (see Fig. 3b). It has two connected components, {1, 2, 3} and {4, 5}, which become the first ongoing PCCs.
In the second iteration, the current t-graph is G_2 (see Fig. 3c). It has only one connected component, {1, 2, 3, 4, 5}. At the end of the iteration, the set of current components PCC_c contains ({1, 2, 3, 4, 5}, 5, 1, 2), ({1, 2, 3}, 3, 2, 2) and ({4, 5}, 2, 2, 2).

Algorithm 1 PICCNIC Algorithm

Input: Dynamic graph G, study interval T = {1, . . . , T}, lower bound on the size of PCCs k_min, lower bound on the length of PCCs l_min
Output: Non-dominated persistent connected components
// Loop on the time steps
1: for all i ∈ {1, . . . , T + 1} do
    . . .
    PCC_c = PCC_n // for the next iteration
    . . .
11:   end if
12: end for
return PCC_f

Algorithm 2 PICCNIC Step 1

Input: CC, PCC_c, k_min
Output: PCC_n, the set of possible new PCCs
1: PCC_n = ∅
2: for all c ∈ CC with |c| ≥ k_min do
3:   . . .
In the third iteration, the current t-graph is G_3 (see Fig. 3d). Just like the previous time step, it has only one connected component. All components present in PCC_c are included in this component and are therefore extended: at the end of the iteration, PCC_c contains ({1, 2, 3, 4, 5}, 5, 2, 3), ({1, 2, 3}, 3, 3, 3) and ({4, 5}, 2, 3, 3).

Algorithm 5 DomLaterEqual

Input: Two PCCs p_1 and p_2 such that f(p_1) ≤ f(p_2)
. . .

In the fourth iteration, the current t-graph is G_4 (see Fig. 3e). It has two connected components, {1, 5} and {2, 3, 4}. On the graph of Fig. 3, there are 3 non-dominated components: one of size 3 and length 3, one of size 5 and length 2 and one of size 2 and length 4.

Theorem 1 PICCNIC provides the set of all non-dominated persistent connected components, in the sense of Definition 7, of a dynamic graph G on interval T .
Proof To prove that the algorithm is correct, we have to show that:
1. For any time i ≤ T, any non-dominated PCC at i (in the sense of Definition 6) is present in PCC_f ∪ PCC_c at the end of iteration i.
2. At the end of the algorithm, only non-dominated PCCs are present in PCC_f.
Let us prove the first part. At i = 1, the result is trivial, as the set PCC_c contains all connected components of G at the first instant. Let us suppose the result is true for some 1 ≤ i ≤ T, and consider iteration i + 1. Let p = (K, k, l, f) be a non-dominated PCC at i + 1. From Definition 6, p finishes at i + 1 or before.
Suppose first that p finishes at f ≤ i. If p is dominated at time step i, it is still dominated at time step i + 1, which contradicts the hypothesis on p; so p is non-dominated at time i. By the induction hypothesis, p belongs to PCC_f or to PCC_c (if f = i). In the first case, p is removed from PCC_f by Algorithm 3 only if it is dominated at i + 1, which is false. In the second case, p is removed from PCC_n by Algorithm 2, then put into PCC_f by Algorithm 3, again because it is non-dominated at i + 1.
Suppose now that p finishes at i + 1. If its length is 1, it is necessarily a connected component of G_{i+1}. In the algorithm, its associated boolean Pers is false and p is directly included in PCC_t, then in PCC_n, then in PCC_c. If its length is larger than one, let us first suppose that p ∈ PCC_c at the beginning of the iteration. In this case, K(p) must be included in a connected component of G_{i+1}. Therefore it is put into PCC_t, and the function AddPCC makes sure that the oldest PCC is kept in PCC_t. It is then added to PCC_n and then to PCC_c. If p does not belong to PCC_c at the beginning of the iteration, it means that at all previous iterations it was not kept by the algorithm (as a component is never removed from PCC_c until it is finished), although its vertex set was included in one connected component of each of the associated graphs. This implies that at all of these instants, K(p) was strictly included in K(q), where q is another PCC, present in PCC_c at least at i, and finishing at i; therefore p and q have the same starting time. If q did not exist, p would have been added to PCC_c before, as it would have appeared as an intersection of a member of PCC_c with a connected component. K(q) is not included in a connected component at i + 1, as q is finished, and the intersection of K(q) and some connected component c at i + 1 is K(p) (a larger intersection would imply a larger subset of q present at i + 1, but p is non-dominated). Therefore p is included in PCC_t, then in PCC_n, and in PCC_c at the end of iteration i + 1.
Let us now prove the second part. Suppose there exists some PCC p = (K, k, l, f) that is present before the final step and which is dominated by some non-dominated PCC p' = (K', k', l', f'). We just proved that p' is kept by the algorithm at T. If p ∈ PCC_c at the end of iteration T, during the final step it is placed in PCC_o as the set CC is empty. It is then tested against all other elements of PCC_c, then against all elements of PCC_f. Therefore it must be tested against p' and removed. If p ∈ PCC_f and p' ∈ PCC_c at the end of iteration T, then for the same reason p' will be tested against p, and p will be eliminated. Suppose now p, p' ∈ PCC_f at the end of iteration T. They have been tested against each other when the later of the two finished (if they finished simultaneously, they are tested together by the algorithm while in PCC_o). Therefore it is impossible that p and p' are both present.
In the final step of the algorithm, PCC_n is empty, as i = T + 1; therefore at the end of this step, PCC_c is also empty. The non-dominated PCCs finishing at time step T are thus in PCC_f (as proved in the first point) and only non-dominated PCCs are present in PCC_f at the end of the algorithm (as proved in the second point), which proves the theorem.

Complexity
In this section, we prove that Algorithm 1 is polynomial.

Lemma 1 At the end of any iteration 1 ≤ i ≤ T, the cardinality of PCC_c is bounded by n − NbCC(G_i), where NbCC(G_i) is the number of connected components of the graph G at iteration i.
Proof PCC_t is built for each static connected component of G_i. It does not contain doubles, meaning that it does not contain two PCCs with the same vertex set; the function AddPCC makes sure of that. All the elements from PCC_t are then added to PCC_n. The intersection between a PCC from PCC_c and a static connected component can produce the same vertex set only when we consider the same static component: when we switch to another static component c, the intersections necessarily give different vertex sets. Therefore, the new PCCs added to PCC_t and then to PCC_n do not contain the same vertices as PCCs already present in PCC_n, so there are no doubles in PCC_n. As PCC_c is built from PCC_n at the end of each iteration, it does not contain doubles either.
Let p = (K, k, l, i) and p' = (K', k', l', i) be two elements of PCC_n at the end of iteration i such that K ≠ K'. Let p start before p'. K and K' have at least cardinality k_min, and both are necessarily included in one connected component of G_i.
If they are included in two different components, K and K' are disjoint. Suppose now that they are included in the same component c of G_i and are not disjoint (K ∩ K' ≠ ∅). If one is not included in the other, K'' = K ∪ K' is also included in c. Furthermore, a PCC p'' associated to K'' would be present since the iteration θ where p and p' were first both present. As p starts before p', this means p'' appears simultaneously with p'. But this is impossible, since K'' strictly contains K and K'; therefore p'' does not appear. Consequently K' ⊂ K (if K ⊂ K', p could not be included in PCC_n). We can thus say that the vertex sets of the PCCs present in PCC_c are either strictly included one in the other or disjoint.
It is easy to verify by induction that the number of such subsets is bounded by the cardinality of the set minus 1, regardless of the way those subsets are chosen. Indeed, a subset contains at least 2 elements (because k_min ≥ 2) and the subsets are either disjoint or strictly included in one another. If the set has 2 elements, at most one subset is acceptable according to these conditions. Now consider a set Ω having ω elements, and suppose it has ω − 1 subsets, such that this number is maximal (it is impossible to create another subset without violating the strictly-included-or-disjoint condition). When one element e is added to Ω, we can either keep the same subsets, in which case there are still ω − 1 subsets, or create a new subset by taking the union of {e} and an existing subset, in which case there are ω subsets. As two subsets are either strictly included or disjoint, we cannot create another subset, regardless of the way the subsets were initially chosen. We can conclude that a set with ω elements has at most ω − 1 subsets such that each subset has cardinality at least 2 and any two subsets are either strictly included in one another or disjoint.
G_i is divided into α connected components of sizes κ_1, …, κ_j, …, κ_α. Applying the previous reasoning to each component gives at most κ_j − 1 subsets in the j-th component, hence at most Σ_j (κ_j − 1) = n − α = n − NbCC(G_i) elements in total, which proves the lemma.
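The ω − 1 bound used in this proof can be checked mechanically on small ground sets. The Python sketch below is ours, purely illustrative and not part of PICCNIC: it enumerates all families of subsets of cardinality at least 2 that are pairwise disjoint or strictly nested, and verifies that no family exceeds ω − 1 members.

```python
from itertools import combinations

def is_laminar(family):
    """Check the proof's condition: every member has at least 2 elements,
    and any two members are disjoint or strictly nested."""
    if any(len(s) < 2 for s in family):
        return False
    return all(a.isdisjoint(b) or a < b or b < a
               for a, b in combinations(family, 2))

def max_family_size(omega):
    """Largest valid family over a ground set of size omega,
    found by brute force (only feasible for small omega)."""
    ground = range(omega)
    subsets = [frozenset(c) for r in range(2, omega + 1)
               for c in combinations(ground, r)]
    best = 0
    for r in range(len(subsets) + 1):
        for fam in combinations(subsets, r):
            if is_laminar(fam):
                best = max(best, r)
    return best

for omega in (2, 3, 4):
    assert max_family_size(omega) == omega - 1
```

For ω = 4, a maximal family is for instance {0,1}, {2,3}, {0,1,2,3}: three subsets, matching the bound.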

Lemma 2 At the end of any iteration 1 ≤ i ≤ T, the cardinality of PCC_f is bounded by min(n − 1, i).
Proof This is trivially true for i = 1: the set is then empty. Now suppose it is true for some 1 ≤ i ≤ T. Theorem 1 states that two components linked by a dominance relation cannot be together in PCC_f. Suppose two PCCs p and p′ of the same length l are present in PCC_f at i. One of them necessarily dominates the other, so only the non-dominated one is in PCC_f by construction of PCC_f. Therefore, for a given 1 ≤ l ≤ T, there is at most one PCC of length l in PCC_f, so at most i elements in PCC_f. Conversely, two elements of PCC_f must have different sizes, so there are at most n − 1 of them (remember singletons are not considered).

Lemma 3 The complexity of one iteration of Algorithm 1, including the final one, is O(n²).
Proof Let us first remark that all sets of PCCs considered during one iteration have at most n elements. This is immediate for PCC_c and PCC_f from the two previous lemmas. It is true for PCC_n as it is equal to PCC_c at the end of the iteration. It is also true for PCC_o as it is included into PCC_c from the previous iteration, and for CC by definition. With adequate data structures for these sets, like binary search trees, adding or removing one element is done in O(log(n)), and comparison and intersection operations can be done in time linear in n.
The first iteration reduces to the assignment of PCC_c, which is done in time O(n·log(n)). So let us concentrate on the other iterations, the final one included. In each iteration of the algorithm, the static connected components of graph G_i are computed. This is done in time O(m + n), which is bounded by O(n²).
The first step of the algorithm (given in Algorithm 2) contains a loop on the set of connected components CC (|CC| ≤ n) of G_i, in which there is a loop on the set PCC_c (|PCC_c| ≤ n). Each element of each set also has its size bounded by n, but the elements of CC form a partition of the graph's vertices: a set C_α of CC has size κ_α such that Σ_α κ_α = n. The comparisons (line 8 of Algorithm 2) and the intersections (line 13 of Algorithm 2) between c from the set CC and p from the set PCC_c are made in time O(min(|c|, |p|)). For each component C_α of CC, each one of these operations costs O(n·κ_α). The total costs O(n·(κ_1 + ··· + κ_|CC|)), so O(n²). With the same reasoning as for Lemma 1, we obtain that the PCCs in PCC_t are either disjoint or strictly included into one another. They only contain vertices from the current component C_α of size κ_α, and the set contains no doubles since the role of the function Add_PCC is to add a PCC to PCC_t without creating doubles. Therefore, the size of PCC_t is bounded by κ_α. We stated earlier that adding an element to PCC_t can be done in time logarithmic in the size of the set. The function Add_PCC compares the lengths of identical vertex sets of PCCs in constant time, so adding a PCC to PCC_t with Add_PCC can be done in time O(log(κ_α)). The instruction at line 20 costs O(log(κ_α)) for each p in PCC_c, so O(n·log(κ_α)) for each C_α in CC, and O(n·(log(κ_1) + ··· + log(κ_|CC|))) for the whole algorithm. As log(κ_α) < κ_α, the total complexity of line 20 is bounded by O(n²). The instruction at line 26 costs O(log(κ_α)) for each C_α in CC; for the whole algorithm, it costs O(Σ_α log(κ_α)), which is bounded by O(n). The instruction at line 28 adds all the elements from PCC_t to PCC_n. The size of PCC_n is bounded by n, so adding one element is done in time O(log(n)).
As the set PCC_t contains at most κ_α elements, adding them all costs O(κ_α·log(n)) for each component C_α in CC. For the whole algorithm, it costs O(Σ_α κ_α·log(n)), which is bounded by O(n·log(n)). In total, Algorithm 2 costs O(n²).
In the second step of the algorithm (given in Algorithm 3), both loops contain at most two domination tests made in an inner loop in constant time. This step of PICCNIC also deletes elements from PCC_o or PCC_f. As the size of each of these sets is bounded by n, it is not possible to delete more than n elements, so the deletion operations cost O(n·log(n)) in total. The number of inner loops is at most |PCC_o|² for the first loop and |PCC_o|·|PCC_f| for the second one, so both loops have complexity O(n²). This step also contains operations of differences and unions of PCC sets outside any loop; these operations are done in O(n·log(n)). Overall, the complexity of Algorithm 3 is also O(n²).
Finally, the instruction at line 10 of Algorithm 1 is done in constant time. Therefore, one iteration of PICCNIC costs O(n²).

Theorem 2 The complexity of PICCNIC is O(n²·T), where n is the number of vertices and T the time horizon.
Proof The proof is immediate from the previous lemma, as the number of iterations is T + 1.

Finding all maximal PCCs
The PICCNIC algorithm (given in Algorithm 1) identifies all non-dominated persistent connected components in the sense of Definition 7 in polynomial time with complexity O(n²·T). One can also use a slightly modified version of PICCNIC to retrieve every maximal persistent connected component (in the sense of Definition 5) in a given dynamic graph.
If we do not compare the PCCs in the second step to delete dominated components, then PICCNIC_Step2 can be reduced to Algorithm 6. All the proofs still hold, and this new version of PICCNIC gives all maximal persistent connected components in time O(n²·T).
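As a toy illustration of the difference between the two outputs, the sketch below filters (size, length) pairs with ordinary Pareto dominance. This stands in for the paper's Definition 7, which is not reproduced in this section, and the function name and data are ours.

```python
def non_dominated(pccs):
    """Keep only the (size, length) pairs that no other pair dominates.
    Assumed Pareto-style dominance: q dominates p when q is at least as
    good in both coordinates and the two pairs differ."""
    return [p for p in pccs
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p
                       for q in pccs)]

maximal = [(5, 1), (3, 2), (2, 4), (3, 1), (2, 2)]  # all maximal PCCs
front = non_dominated(maximal)                      # PICCNIC's output
# front == [(5, 1), (3, 2), (2, 4)]
```

The modified version keeps the whole list `maximal`; the original PICCNIC keeps only the front.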

Experimental study
In this section, we propose an experimental study of our algorithm. This is done using the GraphStream Java library (Dutot et al. 2007). The virtual machines used for this experiment have 64-bit Intel Core processors with 8 cores, 4 MB of cache and a 2 GHz frequency, and 96 GB of RAM (except for the experiment on the StackOverflow network, where 400 GB were necessary). The OpenJDK 1.9 Runtime Environment is used.
We tested our algorithm both on randomly generated graphs and on graphs from real world instances. In the first case, we generated undirected graphs, and in the second case the graphs from real instances are directed. We present both situations next.

Random graphs
Let us first focus on the experiments made on random graphs. We start by presenting the experimental settings chosen for this experiment, then we present the results of PICCNIC Algorithm, and finally we present its execution times.

Experiment settings
Graph generation. In order to evaluate the PICCNIC algorithm, we test it on randomly generated graphs. We do not aim at sophisticated graph generation models because our first objective in this part is to verify the tractability of the approach on different classes of dynamic graphs, without focusing on specific applications. The second objective is to provide preliminary results on the Pareto curve of the obtained PCCs, and to show the impact of the input graph classes. That is the reason a handcrafted generator was proposed in order to test various dynamic graph families, even though there is a large literature on generative models; see the seminal paper by Holme (2015) and the more recent survey by Gauvin et al. (2020) on randomized reference models. The proposed generator is a link-driven model with a memory mechanism (Karsai et al. 2014), which can be seen as a simplified version of link-node memory models (see for instance Vestergaard et al. (2014)). Thanks to our model, the edge presence rate is a feature which is maintained over the whole study interval of the dynamic graph. First we generate the structure of the graph (vertices and edges). Then we add dynamicity to the edges using a Markovian process.
First, an underlying graph (V, E) is generated. Its vertices correspond to the ones of the dynamic graph G, and its set of edges includes all sets E_i. The underlying graph is generated using generators from the GraphStream library. We test four different types of graphs that present specific features: Random graphs, Barabasi-Albert graphs, grids (generated as toruses) and Random Euclidean graphs, in which vertices are drawn in the unit square [0, 1]² and two vertices are connected if their Euclidean distance is below a given threshold. Figure 5 presents the degree distribution of vertices from these graphs and Table 1 presents their average clustering coefficients (see Watts and Strogatz (1998)). Figure 5 is cropped: indeed, Barabasi-Albert graphs have some vertices with very high degree. As grids are actually toruses, all vertices have the same degree, exactly the average degree. Both Random and Random Euclidean graphs have a degree distribution centered on the average degree. In the following, we use the names of the generators.
Once the underlying graph is generated, the dynamics of the edges are obtained thanks to a Markov chain; the one used on each edge is presented in Fig. 6. When an edge is present at time step i, it remains present at time step i + 1 with probability p. When an edge is absent at time step i, it remains absent at time step i + 1 with probability q.
We introduce a new parameter: the presence. It is equal to the stationary probability of edge presence in the Markov chain, often noted π_1 (π_1 ∈ [0, 1]). It is asymptotically equal to the rate of presence of each edge over the study interval.
For a given presence value, there exist many possible values for p and q. Experiments (not detailed here because they are out of the scope of this paper) showed that the values chosen for p and q had negligible influence on the results. Therefore, in the described experiments, we fix p = π_1 and q = π_0, where π_0 = 1 − π_1.
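The relation between p, q and the presence can be checked empirically. The sketch below is our own (function and parameter names are illustrative): it simulates one edge's two-state Markov chain and verifies that its long-run presence rate approaches the stationary probability π_1 = (1 − q)/((1 − p) + (1 − q)), which equals π_1 under the paper's choice p = π_1, q = 1 − π_1.

```python
import random

def presence_rate(p, q, steps, rng):
    """Simulate one edge: present -> present with probability p,
    absent -> absent with probability q; return the fraction of
    time steps where the edge is present."""
    present = True
    count = 0
    for _ in range(steps):
        if present:
            count += 1
            present = rng.random() < p
        else:
            present = rng.random() >= q
    return count / steps

pi1 = 0.9
rate = presence_rate(p=pi1, q=1 - pi1, steps=200_000, rng=random.Random(0))
# rate is close to the stationary probability pi1 = 0.9
```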
Parameter selection. We plan to evaluate the results of our algorithm as well as its execution time. To this end, some parameters are fixed and the others vary for each experiment.
We considered PCCs of size at least 2 and of length at least 1. In the algorithm, we have k_min = 2 and l_min = 1, which are the default values.
The study interval needs to be long enough so we can observe relevant results. For this reason the number of time steps is fixed to 1000.
To observe the outcome of the algorithm, the number of vertices is fixed to 1000 in order to deal with rather large graphs. In order to obtain execution times as a function of the number of vertices, n takes values from 100 to 4500 (100, 250, 400, 550, 700, 850, 1000, 1500, …; see Table 2). Values 0.7 and 0.9 are used as presence parameter for the outcome of the algorithm. Only value 0.9 is taken into account to observe the execution time of the algorithm (results were found to be very similar for value 0.7).
In order to obtain statistically relevant observations, 10 instances of graphs are tested for each set of parameters. Table 2 synthesizes the parameters values chosen.

PICCNIC results
Figures 7, 8 and 9 represent the average Pareto fronts obtained with the PICCNIC algorithm for each average degree tested and each type of graph. Each point represents the average size of non-dominated PCCs for each possible length.
It can be noticed that the higher the presence, the higher the fronts. When edges have a higher presence, the graph is more connected and thus PCCs are of bigger size.
By comparing Figs. 7 and 9, we can notice that the fronts are higher in Fig. 9: when the average degree is higher, the graph is more connected and the PCCs are bigger.
With degree 4 (Fig. 7), Random Euclidean graphs do not have components with a high number of vertices: those graphs do not have "giant" components. With degrees 8 and 12, the clustering coefficient of Random Euclidean graphs presented in Table 1 and the degree distribution presented in Fig. 5 show that even though Random Euclidean graphs have many clusters, those clusters are highly connected to each other, therefore components easily persist from one time step to the next.
Barabasi-Albert graphs do not have "giant" components either. Compared to other graph types, the size of non-dominated components drops drastically as the length of the components increases. This can be explained by the degree distribution: Fig. 5 shows that most vertices of those graphs have small degree, so such a graph is easily disconnected in future time steps and long-lasting large components are very unlikely. For each presence value and average degree, the front corresponding to Random graphs is above the one corresponding to Barabasi-Albert graphs: for a given length, a PCC is bigger in Random graphs than in Barabasi-Albert graphs. For a low average degree, the front corresponding to Random Euclidean graphs is lower than for all other types of graphs, whereas with a high average degree it is higher than for Barabasi-Albert graphs; PCCs in Random Euclidean graphs are much bigger with a high average degree. Barabasi-Albert graphs present smaller components than Random and Random Euclidean graphs because, as previously explained, they have a majority of vertices with a low degree, so there is a great chance that such a graph breaks into small components.
The front corresponding to grids in Fig. 7 for presence 0.9 is high, meaning that PCCs are big and persist for many time steps: grids present "giant" components. For presence 0.7, the front drops drastically: PCCs of big size are short and long PCCs are small. Grids are highly connected and robust to changes with a high presence value, but they are easily disconnected when the presence decreases. In Fig. 8, with a presence value of 0.9, the graphs are very connected, so grids have only one non-dominated component, with almost 1000 vertices and length 1000.

PICCNIC execution time
To study the computation time of the algorithm, we measure it for all four types of graphs, with 1000 time steps, underlying graph average degree 4, presence 0.9 and different numbers of vertices, from 100 to 4500 (see Table 2). Figure 10 presents median values of the PICCNIC algorithm execution time for each type of graph. For each, a regression function of the form n² is also represented. PICCNIC's worst-case complexity is O(n²·T). In this specific experiment, T is fixed (to 1000), so the complexity becomes O(n²), hence the regression function of the form n².
For Barabasi-Albert graphs, the R² value of the regression is 0.984. For grids, it is 0.979, for Random graphs 0.992 and for Random Euclidean graphs 0.997. Those R² values confirm that a regression of the form n² fits the experimental computation times: the experimental results match the worst-case complexity.
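The reported R² values come from fitting a function of the form a·n² to the measured times. A minimal sketch of such a fit, with made-up timings rather than the paper's measurements (those are in Fig. 10):

```python
def fit_quadratic(ns, ts):
    """Least-squares fit of t = a * n**2 (no intercept), returning the
    coefficient a and the coefficient of determination R^2."""
    a = sum(t * n * n for n, t in zip(ns, ts)) / sum(n ** 4 for n in ns)
    mean_t = sum(ts) / len(ts)
    ss_res = sum((t - a * n * n) ** 2 for n, t in zip(ns, ts))
    ss_tot = sum((t - mean_t) ** 2 for t in ts)
    return a, 1 - ss_res / ss_tot

ns = [100, 1000, 4500]
ts = [0.5, 56.0, 1100.0]  # hypothetical seconds, roughly quadratic in n
a, r2 = fit_quadratic(ns, ts)
# an R^2 close to 1 indicates the n^2 model explains the timings well
```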
With a higher number of vertices, it becomes clearer that the execution of the PICCNIC algorithm takes more time on Random Euclidean graphs than on all other types of graphs, and that it is faster on grids than on all other types of graphs. Compared to Fig. 7, it can be noticed that the algorithm is executed faster on grids, which present big components (many vertices for a long period of time), whereas the execution takes more time on Random Euclidean graphs, which present small components. This is consistent with the theoretical observations: the complexity factor is the number of components in the different sets. Although there is a significant difference in computation times between the types of graphs, the order of magnitude remains the same.
The computation time of the PICCNIC algorithm, from about 422 seconds to 1577 seconds for graphs with 4500 vertices depending on the graph type, is quite reasonable and shows that this algorithm can be used in practice for rather large graph sizes. The next section confirms this for very large real graphs.
Running the whole algorithm in the conditions of the experiment takes 1000 iterations. On graphs with 1000 vertices, the average computation time of one iteration is 0.056 second for grids, 0.096 second for Barabasi-Albert graphs, 0.096 second for Random graphs and 0.116 second for Random Euclidean graphs. On graphs with 4500 vertices, it is 0.422 second for grids, 0.853 second for Barabasi-Albert graphs, 1.054 second for Random graphs and 1.577 second for Random Euclidean graphs. Computing one iteration of the algorithm is possible under a very reasonable time limit, so the PICCNIC algorithm can be used online while the graph changes.

Real instances
Let us now present the experiments made on real instances. We start by presenting the data we used, then we present how the dynamic graphs were built based on this data, and then we present the results of PICCNIC Algorithm.

Data used
We used data from the Stanford Network Analysis Project (SNAP) (Leskovec and Krevl 2014). This dataset collection offers a large choice of real networks, including temporal networks. We focused on two specific networks.
The first one we worked on is the Stack Overflow temporal network. In this network, each node is a user of the Stack Overflow forum. The network is directed, and an arc represents either a user answering another user's question, or a user commenting on another user's question or answer. All those interactions happen at a specific time over 2774 days. There are 2,601,977 nodes and 63,497,050 temporal arcs in this network.
The second network we worked on is the Wikipedia Talk page temporal network. A node is a user, and an arc represents a user editing another user's talk page. Those interactions happen at a specific time over 2320 days. There are 1,140,149 nodes and 7,833,140 temporal arcs in this network.

Building dynamic graphs from real data
In order to use these data, we had to build dynamic graphs from them. We first had to determine how much real time a time step in the graph represents; we set this amount of time to one day, meaning that all the interactions happening the same day appear as arcs in the same t-graph. Similarly, we had to determine how long we consider an interaction to last. We call this parameter the event duration, and we tested several values (see Sect. 5.2.3). As a time step is one day, we set the event duration to several days: when there is an interaction between two users, we consider that those users are in contact for several days. We considered PCCs lasting at least one time step (l_min = 1 in the algorithm) and of size at least 100 (k_min = 100 in the algorithm).
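The preprocessing described above can be sketched as follows. The function name and input format are illustrative assumptions, not the actual code used for the experiment: an interaction timestamped on day d keeps its two users in contact for `event_duration` consecutive days.

```python
from collections import defaultdict

def build_snapshots(interactions, horizon, event_duration):
    """Turn timestamped interactions (u, v, day) into one edge set per
    day: an interaction on day d makes u and v adjacent on days
    d .. d + event_duration - 1 (clipped to the study horizon)."""
    snapshots = defaultdict(set)
    for u, v, day in interactions:
        for t in range(day, min(day + event_duration, horizon)):
            snapshots[t].add(frozenset((u, v)))
    return [snapshots[t] for t in range(horizon)]

days = build_snapshots([("a", "b", 0), ("b", "c", 2)],
                       horizon=5, event_duration=3)
# the (a, b) contact covers days 0-2, the (b, c) contact days 2-4
```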
One can wonder what a PCC means in this context, and more specifically, what it means for a set of vertices to be in the same PCC. Both datasets used for this experiment represent the interactions between users of a forum. If two vertices are in the same PCC, it means that the corresponding users have been interacting with each other for a certain amount of time. So a PCC represents a group of users interacting with each other during a period of time. On either StackOverflow or Wikipedia's Talk pages, it can be inferred that those users are interested in the same subjects.

PICCNIC results
Figure 11 shows the Pareto fronts of non-dominated PCCs obtained from the Wikipedia Talk page network and from the StackOverflow network. We tested different values of the event duration: from 5 to 100 days for Wikipedia's Talk pages and from 5 to 30 days for the StackOverflow forum.
In both cases, it can be noticed that the longer we consider an interaction to last, the bigger (in terms of number of vertices) and the longer (in terms of time steps) the PCCs. In Fig. 11, the sizes of PCCs are plotted along the Y-axis on a logarithmic scale. The shapes of the Pareto fronts indicate that the biggest PCCs do not last long while the smallest ones last much longer. The fronts drop fast, which is consistent with what we noticed in the previous experiment on randomly generated graphs.
The computation time of the PICCNIC algorithm does not clearly depend on the event duration parameter. The algorithm runs on the Wikipedia network in about 5 to 7 hours, and on the StackOverflow network in about 55 to 60 hours; moreover, 400 GB of RAM were necessary to run the experiment on this last dataset.
PICCNIC Algorithm successfully identifies the major groups of users interacting with each other along time. This is computed in a large but reasonable amount of time considering the time horizon.

Conclusion
In this paper, we proposed a new definition of connected components in a dynamic graph, namely the persistent connected component. Related problems have been addressed in the literature before but, unlike most of those works, our definition is not based on journeys in a dynamic graph and does not use travel time, but instantaneous connection between vertices. Our generalization is quite natural, as the vertices of a PCC associated with an interval I belong to the same connected component at each time step of this interval. Like the extensions of connected components found in the literature, our persistent connected components do not form a partition of the graph. A notion of dominance between PCCs was also introduced.
We presented a polynomial time algorithm computing all non-dominated PCCs in a dynamic graph. PICCNIC algorithm has complexity O(n 2 · T ), with n the number of vertices and T the time horizon. It is online as it works successively on each time step of the study interval.
The algorithm computes the length of a PCC as a number of time steps. But if we consider that a time step in the study interval corresponds to a time when the graph changes, then we can use a model where the actual amount of time elapsed between time steps i and i + 1 differs from the one elapsed between time steps j and j + 1. The PICCNIC algorithm can easily be modified to compute the length of a PCC in "real time" instead of a number of time steps.
We presented an experimental study. In the first experiments, we executed PICCNIC on different types of graphs to study the impact of the graph's structure on PCCs and on its execution time. Then we showed that PICCNIC's execution time is consistent with its theoretical complexity and that its execution time makes it usable in practice on rather large graphs. In the second experiment, we ran our algorithm on instances made of real data with millions of vertices and arcs. This experiment showed that PICCNIC Algorithm can be used on such real large data. Indeed, its execution time remains quite small with regard to their time horizon.
Another natural extension of connected components seems worth investigating. Suppose that connected components can be interrupted and start again later. With respect to Definition 4, it would mean that the vertices stay connected, directly or indirectly, for l time steps that are not necessarily consecutive. Unfortunately, it is not possible to eliminate dominated connected components without considering all the time steps. Therefore, the number of candidate components might be Ω(2^n), and finding non-dominated ones is not tractable even for medium values of n.