Integer programming formulations and efficient local search for relaxed correlation clustering

Relaxed correlation clustering (RCC) is a vertex partitioning problem that aims at minimizing the so-called relaxed imbalance in signed graphs. RCC is an NP-hard unsupervised learning problem with applications in biology, economics, image recognition and social network analysis. To solve it, we propose two integer linear programming formulations and a local search-based metaheuristic. The latter relies on auxiliary data structures to efficiently perform move evaluations during the search process. Extensive computational experiments on existing and newly proposed benchmark instances demonstrate the superior performance of the proposed approaches when compared to those available in the literature. While the exact approaches obtained optimal solutions for open problems, the proposed heuristic algorithm found high-quality solutions within reasonable CPU times. We also report improved results for the symmetric version of the problem. Moreover, we show the benefits of implementing the efficient move evaluation procedure, which makes the proposed metaheuristic scalable even for large-size instances.


Introduction
In graph theory, signed graphs are those where each arc (or edge) has a positive or negative sign [50,51]. This type of graph has been extensively used for modeling problems in various fields including biology [16], economics [29,47], chemistry [38], ecology [15], image segmentation [31], and linguistics [46], but mainly in social network analysis [2,3,10,19,22,23]. One of these problems is correlation clustering (CC) [5], a well-known unsupervised learning problem that aims at finding a vertex partitioning of a signed graph so as to minimize the disagreements, given by negative arcs (or edges) within a cluster and positive arcs (or edges) between clusters. Before formally defining the CC problem on a directed signed graph G = (V, A, s), we shall introduce some notation. • For each arc a ∈ A, let w_a be an associated non-negative weight. We also use w_ij and w_ji to denote the weights of the arcs (i, j) and (j, i), respectively. • The sets of positive and negative arcs are denoted, respectively, as A^+ and A^-; thus A = A^+ ∪ A^-. • Given an l-partition P = {S_1, ..., S_l} of V, let A[S_p : S_q] denote the set of arcs from S_p to S_q, and define Ω^+(S_p, S_q) = Σ_{a ∈ A^+ ∩ A[S_p:S_q]} w_a and Ω^-(S_p, S_q) = Σ_{a ∈ A^- ∩ A[S_p:S_q]} w_a.
The imbalance I(P) of an l-partition P is defined as
$$I(P) = \sum_{p=1}^{l} \Omega^-(S_p, S_p) + \sum_{p=1}^{l} \sum_{\substack{q=1 \\ q \neq p}}^{l} \Omega^+(S_p, S_q). \qquad (1)$$
The CC problem consists of determining a partition P which minimizes I(P). One of the applications of CC is the analysis of structural balance in social networks. In such networks, the arcs represent social relations between actors (i.e., the vertices of the network), whereas the signs represent feelings such as like/dislike and agreement/disagreement. According to the structural balance theory of Heider [14,30], a network is balanced if there is a bipartition of the vertex set so that every positive arc joins actors in the same group and every negative arc joins actors in different groups. Later, Davis [17] generalized structural balance to allow partitions with more than two groups, which is fully compatible with the criterion optimized by CC. Indeed, an algorithm for CC is a useful tool for evaluating how balanced a social network is. Note that a signed graph is balanced if there is a partition P such that I(P) = 0.
Although traditional structural balance works in several scenarios, Doreian and Mrvar [20] pointed out that this concept might not be appropriate for some networks. They argued that Equation (1) penalizes patterns in the partitions associated with relevant social psychological processes. For example, the network in Fig. 1a represents a scenario with three groups, where one of them is a group of mutually hostile mediators (vertices 8, 9, and 10). Figure 1b illustrates why CC is not suitable in this case: it is not capable of detecting the group of mediators or any type of subgroup internal hostility. This was illustrated in practice by Levorato et al. [36], where positive/negative mediation was detected in networks describing the United Nations General Assembly Voting Data. The equivalent was also verified for the case of differential popularity (a process in which some actors receive more positive links than others in a group) detected in benchmark instances from the literature [21], and for internal hostility detected in networks describing voting activity of members of the European Parliament [3]. Still in [20], Doreian and Mrvar introduced the concept of generalized structural balance, giving rise to a new definition for the imbalance of a vertex partition which corrects the partition patterns penalized in Equation (1). The relaxed imbalance of an l-partition P, denoted here RI(P), is defined as
$$RI(P) = \sum_{p=1}^{l} \min\{\Omega^+(S_p, S_p), \Omega^-(S_p, S_p)\} + \sum_{p=1}^{l} \sum_{\substack{q=1 \\ q \neq p}}^{l} \min\{\Omega^+(S_p, S_q), \Omega^-(S_p, S_q)\}. \qquad (2)$$
The main subject of this work is the relaxed correlation clustering (RCC) problem [20,26], which is a CC variant. In the RCC, given an integer parameter 1 ≤ k ≤ n, one aims at finding a partition P ∈ ∪_{l=1}^{k} P_l that minimizes the relaxed imbalance given by Equation (2), where P_l is the set of all l-partitions of V. The optimal value of the RCC problem determines how balanced a network is w.r.t. the relaxed structural balance introduced by Doreian and Mrvar [20]. For example, the partition in Fig. 1a is an optimal solution for RCC because RI(P) = 0.
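To make the two criteria concrete, the following sketch (not from the paper; a minimal Python illustration with hypothetical helper names) computes both the CC imbalance I(P) and the relaxed imbalance RI(P) of a partition, representing arcs as (i, j, weight, sign) tuples.

```python
# Illustrative sketch: CC imbalance vs. relaxed imbalance of a partition.
from collections import defaultdict

def omegas(arcs, membership):
    """pos[(p, q)] / neg[(p, q)]: total positive / negative arc weight from
    cluster p to cluster q (p == q gives the intracluster sums)."""
    pos, neg = defaultdict(float), defaultdict(float)
    for i, j, w, s in arcs:
        key = (membership[i], membership[j])
        (pos if s > 0 else neg)[key] += w
    return pos, neg

def imbalance(arcs, membership):
    """CC imbalance (Eq. 1): negative weight inside clusters plus positive
    weight between clusters."""
    pos, neg = omegas(arcs, membership)
    clusters = set(membership.values())
    return (sum(neg[(p, p)] for p in clusters)
            + sum(pos[(p, q)] for p in clusters for q in clusters if p != q))

def relaxed_imbalance(arcs, membership):
    """Relaxed imbalance (Eq. 2): for every ordered cluster pair, only the
    non-predominant sign is penalized."""
    pos, neg = omegas(arcs, membership)
    clusters = set(membership.values())
    return sum(min(pos[(p, q)], neg[(p, q)])
               for p in clusters for q in clusters)

# Toy network: a cohesive positive group {0, 1} and two mutually hostile
# "mediators" {2, 3} that also dislike the group.
arcs = [(0, 1, 1.0, +1), (1, 0, 1.0, +1),
        (2, 3, 1.0, -1), (3, 2, 1.0, -1),
        (2, 0, 1.0, -1), (3, 1, 1.0, -1)]
P = {0: 0, 1: 0, 2: 1, 3: 1}
print(imbalance(arcs, P), relaxed_imbalance(arcs, P))  # 2.0 0.0
```

As in the mediator example of Fig. 1a, the partition grouping the hostile mediators is penalized by I(P) but has zero relaxed imbalance.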
Both CC and RCC problems were proven NP-hard by Bansal et al. [5] and Figueiredo and Moura [26], respectively.
Doreian and Mrvar [20] tackled the RCC by applying a relocation algorithm to analyze four real data sets related to relaxed structural balance, with up to 20 vertices. The authors adapted a heuristic method proposed in Doreian and Mrvar [19] for the CC problem with a fixed number of clusters, which optimizes a generic and parameterized function called criterion function. Later, Brusco et al. [9] proposed a branch-and-bound algorithm for solving RCC to optimality. This algorithm was capable of solving instances with up to 29 vertices (hereafter referred to as small-sized instances) and k varying from 2 to 7. Moreover, an additional set of instances with up to 40 vertices was considered for experiments with k = {3, 5}. Figueiredo and Moura [26] developed an integer linear programming (ILP) formulation that was capable of solving some small-sized instances when k = {2, 3} and for high values of k, concluding that the proposed ILP formulation and the existing branch-and-bound algorithm are somewhat complementary approaches. The authors also proposed a symmetric version of RCC (SRCC). Although different heuristic procedures have been proposed in the literature for CC (see [6,19,36,48,49], among others), to the best of our knowledge, only one heuristic procedure has been applied to the RCC: Levorato et al. [36] adapted their iterated local search algorithm, originally developed for CC, to solve SRCC instances. The two main contributions of this work can be summarized as follows: • We present two novel integer programming formulations for the RCC problem and investigate their empirical performance in comparison to an existing formulation. The results show that the new formulations produce better results in practice. • We propose a local search-based metaheuristic that relies on a series of auxiliary data structures to efficiently recalculate the relaxed imbalance after applying an operation that modifies a partition, as well as on a novel perturbation mechanism.
The results obtained suggest that the developed algorithm is superior to an existing approach, producing, on average, high quality solutions in a limited amount of CPU time, not only for RCC instances but also for SRCC instances. Finally, we also demonstrate the practical benefits of implementing the move evaluation in an efficient way.
The remainder of the paper is organized as follows. Section 2 presents an illustrative instance of the RCC problem. Section 3 defines the symmetric version of the problem. Section 4 introduces two novel mathematical formulations for the RCC. Section 5 describes the proposed efficient local search-based metaheuristic, including the efficient move evaluation schemes. Section 6 presents the results of the computational experiments. Finally, conclusions are drawn in Sect. 7.

Symmetric RCC
In this work, we also consider the symmetric version of RCC (SRCC) introduced in Figueiredo and Moura [26]. The relaxed imbalance, as given by Equation (2), penalizes non-predominant relations (w.r.t. the signs) inside each cluster S_q and non-predominant relations from a cluster S_p to a cluster S_q. The difference in the symmetric relaxed imbalance defined in Figueiredo and Moura [26] is that it penalizes non-predominant relations among pairs of clusters, i.e., it considers simultaneously all positive (respectively, all negative) relations from S_p to S_q and from S_q to S_p. Thus, the SRCC can be defined on an undirected graph in which parallel edges with opposite signs are allowed.
Let G = (V, E, s) be an undirected signed graph with a positive weight w̄_ij associated with each edge {i, j} ∈ E. Let us denote by E^+ and E^-, respectively, the sets of positive and negative edges in E; thus E = E^+ ∪ E^-. In this work, we transform the SRCC instance defined on G = (V, E, s) into an RCC instance defined on a directed signed graph G_d = (V, A, s) in which each edge {i, j} ∈ E gives rise to two arcs (i, j), (j, i) ∈ A with the same sign and weights w_ij = w_ji = w̄_ij / 2. In other words, for each edge {i, j} ∈ E, one creates two opposite arcs with the same sign and half of the edge weight.
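The edge-to-arcs transformation just described is direct to implement. The sketch below is illustrative only, with an assumed (i, j, weight, sign) tuple encoding: each undirected edge becomes two opposite arcs carrying half its weight, so the total signed weight is preserved.

```python
# Sketch of the SRCC-to-RCC transformation: edge {i, j} with weight w
# becomes arcs (i, j) and (j, i), each with weight w / 2 and the same sign.

def to_directed(edges):
    """edges: iterable of (i, j, weight, sign) undirected signed edges.
    Returns the arc list of the equivalent RCC instance."""
    arcs = []
    for i, j, w, s in edges:
        arcs.append((i, j, w / 2.0, s))
        arcs.append((j, i, w / 2.0, s))
    return arcs

edges = [(0, 1, 4.0, +1), (1, 2, 2.0, -1)]
print(to_directed(edges))
# [(0, 1, 2.0, 1), (1, 0, 2.0, 1), (1, 2, 1.0, -1), (2, 1, 1.0, -1)]
```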
Let E[S_p : S_q] be the set of edges connecting the vertices in S_p and those in S_q. We denote Ω^+(S_p, S_q) = Σ_{e ∈ E^+ ∩ E[S_p:S_q]} w_e and Ω^-(S_p, S_q) = Σ_{e ∈ E^- ∩ E[S_p:S_q]} w_e. The symmetric relaxed imbalance SRI(P) of an l-partition P is defined as
$$SRI(P) = \sum_{p=1}^{l} \min\{\Omega^+(S_p, S_p), \Omega^-(S_p, S_p)\} + \sum_{p=1}^{l} \sum_{q=p+1}^{l} \min\{\Omega^+(S_p, S_q), \Omega^-(S_p, S_q)\}. \qquad (3)$$
Because the transformation above splits the weight of each edge into two opposite arcs, and the second term in (3) is written only for p < q, the equivalence of the second terms in (2) and (3) follows. This yields Proposition 1: for every partition P of V, RI(P) computed on G_d equals SRI(P) computed on G.
As a consequence of Proposition 1, solving the RCC over graph G d is equivalent to solving SRCC over G .

Mathematical formulations
Integer linear programming (ILP) formulations have been used to solve CC and other related problems defined on signed graphs [4,12,13,27,28]. For RCC, an ILP formulation was presented in [26]. When modeling vertex-clustering problems, if there is a need for keeping track of the clusters used, two types of formulations can be adopted: cluster-indexed formulations [7,11,24] or representatives formulations [1,4,13,28]. Indeed, the ILP formulation of Figueiredo and Moura [26] is a representatives one. Next, we introduce two new formulations for RCC, one of each type.

Formulation F1: a cluster-indexed formulation
Let K = {1, ..., k} be the set of possible cluster indices. For each vertex i ∈ V and p ∈ K we define
$$x_i^p = \begin{cases} 1, & \text{if vertex } i \text{ is assigned to cluster } S_p, \\ 0, & \text{otherwise.} \end{cases}$$
A set of binary variables is used to describe the set of arcs that will be penalized once, according to Eq. (2), the predominant relations are defined. For each arc (i, j) ∈ A, we define
$$t_{ij} = \begin{cases} 1, & \text{if arc } (i, j) \text{ is penalized}, \\ 0, & \text{otherwise.} \end{cases}$$
A set of binary variables is used to select whether the imbalance from cluster S_p to cluster S_q, with p, q ∈ K, is given by negative arcs (predominant relations are positive) or by positive arcs (predominant relations are negative). Notice that intracluster imbalance is defined whenever p = q. For each pair of cluster indices p, q ∈ K, we define
$$s_{pq} = \begin{cases} 1, & \text{if positive arcs from cluster } S_p \text{ to cluster } S_q \text{ are penalized}, \\ 0, & \text{if negative arcs from cluster } S_p \text{ to cluster } S_q \text{ are penalized.} \end{cases}$$
The formulation can be written as
$$\min \sum_{(i,j) \in A} w_{ij} t_{ij} \qquad (4)$$
$$\text{s.t.} \quad \sum_{p \in K} x_i^p = 1, \quad \forall i \in V, \qquad (5)$$
$$t_{ij} \geq x_i^p + x_j^q + s_{pq} - 2, \quad \forall (i,j) \in A^+, \; \forall p, q \in K, \qquad (6)$$
$$t_{ij} \geq x_i^p + x_j^q - s_{pq} - 1, \quad \forall (i,j) \in A^-, \; \forall p, q \in K, \qquad (7)$$
$$x_i^p \in \{0, 1\}, \quad \forall i \in V, \; \forall p \in K, \qquad (8)$$
$$s_{pq} \in \{0, 1\}, \quad \forall p, q \in K, \qquad (9)$$
$$t_{ij} \in \{0, 1\}, \quad \forall (i,j) \in A. \qquad (10)$$
Objective function (4) minimizes the total relaxed imbalance. Constraints (5) ensure that each vertex is assigned to exactly one cluster. Constraints (6) and (7) define, respectively, whether a positive or a negative arc is penalized, based on the assignment variables x_i^p, x_j^q and the penalizing variables s_pq. Note that, when p = q, the penalized arcs are intracluster ones. Finally, constraints (8)-(10) define the domain of all variables.
Formulations that make use of cluster-indexed variables such as F1 are symmetric, in the sense that the same partition can be represented in a substantial number of ways by permuting the cluster indices. In view of this, we add the following set of symmetry-breaking inequalities, introduced in Bulhões et al. [11] for the p-cluster editing problem:
$$x_i^p \leq \sum_{j=1}^{i-1} x_j^{p-1}, \quad \forall i \in V, \; \forall p \in K \setminus \{1\}. \qquad (11)$$
The inequality above forbids the cluster containing i to use a label greater than p whenever each vertex j < i is assigned to a cluster of index strictly smaller than p. The formulation in the next section adopts a similar strategy in order to break symmetry in the solution space.
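As a sanity check of what F1 optimizes (not an ILP implementation; a brute-force enumeration over cluster labelings, feasible only for tiny instances), note that for a fixed assignment the best choice of the s_pq variables always penalizes the non-predominant sign of every ordered cluster pair, so the objective collapses to the relaxed imbalance:

```python
# Brute-force sketch of the F1 objective, with hypothetical helper names.
from itertools import product

def f1_objective(arcs, assign):
    """Objective of F1 after optimally fixing the s_pq penalty variables:
    each ordered pair of labels contributes min(positive sum, negative sum)."""
    pos, neg = {}, {}
    for i, j, w, s in arcs:
        key = (assign[i], assign[j])
        d = pos if s > 0 else neg
        d[key] = d.get(key, 0.0) + w
    return sum(min(pos.get(key, 0.0), neg.get(key, 0.0))
               for key in set(pos) | set(neg))

def solve_rcc_bruteforce(n, arcs, k):
    # Enumerate all k^n labelings; only viable for very small n.
    best = min(product(range(k), repeat=n),
               key=lambda a: f1_objective(arcs, a))
    return f1_objective(arcs, best), best

# Triangle with one positive and two negative arcs: separating vertex 2
# makes every pairwise sum single-signed, so the optimum is 0.
arcs = [(0, 1, 1.0, +1), (1, 2, 1.0, -1), (2, 0, 1.0, -1)]
print(solve_rcc_bruteforce(3, arcs, 2))
```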

Formulation F2: a representatives formulation
A representatives formulation for the RCC is presented as follows. The idea behind this kind of formulation [13] is the unique representation of each cluster by its vertex with the lowest label. Hence, for each pair of vertices i, j ∈ V satisfying i ≤ j, we define
$$x_{ij} = \begin{cases} 1, & \text{if vertex } j \text{ is represented by vertex } i, \\ 0, & \text{otherwise.} \end{cases}$$
Note that when i = j, variable x_ii indicates whether i is a representative vertex. Variables s_ij, with i, j ∈ V, used in F2 are equivalent to variables s_pq, with p, q ∈ K, used in F1; however, now the representative vertices i and j are used to identify two clusters (i ≠ j) or one cluster (i = j). Variables t_uv, with (u, v) ∈ A, are exactly the same as defined in F1. Hence, formulation F2 can be expressed as follows:
$$\min \sum_{(u,v) \in A} w_{uv} t_{uv}$$
$$\text{s.t.} \quad \sum_{i \leq j} x_{ij} = 1, \quad \forall j \in V, \qquad (12)$$
$$x_{ij} \leq x_{ii}, \quad \forall i, j \in V, \; i < j, \qquad (13)$$
$$\sum_{i \in V} x_{ii} \leq k, \qquad (14)$$
$$t_{uv} \geq x_{iu} + x_{jv} + s_{ij} - 2, \quad \forall (u,v) \in A^+, \; \forall i \leq u, \; \forall j \leq v, \qquad (15)$$
$$t_{uv} \geq x_{iu} + x_{jv} - s_{ij} - 1, \quad \forall (u,v) \in A^-, \; \forall i \leq u, \; \forall j \leq v, \qquad (16)$$
$$x_{ij} \in \{0, 1\}, \quad \forall i, j \in V, \; i \leq j, \qquad (17)$$
$$s_{ij} \in \{0, 1\}, \quad \forall i, j \in V, \qquad (18)$$
$$t_{uv} \in \{0, 1\}, \quad \forall (u,v) \in A. \qquad (19)$$
Constraints (12) impose that each vertex must be represented by exactly one vertex: either by itself or by another one with a smaller index. Constraints (13) force vertex i to be a representative whenever some vertex j is represented by i. Constraint (14) imposes k as an upper bound on the number of representative vertices, i.e., on the number of clusters in the partition. Constraints (15) and (16) are, respectively, equivalent to constraints (6) and (7) of formulation F1. Finally, constraints (17)-(19) are the binary constraints.
The numbers of variables and constraints of formulations F1 and F2 are shown in Table 1. Note that since the number of vertices of a graph is usually much greater than the number of clusters, formulation F1 is more compact than F2. On the other hand, formulation F2 succeeds in eliminating cluster indices from the representation, which breaks the symmetry inherent to cluster-indexed formulations [13].

Proposed local search-based metaheuristic
In this section, we propose a metaheuristic algorithm for the RCC based on the iterated local search (ILS) method [37]. The key concept of ILS is to combine local search strategies and perturbation mechanisms to escape from local optima. The metaheuristic introduced in Levorato et al. [36] for CC and adapted for the solution of SRCC instances is also an ILS method. In contrast to Levorato et al. [36], in this work we propose the use of several complex neighborhoods and of advanced data structures that make the search process more efficient.
Algorithm 1 presents a general framework of a multi-start ILS, hereafter referred to as ILS_RCC, which has the following input parameters: (a) I_R is the number of restarts of the metaheuristic; (b) I_ILS is the maximum number of ILS iterations without improvement; (c) I_P is the maximum number of moves performed by a perturbation mechanism.
For each restart, an initial solution is randomly generated (line 5) and possibly improved by alternately applying local search (line 9) and perturbation (line 13) strategies until the maximum number of iterations without improvement (I_ILS) is reached. Finally, the best solution found among all restarts is returned (line 18).
The local search procedure is based on variable neighborhood descent (VND) [40], a technique that systematically explores a sequence of neighborhood structures (see Sect. 5.2) in search of better solutions. A neighborhood structure (or simply neighborhood) defines a set of neighbor solutions obtained from the current solution by applying a so-called move. When VND finds an improving move in a particular neighborhood, the solution is updated and the procedure restarts from the improved solution. The procedure terminates when all neighborhoods fail to improve the current solution. The best improvement strategy was adopted, i.e., a neighborhood is fully enumerated and the best improving move (if any) is applied. In addition, the neighborhood ordering is defined in a random fashion, which results in a strategy known as Randomized VND (RVND). The combination of ILS and RVND has led to state-of-the-art methods for several important combinatorial optimization problems, such as the split-delivery vehicle routing problem [44], the minimum latency problem [43], and minimizing weighted tardiness in single machine scheduling with sequence-dependent setup times [45].
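A minimal sketch of the multi-start ILS skeleton of Algorithm 1, under simplifying assumptions: only the Insertion neighborhood, naive from-scratch objective evaluation instead of the ADS-based scheme, and random Insertion moves as the sole perturbation. The parameter names i_r, i_ils and i_p mirror I_R, I_ILS and I_P.

```python
# Simplified multi-start ILS sketch (not the paper's implementation).
import random

def relaxed_imbalance(arcs, member):
    pos, neg = {}, {}
    for i, j, w, s in arcs:
        key = (member[i], member[j])
        d = pos if s > 0 else neg
        d[key] = d.get(key, 0.0) + w
    return sum(min(pos.get(key, 0.0), neg.get(key, 0.0))
               for key in set(pos) | set(neg))

def local_search(arcs, member, k):
    """Best-improvement Insertion descent with naive move evaluation."""
    improved = True
    while improved:
        improved = False
        best, best_move = relaxed_imbalance(arcs, member), None
        for v in list(member):
            old = member[v]
            for q in range(k):       # try moving v into every other cluster
                if q == old:
                    continue
                member[v] = q
                cost = relaxed_imbalance(arcs, member)
                if cost < best:
                    best, best_move = cost, (v, q)
            member[v] = old
        if best_move is not None:
            member[best_move[0]] = best_move[1]
            improved = True
    return member

def ils_rcc(arcs, n, k, i_r=3, i_ils=20, i_p=2, seed=0):
    rng = random.Random(seed)
    best_m, best_c = None, float("inf")
    for _ in range(i_r):             # restarts
        cur = local_search(arcs, {v: rng.randrange(k) for v in range(n)}, k)
        fails = 0
        while fails < i_ils:
            cand = dict(cur)
            for _ in range(i_p):     # perturbation: random Insertion moves
                cand[rng.randrange(n)] = rng.randrange(k)
            cand = local_search(arcs, cand, k)
            if relaxed_imbalance(arcs, cand) < relaxed_imbalance(arcs, cur):
                cur, fails = cand, 0
            else:
                fails += 1
        cost = relaxed_imbalance(arcs, cur)
        if cost < best_c:
            best_c, best_m = cost, cur
    return best_c, best_m

arcs = [(0, 1, 1.0, +1), (1, 0, 1.0, +1), (2, 3, 1.0, +1),
        (0, 2, 1.0, -1), (1, 3, 1.0, -1)]
print(ils_rcc(arcs, 4, 2))
```

For this 4-vertex toy instance a perfectly balanced bipartition exists, and the skeleton finds it (cost 0.0).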
The perturbation procedure randomly chooses one of the implemented mechanisms (see Sect. 5.3) in order to modify the locally optimal solution P. The selected mechanism then applies I_P random consecutive moves over P so as to generate a solution from which the search continues.

In what follows, we provide a detailed description of the auxiliary data structures used for efficient move evaluation, as well as of the neighborhood structures and perturbation mechanisms.

Auxiliary data structures
Assuming that an adjacency matrix is used to access the signed digraph G, and that a feasible solution is represented by a set of subsets of vertex indices (e.g., for a graph with 6 vertices; see Fig. 2), the value of its associated objective function can be straightforwardly computed in O(l^2 n^2) operations, where l = |P|. Note that this complexity is due to the computation of the intercluster imbalance. Consequently, evaluating a neighbor solution from scratch every time during the local search may turn out to be computationally expensive, especially for large-size instances. However, this can be done more efficiently by precomputing and storing information in auxiliary data structures (ADSs).
We thus propose to implement two classes of ADSs: SumIntra[S_p], a set of ADSs that stores the sums of the weights of different subsets of A[S_p] (the intracluster arcs of S_p); and SumInter[S_p][S_q], a set of ADSs that stores the sums of the weights of different subsets of A[S_p : S_q] (the intercluster arcs from S_p to S_q). The ADSs are divided according to arc sign and direction, as described in Table 2.
Given a feasible solution, the SumIntra and SumInter ADSs can be initially built in O(l^2 n^2) operations, as described in Algorithm 2.
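A simplified sketch of the idea (an assumption-laden reduction of Table 2: sums are split by sign only, not by arc direction): storing, for every ordered cluster pair, the total positive and negative arc weight lets the relaxed imbalance be read off directly.

```python
# Simplified ADS sketch: per-cluster-pair signed weight sums.

def build_ads(arcs, member, l):
    """pos[p][q] / neg[p][q]: total positive / negative weight of the arcs
    from cluster p to cluster q (p == q gives the intracluster sums)."""
    pos = [[0.0] * l for _ in range(l)]
    neg = [[0.0] * l for _ in range(l)]
    for i, j, w, s in arcs:
        (pos if s > 0 else neg)[member[i]][member[j]] += w
    return pos, neg

def ri_from_ads(pos, neg, l):
    # Eq. (2): every ordered cluster pair contributes its non-predominant sum.
    return sum(min(pos[p][q], neg[p][q]) for p in range(l) for q in range(l))

arcs = [(0, 1, 1.0, +1), (1, 2, 1.0, -1), (2, 0, 2.0, -1), (2, 1, 1.0, +1)]
member = {0: 0, 1: 0, 2: 1}
pos, neg = build_ads(arcs, member, 2)
print(ri_from_ads(pos, neg, 2))  # min(1, 2) on the pair (S_2, S_1): 1.0
```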

Neighborhood structures
ILS RCC uses three neighborhood structures in the local search, namely: Insertion, Swap and Split. In the following, each of them is described in detail.

Insertion
The Insertion neighborhood moves a vertex from one cluster to another, thus yielding O(l^2 n) possible neighbor solutions to be evaluated. Algorithm 3 describes how an Insertion move is evaluated using the ADSs. This algorithm receives as input the solution P along with its associated cost (relaxed imbalance) RI_P, and the information regarding the move, i.e., S_p, i ∈ S_p and S_q. At first, auxiliary variables sum^+_{S_p}, sum^-_{S_p}, sum^+_{S_p,S_q} and sum^-_{S_p,S_q} temporarily store, in O(1) steps, the sums of the weights associated with the move (lines 2-13). Next, the value of the objective function of the neighbor solution under evaluation, denoted in the algorithm as cost, is partially obtained (lines 14-17) by recomputing the penalty decisions using function UpdateCost (see lines 31-36). In the loop from lines 18 to 30, a similar procedure is performed for the intercluster cases involving the other clusters and the clusters S_p and S_q. Finally, cost is returned, and the move yields an improvement if cost < RI_P. Because Algorithm 3 performs O(l) steps (due to the loop), finding the best improving move requires O(l^3 n) operations. Moreover, when P is modified, the ADSs must be updated. However, instead of recomputing the ADSs from scratch in O(l^2 n^2) operations, one only needs to update the ADSs affected by the vertex involved in the move, which can be done in O(n) steps, as detailed in the electronic supplementary material of this work. Figure 3 illustrates an example of an Insertion move. Note that the separation of the weights of the arcs adjacent to vertex i in SumIntra clearly facilitates the evaluation of the intercluster sums from S_1 to S_2 and from S_2 to S_1; otherwise, it would be necessary to perform O(n) operations to compute these weights separately.
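The delta-evaluation idea can be sketched as follows (a simplified stand-in for Algorithm 3, with assumed per-vertex structures outs/ins rather than the paper's exact ADSs): precomputed per-vertex weight sums toward each cluster let us recompute only the O(l) cluster pairs touched by an Insertion move, and the resulting delta matches a full recomputation.

```python
# Sketch of O(l) Insertion-move evaluation via precomputed sums.

def precompute(arcs, member, n, l):
    """pos/neg[p][q]: signed weight sums per ordered cluster pair;
    outs[i][r] / ins[i][r]: [positive, negative] weight of arcs from vertex i
    into cluster r / from cluster r into vertex i."""
    pos = [[0.0] * l for _ in range(l)]
    neg = [[0.0] * l for _ in range(l)]
    outs = [[[0.0, 0.0] for _ in range(l)] for _ in range(n)]
    ins = [[[0.0, 0.0] for _ in range(l)] for _ in range(n)]
    for i, j, w, s in arcs:
        p, q = member[i], member[j]
        idx = 0 if s > 0 else 1
        (pos if s > 0 else neg)[p][q] += w
        outs[i][q][idx] += w
        ins[j][p][idx] += w
    return pos, neg, outs, ins

def evaluate_insertion(pos, neg, outs, ins, member, i, q, l):
    """Change in relaxed imbalance if vertex i moves from its cluster to q,
    touching only the O(l) affected cluster pairs."""
    p = member[i]
    delta = {}
    def touch(r, c):
        if (r, c) not in delta:
            delta[(r, c)] = [pos[r][c], neg[r][c]]
        return delta[(r, c)]
    for r in range(l):
        op, on = outs[i][r]
        e = touch(p, r); e[0] -= op; e[1] -= on   # i's out-arcs leave row p
        e = touch(q, r); e[0] += op; e[1] += on   # ...and join row q
        ip, im = ins[i][r]
        e = touch(r, p); e[0] -= ip; e[1] -= im   # i's in-arcs leave column p
        e = touch(r, q); e[0] += ip; e[1] += im
    old = sum(min(pos[r][c], neg[r][c]) for (r, c) in delta)
    new = sum(min(v[0], v[1]) for v in delta.values())
    return new - old

def ri(arcs, member):  # naive recomputation, for verification only
    pos, neg = {}, {}
    for i, j, w, s in arcs:
        key = (member[i], member[j])
        d = pos if s > 0 else neg
        d[key] = d.get(key, 0.0) + w
    return sum(min(pos.get(key, 0.0), neg.get(key, 0.0))
               for key in set(pos) | set(neg))

arcs = [(0, 1, 1.0, +1), (1, 2, 1.0, -1), (2, 0, 2.0, -1),
        (2, 1, 1.0, +1), (3, 2, 1.0, +1)]
member = {0: 0, 1: 0, 2: 1, 3: 1}
pos, neg, outs, ins = precompute(arcs, member, 4, 2)
d = evaluate_insertion(pos, neg, outs, ins, member, 2, 0, 2)
after = dict(member); after[2] = 0
print(d, ri(arcs, after) - ri(arcs, member))  # the two deltas agree: 1.0 1.0
```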

Swap
The Swap neighborhood exchanges two vertices between two different clusters, which leads to O(l^2 n^2) neighbor solutions if one intends to enumerate all possibilities. The pseudocode presented in the electronic supplementary material describes how a Swap move can be evaluated in O(l) steps using a rationale similar to that employed for the Insertion neighborhood. Finding the best improving Swap move thus requires O(l^3 n^2) operations, and the ADSs can be updated in O(n) steps, as also described in the supplementary material. Figure 4 depicts an example of a Swap move, highlighting the arcs connecting the exchanged vertices, as they must be treated separately with respect to some ADSs.

Split
The Split neighborhood splits a cluster into two, resulting in a total of O(ln) neighbor solutions to be examined. Formally, given a cluster S = {v_1, v_2, ..., v_|S|} ∈ P and an index c < |S|, the clusters S' = {v_1, v_2, ..., v_c} and S'' = {v_{c+1}, ..., v_|S|} are produced to replace S in P. Clearly, a Split move can only be applied when l < k. The pseudocode presented in the electronic supplementary material describes how a Split move can reuse previous evaluations to speed up the next ones. The overall complexity of determining the best improvement is O(ln^2), as also described in the supplementary material. Because of the considerable changes produced by a Split move, all ADSs must be updated from scratch using Algorithm 2; implementing a specific update procedure for this move did not pay off in terms of CPU time. Moreover, note that a Split move never worsens a solution, since the imbalance decreases monotonically as the number of clusters increases [20]. Figure 5 shows an example of a Split move in which a cluster with 5 vertices is split into two clusters with 2 and 3 vertices, respectively.

Complexity summary
A summary on the complexity of the neighborhoods is provided in Table 3. For each neighborhood, we present its size, as well as the complexity of performing the move evaluation and the overall one using both the efficient best improvement (EBI) and the naive best improvement (NBI) strategies. In EBI, the search for the best improvement move is carried out as described in Sect. 5.2, whereas in NBI the objective function must be computed from scratch (with no support of ADSs) after each move. We also report the complexity of updating the ADSs in the case of EBI.

Perturbation mechanisms
ILS RCC employs three diversification mechanisms to perturb local optimal solutions, namely: Insertion, Merge and Sign inversion. The first one simply performs random insertion moves.
In the second, given two clusters S_1 and S_2 chosen at random, one merges them into a new cluster S_3 = S_1 ∪ S_2. The third perturbation is a novel procedure that exploits some RCC-specific features, as described in the following. The proposed Sign inversion mechanism forces the penalized sign in one of the decisions to change. More precisely, it modifies the solution in such a way that one of the intracluster or intercluster imbalances becomes defined by the opposite sign. The procedure randomly selects which case (i.e., intracluster or intercluster) is going to be considered. Basically, this is achieved by removing vertices that contribute with the non-penalized sign until the inversion happens. In what follows, we explain the procedure used to invert an intracluster decision.
Let + be the non-penalized sign for the intracluster imbalance of S_p (the procedure for sign − is analogous). Formally, the contribution of a vertex i ∈ S_p is given by Eq. (20):
$$\Delta_i^+ = \sum_{j \in S_p : (i,j) \in A^+} w_{ij} + \sum_{j \in S_p : (j,i) \in A^+} w_{ji}. \qquad (20)$$
At first, the vertices for which all incident arcs (in-arcs and out-arcs) are positive are removed, in non-increasing order of Δ^+. The value of Δ^+ must be updated after each removal. If this does not suffice, the remaining vertices with Δ^+ > 0 are removed using the same sorting criterion. Removals are performed while (i) the inversion has not yet occurred, (ii) there remain candidate vertices to remove, and (iii) |S_p| > 2. Figure 6 illustrates an example involving the application of the Sign inversion mechanism.
The intercluster sign inversion, e.g., from S_p to S_q, is easily derived by considering only the arcs in A[S_p : S_q] when determining the vertices to be removed from S_p, and by changing condition (iii) to |S_p| > 1. Note that no vertices are removed from S_q, but it may receive vertices from S_p.
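A rough sketch of the intracluster Sign inversion (hypothetical code, with assumptions: the two-phase removal order is collapsed into a single ordering by Δ^+, condition (iii) is taken as |S_p| > 2, and the destination of removed vertices is left out):

```python
# Hypothetical sketch of the intracluster Sign inversion perturbation.

def delta_plus(i, cluster, arcs):
    """Eq. (20): positive in- plus out-weight between i and its cluster mates."""
    others = cluster - {i}
    return sum(w for u, v, w, s in arcs if s > 0
               and ((u == i and v in others) or (v == i and u in others)))

def invert_intracluster_sign(cluster, arcs):
    """Remove vertices in non-increasing order of delta_plus until the
    negative intracluster weight predominates (or the cluster is too small)."""
    cluster = set(cluster)
    removed = []
    def sums():
        pos = sum(w for u, v, w, s in arcs
                  if s > 0 and u in cluster and v in cluster)
        neg = sum(w for u, v, w, s in arcs
                  if s < 0 and u in cluster and v in cluster)
        return pos, neg
    pos, neg = sums()
    while pos >= neg and len(cluster) > 2:
        i = max(cluster, key=lambda v: delta_plus(v, cluster, arcs))
        cluster.discard(i)
        removed.append(i)
        pos, neg = sums()             # Delta values change after each removal
    return cluster, removed

arcs = [(0, 1, 3.0, +1), (1, 0, 2.0, +1), (1, 2, 1.0, -1), (0, 2, 1.0, -1)]
cluster, removed = invert_intracluster_sign({0, 1, 2}, arcs)
print(cluster, removed)  # one removal suffices to make - predominant
```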
This perturbation allows for exploring particular regions of the search space that are difficult to reach using only the other mechanisms (including the randomized construction procedure), mainly when the sign distribution over the arcs is unbalanced and small changes are unlikely to invert a sign.
Each time the function Perturb is called, a perturbation mechanism is randomly chosen. The selected perturbation then applies from two up to I_P moves, the exact number also being chosen at random. If l = 2, Merge is not an eligible perturbation. Moreover, when Sign inversion is chosen and no change can be performed, one of the two remaining mechanisms (or Insertion if l = 2) is randomly selected. Table 4 presents the main differences between ILS_RCC and the ILS by Levorato et al. [36], which was developed for the SRCC.

Differences between ILS RCC and ILS [36]
It is worth mentioning that we tried to incorporate the constructive procedure implemented in Levorato et al. [36] into our algorithm, but the experiments reported in Sect. 6.3.1 indicated that its inclusion did not seem to significantly affect the overall performance of ILS RCC both in terms of solution quality and CPU time.

Benchmark instances
Regarding the benchmark instances used in our testing, we first present the small-size instances from the literature. Next, we introduce the newly proposed RCC instances and, finally, we describe the existing SRCC instances.

Small-size instances from the literature
The small-size instances considered here were proposed in different works and together they compose a set of nine signed digraphs described as follows.
• House instances-In 1952, Lemann and Solomon [32] carried out a sociometric study with students living in three different dormitories (denoted as House A, House B and House C) and obtained four relationship networks per dormitory considering the following information: date, friend, roommate and weekend. Doreian [18] later summed the arc weights of the signed networks associated with each dormitory so as to generate another three networks: House A Sum, House B Sum and House C Sum. We have considered these last three in our experiments. • Monastery instances-In 1968, Sampson [42] studied, over different periods of time, groups of young or novice postulants in a monastery, cataloging data for four types of relationships: affect, esteem, influence, and sanction. From these data, networks were generated for each period of time and type of relationship. Among them, we considered those associated with the relationship affect for different periods of time, namely MonkT2, MonkT3, and MonkT4. In addition, we considered the network MonkT4 Sum, generated in Doreian [18] by summing up the arc weights of the four types of relationships in period T4. • McKinney instance-This signed digraph was built by Brusco et al. [9] from the data collected by McKinney [39] in a study about the relationships between children in a classroom. In that study, children were submitted to a test in which they had to choose between "willing to serve with other children" (labeled as +1), "not being willing to serve" (labeled as −1) and "indifferent" (labeled as 0), defining the class relationship digraph. • NewComb instance-In 1961, Newcomb [41] conducted a well-known sociometric study with university students. A signed digraph was generated in Doreian and Mrvar [20] by slightly modifying the data from this study.
The main characteristics of the aforementioned instances are described in Table 5, where d and d^- indicate the digraph density (given by d = |A| / (|V|^2 − |V|)) and the percentage of negative arcs (given by d^- = |A^-| / |A|), respectively. For the sake of convenience, we have specified an alias (in parentheses) for each instance. The newly generated digraphs are available at http://www.ic.uff.br/~yuri/files/rcc_random.zip.

Symmetric RCC instances
We also considered three sets of symmetric RCC benchmark instances, namely: • UNGA instances-Generated by Levorato et al. [35] and composed of 63 undirected graphs that were built from the voting data of the United Nations General Assembly (UNGA) annual meetings between 1946 and 2008. These networks are weighted versions of UNGA signed digraphs created by Figueiredo and Frota [25]. • Slashdot instances-Created by Levorato [33] from subgraphs of the social network Slashdot Zoo containing 200 to 10000 vertices. Such subgraphs were transformed into undirected graphs. Levorato [33] performed experiments with a parallel heuristic for the SRCC. Since we are specifically interested in comparing the performance of sequential implementations, it was thought advisable to consider the instances with up to 2000 vertices. • BR Congress instances-Set of undirected graphs generated by Levorato and Frota [34] from voting sessions of the lower house of Brazilian National Congress. They created two graphs per year between 2011 and 2016, resulting in a total of 12 instances.

Tables 6, 7 and 8 present the results obtained by the formulation proposed in Figueiredo and Moura [26], as well as those determined by F1 and F2. Column z reports the relaxed imbalance RI(P*), where P* is the solution (optimal or not) found by the corresponding formulation; gap reports the percentage gap between the best integer solution found and the final lower bound (LB), as described in Eq. (21); t indicates the CPU time in seconds ("-" means the instance was not solved within the time limit); and nodes is the number of branch-and-bound nodes explored during the search. Regarding the ILP formulation proposed by Figueiredo and Moura [26], we report the original results, which were obtained with XPRESS 21.01.00 (Intel Core 2 Duo 2.10 GHz and 3 GB of RAM), as well as results determined by CPLEX 12.7 in order to perform a fair comparison. A time limit of 3600 s was imposed for each run.
We followed the same procedure adopted in Figueiredo and Moura [26] in our testing. For each digraph, we start the experiments with k = 2. If the instance is solved to optimality, we increase the value of k by one unit and attempt to solve the problem again (forward phase). If an optimal solution with relaxed imbalance 0 is found, we interrupt the experiments for that digraph, since this solution is also optimal for instances with larger values of k. When an instance is not solved to optimality, a similar procedure is carried out starting from k = n − 1, where the value of k is decreased by one unit after each successful optimization (backward phase). In the backward phase, we use the value of the optimal solution found in the previous run (i.e., for k + 1) as a lower bound for the instance with k. The backward phase finishes when an instance is not solved to optimality or when the current instance was already solved during the forward phase.
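The forward/backward protocol can be sketched as follows (illustrative only; solve stands in for an ILP run and fake_solve for a table of known optima, both hypothetical):

```python
# Sketch of the forward/backward experimental protocol.

def sweep(n, solve):
    """solve(k, lower_bound) -> (optimal?, objective). Forward phase from
    k = 2; on failure, backward phase from k = n - 1, feeding each optimum
    as a lower bound for the next smaller k."""
    solved = {}
    k = 2
    while k < n:                       # forward phase
        opt, z = solve(k, 0.0)
        if not opt:
            break
        solved[k] = z
        if z == 0:
            return solved              # optimal for every larger k as well
        k += 1
    lb = 0.0
    for k2 in range(n - 1, k, -1):     # backward phase
        opt, z = solve(k2, lb)
        if not opt:
            break
        solved[k2] = z
        lb = z                         # optimum for k2 bounds the one for k2 - 1
    return solved

true_opt = {2: 3.0, 3: 1.0, 4: 1.0, 5: 0.0, 6: 0.0, 7: 0.0}
def fake_solve(k, lb):                 # stands in for an ILP run
    return True, true_opt[k]

print(sweep(8, fake_solve))  # {2: 3.0, 3: 1.0, 4: 1.0, 5: 0.0}
```

The sweep stops at k = 5 here because a zero-imbalance optimum is also optimal for all larger k.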
The results show that F1 outperforms the other formulations with respect to the number of optimal solutions achieved. Whenever a formulation obtained an optimal solution with relaxed imbalance 0 for a given value of k, we assume that all optima were found for the executions with larger values of k (e.g., F1 and F2 solved the instances derived from MT2 to optimality). While F1 found 38, 64, 27 and 15 optimal solutions for the groups House, Monastery, McKinney and NewComb, respectively, F2 found 29, 64, 27 and 15, and the formulation by Figueiredo and Moura [26] obtained 18, 48, 13 and 10 using XPRESS, and 7, 44, 15 and 12 using CPLEX. Overall, a total of 40 new optimal solutions were found (counting only once the optimal solutions with relaxed imbalance 0 for each digraph), and all instances of 6 of the 9 groups were solved to optimality.
In addition, F1 is generally faster, but it is outperformed by F2 on instances HAS, HBS and HCS for larger values of k. Note that for the latter two, F2 solves some instances at the root node. The formulation by Figueiredo and Moura [26] is clearly slower, even taking into account the hardware difference for the results obtained with XPRESS. In general, this formulation explores more branch-and-bound nodes, although in some cases F2 is the one that explores the most.
The results also demonstrate that RCC exhibits the expected behavior for k-partition problems, where instances with k close to 2 or to n − 1 are easier than those with k close to n/2. This can be explained by the number of possible partitions, which is given by the Stirling number of the second kind [8], as described in Eq. (22).

Finally, we report in Table 9 a comparison between the optimal solutions for RCC and CC. In addition to comparing I(P) with RI(P) and the correction δ = I(P) − RI(P) obtained with RCC (recall that RCC was proposed in order to correct wrong penalties in CC), we also decompose the total imbalance into minimum, average and maximum penalties for the intracluster and intercluster cases. With the exception of one instance (MT4 with k = 3), a positive correction is obtained by RCC. It is worth mentioning that most of the imbalance (and hence the correction) occurs in the intercluster cases.
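For reference, the Stirling number of the second kind counts the partitions of n labeled items into exactly k non-empty clusters and can be computed with the standard recurrence. The snippet below is only illustrative (it does not reproduce Eq. (22) of the paper), but it shows how intermediate values of k dominate the extremes.

```python
# Stirling number of the second kind: S(n, k) counts the partitions
# of n labeled items into exactly k non-empty clusters, via the
# recurrence S(n, k) = S(n-1, k-1) + k * S(n-1, k).

from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    if k == 0:
        return 1 if n == 0 else 0
    if k > n:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

# Intermediate values of k dominate the extremes, matching the
# observed difficulty profile: for n = 20, S(20, 2) = 524287 and
# S(20, 19) = 190, while S(20, 10) exceeds 10**12.
```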

ILS implementations
We used the same values adopted in Subramanian and Farias [45] for the main parameters of ILS_RCC, that is, I_R = 20 and I_ILS = min{100, 4 × n}. The only difference is that we imposed a minimum value of 100 for the latter, as in Silva et al. [43]. Moreover, we set maxPert = 6 after conducting some experiments (see Sect. 6.3.1). The algorithms were executed 10 times on each instance in all experiments. Hereafter, the percentage gap of a solution P is computed as gap = 100 × (f(P) − f(P_best))/f(P), where f is the objective function (i.e., f(P) = RI(P) for RCC and f(P) = SRI(P) for SRCC) and P_best is the best solution among all those found by ILS_RCC and the ILS of Levorato et al. [36].
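In code, the gap formula just given amounts to the following small helper (an illustrative sketch, assuming f(P) > 0; the function name is ours):

```python
# Percentage gap as defined above: gap = 100 * (f(P) - f(P_best)) / f(P).
# Assumes f(P) > 0. f_p stands for f(P), f_best for f(P_best).

def percentage_gap(f_p, f_best):
    return 100.0 * (f_p - f_best) / f_p
```

For instance, a solution of value 105 against a best known value of 100 gives a gap of roughly 4.76%, and a gap of 0% means the best known solution was matched.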

Impact of the different components of the algorithm
We evaluate the impact on ILS_RCC of: (i) using the greedy constructive algorithm by Levorato et al. [36]; (ii) the parameter maxPert; (iii) the neighborhood structures; and (iv) the perturbation mechanisms. To this end, we conducted experiments involving all 100-vertex random instances.
First, we assess the impact of replacing the completely random construction (line 5 in Algorithm 1) with the greedy construction of Levorato et al. [36]. Table 10 shows the average gaps and CPU times obtained by the two resulting versions of ILS_RCC for different values of k. Neither version is significantly superior to the other with respect to both criteria. The only exception occurs when k = 9, where the completely random construction produces a better average gap and CPU time. In general, using the greedy construction in ILS_RCC leads to an improvement of only 0.01% in the average gap, with an increase of around 3% in the average CPU time. It is therefore reasonable to conclude that the effort of implementing a more sophisticated constructive procedure may not be worthwhile for the RCC when the algorithm contains an effective local search.
The impact of varying the parameter maxPert is illustrated in Fig. 7, where values between 4 and 8 are considered. The results show that the values 5 and 7 are dominated by the remaining ones; the value 4 yields the fastest execution but a poor average gap, whereas the value 8 achieves the best average gap but the worst CPU time. We therefore chose maxPert = 6, as it offers a good balance between solution quality and CPU time.
We assess the impact of the neighborhood operators by considering 7 different configurations. In this case, we set I_R = 20 and I_ILS = 1 (i.e., only one iteration of the RVND procedure is executed) and measure the percentage improvement over the initial solution.
The average results are shown in Fig. 8. The configurations that yield the most promising results are 5 and 7. The difference between these two settings is that the latter includes the neighborhood Swap, which led to a slight improvement despite the additional CPU time.

Table 9 Comparison of optimal solutions for RCC and CC in small instances

In order to measure the impact of the perturbation mechanisms, we ran ILS_RCC using the default parameter values and stored, for each instance, the percentage improvement over the best solution found in the previous experiment. To choose a suitable configuration, we performed experiments considering two scenarios: with and without the neighborhood Swap. The average results are depicted in Fig. 9. The best results are obtained by settings 6 and 7 in both scenarios. The scenario that includes Swap achieves better improvements at the expense of CPU time. Although not reported in Fig. 9, the settings tested in the second scenario (i.e., the one including Swap) systematically found more best solutions than their counterparts in the first scenario. We therefore selected configuration 7 of the second scenario, as it appears to offer a good compromise between solution quality and CPU time.

Comparison with the literature
The implementation by Levorato et al. [36] was originally devised for the symmetric version of the problem. We therefore slightly modified the source code, which was provided by the authors, to cope with the asymmetric case. We refer to this method as ILS_adapt.
Concerning the small-size instances considered in Sect. 6.1.1, both ILS_RCC and ILS_adapt consistently find the optimal solutions in a fraction of a second. Table 11 shows the aggregate results obtained by the two ILS implementations on the set of random instances. For each digraph, we report the minimum, average and maximum values of the average percentage gaps (there is one average gap for each value of k ∈ {3, 5, 7, 9}), as well as the average CPU time in seconds. Detailed results are reported in Appendix A. Moreover, for an appropriate comparison, we imposed the average CPU time obtained by ILS_RCC on each instance as the stopping criterion for ILS_adapt.
The results show that, on average, ILS_RCC clearly outperforms ILS_adapt. When evaluating the performance on each individual instance, ILS_RCC found the best solution (one with a gap of 0%) for 179 instances (93.2% of the cases), and among them 166 (86.5% of the cases) are strictly better than the best solutions achieved by ILS_adapt (improved values are highlighted in bold). Furthermore, the average runtime increases with the value of d−. Figure 10 illustrates how the gap between the average solution and the best known solution varies with d, d− and k. The instances appear to become easier as d increases, as depicted in Fig. 10a. Moreover, Fig. 10b shows that instances with a smaller value of d− appear to be harder. Finally, larger values of k seem to increase the difficulty of the instances, as clearly shown in Fig. 10c.
In Appendix A, we also report many improved upper bounds w.r.t. those obtained in the experiment reported in Table 11. These improved solutions were found while experimenting with different settings of the algorithm, as well as during the preliminary experiments described in Sect. 6.3.1.

Impact of the ADSs on the runtime performance
This section examines the average runtime performance of ILS_RCC when incorporating the ADSs for efficiently computing the relaxed imbalance of a neighbor solution during the local search. Figure 11 depicts, in log scale, the CPU time of the versions of the algorithm using EBI and NBI, respectively. Figure 11a presents the comparison for the small-size Monastery instances. Despite the considerable runtime difference, using NBI, i.e., the version that does not rely on ADSs for move evaluation, remains feasible in practice, as the average CPU time spent by the method is still acceptable. For the 100-vertex instances, however, the difference is striking and clearly illustrates the benefits of incorporating the ADSs proposed in this work. Note that the disparity is likely to become even more pronounced for larger instances. Table 12 summarizes the results obtained for each set of benchmark instances. In this case, because the original algorithm from the literature is used, we refer to it as "ILS Levorato et al. [36]". Detailed results are provided in Appendix B. We report the number of
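To give a flavor of how auxiliary data structures enable cheap move evaluations, the sketch below maintains the cluster-pair weight totals Ω+(Sp, Sq) and Ω−(Sp, Sq) incrementally as vertices move, so that the imbalance of a neighbor solution can be recomputed from a few aggregates instead of rescanning all arcs. The concrete ADSs and update rules of the paper are more elaborate; all names here are illustrative.

```python
# Minimal sketch of the ADS idea: keep Omega+(Sp, Sq) and
# Omega-(Sp, Sq) up to date under vertex relocations, touching only
# the arcs incident to the moved vertex.

from collections import defaultdict

class ClusterAggregates:
    def __init__(self, arcs, assign):
        # arcs: list of (i, j, weight, sign) with sign in {+1, -1};
        # assign: vertex -> cluster index. Loops (i == j) are assumed
        # absent, as in a simple signed digraph.
        self.assign = list(assign)
        self.arcs_of = defaultdict(list)  # vertex -> incident arcs
        # omega[sign][(p, q)] = total weight of arcs of that sign
        # going from cluster p to cluster q
        self.omega = {+1: defaultdict(float), -1: defaultdict(float)}
        for arc in arcs:
            i, j, w, s = arc
            self.arcs_of[i].append(arc)
            self.arcs_of[j].append(arc)
            self.omega[s][(assign[i], assign[j])] += w

    def move(self, v, target):
        # Relocate vertex v to cluster `target`, updating only the
        # aggregates of arcs incident to v instead of rescanning A.
        for (i, j, w, s) in self.arcs_of[v]:
            old = (self.assign[i], self.assign[j])
            new = (target if i == v else old[0],
                   target if j == v else old[1])
            self.omega[s][old] -= w
            self.omega[s][new] += w
        self.assign[v] = target
```

After each `move`, any imbalance measure expressed in terms of the Ω aggregates can be re-evaluated without iterating over the arc set, which is what makes neighborhood exploration scale to large instances.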

Concluding remarks
This paper proposes exact and metaheuristic approaches for the relaxed correlation clustering (RCC) problem. In particular, we developed two integer linear programming formulations that achieved a superior performance when compared to the existing one, as well as an enhanced iterated local search (ILS) algorithm that substantially outperformed the previous ILS implementation from the literature. One key factor of our ILS is the efficient move evaluation scheme, which was crucial for improving the scalability of the method. Moreover, we also put forward a novel perturbation mechanism for the problem that helped the algorithm find high quality solutions. The performance of the ILS was also assessed on benchmark instances of the symmetric version of RCC (SRCC), and the results achieved were always at least as good as the best known ones. Future work includes the development of efficient parallel algorithms for tackling very large instances that may arise in real-life social networks. In addition, since the current integer linear programming formulations are still limited to small-size instances, there is room for developing enhanced exact algorithms, perhaps in the spirit of Brusco et al. [9], in an attempt to solve larger instances.