Dynamics of task-related electrophysiological networks: a benchmarking study

Motor, sensory and cognitive functions rely on the dynamic reshaping of functional brain networks. Tracking these rapid changes is crucial to understanding information processing in the brain, but challenging: the choice of analysis method is often arbitrary and evaluation studies remain scarce. Using magnetoencephalography (MEG) combined with source separation (SS) methods, we present an integrated framework to track the fast dynamics of electrophysiological brain networks. We evaluate nine SS methods applied to three independent MEG databases (N=95) during motor and memory tasks. We report differences between these methods at the group and subject levels. We show that the independent component analysis (ICA)-based methods, especially those exploiting higher-order statistics, are the most efficient in terms of spatiotemporal accuracy and subject-level analysis. We aim to help researchers choose objectively the appropriate methodology when tracking the fast reconfiguration of functional brain networks, given its considerable benefits for cognitive and clinical neuroscience.


Introduction
Evolving evidence shows that motor, sensory, emotional and cognitive functions emerge from dynamic interactions between cortical and subcortical brain structures. Specific rhythms of neural networks allow synchronization and long-range communication between distant and distributed brain areas. This phenomenon has been shown to be crucial during visual (Bola and Sabel, 2015; Mheich et al., 2018), auditory (Fontolan et al., 2014), sensorimotor (Pomper et al., 2015; Wilkins and Yao, 2020) and cognitive (Negrón-Oyarzo et al., 2018; Rouhinen et al., 2020) tasks. This brain communication is highly transient, and functional brain networks reorganize dynamically during behavioral tasks, even at the sub-second time scale (Vidaurre et al., 2018b).
Therefore, the analysis of whole-brain dynamic functional connectivity (dFC) has become a burgeoning field of research in cognitive neuroscience (Bassett and Sporns, 2017; Bullmore and Sporns, 2009; Iraji et al., 2020; Kabbara et al., 2020). In this regard, magneto/electroencephalography (MEG/EEG) provides unique, direct and noninvasive access to the electrophysiological activity of the whole brain at the millisecond scale. Benefiting from this excellent time resolution, current methods allow estimating sub-second time-varying functional brain networks in the cortical space from sensor-level signals (Hassan et al., 2014; Hassan and Wendling, 2018). The key challenge is how to characterize and quantify these rapidly changing networks.
Several frameworks have been used, and one can distinguish between approaches that analyze the time-varying signal directly, such as data-driven techniques, and those that explicitly model dynamics over time, such as the Hidden Markov Model (HMM) (Baker et al., 2014; Vidaurre et al., 2016, 2018a, 2018b), the autoregressive model (AR) (Casorso et al., 2019) and the General Linear Model (GLM) (Friston, 1994). The data-driven methods, where 'brain network states' are derived directly from the data without a priori hypotheses on the fitting model, have shown promising results, although the selection of the algorithm is largely empirical. In these methods, a sliding-window approach that forms a series of temporal networks is usually used, followed by dimensionality reduction or clustering. This includes Kmeans (Allen et al., 2014; Ciric et al., 2017; Du et al., 2016; Fong et al., 2019; Liu and Duyn, 2013; Mheich et al., 2015; O'Neill et al., 2015) and component analysis such as temporal Independent Component Analysis (tICA) (O'Neill et al., 2017), Principal Component Analysis (PCA) (Leonardi, n.d.) and Nonnegative Matrix Factorization (NMF) (Chai et al., 2017). Although the conceptual differences between these methods (and within each family of methods, such as the different ICA algorithms) are theoretically obvious, as they rely on different assumptions, studies investigating these differences remain very few. The existing comparative studies are mainly limited to confirming differences between two conditions (Leonardi, n.d.) or to showing that the obtained results are unaffected by the choice of method (Miller et al., 2016).
However, a thorough quantitative and qualitative comparative study using both simulation-based and data-driven approaches is still missing, and there is no clear consensus about the 'best' (if any) source separation or clustering method for adequately tracking dFC, which is the main objective of our study.
Here, we evaluate the performance of nine dimensionality reduction methods used to track functional connectivity states at both the group and individual levels. This was done using simulations and three independent MEG datasets (N=95) recorded during motor and working memory tasks (see Figure 1). The dynamic brain networks were reconstructed using the MEG source connectivity method combined with a sliding-window technique. The dimensionality reduction algorithms were compared in terms of their temporal and spatial accuracy. These methods include PCA, NMF, Kmeans and six variants of ICA (Joint Approximate Diagonalization of Eigen-matrices (JADE), InfoMax, Second-Order Blind Identification (SOBI), the fixed-point algorithm (FastICA), CoM2 and Penalized Semi-Algebraic Unitary Deflation (P-SAUD)). The motivation behind using several ICA subtypes is that each one has its own definition of statistical independence, and several studies have shown conceptual differences between them (Kachenoura et al., 2008; Sahonero-Alvarez and Calderon, 2017). We also analyzed the optimal number of subjects needed for each method to reveal significant results. Overall, our results show that the ICA methods, essentially those based on higher-order statistics, have the highest spatiotemporal accuracy at the group and subject levels.
This study aims at providing a well-established framework for researchers interested in studying the reconfiguration of functional brain networks during cognitive processes. Figure 1. Illustration of the investigation structure for each of the three task-related paradigms. A. The fundamental processing pipeline applied to each subject's data, from sensor level (using the non-invasive MEG technique) to cortical level (using beamforming as the inverse problem solution) to dynamic functional connectivity computation (S-dFC) (using the sliding-window approach). B. Concatenation of the S-dFC of all subjects along the time axis to form a group data matrix referred to as G-dFC. C. Comparative analysis between nine different source separation (SS) methods (six variants of ICA, PCA, NMF and Kmeans) applied on both group-level (Xgroup) and subject-level (Xsubj) data in order to derive k dominant task-related spatiotemporal components (the mixing matrices represent brain spatial maps while the extracted sources represent the corresponding temporal weight fluctuations).

Dataset1: 'Self-paced button press task'
This dataset includes 15 healthy right-handed participants (9 male and 6 female, aged 25 ± 4 years (mean ± SD)). They were asked to press a button with the index finger of their non-dominant hand once every 30 seconds, without counting the time between presses. More details about this dataset can be found in (Kabbara et al., 2019; O'Neill et al., 2017).

Dataset2: 'HCP left hand movement Task'
61 healthy participants (28 male and 33 female, aged 22-35) completed the MEG motor task provided by the Human Connectome Project (HCP) (MEG-1 release) (Van Essen et al., 2012). The corresponding experimental protocol was adapted from Buckner and colleagues (Thomas Yeo et al., 2011). It was performed in two sessions of 14min each, with a small break between them. Each session consisted of 42 blocks in total, randomly distributed; 32 of them were partitioned into 16 hand movement blocks (8 right and 8 left) and 16 foot movement blocks (8 right and 8 left), while the remaining 10 blocks were interleaved resting/fixation blocks. Each motor effector block was preceded by a 3sec visual cue prompting participants to either tap their left or right index and thumb fingers or squeeze their left or right toes. Each block lasted 12sec and consisted of 10 sequential movements, each initiated with a 150ms pacing stimulus followed by 1050ms of black screen for task execution. Here, for simplicity, we were interested only in the trials related to left-hand movements. MEG data were recorded at Saint Louis University at a 508.6275Hz sampling frequency and co-registered with the available subject-specific MRI. EMG activity was also recorded from each limb.

Dataset3: 'Sternberg Working Memory Task'
19 healthy participants (10 male and 9 female, aged 25±3 years (mean ± SD)) performed the Sternberg task, in which two example visual stimuli, mainly abstract geometric shapes, were successively presented on a screen, each for 0.6sec and separated by 1sec. A maintenance period of 7sec then preceded the presentation of a third probe stimulus. Subjects were asked to press a button with their right index finger only if the probe stimulus matched either of the two example stimuli, and immediate feedback was given on response correctness. 30 trials were presented, separated by 30sec of rest. In both datasets 1 and 3, MEG data were recorded using a 275-channel CTF MEG system at a 600Hz sampling frequency and co-registered with subject-specific MRI. Both datasets were approved by the University of Nottingham Medical School Research Ethics Committee (O'Neill et al., 2017; Vidaurre et al., 2018a).

Preprocessing
Both datasets 1 and 3 were received already preprocessed as described in (O'Neill et al., 2017).
Briefly, bad segments produced by muscle activity, eye or head movements had already been visually inspected and removed. For dataset 2, we used the preprocessing pipeline offered by the HCP consortium, which includes removing bad channels, bad segments and bad independent components from the task data. Segments were retrieved from dataset 1 in the interval [-15; +15sec] relative to the button press onset, and from dataset 3 in the interval [-16; +28sec] relative to stimulus presentation. In the HCP analysis, we chose data epochs time-locked to EMG onset, as we were interested in exploring the brain networks involved during movement execution. Thus, trials were segmented in [-1.2; +1.2sec] relative to EMG onset. Then, as functional connectivity has been shown to be frequency-dependent (Baker et al., 2014; Hipp et al., 2012), each dataset was preprocessed in the frequency band most actively involved in the corresponding cognitive task. While the beta band [13-30Hz] was used for the self-paced motor task, the working memory data were filtered in a broader band [4-30Hz], as this task has been shown to involve multiple frequency bands (Brookes et al., 2012; O'Neill et al., 2017). Higher gamma frequencies were used for the HCP data, which involve tasks on a very rapid time scale. After these preprocessing steps, an average of 34, 150 and 29 trials per subject were kept from datasets 1, 2 and 3, respectively.
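The band-limited filtering step above can be sketched as follows. This is a minimal illustration using a zero-phase Butterworth filter from SciPy; the exact filter design used in the original preprocessing pipelines is not stated, so this particular implementation is an assumption:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(data, sfreq, low, high, order=4):
    """Zero-phase Butterworth band-pass filter applied along the last axis
    (channels x samples)."""
    nyq = sfreq / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="band")
    return filtfilt(b, a, data, axis=-1)

# Example: isolate the beta band [13-30 Hz] from a synthetic two-channel signal
sfreq = 600.0                                  # sampling rate of datasets 1 and 3
t = np.arange(0, 2, 1 / sfreq)
x = np.sin(2 * np.pi * 20 * t) + np.sin(2 * np.pi * 2 * t)  # 20 Hz rhythm + 2 Hz drift
x = np.vstack([x, x])
y = bandpass(x, sfreq, 13, 30)                 # the 2 Hz drift is suppressed
```

`filtfilt` applies the filter forward and backward, so the band-limited signal is not phase-shifted relative to the task events.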

Source Reconstruction and Functional Connectivity
In order to localize brain sources and reconstruct their activities, we used the Linearly Constrained Minimum Variance (LCMV) beamforming approach on the cortex parcellated with the AAL atlas (N=78 regions of interest, ROIs). Following this, we estimated functional connectivity by computing the amplitude envelope correlations (using the Hilbert transform) between all ROIs. Symmetric orthogonalization was also performed to remove signal leakage for datasets 1 and 3, as recommended by (Colclough et al., 2015).
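The envelope-correlation step can be sketched as follows. This is a simplified illustration of amplitude envelope correlation via the Hilbert transform; the symmetric orthogonalization used for leakage correction is omitted for brevity, and the toy random signals stand in for reconstructed ROI time courses:

```python
import numpy as np
from scipy.signal import hilbert

def aec(source_ts):
    """Amplitude envelope correlation between ROI time courses (ROIs x samples):
    take the Hilbert amplitude envelope of each row, then correlate envelopes."""
    env = np.abs(hilbert(source_ts, axis=-1))
    return np.corrcoef(env)

rng = np.random.default_rng(0)
ts = rng.standard_normal((4, 1000))              # 4 mock ROI time courses
ts[1] = ts[0] + 0.1 * rng.standard_normal(1000)  # make ROIs 0 and 1 co-fluctuate
C = aec(ts)                                      # 4 x 4 connectivity matrix
```

The co-fluctuating pair yields a high envelope correlation, while unrelated ROIs stay near zero.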

Dynamic Functional Connectivity Analysis (dFC)
To estimate the dynamic functional brain networks, we adopted the widely used sliding-window approach for all datasets. A time window of 6sec length with a 0.5sec step was used for datasets 1 and 3, as applied by O'Neill et al. (O'Neill et al., 2017). For dataset 2, we used a 0.6sec window with a 0.05sec step so that a similar number of windows was obtained across datasets. As a result, we obtained, for each subject trial, a 'dynamic functional connectivity (dFC)' matrix of dimension [NxNxT], where T=49, 36 and 75 for datasets 1, 2 and 3, respectively.
Next, due to symmetry, we unfolded this matrix into a 2-D [N(N−1)/2×T] matrix by removing the redundant connections in each time window. Then, the mean of each row of this matrix was subtracted from the data. Finally, the dFC matrices of all subjects' trials were concatenated along the temporal dimension. We refer to this matrix as the 'group dynamic functional connectivity matrix (G-dFC)', denoted 'X'.
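The sliding-window construction and the unfolding step can be sketched as follows (a toy illustration in which Pearson correlation stands in for the envelope-based connectivity measure; the dimensions are arbitrary):

```python
import numpy as np

def dynamic_fc(ts, sfreq, win_len, step):
    """Sliding-window connectivity: returns an N x N x T stack of
    connectivity matrices computed on successive windows."""
    n, s = ts.shape
    w, h = int(win_len * sfreq), int(step * sfreq)
    mats = [np.corrcoef(ts[:, i:i + w]) for i in range(0, s - w + 1, h)]
    return np.stack(mats, axis=-1)

def vectorize(dfc):
    """Unfold each N x N matrix into its N(N-1)/2 upper-triangular connections,
    then subtract the mean of each connection (row) across time."""
    n = dfc.shape[0]
    iu = np.triu_indices(n, k=1)
    X = dfc[iu[0], iu[1], :]              # [N(N-1)/2 x T]
    return X - X.mean(axis=1, keepdims=True)

rng = np.random.default_rng(1)
ts = rng.standard_normal((5, 600))        # 5 ROIs, 6 s at 100 Hz
dfc = dynamic_fc(ts, sfreq=100.0, win_len=1.0, step=0.5)
X = vectorize(dfc)                        # 10 connections x 11 windows
```

Concatenating such `X` matrices across trials and subjects along the column axis yields the G-dFC matrix used as input to all SS methods.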

Problem statement
The resultant G-dFC matrix, representing the time-varying features, can be expressed as a linear mixture of elementary brain networks that fluctuate dynamically over time. This is the central concern of the Source Separation (SS) approach, which aims at recovering 'k' hidden sources from a set of observations with minimal a priori knowledge about these sources. In this context, the SS problem can be formulated as follows:

X = A × S

Where:
• 'X' is the computed G-dFC matrix of dimension [qxm]:
o q = Nx(N−1)/2 with N=78, representing the connectivities between all ROIs;
o m = TxNtot, with T the number of time windows and Ntot the total number of trials for all subjects.
• 'A' is the mixing matrix of dimension [qxk] illustrating the contribution weights of each individual connection to the component sources, i.e. the spatial maps of the brain networks (k<min(q,m)).
• 'S' is the sources matrix of dimension [kxm] representing the temporal source signatures of the G-dFC, collapsed across all connections.
Although it is difficult to determine an optimal setting for the number of reduced components, 'k' was set to 10 in this study for all SS algorithms, consistent with several previous works. It was chosen as a trade-off value able to detect sufficient task-related sources without missing genuine components. The effect of k was evaluated on simulated data (see supplementary material). Our main objective is therefore to uniformly explore the ability of several SS methods to find the appropriate mixing and sources matrices of the G-dFC. Among existing SS algorithms, we chose nine popular, well-known methods: six different variants of temporal ICA (JADE, InfoMax, SOBI, FastICA, CoM2 and PSAUD), PCA, NMF and Kmeans as a state-of-the-art clustering method. They all transform the desired matrix factorization into spatial maps and time series, but differ primarily in the constraints imposed on the decomposed components. Below, we give a succinct description of these methods.

Independent Component Analysis: 'Temporal Statistical Independence'
ICA tends to linearly transform multivariate observations into a set of 'statistically mutually independent' latent variables, under the hypothesis that these variables are as 'non-Gaussian' as possible. In our study, we examine temporal ICA (tICA), adopted by several previous studies (O'Neill et al., 2017; Yaesoubi et al., 2015), in order to obtain network states whose temporal signatures are mutually independent. The ICA variants differ in the statistics they exploit to measure independence (for instance, SOBI relies on second-order statistics, whereas JADE and CoM2 rely on higher-order cumulants). In addition, FastICA and PSAUD use a deflation process for decomposition, while the other ICA variants jointly separate all sources.
All ICA algorithms start with preprocessing steps including centering, whitening, and dimensionality reduction. Then, each method performs the source separation following its own criteria and hypothesis as described in supplementary materials.
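As an illustration of the ICA decomposition applied to the mixture model X = A × S, here is a minimal sketch using scikit-learn's FastICA on synthetic non-Gaussian sources (the toy dimensions and the Laplace-distributed sources are assumptions made for this example):

```python
import numpy as np
from sklearn.decomposition import FastICA

# Mix three independent non-Gaussian temporal sources into six observed connections
rng = np.random.default_rng(2)
S_true = rng.laplace(size=(3, 2000))   # k x m temporal sources (non-Gaussian)
A_true = rng.standard_normal((6, 3))   # q x k mixing matrix (spatial maps)
X = A_true @ S_true                    # q x m observations, X = A S

ica = FastICA(n_components=3, random_state=0)
S_est = ica.fit_transform(X.T).T       # recovered sources, k x m
A_est = ica.mixing_                    # recovered mixing matrix, q x k
```

Up to the usual permutation and scaling ambiguities of ICA, each recovered source matches one of the simulated sources.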

Principal Component Analysis: 'Variance maximization'
PCA is a basic linear technique widely used for data dimensionality reduction. It transforms a set of observations of possibly correlated variables into a smaller number of orthogonal, hence linearly uncorrelated, variables called principal components or 'eigenvectors'. The transformation is defined in such a way that the explained variance ('eigenvalues') of the data is maximized. A fixed number 'k' of eigenvectors and their respective eigenvalues can then be chosen to obtain a consistent representation of the data. Here, we apply the Singular Value Decomposition (SVD) algorithm of PCA (Golub and Reinsch, n.d.) on our predefined input matrix 'X':

X = U × Λ × V'

where 'Λ' is a diagonal matrix of the same dimension as 'X' with non-negative diagonal elements in decreasing order of variance, 'U' is a unitary matrix containing the orthonormal eigenvectors, and 'V' contains the corresponding temporal coefficients. Thus, the mixing matrix 'A' is computed as the first 'k' columns of U × Λ, and 'S' as the first 'k' rows of 'V''.
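A minimal numerical sketch of this SVD-based decomposition (toy dimensions; 'A' takes the first k columns of U scaled by the singular values, and 'S' the first k rows of V'):

```python
import numpy as np

def pca_svd(X, k):
    """Rank-k PCA of a [q x m] matrix via SVD, so that X ~ A @ S."""
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    A = U[:, :k] * lam[:k]   # spatial maps weighted by singular values
    S = Vt[:k, :]            # temporal coefficients (orthonormal rows)
    return A, S

rng = np.random.default_rng(3)
X = rng.standard_normal((20, 5)) @ rng.standard_normal((5, 300))  # rank-5 data
A, S = pca_svd(X, k=5)       # k matches the true rank, so A @ S recovers X
```

Because the data here has exact rank 5, the k=5 reconstruction is lossless; on real G-dFC data the truncation discards the low-variance residual.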

Non-Negative Matrix Factorization: 'Positivity'
Nonnegative matrix factorization (NMF) is an unsupervised machine-learning technique (Lee and Seung, 1999) that imposes a 'non-negativity' constraint on the decomposed factors when solving the SS problem. When applied to the G-dFC data 'X', NMF leads to a parts-based representation that captures, at each time window, an additive combination of basis subgraphs 'A' with temporal coefficients 'S', eliminating negative signal variations. Among the several existing NMF approaches, we selected the Alternating Least Squares (ALS) algorithm, which has previously shown good performance in the fMRI context (Ding et al., 2013).
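A small sketch of the NMF factorization on synthetic non-negative data. Note that scikit-learn's NMF uses coordinate descent rather than the ALS variant employed in the study, so this is a stand-in illustration of the non-negativity constraint, not the exact algorithm:

```python
import numpy as np
from sklearn.decomposition import NMF

# Non-negative data built as an additive mixture of non-negative subgraphs
rng = np.random.default_rng(4)
A_true = rng.random((30, 4))    # q x k non-negative spatial subgraphs
S_true = rng.random((4, 200))   # k x m non-negative temporal weights
X = A_true @ S_true

model = NMF(n_components=4, init="nndsvda", max_iter=500, random_state=0)
A_est = model.fit_transform(X)  # q x k, non-negative
S_est = model.components_       # k x m, non-negative
```

Both factors are constrained to be non-negative, so each time window is explained as a purely additive combination of subgraphs.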

Kmeans Clustering: 'Sparsity'
Kmeans is one of the simplest unsupervised learning algorithms; it solves the SS problem through a clustering approach (Lloyd, 1982). The algorithm works iteratively to assign each point to only one of 'k' groups based on feature similarity. Mathematically, considering the G-dFC spatial connectivity as a set of m-dimensional vectors {C1, C2, ..., Cm}, Kmeans clustering aims to generate a partition into 'k' clusters A = {A1, A2, ..., Ak} that minimizes the sum of within-cluster distances over multiple iterations:

J = ∑i=1..k ∑Cj∈Ai d(Cj, μi)

where μi is the mean of the spatial connectivity vectors in Ai, and d represents the distance between two vectors. In our framework, the sparse coding adopted by Kmeans restricts a single time point to a unique activated network state. The computed clusters 'A' represent the structure of the connectivity patterns common across subjects. 'S' is then calculated from the cluster indices as the frequency of recurrence of each cluster at each time point. Here, we adopted the same Kmeans procedure as Allen et al. (Allen et al., 2014): the L1 (Manhattan) distance is used, as it has been suggested to be more effective than the L2 (Euclidean) distance for high-dimensional data (Aggarwal et al., 2001). The algorithm is replicated 100 times to increase the chances of escaping local minima, and centroid positions are randomly initialized.
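Since most off-the-shelf k-means implementations are Euclidean-only, a minimal Manhattan-distance variant can be sketched as below; the per-dimension median is used as the centroid update because it minimizes the L1 criterion, and the two-cloud toy data is an assumption made for the example:

```python
import numpy as np

def kmeans_l1(X, k, n_iter=100, seed=0):
    """K-means with Manhattan (L1) distance on the row vectors of X [m x q].
    Centroids are updated with the per-dimension median (the L1 minimizer)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(n_iter):
        d = np.abs(X[:, None, :] - centroids[None, :, :]).sum(axis=2)  # m x k L1 distances
        labels = d.argmin(axis=1)
        new = np.array([np.median(X[labels == i], axis=0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Two well-separated clouds of connectivity-like vectors
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.5, (50, 3)), rng.normal(5, 0.5, (50, 3))])
labels, cents = kmeans_l1(X, k=2)
```

Each time point receives exactly one cluster label, which is the sparse-coding behaviour described above: a single activated network state per window.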

MEG group-level analysis
For each dataset, the SS method generated k=10 components. Among these 10 components, identifying those that reflect genuine task-related brain activity is critical. In this paper, we followed a testing procedure adopted by (O'Neill et al., 2017) and previously described in (Hunt et al., 2012; Winkler et al., 2014) to determine the significant components modulated by the tasks. The testing relies on the construction of an empirical null distribution based on a 'sign flipping' permutation approach. In this context, we randomly selected half of the subjects and flipped the sign of their contribution to the extracted components. To cover all possible realizations, a different set of subjects was selected for sign flipping each time, yielding 6435 combinations for the self-paced data and 92378 for the Sternberg data. Regarding the HCP motor data, the same combination procedure could not be conducted directly on the total number of subjects for computational reasons. Therefore, we split the 61 subjects into groups of 20, 20 and 21 subjects and performed three combination procedures separately, two consisting of 184756 repetitions and one of 352716 repetitions. In this way, the null distribution is built, and an independent component was considered significant if, at any time point, the corresponding time signal, averaged over trials, fell outside a threshold defined at 0.05 with corrections. For all datasets, a 2-tailed distribution was allowed, and Bonferroni corrections were applied for multiple comparisons across the 10 components and across the temporal degrees of freedom. It is assumed that a single temporal degree of freedom is added each time the window shifts by more than half of its width. This meant a total of 8 temporal degrees of freedom for the self-paced data, 12 for the Sternberg data and 6 for the HCP motor data.
Thresholds were therefore set at (0.05/(2*10*8)= 0.0003) for the self-paced experiment, (0.05/(2*10*12)= 0.0002) for Sternberg experiment and (0.05/(2*10*6)= 0.0004) for HCP Motor data.
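The sign-flipping procedure can be sketched as follows for a single component, using toy per-subject time courses (the simplified thresholding here uses the 95th percentile of the null maxima rather than the exact Bonferroni-corrected scheme described above):

```python
import numpy as np
from itertools import combinations

def sign_flip_null(subj_ts, alpha=0.05):
    """Empirical sign-flipping null for a group-averaged component time course.
    subj_ts: [n_subjects x T]. For every half-subset of subjects, flip their
    sign, record the maximum |group mean| over time, and return the observed
    group mean with the (1 - alpha) quantile of these null maxima as threshold."""
    n, T = subj_ts.shape
    null_max = []
    for flip in combinations(range(n), n // 2):
        signs = np.ones(n)
        signs[list(flip)] = -1.0
        null_max.append(np.abs((signs[:, None] * subj_ts).mean(axis=0)).max())
    return subj_ts.mean(axis=0), np.quantile(null_max, 1 - alpha)

rng = np.random.default_rng(6)
ts = 0.1 * rng.standard_normal((8, 40))   # 8 subjects x 40 time windows of noise
ts[:, 20] += 2.0                          # strong task-locked deflection at t=20
mean_ts, thr = sign_flip_null(ts)
significant = np.abs(mean_ts) > thr       # the deflection at t=20 survives
```

Flipping the sign of half the subjects cancels any consistent task-locked effect in the null realizations, so only deflections shared across subjects exceed the threshold.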

MEG subject-level analysis
Besides the group-level analysis, it is crucial to test the performance of each method when applied directly to individual dFC. To this end, instead of concatenating trials from all subjects as in the final step of the 'G-dFC' computation, we performed, for each subject, a dFC concatenation of all trials related only to this subject, to form a subject-specific dFC, denoted 'S-dFC'. Then, all selected SS methods were applied on the 'S-dFC' matrix to extract subject-specific spatial and temporal signatures (k=10). In order to quantitatively evaluate and compare the methods' strength at the subject level, we measured, for each method, both the spatial and temporal similarities between each extracted S-dFC component and the significant G-dFC components. These parameters are: (1) Average Distance (AD) for 'spatial similarity':

AD = 1/2 [ (1/N1) ∑n1 d(n1, n2*) + (1/N2) ∑n2 d(n2, n1*) ]

where d(n1, n2*) is the Euclidean distance between node n1 of the S-dFC network and the nearest node n2* of the significant G-dFC network (and symmetrically for d(n2, n1*)), N1 is the total number of nodes in the S-dFC network, and N2 denotes the total number of nodes in the G-dFC network. All networks were thresholded at 70%. Lower values of AD indicate stronger spatial similarity between the S-dFC and G-dFC networks.
(2) Correlation of Signals (CS) for 'temporal similarity':

CS = corr(sS, sG)

where sS is the temporal signal of each S-dFC component (length T1), sG is the temporal signal of the significant G-dFC component (length T2), and corr denotes the Pearson correlation. Higher values of CS reveal stronger temporal similarity between the S-dFC and G-dFC signals.
We performed this analysis on each of the 15 subjects of MEG dataset 1 (motor task). For each method, we counted the number of subjects showing satisfactory performance in the S-dFC context, based on the previously explained measures. Then, to approximate the number of subjects/trials needed for each SS method to give significant results, we followed the same procedure explained above but, instead of single-subject S-dFC computation, we progressively increased the number of concatenated subjects in the dFC computation from Nsubj=2 to 14. In order to have generalized and reliable results, we considered all possible combinations in which different sets of Nsubj subjects were selected among the 15 available subjects.
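The two similarity measures can be sketched as follows, using the definitions above (the symmetric nearest-node averaging in AD is our reading of the formula; the node coordinates and signals are toy values):

```python
import numpy as np

def average_distance(coords_s, coords_g):
    """AD: for each node of one network, take the Euclidean distance to the
    nearest node of the other network; average within each network and
    symmetrize. Lower AD means stronger spatial similarity."""
    d = np.linalg.norm(coords_s[:, None, :] - coords_g[None, :, :], axis=2)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def correlation_signals(s_subj, s_group):
    """CS: Pearson correlation between two component time courses."""
    return np.corrcoef(s_subj, s_group)[0, 1]

# Identical node sets give AD = 0; identical time courses give CS = 1
nodes = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
sig = np.sin(np.linspace(0, 6, 50))
ad = average_distance(nodes, nodes)
cs = correlation_signals(sig, sig)
```

In practice, `nodes` would hold the 3-D coordinates of the thresholded network's ROIs, and the two signals would be the subject- and group-level temporal signatures.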

Data availability
Data supporting the findings of this study are available in the link ( https://github.com/judytabbal/dynbrainSS.git). All HCP data are available on https://www.humanconnectome.org/software/hcp-meg-pipelines. The datasets 1 and 3 are available upon request.

Code availability
Codes supporting the findings of this study are available in the link

Results
To track the fast reconfiguration of functional brain networks during cognition, we compared the performance of nine source separation (SS) methods: six variants of ICA (JADE, InfoMax, SOBI, FastICA, CoM2 and PSAUD), PCA, NMF and Kmeans. In the following, we present our evaluation study on real MEG data; our methodology was also tested on simulated data.
These results can be found in the supplementary material. Briefly, the simulation-based analysis showed that all methods provide satisfactory results in terms of spatial and temporal similarity between reconstructed and simulated components with the best performance for NMF method and the worst for SOBI. All methods, except for FastICA and Kmeans, provided consistent results.
CoM2 and PCA were the fastest. Results demonstrated that FastICA, Kmeans and PSAUD depend on the imposed number of components, while FastICA was also affected by the noise level. Readers can refer to the supplementary material for the detailed quantitative analysis on simulated data.
The SS methods were compared on three MEG datasets consisting of two motor tasks, i) self-paced (N=15) and ii) HCP left-hand movement (N=61), and iii) a working memory task (N=19). Our goal was to extract the brain networks, with their corresponding time courses, that are mainly involved during each task. To achieve this goal, the subjects' data were preprocessed and filtered, and the corresponding brain source activity was reconstructed using beamforming with reduction of spatial leakage in order to reduce the effects of volume conduction. Then, dynamic functional connectivity matrices were computed on the time-windowed brain source signals using amplitude envelope correlation.

Self-paced button press task
In this task, participants were asked to press a button with the index finger of their non-dominant hand every 30 seconds. Based on the results obtained by all SS methods, we divided all derived significant components into 6 networks, as denoted in Figure 2. In summary, all methods were able to extract at least one significant motor network at button press time, while SOBI and PCA selected extra components, not directly related to the studied task, as significant.

Left-hand movement task
This task is also a motor task, but different from the previous one: here, participants were asked to rapidly and successively tap their left index and thumb fingers. Six network patterns were observed over all the methods, as denoted in Figure 3. A right motor ('MotR') network was extracted by the CoM2, PCA and NMF methods. This network follows the same temporal variation as the previous one. Interestingly, it emphasizes the activation of the right motor cortex, which is consistent with the nature of the task (left-hand movements). SOBI, CoM2, PCA and NMF also extracted a 'MotL' network. This network shows the opposite temporal variation to the previous ones (significantly increased connectivity at +0.2sec for SOBI and PCA, and decreased connectivity at -0.2sec for NMF). Two networks, 'CingPar' and 'Front', were commonly derived: by SOBI, PCA and Kmeans for the former, and by InfoMax, SOBI, PSAUD and NMF for the latter.
Both networks show decreased modulation around -0.4sec then an increase around 0.2sec.
The 'other' networks were found in SOBI (inferior frontal to temporal connections decorrelated with medial frontal and cingulate connections), PSAUD (medial frontal with temporal, motor and cingulate connections) and NMF (mid and posterior cingulate with parietal and motor connections), as illustrated in Figure 3. Regarding temporal evolution, the hand movements here are much more frequent than in the previous task. The fast neural activity due to the short time between successive button presses is expressed as an oscillatory behaviour of brain network activity around the zero-time button press, as shown previously (Vidaurre et al., 2018a). This yields the obtained temporal variation, where the motor network state shows high connectivity before the button press and then drops in connectivity, reaching its significant peak ~0.2sec after (referred to as a desynchronization in high frequencies (Vidaurre et al., 2018a)). The same oscillatory time variation is followed by the other networks, with similar or opposite modulation signs.
In summary, the results show that all methods were able to extract a task-related component ('Mot'), but some methods (JADE, FastICA and CoM2) were more specific (with fewer false positives) in extracting the motor networks than the others.

An interactive version of the ICA-JADE results can be found on our GitHub (https://github.com/judytabbal/dynbrainSS.git) using rotatable MATLAB figures.
Reproducing these results is also possible/available using the MATLAB interface on github.

Working memory task
This task is much more complex compared to the other two. Subjects were asked to visualize and memorize two visual shapes and to respond to a third probe stimulus with a button press (right index finger) in case of matching. The increased cognitive load evoked by the Sternberg task is expected to induce variations in a greater number of significant brain networks, including stimulus visualization (visual network), semantic processing and pattern recognition (semantic and language networks) and the button press response (sensorimotor network). Eight networks were obtained from the different methods, including Visual ('Vis'), Semantic ('Sem'), Sensorimotor ('Sens'), Language ('Lang') and Cingulate ('Cing') networks. The temporal evolution for the other SS methods can be found in Supplementary Figure S7. Starting from t=0sec, two visual stimuli (shapes) were presented successively, each for 0.6sec. During this period, all methods were able to extract one or more significant 'Vis' networks. Some auxiliary connections to parietal or temporal regions can be seen in SOBI, FastICA, PSAUD, NMF and Kmeans. The time variation of this network shows a significant peak during the first two seconds.
At the same time, we can notice a decreased modulation in 'Cing' network connectivity in JADE, SOBI and PCA.
Following stimulus presentation, subjects had to retain the observed shapes in working memory. The complete results for all methods can be found in Supplementary Figures S4, S5 and S6. For further clarity of visualization and interpretation, we illustrate in Figure 5 the spatiotemporal reconfiguration of the functional brain networks as obtained by ICA-JADE. Furthermore, we discriminated the performance of the different SS methods in terms of the proportion of activated brain networks in each task. For example, in the motor tasks, we compared the proportion (referred to as 'weight') of all extracted motor networks (Mot, MotR, MotVis, MotO) with that of auxiliary networks (those not involving motor regions). For the working memory task, the visual, semantic, language and sensorimotor networks were evaluated against the occupancies of all remaining significant components for each method. Results are shown in Figure 6 with the spatial representation of the corresponding relevant brain regions. Figure 6. SS methods performance evaluation for real MEG tasks. For each task, the brain regions involved in each relevant network are illustrated on the left side, while the brain network weights are shown on the right. The weights express the percentage occupancy of a given brain network relative to all significant extracted components. For the motor tasks, the motor network ('Mot') is contrasted with the auxiliary networks relative to all significant components, while for the working memory task the visual ('Vis'), semantic ('Sem'), language ('Lang') and sensorimotor ('Sens') networks are highlighted in contrast to the other auxiliary networks. Referring to these representations, the capabilities of the different SS methods in extracting task-related components can be evaluated.

Performance of each SS method at subject-level
Here our objective is to evaluate the performance of the methods at the subject level. We test i) the capacity of each method to extract significant task-related components: to do so, we computed the correlation between the components obtained by each method on each subject and the significant network obtained at the group level; and ii) the number of subjects needed for each method to detect the 'expected' networks: here we tested the overall performance of each method while increasing the number of subjects, from 1 to 15, as the subject-level analysis was performed on the self-paced data. Figure 7.A summarizes the subject-level analysis scenario. For each method, we chose one of the significant motor components derived from the group-level decomposition (N=15 subjects), mostly the one showing little intervention from regions other than sensorimotor ('Mot' or 'MotR') and having high temporal coefficient amplitudes (assumed to be the best for each method). This component illustrates a 'group' motor network with temporal modulation at the button press time, and eventually serves as a 'mask' component for the subject-level analysis.
For each SS method, NC=10 components were derived from each subject's data. Then, the Average Distance (AD) and Correlation of Signals (CS) between each of these 10 components and the 'group' motor component of the corresponding SS method were computed, in order to quantify the ability of the method to extract, from a single subject, a task-related component in space (motor network) and in time (temporal modulation at button press time), respectively. Only one of these 10 components is then selected, based on two criteria on the AD and CS values. If the AD of a component is below a threshold (set as the average of the AD values of all components over all subjects and SS methods) and its CS is higher than 0.7, the component is considered a motor component. Thus, if at least one of the 10 components passes these conditions, the corresponding AD and CS values are recorded and the number of subjects giving results similar to the group level is incremented by one. Otherwise, we selected the component nearest to the group-level result, as a compromise between spatial and temporal similarity.
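The two-criteria selection described above can be sketched as follows; the function name, array shapes and the tie-breaking rule for non-passing components are illustrative assumptions, not the exact published implementation:

```python
import numpy as np

def select_motor_component(subj_maps, subj_tcs, group_map, group_tc,
                           ad_thresh, cs_thresh=0.7):
    """Pick, among a subject's NC components, the one matching the 'group'
    motor component in space (low Average Distance) and time (high signal
    correlation).

    subj_maps : (NC, n_edges) spatial maps of the subject components
    subj_tcs  : (NC, n_time)  temporal coefficients of the components
    group_map : (n_edges,)    group-level 'mask' motor map
    group_tc  : (n_time,)     group-level temporal course
    Returns (index, is_motor): the selected component and whether it passed
    both criteria (i.e. the subject counts as 'similar to group level').
    """
    ad = np.array([np.mean(np.abs(m - group_map)) for m in subj_maps])
    cs = np.array([np.corrcoef(tc, group_tc)[0, 1] for tc in subj_tcs])

    passing = np.where((ad < ad_thresh) & (cs > cs_thresh))[0]
    if passing.size > 0:
        # among passing components, keep the temporally best-matching one
        best = passing[np.argmax(cs[passing])]
        return int(best), True
    # otherwise: compromise between spatial and temporal similarity
    score = cs - ad / ad.max()
    return int(np.argmax(score)), False
```

The group-subject similarity percentage then follows by counting, over all subjects, how many selections return `is_motor=True`.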
A typical example is illustrated in Figure 7. As a result, two parameters were collected and are represented in Figure 7.B and 7.C, respectively.
The group-subject similarity percentage was calculated as the number of subjects giving a motor component similar to the group-level result, relative to the total number of subjects (N=15). Figure 7.B illustrates this parameter for all SS methods. We can see that JADE was able to extract a task-related component for the largest number of subjects. Overall, ICA methods, and especially those based on high-order statistics (such as JADE), outperform other methods in extracting networks at the subject-level.

Discussion
Emerging evidence shows that discrete and repetitive brain states govern the spatio-temporal dynamics of functional networks. The identification of these connectivity patterns can be performed using different SS methods, each imposing its own constraints and thus yielding different results. Therefore, recent studies have been conducted to evaluate and compare the performance of SS methods in describing the brain states fluctuating over time (Leonardi, n.d.; Meyer-Baese et al., 2004; Miller et al., 2016; Xie et al., 2017). However, these studies suffer from at least one of the following limitations: (1) applying a comparison based on a stationarity framework, (2) considering a limited number of compared SS methods, (3) using only one database and (4) lacking a simulation-based comparison providing ground-truth validation.
In this study, we have systematically evaluated the robustness of the most popular SS methods for extracting the main brain networks fluctuating over time, in order to help researchers make a rational choice among the multitude of available methods. Specifically, nine algorithms were compared using simulated data (see supplementary materials) and three independent MEG datasets (N=95) recorded during motor and memory tasks. The differences in dataset size and in the behavioral tasks performed allow testing the performance of the SS methods in different scenarios.
As the evoked responses analyzed here last for hundreds of milliseconds, we conducted our comparative analysis on MEG datasets to benefit from the excellent temporal resolution of this technique. However, the same pipeline can be applied in a task-related fMRI context.
Our results mainly show that temporal ICA methods reached the highest spatial and temporal accuracies compared to the other SS methods. Among ICA subtypes, methods based on higher-order statistics yield more robust results than those based on lower orders. Therefore, brain components are better described as non-Gaussian independent sources. When considering multiple criteria, the JADE and CoM2 methods outperform the other SS methods, as we discuss below in detail.
The quantitative comparison performed on simulated dynamic networks showed that all SS methods successfully separated the functional networks based on their connectivity time courses.
However, spatial and temporal similarities for SOBI were significantly lower than for the other SS methods, especially for the second simulated state (P2), as shown in Supplementary Figure S3, which involves more complex spatiotemporal activity than the other states. As expected, FastICA, NMF and Kmeans proved inconsistent across multiple runs. This is due to the nature of these algorithms, which rely on random initialisations until convergence.
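To illustrate how such run-to-run consistency can be quantified, the sketch below matches components across runs by their best absolute correlation. PCA is used here as a deterministic baseline (it scores 1.0 by construction); a randomly initialised method such as FastICA, NMF or Kmeans plugged into the same function would typically score below 1 across seeds. The function names and the matching rule are illustrative assumptions:

```python
import numpy as np

def pca_components(X, k, seed=None):
    """Deterministic PCA via SVD; `seed` is ignored (no random init)."""
    Xc = X - X.mean(0)
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    return vt[:k]

def run_consistency(decompose, X, k, seeds):
    """Average best-match absolute correlation between the components of
    the first run and those of each subsequent run with a different seed;
    a value of 1.0 means a fully reproducible decomposition."""
    runs = [decompose(X, k, s) for s in seeds]
    ref = runs[0]
    scores = []
    for other in runs[1:]:
        # cross-correlation matrix between reference and other components
        c = np.corrcoef(np.vstack([ref, other]))[:k, k:]
        scores.append(np.abs(c).max(axis=1).mean())
    return float(np.mean(scores))
```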
Thus, it is important to keep these limitations in mind, especially for clinical studies, where consistency of the results is of high interest. In addition, the results revealed that the performance of FastICA, PSAUD and Kmeans depends on the selected number of components. This is an important property to take into account when dealing with complex cognitive tasks, where the number of involved brain states is hard to predict. The effect of noise on the results was also tested and showed that FastICA seems to be the only method affected by the noise level of the data. Hence, FastICA is less recommended when working on electrophysiological data that appears to be noisy. Regarding computation time, CoM2, PCA and PSAUD were the fastest, whereas InfoMax and JADE were the slowest. Still, the execution time of these algorithms is sensitive to the dataset's features (size, complexity, type, ...), and may therefore vary between datasets. Note, however, that this criterion alone is not enough to determine the computational complexity of each method; other metrics, such as the number of floating-point operations (FLOPs) required for algorithm completion, could also be tested. We also suggest that future studies explore other data simulation approaches that build the desired ground-truth brain states based on more realistic modeling.
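A minimal way to measure wall-clock execution time is sketched below, keeping in mind the caveat above that timings depend on dataset features and do not capture full computational complexity. SVD stands in here for an arbitrary decomposition; all names are illustrative:

```python
import time
import numpy as np

def time_method(fn, X, n_runs=5):
    """Median wall-clock time (seconds) of a decomposition over several
    runs; the median damps transient system-load effects."""
    times = []
    for _ in range(n_runs):
        t0 = time.perf_counter()
        fn(X)
        times.append(time.perf_counter() - t0)
    return float(np.median(times))
```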
The performance of the methods was evaluated on three real MEG datasets already published and tested in previous studies (Casorso et al., 2019; O'Neill et al., 2017; Tewarie et al., 2019; Vidaurre et al., 2018a; Zhu et al., 2020). For the motor tasks (self-paced and HCP left hand movement tasks), the results clearly showed that all SS methods successfully extracted one or more significant networks involving strong connectivity between sensorimotor regions ('Mot'). These regions mainly include nodes of the central gyrus in both the self-paced and the HCP tasks. The implicated networks are highly consistent with the task (Melnik et al., 2017; Yousry, 1997), since it requires both movement (through button press or hand movement) and a tactile response (as the subject feels the button or their finger tap). The right-handedness of all participants in the self-paced dataset is also reflected in the presence of a motor network in which only sensorimotor regions of the right cortex are implicated ('MotR').
Yet, this network is not present for the PCA and NMF methods. Similarly, the involvement of the left hand is revealed by the presence of 'MotR' for most SS methods in the HCP dataset. Note that all methods, except CoM2 and Kmeans, highlighted significant connections in the visual cortex (Figure 2), as previously noticed by O'Neill et al. studying the same task (O'Neill et al., 2017). This can be interpreted as a cross-modal synchronization between visual and sensorimotor cortices, as previously studied (Bauer et al., 2020). The activation of visual nodes can also be explained by the fact that subjects may try to imagine the 'self' cue to press the button at each trial, similarly to a motor imagery context. Regarding temporal evolution, all networks clearly modulate significantly at the exact button press time for the self-paced task. In contrast, the temporal variation related to the HCP motor task takes an oscillatory shape, which was also reported by other studies dealing with the same dataset (Vidaurre et al., 2018a; Zhu et al., 2020). A possible reason for this activity was suggested by Vidaurre et al. (Vidaurre et al., 2018a), considering a leakage of the temporal activity of the previous button press into the next trial due to the fast succession of trials. In both tasks, SOBI and PCA extracted components not directly related to the motor cortex. One possible explanation for the fragility of these methods is that both rely on second-order cumulants to separate sources. In addition, the HCP results revealed the superiority of JADE, FastICA and CoM2 over the other methods, as they better reflect the motor task-related networks.
In order to evaluate the spatial and temporal accuracy of the SS methods at a higher level of complexity, we tested them on the Sternberg working memory task. All SS methods strongly highlighted the visual network, which is consistent with the presentation of visual stimuli at two different times.
Regions in the primary visual cortex related to stimulus visualisation (Grill-Spector et al., 1998) and the lateral occipital cortex responsible for object/shape recognition (Corbetta et al., 1991; Grill-Spector et al., 2001; Kourtzi and Kanwisher, 2001) were present in these networks. The button press response is reflected by sensorimotor connections only with the ICA subtype methods, consistently with previous working memory studies (Metzak et al., 2011; Yamashita et al., 2015). The strong integration of the left cortex in this network may be explained by the fact that the right-handed participants were asked to use their right index finger to press the button in case of matching stimuli. The processing and maintenance of the observed stimuli, as a way to memorize them, reflects a higher level of cognition illustrated by a 'semantic network', which mainly encompasses bilateral parietal and temporal activations for all SS methods except Kmeans. This is coherent with previous studies demonstrating the evident role of the parietal cortex as a workspace for sensory and perceptual processing in the working memory framework (Chai et al., 2018), through the angular (Frackowiak, 1992; Vandenberghe et al., 1996), precuneus (Cavanna and Trimble, 2006) and hippocampal (Baddeley et al., 2011) areas. Bilateral inferior temporal regions also play an important role in semantic processing (Nestor et al., 2006; Vigneau et al., 2006). The fusiform gyri, strongly modulated in our results, have also been shown to be of particular relevance in this context (Mion et al., 2010). The extraction of the 'language' network by the JADE, CoM2, PSAUD, PCA and Kmeans methods is compatible with previous findings (Brookes et al., 2011b; O'Neill et al., 2017). This left-lateralised network includes connections from the inferior frontal gyrus (more robust with JADE) to nodes spanning the central lobe (as with JADE and CoM2). Temporal and parietal lobes were remarkably activated by these methods, mainly the parahippocampal and supramarginal gyri, respectively.
These regions are critical in memory encoding and retrieval and in semantic cognition (Axmacher et al., 2008; Caminiti et al., 2015; Demb et al., 1995; Derrfuss et al., 2004; Deschamps et al., 2014; Vigneau et al., 2006). In a similar (abstract shape based) working memory task, this network was interpreted in terms of a verbalisation/naming strategy employed by participants to aid memory encoding (Caminiti et al., 2015; O'Neill et al., 2017). Such an interpretation is plausible here, as this network modulates strongly with the probe presentation and response times.
Although the main objective of this study is to help researchers choose the appropriate SS method for their studies, the application of such a dFC analysis framework in the clinical domain is a promising direction. Many aspects of motor abilities and cognitive competences, which can be revealed by motor and working memory tasks respectively, have been demonstrated to be perturbed in patients. Consequently, future studies can apply the methodology used in the present paper to extract the spatiotemporal dynamics of networks from both control and patient groups, for a better understanding of the dysconnectivity induced by neurological disorders (van den Heuvel and Hulshoff Pol, 2010). One can go beyond this application towards establishing a 'neuromarker' for clinical diagnosis. This requires the analysis of subject-specific dynamic connectivity states rather than examining differences in dynamic network properties. To this end, some previous studies assessed subject-specific brain states using the results of a group-level analysis through a back-reconstruction approach, such as group information guided ICA (GIG-ICA) (Du and Fan, 2013) or spatio-temporal (dual) regression (Beckmann et al., 2009). However, we argue that these results may depend on the quality and quantity of the group data. Alternatively, in order to satisfy clinical demands and ensure consistent and reliable results on single-subject data, we tested the performance of the SS methods when directly applied to subject-level dFC (S-dFC), as previously explained.
The results of our analysis revealed the superiority of ICA-based approaches, as they tend to extract task-related components with a smaller required number of subjects/trials than the other SS methods. One explanation is related to the dependency of each algorithm on the input data dimensions. For example, it is known that clustering methods such as Kmeans require a large set of informative inputs to adequately cluster sources (Dolnicar, n.d.), whereas fewer data points are needed for ICA algorithms. This question remains challenging due to the inter-subject variability revealed by the timecourses of connectivity in the constructed null distributions (Supplementary Figures S4, S5, S6). New avenues are opened here for further efforts in the field of subject-level analysis.
Regarding methodological considerations, first, the optimal number of components to be derived remains a challenging question for all SS methods. The influence of this factor on the results was tested using simulated data. For the real MEG data, we referred to previous experiments (O'Neill et al., 2017; Zhu et al., 2020) to set this input to 10 extracted components for all SS methods. However, not all of these components are necessarily essential, especially in the case of simple tasks such as motor tasks. To this end, we followed the null distribution approach based on the sign-flipping algorithm (Hunt et al., 2012; O'Neill et al., 2017; Winkler et al., 2014; Zhu et al., 2020) to select only the components whose temporal dynamics significantly modulate with the task.
In this way, we ensure that the brain dynamics related to the studied behavioural tasks can be summarized and described by the retained components in an automatic way that allows an objective comparison of the SS methods' performance. In addition, the fact that this technique is a purely data-driven procedure that requires no prior hypothesis or condition manipulation makes it well adapted to the specific examined dataset. Two points regarding the selection of significant components are important to mention here. First, in the applied null distribution, networks were defined as significant if they fell outside the null distribution on either the positive or the negative side, because they reflect trial-onset-locked increases or decreases in connectivity across subjects (as amplitude envelope correlation was adopted). Second, it should be noted that the significance of the components was evaluated relative to the specific task duration. For example, the temporal duration of the entire analysis for the self-paced and working memory tasks was set to be large, and can thus include dynamics that should be excluded from the analysis, whereas the HCP motor task duration was relatively short, so that all components extracted as significant at any time should be considered. In this context, we limited our significance assessment to the interval of [-3; +3 sec] relative to the button press instant in the case of the self-paced motor task, and [-3; +16 sec] relative to the visual stimulus presentation for the memory task. In addition, a few limitations should be discussed regarding this selection. First, it is not convenient to rely on this technique when the number of trials and subjects is either too small or too large: a small number does not allow building a reliable null distribution, while a very large one entails a high computational cost given all possible subject combinations for the sign-flipping procedure, as already encountered in the HCP analysis.
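A minimal sketch of the sign-flipping null distribution described above, applied to a component's subject-level temporal courses; the permutation count, quantile thresholds and function signature are illustrative choices, not the exact published procedure:

```python
import numpy as np

def sign_flip_significance(tcs, n_perm=1000, alpha=0.05, seed=0):
    """Sign-flipping null distribution for a component's temporal course.

    tcs : (n_subjects, n_time) subject-level temporal coefficients.
    Returns (observed, lower, upper, significant): the observed group-mean
    course, the null envelope bounds, and whether the course falls outside
    the envelope at any time point (positive or negative side).
    """
    rng = np.random.default_rng(seed)
    n_subj, _ = tcs.shape
    observed = tcs.mean(0)
    # null: randomly flip the sign of each subject's course, average across
    # subjects, and keep the extrema over time of each permuted group mean
    null_max = np.empty(n_perm)
    null_min = np.empty(n_perm)
    for p in range(n_perm):
        flips = rng.choice([-1.0, 1.0], size=n_subj)[:, None]
        perm_mean = (flips * tcs).mean(0)
        null_max[p] = perm_mean.max()
        null_min[p] = perm_mean.min()
    upper = np.quantile(null_max, 1 - alpha)
    lower = np.quantile(null_min, alpha)
    significant = bool((observed > upper).any() or (observed < lower).any())
    return observed, lower, upper, significant
```

A component whose group-mean course stays inside the [lower, upper] envelope over the assessed interval would be discarded as non-significant.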
Moreover, there is no consensus about the thresholds/margins that properly define a limit level for a component's amplitude. For instance, there exist networks whose temporal variation peaks at the limit of the null distribution envelope. These are critical components that may be involved in the task but do not meet the automatic criteria of this null distribution. Future works should therefore investigate this methodological approach further in the framework of cognitive tasks, in addition to resting-state experiments. It is also crucial to seek further methods to use or combine with the applied technique in order to obtain a more robust basis for the selection of significant components.
It is noteworthy that the null distribution-based technique was applied uniformly to all SS methods, so that our comparison was built on a unified evaluation framework.
Second, we used the same pipeline as previous studies dealing with the same datasets (cortical parcellation, source reconstruction, functional connectivity metric and source leakage correction, frequency bands and sliding window settings) (Kabbara et al., 2019; O'Neill et al., 2017). By applying already tested and validated methodological approaches, we avoid factors that could bias the comparison. However, we point out that other methodological choices could be explored by other researchers using the same pipeline adopted in this work. Regarding cortical parcellation, we chose the AAL atlas based on its successful use in previous MEG investigations (O'Neill et al., 2017; Tewarie et al., 2016). This atlas also provides a good basis for the adopted orthogonalisation procedure, since its number of regions is sufficiently low (78 ROIs) and the regions are well separated (Colclough et al., 2015). Beamformer spatial filtering was selected as the inverse solution due to its demonstrated efficiency in the measurement of static (Brookes et al., 2011a) and dynamic (Baker et al., 2014) functional connectivity. Functional connectivity between ROIs was estimated through Amplitude Envelope Correlation (AEC). This technique has been successful in elucidating electrophysiological networks of functional connectivity (Colclough et al., 2016). Other methods, such as phase coupling, can be considered as an alternative way to probe different types of functional connectivity (Lachaux et al., 1999).
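A minimal numpy-only sketch of AEC as described above: envelopes are obtained from an FFT-based analytic signal and then correlated across ROIs. Source leakage correction (e.g. symmetric orthogonalisation) is assumed to have been applied to the inputs beforehand and is omitted here:

```python
import numpy as np

def analytic_signal(x):
    """Hilbert analytic signal along the last axis (FFT-based)."""
    n = x.shape[-1]
    Xf = np.fft.fft(x, axis=-1)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(Xf * h, axis=-1)

def aec(signals):
    """Amplitude Envelope Correlation between ROI time courses.

    signals : (n_rois, n_time) band-filtered source signals.
    Returns the (n_rois, n_rois) correlation matrix of Hilbert envelopes.
    """
    env = np.abs(analytic_signal(signals))
    return np.corrcoef(env)
```

By construction, AEC ignores the carrier phase and tracks only co-fluctuations of band-limited power, which is why phase coupling metrics probe a complementary type of connectivity.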
Sliding window settings (length and step) were selected carefully as a trade-off between temporal resolution and the accuracy of the derived adjacency matrices (length=6 sec, step=0.5 sec for the self-paced and working memory tasks) (O'Neill et al., 2017). Similarly, the same conceptual strategy was adopted for the window specifications in the HCP motor task, to obtain a comparable number of adjacency matrices (length=0.6 sec, step=0.05 sec). Concerning frequency bands, it was crucial to preprocess each dataset in its appropriate bandwidth. For example, brain signals in self-paced motor tasks have been shown to be most active in the beta band, while a broader range of frequency bands is involved in complex cognitive tasks such as working memory (O'Neill et al., 2017). We filtered the HCP hand movement data in the beta and gamma bands separately. However, the results presented in this paper refer to the gamma band filtering, as we noticed that motor regions were much more strongly integrated when using the gamma band. This can be explained by the very rapid timescale of these executed movements, which makes data preprocessing in high frequencies more adequate. In this context, recent works demonstrated prominent amplitude increases in the high gamma band during motor execution in motor brain regions, while beta power was enhanced after execution (delay period) (Combrisson et al., 2017; Hosaka et al., 2016). Another reason could be that working with a broader frequency range decreases the noise of the adjacency matrices (O'Neill et al., 2017). Finally, to overcome the source leakage problem, the symmetric orthogonalisation method was applied (Colclough et al., 2015).
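The sliding-window construction above can be sketched as follows, assuming the usual window count of floor((T − L) / S) + 1 for T samples, window length L and step S; the connectivity metric defaults to plain correlation, with AEC or phase coupling as drop-in alternatives (all names are illustrative):

```python
import numpy as np

def sliding_windows(n_time, fs, win_len_s, step_s):
    """Start/end sample indices of sliding windows for a given length and
    step in seconds, matching n_windows = floor((T - L) / S) + 1."""
    L = int(round(win_len_s * fs))
    S = int(round(step_s * fs))
    starts = np.arange(0, n_time - L + 1, S)
    return [(s, s + L) for s in starts]

def dynamic_connectivity(signals, fs, win_len_s, step_s, conn=np.corrcoef):
    """Stack of windowed connectivity matrices, shape
    (n_windows, n_rois, n_rois); `conn` maps a (n_rois, L) window to an
    (n_rois, n_rois) adjacency matrix."""
    wins = sliding_windows(signals.shape[1], fs, win_len_s, step_s)
    return np.stack([conn(signals[:, a:b]) for a, b in wins])
```

For instance, 60 s of data sampled at 100 Hz with the 6 s / 0.5 s settings yields 109 adjacency matrices.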
Nevertheless, this correction method works properly only if the number of ROIs (78 in our study) is smaller than the number of degrees of freedom (approximately the number of time samples per window, i.e., window length × sampling rate). This condition cannot be satisfied in the case of the HCP data, which involves very small window lengths due to the short trial duration. Therefore, no leakage correction was applied to the HCP data, since it would require more time points.
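The feasibility condition above reduces to a simple check; the 100 Hz sampling rate in the example below is an assumed illustrative value, not necessarily that of the actual datasets:

```python
def leakage_correction_feasible(n_rois, win_len_s, fs):
    """Symmetric orthogonalisation needs more time samples per window
    (degrees of freedom ~ window length x sampling rate) than ROIs."""
    return n_rois < int(round(win_len_s * fs))
```

With 78 ROIs and an assumed 100 Hz rate, a 6 s window (600 samples) satisfies the condition, whereas a 0.6 s HCP-style window (60 samples) does not, which is consistent with skipping the correction for the HCP data.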