A Nontargeted UHPLC-HRMS Metabolomics Pipeline for Metabolite Identification: Application to Cardiac Remote Ischemic Preconditioning

ACCEPTED PAPER

A validated non-targeted metabolomics strategy and pipeline for unequivocal metabolite identification, using the MSMLS™ molecule library, is described. We built an in-house database containing accurate m/z values, retention times, isotopic patterns, and full MS and MS/MS spectra. A UHPLC-HRMS Q Exactive™ method was developed, and experimental variations were determined within and between three experimental days. The extraction efficiency, as well as the accuracy, precision, repeatability, and linearity of the method, were assessed, and the method demonstrated good performance. The methodology was then blindly applied to plasma from Remote Ischemic Preconditioning (RIPC) rats. The samples, previously analyzed by targeted metabolomics using a completely different protocol, analytical strategy and platform, were submitted to our analytical pipeline. A combination of multivariate and univariate statistical analyses was employed. Selection of putative biomarkers from the OPLS-DA model and S-plot was combined with jack-knife confidence intervals, metabolite VIP values and univariate statistics. Only variables with a strong model contribution and high statistical reliability were selected as discriminant metabolites.
Three biomarkers identified by the previous targeted metabolomics study were found in the current work, in addition to three novel metabolites, emphasizing the efficiency of the current methodology and its ability to identify new biomarkers of clinical interest in a single sequence. The biomarkers were identified to level 1 according to the Metabolomics Standards Initiative and confirmed by both RPLC- and HILIC-HRMS.

INTRODUCTION
Metabolomics is defined as the comprehensive analysis of low-molecular-weight metabolites, typically <1500 Da, which are highly context dependent, varying according to the physiology, developmental or pathological state of a cell, tissue, organ or organism 1. Two major analytical techniques are mostly used: nuclear magnetic resonance (NMR) spectroscopy and mass spectrometry (MS), the latter being increasingly exploited in the field 2. In biomarker discovery, MS offers very high sensitivity and the ability to detect a large number of different metabolites, depending on the experimental setup. Moreover, the specificity of MS, through high resolution and/or multidimensional MSn techniques, further facilitates the structural elucidation of metabolites of interest 3,4. In recent years, the number of clinical investigations based on metabolomics has considerably increased, although often without thorough assessment of the analytical methods used to acquire the data, especially in non-targeted metabolomics.
Metabolomics needs highly standardized methods to avoid bias and data misinterpretation.
Thus, efforts have been made to define appropriate validation parameters. The challenges of non-targeted strategies differ from those of targeted methods, for which published guidelines exist 5-8. Targeted metabolomics approaches focus on the quantification of a limited number of well-characterized, preselected molecules. In contrast, non-targeted metabolomics methods are more exhaustive, but they often lack unambiguous characterization of the metabolites of interest. In 2007, the Metabolomics Standards Initiative (MSI) proposed minimum metadata relating to instrumental performance and method validation 9, followed by minimum reporting standards for data analysis in metabolomics experiments 10. More recently, a review compiling alternative approaches used to validate metabolomics methodologies was published 11, and further validation criteria for non-targeted metabolomics were recommended. These criteria include the assessment of: accuracy and precision for selected compounds with different physico-chemical properties; retention time; repeatability within an analytical run, measured on pooled QCs, with data filtering before analysis to account for signal drift; linearity, checked on diluted pooled QCs; and instrumental repeatability, verified from the total signal plot of each chromatogram.
With the aim of setting up a clinical metabolic profiling strategy, the above-mentioned recommendations were considered in order to provide a high degree of confidence in the developed method and to ensure that the methodologies were fit for purpose. A method based on ultra-high performance liquid chromatography coupled to high-resolution mass spectrometry (UHPLC-HRMS) on a Q Exactive™ instrument was developed and subjected to a validation process. To overcome the difficulties of metabolite identification and achieve unequivocal biomarker identification, an extended library of more than 500 molecules intended for mass spectrometry metabolomics applications was used. Identification and confirmation of biomarkers from metabolomics investigations are essential.
Further, the methodology was blindly applied to plasma from RIPC rats, previously analyzed using a completely different targeted metabolomics protocol, analytical strategy and platform 12.
The use of the same sample cohort in the current research had several purposes: (1) to validate the developed workflow, (2) to test the in-house database and (3) to determine whether similar biomarkers can be obtained using different analytical platforms and technologies.

EXPERIMENTAL SECTION
A schematic diagram of the experimental design is illustrated in Figure 1.
Description of the solvents, chemicals, and authentic standards used can be found in Supporting Information.

Study design and samples
This is a two-step study: the first part relates to the validation of the developed method and the creation of the in-house molecule library, and the second part is dedicated to the implementation of the methodology and its application to a RIPC clinical cohort.
Human plasma samples were obtained from Angers University Hospital. These plasmas were collected from consenting patients attending clinics. One mL of plasma from four patients was pooled and constituted the sample matrix for method development and validation. In addition, plasma from rats involved in a controlled RIPC experiment was used to assess the potential of the developed strategy for clinical metabolomics and biomarker identification.
The RIPC animal experiment has been described previously 13. Briefly, 20 male Wistar rats, 8 to 10 weeks old, were randomly assigned to either the RIPC group (T) or the control group (C). Plasma was collected using standard procedures and stored at −80°C prior to use. Further details on the RIPC procedure can be found in Supporting Information.
For the RIPC investigation, a pooled quality control (QC) sample derived from all rat subjects was prepared to ensure that no or minimal metabolic information was lost 14. QC dilution series (1:2 and 1:4 in reconstitution solvent) were also prepared and provided robust quality assurance for each metabolic feature detected.
… v/v) was as follows: hold initial conditions 98:2 for 2 min, followed by a linear gradient from 98:2 to 0:100 over 15 min, hold at 0:100 for 3 min, return to initial conditions 98:2 over 2.5 min, and then hold these conditions for a further 2 min. A constant flow rate of 0.300 mL min⁻¹ was used; the injection volume was optimized at 5 µL and the sample injection order was randomized. A divert valve was used, and the eluent was directed to waste at 22.45 min. Between injections, the system was equilibrated for 1.5 min.
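The RPLC gradient above can be written as a piecewise-linear timetable. The short Python sketch below assumes that the second number of each ratio is the proportion of the organic eluent (B), since the mobile-phase definitions are not given in this section; the names `RPLC_TIMETABLE`, `percent_b` and `cycle_length` are hypothetical. It interpolates %B at any time point and returns the programmed gradient length, which does not include the 1.5 min equilibration between injections.

```python
# Hypothetical timetable for the RPLC gradient described above.
# Each node is (time in min, %B); %B is ASSUMED to be the second term of the ratio.
RPLC_TIMETABLE = [
    (0.0, 2.0),     # initial conditions 98:2
    (2.0, 2.0),     # isocratic hold for 2 min
    (17.0, 100.0),  # linear ramp to 0:100 over 15 min
    (20.0, 100.0),  # hold at 0:100 for 3 min
    (22.5, 2.0),    # return to 98:2 over 2.5 min
    (24.5, 2.0),    # hold initial conditions for a further 2 min
]


def percent_b(t_min, timetable=RPLC_TIMETABLE):
    """Return %B at time t_min by linear interpolation between timetable nodes."""
    if not timetable[0][0] <= t_min <= timetable[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(timetable, timetable[1:]):
        if t0 <= t_min <= t1:
            if t1 == t0:
                return b1
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("timetable must be sorted by time")


def cycle_length(timetable=RPLC_TIMETABLE):
    """Programmed gradient length in minutes (first to last node)."""
    return timetable[-1][0] - timetable[0][0]


if __name__ == "__main__":
    print(cycle_length())  # 24.5
    print(percent_b(9.5))  # halfway through the 15 min ramp: 51.0
```

Encoding the program as nodes rather than prose makes it easy to check, for example, that the divert valve switch at 22.45 min falls at the end of the return to initial conditions.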
High-resolution MS data were acquired in positive and negative ionization modes, in a separate run for each mode. Full scan mass spectra (Full MS) were acquired, and data-dependent MS/MS (ddMS2) experiments were performed on several QCs at the start and end of each sequence, in 'Top5' data-dependent mode. MS conditions can be found in Supporting Information.
Xcalibur 2.2 software (Thermo Fisher Scientific, San Jose, CA, USA) was used for data acquisition.
Prior to each sequence acquisition, the mobile phase was run for 1 h 30 min, followed by injection of 3 solvent blanks and 5 QC samples to allow column equilibration and conditioning. Column pressure and isotope-labeled standards were monitored in every analytical run to verify the state of the whole system, as a quality control procedure.
A hydrophilic interaction chromatography (HILIC)-HRMS method was further implemented to confirm the biomarkers identified by RPLC-HRMS. The chromatographic separation was achieved with an Acquity® BEH HILIC 1.7 µm, 150 × 2.1 mm column together with the corresponding pre… acetic acid. The system was programmed to perform an analysis cycle consisting of holding initial conditions (95% B) for 2 min, followed by a linear gradient from 95% to 80% B over 3 min, from 80% to 60% B over 7 min and from 60% to 40% B over 2 min, a hold at 40% B for 2 min, a return to initial conditions over 2 min, and then a hold at these conditions for a further 10 min. The flow rate was 0.400 mL min⁻¹ and the injection volume was 10 µL. High-resolution MS data were acquired in positive ionization mode. Targeted SIM (t-SIM) and targeted MS2 (t-MS2) experiments were performed for each marker of interest. MS conditions are detailed in Supporting Information.

Method validation
There are no established guidelines for validating analytical methods in non-targeted approaches. Nonetheless, analytical methods used for non-targeted metabolomics investigations should be thoroughly assessed prior to their use, and the following references constitute adequate support for conducting method validation. Validation of the analytical method took into account the suggestions of the MSI 9, recent propositions for method validation in non-targeted metabolomics 11, and conventional guidelines for quantitative methods. The method was validated in terms of extraction recovery, selectivity, repeatability, method precision, linearity, and instrumental precision. The limit of detection (LOD) and limit of quantification (LOQ) of the internal standards were also assessed, based on signal-to-noise ratios of 3 and 10, respectively.
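Under the common assumptions that the signal scales linearly with concentration and that baseline noise is constant near the detection limit, the S/N criteria above translate into concentration estimates by simple proportion. The sketch below is illustrative only: the function name and example values are hypothetical, and how S/N itself is computed (for example peak height over RMS or peak-to-peak noise) depends on the processing software.

```python
def lod_loq_from_sn(concentration, signal_to_noise, lod_sn=3.0, loq_sn=10.0):
    """Estimate LOD and LOQ (same units as `concentration`) from one measured S/N.

    ASSUMES the signal is proportional to concentration and the noise is constant,
    so the concentration giving S/N = k is k * concentration / signal_to_noise.
    """
    if concentration <= 0 or signal_to_noise <= 0:
        raise ValueError("concentration and S/N must be positive")
    lod = lod_sn * concentration / signal_to_noise
    loq = loq_sn * concentration / signal_to_noise
    return lod, loq


# Hypothetical example: a standard at 10 ng/mL measured with S/N = 60
lod, loq = lod_loq_from_sn(10.0, 60.0)
print(f"LOD ~ {lod:.2f} ng/mL, LOQ ~ {loq:.2f} ng/mL")  # prints: LOD ~ 0.50 ng/mL, LOQ ~ 1.67 ng/mL
```

Extrapolating from a single level is only a first approximation; a low-level spike close to the estimated limits would normally be injected to confirm it.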
Pooled human plasma was used to estimate the extraction recovery; to this end, samples were spiked at the start (six independent replicates, QC_SS) and at the end of the extraction procedure (six independent replicates, QC_SE) with isotope-labeled compounds exhibiting different functional groups, polarities, and molecular masses. Fortified materials were extracted following the procedure described above. This procedure was performed on three different days and in each ionization mode (positive and negative). The extraction recovery was determined by comparing the peak area of isotope-labeled compounds in the pre-fortified extract (QC_SS, fortified before extraction) with that in the post-fortified extract (QC_SE, fortified after extraction).
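The recovery comparison described above amounts to a ratio of mean peak areas. The sketch below assumes that recovery is computed from the mean areas of the six replicates per condition, per standard and per day, which is a common convention but is not spelled out in the text; the function name and the peak areas are hypothetical.

```python
from statistics import fmean


def extraction_recovery(pre_spiked_areas, post_spiked_areas):
    """Extraction recovery (%) for one isotope-labeled standard.

    pre_spiked_areas:  peak areas of replicates fortified before extraction (QC_SS)
    post_spiked_areas: peak areas of replicates fortified after extraction (QC_SE)
    """
    post_mean = fmean(post_spiked_areas)
    if post_mean <= 0:
        raise ValueError("post-extraction spike must have a positive mean area")
    return 100.0 * fmean(pre_spiked_areas) / post_mean


# Hypothetical peak areas for one standard, six replicates per condition, one day
qc_ss = [8.2e6, 8.0e6, 7.9e6, 8.4e6, 8.1e6, 7.8e6]
qc_se = [9.0e6, 9.1e6, 8.9e6, 9.2e6, 9.0e6, 8.8e6]
print(f"Recovery: {extraction_recovery(qc_ss, qc_se):.1f} %")
```

Repeating the call for each standard, day and ionization mode gives the between-day spread of recovery.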
Selectivity of a method refers to the extent to which it can determine particular analyte(s) in a complex mixture without interference from other components in the mixture 15. Here, selectivity was provided by detection of metabolites in the matrix based on the compounds' exact masses. Serial dilutions of QC extracts (1:2 and 1:4) were used to assess the linearity of the response for isotope-labeled standards.
Instrumental precision was assessed by evaluating analytical reproducibility (intensity and peak area accuracy), system stability (mass accuracy, chromatogram alignment) and chromatographic reproducibility (retention time accuracy, chromatogram alignment). Chromatogram intensities were further plotted for all samples involved in the validation process, i.e., on day 1, day 2, and day 3 for each ionization mode.
Data processing
SIEVE™ v2.2 software (Thermo Fisher Scientific) was used to preprocess the '.raw' files from the UHPLC-HRMS. The 'Component' algorithm was applied for background subtraction, component detection, peak alignment and framing. This step was meant to ensure the validity of the sequence, with appropriate chromatogram alignment, before further processing. TraceFinder™ 3.0 software (Thermo Fisher Scientific) was employed for data processing, as it allowed automatic verification of: isotopic pattern, expected vs. experimental m/z, expected vs. experimental retention time (RT), peak integration, linearity across dilutions (1, ½, ¼), MS/MS fragments (Top5 ddMS2 experiments were run for the RIPC rat cohort), and comparison with the MSMLS™ library for database matching and metabolite identification. Inter-batch normalization was applied to data from day 1, day 2 and day 3 of the validation process, each sampling day corresponding to one analytical batch (sequence). The normalization followed a previously described procedure, which assumes that measurement errors within a single batch are randomly distributed and that different batches can be compared and corrected using the average or median value of the QC samples in each batch 16.
MSMLS™ is a collection of high-quality small biochemical molecules spanning a broad range of primary metabolism. From the MSMLS™ metabolite library, an in-house database (internal …
Appropriate personal protective equipment was used and chemicals were handled in a fume hood. Flammable items were kept in a chemical safety cabinet, and material waste was disposed of as clinical waste.
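The QC-based inter-batch normalization used for the three validation days (ref. 16) can be sketched for a single feature as below. This is a minimal illustration of the median variant, assuming every batch contains QC injections; the data layout, the function name and the rescaling to the median of the batch QC medians are assumptions rather than details taken from the cited procedure.

```python
from collections import defaultdict
from statistics import median


def qc_median_normalize(samples):
    """Inter-batch normalization of one feature using per-batch QC medians.

    `samples`: list of dicts with keys 'batch', 'is_qc' and 'intensity'.
    Each intensity is divided by the median QC intensity of its batch and multiplied
    by the median of all batch QC medians, keeping values on the original scale.
    """
    qc_by_batch = defaultdict(list)
    for s in samples:
        if s["is_qc"]:
            qc_by_batch[s["batch"]].append(s["intensity"])
    batch_medians = {b: median(v) for b, v in qc_by_batch.items()}
    reference = median(list(batch_medians.values()))
    out = []
    for s in samples:
        if s["batch"] not in batch_medians:
            raise ValueError(f"batch {s['batch']!r} has no QC sample")
        factor = reference / batch_medians[s["batch"]]
        out.append({**s, "intensity": s["intensity"] * factor})
    return out


# Hypothetical data: the day-2 batch reads about twice as high as day 1
data = [
    {"batch": "day1", "is_qc": True, "intensity": 90.0},
    {"batch": "day1", "is_qc": True, "intensity": 100.0},
    {"batch": "day1", "is_qc": True, "intensity": 110.0},
    {"batch": "day1", "is_qc": False, "intensity": 150.0},
    {"batch": "day2", "is_qc": True, "intensity": 180.0},
    {"batch": "day2", "is_qc": True, "intensity": 200.0},
    {"batch": "day2", "is_qc": True, "intensity": 220.0},
    {"batch": "day2", "is_qc": False, "intensity": 300.0},
]
for s in qc_median_normalize(data):
    if not s["is_qc"]:
        print(s["batch"], s["intensity"])  # both study samples become 225.0
```

Using the median rather than the mean makes the correction less sensitive to a single aberrant QC injection within a batch.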

RESULTS AND DISCUSSION
As a quality control procedure, the following checks were systematically performed before any data processing and analysis: comparison of column pressure with the previous analytical run, variation of internal standards in QCs/study samples, and instrumental stability (reproducibility of RT and accurate masses) along the sequence. These observations served as grounds for accepting the analytical run. Peak alignment (Figure S1) further confirmed the chromatographic reproducibility and stability of the whole system.
The isotope-labeled endogenous metabolites used were of diverse nature and chemical structure, covering a broad range of molecular masses, functional groups and polarities, with RT covering the … The objective of spiking with isotope-labeled endogenous metabolites was to obtain a general snapshot of the method's performance, even though the results cannot be formally extended to all metabolites in the sample. The analytical performances of the method are presented in Table S1. As can be seen from … The acceptance threshold for peak area and intensity repeatability of the isotope-labeled standards was set at a CV of 30%; in non-targeted LC-MS metabolomics it is common to retain only ions exhibiting a CV below 30% in QCs, since ions with a higher CV are not considered good biomarker candidates.
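The 30% CV criterion above can be applied as a simple filter on QC intensities. The sketch below assumes the CV is computed with the sample standard deviation over all QC injections of a sequence; the feature names and intensities are hypothetical.

```python
from statistics import fmean, stdev


def cv_percent(values):
    """Coefficient of variation (%) based on the sample standard deviation."""
    mean = fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * stdev(values) / mean


def filter_by_qc_cv(qc_intensities, max_cv=30.0):
    """Return {feature: CV%} for features whose QC CV is below max_cv (%)."""
    kept = {}
    for feature, values in qc_intensities.items():
        cv = cv_percent(values)
        if cv < max_cv:
            kept[feature] = cv
    return kept


# Hypothetical QC intensities for two features (five QC injections each)
qc = {
    "feature_1": [1.02e6, 0.98e6, 1.05e6, 0.97e6, 1.01e6],
    "feature_2": [2.1e5, 4.0e5, 1.2e5, 3.3e5, 2.6e5],
}
print(filter_by_qc_cv(qc))
```

Here feature_2 (CV of about 41%) is discarded, while feature_1 is retained.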
The linearity of metabolites in diluted QCs was evaluated during the validation study, as we planned to use a linear trend as a criterion to filter metabolites in the data matrix. Only ½ and ¼ dilutions were performed, as further dilutions might lead to the exclusion of some potential low-abundance metabolites.
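A linear-trend filter on the 1, ½ and ¼ QC levels could look like the sketch below. The text does not state the exact criterion, so the ordinary least-squares fit, the positive-slope requirement and the R² cut-off of 0.9 are hypothetical choices made for illustration.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope, intercept and R^2 for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_tot = sum((yi - my) ** 2 for yi in y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return slope, intercept, r2


def passes_dilution_linearity(intensities, fractions=(1.0, 0.5, 0.25), min_r2=0.9):
    """Keep a feature if its QC signal decreases linearly with dilution.

    `intensities` are given in the same order as `fractions` (undiluted, 1:2, 1:4).
    The R^2 threshold is a HYPOTHETICAL choice, not a value from this study.
    """
    slope, _, r2 = linear_fit(list(fractions), list(intensities))
    return slope > 0 and r2 >= min_r2


print(passes_dilution_linearity([1.0e6, 5.1e5, 2.4e5]))  # near-proportional response
print(passes_dilution_linearity([1.0e6, 2.0e5, 9.0e5]))  # no consistent trend
```

With only three levels, R² is a coarse indicator; requiring a positive slope removes features whose signal does not decrease with dilution (e.g. background ions).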

In-house metabolite database
The MSMLS™ library contains 619 unique metabolites, covering a broad spectrum of key primary metabolites and intermediates, including the following classes of compounds: carboxylic acids, amino acids, biogenic amines, polyamines, nucleotides, coenzymes, vitamins, mono- and disaccharides, fatty acids, lipids, steroids, and hormones. Among the 619 metabolites of the MSMLS™ library, 499 could be reliably analyzed in positive and negative ionization modes (Table S2). Urea, initially not contained in the MSMLS™ library, was added to our database. Our in-house database thus includes 500 metabolites of key pathways, for which full MS and MS/MS spectra, RT and isotopic pattern were acquired under our current chromatographic and mass spectrometry conditions.
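Matching an experimental feature against an in-house database on accurate m/z and RT can be sketched as below. The ±5 ppm and ±0.2 min tolerances, the class and function names are hypothetical; the [M+H]+ values are computed monoisotopic m/z values, while the retention times are placeholders and NOT values from this study. The isotopic-pattern and MS/MS comparisons that the workflow also uses are not covered here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryEntry:
    name: str
    mz: float      # theoretical m/z of the ion, here [M+H]+
    rt_min: float  # retention time of the authentic standard (placeholder values below)


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def match_feature(mz, rt_min, library, ppm_tol=5.0, rt_tol_min=0.2):
    """Return library entries within both the m/z (ppm) and RT tolerances (hypothetical defaults)."""
    return [entry for entry in library
            if abs(ppm_error(mz, entry.mz)) <= ppm_tol
            and abs(rt_min - entry.rt_min) <= rt_tol_min]


# [M+H]+ m/z values are monoisotopic; the RTs are PLACEHOLDERS, not data from this study.
LIBRARY = [
    LibraryEntry("glycine", 76.03931, 1.0),
    LibraryEntry("L-ornithine", 133.09715, 1.2),
    LibraryEntry("L-kynurenine", 209.09207, 5.0),
]

hits = match_feature(76.0395, 1.05, LIBRARY)
print([h.name for h in hits])  # ['glycine']
```

Requiring both criteria is what distinguishes a library match from a formula-level annotation: isobaric compounds with the same accurate mass are separated only by their RT (and, in the full workflow, by their MS/MS spectra).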
Identification and confirmation of biomarkers from metabolomics investigations are essential for precision medicine.
When isomeric metabolites emerge as putative biomarkers and are not chromatographically separated, only the chemical formula is given. A specific chromatographic method (different column chemistry or technique) would then be needed to separate them and thus confirm the identity of the metabolite; the HILIC-HRMS method developed for biomarker identification could be employed for that purpose. A single analytical approach is not enough to cover the entire metabolome: under the generic chromatographic and mass spectrometry conditions used, 119 metabolites contained in the MSMLS™ library were not detected. These include some highly phosphorylated molecules such as adenosine 5'-triphosphate, cytidine 5'-triphosphate and guanosine 5'-triphosphate, which require a different column chemistry or the addition of an ion-pairing buffer to enhance their retention 18.
The integration of multiple approaches, including both RPLC and HILIC chromatography, is necessary to circumvent this issue.
… prediction ability. This approach clearly distinguished the C from the T group. To further specify the metabolites associated with the group separation, an S-plot was generated, which highlighted several potential biomarkers (Figure 2). The examination of the S-plot was combined with jack-knife confidence intervals (as displayed on the loading column plot) in order to retain metabolites with high statistical reliability. Reliable metabolites that also presented variable importance in the projection (VIP) values higher than 1 were subsequently selected as potential biomarkers. Figure S6 presents the selection process of the potential biomarkers in detail.
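The selection rule described above (S-plot candidates retained only if their jack-knife confidence interval excludes zero and their VIP exceeds 1) can be expressed as a simple filter. The sketch assumes that loadings, confidence intervals and VIP values have already been exported from the OPLS-DA software; the class and function names are hypothetical, and the values in the example are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    ci_low: float   # lower bound of the jack-knife confidence interval of the loading
    ci_high: float  # upper bound of that interval
    vip: float      # variable importance in the projection


def ci_excludes_zero(c):
    """A loading is considered reliable when its jack-knife interval does not cross zero."""
    return c.ci_low > 0.0 or c.ci_high < 0.0


def select_biomarkers(candidates, vip_threshold=1.0):
    """Keep S-plot candidates that are reliable and have a VIP above the threshold."""
    return [c.name for c in candidates if ci_excludes_zero(c) and c.vip > vip_threshold]


# Invented values, for illustration only
candidates = [
    Candidate("feature_1", 0.05, 0.21, 1.8),    # kept
    Candidate("feature_2", -0.04, 0.12, 1.3),   # interval crosses zero
    Candidate("feature_3", -0.30, -0.10, 0.7),  # VIP below 1
]
print(select_biomarkers(candidates))  # ['feature_1']
```

The two criteria are complementary: the confidence interval reflects the stability of a variable's contribution across jack-knife sub-models, whereas VIP reflects the size of that contribution.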
At this stage, 7 potential biomarkers were revealed, i.e., 5-hydroxyindoleacetate, glycine, kynurenine, ornithine, Marker A (adenosine 5'-monophosphate), Marker B (aspartate), and Marker C (xanthosine). Univariate statistics were conducted to test the significance of the discriminant biomarkers between C and T samples. The distribution was significantly different from a normal distribution for L-ornithine and Marker C; results of the Kolmogorov-Smirnov test are presented in Table 1.
Levene's test for equality of variances indicated unequal variances between C and T samples for 5-hydroxyindoleacetate, ornithine, Marker A, and Marker C (Table 1). The unequal-variance two-tailed t-test was applied to all 7 putative biomarkers, in preference to the Student's t-test and the Mann-Whitney test 21. Six of the seven putative biomarkers were found statistically significant, namely glycine, kynurenine, ornithine, Marker A, Marker B, and Marker C (Table 1).
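The unequal-variance two-tailed t-test above is Welch's t-test. Because the statistical software used is not stated here, the following dependency-free Python sketch is only an illustration: it computes Welch's t, the Welch-Satterthwaite degrees of freedom, and the two-tailed p-value through the regularized incomplete beta function (continued-fraction evaluation). In practice a tested library routine such as SciPy's `scipy.stats.ttest_ind(..., equal_var=False)` would normally be preferred; the Kolmogorov-Smirnov and Levene tests are not reproduced, and the function names are hypothetical.

```python
import math
from statistics import fmean, variance


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for I_x(a, b), evaluated with the modified Lentz method."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        even = m * (b - m) * x / ((qam + m2) * (a + m2))
        odd = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        for aa in (even, odd):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("incomplete beta continued fraction did not converge")


def betainc_reg(a, b, x):
    """Regularized incomplete beta function I_x(a, b) for a, b > 0."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-tailed p-value of Student's t distribution with df degrees of freedom."""
    return betainc_reg(df / 2.0, 0.5, df / (df + t * t))


def welch_t_test(group_c, group_t):
    """Unequal-variance (Welch) two-tailed t-test; returns (t, df, p)."""
    se2_c = variance(group_c) / len(group_c)  # squared standard error, group C
    se2_t = variance(group_t) / len(group_t)  # squared standard error, group T
    t = (fmean(group_c) - fmean(group_t)) / math.sqrt(se2_c + se2_t)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_c + se2_t) ** 2 / (se2_c ** 2 / (len(group_c) - 1)
                                 + se2_t ** 2 / (len(group_t) - 1))
    return t, df, t_two_sided_p(t, df)


# Illustrative numbers only (not data from this study)
control = [1.0, 2.0, 3.0, 4.0, 5.0]
treated = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df, p = welch_t_test(control, treated)
print(f"t = {t:.3f}, df = {df:.2f}, p = {p:.3f}")
```

Unlike Student's t-test, the Welch variant does not pool the two variances, which is why it suits the features for which Levene's test indicated unequal variances. The sketch assumes at least two values per group and a non-zero variance in at least one group.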
Confirmation of the biomarkers was achieved by comparing experimental spectra obtained in rat plasma with authentic metabolite spectra available in the in-house library, which gave perfect matches. Figure 3 shows the confirmation of kynurenine, and Figure S7 presents the confirmation data for glycine and ornithine.
Additionally, a HILIC-HRMS method was developed to further confirm the six biomarkers reported in the present investigation (Figure S8, Table 2). As can be seen from … were performed a year after the original RPLC experiments. The remaining rat plasma samples, stored at −80°C, were thawed once more and re-extracted for confirmation of the biomarkers by HILIC-HRMS. Table 2 summarizes the characteristics of the identified and statistically significant biomarkers associated with the RIPC investigation. The biomarkers reported in this research were assigned to identification level 1, according to the current MSI reporting standards. From the data acquired in negative ionization mode, 68 ions were identified, but none of them were discriminant and statistically significant between the RIPC and control groups.
The results obtained corroborate the findings of the LC-MS/MS targeted metabolomics study previously performed 12. The 3 main biomarkers identified by that study (glycine, kynurenine, ornithine), and confirmed in rats and humans, were found in the current work. In both studies, RIPC was associated with a decrease in plasma ornithine and an increase in plasma kynurenine and glycine concentrations in rats. Spermine and carnosine were not found in the current investigation, while serotonin was detected but did not exhibit high statistical reliability. As shown in Figure S5, serotonin did not pass our selection filters: its jack-knife confidence interval included zero, its VIP was below 1, and it was not statistically significant (p > 0.05) in the unequal-variance two-tailed t-test.
Three additional cardioprotective metabolites were further evidenced in the present work, i.e., Marker A, … and different laboratories and researchers.

CONCLUSION
The analytical strategy reported here performs adequately, and the workflow proved its applicability to metabolomics investigations. It allows a consistent exploration of metabolic signatures, with identification and confirmation of biomarkers in a single experimental sequence, using an in-house metabolite library containing accurate m/z values, retention times, isotopic patterns, and full MS and MS/MS spectra. Importantly, the three main biomarkers identified by a previous quantitative MS/MS targeted metabolomics investigation were found in the current investigation, and three additional potentially cardioprotective metabolites were identified. The six biomarkers found in the study were confirmed by RPLC- and HILIC-HRMS. Altogether, these findings show that reproducible metabolomics results can be achieved across different analytical platforms. Further perspectives include the application of the current pipeline to investigations with larger sample cohorts.