Multiscale analysis of genome-wide replication timing profiles using a wavelet-based signal-processing algorithm

In this protocol, we describe the use of the LastWave open-source signal-processing command language (http://perso.ens-lyon.fr/benjamin.audit/LastWave/) for analyzing cellular DNA replication timing profiles. LastWave makes use of a multiscale, wavelet-based signal-processing algorithm that is based on a rigorous theoretical analysis linking timing profiles to fundamental features of the cell's DNA replication program, such as the average replication fork polarity and the difference between replication origin density and termination site density. We describe the flow of signal-processing operations to obtain interactive visual analyses of DNA replication timing profiles. We focus on procedures for exploring the space-scale map of apparent replication speeds to detect peaks in the replication timing profiles that represent preferential replication initiation zones, and for delimiting U-shaped domains in the replication timing profile. In comparison with the generally adopted approach that involves genome segmentation into regions of constant timing separated by timing transition regions, the present protocol enables the recognition of more complex patterns of the spatio-temporal replication program and has a broader range of applications. Completing the full procedure should not take more than 1 h, although learning the basics of the program can take a few hours and achieving full proficiency in the use of the software may take days.


IntroDuctIon
Replication of eukaryotic genomes is an essential process that guarantees the accurate copying of genetic information before cell division. The regulation of this process during the synthesis phase (S phase) of the cell cycle is fundamental for genome stability. Replication starts from a set of initiation loci, called replication origins, where two replication forks are assembled and begin replicating DNA while proceeding in opposite directions, away from the loci; fork progression continues until two converging forks 'collide' at a terminus of replication 1,2 .
The DNA replication program in a cell is defined as the temporal sequence of locus replication events during the S phase. The program depends on the locations of the replication origins, their activation times and the speed at which replication forks move along the DNA double helix. Among these parameters, the location of the active replication origins has been the subject of extensive experimental interest. In Saccharomyces cerevisiae, replication origins are well defined by 100-to 150-bp-wide loci that present an 11-bp consensus sequence motif 1,2 . By contrast, the detection of replication origins in higher eukaryotes has proven to be more elusive. For many years, only tens of origins had been experimentally identified in metazoan genomes, including the human genome 3,4 , to be compared with the tens of thousands that are required to complete each replication cycle in mammals. The number of identified origins has markedly increased owing to high-throughput approaches based on the purification of replication bubbles and of small nascent DNA strands [5][6][7][8][9][10][11] . For example, a few hundred replication origins were recently mapped over ≤1% of the human genome 7,8 , and several thousands were mapped over 60 and 120 Mb of the mouse and Drosophila genomes, respectively 10 .
Note that comparative analyses of experimentally determined replication origin data sets for the same organism proved to be rather disappointing with low overlap (from < 5 to < 25%), and this inconsistency was observed even when the same technique was used 8,9 . The reasons for these discrepancies are still not known 4 , but they could be the result of insufficient origin DNA purification and amplification biases. Nevertheless, these analyses clearly point to the flexibility in replication origin usage across developmental stages and underline the importance of the local context, such as the chromatin state, the transcriptional activity and the activation patterns of neighboring replication origins [12][13][14][15][16][17] . Although the principles of replication initiation are likely to be conserved across eukaryotes, they are complex and far from being fully understood, particularly in higher eukaryotes 3,[18][19][20] .
The fact that robust methodologies exist to analyze the DNA replication program averaged over large populations of cells contrasts with the extreme difficulty of identifying individual replication origins 19,21,22 . Most methods of analysis of DNA replication programs consist of the initial sorting of cells according to their advancement in the cell cycle; for each locus, either the DNA contents of cells in the S and G1 phases, respectively, are compared or this comparison is performed between the amounts of newly synthesized DNA in two or more fractions of cells in the S phase 19,21 .
In all cases, at the end of the analysis, each locus is associated with a measure of the production of DNA as the S phase progresses, a measure that is assumed to be representative of the average replication timing of the specific locus in the cell population under investigation. However, the exact relationship between changes in the extent of DNA production and the mean timing of replication Multiscale analysis of genome-wide replication timing profiles using a wavelet-based signalprocessing algorithm along the S phase is usually not known. For example, regardless of the length of the S phase, the ratio between the DNA content in cells in the S phase and that of cells in the G1 phase (S/G1 ratio) is close to 2 at early-replicating loci, whereas it is close to 1 at latereplicating loci.
Today, replication timing profiles are available for several eukaryotic organisms ranging from yeast 23 to Drosophila 15,24,25 to mice 26,27 and to humans [28][29][30][31][32][33][34][35] . These data have been extensively analyzed in relation to transcriptional activity, chromatin state and genomic features, such as gene density and GC content. In metazoa, significant correlations have been observed between loci that are replicated early in the S phase and regions that present a strong transcriptional activity, an open chromatin state, a high gene density or a high GC content 15,[24][25][26]29,31,33,34 .
Replication timing profiles have also been used to identify replication domains along chromosomes. A frequently adopted approach is to look for constant timing regions (CTRs) using algorithms designed to delimit regions of locally flat profile, such as those developed to analyze variation in genome copy number (microarraybased comparative genomic hybridization) 22,26,27,29,32,36 . Alternatively, timing transition regions (TTRs) have been extracted from replication timing profiles directly by looking for long, steep transition regions of constant slope 33 . Such segmentations of replication timing profiles implicitly assume a crude dichotomous nature of replication domains, in which CTRs are regions of coordinated origin firings and the intervening TTRs are originless regions replicated by the unidirectional progression of a single fork 26,27,33 . Recent DNA-combing data have questioned these interpretations 37 .
Here we propose a novel method for analyzing replication timing profiles that does not assume this dichotomy.
For humans, mice and Drosophila, genome-wide replication timing data have been collected for several cell types 25,31,[33][34][35]38,39 ; this provides the opportunity to study changes in the replication program in relation to cell differentiation and across evolutionary history. Evidence shows that each cell type presents specific replication timing patterns; in fact, different cell type-dependent timing patterns occur in the replication of more than half of the human and mouse genomes 27,34 . In that respect, cell differentiation is concomitant with global changes in replication timing profiles, and some cell type-specific timing patterns are conserved between corresponding cell types 27,38,39 .
Single-cell determination of the replication timing profile cannot be currently achieved experimentally. As mentioned above, replication timing data are instead obtained from large cell populations (thousands to millions), and thus they only determine a population's average replication program. If we assume a nearly deterministic replication program, in which at each cell cycle nearly the same set of replication origins is used, and in which each replication origin fires at (roughly) the same time from cell cycle to cell cycle, then the average replication program reflects faithfully what occurs at the individual cell level. However, a growing body of evidence suggests that the replication program is stochastic in nature [40][41][42][43] and that no two replication cycles actually occur via the same set of replication origins and the same firing times.
If we accept the stochasticity of the replication program, some care is therefore needed in interpreting mean replication timing profiles [44][45][46] . For instance, the gradient of the mean replication timing profile has been proposed as a possible measure of the replication fork velocity 23 . However, this approximation holds true only for a nearly deterministic replication program. In fact, as replication forks propagate, the replication timing at a given locus depends not only on the local initiation properties, but also on the initiation properties of neighboring loci 44,47 . Replication origins have also been proposed to correspond to the minima of the mean replication timing profile 23 , but this intuitive claim turns out to be incorrect because of the confounding effect of passive replication by forks originating from nearby replication origins [44][45][46] . A careful and rigorous analysis is therefore necessary to measure replication kinetic parameters from mean replication timing profiles 44,47,48 . A very challenging 'inverse' problem is to infer the underlying initiation properties from the replication timing data 47,49,50 . Preliminary studies suggest that the time-dependent rate of origin activation averaged over the entire genome is conserved across eukaryotes 49,51 .
With the perspective of extracting information about the replication program directly from the DNA sequence, our research group carried out an in silico analysis of the strand composition asymmetry, also called skew (a measure of the difference of nucleotide compositions between the two complementary DNA strands), along the human genome in relation to replication 52,53 and transcription 54,55 . We found evidence that the sign of the skew abruptly changed at known origins, presumably reflecting the inversion of replication fork polarity at these sites as expected at replication origins from which emerge two divergent replication forks. We identified more than a thousand putative replication origins where the skew showed a similar abrupt inversion. These origins border 663 genomic N-domains, which are so called because their skew profile has an N-like shape that is attributed to differences in the mutation rates between the complementary DNA strands associated with the replication process 53,56-60 . To the best of our knowledge, these data provide the largest set of human replication origin predictions available to date.
Overall, skew N domains have a mean length L = 1.2 ± 0.6 Mb and cover 29.2% of the human genome 57,58 . Notably, the putative origins at N-domain borders were also shown to be at the heart of a remarkable gene organization 57 : genes neighboring these borders are abundant and broadly expressed, and their transcription is mainly directed away from the borders. These features (gene density, expression breadth, preferential gene orientation) weaken progressively as the distance between genes and these domain borders increases. N-domain borders were hypothesized to be replication origins active in the germ line, possibly specified by an open chromatin structure favorable for early replication initiation and permissive to transcription 61 . However, no data on germ-line replication origins enabled us to test this hypothesis. The analysis of human replication timing data 31 showed that a significant number of N-domain borders are initiation zones that replicate earlier than their surrounding regions during the S phase, whereas N-domain central regions replicate late 56 . These very promising results have been confirmed in recent analyses 59,62 of newly available genomewide replication timing data in several human cell types [33][34][35]38,39 .
In contrast with the viewpoint that replication domains are CTRs, we have shown that skew N-domains correspond to U-shaped replication timing patterns (in which the derivative of the replication timing is N-shaped) that are not specific to the germ line but are generally observed in multiple human cell types 62 . The early initiation zones bordering the replication timing U-domains have been found to be significantly enriched in open chromatin markers, including the insulator-binding protein CTCF 63,64 , and to be prone to gene expression.
Analysis of genome-wide chromatin conformation capture (Hi-C) data 65 has revealed a link between replication timing and spatial chromatin organization 38 . Further study has highlighted replication timing U-domains as self-interacting structural chromatin units 62 . Hence, the segmentation of the genome into replication U-domains coincides with the segmentation into topological domains identified by the analysis of chromatin interaction data 66 . Together, these results make a compelling case that the 'islands' of open chromatin observed at replication timing U-domain borders are at the heart of a compartmentalization of chromosomes into chromatin units of independent replication and of coordinated gene transcription 62 .
We present here the original wavelet-based multiscale methodology that we have developed for the analysis of replication timing data in order to extract information about the mechanisms underlying the spatio-temporal replication program in higher eukaryotes 35,37,59,62 . Our approach takes advantage of the knowledge of the replication timing continuously along the chromosomes.
In accordance with rigorous mathematical arguments, we consider the timing values per se, but we also consider the space derivatives (i.e., the local shape) of the timing profiles. In contrast with model-fitting strategies 47,67 , in this approach we extract information independently of any prior knowledge of parameters such as potential replication origin locations. By partitioning the genome on the basis of the apparent speed of replication (the inverse of the timing profile slope) 37 , we are in the position to question the reported absence of replication origins within TTRs 26,27,33 and to determine the extent of origin synchrony within CTRs. Moreover, as explained in the theoretical background section, given reasonable assumptions, the described wavelet-based methodology enables us to extract fundamental parameters, such as the average replication fork polarity and the difference between the densities of replication origins and termination sites 62 .
In this protocol, we describe the main steps of this signal processing strategy when applied to mean replication timing profiles in HeLa cells that have been fully calibrated to real time along the S phase 37 . Note that if this normalization is not possible, then replication time and speed estimates will not be absolute but will be relative to the total duration of the S phase. The absence of time calibration is a limitation to our protocol (as well as of any other method used so far), and so are the limited temporal and spatial resolutions of experimental replication timing data that are variable from one experiment to another. Our protocol has been implemented using LastWave (http://www.cmap.polytechnique. fr/~bacry/LastWave/), an open-source signal-processing command language. The PROCEDURE section is devoted to the description of the distinct LastWave commands and scripts that constitute the standard protocol for (i) loading an experimental timing profile, (ii) computing a space-scale map of the apparent speed of replication, (iii) detecting peaks in the timing profile as putative replication origins and (iv) delineating replication timing U-domains.

Applications
The wavelet-based multiscale methods presented here have a broad range of applications, for example, in physics, geophysics and image processing [68][69][70][71] . In the context of the analysis of biological data 60 , they have been used by our group (i) to demonstrate the existence of long-range correlations in DNA sequences related to the nucleosomal structure 72-74 , (ii) to characterize the organization of human and microbial genomes [75][76][77] , (iii) to analyze strand compositional asymmetry profiles in relation to transcription and replication 58,78 , (iv) to assist diagnosis in digitized mammograms 79 and (vi) to determine the shape of chromosome territories from confocal microscopy images 80,81 . Finally, we are currently developing novel procedures to define, from chromosome conformation data 65 , an objective segmentation of the human genome into topological chromatin domains 66 , which will enable us to quantitatively assess the relationship between replication domains and chromatin domains.

Theoretical background
The central hypothesis at the basis of our approach to extracting fundamental clues about the spatio-temporal program of DNA replication from mean replication timing profiles is that the replication fork speed v is constant. In this scenario, the spatio-temporal program of replication of one cell cycle is completely specified by the position x i and the firing time t i of the n activated bidirectional replication origins O i (refs. 44,47). Upon definition of T i as the termination locus in which the replication fork coming from O i meets the replication fork coming from O i + 1 (Fig. 1), straightforward calculations lead to the space-time coordinates (y i , u i ) for T i : and to the replication timing profile r(x) and replication fork polarity p(x) around origin O i (y i − 1 ≤x ≤y i ): where sign(u) = + 1 if u ≥ 0 and = − 1 if u < 0. Finally, by using the Dirac function δ to represent origin locations δ(x − x i ) and termination sites δ(x − y i ) ( Fig. 1c), we obtain the following fundamental relationships: In other words, we can extract, up to a multiplicative constant, the fork polarity p(x) (Fig. 1b) and the location of origins and of termination sites (Fig. 1c) by simply taking successive derivatives (equations (3) and (4)) of the timing profile r(x) (Fig. 1a).
As mentioned, today's experimental replication timing profile protocols require a large number of cells (millions), so that only population statistics data can be derived. Moreover, experimental timing profiles 31,[33][34][35]38,39 have a finite spatial resolution (tens of kb at best). For application to experimental replication data, as taking the spatial derivative commutes with statistical and spatial average, equations (3) and (4) remain valid after averaging: where 〈.〉 Cells,∆x stands for the average over many cells and over the spatial resolution ∆x and N x x Cells Ori , ( ) ∆ and N x x Cells Ter , ( ) ∆ are the number of origins and termination sites, respectively, per unit length averaged over many cells and the spatial resolution of ∆x. Note that when the speed of the replication fork is inhomogeneous, but fork speed fluctuations do not depend either on replication (6) (6) timing or on chromosomal position, then equations (5) and (6) remain valid, with v representing the average replication speed.
A concern about experimental data is the presence of noise. Strictly speaking, the derivative of a noisy profile is not defined and, correspondingly, the naive derivative of a profile based on the numerical difference between two successive samples is ill defined and numerically unstable. However, the derivative of a smoothed version of a noisy profile is well defined. In other words, the rate of signal variation has to be estimated over a sufficiently large number of data points. This objective can be achieved using the (continuous) wavelet transform (WT), which provides a powerful framework for the estimation of signal variations over different length scales 68,69 .
The WT is a space-scale analysis, which consists in expanding signals in terms of wavelets that are constructed from a single function, the analyzing wavelet ψ, by means of dilations and translations 68,69,82,83 : x and a ( > 0) are the space and scale parameters, respectively, and α is the normalization exponent. When using the derivatives of the Gaussian function, namely , then the WT of a profile f has the following property: of the Gaussian function. The regularization (smoothing) process underlying the use of the WT is at the heart of various applications of the WT microscope as a very efficient multiscale technique to detect singularities (e.g., discontinuities) of functions 68,69,[82][83][84] . Note that the norm of the rescaled Gaussian N a ( ) 0 is 1, so that the convolution N f a ( ) 0 * is a moving average of the f profile. When applied to replication timing profile 〈r(x)〉 Cells,∆x the spatial resolution in equations (5) and (6) becomes ∆x + a. It reduces to a for a >> ∆x and to ∆x for a << ∆x (at small scales, the resolution remains limited by the intrinsic spatial resolution of the data). Ori.

Spatial position
The fundamental hypothesis at the base of these representations is that the replication fork velocity v is constant.

EQUIPMENT
A computer running Unix/Linux or Mac OSX with access to the Internet  crItIcal Windows users should refer to the relevant comments in Equipment Setup. DNA replication timing profile data formatted as a column text file, in which a value of -1 is imposed for missing data points  crItIcal Other data file formats can be used, but the PROCEDURE detailed below must be modified accordingly (see the TROUBLESHOOTING section).
Branch of the LastWave signal-processing command language (freely available from our group (B.A.)). As compared with the main branch of the program, this branch implements the color-coding of WTs on the basis of the signed  loading the timing profile 2| Change the working directory to the directory in which the data file is stored. As an example, in this protocol, the HeLa cell timing profile of chromosome 1 determined in a 100-kb-wide window that slides along from the start of the chromosome by incremental steps of 10 kb (the sampling period is thus 10 kb) is uploaded 37 . The name of the one-column text file that contains these data is chr1_timing_HeLa_w100kbp_skip10kbp.dat, which can be found in replication_ timing/data. file cd _scriptDirectory[0] + "/replication_timing/data" 3| Upload the timing profile from a column text file into LastWave. Provide the name of the data file, as well as the coordinate of the first point and the distance between data points, to the loading routine; for example, for the data file provided, use 0.05 0.01 to have spatial axis expressed in Mb units. Note that the loading routine assumes that timing values for windows in which timing cannot be measured are set to − 1.
{rep_timing masked_regions masked_sig} = [load_timing_data "chr1_timing_HeLa_w100kbp_ skip10kbp.dat" 0.05 0.01] The routine returns a list containing three elements: the timing profile signal (rep_timing), in which missing data have been linearly interpolated; a binary signal (masked_sig) that is set to 1 in data-less regions and to 0 outside of them; and a list of signals of size 2 (masked_regions) corresponding to the start and end points of each of these data-less regions. The variables masked_sig and masked_regions keep track of the regions with missing data. Ensure that things look as expected by using the command disp to display the timing profile and the binary signal. The researcher will obtain the WT described by equation (7) [4,2]; see the LastWave manual). As a min = 5.0, and the sampling period is 10 kb, when the scale assumes its minimum value the smoothing Gaussian has a standard deviation of 50 kb. As a max = 5.0×2 (6 − 1) + 9/10 ≈299, when the scale assumes its maximum possible value the smoothing Gaussian has a standard deviation of 2.99 Mb. Option -e enables to specify the normalization exponent α of the WT (equation (7)); in order to obtain the first and second derivatives of the smoothed replication timing profile, according to equation (8), α is set to − 2 and − 3, respectively.  crItIcal step Beware that the actual meaning of a specific value for scale a is relative to the definition of the analyzing wavelet. The distance between the two extrema of N (1)  disp rep_timing w_N1 w_N2 -..* -norm 'signedglobal' 9| Explore specific regions of the display of the timing profile and of the space-scale representation of its first and second derivatives obtained in Step 8 using the mouse buttons (see LastWave manual for a description of the different zooming modes). As illustrated in Figure 2, when smoothing the timing profile over large distances (large values of scale a), both its first and second derivatives (equation (8)) appear black (null values of the derivatives), as the signature that the average fork polarity (equation (5)) and the difference between replication origin and termination site densities (equation (6)) are null. At small scale, red (positive values) and blue (negative values) regions are visible; these regions correspond to areas of positive and negative average fork polarity and to areas with either a density of replication origins larger than the density of termination sites or the opposite, respectively (Fig. 2).  The resulting space-scale map (Fig. 2d) presents three types of regions: at small scales, very few black regions are present; by definition, these are regions in which the timing profile space derivative is > 0.5 (these regions are given a value of 0 in the map), which corresponds to an apparent speed under 2 kb min − 1 . Green regions (areas given a value of 1 in the map) correspond to an apparent speed between 2 kb min − 1 and 10 kb min − 1 . Red regions (areas given a value of 2 in the map) correspond to an apparent replication speed over 10 kb min − 1 . Single fork speed estimates in higher eukaryotes are of the order of 1 kb min − 1 (refs. 37,85,86), which suggests that, for the timing profile analyzed here, only very few regions of size > 100 kb are replicated systematically by a single fork 37 (ANTICIPATED RESULTS).  crItIcal step If one assumes that, regardless of the individual cell taken into account, a specified DNA region R is systematically replicated by one individual fork moving across the region with a specified direction, then | < p(x) > Cells,R | = 1, and the derivative of the timing profile within the mentioned R region (equation (5)) is an estimate of the inverse of the fork speed. This assumption leads to define the inverse of d ,∆ as the local apparent speed of replication, that is, the speed of a single fork that would result in the same local slope of the timing profile. As the timing profile has a null derivative-an infinite apparent replication speed-in many loci, instead of taking the inverse of the WT obtained with N (1) , one can define space-scale regions belonging to different speed categories; for instance, | | . ,∆  ), as it is expected at the tip of a peak symmetrical about a vertical axis. As expected for regions that contain a replication origin, such loci indeed correspond to regions with null average fork polarity (equation (5)) and a higher value for the replication origin density than for the termination site density (equation (6)).  crItIcal step In mathematics and signal processing, graphs are habitually orientated in such a way that Y values increase upward. According to this convention, loci containing strong replication origins would be expected to appear as downward-pointing peaks, that is, loci with a positive local curvature. To follow instead the common practice in biology, here we choose to display the replication timing axis from the top to the bottom (early S phase at the top of the graph). This objective is achieved by setting the value of gclass.FramedView.default.reverse to 'none' in the file replication_timing/rep_timing.mlw.  15| Display the results of Steps 12-14 according to the example reported in Figure 3 and explore different regions of the profile using the mouse buttons. It is apparent in Figure 3d that connected areas of space-scale regions determined in Step 15, which present a large vertical extension up to small scales, highlight the existence of well-defined peaks in the timing profile. This vertical extension guarantees the robustness of the detection of the underlying putative replication origins with respect to the scale of observation. ? trouBlesHootInG Identifying u-shaped domains along replication timing profiles 16| If early replicating regions often correspond to peaks in the timing profile, the situation is different in late-replicating regions in which the profile frequently presents an approximately parabolic shape. In fact, a local U-shaped profile bordered by early replicating regions is a recurrent motif in replication timing profiles, which defines replication U-domains along the genome 62 . To systematically identify these domains, look for regions in the timing profile that are bordered by points that are local maxima of the curvature (corresponding to the location in which the U-shape breaks) and that have substantial negative curvature in their central regions, which is the hallmark of a parabolic curve. Note that in these U-domains, the derivative of the replication timing profile has an N-like shape, which indicates the existence of a gradient of replication fork polarity 62 , a characteristic previously recognized for the germ line in the skew profiles along the human chromosomes 52,53,56-58 .
17| Compute the WT of the replication timing profile with N (2) and the scale normalization exponent α = −1. Use the command extrema to compute the extrema representation associated with this WT (at each scale, the maxima of the modulus of the WT are computed and stored in an extrema representation variable w_N2.extrep; see LastWave documentation and online help: help extrema).  crItIcal step The convenient normalization exponent α of the WT (equation (7)) to estimate the threshold in the central curvature c depends on the characteristics of the U-shaped timing motif. If the depth of the U-shaped region is proportional to its width (i.e., the value of the depth is proportional to x 2 / L, where L is half the width), then c is proportional to 1 / L. If the depth of the U-shaped region is constant (i.e., its value is proportional to x 2 / L 2 ), then c is proportional to 1 / L 2 . For a parabolically shaped profile of width 2L, the scale where extreme curvature is observed using the WT is proportional to L. Thus, in order to apply a constant threshold at each scale, in the first scenario a normalization exponent α = − 2 should be used, whereas in the second scenario, the exponent should be α = −1. In this protocol, we emphasize the detection of the largest U-domains, the depth of which is constrained by the duration of the S phase. We therefore choose to represent our data according to the second scenario, which is more permissive than the first for the detection of large U-shaped regions.  (8)) and rejecting all U-domains in which this proportion is larger than 4%).
? trouBlesHootInG step 3 problem: Error message appears when reading the timing data file. possible reason: Incorrect format of timing data file. The loading procedure provided in Step 3 applies to replication timing data formatted as a one-column text file, in which a value of − 1 is imposed for missing data points. solution: LastWave can read data in a wide variety of formats so that users can modify the loading procedure definition in the replication_timing/rep_timing.mlw file in order to comply with their data format. See the commands 'read' and 'listv cread' for alternate ways to read data. help listv cread help read . For the purposes of this protocol, a global signed mapping rule has been implemented that can be accessed with the display option -norm 'signedglobal' used above. The reason for this choice is exemplified by the fact that displaying the derivatives of the smooth timing profile in Step 8 using the default color transfer function (disp rep_timing w_N1 w_N2) instead of the prescribed signed mapping rule (-..* -norm 'signedglobal') would produce pictures with high-intensity blue and red colorations at all scales, which would in turn prevent researchers from making the observation that the amplitude of the derivatives is negligible at large scales (Fig. 2), as discussed in Step 9. Note that users have control of the color maps. A color map from blue to red is set as the default color map in the replication_timing/rep_timing.mlw file using the command colormap current 'b2r'; an alternative rainbow color map can be chosen with the display option -cm 'color'. Color maps can be visualized using cmdisp 'color' and cmdisp 'b2r'. step 15 problem: When displaying the result of the peak detection protocol as exemplified in Figure 3, the user notices space-scale patterns that are not coherent across scales and that are in clear contrast with the well-defined vertical structures marked by the red dashed lines in Figure 3. Step 13). Please note that in the 'opposite' scenario, in which the threshold values are too stringent, users will notice that obvious replication timing peaks are missed, resulting in false negatives.

• tIMInG
Computing time and memory space requirements are not a limitation with today's desktop computers. The complete procedure, as described above, should not take more than 1 h, and it can be reproduced in tens of minutes by an experienced user. However, in order to become proficient, a new user needs to learn about the general syntax and capabilities of LastWave signal-processing command language. LastWave offers comprehensive documentation (that can be downloaded from the website) and online help for most commands (use help < command name > ), as well as interactive demonstrations; just type Demo from the LastWave command line and follow the instructions. Learning the basics of the program can take a few hours, but achieving the proficiency to be able to fully exploit the scripting power of the software and learning how to use the C-API may take days.

antIcIpateD results
The procedure presented here enables researchers to extract quantitative information that can be used to model the spatio-temporal DNA replication program in multicellular organisms. For instance, the histograms of apparent replication speed values reported in Figure 5a and obtained through this protocol show that, in HeLa cells, practically no regions larger than 100 kb replicate at a speed lower than 2 kb min − 1 (ref. 37). This scenario corresponds to the quasi-absence of black regions in Figure 2d. In these cells, the speed of a single replication fork measured using DNA combing was shown to have a narrow distribution around a mean value of ~0.7 kb min − 1 (ref. 37). Accordingly, the histograms of apparent replication speed in Figure 5a indicate that in HeLa cells chromosome 1 has no large DNA regions that undergo replication in a strictly unidirectional manner and that replication timing gradients correspond to a sequential pattern of replication origin activations 37 . Note that this result holds true for all chromosomes. Detection of peaks in the replication timing profile achieved as described in Steps 11-15 (Fig. 3) has been applied to the DNA replication timing profiles of six different human cell types 59 . Comparison of data on the locations of these ( > 1,000) putative replication origins reveals a substantial degree of conservation of replication origins among cell types. These data also provide further evidence in favor of the putative germ-line replication origins previously identified via the analysis of skew profiles 52,53,57,58 .
The procedure detailed here for detecting U-shaped motifs in the replication timing profile (Steps 16-22 and Fig. 4) enabled us to reliably detect megabase-size U-domains (Fig. 5b), which cover about half of the genome in seven human cell types (from 39.6% in lymphoblastoid cell line TL010 to 61.9% in stem cell line BG02) 62 . Notably, Hi-C data 65 show that U-domains in the DNA replication timing profile correspond to self-interacting high-order chromatin structural units. Replication U-domains are thus indicators of a segmentation of the genome into independent replication units, and their identification affords insight into the organization of the replication program in the human genome.