Convergent consequences of parthenogenesis on stick insect genomes

The shift from sexual reproduction to parthenogenesis has occurred repeatedly in animals, but how the loss of sex affects genome evolution remains poorly understood. We generated reference genomes for five independently evolved parthenogenetic species in the stick insect genus Timema and their closest sexual relatives. Using these references and population genomic data, we show that parthenogenesis results in an extreme reduction of heterozygosity and often leads to genetically uniform populations. We also find evidence for less effective positive selection in parthenogenetic species, suggesting that sex is ubiquitous in natural populations because it facilitates fast rates of adaptation. Parthenogenetic species did not show increased transposable element (TE) accumulation, likely because there is little TE activity in the genus. By using replicated sexual-parthenogenetic comparisons, our study reveals how the absence of sex affects genome evolution in natural populations, providing empirical support for the negative consequences of parthenogenesis as predicted by theory.

The raw mate-pair reads were de-linked and reverse complemented using NxTrim (v. 0.4.1) (82) with the parameter "--preserve-mp". Unlinked pairs without identified adapter sequence, called unknown pairs, were also considered as valid mate pairs as they had a similar distribution of insert sizes as mate pairs with identified linker sequence.
Scaffolds without hits to metazoans were removed from the assemblies. The genome assembly completeness was assessed with BUSCO (v. 3.0.2) (21) against the insecta_odb9 lineage and the -long option. For genome annotation, we took a total of 231 publically available RNA-seq libraries (Bioproject Accessions: PRJNA679785, PRJNA678950, PRJNA380865, PRJNA392384) for Timema from different tissues, sexes and developmental stages as expression evidence (min per species = 12, see Table S8) (34,50,51). Before mapping reads to the genomes, adapter sequences were trimmed from raw reads with CutAdapt (v. 1.15) (87). Reads were then quality trimmed using Trimmomatic (v. 0.36) (81), clipping leading or trailing bases with a phred score of <10 from the read, before using a sliding window from the 5' end to clip the read if 4 consecutive bases had an average phred score of <20. Any reads with a sequence length of <80 after trimming were discarded. All trimmed RNA-seq reads were then mapped against the genomes as single end reads using STAR (v. 2.5.3a) (88) under the "2-pass mapping" mode and default parameters. The STAR outputs were then used to produce transcriptome assemblies using Trinity (v. 2.4.0) (52) "genome guided" mode (Parameters: --genome_guided_max_intron 100000 --SS_lib_type R). Finally, the transcriptome assemblies were filtered following Trinity A combination of UniProtKB/Swiss-Prot (release 2018_01) (92) and the BUSCO insecta_odb9 proteome were used as protein evidence. The Trinity assembled RNAseq reads (described above) were used as transcript evidence. The resulting gene models were then used to retrain Augustus as well as SNAP (v. 2013.11.29) (93) and a second iteration was performed. Predicted protein coding genes were then functionally annotated using Blast2GO v5.5.1 (94,95) with default parameters against both the NCBI non-redundant arthropods protein database, and the Drosophila melanogaster (drosoph) database, to produce two sets of functional annotations, one derived from all arthropods and one specifically from Drosophila melanogaster.

Horizontal Gene Transfers (HGTs) are not facilitated by parthenogenesis
Genomic analyses of bdelloid rotifers, a group that likely persisted and diversified in the absence of canonical sex for over 40 million years (96), revealed that bdelloids carry an unusually large amount (6.2% -9.1%) of horizontally acquired genes compared to sexual lophotrochozoans (0.08% -0.7%) (16,(97)(98)(99). Unusually high proportions of HGT-derived genes were also identified in parthenogenetic root-knot nematodes (100,101) and springtails (102). These findings led to the suggestion that parthenogenesis might favor the retention of horizontally acquired genes, and may perhaps confer adaptive benefits that could compensate for the absence of recombination and outcrossing (100)

Analysis of heterozygosity for SNPs and SVs
We present two estimates of heterozygosity, one based on a reference-free technique (kmer spectra analysis using Genomescope (v. 2) (22), the other using sequencing reads mapped to reference genomes to call SNPs with GATK ((62), see Methods).
The kmer spectra of all sexual species displayed distinct haploid coverage peaks representing heterozygous kmers (Fig. S9A), contrasting with the kmer spectra of parthenogenetic species, where no distinct peaks were visible (Fig. S9B). To confirm that no heterozygous kmers were present at the expected haploid coverage in parthenogens, we used Smudgeplot (22), a technique to extract closely related kmer pairs representing heterozygous and paralogous kmer pairs. While in the sexual species, kmers from the 1n peak paired together in heterozygous kmer pairs (AB smudge on Fig. S9C), no diploid kmer pairs were detected in the parthenogenetic species (Fig. S9D). We conclude that heterozygosity estimates for the parthenogenetic species cannot be based on k-mer spectra analyses because the heterozygosity levels are too low to reliably fit the distribution estimating haploid kmers in the kmer spectra. Unreliable heterozygosity estimates based on k-mer spectra analyses for species with very low heterozygosity was already reported in Jaron et al. (16), suggesting that with the current quality of sequencing data, kmer methods do not have resolution for very small heterozygosity levels.
Because we could not estimate heterozygosity of parthenogens using kmer-spectra analyses, we estimated nucleotide heterozygosity using SNP calling. It is important to note, however, that this method generates an underestimation of heterozygosity given our fragmented reference genomes ( We further investigated if there were any heterozygous structural variations in parthenogenetic Timema, as those could be potentially hidden to SNP analysis. Consistent with the previous two analyses, the SV heterozygosity levels were substantially lower in the parthenogens than in their sexual sister species (Fig. 2).
However, we also detected a non-negligible amount of heterozygous structural variants. We therefore manually curated all heterozygous structural variants found in T. monikensis using samplot (v1.0.1) (106), but did not find a single variant clearly supported by reads (results not shown). Since structural variant calling from short read data has a high rate of false positives regardless of the method used (107), we decided to verify variants using a PacBio long-read dataset (~32x coverage) for one of the parthenogenetic species (T. douglasi; PRJNA673001). We assembled the long-read data of T. douglasi using Redbean (formerly wtdbg; v2.5) assembler (108) with parameters recommended for moderately sized genomes: -L 1000 -x preset3 -g 1300m. This genome assembly was used for SV calling using ngmlr (v0.2.7) and the Sniffles (v1.0.11) pipeline (109) with default parameters for SV calling using long read data. In total, we found only 6 heterozygous SVs: 4 deletions and 2 insertions. We visualized the SVs alongside their read support using samplot and found that none were well supported ( Fig. S10) suggesting that the heterozygous SVs called using short read data represent noise in the absence of a signal from real heterozygous SVs.
In conclusion, we used four complementary approaches based on three different data sources: kmer spectra analysis on raw sequencing reads of the reference individuals, SNP and SV heterozygosity estimates using variant calling based on resequencing data, and finally a long read dataset of T. douglasi, which was independently assembled and is therefore free of any potential biases introduced in a short read assembly. Our analyses comprehensively show the absence of heterozygous loci in the parthenogenetic Timema genome assemblies. Residual heterozygosity could be potentially found in repetitive regions, such as centromeres and telomeres (see also above), as all our effort to detect heterozygosity focused on alleles with 1n coverage (half of the genome coverage). However, detecting heterozygosity in such regions requires chromosome-scale assemblies based on long-read sequencing technologies, which are currently not available for parthenogenetic Timema.

Locating microsatellite markers in the genome assemblies
Previous research, based on microsatellite markers, suggested that oogenesis in parthenogenetic Timema was functionally mitotic, as there was no loss of heterozygosity between females and their offspring (17). Yet our genome data reveal complete or almost complete homozygosity in the genome assemblies of parthenogens (see main text). The most likely reconciliation of these contrasting results is that heterozygosity is maintained in only a small portion of the genome, for example the centromeres or telomeres, or between paralogs.
To investigate these possibilities, we searched for the primer pairs used to amplify the nine microsatellites in the genome assembly v1.3 of the sexual species T. cristinae from Nosil et al (32). This assembly is currently the most complete and least fragmented Timema assembly available, and the microsatellites used by (17) were originally developed for T. cristinae. We used Blast to find primer pairs <500 bp apart, on opposite strands, and retained significant hits with at least 80% of the primer sequences covered. We then verified whether the retained hits comprised the expected microsatellite repeat motif.
Using this approach, we were able to locate six of the nine microsatellites in the v1.3 assembly (Table S9). Two of the six microsatellites had multiple hits in the genome (Table S9). In combination, these results support the idea that microsatellite heterozygosity detected in Timema parthenogens may be a combination of heterozygosity in centromere or telomere regions (microsatellites not detected in the assembly) and heterozygosity between paralogs (microsatellites with multiple copies in the T. cristinae assembly).

Polymorphism in parthenogenetic and sexual Timema populations
To compare the distribution of polymorphism along different genomes, we mapped population-level variation for SNPs and SVs inferred from 2 to 5 re-sequenced individuals per population to our species-specific reference genomes (see main text).
We then anchored our reference genome scaffolds to the 12 autosomal linkage groups PRJNA725673) and females (this study) revealed that LG13 did not correspond to the X. We also removed X-linked scaffolds assigned to autosomal LGs in the v1.3 T.
cristinae assembly and used this "cleaned" set of linkage groups (referred to as v1. 4) in all our analyses with positional information. Depending on the species, we were able to anchor between 59 and 558 Mbp of our genomes to the T. cristinae LGs.
We also examined how genetic variation was distributed between individuals by producing phylogenetic trees for the re-sequenced and reference individuals of each  (113). Trees were generated with RAxML (61), with a GTR+gamma model with 40 rate categories for each codon position (i.e. each codon position (1st ,2nd, 3rd) was partitioned to allow a distinct model to be fitted to it) to produce an ML tree with 1000 bootstraps.

Polymorphism and color morphs on LG8 in the species T. californicum and T. monikensis
We found very high population polymorphism for both SNPs and SVs on LG8 in T. LG8, while green individuals had 1/1 genotypes (Fig. S11B).
We also investigated the presence of an approximately 0.5 Mb deletion on LG8, suggested by Villoutreix et al. (115) to determine the green morphs in T. californicum, in our ~24 Mbp long haplotypes. Because our reference genome was based on a grey individual (which would be homozygous for the deletion-free haplotype), we expected to observe normal coverage for this region in the re-sequenced grey individual, zero coverage in the green re-sequenced individual homozygous for the alternative haplotype, and half the coverage in the three heterozygous individuals. We observed a coverage reduction in all individuals at the focal region, which could be due to an enrichment in repetitive sequences (non-uniquely mapping reads are not included for coverage estimations). Nevertheless, the grey individual featured somewhat higher coverage, consistent with the deletion suggested by Villoutreix et al. (115) (Fig. S12).
Further studies are required to characterize the contribution of the two divergent, ~24 Mbp long haplotypes, and the putative 0.5 Mbp deletion in one of the haplotypes, to color polymorphism in T. californicum.

Transposable element activity in the Timema genus
The overall TE content was very similar in all ten species (20 -23.6%), and different only by 1.3% between the two clades separating at the ancestral node (see main text). We found that TE copies with high similarity (indicating recent duplication) had low overall abundance, supporting the idea that recent TE activity is low across Timema species (Fig. S13). Between sexual and parthenogenetic sister species, highly similar TE copies appear to be more abundant in parthenogenetic than sexual species. This result however is likely artefactual, caused by the systematically better genome assembly quality in the parthenogenetic than sexual species (Table S3), meaning similar TE copies will be collapsed more often in the sexual than parthenogenetic genome assemblies. This effect can be shown by the fact that the coverage of low divergence TEs is systematically higher for sexual species than parthenogenetic species, but this is not seen for bins with higher divergence (Fig. S14).          The region with the expected deletion is highlighted in yellow. If there was a deletion on LG8 determining the green morph, grey individual Tcm_02 should feature higher coverage than the other individuals which are green.

Fig. S13. TE landscapes indicate recent TE activity is low. Kimura substitution
level is a measure of divergence of the TE copies to the consensus TE sequence.
Recent TE expansions would cause high peaks close to 0. Note that the plots for closely related Timema species are very similar (i.e., the abundance peaks are at similar divergence levels), reflecting the TE activity in their common ancestors. Also note that although TE copies with small divergence levels (up to ~3%) appear to be more abundant in parthenogenetic than sexual species, this is likely an artefact caused by the systematically better genome assembly quality in the parthenogenetic species (see main text and Fig. S14), with similar TE copies more often collapsed in the assemblies of sexual than parthenogenetic species.

Fig. S14. Coverage of TE landscapes shows TE copies with low divergence from the consensus TE sequence (measured as Kimura substitution level) is higher
for sexual species than parthenogenetic species. This indicates that similar TE copies are more often collapsed in the assemblies of sexual than parthenogenetic species. Coverage was normalised by the median TE coverage for each species separately.