Large Gene Family Expansions and Adaptive Evolution for Odorant and Gustatory Receptors in the Pea Aphid, Acyrthosiphon pisum

Gaining insight into the mechanisms of chemoreception in aphids is of primary importance for both integrative studies on the evolution of host plant specialization and applied research in pest control management because aphids rely on their sense of smell and taste to locate and assess their host plants. We made use of the recent genome sequence of the pea aphid, Acyrthosiphon pisum , to address the molecular characterization and evolution of key molecular components of chemoreception: the odorant (Or) and gustatory (Gr) receptor genes. We identiﬁed 79 Or and 77 Gr genes in the pea aphid genome and showed that most of them are aphid-speciﬁc genes that have undergone recent and rapid expansion in the genome. By addressing selection within sets of paralogous Or and Gr expansions, for the ﬁrst time in an insect species, we show that the most recently duplicated loci have evolved under positive selection, which might be related to the high degree of ecological specialization of this species. Although more functional studies are still needed for insect chemoreceptors, we provide evidence that Grs and Ors have different sets of positively selected sites, suggesting the possibility that these two gene families might have different binding pockets and bind structurally distinct classes of ligand. The pea aphid is the most basal insect species with a completely sequenced genome to date. The identiﬁcation of chemoreceptor genes in this species is a key step toward further exploring insect comparative genetics, the genomics of ecological specialization and speciation, and new pest control strategies.


Introduction
Aphids are model organisms in both evolutionary and applied biology. This status is mainly related to features of their biology that enable them to locate and exploit their host plants (Powell et al. 2006). Evolutionary biologists focus on host plant selection and use in phytophagous insects because species and host races that are specialized on different plants are model organisms for the study of population differentiation in sympatry and ecological speciation (Via 2001;Berlocher and Feder 2002). Ecologically based divergent (i.e., positive) selection is expected to drive the evolution of host plant specialization, and the joint evolution of host preference and host-associated performance is critical to speciation in phytophagous insects in the presence of gene flow (Drès and Mallet 2002). Particularly important is the possibility that the same recognition processes, and so genes, may underlie both preference and performance and incidentally generate assortative mating, overcoming the barrier to speciation caused by recombination (Jiggins et al. 2005;Servedio 2009).
More than 90% of the ;4,000 species of aphid are specialists on one or a few plant hosts and host races exist within many taxa (Bush and Butlin 2004). In particular, extensive studies on host races of the pea aphid (Acyrthosiphon pisum) have shown that these populations are ecologically specialized and genetically differentiated for host use and that reproductive isolation among races has evolved as a by-product of host plant specialization (the reduction of gene flow occurring by a combination of habitat choice, selection against migrants and ecologically based selection against hybrids) (Via 1999;Via et al. 2000;Hawthorne and Via 2001;Via and Hawthorne 2002;Ferrari et al. 2006Ferrari et al. , 2008. The availability of the genome sequence of the pea aphid (The international Aphid Genomics Consortium, submitted) means that this species has enormous potential for understanding the genetic basis of host shifts and, more generally, of adaptive evolution and ecological speciation.
Many aphid species are also known as major crop pests: By damaging crops through direct impact on plant tissues and virus transmission while feeding on their host plants, aphids are responsible for crop losses estimated at hundreds of millions of dollars annually worldwide (e.g., Morrison and Peairs 1998). Therefore, applied research challenges reside in the development of pest control strategies to reduce the impact of aphids on crop plants (Pickett et al. 1997).
Host plant selection by aphids is not a random process; these insects employ a variety of sensory and behavioral mechanisms to locate and recognize their host plants, and chemoreception plays a fundamental role in this host selection process (Pickett et al. 1992;Park and Hardie 2003). Indeed, the feeding and reproduction of an aphid on a plant depends upon the successful completion of a series of behavioral acts based on olfaction and gustation: Firstly, the insect lands on a potential host, responding to visual and olfactory cues (Pickett et al. 1992;Powell and Hardie 2001;Pickett and Glinwood 2007); after settling, aphids probe the cuticle and the epidermis of the leaf with their mouthparts, this probing phase being the key phase in host acceptance as demonstrated in host races of the pea aphid (Caillaud and Via 2000). In addition, the choice of a plant which to land on can also be influenced by the recognition-through pheromone communicationof conspecifics present on a plant (Pope et al. 2007). Studies have already been started to decipher the neurophysiological pathways underlying such chemosensory processes in aphids (Hardie et al. 1995;Visser et al. 1996;Park et al. 2000;Park and Hardie 2004;Kristoffersen et al. 2008) and chemical ecologists have identified some semiochemicals and pheromones that mediate these behaviors (Birkett and Pickett 2003;Del Campo et al. 2003;Pickett and Glinwood 2007;Webster et al. 2008). However, although taste and smell appear to be the main senses underlying host selection in aphids, the genes underlying the detection of chemical compounds in the environment are still entirely unknown in this group of insects in which new insights into the mechanisms of host plant selection would be of great interest for a large community. Moreover, identifying the genes underlying chemoreception is a key step toward understanding the evolution of traits involved in host plant specialization and ecological speciation and the role of ecologically based positive selection in driving adaptive divergence among populations.
During the last decade, large families of genes encoding receptors responsible for detecting chemical stimuli in the environment have been revealed from genomes in various phyla (Matsunami and Amrein 2003;Ache and Young 2005;Bargmann 2006). The insect chemoreceptor (Cr) superfamily of 7TM ligand-gated ion channels (Sato et al. 2008;Wicher et al. 2008) consists of a basal gustatory receptor (Gr) family, of great protein diversity, and a more derived odorant receptor (Or) family of more limited diversity. Both families have been shown to have a primary role in stimulus detection and discrimination (Vosshall and Stocker 2007). The Gr and Or families can be recognized based on sequence homology and their expression and function primarily in gustatory and olfactory neurons, respectively (reviewed, e.g., in Rutzler and Zwiebel 2005;Benton 2006;Hallem et al. 2006;Vosshall and Stocker 2007). Insect Crs are characterized by high sequence divergence, gene duplication and gene loss, and low levels of expression in specific sensory neurons (e.g., Robertson and Wanner 2006;Nozawa and Nei 2007), these features making their discovery highly dependent on genome sequences (e.g., first characterization in Drosophila melanogaster Clyne et al. 1999;Gao and Chess 1999;Vosshall et al. 1999;Clyne et al. 2000;Robertson et al. 2003). As newly sequenced insect genomes have become available, structure and sequence similarity comparisons with already identified receptors have allowed the molecular characterization of the Cr repertoire in several Drosophila species (Guo and Kim 2007;McBride 2007;Gardiner et al. 2008), other Diptera (Anopheles and Aedes mosquitoes: Hill et al. 2002;Kent et al. 2008) and insects in several other orders (e.g., honey bee Apis mellifera: Robertson and Wanner 2006; red flour beetle Tribolium castaneum: Consortium T.G.S. 2008; Engsontia et al. 2008;silkworm moth Bombyx mori: Wanner and Robertson 2008). However, interspecific divergence, scarcity of orthologous groups, and species-specific expansions of particular genes (Krieger et al. 2003;Robertson and Wanner 2006) continue to challenge automated gene prediction software and the process of annotation still requires intensive manual effort, especially for groups like aphids that are distantly related to the taxa already characterized.
Only one protein putatively involved in olfaction has been identified in the entire hemipteroid assemblage (Dickens et al. 1998). However, the pea aphid (A. pisum) genome has recently been sequenced (The international Aphid Genomics Consortium, submitted), offering a unique opportunity to identify the chemoreceptor repertoire for the first time in an aphid species and to address the evolution of this gene superfamily. If ecologically based selection drives host plant specialization and if host plant recognition relies on chemosensory processes, chemoreceptor genes are expected to show some signature of evolution under positive selection. Here, we report the manual annotation and characterization of the Or and Gr gene families in the genome of the pea aphid and we test the above prediction by testing for positive selection within sets of paralogous Or and Gr expansions. We provide evidence that the most recently duplicated Or and Gr loci have evolved under positive selection, which might be related to the high degree of ecological specialization of this species.

Identification of Pea Aphid Ors and Grs by Bioinformatics
Known insect Ors and Grs whose sequences have been entered into GenBank (National Center for Biotechnology Information, NCBI) were used to search for similar genes in the pea aphid genome sequence (TBLASTN searches against pea aphid raw traces and the first genome Assembly 1.0 available at http://www.hgsc.bcm.tmc.edu/-Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX; http://genoweb1.irisa.fr/AphidBase/Blast/ Blast.php/-AphidBase, INRA, Rennes, France, and in the Whole Genome Shotgun database-http://blast.ncbi. nlm.nih.gov/Blast.cgi-NCBI at the NIH, United States). Identified pea aphid Ors and Grs were in turn employed in searches to find more genes in an iterative process and PSI-BlastP searches were used to detect highly divergent sequences (Arthropod PSI-BLASTP server at http:// insects.eugenes.org/arthropods/blast/psiblast.html). Genomic scaffold sequences were used to construct Or and Gr genes manually using known Or/Gr exons as templates and using programs to edit and manipulate sequences (PAUP* v4. 0b10, Wilgenbusch andSwofford 2003 andDNA Dynamo, BlueTractorSoftware Ltd, 2007) and to predict exon-intron splice sites (SplicePredictor, http://deepc2.psi.iastate.edu/ cgi-bin/sp.cgi/; BioEdit, Hall T., Ibis Biosciences, 2007). Pea aphid Or and Gr genes/proteins were named ''ApGr'' and ''ApOr'' followed by a number. Due to high divergence and/or discontinuity among contigs/scaffolds, not all detected Or/Gr genes could be entirely annotated: In these cases, deduced amino acid sequences shorter than 150 amino acids were discarded as probable gene fragments and for long-enough partial sequences a suffix N or C after the gene-protein name was used to indicate that the N or C terminus is missing (Kent et al. 2008). When frameshifts or stop codons were identified in the gene sequence, we defined these genes as putative pseudogenes (suffix P). When sequences looked very similar, we used pairwise identity measures to distinguish between paralogs or allelic variants, as well as their locations (if on a separate small contig theywere considered to be likely allelic variants, but if in long tandem arrays in one contig, they were considered to be paralogs). A few apparent such alternative haplotypes were found (seven in the Grs and one in the Ors) and were excluded from the final list of Cr genes. That the intact genes belong to the chemoreceptor gene family was 2074 Smadja et al. supported by matches to other insect Crs (BlastP searches against the NCBI nonredundant protein database) and by the number and location of trans-membrane domains in the predicted proteins (TransMembrane prediction using Hidden Markov Models [TMHMM]: Krogh et al. 2001). Manually obtained gene models were checked on the final draft Assembly 1.0 and compared with automated predictions. Only a handful of manually annotated genes were perfectly automatically predicted, thus confirming that the annotation of chemoreceptor genes still requires intensive manual effort. Protein alignments were used to identify irregularities and refine the gene structures (ClustalX, Thompson et al. 1997). Two of these gene models were confirmed by the presence of some representative genes in the Expressed Sequence Tag (EST) database (ApGr24 and ApOr18), see Results for details. Although most of the Cr genes are likely to have been identified here, we cannot exclude having missed some highly divergent genes, despite using PSI-BLASTP searches, because if there is not at least a partial gene model in the automated gene set, PSI-BLASTP searches will not find them, and TBLASTN searches can miss genes with short exons encoding divergent amino acid sequences.

Phylogenetic Analyses
For each family, phylogenetic analyses were conducted on aligned aphid protein predictions (all identified Ors; all but four Grs-excluding two short and two divergent sequences) as well as other representative insect Crs showing evidence of ''orthologous'' relationships with aphid sequences in BlastP searches. Gr and Or sets of sequences (respectively, 101 and 83 sequences) were aligned by ClustalX, with manual adjustments. N-terminal, C-terminal and a few internal regions of poor alignment and large gaps were removed from the alignment. Phylogenetic analysis (distance and maximum likelihood trees) of these two large data sets was performed using corrected distance analysis in Tree-Puzzle v5.0 (Schmidt et al. 2002) and PAUP*v4.0b10 (see Hill et al. 2002;Robertson et al. 2003;Robertson and Wanner 2006;Kent et al. 2008). Bayesian analysis was performed using MrBayes v3.1 (Ronquist and Huelsenbeck 2003) with the Jones, Taylor, Thornton (JTT) substitution model (Jones et al. 1992), four chains, one million generations, and two runs. Trees were sampled every 100 generations, discarding a burn in of 250,000 generations. Support for major branches was calculated as percent of 1,000 uncorrected distance bootstrap replications, 10,000 quartet maximum likelihood puzzling steps, and Bayesian posterior probabilities (PPs).

Molecular Evolution Analysis
Tests for variation in selective pressures and for positive selection were performed using the codeml program of the PAML package (Yang 1997), which estimates by maximum likelihood method x ratios of the normalized nonsynonymous substitution rate (d N ) to the normalized synonymous substitution rate (d S ). x . 1 is considered to be evidence of positive selection for amino acid replacements, whereas x , 1 indicates purifying selection. We tested for patterns of positive selection in three groups of genes that underwent recent expansions with a mean synonymous distance less than 1 (these groups being called ''Gr clade A,'' ''Or clade A,'' and ''Or clade B,'' see details in the Results section and in figs. 1 and 2). We used only the putative intact genes, removing all the pseudogenes and partially annotated genes in this analysis (tables 1 and 2). We aligned the protein sequences using ClustalX (Thompson et al. 1997) and the nucleotide sequence alignment was guided by the protein sequence alignment. For each clade, the phylogenetic tree of intact sequences was rebuilt using the Neighbor Joining method (Saitou and Nei 1987) in MEGA 4 (Tamura et al. 2007). Patterns of variation in selective pressures and of positive selection among paralogous genes in each of the pea aphid-specific clades were tested using ''branch-specific'' and ''site-specific'' methods. The branch-specific method compares the following two models: The null model (one-ratio model) assumes all branches examined have the same x ratio, whereas the alternative model (alternative model) allows the x ratio to vary among branches in the phylogeny (Yang 1998;Yang and Nielsen 1998). Thus, a significantly higher likelihood of the alternative model than that of the null model indicates variation of selective pressures among branches, some of which might have undergone positive selection (Yang 1998;Yang and Nielsen 1998). For the analysis of the site-specific model, we first compared the recently implemented codon-substitution models M8a and M8. This test has been shown to be conservative and robust against violations of various model assumptions and to produce less false positives than the M7-M8 comparison (Swanson et al. 2003;Wong et al. 2004). The alternative model (M8) assumes a beta distribution to model variable selection pressure among sites and allows for an extra x parameter that can be greater than 1. The null model (M8a) differs from the alternative model in that the extra x is fixed at 1. The comparison between models was assessed using Likelihood-Ratio Tests (LRTs) for hierarchical models (Anisimova et al. 2001), a significantly higher likelihood of the alternative model than that of the null model indicating positive selection in the data set examined. Although M8 and M8a models were shown to be reliable and robust for detecting positive selection in computer simulations (Swanson et al. 2003;Wong et al. 2004), it remains possible that the evolutionary patterns of chemosensory receptors in the pea aphid genome could violate the assumptions of M8 and M8a models, which could lead to false detections of positive selection. To further confirm our results, we additionally compared LRT of M1a (Nearly Neutral) and M2a (Selection) models, which has been shown to be one of the most conservative tests for selection (e.g., Wong et al. 2004;Yang et al. 2005). For both M8a-M8 and M1a-M2a comparisons, we used degree of freedom, df 5 2. For each analysis, correction for multiple testing (Bonferroni correction) was applied. Only in cases where LRT was significant, we used the Bayes empirical Bayes (BEB) procedure to calculate the PPs to identify sites under positive selection (Yang et al. 2005). We further examined the distribution of the inferred positively selected sites by mapping the amino acids under positive selection onto the chemoreceptor topology using TMHMM.
Evolution of Chemoreceptor Genes in the Pea Aphid Genome 2075 FIG. 1.-Phylogenetic relationships of the pea aphid ApGrs with representative Grs from other insects. This corrected distance tree was rooted at the midpoint. Major gene lineages discussed in the text are indicated by vertical bars on the right, including the clade examined for selection. Support for major branches is shown above them as percent of 1,000 uncorrected distance bootstrap replications, 10,000 quartet maximum likelihood puzzling steps, and Bayesian PPs, but only if they were above 50%. Suffixes after gene-protein names are: P-pseudogene, F-fixed gene model, J-joined gene model, N-N-terminal exon(s) or region missing, C-C-terminal exon(s) or region missing, I-internal exon(s) or region missing. Species abbreviations are Am-Apis mellifera, Ap-Acyrthosiphon pisum, Bm-Bombyx mori, Dm-Drosophila melanogaster, and Tc-Tribolium castaneum. Names of Grs from other species are in bold type. 2076 Smadja et al.

The Gustatory Receptor Family
We identified 77 ApGr genes in the draft genome sequence (supplementary table S1a, Supplementary Material online; deduced amino acid sequences in supplementary table S2a, Supplementary Material online). Only three of these gene models are present in the REFSEQ set of gene models for aphid generated by NCBI, presumably because REFSEQ gene models require considerable comparative or experimental support, and only one corresponds perfectly FIG. 2.-Phylogenetic relationships of the pea aphid ApOrs. This corrected distance tree was rooted by declaring the DmOr83b orthologs as the outgroup, based on the basal location of DmGr83b within the Or family and its similarity to the Grs (see Robertson et al. 2003). Other details are as in figure 1 legend.
Evolution of Chemoreceptor Genes in the Pea Aphid Genome 2077 with our annotation. As described further below, few of these ApGrs have significant sequence similarity to Grs in other insects, and there is EST support for just one gene model (see below). Automated gene models in the combined GLEAN gene set are present for an additional 48 genes; however, only four of these are correct. The remainder required changes ranging from addition of an exon, to splitting or fusion of existing gene models. Some of these remain as partial or truncated gene models with exons missing in assembly gaps, whereas others are in fact pseudogenes. Twenty-six new gene models are proposed, of which nine encode apparently full-length intact and potentially functional Grs. All of these gene models and proposed changes have been communicated to AphidBase and will be included in future releases of the genome annotation. Genes were numbered in a phylogenetic order as described below.
ESTs usually provide a source of experimental support for gene models, however the chemoreceptors in general (Vosshall et al. 1999), and the gustatory receptors in particular (e.g., Thorne and Amrein 2008), are usually ex-pressed at such low levels that they seldom show up in EST sets (e.g., Patch et al. 2009). As expected then, when 168,186 pea aphid ESTs (e.g., Sabater-Munoz et al. 2006) were searched for matches to the Gr proteins encoded by our gene models using the TBlastN algorithm, there were several hits but most of these were overlapping Untranslated Regions from flanking genes or unspliced sequences of undetermined origin. Just one EST (GenBank accession EX606636.1) is from a spliced mRNA from ApGr24, which is the third Gr, in addition to two of the sugar Grs, for which there is a REFSEQ at GenBank. The splicing of three introns in this EST and gene model agrees with all the others in the set of 21 genes in an aphid Gr clade described below, providing additional confidence in our gene models. Finally, we note that no obvious instances of alternative splicing of long N-terminal exons into shared C-terminal exons, which are found in the Grs of flies (e.g., Hill et al. 2002;Robertson et al. 2003;Kent et al. 2008) and Tribolium (Consortium T.G.S. 2008), were detected.  For phylogenetic analysis, we included 26 Grs from other species to provide context for the ApGrs, yielding a total of 101 proteins (ApGr12 and 13 were excluded from the analysis because they are so highly divergent they lead to alignment difficulties and exceptionally long branches, as were ApGr28 and 29 because they are short and severely damaged pseudogenes). The Grs from other species (see fig. 1 for species name abbreviations) are 5 carbon dioxide receptors (DmGr21a and 63a, and TcGr1-3), 11 sugar receptors (DmGr5a and 64a, BmGr4-8, TcGr4 and 19, and AmGr1/2), three orthologs of DmGr43a (BmGr10, TcGr20, and AmGr3), and AmGr4-10 representing the most basal published Grs (Robertson and Wanner 2006). The resultant phylogenetic tree based on corrected amino acid distances is shown in figure 1. Note that in the rest of the article, major Or and Gr clades will be sometimes referred to as subfamilies within each chemoreceptor family.
As was the case for A. mellifera (Robertson and Wanner 2006), no orthologs were discovered for the carbon dioxide heterodimer (D. melanogaster, Jones et al. 2007;Kwon et al. 2007) or heterotrimer (mosquitoes, B. mori, and T. castaneum, Lu et al. 2007;Robertson and Kent 2009). This gene lineage is also missing from all available more phylogenetically basal insect and related arthropod genomes, despite being highly conserved within the insects listed above, so it appears to have been lost independently from multiple insect lineages, or conserved only after the divergence of the Hymenoptera ) . However, it is still unknown whether aphids can detect carbon dioxide and whether this gas affects aphid oviposition (Peltonen et al. 2006;Guerenstein and Hildebrand 2008). There is also no apparent ortholog for the otherwise well-conserved DmGr43a protein, the ligand of which is unknown.
The pea aphid Gr family does contain several members of the sugar receptor subfamily that is present in all other studied insects  and references therein). This highly divergent Gr subfamily is not well resolved in the tree in figure 1 which should only be taken to show the presence of candidate sugar receptors in the pea aphid. More detailed analysis of these candidate aphid sugar receptors along with the data set of Kent and Robertson (2009) better reveals their relationships (data not shown). ApGr1-4 are a recent set of gene duplicates sharing at least 65% encoded amino acid identity in a tandem array in SCAFFOLD15735. They are most closely related to the AmGr1 and BmGr5-8 lineage in other insects, a relationship confirmed by the presence of a unique intron near the 5# end of these genes, intron ''g'' in Kent and Robertson (2009). ApGr5 is related to the AmGr2 and BmGr4 lineage in other insects, a relationship supported by the apparent presence of a unique intron in the middle of these genes, intron ''l'' in Kent and Robertson (2009), although the gene model is not completely unequivocal in this region (this gene model is also missing at least one exon at the Nterminus which might be in an upstream assembly gap). The presence of these two gene lineages in the pea aphid supports the prediction of Kent and Robertson (2009) that they represent an ancient and basal heterodimeric sugar receptor. In the pea aphid, like B. mori, the duplication of the first lineage into ApGr1-4 suggests that each of these four Grs might form heterodimers with ApGr5, allowing detection and differentiation of various sugars. The pea aphid also has a third candidate sugar receptor lineage, the divergent protein ApGr6. It has no simple relationship to any of the other sugar receptors, but does appear to belong in the subfamily both phylogenetically and by virtue of having a glutamic acid immediately after the conserved TY pair in trans-membrane 7 domain, a defining feature of the sugar receptor subfamily , as well as many other distinctive amino acids and intron locations described therein. Detailed expression and functional studies will be required to determine whether it might form heterodimeric sugar receptors with ApGr1-5.
ApGr7-10 constitute another tandem array of four genes in 48-kb SCAFFOLD16347 and share at least 43% encoded amino acid identity. They have no simple phylogenetic relationship to any other known Grs. ApGr11-15 are five highly divergent proteins each on separate scaffolds (supplementary table S1a, Supplementary Material online), with no simple phylogenetic relationships with each other, other pea aphid Grs, or other insect Grs. Indeed, ApGr12 and 13 are so highly divergent they are hard to align with any of the other proteins and hence have very long branches in phylogenetic analysis and were excluded from figure 1. We are nonetheless confident they belong in the chemoreceptor superfamily because they have the conserved TYhhhhhQF motif in TM7 domain that is characteristic of the Gr family (where h is any hydrophobic amino acid) (e.g., Wanner and Robertson 2008). ApGr16-36 form a monophyletic lineage of 21 genes sharing at least 25% amino acid identity, but dispersed around the genome in several sets of tandem duplications (supplementary table S1a, Supplementary Material online). This lineage includes two highly damaged pseudogenes that were not included in figure 1 (28 and 29), as well as two truncated gene models missing C-terminal exons in assembly gaps (22 and 26).
Finally, ApGr37-77 belong to a monophyletic lineage of 41 genes in several sets of recent tandem duplications (supplementary table S1a, Supplementary Material online). This lineage includes 11 incomplete gene models, all of which are assumed to be complete in the genome but are missing exons or parts of exons in assembly gaps (two of these gene models involve joining N-and C-terminal parts across interscaffold gaps that do not yet have experimental support for these joins, and another three are incomplete in the assembled genome but were repaired using raw traces from the Trace Archive at the NCBI). It also contains eight apparent pseudogenes containing various defects that should preclude their function, such as stop codons within alignable exons, intron splice defects, and frameshifts or large internal insertions or deletions. This lineage also contains the vast majority of the gene fragments encoding less than 50% of an average Gr, which were excluded from the numbering system. Most of these appear to be truly fragments in the genome rather than genes truncated by assembly gaps. This lineage therefore appears to be the most rapidly evolving Gr lineage in the pea aphid. The Grs in this lineage also have considerable divergence of the otherwise conserved TYhhhhhQF motif in TM7, being (S/T)(G/ A)hhThhQM in these proteins.

Evolution of Chemoreceptor Genes in the Pea Aphid Genome 2079
The Odorant Receptor Family We identified 79 ApOr genes in the draft genome sequence (supplementary table S1b, Supplementary Material online; deduced amino acid sequences in supplementary table S2b, Supplementary Material online). Similar to the Grs, only three of these gene models are present in the RE-FSEQ set of gene models for aphid generated by NCBI. ApOrs, even more than ApGrs, are highly divergent from all other insect odorant receptors identified so far, making their identification by sequence similarity quite difficult. Although automated gene models in the combined GLEAN gene set are present for an additional 61 genes, only five of these are correct and the remainder required many changes. Among the 79 gene models proposed, 48 encode apparently full-length intact and potentially functional Ors. Ten other full-length genes showing frameshifts or stop codons are considered here as putative pseudogenes. When ESTs were searched for matches to the Or proteins encoded by our gene models, just one EST (GenBank accession CN586195) corresponded to a spliced mRNA from an ApOr gene (Or18). Nevertheless, this result provides additional support for the gene model characterizing the aphid Or subfamily to which ApOr18 belongs (described below). All of these gene models and proposed changes have been communicated to AphidBase and will be included in future releases of the genome annotation. Genes were numbered in a phylogenetic order as described below.
One Or, the DmOr83b protein from D. melanogaster, is unusual in that it is the heterodimeric partner for all the other Ors, at least in D. melanogaster (e.g., Larsson et al. 2004). Experimental evidence suggests the same in other insects such as moths (Kiely et al. 2007) and bees (Wanner, Nichols, et al. 2007). It is an unusually conserved protein encoded by a single copy gene in species studied to date. As expected, therefore, we found a single ortholog for DmOr83b and named it ApOr1. In our phylogenetic analysis, it clusters confidently with the single DmOr83b orthologs from other insects, and this protein lineage was declared the outgroup to root the tree ( fig. 2). Unlike the Gr analysis, additional Ors from other insect species were not included in the tree analysis, because there are no hints of ''orthologous'' relationships between the remaining 78 ApOrs and other insect Ors. That is, all the other ApOrs form species-specific gene subfamily expansions or are highly divergent singletons. This observation is expected because even in comparisons between Drosophila and mosquitoe species there are few orthologous relationships (Hill et al. 2002), and the B. mori, T. castaneum, and A. mellifera Ors also all form lineage-specific subfamilies (Robertson and Wanner 2006;Wanner, Anderson, et al. 2007;Engsontia et al. 2008).
ApOrs form two large lineage-specific subfamily expansions, 1 of 11 genes (ApOr5-15) and one of 64 genes (ApOr16-79). These expansions include some tandem arrays of genes (i.e., ApOr20-22 on SCAFFOLD42, ApoOr23-24 on SCAFFOLD6001, ApOr40-41 on SCAF-FOLD150003; see supplementary table S1b, Supplementary Material online) and most of the genes in these two main clades have apparently undergone relatively recent gene duplications. The 64-Or expansion actually consists of two main clades: ApOr17-42 and ApOr43-79, the latter being the most recently expanded clade. The remaining three ApOrs (2-4) are singletons with different gene structures and no convincing relationships to the other ApOrs (although ApOr2 does consistently cluster with the ApOr5-15 subfamily), or other insect Ors.

Patterns of Positive Selection
To investigate the evolutionary forces shaping these newly duplicated paralogous genes, we compared the rates of synonymous (d S ) and nonsynonymous (d N ) nucleotide substitutions on each branch of Or and Gr lineage-specific clades using the branch-specific model (Yang 1998;Yang and Nielsen 1998). To make our results conservative, we performed all PAML analyses on clades showing a moderate sequence divergence (0.5 , d S , 1) because the LRT is reliable and powerful in detecting positive selection with computational simulations at this sequence divergence level and saturation may only be a problem at higher divergence levels (Yang 1998;Anisimova et al. 2001Anisimova et al. , 2002. Following this criterion, clade B in the Gr gene family (mean d S 5 1.46) and Or clade B (mean d S 5 1.54) were excluded from our analysis (although we do not exclude the possibility that some of the genes in these two clades are under the positive selection) and PAML analyses were only performed on the three remaining aphid-specific clades called Gr clade A, Or clade A, and Or clade B (tables 1 and 2 and figs. 1 and 2).
As shown in table 1, when the free-ratio model and one-ratio model were compared, the LRT was significant in all three clades examined, suggesting that selective pressure varies among branches in each targeted clade and that some of these proteins might have evolved under positive selection. However, this observation does not rule out the alternative possibility of a relaxation of selective constraint on some branches. To further test for positive selection, we evaluated the selective pressures on homologous sites among sequences in each clade using the site-specific models, a random distribution of amino acid substitutions being expected under a relaxation of selective pressure, whereas overrepresented amino acid changes in functional domains are expected under positive selection. Estimates of the parameter values of the x distribution under M8 (table 2) indicate that a fraction of sites are under positive selection in each of the clades tested. Even after correction for multiple testing, LRTs (see Material and Methods) of M8 and M8a were significant and confirmed that the variation in selection pressure was due to the evolution of a subset of sites by positive Darwinian selection (table 2). Essentially similar results were obtained with the even more conservative M1a-M2a comparison, although the marginal significance for ''Or clade B'' under M8 was not detected under the M2a model (table 2).
We further examined the distribution of the inferred positively selected sites under the M8 model in the two clades showing some significant results for both tests (Gr clade A and Or clade A). We identified, at the PP . 50% level, 18 sites as potential targets of positive selection in Gr clade A and 24 sites in Or clade A (and, respectively, 12 and 11 overlapping sites between the M8 and M2a models) (table 2). For both Ors and Grs, TMHMM algorithms predicted an intracellular orientation of the NH 2 -tails and an extracellular orientation of the COOH tails, supporting the ''reverse'' membrane topology of insect chemoreceptors (Sato et al. 2008;Smart et al. 2008;Wicher et al. 2008). When all the putatively positively selected sites under M8 were plotted onto the chemoreceptor topology, their distribution was clearly heterogeneous between the different protein regions ( figs. 3 and 4). The proportion of positively selected sites in extracellular regions (ER) was 72% for Gr clade A, significantly greater than expected under a homogeneous distribution model (v 2 test, P 5 6.3E À5 ), as the ER only constitutes about 25% of the entire protein.
By contrast, the percentage of positively selected sites in trans-membrane regions (TM) was 75%, significantly overrepresented in Or clade A (P 5 4.9E À4 ). Therefore, we find that the distribution of candidate sites among regions differed between the receptor types (G 4 5 26.16, P 5 2.9E À5 , fig. 4). If only the overlapping sites were considered, essentially similar results were obtained (table 2). In addition, this pattern is confirmed when we restrict our analysis to sites with high PP (Gr clade A, at PP . 95%, P 5 0.04; Or clade A, at PP . 90%; P 5 0.03). We caution that the likelihood method is known to falsely detect positively selected sites under certain conditions (Suzuki and Nei 2002;Zhang 2004), so the positively selected sites reported here should be confirmed by other statistical methods and/ or experiments.

Discussion
Our study addressing the molecular characterization and evolution of the chemoreceptor superfamily in the pea aphid genome provides the first molecular insights into the mechanisms of chemoreception in a hemipteran species.
Like in some other insect lineages (Robertson and Wanner 2006;Robertson and Kent 2009), no carbon dioxide receptors were identified. As far as we are aware, it is still unknown whether aphids can detect carbon dioxide (Peltonen et al. 2006;Guerenstein and Hildebrand 2008), but if they can, like honeybees they must be using other receptors for this purpose. Moreover, we show that the pea aphid has no simple ortholog of the DmGr43a gene lineage, whereas it is present in other insects as a single reasonably well-conserved ortholog in Diptera and Apis or as a duplication or an expansion of the lineage in B. mori (Wanner and Robertson 2008) and T. castaneum (Consortium T.G.S. 2008). This result might indicate that this gene was either lost at some point in the pea aphid lineage or conserved only in endopterygote insects.
Several members of the sugar receptor subfamily were identified in the pea aphid genome, which allows us to predict that aphids can detect sugars, as one might expect but which does not appear to have been demonstrated experimentally. Separate detailed analysis of these candidate sugar receptor genes and proteins compared with those from other insects examined by Kent and Robertson (2009) reveals that they represent three lineages, two of which are ancient within insects and are hypothesized to form a functional heterodimer. Independent expansion of one of these lineages in both B. mori and the pea aphid suggests that unlike A. mellifera and Nasonia vitripennis, which have single representatives of each lineage, aphids might be able to differentiate various sugars. The third lineage is a highly divergent protein of unclear role in sugar perception.
In addition to five highly divergent singletons (ApGr11-15), as expected from previous experience with other insects from different orders (B. mori, T. castaneum, and A. mellifera), there are three aphid-specific expansions of 4, 21, and 41 Grs. These expansions reveal typical features of recently expanded gene subfamilies, including commonly being in tandem duplications on single scaffolds. They also contain the few pseudogenes recognized in the Gr family. Although the functions of the four-gene subfamily (ApGr7-10) and the five highly divergent singletons (ApGr11-15) are quite obscure, we propose that the other two Gr subfamily expansions of 21 and 41 genes might represent bitter taste receptors involved in detecting the many plant defensive compounds aphids must utilize in finding suitable host plants (e.g., Pickett and Glinwood 2007;Webster et al. 2008). These candidate bitter receptors show no close relationship to the only fly receptor whose bitter ligand has been identified, the DmGr66a receptor for caffeine (Moon et al. 2006), but this is not surprising as this receptor is only conserved in Diptera, with no simple relatives in other insects. The same is true for the other Grs in flies that have been implicated in perception of bitter compounds (Thorne et al. 2004;Wang et al. 2004).
As expected, the pea aphid genome encodes a single conserved ortholog of DmOr83b, the unusual member of the odorant receptor family that plays a distinct role as a heterodimeric partner for the specific Ors. In addition to three highly divergent singletons, the remainder of the pea aphid odorant receptor family consists of 2 aphid-specific subfamilies of 11 and 64 genes. We propose that these Ors are involved primarily in detection of host plant volatiles involved in host selection. However, a new family of chemoreceptor genes (ionotropic receptors, IRs), which has recently been discovered in D. melanogaster (Benton et al. 2009), may also play a role in the detection of odorants and tastants. As yet, nothing is known about these receptors in aphids.
Interestingly, the presence of aphid-specific expansions in the Or and Gr repertoires seems to be correlated with some signature of positive selection, with the clades experiencing the most recent duplications showing a high rate of nonsynonymous substitutions (figs. 1 and 2; table 1). Although previous studies on selection in insect Ors/Grs focused on Drosophila species orthologous comparisons (e.g., McBride 2007; Gardiner et al. 2008), our study for the first time addresses selection within sets of paralogous Or and Gr expansions in an insect species.
A relaxation of purifying selection could also explain the observed pattern (e.g., McBride 2007): If, for example, a protein under strong purifying selection is no longer in use in a new environment, the mutations that cause amino acid changes or premature stop codons in this superfluous receptor would no longer be deleterious and could become more frequent, leading to an overall x closer to 1. However, one characteristic of the substitution rate that might help to rule out the possibility that a complete relaxation of purifying selection underlies x close to 1 or slightly significantly greater than 1 is its spatial distribution along proteins. Although identifying the protein sites that could be the target of selection, we showed that these sites are not evenly distributed along the proteins ( fig. 3), whereas we would expect new mutations to be fixed at random positions and the spatial distribution of amino acid substitutions not to mirror specific regions of the receptors if purifying selection was completely relaxed.
The amino acid residues we identified to be subject to positive selection may determine binding specificity and provide useful information on the diversity of odorants and tastants that aphids encounter when they explore new habitats and environments. An interesting observation from our analysis is that Grs and Ors have different sets of positively selected sites, suggesting the possibility that these two gene families might have different binding pockets and bind structurally distinct classes of ligand. Recently, Gardiner et al. (2009) analyzed the distribution of amino acid sites in Grs and Ors that show evidence for divergence under either positive selection or relaxed purifying constraints, in the genomes of 12 Drosophila species, and they also found significant differences between these two 2082 Smadja et al. receptor types (Gardiner et al. 2009). The two studies might not be directly comparable because the Drosophila study focused on orthologous comparisons and overall the distribution pattern of positively selected sites was quite different. Nevertheless, our result here is broadly consistent with their finding and suggests that insect Ors and Grs might have distinct molecular properties and mechanisms of ligand recognition and/or signal transduction.
It is also interesting to note that the majority of positively selected sites in vertebrate bitter taste receptors (T2Rs) are located in ERs that are thought to be involved in ligand binding (Shi et al. 2003;Soranzo et al. 2005;Shi and Zhang 2006), whereas positive selection has also been detected in trans-membrane regions, which are believed to form the binding pocket in vertebrate Ors (Emes et al. 2004). Therefore, our analysis might reflect the convergent evolution of insect and vertebrate chemoreceptors in binding of environmental chemicals. Functional studies on these receptors might benefit from focusing on the regions we identify as undergoing positive selection, and eventually a convergence of functional and evolutionary studies will hopefully contribute to an improved understanding of how this novel superfamily of chemoreceptors works.
The fact that aphid Ors and Grs in the most recent gene expansions underwent positive selection might indicate that gene duplication was important for aphids to acquire new chemosensory functions and may have been driven by a change of ligand (Nei et al. 2008). For example, in Drosophila species, lineage-specific gene duplication seems to have led to additional specialization in response to specific ecological conditions (Guo and Kim 2007;McBride 2007;McBride et al. 2007). Aphids are known to rely heavily on their senses of smell and taste to recognize stimuli in their environment, such as resources, natural enemies, and mates (Pickett et al. 1992). In particular, it has been shown that host plant acceptance, a key component of host plant specialization, relies mainly on chemosensory processes in the pea aphid (Caillaud and Via 2000) and probably in related species as well (Park et al. 2000;Hardie 2003, 2004). In this context, the acquisition of a novel host may drive the adaptive divergence of sensory systems by positive selection, and the abandonment of an ancestral host may result in the deterioration of older sensory adaptations by genetic drift or positive selection. Therefore, the observation of signatures of positive selection in Or and Gr genes might be related to the striking ecological specialization observed in A. pisum (Via 1999;Ferrari et al. 2006Ferrari et al. , 2008, the chemosensory genes being subject to divergent evolutionary pressures when aphids enter new niches during host shifts or host specialization events. Future comparative studies among several insect species, and population analyses using polymorphism data within aphid species (like in D. melanogaster, Aguade 2009), will help to further explore the role of chemoreceptor genes in adaptive ecological specialization.

Conclusion
Emerging information on the molecular basis of aphid host plant recognition influences our understanding of the mechanisms underlying speciation and biodiversity in aphids (Smadja et al. 2008;Via and West 2008) and the development of new pest control strategies (Pickett et al. 1997). Therefore, the identification of genes responsible for olfaction and gustation in the pea aphid will have significant positive impacts on fields as diverse as evolutionary biology, comparative insect genetics and agriculture. First, it offers new opportunities for comparative analyses across several insect orders. Indeed, although these chemoreceptors are being characterized in a growing number of insect species, the pea aphid chemoreceptors currently represent an outgroup to the other insect chemoreceptors. In this context, metanalyses of paralogous comparisons across several insect species is a promising route to identify Or and Gr candidate ligand-binding domains. Moreover, gaining insights into the molecular basis of olfaction and gustation in the pea aphid is a key step toward a better understanding of the biology of an insect relying mainly on chemical cues to locate and assess food, habitat, and conspecifics. Another application of our findings might be the definition of new pest control strategies based on the manipulation of the attraction of aphids to specific agricultural cultivars, this being potentially applied to pea aphids as well as other related species. Finally, the identification of genes involved in chemoreception and showing patterns of evolution under selection provides interesting candidate genes on which further investigation of the genetic basis of host plant specialization and speciation in aphids can be developed.

Supplementary Material
We provide supplementary tables S1 and S2 are available at Molecular Biology and Evolution online (http:// www.mbe.oxfordjournals.org/).