Comparative Study
Genome Res. 2006 Jul;16(7):875-84.
doi: 10.1101/gr.5022906. Epub 2006 Jun 2.

Ubiquitous selective constraints in the Drosophila genome revealed by a genome-wide interspecies comparison

Daniel L Halligan et al. Genome Res. 2006 Jul.

Abstract

Non-coding DNA comprises approximately 80% of the euchromatic portion of the Drosophila melanogaster genome. Non-coding sequences are known to contain functionally important elements controlling gene expression, but the proportion of sites that are selectively constrained is still largely unknown. We have compared the complete D. melanogaster and Drosophila simulans genome sequences to estimate mean selective constraint (the fraction of mutations that are eliminated by selection) in coding and non-coding DNA by standardizing to substitution rates in putatively unconstrained sequences. We show that constraint is positively correlated with intronic and intergenic sequence length and is generally remarkably strong in non-coding DNA, implying that more than half of all point mutations in the Drosophila genome are deleterious. This fraction is also likely to be an underestimate if many substitutions in non-coding DNA are adaptively driven to fixation. We also show that substitutions in long introns and intergenic sequences are clustered, such that there is an excess of substitutions <8 bp apart and a deficit farther apart. These results suggest that there are blocks of constrained nucleotides, presumably involved in gene expression control, that are concentrated in long non-coding sequences. Furthermore, we infer that there is more than three times as much functional non-coding DNA as protein-coding DNA in the Drosophila genome. Most deleterious mutations therefore occur in non-coding DNA, and these may make an important contribution to a wide variety of evolutionary processes.
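The abstract defines selective constraint as the fraction of mutations eliminated by selection, estimated by standardizing against substitution rates in putatively unconstrained sequences. A minimal sketch of that idea follows, assuming the simplest form, one minus the ratio of observed to neutral divergence. The function name and the example divergences are hypothetical and are not taken from the paper, whose actual procedure may include further corrections.

```python
def selective_constraint(divergence: float, neutral_divergence: float) -> float:
    """Return 1 - (observed divergence / neutral divergence).

    Assumption: constraint is approximated as the proportional reduction in
    substitution rate relative to a putatively unconstrained reference class.
    A negative value means the class evolves faster than the reference.
    """
    if neutral_divergence <= 0:
        raise ValueError("neutral_divergence must be positive")
    return 1.0 - divergence / neutral_divergence


if __name__ == "__main__":
    # Hypothetical divergences, chosen only to illustrate the calculation.
    neutral = 0.10      # putatively unconstrained sites
    candidate = 0.05    # class of sites being tested
    print(selective_constraint(candidate, neutral))
```

Under this simplification, a class of sites that diverges half as fast as the neutral reference has a constraint of 0.5, meaning about half of new mutations there are inferred to be removed by selection.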

Figures

Figure 1.
Mean divergence (±95% confidence interval) versus mean length for different length categories of (A) introns, (B) 5′ intergenic sequences, and (C) 3′ intergenic sequences. Within each class of site, we divided the data into 20 categories, based on length, such that there were equal numbers of observations in each category. Divergence estimates were corrected for multiple hits using the method of Kimura (1980), and confidence intervals were obtained by bootstrapping 1000 times by observation. The dashed line for each class of site shows the linear regression of divergence on log length. The solid vertical line in A is drawn at the division between the long and short intron class (80 bp).
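The caption refers to two standard steps: correcting divergence for multiple hits with the method of Kimura (1980), and bootstrapping 1000 times by observation to obtain confidence intervals. The sketch below illustrates both, assuming the Kimura two-parameter distance and a simple percentile bootstrap of the mean. The percentile rule, the function names, and the handling of gaps and ambiguous bases are assumptions for illustration and may differ from the authors' implementation.

```python
import math
import random

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned sequences.

    K = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)), where P and Q are the
    proportions of transitions and transversions. Sites containing anything
    other than A, C, G, or T are skipped (an assumption of this sketch).
    Raises ValueError if the sequences are too divergent for the correction.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def bootstrap_mean_ci(values, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean.

    Observations are resampled with replacement n_boot times; the interval
    is read from the sorted bootstrap means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy alignment with a single transition (A <-> G) in ten sites.
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
    # Hypothetical per-observation divergences within one length category.
    print(bootstrap_mean_ci([0.08, 0.11, 0.09, 0.13, 0.10]))
```

Resampling whole observations, rather than individual sites, keeps sites from the same sequence together in each replicate, which matches the caption's description of bootstrapping "by observation".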
Figure 2.
Mean divergence (±95% confidence interval as a gray box) in short introns plotted as a function of distance from the 5′- and 3′-ends of the intron. Divergence estimates for each position were corrected for multiple hits using the method of Kimura (1980). The 95% confidence interval of the mean for each position was obtained by bootstrapping 1000 times by intron.
Figure 3.
Mean constraint (±95% confidence interval as a gray box) for short (dark gray) and long (light gray) introns, plotted as a function of distance from the 5′- and 3′-ends of the intron. Confidence intervals were obtained by bootstrapping 1000 times by genomic section.
Figure 4.
Mean constraint (±95% confidence interval as a gray box) in intergenic sequences flanking coding sequences, plotted as a function of distance from the coding sequence boundary. Constraint in 5′ sequences is shown on the left, and constraint in 3′ sequences is shown on the right. (A) Mean constraint in the first 500 bp flanking a coding sequence plotted for two arbitrary size classes of intergenic DNA; short (≤500 bp; dark gray) and long (>500 bp; light gray). (B) Mean constraint for 5 kb of flanking sequence in all lengths of intergenic sequence. Mean constraint was calculated for 20-bp nonoverlapping blocks, and the confidence interval for each block was obtained by bootstrapping 1000 times by genomic section.
Figure 5.
Observed (circles) and predicted (triangles) frequency distributions of distances between substitutions for (A) introns and (B) intergenic sequences longer than 1000 bp. The predicted frequency distribution assumes that the distances between substitutions are geometrically distributed and that the mean distance between substitutions is equal to that observed in the real data.
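The predicted distribution in this caption can be sketched as follows. Given the positions of substitutions in an aligned sequence, the code computes the distances between consecutive substitutions and a geometric expectation whose mean matches the observed mean. The convention that adjacent sites are at distance 1, so that P(d) = p(1 - p)^(d - 1) with p = 1/mean, is an assumption of this sketch, as are the function names and the toy positions.

```python
from collections import Counter


def inter_substitution_distances(positions):
    """Distances between consecutive substitution positions (must be distinct)."""
    ordered = sorted(positions)
    return [b - a for a, b in zip(ordered, ordered[1:])]


def observed_frequencies(distances, max_distance):
    """Observed relative frequency of each distance from 1 to max_distance."""
    counts = Counter(distances)
    total = len(distances)
    return {d: counts.get(d, 0) / total for d in range(1, max_distance + 1)}


def geometric_expectation(distances, max_distance):
    """Geometric frequencies with the same mean as the observed distances.

    Assumes support d = 1, 2, ..., so the mean is 1/p and
    P(d) = p * (1 - p) ** (d - 1).
    """
    mean_distance = sum(distances) / len(distances)
    p = 1.0 / mean_distance
    return {d: p * (1 - p) ** (d - 1) for d in range(1, max_distance + 1)}


if __name__ == "__main__":
    # Hypothetical substitution positions within one long non-coding sequence.
    positions = [1, 2, 3, 10, 11, 30]
    distances = inter_substitution_distances(positions)
    observed = observed_frequencies(distances, max_distance=8)
    predicted = geometric_expectation(distances, max_distance=8)
    for d in range(1, 9):
        print(d, round(observed[d], 3), round(predicted[d], 3))
```

Clustering of the kind described in the abstract would appear as observed frequencies exceeding the geometric prediction at short distances and falling below it at longer ones.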
