
Thorpe (2016) Intergenic sites in bacteria are selectively constrained

Louis Maddox edited this page Aug 20, 2016 · 1 revision

"The large majority of intergenic sites in bacteria are selectively constrained, even when known regulatory elements are excluded"

  • bioRxiv, by Harry A Thorpe, Sion C Bayliss, Laurence D Hurst, Edward J Feil
  • doi:10.1101/069708

There are currently no broad estimates of the overall strength and direction of selection operating on intergenic variation in bacteria. Here we address this using large whole genome sequence datasets representing six diverse bacterial species; Escherichia coli, Staphylococcus aureus, Salmonella enterica, Streptococcus pneumoniae, Klebsiella pneumoniae, and Mycobacterium tuberculosis. Excluding M. tuberculosis, we find that a high proportion (62%-79%; mean 70%) of intergenic sites are selectively constrained, relative to synonymous sites. Non-coding RNAs tend to be under stronger selective constraint than promoters, which in turn are typically more constrained than rho-independent terminators. Even when these regulatory elements are excluded, the mean proportion of constrained intergenic sites only falls to 69%; thus our current understanding of the functionality of intergenic regions (IGRs) in bacteria is severely limited. Consistent with a role for positive as well as negative selection on intergenic sites, we present evidence for strong positive selection in Mycobacterium tuberculosis promoters, underlining the key role of regulatory changes as an adaptive mechanism in this highly monomorphic pathogen.

  • WGS from large collections of bacterial isolates ⇒ power to dissect evolutionary processes

Although tests for selection are routinely carried out on the ~85-90% of bacterial genomes corresponding to protein-coding genes, the strength and direction of selection operating on intergenic regions (IGRs) tends to be overlooked

N Molina, E van Nimwegen - Genome research, 2008

Abstract: To investigate the dependence of the number of regulatory sites per intergenic region on genome size, we developed a new method for detecting purifying selection at noncoding positions in clades of related bacterial genomes.

Toward a synthesis of genotypic typing and phenotypic inference in the genomics era. Feil EJ

Future Microbiol., 2015

A gene-by-gene approach to bacterial population genomics: Whole genome MLST of Campylobacter.

Sheppard SK, Jolley KA, Maiden MC

Genes, 2012

BIGSdb: Scalable analysis of bacterial genome variation at the population level.

Jolley KA, Maiden MC

BMC Bioinformatics, 2010

The standard approach to measuring selection, the ratio of non-synonymous to synonymous changes (dN/dS), is clearly not valid for IGRs, and the perceived lack of established methodology can in part explain the rather casual dismissal of the adaptive relevance of intergenic variation

  • many recent examples demonstrating the phenotypic impact of mutations in riboswitches, small RNAs, promoters, terminators, and regulator binding sites.

See: Waters and Storz (2009) Regulatory RNAs in Bacteria

  • SNPs and INDELs significant and validated

    • Knockouts (KO) confirmed the role of regulatory RNAs in virulence, as did naturally occurring mutations

“These well characterised regulatory elements are clearly expected to be under strong selective constraint, but to what extent are these examples typical with respect to IGRs in general? If such functional sites are rare a presumption that the great bulk of IGR sequence is neutrally evolving and functionless is sustainable”

    • “...typical features of genome reduction such as high genomic AT content, gene loss, and pseudogenisation… reflect inefficient selection (or equivalently a high rate of genetic drift) resulting from intracellular lifestyles and small effective population sizes”

      • Neutral intergenic sites in endosymbiont genomes are therefore expected to be rapidly degraded and deleted

      • retention of IGR sequences in this species suggests they’re functional

      • no broad quantitative estimates of the commonality of selective constraint operating on IGRs

      • “Here we use two independent approaches to address these questions, one based on the established logic of site frequency spectra (the Proportion of Singleton Mutations; PSM), and the other a modification of dN/dS (dI/dS; where dI is the number of intergenic SNPs per intergenic site)”

      • according to nearly neutral theory, deleterious mutations can persist for a while (determined by the selection coefficient and the effective population size)

      • Weakly deleterious mutations are eventually lost (unless the population is small), so the difference can serve as a measure of evolutionary distance, except:

      • Unlikely dS is a perfect neutral benchmark

      • Synonymous mutations are under translational selection

      • Confirm results of Muto and Osawa (“showed that fourfold degenerate sites exhibit the widest range of GC content, whereas non-degenerate second codon positions exhibit the narrowest range; this can be explained by constraint on non-synonymous mutations at second codon positions”)

      • ...p.9
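
The two statistics quoted above (PSM and dI/dS) can be illustrated with a toy calculation. This is only a minimal sketch based on the definitions in the quote: the function names, the input format (one minor-allele count per SNP), and the plain SNPs-per-site division are assumptions made for illustration, and the paper's actual pipeline (alignment, site counting, any corrections) is not reproduced here.

```python
def proportion_singletons(minor_allele_counts):
    """PSM sketch: fraction of SNPs whose minor allele occurs in exactly one isolate.

    `minor_allele_counts` is a hypothetical input: one integer per SNP,
    giving how many isolates carry the minor allele.
    """
    if not minor_allele_counts:
        raise ValueError("need at least one SNP")
    singletons = sum(1 for count in minor_allele_counts if count == 1)
    return singletons / len(minor_allele_counts)


def snps_per_site(n_snps, n_sites):
    """Number of SNPs divided by the number of sites of that class."""
    if n_sites <= 0:
        raise ValueError("n_sites must be positive")
    return n_snps / n_sites


def di_ds(intergenic_snps, intergenic_sites, synonymous_snps, synonymous_sites):
    """dI/dS sketch: intergenic SNPs per intergenic site over
    synonymous SNPs per synonymous site (assumed simple ratio)."""
    d_i = snps_per_site(intergenic_snps, intergenic_sites)
    d_s = snps_per_site(synonymous_snps, synonymous_sites)
    if d_s == 0:
        raise ValueError("dS is zero; ratio undefined")
    return d_i / d_s


if __name__ == "__main__":
    # Made-up numbers, purely to show the arithmetic.
    print(proportion_singletons([1, 1, 2, 5]))
    print(di_ds(30, 1000, 50, 500))
```

Under this reading, a dI/dS below 1 means intergenic sites accumulate fewer SNPs per site than synonymous sites, and a higher PSM for one site class than another suggests more recent, not yet purged, mutations in that class.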

{TODO:finish}