GENOME REPLICATION AND TRANSCRIPTION

The putative origin of replication was placed in the intergenic region between the rpmH and dnaA genes, as determined by GC skew analysis (Fig. A1). This placement is supported by the presence of six dnaA boxes, an AT-rich repeat, and genes typically associated with the origin (e.g., gyrB, dnaA, dnaN, rpmH) (Fig. A2). The location of the putative terminus of replication at around 1.70 Mbp is also supported by a strong GC skew inflection at this point.

Operon predictions were made using the PathoLogic system. PathoLogic uses the following criteria to call a group of genes a putative operon: a) transcriptional orientation (sets of genes transcribed in one direction are called directons); b) the distances between the genes within a directon; and c) the genes’ functional classes and pathway assignments. Altogether, there are 482 predicted operons containing up to 12 genes.
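
Criteria (a) and (b) can be illustrated with a minimal sketch. This is not PathoLogic's implementation: the function names, the gene tuple layout, and the max_gap threshold of 50 bp are hypothetical, and criterion (c), functional class and pathway assignment, is not modeled.

```python
def directons(genes):
    """Group consecutive genes that share a strand into directons.

    genes: list of (name, start, end, strand) tuples sorted by start
    (hypothetical layout, coordinates in bp).
    """
    groups = []
    for gene in genes:
        # Extend the current directon while the strand is unchanged.
        if groups and groups[-1][-1][3] == gene[3]:
            groups[-1].append(gene)
        else:
            groups.append([gene])
    return groups


def split_by_distance(directon, max_gap=50):
    """Split one directon wherever the intergenic gap exceeds max_gap.

    max_gap is an illustrative threshold, not a PathoLogic parameter.
    """
    clusters = [[directon[0]]]
    for prev, cur in zip(directon, directon[1:]):
        if cur[1] - prev[2] <= max_gap:
            clusters[-1].append(cur)
        else:
            clusters.append([cur])
    return clusters


# Toy genes with made-up names and coordinates.
toy_genes = [
    ("g1", 0, 100, "+"),
    ("g2", 120, 300, "+"),
    ("g3", 600, 900, "+"),
    ("g4", 950, 1200, "-"),
]
toy_directons = directons(toy_genes)
toy_clusters = split_by_distance(toy_directons[0])
```

In this toy input, g1 to g3 form one directon and g4 another; the 300 bp gap before g3 then separates it from g1 and g2.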

Coding strand preference was calculated as the difference between the number of coding bases on the two strands, using a sliding window (Fig. A3).
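
A minimal sketch of such a calculation follows, under stated assumptions: genes are given as 0-based half-open (start, end, strand) tuples, the difference is taken as forward-strand minus reverse-strand coding bases, windows do not wrap around the circular chromosome, and the step size is arbitrary. The function name and parameters are illustrative and not taken from the original analysis.

```python
def coding_strand_preference(genes, genome_length, window=100_000, step=10_000):
    """Return [(window_start, plus_minus_difference), ...].

    genes: iterable of (start, end, strand), 0-based half-open, strand '+' or '-'.
    Bases covered by overlapping genes on the same strand are counted once per gene.
    """
    # Difference array: +1 over forward-strand genes, -1 over reverse-strand genes.
    diff = [0] * (genome_length + 1)
    for start, end, strand in genes:
        sign = 1 if strand == "+" else -1
        diff[start] += sign
        diff[end] -= sign

    # prefix[i] holds the signed coding-base count over bases [0, i).
    prefix = [0] * (genome_length + 1)
    cover = 0
    for i in range(genome_length):
        cover += diff[i]
        prefix[i + 1] = prefix[i] + cover

    result = []
    for w_start in range(0, genome_length - window + 1, step):
        w_end = w_start + window
        result.append((w_start, prefix[w_end] - prefix[w_start]))
    return result


# Toy example: one forward gene and one reverse gene on a 40 bp "genome".
toy_pref = coding_strand_preference(
    [(0, 10, "+"), (20, 30, "-")], genome_length=40, window=20, step=10
)
```

Positive values indicate more coding bases on the forward strand within the window, negative values more on the reverse strand; a sign change along the chromosome marks an inflection of the kind used to locate the terminus.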

The genomic GC skew (Fig. A1) was used to confirm the location of the chromosomal origin of replication. In some other bacteria, deviations from a smooth curve in GC skew profiles have been shown to correspond to regions with major genomic rearrangements. The Legionella profile is similar to that of E. coli (Roten et al., Nucl. Acids Res. 2002, 30(1):142-144; http://www2.unil.ch/igbm/genomics/Genomic_landscape.html), with the sites of origin and termination of replication positioned unambiguously, and it does not display any evidence of genomic rearrangements (compare, for example, with the major genomic translocations detected by GC skew in H. pylori (Grigoriev, TIG 2000, 16(9):376-378)). If such events did in fact occur during the genome’s evolution, in the form of major translocations or horizontal transfer, then either the exchanged or inserted regions had a GC content similar to that of the genome as a whole, or the events took place early in the species’ history, allowing the rearranged regions enough time to conform to the attributes of the surrounding sequence.
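
A cumulative GC skew profile of this kind can be sketched as follows. The sketch assumes the commonly used per-window skew (G - C) / (G + C) and consecutive non-overlapping windows of 3 Kbp; the exact formula and window stepping used for Fig. A1 are not stated here, and the function names are illustrative.

```python
def cumulative_gc_skew(seq, window=3000):
    """Return (window_starts, cumulative_skew) over consecutive windows."""
    positions, cumulative = [], []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        g = chunk.count("G")
        c = chunk.count("C")
        # Per-window skew; windows without G or C contribute nothing.
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        positions.append(start)
        cumulative.append(total)
    return positions, cumulative


def origin_and_terminus(positions, cumulative):
    """Approximate origin (minimum) and terminus (maximum) window starts."""
    i_min = min(range(len(cumulative)), key=cumulative.__getitem__)
    i_max = max(range(len(cumulative)), key=cumulative.__getitem__)
    return positions[i_min], positions[i_max]


# Toy sequence: G-rich, then C-rich, then G-rich again.
toy_seq = "G" * 20 + "C" * 30 + "G" * 10
toy_positions, toy_cumulative = cumulative_gc_skew(toy_seq, window=10)
toy_origin, toy_terminus = origin_and_terminus(toy_positions, toy_cumulative)
```

The positions returned are window starts, so the origin and terminus estimates are only accurate to within one window length. A smooth profile with a single minimum and a single maximum corresponds to the unambiguous placement described above.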

Fig. A1. Cumulative GC skew in a sliding window of 3 Kbp. For illustrative purposes, position 0 was shifted away from oriC. The minimum of the function corresponds to the origin and the maximum to the terminus of replication.

Fig. A2. Genes and motifs in the genomic origin of replication region. Orange boxes - genes, with arrows showing the transcription orientation; blue - dnaA boxes and AT-rich repeats.

Fig. A3. Coding strand preference in the Legionella genome, calculated in a sliding window of 100 Kbp. Green - leading strand, gray - lagging strand. Position 0 corresponds to oriC; the terminus is located at approximately 1.7 Mbp, at the inflection of the coding preference function.