GENOME REPLICATION AND TRANSCRIPTION
The putative origin of replication was
positioned in the intergenic region between the rpmH and dnaA
genes, as determined by GC skew analysis (Fig. A1), and supported
by the presence of six dnaA boxes, an AT-rich repeat, and
genes typically associated with the origin (e.g., gyrB, dnaA,
dnaN, rpmH) (Fig. A2). The location of the putative
terminus of replication at around 1.70 Mbp is also supported by
a strong GC skew inflection at this point.
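The GC skew reasoning above can be illustrated with a minimal sketch. It assumes the skew is defined as (G - C)/(G + C) per window and uses non-overlapping windows for simplicity, which may differ from the sliding-window scheme used for Fig. A1. The function names are hypothetical and are not taken from any tool used in this work.

```python
def cumulative_gc_skew(seq, window=3000):
    """Return (window_end_positions, cumulative_skew) for consecutive windows.

    Assumption: skew per window is (G - C) / (G + C); windows do not overlap.
    """
    seq = seq.upper()
    positions, cumulative = [], []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        skew = (g - c) / (g + c) if (g + c) else 0.0
        total += skew
        positions.append(start + len(chunk))  # coordinate where this window ends
        cumulative.append(total)
    return positions, cumulative


def origin_and_terminus(seq, window=3000):
    """Estimate origin (cumulative minimum) and terminus (cumulative maximum)."""
    positions, cumulative = cumulative_gc_skew(seq, window)
    ori = positions[cumulative.index(min(cumulative))]
    ter = positions[cumulative.index(max(cumulative))]
    return ori, ter


if __name__ == "__main__":
    # Toy sequence: C-rich, then G-rich, then C-rich again.
    toy = "C" * 30 + "G" * 60 + "C" * 30
    print(origin_and_terminus(toy, window=10))
```

On a real chromosome the sequence would be read from a FASTA file, and the extrema would only approximate the origin and terminus to within one window.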
Operon predictions
were accomplished using the PathoLogic system. PathoLogic uses
the following criteria to call a gene group a putative operon: a)
transcriptional orientation (sets of genes transcribed in one
direction are called directons); b) the distances between the
genes within a directon; and c) the genes' functional
classes and pathway assignments. Altogether there are 482
predicted operons containing up to 12 genes.
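Criteria a) and b) can be sketched as follows. This is an illustration only, not the PathoLogic implementation: the `Gene` record, the `max_gap` threshold of 50 bp, and the rule that a candidate operon needs at least two genes are all assumptions made for the example, and criterion c) (functional classes and pathway assignments) is omitted.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def directons(genes):
    """Group genes into runs of consecutive genes on the same strand."""
    runs = []
    for gene in sorted(genes, key=lambda g: g.start):
        if runs and runs[-1][-1].strand == gene.strand:
            runs[-1].append(gene)
        else:
            runs.append([gene])
    return runs


def split_by_gap(directon, max_gap=50):
    """Split a directon wherever the intergenic distance exceeds max_gap (hypothetical cutoff)."""
    groups = [[directon[0]]]
    for prev, gene in zip(directon, directon[1:]):
        if gene.start - prev.end - 1 <= max_gap:
            groups[-1].append(gene)
        else:
            groups.append([gene])
    return groups


def candidate_operons(genes, max_gap=50):
    """Return gene-name lists for groups of two or more closely spaced co-directional genes."""
    result = []
    for directon in directons(genes):
        for group in split_by_gap(directon, max_gap):
            if len(group) >= 2:
                result.append([g.name for g in group])
    return result


if __name__ == "__main__":
    toy = [
        Gene("a", 1, 100, "+"), Gene("b", 120, 300, "+"),
        Gene("c", 600, 800, "+"), Gene("d", 820, 900, "-"),
        Gene("e", 910, 1000, "-"),
    ]
    print(candidate_operons(toy))
```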
Coding strand
preference was calculated as the difference between the number of
coding bases in the two strands using a sliding window (Fig. A3).
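A minimal sketch of such a calculation is given below. It assumes genes are supplied as (start, end, strand) tuples with 1-based inclusive coordinates and treats the chromosome as linear, so windows do not wrap around the end of the sequence. The function name and the `step` parameter are hypothetical.

```python
from itertools import accumulate


def strand_preference(genes, genome_length, window, step):
    """Return [(window_start, forward_coding_bases - reverse_coding_bases), ...].

    Assumptions: genes are (start, end, strand) with 1-based inclusive
    coordinates; the genome is treated as linear (no wrap-around windows).
    """
    fwd = [0] * genome_length
    rev = [0] * genome_length
    for start, end, strand in genes:
        target = fwd if strand == "+" else rev
        for i in range(start - 1, end):
            target[i] = 1  # mark each base once, even if genes overlap
    signed = [f - r for f, r in zip(fwd, rev)]
    prefix = [0] + list(accumulate(signed))  # prefix sums give O(1) window totals
    profile = []
    for left in range(0, genome_length - window + 1, step):
        profile.append((left, prefix[left + window] - prefix[left]))
    return profile


if __name__ == "__main__":
    toy_genes = [(1, 5, "+"), (11, 20, "-")]
    print(strand_preference(toy_genes, genome_length=20, window=10, step=10))
```

A sign change in the resulting profile marks the point where the strand carrying most coding bases switches, which is the kind of inflection used to place the terminus.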
The genomic GC-skew
(Fig. A1) was used to confirm the location of the chromosomal
origin of replication. In some other bacteria, it has been shown
that deviations from a smooth curve in GC-skew profiles
correspond to regions with major genomic rearrangements. The Legionella
profile is similar to that of E. coli (Roten et al., Nucleic Acids
Res. 2002, 30(1):142-144;
http://www2.unil.ch/igbm/genomics/Genomic_landscape.html) with
sites of origin and termination of replication positioned
unambiguously, and does not display any evidence of genomic
rearrangements (compare, for example, with the described cases of
major genomic translocations detected by GC skew in H. pylori
(Grigoriev, TIG 2000 16(9):376-378)). If such events did in fact
occur in the form of major translocations or horizontal transfer
during the genome's evolution, the exchanged or inserted
regions either had a GC content similar to that of the genome as a
whole, or the events took place early in the species' history,
allowing the rearranged regions enough time to conform to the
surrounding sequence attributes.
Fig. A1. Cumulative GC skew
in a sliding window of 3 Kbp. For illustrative purposes position
0 was shifted away from the oriC. The minimum of the function
corresponds to the origin and the maximum to the terminus of
replication.

Fig. A2. Genes
and motifs in the genomic origin of replication region.

Fig. A3. Coding strand
preference in the Legionella genome calculated in a sliding
window of 100 Kbp. Green: leading strand; gray: lagging strand.
Position 0 corresponds to oriC; the terminus is located
at approximately 1.7 Mbp, at the inflection point of the coding
preference function.