Phylogenetic footprinting is a popular technique for discovering transcription factor binding sites (TFBSs) in eukaryotes.
Orthologous sequences from different species are compared pairwise, looking for peaks of similarity within a given region (window). In this example, the mouse gene Sp5 (NM_022435) is compared with its orthologous sequences in human, dog and fugu.
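The windowed comparison described above can be sketched as follows. This is a minimal illustration, not the method of any particular footprinting tool. It assumes the two orthologous sequences have already been aligned to equal length, with "-" marking gaps. The function names, the window size and the threshold are hypothetical choices made for the example.

```python
def window_identity(aligned_a, aligned_b, window=50):
    """Fraction of identical, non-gap columns in each sliding window.

    Both inputs are assumed to be rows of an existing pairwise alignment,
    so they must have the same length.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # 1 where the column is an identical non-gap pair, 0 otherwise
    matches = [int(a == b and a != "-") for a, b in zip(aligned_a, aligned_b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(matches) - window + 1)
    ]


def similarity_peaks(scores, threshold=0.7):
    """Start positions of windows whose identity reaches the threshold."""
    return [i for i, score in enumerate(scores) if score >= threshold]


if __name__ == "__main__":
    # Toy sequences, not real Sp5 data
    scores = window_identity("ACGTACGT", "ACGTTTTT", window=4)
    print(scores)                          # [1.0, 0.75, 0.5, 0.25, 0.25]
    print(similarity_peaks(scores, 0.75))  # [0, 1]
```

Windows whose identity exceeds the threshold correspond to the "peaks" in the comparison plot; in practice the threshold and window length would have to be tuned to the evolutionary distance between the species being compared.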
Comparative studies have been very effective at identifying conserved non-coding elements (CNEs) that might have regulatory functions. However, CNEs may regulate a broad variety of biological functions, not necessarily confined to transcriptional regulation; for example, they may be involved in DNA replication or mRNA splicing. One of the main reasons for the high false-positive rate of current TFBS discovery software is therefore that it relies almost exclusively on the Kimura rule: "functionally less important molecules or parts of molecules evolve (in terms of mutant substitutions) faster than more important ones". These tools take into account only the alignment of the sequences and the percentage of conservation among them, while ignoring additional information that could be used, such as the modular structure of TFBSs.
Modular binding of proteins to a regulatory region: transcription factors typically bind within a restricted region and then act cooperatively to induce transcriptional activity. Two scenarios are possible: a single transcription factor binds repeatedly to the same kind of binding site (simple module), or different transcription factors bind to different binding sites (complex module).
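The distinction between simple and complex modules can be made concrete with a small sketch. It assumes that candidate binding sites have already been predicted and are given as (position, factor) pairs. The clustering rule (sites closer than `max_gap` belong to the same module), the parameter values and the factor names `TF_A` and `TF_B` are illustrative assumptions, not part of any published definition.

```python
def classify_modules(sites, max_gap=50, min_sites=2):
    """Group predicted sites into modules and label each one.

    sites: iterable of (position, factor_name) tuples.
    Returns a list of (start, end, kind) tuples, where kind is "simple"
    if all sites in the cluster belong to one factor and "complex"
    otherwise. Clusters with fewer than min_sites sites are discarded.
    """
    clusters, current = [], []
    for pos, factor in sorted(sites):
        # Start a new cluster when the distance to the previous site is too large
        if current and pos - current[-1][0] > max_gap:
            clusters.append(current)
            current = []
        current.append((pos, factor))
    if current:
        clusters.append(current)

    modules = []
    for cluster in clusters:
        if len(cluster) < min_sites:
            continue  # an isolated site is not treated as a module
        factors = {factor for _, factor in cluster}
        kind = "simple" if len(factors) == 1 else "complex"
        modules.append((cluster[0][0], cluster[-1][0], kind))
    return modules


if __name__ == "__main__":
    # Hypothetical predictions: positions in bp, made-up factor names
    predicted = [(10, "TF_A"), (30, "TF_A"), (60, "TF_A"),
                 (300, "TF_A"), (320, "TF_B"), (900, "TF_B")]
    print(classify_modules(predicted))
    # [(10, 60, 'simple'), (300, 320, 'complex')]
```

The isolated site at position 900 is dropped because a single site does not form a module under this rule.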
The current view is that once a regulatory region is accessible, it is bound by a combination of transcription factors. Binding is generally cooperative: a single protein binds weakly, whereas multiple transcription factors engaged in protein-protein interactions increase one another's affinity for the regulatory region.
Cooperative activity of transcription factors: a few proteins bind to a cis-regulatory module (CRM) and then interact with the co-activator complex to enhance transcription. From: "Applied bioinformatics for the identification of regulatory elements", W. Wasserman and A. Sandelin, Nature Reviews Genetics.
Although a well-defined structure for regulatory regions has not yet been described in detail, we can summarize it as follows:
Moreover, most current software is based on global alignment, whereas the modular structure of binding sites suggests that local alignment should be used. Even tools that carry out local alignment to discover regulatory regions make no direct use of the modular structure of TFBSs.
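The difference between global and local alignment can be illustrated with a small score-only sketch of the two classic dynamic-programming schemes (Needleman-Wunsch for global, Smith-Waterman for local). The scoring values, the linear gap penalty and the toy sequences are assumptions chosen for the example; real tools use more elaborate scoring.

```python
def align_score(a, b, match=2, mismatch=-1, gap=-2, local=True):
    """Best alignment score between a and b.

    local=True  : Smith-Waterman (best-scoring pair of substrings)
    local=False : Needleman-Wunsch (both sequences aligned end to end)
    """
    # First row of the DP matrix: zeros for local, cumulative gaps for global
    prev = [0 if local else j * gap for j in range(len(b) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0 if local else i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, cur[j - 1] + gap)
            if local:
                score = max(score, 0)  # a local alignment may restart anywhere
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best if local else prev[-1]


if __name__ == "__main__":
    # A conserved core (GATTACA) embedded in unrelated flanking sequence
    seq_1 = "CCCCGATTACATTTT"
    seq_2 = "AAAAGATTACAGGGG"
    print(align_score(seq_1, seq_2, local=True))   # 14
    print(align_score(seq_1, seq_2, local=False))  # 6
```

With these toy sequences the local score reflects only the conserved core, while the global score is diluted by the diverged flanks, which is the behaviour that makes local alignment attractive for short, clustered binding sites.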