TY - JOUR
T1 - Sequence signatures of nucleosome positioning in Caenorhabditis elegans
AU - Chen, Kaifu
AU - Wang, Lei
AU - Yang, Meng
AU - Liu, Jiucheng
AU - Xin, Chengqi
AU - Hu, Songnian
AU - Yu, Jun
N1 - Funding Information:
This work was supported by the National Basic Research Program (973 Program) from the Ministry of Science and Technology of the People’s Republic of China (2006CB910404 to JY). We would like to acknowledge Guangyu Zhang and Yuxin Yin for helpful discussions in improving the manuscript.
PY - 2010/6
Y1 - 2010/6
N2 - Our recent investigation in the protist Trichomonas vaginalis suggested a DNA sequence periodicity with a unit length of 120.9 nt, which represents a sequence signature for nucleosome positioning. We have now extended our observation to higher eukaryotes and identified a similar periodicity of 175 nt in length in Caenorhabditis elegans. In the process of defining the sequence compositional characteristics, we found that the 10.5-nt periodicity, the sequence signature of the DNA double helix, may not be sufficient for cross-nucleosome positioning but provides essential guiding rails to facilitate positioning. We further dissected nucleosome-protected sequences and identified a strong positive purine (AG) gradient from the 5'-end to the 3'-end, and also found that nucleosome-enriched regions are GC-rich compared with nucleosome-free sequences, as purine content is positively correlated with GC content. Sequence characterization allowed us to develop a hidden Markov model (HMM) algorithm for decoding nucleosome positioning computationally, and based on a set of training data from the fifth chromosome of C. elegans, our algorithm predicted 60%-70% of the well-positioned nucleosomes, which is 15%-20% higher than random positioning. We concluded that nucleosomes are not randomly positioned on DNA sequences, yet bind to different genome regions with variable stability; that well-positioned nucleosomes leave sequence signatures on DNA; and that the statistical positioning of nucleosomes across the genome can be decoded computationally based on these sequence signatures.
AB - Our recent investigation in the protist Trichomonas vaginalis suggested a DNA sequence periodicity with a unit length of 120.9 nt, which represents a sequence signature for nucleosome positioning. We have now extended our observation to higher eukaryotes and identified a similar periodicity of 175 nt in length in Caenorhabditis elegans. In the process of defining the sequence compositional characteristics, we found that the 10.5-nt periodicity, the sequence signature of the DNA double helix, may not be sufficient for cross-nucleosome positioning but provides essential guiding rails to facilitate positioning. We further dissected nucleosome-protected sequences and identified a strong positive purine (AG) gradient from the 5'-end to the 3'-end, and also found that nucleosome-enriched regions are GC-rich compared with nucleosome-free sequences, as purine content is positively correlated with GC content. Sequence characterization allowed us to develop a hidden Markov model (HMM) algorithm for decoding nucleosome positioning computationally, and based on a set of training data from the fifth chromosome of C. elegans, our algorithm predicted 60%-70% of the well-positioned nucleosomes, which is 15%-20% higher than random positioning. We concluded that nucleosomes are not randomly positioned on DNA sequences, yet bind to different genome regions with variable stability; that well-positioned nucleosomes leave sequence signatures on DNA; and that the statistical positioning of nucleosomes across the genome can be decoded computationally based on these sequence signatures.
KW - HMM
KW - Nucleosome positioning
KW - Periodicity
KW - Sequence signature
UR - http://www.scopus.com/inward/record.url?scp=77955335647&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77955335647&partnerID=8YFLogxK
U2 - 10.1016/S1672-0229(10)60010-1
DO - 10.1016/S1672-0229(10)60010-1
M3 - Article
C2 - 20691394
AN - SCOPUS:77955335647
VL - 8
SP - 92
EP - 102
JO - Genomics, Proteomics and Bioinformatics
JF - Genomics, Proteomics and Bioinformatics
SN - 1672-0229
IS - 2
ER -