I am trying to understand the role of base sequence in determining the three-dimensional structure of the DNA double helix. The DNA base sequence is now recognized as playing a structural role in modulating the biological activity of genes, and understanding the correlation between sequence and structure helps us understand how structure affects processes such as protein binding and drug intercalation. The focus of my summer project has been to parameterize the experimental structures found in the Nucleic Acid Database using Euler Symmetric Parameters, and to look for a correlation between those parameters and sequence. I started with a set of embedded base pair coordinate frames and, through a series of calculations, extracted a set of four parameters for each base pair step. With these parameters I then tried to find a correlation between the parameter values and the sequence of the step, using different methods of regression analysis and model fitting.
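To make the idea of "four parameters per base pair step" concrete, here is a minimal sketch in Python of one way such parameters can be obtained from two neighboring base pair frames. It is not the actual calculation used in the project. It assumes that each frame is given as a 3x3 rotation matrix whose columns are the frame's axes, that the step rotation is taken as R1^T R2 (the second frame expressed in the axes of the first), and that the four parameters are ordered with the scalar part first. All function names are hypothetical, and a different but equally valid convention would give numbers that differ in sign or order.

```python
import math


def transpose(m):
    """Transpose a 3x3 matrix stored as a list of rows."""
    return [[m[j][i] for j in range(3)] for i in range(3)]


def matmul(a, b):
    """Multiply two 3x3 matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def step_rotation(frame1, frame2):
    """Rotation carrying frame1 onto frame2, expressed in frame1's axes.

    Assumed convention: the columns of each matrix are the frame axes.
    """
    return matmul(transpose(frame1), frame2)


def euler_symmetric_parameters(r):
    """Return (q0, q1, q2, q3) for a rotation matrix r.

    q0 = cos(theta/2) and (q1, q2, q3) = sin(theta/2) * rotation axis.
    This simple formula breaks down for rotation angles close to
    180 degrees, so that case is rejected rather than handled.
    """
    trace = r[0][0] + r[1][1] + r[2][2]
    q0 = 0.5 * math.sqrt(max(0.0, 1.0 + trace))
    if q0 < 1e-8:
        raise ValueError("rotation angle too close to 180 degrees")
    q1 = (r[2][1] - r[1][2]) / (4.0 * q0)
    q2 = (r[0][2] - r[2][0]) / (4.0 * q0)
    q3 = (r[1][0] - r[0][1]) / (4.0 * q0)
    return (q0, q1, q2, q3)


def rotation_about_z(angle_deg):
    """Test frame: the reference axes rotated about z by angle_deg."""
    c = math.cos(math.radians(angle_deg))
    s = math.sin(math.radians(angle_deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


# Illustrative step: two frames that differ by a 36 degree twist about z.
q = euler_symmetric_parameters(
    step_rotation(rotation_about_z(10.0), rotation_about_z(46.0)))
print(q)
```

The four numbers always satisfy q0^2 + q1^2 + q2^2 + q3^2 = 1, so they carry three independent rotational degrees of freedom. For a pure twist about z, only q0 and q3 are nonzero.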
One of the things I spent the most time on during this project was tracking down mistakes caused by inconsistencies in the conventions I was using. I will therefore establish now the conventions that gave me the right sets of numbers to work with. The choice is of course arbitrary, and calculations in one coordinate frame should give the same overall information as calculations in another, but in checking these calculations for errors it was necessary for me to choose one particular convention.