Fast Fourier Transform-Based Correlation Of DNA-Sequences Using Complex-Plane Encoding
Computer Applications In The Biosciences
The detection of similarities between DNA sequences can be accomplished using the signal-processing technique of cross-correlation. An early method used the fast Fourier transform (FFT) to perform correlations on DNA sequences in O(n log n) time for sequences of any length. However, this method requires many FFTs (nine), runs no faster if one sequence is much shorter than the other, and measures only global similarity, so that significant short local matches may be missed. We report that, through the use of alternative encodings of the DNA sequence in the complex plane, the number of FFTs performed can be traded off against (i) signal-to-noise ratio, and (ii) a certain degree of filtering for local similarity via k-tuple correlation. Also, when comparing probe sequences against much longer targets, the algorithm can be sped up by decomposing the target and performing multiple small FFTs in an overlap-save arrangement. Finally, by decomposing the probe sequence as well, the detection of local similarities can be further enhanced. With current advances in extremely fast hardware implementations of signal-processing operations, this approach may prove more practical than heretofore.
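The idea of correlating complex-plane-encoded sequences with FFTs can be illustrated with a minimal sketch. The encoding used here (A = 1, T = -1, G = i, C = -i) is an assumption chosen for illustration and is not necessarily one of the encodings evaluated in the paper; the function names `fft` and `correlate` are likewise hypothetical. With this encoding, the real part of the correlation at each shift equals the number of matching bases minus the number of complementary pairings (A/T, C/G), which shows how a single pair of forward FFTs gives a noisier score than a pure match count. The sketch uses only the Python standard library and a simple recursive radix-2 FFT rather than an optimized or overlap-save implementation.

```python
import cmath

# Assumed illustrative encoding: the four bases placed on the unit circle.
ENCODING = {"A": 1 + 0j, "C": -1j, "G": 1j, "T": -1 + 0j}


def fft(values, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return list(values)
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def correlate(target, probe):
    """Score every alignment of probe against target.

    score[s] = Re( sum_i enc(target[s + i]) * conj(enc(probe[i])) ),
    computed via the correlation theorem with two forward FFTs and one inverse.
    """
    # Zero-pad to a power of two large enough to avoid circular wrap-around.
    size = 1
    while size < len(target) + len(probe):
        size *= 2
    t = [ENCODING[b] for b in target] + [0j] * (size - len(target))
    p = [ENCODING[b] for b in probe] + [0j] * (size - len(probe))
    spectrum = [a * b.conjugate() for a, b in zip(fft(t), fft(p))]
    corr = fft(spectrum, inverse=True)
    # Divide by size to normalize the inverse transform; keep full-overlap shifts only.
    shifts = len(target) - len(probe) + 1
    return [round((corr[s] / size).real) for s in range(shifts)]


scores = correlate("GGACGTAC", "ACGT")
best_shift = max(range(len(scores)), key=scores.__getitem__)
print(scores, best_shift)
```

For the example inputs, the probe `ACGT` occurs exactly at offset 2 of the target, so the score there is 4 (four matches), while other shifts are reduced by complementary pairings. Encodings that separate the bases more cleanly would raise the signal-to-noise ratio at the cost of additional FFTs, which is the trade-off the abstract describes.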
Erik Allen Cheever, '82; G. C. Overton; and D. B. Searls.
"Fast Fourier Transform-Based Correlation Of DNA-Sequences Using Complex-Plane Encoding".
Computer Applications In The Biosciences.