This study introduces a new approach to
recognizing the boundaries between the parts of a DNA sequence that are retained after splicing and the
parts that are spliced out.
The basic idea is to derive a new dataset from the original data to
enhance the accuracy of well-known classification algorithms. The most
accurate results are obtained with a derived dataset that consists of the
most highly correlated features and selected statistical properties of the
DNA sequences. In addition, an adaptive
network-based fuzzy inference system (ANFIS) applied to the derived dataset
outperforms well-known classification algorithms. The classification
rate of the new approach is 95.23%, whereas Levenberg-Marquardt,
generalized regression neural networks, radial basis functions and learning
vector quantization achieve 92.12%, 86.75%, 83.13% and 84.51%,
respectively. Moreover, this approach can represent the DNA splice site problem
in the form of if-then rules and hence provides insight into the properties
of this problem.
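
As a rough illustration of the derived-dataset idea described above, the following Python sketch keeps the features most correlated with the class label and appends simple per-sequence statistics. The details are assumptions made for illustration only and are not taken from the study: the one-hot encoding, the use of the absolute Pearson correlation, the choice of nucleotide frequencies as the "statistical properties", and the names one_hot, pearson, top_correlated, base_frequencies and derive_dataset are all hypothetical.

```python
from math import sqrt

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA string as a flat 0/1 vector, four indicators per position."""
    return [1.0 if base == b else 0.0 for base in seq for b in BASES]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def top_correlated(rows, labels, k):
    """Return the indices of the k features with the largest |correlation| to the label."""
    n_features = len(rows[0])
    scores = [abs(pearson([r[j] for r in rows], labels)) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


def base_frequencies(seq):
    """Assumed stand-in for 'statistical properties': relative frequency of each base."""
    return [seq.count(b) / len(seq) for b in BASES]


def derive_dataset(seqs, labels, k):
    """Build rows of [k selected one-hot features] + [four base frequencies]."""
    encoded = [one_hot(s) for s in seqs]
    keep = top_correlated(encoded, labels, k)
    return [[row[j] for j in keep] + base_frequencies(s)
            for row, s in zip(encoded, seqs)]


if __name__ == "__main__":
    # Toy sequences and labels, invented purely for demonstration.
    seqs = ["AGGT", "CGGT", "ACCA", "TCCA"]
    labels = [1, 0 + 1, 0, 0]
    for row in derive_dataset(seqs, labels, k=6):
        print(row)
```

The rows produced this way could then be passed to any classifier; the sketch stops short of the ANFIS stage, which would require a dedicated fuzzy inference implementation.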