The FQS algorithm improves on the quick search (QS) algorithm by applying the bad character rule, aided by a statistically maximal expected shift value introduced in this work and a pre-testing stage before full pattern matching. Unlike previous approaches that blindly tested the first and last symbols in the pattern [20,21], our pre-testing stage is performed at the position with the statistically maximal expected shift. We compared FQS against three competitive alternatives: QS itself, FJS and the Horspool algorithm. A range of text files was searched, including randomly generated text files with different alphabet sizes (2 ≤ |Σ| ≤ 256) and practical benchmark text files, namely E. coli, Bible and World192, from the Canterbury Corpus. Pattern lengths ranged from 10 to 1,000, with 19 distinct values. We find that, statistically, FQS has the best overall performance (practical running time, number of symbol comparisons and number of pattern shifts) of the four algorithms, most notably for text files with alphabet sizes below 128. The results suggest that FQS could have important applications in practice, especially for genomic data sets, such as DNA or RNA sequences with four symbols or protein sequences with 20 symbols.
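For orientation, the baseline that FQS builds on can be sketched as follows. This is a minimal illustration of the standard QS bad character rule with an optional pre-test, not the FQS implementation: the function name `quick_search` and the parameter `pretest_pos` are hypothetical, and how FQS selects the position with the maximal expected shift is not specified in this abstract, so the position is simply passed in by the caller.

```python
def quick_search(text, pattern, pretest_pos=None):
    """Return the start indices of all occurrences of pattern in text.

    Baseline quick search (QS) with the bad character rule.
    pretest_pos is a hypothetical knob: if given (0 <= pretest_pos < len(pattern)),
    that single position is compared before the full window comparison.
    """
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # QS bad character table: the shift is decided by the symbol just past
    # the current window. Later (rightmost) occurrences overwrite earlier
    # ones, so each symbol maps to m minus its last index in the pattern.
    shift = {}
    for i, c in enumerate(pattern):
        shift[c] = m - i

    matches = []
    s = 0
    while s <= n - m:
        # Optional pre-test: one cheap comparison that can rule the window out.
        pretest_ok = (
            pretest_pos is None
            or text[s + pretest_pos] == pattern[pretest_pos]
        )
        if pretest_ok and text[s:s + m] == pattern:
            matches.append(s)
        if s + m >= n:
            break  # no symbol past the window, nothing left to search
        # Symbols absent from the pattern allow the maximal shift of m + 1.
        s += shift.get(text[s + m], m + 1)
    return matches
```

For example, `quick_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")` returns `[5]`. The pre-test never changes the result, only the number of symbol comparisons spent on windows that cannot match.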
