Knuth–Morris–Pratt

The Knuth–Morris–Pratt (or KMP algorithmis a substring matching algorithm. It is used for searching occurrences of a pattern P within a string S by employing the observation that when a mismatch occurs, the word itself embodies sufficient information to determine where the next match could begin, thus bypassing re-examination of previously matched characters. 

Time complexity of this algorithm is O(k + n), where k is the length of P and n is the length of S; therefore, the algorithm is linear.

In a naive substring matching algorithm, we iterate through S[i] matching each character with P[0], if a match is observed, we proceed matching consecutive characters, comparing S[i + j] P[j]. Upon a mismatch, we resume comparison with P[0] starting at S[i + 1]. This has the complexity of O(k * n), where k is the length of P and n is the length of S. KMP is more efficient because it does some pre-processing on the pattern P, and use that information to prevent from starting back at S[i + 1] upon a mismatch.
The pre-processing of pattern P is to give us a way to tell, at any index of the pattern, P[j], when looking at the substring, from P[0] to P[j], if there is a suffix that is also a prefix. So for example if our pattern is "APXBAPWX", at P[5] we should instantly be able to recognize that in the substring of P[0] to P[5], which is "APXBAP", there is a suffix, "AP", which is also a prefix. KMP achieves this capability by computing a temporary array T as follows, where at every index, z, of T, there is an integer value that if non zero, it tells you there is a suffix that is also a prefix, and the value is the size of the said prefix/suffix.

Temporary array T is computed as follows:
int *getTempArray(char *patternint size)
{
    if (pattern == nullptr || size < 1)
        return nullptr;
 
    int *T = new int[size];
    T[0] = 0;
    int j = 0, i = 1;
 
    while (i < size)
    {
        if (pattern[i] == pattern[j])
        {
            T[i] = j + 1;
            j++; i++;
        }
        else
        {
            if (j != 0)
                j = T[j - 1];
            else
            {
                T[i] = 0;
                i++;
            }
        }
    }
 
    return T;
}
So having the temporary array T already computed, when searching for a substring and there is a mismatch in the middle of a partial match, S[i + j] and P[j], KMP simply looks at T[j - 1], value m, which indicates that there is no need to restart the matching from S[i + 1] and P[0] like the naive algorithm. It can just continue matching at S[i + j + 1] and P[m]. This is because if there is a suffix that is also a prefix in the substring of P that we already looked at so far, we know in S, the suffix was the very last values we had (because we had a partial match so far), so if these last few values are also a prefix, there is no need to start over. We just continue matching at the location we were on S and continue matching at the end of a prefix on P (for which T provided us with the exact index). So KMP algorithm is implemented as follows.
int KMPSearch(char *Sint sizeSchar *Pint sizeP)
{
    int ret = -1;
 
    if (S == nullptr || sizeS < 1 ||
        P == nullptr || sizeP < 1)
        return ret;
 
    int *T = getTempArray(PsizeP);
 
    int i = 0, j = 0;
    while (i < sizeS && j < sizeP)
    {
        if (S[i] == P[j])
        {
            ret = i;
            while (i < sizeS && j < sizeP && S[i] == P[j])
            {
                i++;
                j++;
            }
 
            if (j == sizeP)
                break;
            else
            {
                if (j != 0)
                    j = T[j - 1];
                ret = -1;
            }
        }
        else
            i++;
    }
 
    return ret;
}

Comments