Time limit
2s
Memory limit
256 MB
Write a program that works like the find feature in a word processor: it must locate every occurrence of a pattern inside a longer string.
Two strings T and P are given. You must find how many times P appears in T, and the starting position of each occurrence. This is called string matching. P is the pattern, and T is the text.
Let the length of T be n and the length of P be m. Positions are counted from 1. The i-th character of T is written as T[i], and the i-th character of P is written as P[i]. The pattern appears at position i of the text if the characters from T[i] through T[i+m-1] are all equal, in order, to P[1] through P[m].
In the following diagram, matching starts at position 1 of T, but fails at the seventh comparison because the two characters differ.
1 2 3 4 5 6 7 8 9 …T : [ A B C D A B C D A B D E ] | | | | | | XP : [ A B C D A B D ] 1 2 3 4 5 6 7In the next diagram, the substring beginning at position 5 of T matches the pattern completely.
1 2 3 4 5 6 7 8 9 …T : [ A B C D A B C D A B D E ] | | | | | | |P : [ A B C D A B D ] 1 2 3 4 5 6 7A simple algorithm checks every possible starting position. At each position it may compare up to m characters, and there are at most n-m+1 starting positions. Its time complexity is therefore O((n-m+1) × m) = O(nm).
A faster algorithm uses the information revealed by a failed comparison. In the diagram above, if the match fails because T[7] != P[7], we still know that T[1] through T[6] matched P[1] through P[6]. By using the overlap between prefixes and suffixes inside the pattern, we can move to the next possible starting position without comparing characters that are already known to match.
More generally, suppose a match starting at position i of T fails when T[i+j-1] != P[j]. If we already know the largest k such that P[1…k] = P[j-k…j-1], then the next comparison can continue with T[i+j-1] and P[k+1]. Precomputing these values for the pattern allows all matches to be found in linear time.
Using this idea, write a program that prints every position where P appears in T.
The first line contains the text string T.
The second line contains the pattern string P.
The lengths n and m of the two strings are each between 1 and 1,000,000, inclusive. The strings consist only of uppercase letters, lowercase letters, and spaces.
On the first line, print the number of occurrences of P in T.
On the second line, print the starting positions of all occurrences in increasing order, separated by spaces. Positions are 1-based. If the substring from the i-th character through the i+m-1-th character of T is equal to P, print i.
1 2 3 4 5 6 7 8 9 …T : [ A B C D A B C D A B D E ] | | | | | | XP : [ A B C D A B D ] 1 2 3 4 5 6 7 1 2 3 4 5 6 7 8 9 …T : [ A B C D A B C D A B D E ] | | | | | | |P : [ A B C D A B D ] 1 2 3 4 5 6 7