Solution: Minimum Window Subsequence
Let's solve the Minimum Window Subsequence problem using the Sliding Window pattern.
Statement#
Given two strings, str1 and str2, find the shortest substring in str1 such that str2 is a subsequence of that substring.
A substring is defined as a contiguous sequence of characters within a string. A subsequence is a sequence that can be derived from another sequence by deleting zero or more elements without changing the order of the remaining elements.
Let’s say you have the following two strings:
str1 = “”
str2 = “”
In this example, “” is a substring of str1, from which we can derive str2 simply by deleting both the instances of the character . Therefore, str2 is a subsequence of this substring. Since this substring is the shortest among all the substrings in which str2 is present as a subsequence, the function should return this substring, that is, “”.
If there is no substring in
str1that covers all characters instr2, return an empty string.
If there are multiple minimum-length substrings that meet the subsequence requirement, return the one with the left-most starting index.
Constraints:
-
str1.length -
str2.length str1andstr2consist of uppercase and lowercase English letters.
Solution#
So far, you’ve probably brainstormed some approaches and have an idea of how to solve this problem. Let’s explore some of these approaches and figure out which one to follow based on considerations such as time complexity and any implementation constraints.
Naive approach#
The naive approach would be to generate all possible substrings of str1 and then check which substrings contain str2 as a subsequence. Out of all the substrings in str1 that contain str2 as a subsequence, we’ll choose the one with the shortest length. Now, let’s look at the cost of this solution. We need two nested loops to get all possible substrings and another loop to check whether each substring contains all the required characters. This brings the time complexity to . Since we’re not using any extra space, the space complexity is .
Optimized approach using sliding window#
To eliminate the extra traversals of the substrings, we use the sliding window approach. With this approach, we only consider the substrings that we are sure contain all the characters of str2 in the same order. This problem can be conveniently solved using the sliding window pattern. The idea is to keep track of whether the subsequence has been found or not and to select the shortest subsequence from str1.
Note: In the following section, we will gradually build the solution. Alternatively, you can skip straight to just the code.
Step-by-step construction#
The first step of the solution is to initialize the variables. We begin by creating two variables, size_str1 and size_str2, to store the lengths of str1 and str2, respectively. We then initialize min_sub_len to infinity, which will be used to store the length of the minimum subsequence.
To help us traverse the two strings, we create two indexes, index_s1 and index_s2, which initially point to the first characters of str1 and str2, respectively. These indexes will be incremented as we scan through the strings to find the subsequence.
Finally, we initialize min_subsequence to an empty string. This variable will store the output, which is the smallest possible subsequence.
Once we’ve initialized the variables, we start looping over str1 using index_s1. In each iteration, if the current character of str1, pointed at by index_s1, and the current character of str2, pointed at by index_s2, are the same, we increment both the indexes. Otherwise, we only increment index_s1. Using this logic, index_s2 will reach the end of str2 only if each character of str2 is found in str1. At this point, we have found a potential minimum window subsequence, i.e., a substring that contains str2 as a subsequence. So, we set min_sub_len to the length of this substring and min_subsequence to this substring.
Now, we need to check the rest of str1 for a shorter substring that meets our requirement. So, we resume our search in str1 from index_s1 . Whenever we find a substring that meets our requirement, we compare its length with min_sub_len and if it is shorter, we update min_sub_len with the length of this substring and min_subsequence with this substring. Finally, when we have traversed the entire str1, we return the minimum window subsequence.
Let’s take a look at the code of this solution:
By this stage, we’ve found a potential solution to the problem but if we uncomment the test case provided in lines 49 - 51 in the code widget above, we’ll notice that our solution does not return the correct minimum window subsequence. The issue with our solution is that it skips over any potential minimum window subsequences that start before the last character of the substring that we just found. Let’s understand this with the help of the example:
str1 = “abcdedeaqdweq”
str2 = “adeq”
Here, the minimum window subsequence is “aqdweq” that starts at index and ends at index . The first potential minimum window subsequence that our code identifies is “abcdedeaq”, ending at . If we resume the search from the end of this substring, i.e., from index in str1, we won’t be able to locate the shortest substring that satisfies our condition, since we skipped over index , which is the starting index of the actual minimum window subsequence.
To fix this, we simply change the index from which we resume our search and continue the iteration from the second character of the current substring, instead of index_s1 .
Let’s look at the code of this solution after this correction, seen in line 35:
At this point, our solution is correct and complete, since we can see that the example where str1 = “abcdedeaqdweq” and str2 = “adeq”, provided as the first test case in the coding widget above, give the correct output.
However, if we uncomment the test case given in lines 48 - 50, we’ll notice that our solution times out, which means that our solution is inefficient. Let’s first understand our test case. The first string, str1, consists of occurrences of the letter “f”, occurrences of the letter “s”, occurrences of the letter “e”, followed by one occurrence of the letter “a”. The second string, str2 is “fffessa”.
Our solution identifies all the potential minimum window subsequences starting from every instance of the character in str1. The upper bound on the number of such subsequences in this case is . This is obviously wasteful, since we can see from inspection that we can discard all but the last three occurrences of the letter “f”. Let’s see how we can improve our solution based on this insight.
After finding a potential subsequence, let’s try to shrink it by moving backward in str1, comparing it one character at a time with str2. We initialize two more pointer variables, start and end, to index_s1 and point index_s2 to the last character of str2. Then, we iterate backward over the potential minimum window subsequence using the start pointer. When we have discovered the entire str2 in the potential minimum window subsequence in the reverse order, we update the min_sub_len and min_subsequence variables accordingly.
At this point, we have found the shortest window subsequence that meets our requirement up to the index end in str1. So, now, we need to check the rest of the str1 string. We simply resume iterating over str1 in the forward direction, starting, as before, from start.
Let’s visualize the solution:
1 of 28
2 of 28
3 of 28
4 of 28
5 of 28
6 of 28
7 of 28
8 of 28
9 of 28
10 of 28
11 of 28
12 of 28
13 of 28
14 of 28
15 of 28
16 of 28
17 of 28
18 of 28
19 of 28
20 of 28
21 of 28
22 of 28
23 of 28
24 of 28
25 of 28
26 of 28
27 of 28
28 of 28
Just the code#
Here’s the complete solution to this problem:
Solution summary#
-
Initialize two indexes,
index_s1andindex_s2, to zero for iterating both strings. -
If the character pointed by
index_s1instr1is the same as the character pointed byindex_s2instr2, increment both pointers. Otherwise, only incrementindex_s1. -
Once
index_s2reaches the end ofstr2, initialize two new indexes (startandend). With these two indexes, we’ll slide the window backward. -
Set
startandendtoindex_s1. -
If the characters pointed to by
index_s2andstartare the same, decrement both of them. Otherwise, only decrementstart. -
Once,
str2has been discovered instr1in the backward direction, calculate the length of the substring. -
If this length is less than the current minimum length, update the
min_sub_lenvariable and themin_subsequencestring. -
Resume the search in the forward direction from
startinstr1. -
Repeat until
index_s1reaches the end ofstr1.
Time complexity#
The outer loop iterates over the string str1, so the time complexity of this loop will be , where is the length of string str1. Inside this loop, there is a while loop that is used to iterate back over the window once all the characters of str2 have been found in the current window. The time complexity of this loop will be , where is the length of string str2. Therefore, the overall time complexity of this solution is . For example, when str1 = “aaaaa” and str2 = “aa”, it takes time.
Let’s consider another example where the inner loop travels for more than iterations. For example, when str1 = “abcdefg” and str2 = “af”. When all characters of str2 have been found (using outer loop in time), the inner loop iterates back from ‘f’ till ‘a’, taking time. After that, the outer loop will start from ‘b’ and traverse all characters till the end of str1 without initiating the inner loop. This is because the outer loop will never find a subsequence containing str2. This will take time. Therefore, the total time complexity in this case will be , which is .
Therefore, the worst-case time complexity of this algorithm is .
Space complexity#
Since we are not using any extra space apart from a few variables, the space complexity is .
Minimum Window Subsequence
Minimum Window Substring