Solution: Minimum Window Subsequence

Let's solve the Minimum Window Subsequence problem using the Sliding Window pattern.

Statement#

Given two strings, str1 and str2, find the shortest substring in str1 such that str2 is a subsequence of that substring.

A substring is defined as a contiguous sequence of characters within a string. A subsequence is a sequence that can be derived from another sequence by deleting zero or more elements without changing the order of the remaining elements.

Let’s say you have the following two strings:

str1 = “abbcbabbcb

str2 = “acac

In this example, “abbcabbc” is a substring of str1, from which we can derive str2 simply by deleting both the instances of the character bb. Therefore, str2 is a subsequence of this substring. Since this substring is the shortest among all the substrings in which str2 is present as a subsequence, the function should return this substring, that is, “abbcabbc”.

If there is no substring in str1 that covers all characters in str2, return an empty string.

If there are multiple minimum-length substrings that meet the subsequence requirement, return the one with the left-most starting index.

Constraints:

  • 11 \leq str1.length \leq 2×1032 \times 10^3
  • 11 \leq str2.length \leq 100100
  • str1 and str2 consist of uppercase and lowercase English letters.

Solution#

So far, you’ve probably brainstormed some approaches and have an idea of how to solve this problem. Let’s explore some of these approaches and figure out which one to follow based on considerations such as time complexity and any implementation constraints.

Naive approach#

The naive approach would be to generate all possible substrings of str1 and then check which substrings contain str2 as a subsequence. Out of all the substrings in str1 that contain str2 as a subsequence, we’ll choose the one with the shortest length. Now, let’s look at the cost of this solution. We need two nested loops to get all possible substrings and another loop to check whether each substring contains all the required characters. This brings the time complexity to O(n3)O(n^{3}). Since we’re not using any extra space, the space complexity is O(1)O(1).

Optimized approach using sliding window#

To eliminate the extra traversals of the substrings, we use the sliding window approach. With this approach, we only consider the substrings that we are sure contain all the characters of str2 in the same order. This problem can be conveniently solved using the sliding window pattern. The idea is to keep track of whether the subsequence has been found or not and to select the shortest subsequence from str1.

Note: In the following section, we will gradually build the solution. Alternatively, you can skip straight to just the code.

Step-by-step construction#

The first step of the solution is to initialize the variables. We begin by creating two variables, size_str1 and size_str2, to store the lengths of str1 and str2, respectively. We then initialize min_sub_len to infinity, which will be used to store the length of the minimum subsequence.

To help us traverse the two strings, we create two indexes, index_s1 and index_s2, which initially point to the first characters of str1 and str2, respectively. These indexes will be incremented as we scan through the strings to find the subsequence.

Finally, we initialize min_subsequence to an empty string. This variable will store the output, which is the smallest possible subsequence.

Initializing variables

Once we’ve initialized the variables, we start looping over str1 using index_s1. In each iteration, if the current character of str1, pointed at by index_s1, and the current character of str2, pointed at by index_s2, are the same, we increment both the indexes. Otherwise, we only increment index_s1. Using this logic, index_s2 will reach the end of str2 only if each character of str2 is found in str1. At this point, we have found a potential minimum window subsequence, i.e., a substring that contains str2 as a subsequence. So, we set min_sub_len to the length of this substring and min_subsequence to this substring.

Now, we need to check the rest of str1 for a shorter substring that meets our requirement. So, we resume our search in str1 from index_s1 +1+ 1. Whenever we find a substring that meets our requirement, we compare its length with min_sub_len and if it is shorter, we update min_sub_len with the length of this substring and min_subsequence with this substring. Finally, when we have traversed the entire str1, we return the minimum window subsequence.

Let’s take a look at the code of this solution:

First draft of the solution

By this stage, we’ve found a potential solution to the problem but if we uncomment the test case provided in lines 49 - 51 in the code widget above, we’ll notice that our solution does not return the correct minimum window subsequence. The issue with our solution is that it skips over any potential minimum window subsequences that start before the last character of the substring that we just found. Let’s understand this with the help of the example:

str1 = “abcdedeaqdweq”

str2 = “adeq”

Here, the minimum window subsequence is “aqdweq” that starts at index 77 and ends at index 1212. The first potential minimum window subsequence that our code identifies is “abcdedeaq”, ending at 88. If we resume the search from the end of this substring, i.e., from index 99 in str1, we won’t be able to locate the shortest substring that satisfies our condition, since we skipped over index 77, which is the starting index of the actual minimum window subsequence.

To fix this, we simply change the index from which we resume our search and continue the iteration from the second character of the current substring, instead of index_s1 +1+ 1.

Let’s look at the code of this solution after this correction, seen in line 35:

Corrected solution

At this point, our solution is correct and complete, since we can see that the example where str1 = “abcdedeaqdweq” and str2 = “adeq”, provided as the first test case in the coding widget above, give the correct output.

However, if we uncomment the test case given in lines 48 - 50, we’ll notice that our solution times out, which means that our solution is inefficient. Let’s first understand our test case. The first string, str1, consists of 10,00010,000 occurrences of the letter “f”, 9,0009,000 occurrences of the letter “s”, 500500 occurrences of the letter “e”, followed by one occurrence of the letter “a”. The second string, str2 is “fffessa”.

Our solution identifies all the potential minimum window subsequences starting from every instance of the character ff in str1. The upper bound on the number of such subsequences in this case is 10000×9000×50010000 \times 9000 \times 500. This is obviously wasteful, since we can see from inspection that we can discard all but the last three occurrences of the letter “f”. Let’s see how we can improve our solution based on this insight.

After finding a potential subsequence, let’s try to shrink it by moving backward in str1, comparing it one character at a time with str2. We initialize two more pointer variables, start and end, to index_s1 and point index_s2 to the last character of str2. Then, we iterate backward over the potential minimum window subsequence using the start pointer. When we have discovered the entire str2 in the potential minimum window subsequence in the reverse order, we update the min_sub_len and min_subsequence variables accordingly.

At this point, we have found the shortest window subsequence that meets our requirement up to the index end in str1. So, now, we need to check the rest of the str1 string. We simply resume iterating over str1 in the forward direction, starting, as before, from start+1+1.

Let’s visualize the solution:

canvasAnimation-image

1 of 28

canvasAnimation-image

2 of 28

canvasAnimation-image

3 of 28

canvasAnimation-image

4 of 28

canvasAnimation-image

5 of 28

canvasAnimation-image

6 of 28

canvasAnimation-image

7 of 28

canvasAnimation-image

8 of 28

canvasAnimation-image

9 of 28

canvasAnimation-image

10 of 28

canvasAnimation-image

11 of 28

canvasAnimation-image

12 of 28

canvasAnimation-image

13 of 28

canvasAnimation-image

14 of 28

canvasAnimation-image

15 of 28

canvasAnimation-image

16 of 28

canvasAnimation-image

17 of 28

canvasAnimation-image

18 of 28

canvasAnimation-image

19 of 28

canvasAnimation-image

20 of 28

canvasAnimation-image

21 of 28

canvasAnimation-image

22 of 28

canvasAnimation-image

23 of 28

canvasAnimation-image

24 of 28

canvasAnimation-image

25 of 28

canvasAnimation-image

26 of 28

canvasAnimation-image

27 of 28

canvasAnimation-image

28 of 28

Optimized solution

Just the code#

Here’s the complete solution to this problem:

Minimum Window Subsequence

Solution summary#

  1. Initialize two indexes, index_s1 and index_s2, to zero for iterating both strings.

  2. If the character pointed by index_s1 in str1 is the same as the character pointed by index_s2 in str2, increment both pointers. Otherwise, only increment index_s1.

  3. Once index_s2 reaches the end of str2, initialize two new indexes (start and end). With these two indexes, we’ll slide the window backward.

  4. Set start and end to index_s1.

  5. If the characters pointed to by index_s2 and start are the same, decrement both of them. Otherwise, only decrement start.

  6. Once, str2 has been discovered in str1 in the backward direction, calculate the length of the substring.

  7. If this length is less than the current minimum length, update the min_sub_len variable and the min_subsequence string.

  8. Resume the search in the forward direction from start +1+1 in str1.

  9. Repeat until index_s1 reaches the end of str1.

Time complexity#

The outer loop iterates over the string str1, so the time complexity of this loop will be O(n)O(n), where nn is the length of string str1. Inside this loop, there is a while loop that is used to iterate back over the window once all the characters of str2 have been found in the current window. The time complexity of this loop will be O(m)O(m), where mm is the length of string str2. Therefore, the overall time complexity of this solution is O(n×m)O(n \times m). For example, when str1 = “aaaaa” and str2 = “aa”, it takes O(n×m)O(n \times m) time.

Let’s consider another example where the inner loop travels for more than O(m)O(m) iterations. For example, when str1 = “abcdefg” and str2 = “af”. When all characters of str2 have been found (using outer loop in O(n)O(n) time), the inner loop iterates back from ‘f’ till ‘a’, taking O(n)O(n) time. After that, the outer loop will start from ‘b’ and traverse all characters till the end of str1 without initiating the inner loop. This is because the outer loop will never find a subsequence containing str2. This will take O(n)O(n) time. Therefore, the total time complexity in this case will be O(n+n+n)O(n+n+n), which is O(n)O(n).

Therefore, the worst-case time complexity of this algorithm is O(n×m)O(n \times m).

Space complexity#

Since we are not using any extra space apart from a few variables, the space complexity is O(1)O(1).

Minimum Window Subsequence

Minimum Window Substring