Shortest Common Supersequence — Day 60 (Python)

Annamariya Tharayil
3 min readJan 23, 2021
Photo by Andrew Coop on Unsplash

Today’s question is another variation of the Longest Common Subsequence. Let us look at today’s problem statement without further ado.

Shortest Common Supersequence

Given two strings X and Y of lengths m and n respectively, find the length of the smallest string which has both, X and Y as its sub-sequences.
Note: X and Y can have both uppercase and lowercase letters.

Example 1

Input:
X = abcd, Y = xycd
Output: 6
Explanation:
Shortest Common Supersequence
would be abxycd which is of length 6 and
has both the strings as its subsequences.

Example 2

Input:
X = efgh, Y = jghi
Output: 6
Explanation:
Shortest Common Supersequence
would be ejfghi which is of length 6 and
has both the strings as its subsequences.

Your Task:
Complete shortestCommonSupersequence() function that takes X, Y, m, and n as arguments and returns the length of the required string.

Expected Time Complexity: O(Length(X) * Length(Y)).
Expected Auxiliary Space: O(Length(X) * Length(Y)).

Constraints:
1<= |X|, |Y| <= 100

Before we start with solving the problem, let us understand what a supersequence is.

According to Wikipedia,

In computer science, the shortest common supersequence of two sequences X and Y is the shortest sequence which has X and Y as subsequences. A shortest common supersequence (SCS) is a common supersequence of minimal length. In the shortest common supersequence problem, two sequences X and Y are given, and the task is to find the shortest possible common supersequence of these sequences. In general, an SCS is not unique.

Now that we understand what a supersequence is let us try to solve the problem.

The question has two input strings, the output is a number i.e. length of shortest supersequence. We have earlier solved the longest common subsequence and now we are required to find a supersequence. The question seems similar to LCS.

When we solve LCS, we are trying to find the longest subsequence, which is common in both the given input strings. The rest of the characters are not common among both the input strings. Since we need to find a super sequence, we would require the common subsequence and the rest of the characters that are not part of the common substring.

The first part of solving this problem is using LCS.

Let us recall the code for LCS.

class FindLongest:
def longestCommonSubsequence(self, text1: str, text2: str) -> int:
memo = [[float('-inf') for j in range(len(text2)+1)]for i in range(len(text1)+1)]
for i in range(len(memo)):
for j in range(len(memo[0])):
if i == 0 or j == 0:
memo[i][j] = 0
elif text1[i-1] == text2[j-1]:
memo[i][j] = 1 + memo[i-1][j-1]
else:
memo[i][j] = max(memo[i-1][j], memo[i][j-1])
return memo[-1][-1]

The next part is mathematical. As we mentioned before, to find the shortest supersequence, we need the length of the longest common subsequence and the length of uncommon subsequence in both strings.

len(text1)+ len(text2)- (memo[-1][-1])

Let us look into the whole code snippet.

class FindShortest:
def shortestCommonSupersequence(self, text1: str, text2: str) -> int:
memo = [[float('-inf') for j in range(len(text2)+1)]for i in range(len(text1)+1)]
for i in range(len(memo)):
for j in range(len(memo[0])):
if i == 0 or j == 0:
memo[i][j] = 0
elif text1[i-1] == text2[j-1]:
memo[i][j] = 1 + memo[i-1][j-1]
else:
memo[i][j] = max(memo[i-1][j], memo[i][j-1])
return len(text1)+ len(text2) - memo[-1][-1]

Complexity analysis.

Time Complexity

We are traversing through a 2D array of size M*N, hence the time complexity is O(M*N), where M is the size of text1 and N is the size of text2.

Space Complexity.

We are creating a 2D array of size M*N, hence the space complexity is O(M*N), where M is the size of text1 and N is the size of text2.

Link to Aditya Verma’s tutorial.

--

--