1. 程式人生 > >leetcode 10 Regular Expression Matching(簡單正則表示式匹配)

leetcode 10 Regular Expression Matching(簡單正則表示式匹配)

最近程式碼寫的少了,而leetcode一直想做一個python,c/c++解題報告的專題,c/c++一直是我非常喜歡的,c語言程式設計練習的重要性體現在linux核心程式設計以及一些大公司演算法上機的要求,python主要為了後序轉型資料分析和機器學習,所以今天來做一個難度為hard 的簡單正則表示式匹配。

做了很多leetcode題目,我們來總結一下套路:
首先一般是檢查輸入引數是否正確,然後是處理演算法的特殊情況,之後就是實現邏輯,最後就是返回值。

當程式設計成為一種解決問題的習慣,我們就成為了一名純粹的程式設計師

leetcode 10 Regular Expression Matching

(簡單正則表示式匹配)

題目描述

Implement regular expression matching with support for ‘.’ and ‘*’.
‘.’ Matches any single character.
‘*’ Matches zero or more of the preceding element.
The matching should cover the entire input string (not partial).
The function prototype should be:
bool isMatch(const char *s, const char *p)

Some examples:
isMatch(“aa”,”a”) → false
isMatch(“aa”,”aa”) → true
isMatch(“aaa”,”aa”) → false
isMatch(“aa”, “a*”) → true
isMatch(“aa”, “.*”) → true
isMatch(“ab”, “.*”) → true
isMatch(“aab”, “c*a*b”) → true

題目意義及背景

It might seem deceptively easy even you know the general idea, but programming it correctly with all the details require careful thought.

在程式設計中,規劃所有的細節問題都需要認真思考。

題目分析以及需要注意的問題

為什麼aab可以匹配模式c*a*b呢?

● It is a regular expression.Not a wild card.So the ” * ” does not mean any string.And the cab should be split like this “c * a * b” which means N “c”,N “a” and One “b”.

’ * ’ Matches zero or more of the preceding element, so ” c* ” could match nothing.

此題不能使用貪心法

考慮情況:

s = “ac”, p = “ab*c”

After the first ‘a’, we see that there is no b’s to skip for “b*”. We match the last ‘c’ on both side and conclude that they both match.

It seems that being greedy is good. But how about this case?

s = “abbc”, p = “ab*bbc”
When we see “b*” in p, we would have skip all b’s in s. They both should match, but we have no more b’s to match. Therefore, the greedy approach fails in the above case.

可能還有人說,如果碰到這種情況可以先看一下*後面的內容,但是碰見下面的情況就不好辦了。

s = “abcbcd”, p = “a.*c.*d”
Here, “.*” in p means repeat ‘.’ 0 or more times. Since ‘.’ can match any character, it is not clear how many times ‘.’ should be repeated. Should the ‘c’ in p matches the first or second ‘c’ in s?

所以:
Unfortunately, there is no way to tell without using some kind of exhaustive search(窮舉搜尋).

Hints:

A sample diagram of a deterministic finite state automata (DFA). DFAs are useful for doing lexical analysis and pattern matching. An example is UNIX’s grep tool. Please note that this post does not attempt to descibe a solution using DFA.

什麼是DFA?

Solution

主要解決方案是回溯法,使用遞迴或者dp

We need some kind of backtracking mechanism (回溯法)such that when a matching fails, we return to the last successful matching state and attempt to match more characters in s with ‘*’. This approach leads naturally to recursion.

The recursion mainly breaks down elegantly to the following two cases: 主要考慮兩種遞迴情況
1. If the next character of p is NOT ‘*’, then it must match the current character of s. Continue pattern matching with the next character of both s and p.
2. If the next character of p is ‘*’, then we do a brute force exhaustive matching of 0, 1, or more repeats of current character of p… Until we could not match any more characters.
You would need to consider the base case carefully too. That would be left as an exercise to the reader.

Below is the extremely concise code (Excluding comments and asserts, it’s about 10 lines of code).

解題過程如下:

1、考慮特殊情況即*s字串或者*p字串結束。
(1)s字串結束,要求*p也結束或者間隔‘’ (例如p=”a*b*c……”),否則無法匹配
(2)*s字串未結束,而*p字串結束,則無法匹配
2、*s字串與*p字串均未結束
(1)(p+1)字元不為’‘,則只需比較s字元與*p字元,若相等則遞迴到(s+1)字串與*(p+1)字串的比較,否則無法匹配。
(2)(p+1)字元為’‘,則p字元可以匹配*s字串中從0開始任意多(記為i)等於*p的字元,然後遞迴到(s+i+1)字串與*(p+2)字串的比較,
只要匹配一種情況就算完全匹配。

bool isMatch(const char *s,const char *p)
{
    //判斷引數合法,以及程式正常結束
    assert( s && p);
    if(*p == '\0') return *s == '\0';

    //next char is not '*'; must match current character
    if(*(p+1) != '*')
    {
        assert(*p != '*');//考慮情況isMatch('aa','a*');
        return ((*p == *s) ||(*p == '.' && *s != '\0')) && isMatch(s + 1, p + 1);
    }

    //next char is '*' 繼續遞迴匹配,不能寫成*(p+1) == '*' 考慮情況isMatch('ab','.*c')
    while((*p == *s)|| (*p == '.' && *s != '\0'))
    {
        if (isMatch(s, p+2)) return true;
        s++;
    }

    //匹配下一個模式
    return isMatch(s,p+2);
}

此程式碼執行時間:18ms
這裡寫圖片描述

Further Thoughts:

Some extra exercises to this problem:
1. If you think carefully, you can exploit some cases that the above code runs in exponential complexity. Could you think of some examples? How would you make the above code more efficient?
2. Try to implement partial matching instead of full matching. In addition, add ‘^’ and ‘$’ to the rule. ‘^’ matches the starting position within the string, while ‘$’ matches the ending position of the string.
3. Try to implement wildcard matching where ‘*’ means any sequence of zero or more characters.
For the interested reader, real world regular expression matching (such as the grep tool) are usually implemented by applying formal language theory. To understand more about it, you may read this article.
Rating: 4.8/5 (107 votes cast)
Regular Expression Matching, 4.8 out of 5 based on 107 ratings

leetcode的 解題報告提醒我們說

leetcode的解答報告中說的If you are stuck, recursion is your friend.

// 遞迴版,時間複雜度O(n),空間複雜度O(1)

class Solution {
 public:
  bool isMatch(const char *s, const char *p)
  {
      if (*p == '\0') return *s == '\0';
      // next char is not '*', then must match current character
      if (*(p + 1) != '*')

      {
          if (*p == *s || (*p == '.' && *s != '\0'))
              return isMatch(s + 1, p + 1);
          else
              return false;

      }
      else
      { // next char is '*'
          while (*p == *s || (*p == '.' && *s != '\0'))
          {
              if (isMatch(s, p + 2))
                  return true;
              s++;
          }
          return isMatch(s, p + 2);
      }
  }
};

c++解決方案:

My concise recursive and DP solutions with full explanation in C++

Please refer to my blog post if you have any comment. Wildcard matching problem can be solved similarly.

class Solution {
public:
    bool isMatch(string s, string p) {
        if (p.empty())    return s.empty();

        if ('*' == p[1])
            // x* matches empty string or at least one character: x* -> xx*
            // *s is to ensure s is non-empty
            return (isMatch(s, p.substr(2)) || !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p));
        else
            return !s.empty() && (s[0] == p[0] || '.' == p[0]) && isMatch(s.substr(1), p.substr(1));
    }
};
class Solution {
public:
    bool isMatch(string s, string p) {
        /**
         * f[i][j]: if s[0..i-1] matches p[0..j-1]
         * if p[j - 1] != '*'
         *      f[i][j] = f[i - 1][j - 1] && s[i - 1] == p[j - 1]
         * if p[j - 1] == '*', denote p[j - 2] with x
         *      f[i][j] is true iff any of the following is true
         *      1) "x*" repeats 0 time and matches empty: f[i][j - 2]
         *      2) "x*" repeats >= 1 times and matches "x*x": s[i - 1] == x && f[i - 1][j]
         * '.' matches any single character
         */
        int m = s.size(), n = p.size();
        vector<vector<bool>> f(m + 1, vector<bool>(n + 1, false));

        f[0][0] = true;
        for (int i = 1; i <= m; i++)
            f[i][0] = false;
        // p[0.., j - 3, j - 2, j - 1] matches empty iff p[j - 1] is '*' and p[0..j - 3] matches empty
        for (int j = 1; j <= n; j++)
            f[0][j] = j > 1 && '*' == p[j - 1] && f[0][j - 2];

        for (int i = 1; i <= m; i++)
            for (int j = 1; j <= n; j++)
                if (p[j - 1] != '*')
                    f[i][j] = f[i - 1][j - 1] && (s[i - 1] == p[j - 1] || '.' == p[j - 1]);
                else
                    // p[0] cannot be '*' so no need to check "j > 1" here
                    f[i][j] = f[i][j - 2] || (s[i - 1] == p[j - 2] || '.' == p[j - 2]) && f[i - 1][j];

        return f[m][n];
    }
};

The shortest AC code.

1.’.’ is easy to handle. if p has a ‘.’, it can pass any single character in s except ‘\0’.
2.” is a totally different problem. if p has a ” character, it can pass any length of first-match characters in s including ‘\0’.

  class Solution {
    public:
    bool matchFirst(const char *s, const char *p){
        return (*p == *s || (*p == '.' && *s != '\0'));
    }

bool isMatch(const char *s, const char *p) {
    if (*p == '\0') return *s == '\0';  //empty

    if (*(p + 1) != '*') {//without *
        if(!matchFirst(s,p)) return false;
        return isMatch(s + 1, p + 1);
    } else { //next: with a *
        if(isMatch(s, p + 2)) return true;    //try the length of 0
        while ( matchFirst(s,p) )       //try all possible lengths 
            if (isMatch(++s, p + 2))return true;
    }
}
};

a shorter one (14 lines of code) with neatly format:

class Solution {
public:
    bool isMatch(const char *s, const char *p) {
        for( char c = *p; c != 0; ++s, c = *p ) {
            if( *(p+1) != '*' )
                p++;
            else if( isMatch( s, p+2 ) )
                return true;
            if( (*s==0) || ((c!='.') && (c!=*s)) )
                return false;
        }
        return *s == 0;
    }
};

9-lines 16ms C++ DP Solutions with Explanations

This problem has a typical solution using Dynamic Programming. We define the state P[i][j] to be true if s[0..i) matches p[0..j) and false otherwise. Then the state equations are:
a. P[i][j] = P[i - 1][j - 1], if p[j - 1] != ‘*’ && (s[i - 1] == p[j - 1] || p[j - 1] == ‘.’);
b. P[i][j] = P[i][j - 2], if p[j - 1] == ‘*’ and the pattern repeats for 0 times;
c. P[i][j] = P[i - 1][j] && (s[i - 1] == p[j - 2] || p[j - 2] == ‘.’), if p[j - 1] == ‘*’ and the pattern repeats for at least 1 times.
Putting these together, we will have the following code.

class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.length(), n = p.length(); 
        vector<vector<bool> > dp(m + 1, vector<bool> (n + 1, false));
        dp[0][0] = true;
        for (int i = 0; i <= m; i++)
            for (int j = 1; j <= n; j++)
                if (p[j - 1] == '*')
                    dp[i][j] = dp[i][j - 2] || (i > 0 && (s[i - 1] == p[j - 2] || p[j - 2] == '.') && dp[i - 1][j]);
                else dp[i][j] = i > 0 && dp[i - 1][j - 1] && (s[i - 1] == p[j - 1] || p[j - 1] == '.');
        return dp[m][n];
    }
};

2 years agoreply quote

python解決方案

使用re庫,叼炸天!

import re

class Solution:
    # @return a boolean
    def isMatch(self, s, p):
        return re.match('^' + p + '$', s) != None

# debug
s = Solution()
print (s.isMatch("aaa", ".*"))

Python DP solution in 36 ms

def isMatch(self, s, p):
    m = len(s)
    n = len(p)
    dp = [[True] + [False] * m]
    for i in xrange(n):
        dp.append([False]*(m+1))

    for i in xrange(1, n + 1):
        x = p[i-1]
        if x == '*' and i > 1:
            dp[i][0] = dp[i-2][0]
        for j in xrange(1, m+1):
            if x == '*':
                dp[i][j] = dp[i-2][j] or dp[i-1][j] or (dp[i-1][j-1] and p[i-2] == s[j-1]) or (dp[i][j-1] and p[i-2]=='.')
            elif x == '.' or x == s[j-1]:
                dp[i][j] = dp[i-1][j-1]

    return dp[n][m]
about a year agoreply quote 

class Solution(object):
    def isMatch(self, s, p, memo={("",""):True}):
        if not p and s:      return False
        if not s and p:      return set(p[1::2]) == {"*"} and not (len(p) % 2)
        if (s,p) in memo:    return memo[s,p]

        char, exp, prev = s[-1], p[-1], 0 if len(p) < 2 else p[-2]
        memo[s,p] =\
               (exp == '*' and ((prev in {char, '.'} and self.isMatch(s[:-1], p, memo)) or self.isMatch(s, p[:-2], memo)))\
               or\
               (exp in {char, '.'} and self.isMatch(s[:-1], p[:-1], memo))
        return memo[s,p]

# 445 / 445 test cases passed.
# Status: Accepted
# Runtime: 72 ms

8ms backtracking solution C++

//regular expression matching
//first solution: using recursive version
class Solution {
public:
    bool isMatch(string s, string p) {
        int m = s.length(), n = p.length();
        return backtracking(s, m, p, n);
    }

    bool backtracking(string& s, int i, string& p, int j) {
        if (i == 0 && j == 0) return true;
        if (i != 0 && j == 0) return false;
        if (i == 0 && j != 0) {
            //in this case only p == "c*c*c*" this pattern can match null string
            if (p[j-1] == '*') {
                return backtracking(s, i, p, j-2);
            }
            return false;
        }
        //now both i and j are not null
        if (s[i-1] == p[j-1] || p[j-1] == '.') {
            return backtracking(s, i - 1, p, j - 1);
        } else if (p[j-1] == '*') {
            //two cases: determines on whether p[j-2] == s[i-1]
            //first p[j-2]* matches zero characters of p
            if (backtracking(s, i, p, j - 2)) return true;
            //second consider whether p[j-2] == s[i-1], if true, then s[i-1] is matched, move to backtracking(i - 1, j)
            if (p[j-2] == s[i-1] || p[j-2] == '.') {
                return backtracking(s, i - 1, p, j);
            }
            return false;
        }
        return false;
    }
};

c語言參考解決方案:
3ms C solution using O(mn) time and O(n) space

bool isMatch(char *s, char *p){
    int i;

    int ls = strlen(s);
    int lp = strlen(p);
    bool* m = malloc((ls + 1) * sizeof(bool));

    // init
    m[0] = true;
    for (i = 1; i <= ls; i++) {
        m[i] = false;
    }

    int ip;
    for (ip = 0; ip < lp; ip++) {
        if (ip + 1 < lp && p[ip + 1] == '*') {
            // do nothing
        }
        else if (p[ip] == '*') {
            char c = p[ip - 1];
            for (i = 1; i <= ls; i++) {
                m[i] = m[i] || (m[i - 1] && (s[i - 1] == c || c == '.'));
            }
        }
        else {
            char c = p[ip];
            for (i = ls; i > 0; i--) {
                m[i] = m[i - 1] && (s[i - 1] == c || c == '.');
            }
            m[0] = false;
        }
    }

    bool ret = m[ls];
    free(m);
    return ret;
}

簡短的程式碼:

bool isMatch(char* s, char* p) {
    while (*s) {
        if (*p&&*(p+1)=='*') {
            if (!(*p==*s||*p=='.')) {p+=2;continue;} 
            if (!isMatch(s,p+2)) {s++;continue;} else return true;
        }
        if (*p==*s||*p=='.') {s++;p++;continue;}
        return false;
    }
    while(*p&&*(p+1)=='*') p+=2;
    return !*p;

}

參考文獻