【C++】regex: Regular Expressions

阿新 • Published: 2018-11-04

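A regular expression is a way of describing a sequence of characters, and the regex library is a powerful tool added to the standard library in C++11. Regular expressions form a small language for string processing and suit tasks such as matching, searching, and replacing text. C++11 supports the following grammars: ECMAScript, basic, extended, awk, grep, and egrep. The default grammar in C++11 is ECMAScript.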
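
As a rough illustration of choosing a grammar, the sketch below compiles the same check twice: once with the default ECMAScript grammar and once with the POSIX extended grammar selected through the std::regex::extended flag. The function names matches_ecmascript and matches_extended are made up for this example. POSIX extended regular expressions do not define the \d shorthand, so the second pattern spells the digit class out as [0-9].

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true when `text` as a whole is lowercase letters
// followed by digits. No flag is passed, so the default ECMAScript
// grammar is used and the \d shorthand is available.
bool matches_ecmascript(const std::string& text) {
    static const std::regex re("([a-z]+)(\\d+)");
    return std::regex_match(text, re);
}

// Hypothetical helper: the same check under the POSIX extended grammar,
// selected with std::regex::extended. The digit class is written as [0-9].
bool matches_extended(const std::string& text) {
    static const std::regex re("([a-z]+)([0-9]+)", std::regex::extended);
    return std::regex_match(text, re);
}
```

Both helpers accept "twinkle1993" and reject "1993twinkle"; only the spelling of the pattern differs between the two grammars.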

The regular expression library is defined in the header <regex>. It contains several components:

Matching

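regex_match: the regex_match() algorithm compares a given source string against a regular expression pattern. It returns true if the pattern matches the entire source string, and false otherwise.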

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() 
{
	string str = "twinkle1993";

	// The whole string consists of lowercase letters and digits.
	regex r("[a-z0-9]+");
	cout << "Regex: [a-z0-9]+" << endl;
	if (regex_match(str, r))
		cout << "String: " << str << " matched!" << endl;
	else
		cout << "String: " << str << " did not match!" << endl;

	// Fails: \d+ would have to cover the entire string, letters included.
	cout << endl << "Regex: \\d+" << endl;
	if (regex_match(str, regex("\\d+")))
		cout << "String: " << str << " matched!" << endl;
	else
		cout << "String: " << str << " did not match!" << endl;

	// Iterator overload: match only the tail "1993".
	cout << endl << "Regex: \\d+" << endl;
	if (regex_match(str.begin() + 7, str.end(), regex("\\d+")))
		cout << "String: " << str.substr(7) << " matched!" << endl;
	else
		cout << "String: " << str.substr(7) << " did not match!" << endl;

	// smatch stores the whole match in [0] and the capture groups in [1], [2], ...
	regex parts("([a-z]+)(\\d+)");
	smatch sm;
	cout << endl << "Regex: ([a-z]+)(\\d+)" << endl;
	if (regex_match(str.cbegin() + 5, str.cend(), sm, parts)) 
	{
		cout << "String: " << str.substr(5) << " matched!" << endl;
		cout << "Number of sub-matches: " << sm.size() << endl;
		cout << "They are: ";
		for (const auto& sub : sm)
			cout << sub.str() << " ";
		cout << endl;
	}
	else
		cout << "String: " << str.substr(5) << " did not match!" << endl;

	// cmatch is the match_results type for C strings (const char*).
	cmatch cm;
	cout << endl << "Regex: ([a-z]+)(\\d+)" << endl;
	if (regex_match(str.c_str(), cm, parts)) 
	{
		cout << "String: " << str << " matched!" << endl;
		cout << "Number of sub-matches: " << cm.size() << endl;
		cout << "They are: ";
		for (const auto& sub : cm)
			cout << sub.str() << " ";
		cout << endl;
	}
	else
		cout << "String: " << str << " did not match!" << endl;
	return 0;
}

Output:

Regex: [a-z0-9]+
String: twinkle1993 matched!

Regex: \d+
String: twinkle1993 did not match!

Regex: \d+
String: 1993 matched!

Regex: ([a-z]+)(\d+)
String: le1993 matched!
Number of sub-matches: 3
They are: le1993 le 1993

Regex: ([a-z]+)(\d+)
String: twinkle1993 matched!
Number of sub-matches: 3
They are: twinkle1993 twinkle 1993
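
To make the whole-string requirement of regex_match() concrete, here is a minimal sketch that contrasts it with regex_search(), which is described in the next section. The helper names is_all_digits and contains_digits are invented for this illustration.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the entire string is a run of digits.
bool is_all_digits(const std::string& text) {
    static const std::regex digits("\\d+");
    return std::regex_match(text, digits);   // pattern must cover all of `text`
}

// Hypothetical helper: true if a run of digits occurs anywhere in the string.
bool contains_digits(const std::string& text) {
    static const std::regex digits("\\d+");
    return std::regex_search(text, digits);  // a partial match is enough
}
```

For "twinkle1993", is_all_digits returns false while contains_digits returns true; for "1993" both return true.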

Searching

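regex_search: the regex_search() algorithm extracts a matching substring from the input string. The smatch object sm holds the search result. To get the first capture group as a string, write sm[1] (a sub_match that converts implicitly to a string) or sm[1].str(). The iterators sm[1].first and sm[1].second give the exact position of that substring within the source string.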

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() 
{
	string str = "twinkle1993winkle1993inkle1993";
	regex reg("([a-z]+)1");	// compile the pattern once, outside the loop
	smatch sm;

	cout << "Regex: ([a-z]+)1" << endl;
	// Each search resumes where the previous match ended (sm.suffix().first).
	for (auto it = str.cbegin(); regex_search(it, str.cend(), sm, reg); it = sm.suffix().first) 
	{
		cout << "String: " << string(it, str.cend()) << " matched!" << endl;
		cout << "Number of sub-matches: " << sm.size() << endl;
		cout << "They are: ";
		for (const auto& sub : sm)
			cout << sub.str() << " ";

		cout << endl;
		cout << "Text before " << sm.str() << ": " << sm.prefix().str() << endl;
		cout << "Text after " << sm.str() << ": " << sm.suffix().str() << endl;
		cout << endl;
	}
	return 0;
}

Output:

Regex: ([a-z]+)1
String: twinkle1993winkle1993inkle1993 matched!
Number of sub-matches: 2
They are: twinkle1 twinkle
Text before twinkle1:
Text after twinkle1: 993winkle1993inkle1993

String: 993winkle1993inkle1993 matched!
Number of sub-matches: 2
They are: winkle1 winkle
Text before winkle1: 993
Text after winkle1: 993inkle1993

String: 993inkle1993 matched!
Number of sub-matches: 2
They are: inkle1 inkle
Text before inkle1: 993
Text after inkle1: 993
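
The regex_search description mentions that the first and second iterators of a sub-match give its exact position in the source string, which the program above does not show. The following sketch converts those iterators into character offsets. The function name first_group_span and the use of npos as a "not found" marker are assumptions made for this example.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns the [begin, end) offsets of capture group 1
// for the first match of `re` in `text`, or {npos, npos} if there is none.
std::pair<std::size_t, std::size_t> first_group_span(const std::string& text,
                                                     const std::regex& re) {
    std::smatch sm;
    if (!std::regex_search(text, sm, re) || sm.size() < 2 || !sm[1].matched)
        return {std::string::npos, std::string::npos};
    // sm[1].first and sm[1].second are iterators into `text`; subtracting
    // the begin iterator turns them into offsets.
    auto begin = static_cast<std::size_t>(sm[1].first - text.cbegin());
    auto end = static_cast<std::size_t>(sm[1].second - text.cbegin());
    return {begin, end};
}
```

With the pattern ([a-z]+)1 and the text "993winkle1993", the group "winkle" spans offsets 3 to 9 (end exclusive). The same start offset is also available as sm.position(1).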

regex_iterator

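To iterate over every match of a regular expression search one by one, we can also use regex_iterator. Normally an end iterator has to be obtained from a specific container, but regex_iterator has only a single end value. Declaring a regex_iterator with the default constructor yields this end iterator: a default-constructed regex_iterator is the end-of-sequence iterator.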

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() 
{
	string str = "twinkle1993twink1993le1993";
	regex reg("([a-z]+)1");

	cout << "Regex: ([a-z]+)1" << endl;
	// `end` is default-constructed, so it is the end-of-sequence iterator.
	for (sregex_iterator it(str.cbegin(), str.cend(), reg), end; it != end; ++it) 
	{
		// prefix().first marks where the search for this match started.
		cout << "String: " << string(it->prefix().first, str.cend()) << " matched!" << endl;
		cout << "Number of sub-matches: " << it->size() << endl;
		cout << "They are: ";
		for (const auto& sub : *it)
			cout << sub.str() << " ";
		cout << endl;
		cout << "Text before " << it->str() << ": " << it->prefix().str() << endl;
		cout << "Text after " << it->str() << ": " << it->suffix().str() << endl;
		cout << endl;
	}
	return 0;
}

Output:

Regex: ([a-z]+)1
String: twinkle1993twink1993le1993 matched!
Number of sub-matches: 2
They are: twinkle1 twinkle
Text before twinkle1:
Text after twinkle1: 993twink1993le1993

String: 993twink1993le1993 matched!
Number of sub-matches: 2
They are: twink1 twink
Text before twink1: 993
Text after twink1: 993le1993

String: 993le1993 matched!
Number of sub-matches: 2
They are: le1 le
Text before le1: 993
Text after le1: 993
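
Because a default-constructed regex_iterator serves as the end value, a begin/end pair can be handed to ordinary code that expects an iterator range. The sketch below collects every match into a vector and counts matches with std::distance. The helper names all_matches and count_matches are hypothetical.

```cpp
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the text of every match of `re` in `text`.
std::vector<std::string> all_matches(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    const std::sregex_iterator end;  // default-constructed: end-of-sequence
    for (std::sregex_iterator it(text.cbegin(), text.cend(), re); it != end; ++it)
        out.push_back(it->str());
    return out;
}

// Hypothetical helper: counts the matches by measuring the distance
// between the first iterator and the end-of-sequence iterator.
std::size_t count_matches(const std::string& text, const std::regex& re) {
    return static_cast<std::size_t>(std::distance(
        std::sregex_iterator(text.cbegin(), text.cend(), re),
        std::sregex_iterator()));
}
```

For the pattern ([a-z]+)1 and the string "twinkle1993twink1993le1993", all_matches yields twinkle1, twink1, and le1, and count_matches returns 3.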

regex_token_iterator

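regex_iterator helps iterate over the subsequences that match. Sometimes, however, you want to process the content between those subsequences, especially when splitting a string into tokens or on some separator, and the separator may even be specified as a regular expression. regex_token_iterator provides this capability.

To initialize it, pass the beginning and end of the character sequence and a regular expression. You can additionally pass an integer, or a list of integers, that selects which elements the tokenization yields:
* -1: you want every subsequence between matches of the regular expression, that is, between token separators
* 0: you want every match of the regular expression, that is, every token separator (this is the default)
* any other number n: you want the n-th capture group (sub-expression) of each match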

#include <iostream>
#include <regex>
#include <string>
using namespace std;

int main() 
{
	string str = "11twinkle1993teink1992le1994";
	regex reg("([a-z]+)1");

	cout << "Regex: ([a-z]+)1" << endl;
	cout << "String: " << str << endl;
	// No index given: defaults to 0, the whole match.
	for (sregex_token_iterator it(str.cbegin(), str.cend(), reg), end; it != end; ++it) 
	{
		cout << "Token: " << it->str() << endl;
	}
	cout << endl;

	// Index 1: the first capture group of each match.
	for (sregex_token_iterator it(str.cbegin(), str.cend(), reg, 1), end; it != end; ++it) 
	{
		cout << "Token: " << it->str() << endl;
	}
	cout << endl;

	// Index -1: the text between matches.
	for (sregex_token_iterator it(str.cbegin(), str.cend(), reg, -1), end; it != end; ++it) 
	{
		cout << "Token: " << it->str() << endl;
	}
	cout << endl;
	return 0;
}

Output:

Regex: ([a-z]+)1
String: 11twinkle1993teink1992le1994
Token: twinkle1
Token: teink1
Token: le1

Token: twinkle
Token: teink
Token: le

Token: 11
Token: 993
Token: 992
Token: 994
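
Two common uses of regex_token_iterator follow from the index values described above: splitting a string on a separator pattern (index -1), and pulling several capture groups out of each match by passing a list of indices. The sketch below shows both. The helper names split and groups_flat are made up for illustration, and the list of indices is passed as a std::vector<int>.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on the separator pattern `sep`.
// Index -1 selects the pieces between matches.
std::vector<std::string> split(const std::string& text, const std::regex& sep) {
    std::vector<std::string> out;
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(text.cbegin(), text.cend(), sep, -1); it != end; ++it)
        out.push_back(it->str());
    return out;
}

// Hypothetical helper: for every match, yields capture group 1 and then
// capture group 2, flattened into a single sequence.
std::vector<std::string> groups_flat(const std::string& text, const std::regex& re) {
    std::vector<std::string> out;
    const std::vector<int> indices{1, 2};  // which sub-matches to yield, in order
    const std::sregex_token_iterator end;
    for (std::sregex_token_iterator it(text.cbegin(), text.cend(), re, indices); it != end; ++it)
        out.push_back(it->str());
    return out;
}
```

Splitting "2018-11-04" on the pattern - gives 2018, 11, and 04. Applying groups_flat with ([a-z]+)(\d+) to "11twinkle1993teink1992le1994" gives twinkle, 1993, teink, 1992, le, 1994.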

Replacing

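regex_replace: the regex_replace() algorithm takes a regular expression and a format string that replaces each matched substring. The format string can use escape sequences (for example, $1 for the first capture group) to refer to parts of the matched substring.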
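
This section has no example, so here is a minimal sketch of regex_replace() with the default ECMAScript format syntax, where $1 and $2 refer to capture groups and $& refers to the whole match. The helper names swap_parts and bracket_numbers are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: swaps each run of letters with the run of digits
// that follows it, joining them with a dash ($2 and $1 are the groups).
std::string swap_parts(const std::string& text) {
    static const std::regex re("([a-z]+)(\\d+)");
    return std::regex_replace(text, re, "$2-$1");
}

// Hypothetical helper: wraps every run of digits in square brackets
// ($& stands for the whole match).
std::string bracket_numbers(const std::string& text) {
    static const std::regex re("\\d+");
    return std::regex_replace(text, re, "[$&]");
}
```

swap_parts("twinkle1993") returns "1993-twinkle" and bracket_numbers("twinkle1993") returns "twinkle[1993]". Text that contains no match is returned unchanged.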
