
Using Python's re module for regular expressions

Date: 2022-09-18 10:47:43

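Contents:

I. Introduction to the re module
  1. About the re module
  2. What the official documentation says
  3. Tables summarized by others
II. Using the re module
  1. Source code of the common functions
  2. Introduction to and use of the common functions
III. Two ways of using the re module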

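Why am I writing this post? I don't even do web scraping. Why, why, why?! Let's just call it proof that I studied this!!! And of course, this may well be the most complete re tutorial you have ever read.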

I. Introduction to the re module

Regular expressions use special symbols to match strings quickly, or, put another way, to filter strings.

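1. About the re module

A regular expression is a special sequence of characters that lets you conveniently check whether a string matches a given pattern.

Python has included the re module since version 1.5; it provides Perl-style regular expression patterns.

The re module gives Python full regular expression functionality.

The compile function builds a regular expression object from a pattern string and optional flags. This object has a set of methods for regular expression matching and substitution.

The re module also provides functions that behave exactly like these methods; they take the pattern string as their first argument.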
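To make that equivalence concrete, the sketch below compiles a pattern once with re.compile() and compares its search() and findall() methods with the module-level re.search() and re.findall() called on the same pattern string. The sample string is borrowed from the examples later in this post; the variable names are only for illustration.

```python
import re

# Compile once into a pattern object, then reuse its methods.
digits = re.compile(r'\d+')

text = 'uu1234uua212sf'

# Method on the compiled object ...
m1 = digits.search(text)
# ... and the module-level function that takes the pattern string first.
m2 = re.search(r'\d+', text)

print(m1.group(), m2.group())    # 1234 1234
print(digits.findall(text))      # ['1234', '212']
print(re.findall(r'\d+', text))  # ['1234', '212']
```

Compiling is mainly a convenience when the same pattern is reused many times; the module-level functions also cache compiled patterns internally (the purge() function in the source code shown later clears that cache).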

2. What the official documentation says

Regular expression syntax:

>>> import re
>>> print(re.__doc__)
Support for regular expressions (RE).

This module provides regular expression matching operations similar to
those found in Perl. It supports both 8-bit and Unicode strings; both
the pattern and the strings being processed can contain null bytes and
characters outside the US ASCII range.

Regular expressions can contain both special and ordinary characters.
Most ordinary characters, like "A", "a", or "0", are the simplest
regular expressions; they simply match themselves. You can
concatenate ordinary characters, so last matches the string 'last'.

The special characters are:
    "."      Matches any character except a newline.
    "^"      Matches the start of the string.
    "$"      Matches the end of the string or just before the newline at
             the end of the string.
    "*"      Matches 0 or more (greedy) repetitions of the preceding RE.
             Greedy means that it will match as many repetitions as possible.
    "+"      Matches 1 or more (greedy) repetitions of the preceding RE.
    "?"      Matches 0 or 1 (greedy) of the preceding RE.
    *?,+?,?? Non-greedy versions of the previous three special characters.
    {m,n}    Matches from m to n repetitions of the preceding RE.
    {m,n}?   Non-greedy version of the above.
    "\\"     Either escapes special characters or signals a special sequence.
    []       Indicates a set of characters.
             A "^" as the first character indicates a complementing set.
    "|"      A|B, creates an RE that will match either A or B.
    (...)    Matches the RE inside the parentheses.
             The contents can be retrieved or matched later in the string.
    (?aiLmsux) Set the A, I, L, M, S, U, or X flag for the RE (see below).
    (?:...)  Non-grouping version of regular parentheses.
    (?P<name>...) The substring matched by the group is accessible by name.
    (?P=name)     Matches the text matched earlier by the group named name.
    (?#...)  A comment; ignored.
    (?=...)  Matches if ... matches next, but doesn't consume the string.
    (?!...)  Matches if ... doesn't match next.
    (?<=...) Matches if preceded by ... (must be fixed length).
    (?<!...) Matches if not preceded by ... (must be fixed length).
    (?(id/name)yes|no) Matches yes pattern if the group with id/name matched,
                       the (optional) no pattern otherwise.

The special sequences consist of "\\" and a character from the list
below. If the ordinary character is not on the list, then the
resulting RE will match the second character.
    \number  Matches the contents of the group of the same number.
    \A       Matches only at the start of the string.
    \Z       Matches only at the end of the string.
    \b       Matches the empty string, but only at the start or end of a word.
    \B       Matches the empty string, but not at the start or end of a word.
    \d       Matches any decimal digit; equivalent to the set [0-9] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode digits.
    \D       Matches any non-digit character; equivalent to [^\d].
    \s       Matches any whitespace character; equivalent to [ \t\n\r\f\v] in
             bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the whole
             range of Unicode whitespace characters.
    \S       Matches any non-whitespace character; equivalent to [^\s].
    \w       Matches any alphanumeric character; equivalent to [a-zA-Z0-9_]
             in bytes patterns or string patterns with the ASCII flag.
             In string patterns without the ASCII flag, it will match the
             range of Unicode alphanumeric characters (letters plus digits
             plus underscore).
             With LOCALE, it will match the set [0-9_] plus characters defined
             as letters for the current locale.
    \W       Matches the complement of \w.
    \\       Matches a literal backslash.

This module exports the following functions:
    match     Match a regular expression pattern to the beginning of a string.
    fullmatch Match a regular expression pattern to all of a string.
    search    Search a string for the presence of a pattern.
    sub       Substitute occurrences of a pattern found in a string.
    subn      Same as sub, but also return the number of substitutions made.
    split     Split a string by the occurrences of a pattern.
    findall   Find all occurrences of a pattern in a string.
    finditer  Return an iterator yielding a match object for each match.
    compile   Compile a pattern into a RegexObject.
    purge     Clear the regular expression cache.
    escape    Backslash all non-alphanumerics in a string.

Some of the functions in this module takes flags as optional parameters:
    A  ASCII       For string patterns, make \w, \W, \b, \B, \d, \D
                   match the corresponding ASCII character categories
                   (rather than the whole Unicode categories, which is the
                   default).
                   For bytes patterns, this flag is the only available
                   behaviour and needn't be specified.
    I  IGNORECASE  Perform case-insensitive matching.
    L  LOCALE      Make \w, \W, \b, \B, dependent on the current locale.
    M  MULTILINE   "^" matches the beginning of lines (after a newline)
                   as well as the string.
                   "$" matches the end of lines (before a newline) as well
                   as the end of the string.
    S  DOTALL      "." matches any character at all, including the newline.
    X  VERBOSE     Ignore whitespace and comments for nicer looking RE's.
    U  UNICODE     For compatibility only. Ignored for string patterns (it
                   is the default), and forbidden for bytes patterns.

This module also defines an exception 'error'.

>>>

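What, you can't read the English? Well, neither can I, hahaha. After all, I still haven't passed CET-6 (the College English Test, Band 6). Sigh, how can I be this brilliant?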

My own understanding of the regular expression syntax is summarized in the table below:

3. Tables summarized by others

1. Table 1

2. Table 2
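A few rows of the syntax list above, tried out in a short sketch: greedy versus non-greedy repetition, a named group, a lookahead assertion and the IGNORECASE flag. The sample strings are made up for illustration.

```python
import re

html = '<b>bold</b><i>italic</i>'

# Greedy vs non-greedy: .* takes as much as it can, .*? as little as it can.
greedy = re.search(r'<.*>', html).group()
lazy = re.search(r'<.*?>', html).group()
print(greedy)  # <b>bold</b><i>italic</i>
print(lazy)    # <b>

# A named group (?P<name>...) can be read back by its name.
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'Date: 2022-09-18')
print(m.group('year'), m.group('month'))  # 2022 09

# A lookahead (?=...) checks what comes next without consuming it.
sizes = re.findall(r'\d+(?=px)', 'width: 100px; height: 20em; margin: 8px')
print(sizes)  # ['100', '8']

# The IGNORECASE flag (re.I) makes the match case-insensitive.
words = re.findall(r'python', 'Python PYTHON python', flags=re.IGNORECASE)
print(words)  # ['Python', 'PYTHON', 'python']
```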

II. Using the re module

1. Source code of the common functions (from re.py in the Python standard library)

# public interface

def match(pattern, string, flags=0):
    """Try to apply the pattern at the start of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).match(string)

def fullmatch(pattern, string, flags=0):
    """Try to apply the pattern to all of the string, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).fullmatch(string)

def search(pattern, string, flags=0):
    """Scan through string looking for a match to the pattern, returning
    a match object, or None if no match was found."""
    return _compile(pattern, flags).search(string)

def sub(pattern, repl, string, count=0, flags=0):
    """Return the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in string by the
    replacement repl. repl can be either a string or a callable;
    if a string, backslash escapes in it are processed. If it is
    a callable, it's passed the match object and must return
    a replacement string to be used."""
    return _compile(pattern, flags).sub(repl, string, count)

def subn(pattern, repl, string, count=0, flags=0):
    """Return a 2-tuple containing (new_string, number).
    new_string is the string obtained by replacing the leftmost
    non-overlapping occurrences of the pattern in the source
    string by the replacement repl. number is the number of
    substitutions that were made. repl can be either a string or a
    callable; if a string, backslash escapes in it are processed.
    If it is a callable, it's passed the match object and must
    return a replacement string to be used."""
    return _compile(pattern, flags).subn(repl, string, count)

def split(pattern, string, maxsplit=0, flags=0):
    """Split the source string by the occurrences of the pattern,
    returning a list containing the resulting substrings. If
    capturing parentheses are used in pattern, then the text of all
    groups in the pattern are also returned as part of the resulting
    list. If maxsplit is nonzero, at most maxsplit splits occur,
    and the remainder of the string is returned as the final element
    of the list."""
    return _compile(pattern, flags).split(string, maxsplit)

def findall(pattern, string, flags=0):
    """Return a list of all non-overlapping matches in the string.

    If one or more capturing groups are present in the pattern, return
    a list of groups; this will be a list of tuples if the pattern
    has more than one group.

    Empty matches are included in the result."""
    return _compile(pattern, flags).findall(string)

def finditer(pattern, string, flags=0):
    """Return an iterator over all non-overlapping matches in the
    string. For each match, the iterator returns a match object.

    Empty matches are included in the result."""
    return _compile(pattern, flags).finditer(string)

def compile(pattern, flags=0):
    "Compile a regular expression pattern, returning a pattern object."
    return _compile(pattern, flags)

def purge():  # purge: clean out the caches
    "Clear the regular expression caches"
    _cache.clear()
    _compile_repl.cache_clear()

def template(pattern, flags=0):  # template
    "Compile a template pattern, returning a pattern object"
    return _compile(pattern, flags|T)

_alphanum_str = frozenset("_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890")
_alphanum_bytes = frozenset(b"_abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ01234567890")

def escape(pattern):
    """
    Escape all the characters in pattern except ASCII letters, numbers and '_'.
    """
    if isinstance(pattern, str):
        alphanum = _alphanum_str
        s = list(pattern)
        for i, c in enumerate(pattern):
            if c not in alphanum:
                if c == "\000":
                    s[i] = "\\000"
                else:
                    s[i] = "\\" + c
        return "".join(s)
    else:
        alphanum = _alphanum_bytes
        s = []
        esc = ord(b"\\")
        for c in pattern:
            if c in alphanum:
                s.append(c)
            else:
                if c == 0:
                    s.extend(b"\\000")
                else:
                    s.append(esc)
                    s.append(c)
        return bytes(s)

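2. Introduction to and use of the common functions

re provides the following matching functions:

re.match(pattern, string, flags=0): match from the beginning of the string
re.search(pattern, string, flags=0): match anywhere in the string
re.findall(pattern, string, flags=0): return every match as an element of a list
re.split(pattern, string, maxsplit=0, flags=0): split the string, using the matches as separators
re.sub(pattern, repl, string, count=0, flags=0): find matches and replace them
re.fullmatch(pattern, string, flags=0): match the entire string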

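1. re.match(pattern, string, flags=0): match from the beginning

Matching starts at the beginning of the string. For example, to match digits, the string must start with a digit: '1234uuasf' can be matched, but 'uu1234uuasf' cannot.

Parameters:

pattern: the pattern to match, given as a string; it is usually written as a raw string r"..." so that backslash escapes are not interpreted by Python first
string: the string to match against
flags: flag bits that change how matching works; default 0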
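Why the r prefix matters: in a normal string literal, Python processes backslash escapes before re ever sees the pattern. The sketch below, using a made-up sample string, shows this with \b, which Python turns into a backspace character but re reads as a word boundary.

```python
import re

text = 'cat concat cat'

# In a normal string literal '\b' is the backspace character (\x08),
# so the regex engine never sees a word boundary and nothing matches.
plain = re.findall('\bcat\b', text)
print(plain)    # []

# In a raw string r'\b' stays as backslash + 'b', which re reads as a word boundary.
raw = re.findall(r'\bcat\b', text)
print(raw)      # ['cat', 'cat']

# Equivalent without the r prefix: escape the backslash yourself.
escaped = re.findall('\\bcat\\b', text)
print(escaped)  # ['cat', 'cat']
```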

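Return value: a match object

If a match is found, a match object is returned; otherwise None is returned. For example: <_sre.SRE_Match object; span=(0, 4), match='1234'> (Python 3.7 and later display it as <re.Match object; span=(0, 4), match='1234'>)
What the parts of the return value mean (the same applies to the other functions below, so I won't repeat it):

_sre.SRE_Match object: the returned match object
span=(0, 4): the indices of the matched characters; the right endpoint is not included (start included, end excluded), so the matched characters are at indices 0, 1, 2, 3
match='1234': the actual substring that was matched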

Example:

import re

# \d matches a single digit, i.e. one character in the range [0-9];
# to match one or more digits, add a plus sign: \d+
obj = re.match(r'\d+', '1234uuasf')
print(obj)  # span=(0, 4): indices of the matched substring, start included, end excluded
            # match='1234': the matched substring
if obj:
    print(obj.group())

obj = re.match(r'\d+', 'uu1234uuasf')
print(obj)  # None: the string does not start with a digit
if obj:
    print(obj.group())

# Output
<_sre.SRE_Match object; span=(0, 4), match='1234'>
1234
None
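The fields printed in the match object can also be read with its methods. A minimal sketch reusing the first input from the example above:

```python
import re

obj = re.match(r'\d+', '1234uuasf')
if obj:
    print(obj.span())   # (0, 4): start index included, end index excluded
    print(obj.start())  # 0
    print(obj.end())    # 4
    print(obj.group())  # 1234
    # Slicing the original string with the span gives the matched text back.
    print(obj.string[obj.start():obj.end()])  # 1234
```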

What if you want to match the digits in a string that does not start with a digit? In that case re.match() returns None, and you can use re.search() instead.

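2. re.search(pattern, string, flags=0): match anywhere in the string

Scans through the whole string; if any part of it matches the pattern, the first such match is returned.

Parameters:

pattern: the pattern to match, given as a string; it is usually written as a raw string r"..." so that backslash escapes are not interpreted by Python first
string: the string to search
flags: flag bits that change how matching works; default 0

Return value: a match object

If a match is found, a match object is returned; otherwise None is returned. For example: <_sre.SRE_Match object; span=(0, 4), match='1234'>

Example:

import re

obj = re.search(r'\d+', 'uu1234uua212sf')
print(obj)  # span=(2, 6): the match covers indices 2 to 5, end excluded
if obj:
    print(obj.group())

# Output
<_sre.SRE_Match object; span=(2, 6), match='1234'>
1234

As the result shows, only the first run of consecutive digits is matched; digits that appear later are ignored. What if you want every match, i.e. all substrings that fit the pattern? For that, use re.findall().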

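3. re.findall(pattern, string, flags=0): return all matches in a list

Collects every non-overlapping match into a list and returns it.

Parameters:

pattern: the pattern to match, given as a string; it is usually written as a raw string r"..." so that backslash escapes are not interpreted by Python first
string: the string to search
flags: flag bits that change how matching works; default 0

Return value: a list

If there are matches, returns a list of the matched substrings; if there are none, returns an empty list.

Example:

import re

obj = re.findall(r'\d', 'uu1234uua212sf3')
print(obj)
obj = re.findall(r'\d+', 'uu1234uua212sf3')
print(obj)
obj = re.findall(r'\d+', 'hasattrellow')
print(obj)

# Output
['1', '2', '3', '4', '2', '1', '2', '3']
['1234', '212', '3']
[]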
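One detail from the findall docstring quoted earlier: when the pattern contains capturing groups, findall returns the group contents instead of the whole match, and a list of tuples when there are several groups. A sketch using a sample string that also appears in the re.sub example; re.finditer is included for the case where you also need the positions.

```python
import re

text = 'alex22tom33jack55hi'

# No group: each element is the whole match.
whole = re.findall(r'[a-z]+\d+', text)
print(whole)  # ['alex22', 'tom33', 'jack55']

# One group: each element is just that group's text.
names = re.findall(r'([a-z]+)\d+', text)
print(names)  # ['alex', 'tom', 'jack']

# Two groups: each element is a tuple of the groups.
pairs = re.findall(r'([a-z]+)(\d+)', text)
print(pairs)  # [('alex', '22'), ('tom', '33'), ('jack', '55')]

# finditer yields match objects, so the position of each match is available too.
positions = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]
for found, span in positions:
    print(found, span)
# 22 (4, 6)
# 33 (9, 11)
# 55 (15, 17)
```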

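4. re.split(pattern, string, maxsplit=0, flags=0): split the string on the matches

Uses every match as a separator and splits the string into a list.

Parameters:

pattern: the pattern to match, given as a string; it is usually written as a raw string r"..." so that backslash escapes are not interpreted by Python first
string: the string to split
maxsplit: the maximum number of splits; default 0, meaning no limit
flags: flag bits that change how matching works; default 0

Return value: a list

If there are matches, returns the list of pieces; if there are none, returns a list containing the original string, i.e. nothing is split.

Example:

import re

s = '9-2*5/3+7/3*99/4*2998+10*568/14'
# split on any of the operators * - / +
print(re.split(r"[\*\-\/\+]", s))
# maxsplit=3: split at most three times, which leaves four elements
print(re.split(r"[\*\-\/\+]", s, maxsplit=3))
print(re.split(r"[\*\-\/\+]", "128739oul"))

# Output
['9', '2', '5', '3', '7', '3', '99', '4', '2998', '10', '568', '14']
['9', '2', '5', '3+7/3*99/4*2998+10*568/14']
['128739oul']  # no match, so the original string comes back as a one-element list, i.e. it is not split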
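The split docstring quoted earlier also notes that capturing parentheses keep the separators in the result. A minimal sketch on a shortened version of the expression above:

```python
import re

s = '9-2*5/3'

# Without a group the operators are dropped ...
numbers = re.split(r'[-*/+]', s)
print(numbers)  # ['9', '2', '5', '3']

# ... with a capturing group they are kept in the result list.
tokens = re.split(r'([-*/+])', s)
print(tokens)   # ['9', '-', '2', '*', '5', '/', '3']
```

Inside a character class, *, / and + need no backslash, and - needs none when it is placed first or last, so [-*/+] matches the same characters as [\*\-\/\+].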

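5. re.sub(pattern, repl, string, count=0, flags=0): find matches and replace them

Parameters:

pattern: the pattern to match, given as a string; it is usually written as a raw string r"..." so that backslash escapes are not interpreted by Python first
repl: the string that replaces each match
string: the string to search
count: the maximum number of replacements; default 0, meaning no limit
flags: flag bits that change how matching works; default 0

Return value: a string

If there are matches, returns a new string with them replaced; if there are none, returns the original string unchanged.

Example:

import re

print(re.sub('[a-z]+', 'nb', '中国abc123abc'))
print(re.sub(r'\d+', '|', 'alex22tom33jack55hi', count=2))
print(re.sub('[a-z]', 'nb', '中国abc123abc'))

# Output
中国nb123nb
alex|tom|jack55hi
中国nbnbnb123nbnbnb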
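As the sub docstring quoted earlier says, repl can also be a callable that receives each match object, and subn returns the number of replacements along with the new string. A string repl may also refer back to groups with \1, \2 and so on. A sketch reusing the alex22tom33jack55hi string; the doubling rule is only an illustration.

```python
import re

text = 'alex22tom33jack55hi'

# repl may be a function: it receives each match object and returns the replacement.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)
print(doubled)   # alex44tom66jack110hi

# subn returns (new_string, number_of_substitutions).
counted = re.subn(r'\d+', '|', text)
print(counted)   # ('alex|tom|jack|hi', 3)

# Group references in a string repl: swap each name and its digits.
swapped = re.sub(r'([a-z]+)(\d+)', r'\2\1', text)
print(swapped)   # 22alex33tom55jackhi
```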

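6. re.fullmatch(pattern, string, flags=0): match the entire string

Returns a match object only if the whole string matches the pattern; otherwise returns None.

Parameters:

pattern: the pattern to match, given as a string; it is usually written as a raw string r"..." so that backslash escapes are not interpreted by Python first
string: the string to match against
flags: flag bits that change how matching works; default 0

Return value: a match object

If a match is found, a match object is returned; otherwise None is returned. For example: <_sre.SRE_Match object; span=(0, 4), match='1234'>

Example:

import re

obj = re.fullmatch(r'\w+@\w+\.(com|cn|edu)', "alex@")
print(obj)  # None: "alex@" has no domain part, so the whole string cannot match
if obj:
    print(obj.group())
obj = re.fullmatch(r'\d+', 'uu1234uua212sf')
print(obj)  # None: the string does not consist of digits only
if obj:
    print(obj.group())

# Output
None
None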
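To compare the three matching functions side by side, the sketch below runs match, search and fullmatch on a few made-up strings. It then shows the email-like pattern from the example above succeeding on a complete address; alex@example.com is a hypothetical address used only for illustration.

```python
import re

pattern = r'\d+'
results = {}
for s in ['1234', '1234uu', 'uu1234']:
    # (match, search, fullmatch) succeeded?
    results[s] = (bool(re.match(pattern, s)),
                  bool(re.search(pattern, s)),
                  bool(re.fullmatch(pattern, s)))
    print(s, *results[s])
# 1234 True True True
# 1234uu True True False
# uu1234 False True False

# The email-like pattern matches once the whole string fits it.
obj = re.fullmatch(r'\w+@\w+\.(com|cn|edu)', 'alex@example.com')
if obj:
    print(obj.group())   # alex@example.com
    print(obj.group(1))  # com
```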

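Extracting the matched string from a returned match object

1. Use group() to get the string from the match object

re.match, re.search and re.fullmatch all return a match object, and group() extracts the matched substring from it:

match_object.group()

Return value: a string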
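group() can also take group numbers. This sketch, with a made-up input containing the hypothetical address alex@example.com, shows group(0), single and multiple group numbers and groups(), plus the None check that the examples in this post perform before calling group():

```python
import re

m = re.search(r'(\w+)@(\w+)\.(com|cn|edu)', 'contact: alex@example.com')
if m:  # check for None first; calling group() on None raises AttributeError
    print(m.group())      # alex@example.com  (same as m.group(0))
    print(m.group(1))     # alex
    print(m.group(2, 3))  # ('example', 'com')
    print(m.groups())     # ('alex', 'example', 'com')

# A failed search returns None, so there is nothing to call group() on.
missing = re.search(r'\d+', 'no digits here')
print(missing)  # None
```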

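Practical regular expression examples:

1. Validating an email address

Example: zhangsan-001@

Analyzing the name part:
* The 26 upper- and lowercase English letters are written as a-zA-Z
* Digits are written as 0-9
* The underscore is written as _
* The hyphen is written as -
Since the name consists of one or more letters, digits, underscores and hyphens, + is needed to express "one or more times". This gives the expression for the name part: [a-zA-Z0-9_-]+

Analyzing the domain part:
Domains generally follow the pattern "[N-level domain][third-level domain.]second-level domain.top-level domain", for example "", "", "mp." and "12-". So a domain is made up like "**.**.**.**".
* The "**" part can be written as [a-zA-Z0-9_-]+
* A ".**" part can be written as \.[a-zA-Z0-9_-]+
* Several ".**" parts can be written as (\.[a-zA-Z0-9_-]+)+
Putting this together, the domain part can be written as [a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+

Final expression:
Since the basic format of an email address is "name@domain", use "^" to match the start of the address and "$" to match its end, so that no other characters can appear before or after it. The final regular expression for an email address is therefore:
^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$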
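To check the final expression, here is a small sketch that wraps it in a helper; is_valid_email and the test addresses (including zhangsan-001@example.com) are hypothetical. It uses fullmatch rather than match because, as the official docstring quoted earlier says, "$" also matches just before a newline at the end of the string.

```python
import re

# The final expression from the analysis above, compiled once.
EMAIL_RE = re.compile(r'^[a-zA-Z0-9_-]+@[a-zA-Z0-9_-]+(\.[a-zA-Z0-9_-]+)+$')


def is_valid_email(address: str) -> bool:
    """Hypothetical helper: True if address has the name@domain shape above."""
    # fullmatch must consume the whole string, so a trailing newline is rejected.
    return EMAIL_RE.fullmatch(address) is not None


samples = [
    'zhangsan-001@example.com',  # hypothetical address, matches
    'zhangsan-001@',             # no domain part
    'zhang san@example.com',     # a space is not allowed in the name
    'user@localhost',            # needs at least one ".xxx" part
    'user@example.com\n',        # trailing newline
]
for addr in samples:
    print(repr(addr), is_valid_email(addr))

# With match() the same pattern accepts the trailing newline,
# because "$" may match just before a final newline.
print(EMAIL_RE.match('user@example.com\n') is not None)  # True
```

This pattern only checks the rough name@domain shape described above; it is not a complete validator for every legal email address.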