Pattern Matching and Regular Expressions in Python

Date: 2023-08-18 09:26:14

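Notes on learning pattern matching and regular expressions

Python implementation

Regular expressions

Regular expressions are highly efficient for text processing.

With regular expressions, a problem can be solved in "three steps" once the re module is imported.

Steps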

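1. Import the regular expression module with import re.
2. Create a Regex object with the re.compile() function (remember to use a raw string).
3. Pass the string you want to search to the Regex object's search() method. It returns a Match object, or None when nothing matches.
4. Call the Match object's group() method to get the actual matched text as a string.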

Regular expressions (regex)

import re

# Step 1: write the expected pattern and compile it
phoneNumRegex = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
# Step 2: search the input string
mo = phoneNumRegex.search('my phone number is : 123-456-1123')
# Step 3: output the matched text
print('phone number found:' + mo.group())

The r in front of the pattern marks it as a raw string, in which backslashes are not treated as escape characters.
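To make the effect of the r prefix concrete, here is a small sketch comparing raw and ordinary string literals. The variable names are only illustrative. The \b case is worth noting because the two spellings mean different things to the regex engine.

```python
import re

# In an ordinary string literal, "\n" is a single newline character.
normal = "\n"
# In a raw string literal, r"\n" is two characters: a backslash and "n".
raw = r"\n"

print(len(normal))  # 1
print(len(raw))     # 2

# Without the r prefix, every backslash meant for the regex must be doubled.
same_pattern = r"\d\d\d-\d\d\d\d" == "\\d\\d\\d-\\d\\d\\d\\d"
print(same_pattern)  # True

# "\b" is a backspace character, while r"\b" is the regex word-boundary token.
raw_hit = re.search(r"\bcat\b", "a cat sat") is not None
plain_hit = re.search("\bcat\b", "a cat sat") is not None
print(raw_hit)    # True
print(plain_hit)  # False
```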

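Use groups to separate the parts of a match

phoneNumRegex2 = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')
mo2 = phoneNumRegex2.search('my phone number is:345-232-4556')
print('mo2.group(1):' + mo2.group(1))  # the first group: 345
print('mo2.group(2):' + mo2.group(2))  # the second group: 232-4556
print("mo2.group(0):" + mo2.group(0))  # the whole match: 345-232-4556
print('mo2.group():' + mo2.group())    # same as group(0)
print(mo2.groups())                    # all groups as a tuple: ('345', '232-4556')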
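The examples so far call group() directly on the result of search(), which raises an AttributeError when the text contains no match, because search() then returns None. A minimal sketch of a safer pattern follows; split_phone and phone_regex are hypothetical names chosen for illustration.

```python
import re

phone_regex = re.compile(r'(\d\d\d)-(\d\d\d-\d\d\d\d)')

def split_phone(text):
    """Return (area_code, main_number), or None when no phone number is found."""
    mo = phone_regex.search(text)
    if mo is None:  # search() returns None when nothing matches
        return None
    area_code, main_number = mo.groups()  # unpack both groups at once
    return area_code, main_number

print(split_phone('my phone number is: 345-232-4556'))  # ('345', '232-4556')
print(split_phone('no number here'))                    # None
```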

To match literal parentheses in a string, escape them with a backslash: '\(' and '\)'.
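As a short sketch of escaped parentheses, the pattern below mixes literal parentheses with grouping parentheses. The phone number and the name paren_regex are made up for the example.

```python
import re

# \( and \) match literal parentheses; the unescaped ( ) still create groups.
paren_regex = re.compile(r'(\(\d\d\d\)) (\d\d\d-\d\d\d\d)')
mo = paren_regex.search('My phone number is (415) 555-4242.')
print(mo.group(1))  # (415)
print(mo.group(2))  # 555-4242
```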

# To match literal parentheses in a regex, escape them: '\(' and '\)'

# Use the pipe | to match one of several alternatives
heroRegex = re.compile(r'dengshuo|zhengchuan')
mo3 = heroRegex.search('dengshuo original name is deng zhengchuan')
mo4 = heroRegex.search('deng zhengchuan is dengshuo')
print(mo3.group())  # dengshuo: | works like "or", and search() returns the first match
print(mo4.group())  # zhengchuan; use findall() to get every match

# Use ? for optional matching: the preceding group matches 0 or 1 times
batRgex = re.compile(r'Bat(wo)?man')
mo5 = batRgex.search('dengshuo is a Batman')
mo6 = batRgex.search('a woman can be a Batwoman')
print(mo5.group())  # Batman
print(mo6.group())  # Batwoman

# * matches 0 or more times; + matches 1 or more times
batRgex = re.compile(r'Bat(wo)*man')  # matches Batman as well as Batwoman, Batwowoman, ...
batRgex = re.compile(r'Bat(wo)+man')  # does not match Batman

# (wo){n} matches a specific number of repetitions
batRgex = re.compile(r'Bat(wo){2}man')    # matches only Batwowoman
batRgex = re.compile(r'Bat(wo){3,5}man')  # matches Batwowowoman, Batwowowowoman, Batwowowowowoman

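When a pattern can match a varying number of repetitions, matching is greedy by default: it takes the longest string possible.

greedyHaRegex = re.compile(r'(Ha){3,5}')  # greedy
mo1 = greedyHaRegex.search('HaHaHaHaHa')
mo1.group()
# 'HaHaHaHaHa'
nongreedyHaRegex = re.compile(r'(Ha){3,5}?')  # non-greedy
mo2 = nongreedyHaRegex.search('HaHaHaHaHa')
mo2.group()
# 'HaHaHa'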
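The same greedy versus non-greedy distinction applies to * and +. Here is a small sketch using a made-up HTML-like string, where the difference is easy to see:

```python
import re

text = '<b>bold</b> and <i>italic</i>'

greedy = re.search(r'<.*>', text)  # .* grabs as much as it can
lazy = re.search(r'<.*?>', text)   # .*? stops at the first possible '>'

print(greedy.group())  # <b>bold</b> and <i>italic</i>
print(lazy.group())    # <b>
```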

findall()

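1. When called on a regex with no groups

For example \d\d\d-\d\d\d-\d\d\d\d, findall() returns a list of the matched strings, such as ['415-555-9999', '212-555-0000'].

2. When called on a regex with groups

For example (\d\d\d)-(\d\d\d)-(\d\d\d\d), findall() returns a list of tuples of strings (one string per group), such as [('415', '555', '1122'), ('212', '555', '0000')].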
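The two cases can be checked side by side. This is a minimal sketch in which the sample text and the variable names are assumptions for illustration:

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'

no_groups = re.compile(r'\d\d\d-\d\d\d-\d\d\d\d')
with_groups = re.compile(r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)')

flat = no_groups.findall(text)       # list of strings
tuples = with_groups.findall(text)   # list of tuples, one item per group

print(flat)    # ['415-555-9999', '212-555-0000']
print(tuples)  # [('415', '555', '9999'), ('212', '555', '0000')]
```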

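Custom character classes

\d  a digit
\D  a non-digit
\w  a letter, digit, or underscore
\W  anything other than a letter, digit, or underscore
\s  a space, tab, or newline
\S  anything other than a space, tab, or newline
^   the string must start with the pattern that follows
$   the string must end with the pattern that precedes it
.   wildcard: any single character except a newline
*   repeats the preceding item zero or more times
.*  matches any run of characters except newlines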
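A brief sketch of the anchors and the .* wildcard from the list above. The sample strings and names are invented for the example.

```python
import re

# ^ and $ together require the whole string to match the pattern.
whole_number = re.compile(r'^\d+$')
all_digits = whole_number.search('1234567890') is not None
mixed = whole_number.search('12345xyz67') is not None
print(all_digits)  # True
print(mixed)       # False

# .* matches any run of characters except a newline.
name_regex = re.compile(r'First Name: (.*) Last Name: (.*)')
mo = name_regex.search('First Name: Ada Last Name: Lovelace')
print(mo.groups())  # ('Ada', 'Lovelace')
```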

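To define your own class, list the characters inside square brackets:

re.compile(r'[aeiouAEIOU]')

A caret immediately after the opening bracket negates the class, so it matches any character that is not listed:

re.compile(r'[^aeiouAEIOU]')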
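As an illustration, both classes can be run over the same sample string with findall(). The string and variable names are assumptions.

```python
import re

vowels = re.compile(r'[aeiouAEIOU]')
non_vowels = re.compile(r'[^aeiouAEIOU]')

text = 'Robocop eats'
found_vowels = vowels.findall(text)
found_others = non_vowels.findall(text)  # includes the space

print(found_vowels)  # ['o', 'o', 'o', 'e', 'a']
print(found_others)  # ['R', 'b', 'c', 'p', ' ', 't', 's']
```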

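Case-insensitive matching

Arguments to compile()

Pass re.I or re.IGNORECASE as the second argument to re.compile() to make the pattern ignore case: re.compile(r' ', re.I) or re.compile(r' ', re.IGNORECASE).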
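A minimal sketch of the flag in use, with a made-up pattern and inputs:

```python
import re

robocop = re.compile(r'robocop', re.I)  # re.I is short for re.IGNORECASE

first = robocop.search('RoboCop is part man').group()
second = robocop.search('ROBOCOP protects').group()
print(first)   # RoboCop
print(second)  # ROBOCOP

# Without the flag, the same pattern is case-sensitive.
strict_miss = re.compile(r'robocop').search('ROBOCOP') is None
print(strict_miss)  # True
```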

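Quick reference

? matches zero or one of the preceding group.
* matches zero or more of the preceding group.
+ matches one or more of the preceding group.
{n} matches exactly n of the preceding group.
{n,} matches n or more of the preceding group.
{,m} matches zero to m of the preceding group.
{n,m} matches at least n and at most m of the preceding group.
{n,m}? or *? or +? performs a non-greedy match of the preceding group.
^spam means the string must begin with spam.
spam$ means the string must end with spam.
. matches any character except a newline.
\d, \w, and \s match a digit, a word character, and a whitespace character, respectively.
\D, \W, and \S match anything except a digit, a word character, and a whitespace character, respectively.
[abc] matches any one character between the brackets (such as a, b, or c).
[^abc] matches any one character that is not between the brackets.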
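A few of the entries above can be checked against one sample string. This is only a sketch, and the string ha is invented for the purpose.

```python
import re

ha = 'HaHaHaHa'  # four repetitions of "Ha"

at_least_two = re.search(r'(Ha){2,}', ha).group()   # two or more, greedy
up_to_three = re.search(r'(Ha){,3}', ha).group()    # zero to three, greedy
lazy_two = re.search(r'(Ha){2,}?', ha).group()      # non-greedy: as few as allowed
starts_with = re.search(r'^Ha', ha) is not None
ends_with_lower = re.search(r'ha$', ha) is not None  # case matters

print(at_least_two)     # HaHaHaHa
print(up_to_three)      # HaHaHa
print(lazy_two)         # HaHa
print(starts_with)      # True
print(ends_with_lower)  # False
```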

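Replacing strings with the sub() method

sub() finds the text that matches the pattern and replaces it. It takes two arguments: the first is the replacement string, and the second is the string in which matches are replaced.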

import re

namesRegex = re.compile(r'Agent \w+')  # \w+ stops at the next space
namesRegex.sub('CENSORED', 'Agent Alice gave the number in the sentence')
# 'CENSORED gave the number in the sentence'
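sub() replaces every match, not just the first, and returns a new string without modifying the input. Here is a small sketch with a made-up sentence containing two matches:

```python
import re

names_regex = re.compile(r'Agent \w+')
text = 'Agent Alice gave the documents to Agent Bob.'

# First argument: the replacement. Second argument: the string to search.
censored = names_regex.sub('CENSORED', text)
print(censored)  # CENSORED gave the documents to CENSORED.

# The original string is unchanged; sub() returns a new string.
print(text)  # Agent Alice gave the documents to Agent Bob.
```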
