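This article shows how to use regular expressions and other approaches in Python to delete or replace the content inside square brackets ([]) in a configuration file, including everything between multiple bracketed sections, together with example code.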

Sample smb file:

Hi this is my config file.
Please dont delete it
[homes]
browseable = No
comment = Your Home
create mode = 0640
csc policy = disable
directory mask = 0750
public = No
writeable = Yes
[proj]
browseable = Yes
comment = Project directories
csc policy = disable
path = /proj
public = No
writeable = Yes
[]
This last second line.
End of the line.

After the deletion:

Hi this is my config file.
Please dont delete it
This last second line.
End of the line.

1. Using a regular expression

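import re
# Read the whole file into one string
with open("smb", "r") as f:
    txt = f.read()
# Replace with '\n' (not '') so the lines before and after the removed block stay separate
txt = re.sub(r'(\n\[)(.*?)(\[]\n)', '\n', txt, flags=re.DOTALL)
print(txt)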

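Explanation:

(\n\[): matches a newline followed by a [

(\[]\n): matches [] followed by a newline

(.*?): matches, non-greedily, everything between (\n\[) and (\[]\n)

re.DOTALL: makes the dot (.) match any character, including the newline (\n).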
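To see the pattern at work without touching a file, here is a minimal sketch that applies the same regular expression to an in-memory string. The helper name strip_sections and the shortened sample text are assumptions made for this illustration only.

```python
import re

# Hypothetical helper for illustration; the name is not from the original article.
def strip_sections(text, replacement="\n"):
    # Remove everything from the first newline + '[' through the next '[]' + newline
    return re.sub(r'(\n\[)(.*?)(\[]\n)', replacement, text, flags=re.DOTALL)

# Shortened sample in the same shape as the smb file
sample = (
    "Hi this is my config file.\n"
    "Please dont delete it\n"
    "[homes]\n"
    "browseable = No\n"
    "[proj]\n"
    "path = /proj\n"
    "[]\n"
    "This last second line.\n"
    "End of the line.\n"
)

cleaned = strip_sections(sample)
print(cleaned)

# With an empty replacement, the newlines on both sides of the block are
# consumed, so the line before and the line after are joined together.
joined = strip_sections(sample, replacement="")
print(joined)
```

The replacement string matters because the match includes the newline before the first [ and the newline after []. Putting a single newline back keeps the surrounding lines on separate lines.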

2. Using pandas together with re

import re
import pandas as pd
# Read each line of the file into its own row (one row -> one line).
# Note: some pandas versions reject sep='\n'; if that happens, read the file with open() instead.
txt = pd.read_csv('smb', sep='\n', header=None)
# Join all the lines back together, separated by '\n'
txt = '\n'.join(txt[0].to_list())
# Apply the same regex as above to clean the text
txt = re.sub(r'(\n\[)(.*?)(\[]\n)', '\n', txt, flags=re.DOTALL)
print(txt)

3. Using pandas only

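import numpy as np
import pandas as pd
# '----' never occurs in the file, so each line is read into a single column
df = pd.read_csv('smb', sep='----', header=None, engine='python')
# Mark the rows that start with `[`
s = df[0].str.startswith('[')
# Drop every row from the first marked row through the last one
df = df.drop(np.arange(s.idxmax(), s[::-1].idxmax() + 1))
# Write to a file if needed
df.to_csv('clean.txt', header=False, index=False)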

Or:

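# your_file is a placeholder for the path to the config file
df = pd.read_csv(your_file, sep='\t', header=None)
# Index labels of the rows that contain `[`
idx = df[df[0].str.contains(r'\[')].index
# Keep only the rows outside the range from the first to the last such row
df1 = df.loc[~df.index.isin(range(idx[0], idx[-1] + 1))]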
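Both pandas snippets above come down to the same idea: find the first and the last line marked by a [, then discard everything in that range. As a rough sketch of that logic without pandas, assuming the file is small enough to hold in memory as a list of lines, it could look like the following. The helper name drop_bracket_range is hypothetical and the sample is shortened.

```python
# Hypothetical helper for illustration; not part of pandas or the original article.
def drop_bracket_range(lines):
    # Positions of all lines that start with '['
    marked = [i for i, line in enumerate(lines) if line.startswith("[")]
    if not marked:
        return list(lines)  # nothing to remove
    first, last = marked[0], marked[-1]
    # Keep the lines before the first marked line and after the last one
    return lines[:first] + lines[last + 1:]

# Shortened sample in the same shape as the smb file, one list item per line
sample = [
    "Hi this is my config file.",
    "Please dont delete it",
    "[homes]",
    "browseable = No",
    "[proj]",
    "path = /proj",
    "[]",
    "This last second line.",
    "End of the line.",
]

result = drop_bracket_range(sample)
print("\n".join(result))
```

Unlike the regex version, this range-based approach removes everything up to the last line that starts with [, so it does not rely on the closing [] marker specifically.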
