This article shows how to use the unicodedata module in Python 2 and Python 3 to check whether a given character is punctuation, with example code.

Using unicodedata.category() to detect punctuation

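The Unicode general categories whose names begin with P are reserved for punctuation:

Pc: punctuation, connector

Pd: punctuation, dash

Ps: punctuation, open (start)

Pe: punctuation, close (end)

Pi: punctuation, initial quote (may behave like Ps or Pe depending on usage)

Pf: punctuation, final quote (may behave like Ps or Pe depending on usage)

Po: punctuation, other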
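To make the category list concrete, the following minimal sketch looks up one sample character per category. The sample characters and the name samples are chosen here for illustration and are not from the original article.

```python
import unicodedata

# One sample character per P* category (illustrative choices).
samples = {
    "Pc": "_",
    "Pd": "-",
    "Ps": "(",
    "Pe": ")",
    "Pi": "\u00ab",  # left-pointing double angle quotation mark
    "Pf": "\u00bb",  # right-pointing double angle quotation mark
    "Po": "!",
}

for expected, ch in samples.items():
    actual = unicodedata.category(ch)
    # Print the character, its reported category, and whether it matches.
    print(ch, actual, actual == expected)
```

Each line should report a match, since category() returns the two-letter general category of the character.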

Python 3.8+ version (uses the := assignment expression):

>>> import sys
>>> from unicodedata import category
>>> codepoints = range(sys.maxunicode + 1)
>>> punctuation = {c for i in codepoints if category(c := chr(i)).startswith("P")}
>>> "'" in punctuation
True
>>> "’" in punctuation
True

Python 3 version:

>>> import sys
>>> from unicodedata import category
>>> chrs = (chr(i) for i in range(sys.maxunicode + 1))
>>> punctuation = set(c for c in chrs if category(c).startswith("P"))
>>> "'" in punctuation
True
>>> "’" in punctuation
True

Python 2 version:

>>> import sys
>>> from unicodedata import category
>>> chrs = (unichr(i) for i in range(sys.maxunicode + 1))
>>> punctuation = set(c for c in chrs if category(c).startswith("P"))
>>> u"'" in punctuation
True
>>> u"’" in punctuation
True
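One consequence of relying on the P* categories is that some ASCII characters commonly treated as punctuation are classified by Unicode as symbols (S*) and so are not in the set. The sketch below, written for Python 3, compares against string.punctuation to show which ones are affected; the variable name not_p is only illustrative.

```python
import string
import unicodedata

# Characters in string.punctuation whose Unicode category does not start with "P".
not_p = [c for c in string.punctuation
         if not unicodedata.category(c).startswith("P")]

for c in not_p:
    # These fall under symbol categories such as Sc, Sm, or Sk.
    print(c, unicodedata.category(c))
```

If characters such as "$" or "+" also need to be treated as punctuation, one option is to test for categories starting with "S" as well.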

Punctuation check wrapped in a class that supports the in operator:

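import unicodedata

class DuckType:
    # Supports the "in" operator for a single character.
    def __contains__(self, s):
        return unicodedata.category(s).startswith("P")

punct = DuckType()
print("'" in punct, '"' in punct, "a" in punct)
# Python 3 output: True True False
# In Python 2.7, pass unicode literals instead:
# print(u"'" in punct, u'"' in punct, u"a" in punct)
# Python 2.7 output: (True, True, False)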
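The class above tests one character at a time, because unicodedata.category() accepts only a single character. As a possible extension, the following Python 3 sketch wraps the same check in a function and uses it to remove punctuation from a whole string. The names is_punctuation and strip_punctuation are hypothetical helpers introduced here, not part of unicodedata.

```python
import unicodedata

def is_punctuation(ch):
    """Return True if the single character ch is in a Unicode P* category."""
    return unicodedata.category(ch).startswith("P")

def strip_punctuation(text):
    """Return text with every punctuation character removed."""
    return "".join(ch for ch in text if not is_punctuation(ch))

print(strip_punctuation("Hello, world! (test)"))  # Hello world test
```

Iterating character by character keeps the helper free of any precomputed set, which avoids scanning the full code point range up front.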
