How to detect if a String has specific UTF-8 characters in it? (Python)
I have a list of strings in Python. I want to remove from the list the strings that contain special UTF-8 characters. I only want the strings that consist of characters from "U+0021" to "U+00FF". So, does anyone know a way to detect whether a string contains these special characters?
Thanks :)
Edit: I use Python 3.
The latin1 encoding corresponds to the first 256 Unicode code points. Said differently, if c is a Unicode character with a code in [0-255], c.encode('latin1') is a single byte with the same value as ord(c).
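As a quick check of that correspondence, here is a minimal sketch; the variable names are only for illustration:

```python
# For every code point in [0, 255], latin1 produces exactly one byte
# whose value equals the code point.
for code in range(256):
    c = chr(code)
    encoded = c.encode('latin1')
    assert len(encoded) == 1
    assert encoded[0] == ord(c)

# The first code point past the range (U+0100) cannot be encoded.
try:
    '\u0100'.encode('latin1')
    encodable = True
except UnicodeEncodeError:
    encodable = False
```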
So, to test whether a string has at least one character outside the [0-255] range, just try to encode it as latin1. If it contains none, the encoding will succeed; otherwise you will get a UnicodeEncodeError:
    no_special = True
    try:
        s.encode('latin1')
    except UnicodeEncodeError:
        no_special = False
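To apply this test to a whole list, one possible sketch wraps it in a helper and filters with a list comprehension. The name is_latin1 and the sample strings are assumptions made up for the example:

```python
def is_latin1(s):
    """Return True if every character of s has a code point in [0, 255]."""
    try:
        s.encode('latin1')
    except UnicodeEncodeError:
        return False
    return True

# Hypothetical sample data: the last two entries contain characters above U+00FF.
strings = ['hello', 'caf\u00e9', 'na\u00efve', '\u03a9 mega', '\u65e5\u672c\u8a9e']

# Keep only the strings that are fully encodable as latin1.
kept = [s for s in strings if is_latin1(s)]
```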
BTW, as I was told in a comment, Unicode characters outside the [0-255] range are not special; they are simply not in the latin1 range.
Please note that the above accepts control characters such as \t, \r or \n, because they are legal latin1 characters. That may or may not be what you want here.
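If the control characters should be rejected as well, one alternative is to compare each code point against the exact range from the question, U+0021 to U+00FF. This is only a sketch, not part of the encoding trick above; the function name and its default bounds are assumptions:

```python
def in_requested_range(s, low=0x21, high=0xFF):
    """Return True if every character of s lies in [low, high] (inclusive)."""
    return all(low <= ord(c) <= high for c in s)

# Rejected: contains a tab (U+0009), which is below U+0021.
tab_ok = in_requested_range('a\tb')

# Rejected: the space character (U+0020) is also just below the range.
space_ok = in_requested_range('a b')

# Accepted: every character is between U+0021 and U+00FF.
word_ok = in_requested_range('caf\u00e9!')
```

Note that with these bounds a plain space is rejected too, while DEL (U+007F) and the C1 control characters (U+0080 to U+009F) are still accepted, so the bounds may need adjusting. An empty string passes the check.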