How to detect if a String has specific UTF-8 characters in it? (Python)
I have a list of strings in Python. I want to remove from the list the strings that contain special UTF-8 characters. I only want strings that include characters from "U+0021" to "U+00FF". So, do you know a way to detect whether a string contains these special characters?
Thanks :)
Edit: I use Python 3.
The Latin1 encoding corresponds to the first 256 Unicode code points. Said differently, if c is a Unicode character with a code in [0-255], c.encode('latin1') is a single byte whose value is the same as ord(c).
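As a quick check of that correspondence, here is a minimal sketch for Python 3. The variable names are only illustrative; it loops over every code point from 0 to 255 and then tries the first one past the range.

```python
# Check every code point in the Latin1 range, 0 through 255.
for code in range(256):
    encoded = chr(code).encode('latin1')
    # Each character encodes to exactly one byte equal to its code point.
    assert encoded == bytes([code])

# The first code point past the range (U+0100) cannot be encoded.
try:
    chr(256).encode('latin1')
    encodable = True
except UnicodeEncodeError:
    encodable = False

print(encodable)
```

The loop finishes without an assertion error, and the final encode attempt raises UnicodeEncodeError, so encodable ends up False.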
So, to test whether a string has at least one character outside the [0-255] range, just try to encode it as Latin1. If it contains none, the encoding succeeds; otherwise you get a UnicodeEncodeError:

    no_special = True
    try:
        s.encode('latin1')
    except UnicodeEncodeError:
        no_special = False

BTW, as you were told in a comment, Unicode characters outside the [0-255] range are not special, they are just not in the Latin1 range.
Please note that the above accepts control characters like \t, \r or \n because they are legal Latin1 characters. That may or may not be what you want here.
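If the control characters should be rejected as well, one option is to compare code points against the U+0021 to U+00FF range from the question directly, instead of relying on the encoding. This is only a sketch: in_allowed_range and the sample list are hypothetical names and data, not part of the original answer.

```python
def in_allowed_range(s, low=0x21, high=0xFF):
    """Return True if every character of s has a code point in [low, high]."""
    return all(low <= ord(c) <= high for c in s)

# Hypothetical sample data, written with escapes to avoid encoding ambiguity.
strings = [
    "hello",               # plain ASCII, kept
    "h\u00e9llo",          # U+00E9 is inside the range, kept
    "tab\there",           # \t is U+0009, below U+0021, dropped
    "\u65e5\u672c\u8a9e",  # CJK characters, above U+00FF, dropped
    "\u20acuro",           # euro sign U+20AC, above U+00FF, dropped
]

kept = [s for s in strings if in_allowed_range(s)]
print(kept)
```

Only the first two strings are kept. Note that this range also rejects the space character (U+0020), still accepts U+007F to U+009F, and treats the empty string as valid, so adjust the bounds if that is not what you want.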