How to detect if a String has specific UTF-8 characters in it? (Python)

- August 15, 2014

I have a list of strings in Python. I want to remove from the list all strings that contain special UTF-8 characters. I want to keep only the strings whose characters are all between "U+0021" and "U+00FF". So, do you know a way to detect if a string contains only these characters?

Thanks :)

Edit: I use Python 3.

The Latin-1 encoding corresponds to the first 256 Unicode code points. Put differently, if c is a Unicode character with a code in [0-255], c.encode('latin1') is a single byte with the same value as ord(c).
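A quick way to convince yourself of that correspondence is the small check below. It is only a sketch; the helper name `latin1_matches_ord` is made up for illustration and is not part of any library.

```python
def latin1_matches_ord(c):
    """Return True if c encodes to one Latin-1 byte whose value equals ord(c)."""
    encoded = c.encode('latin1')  # raises UnicodeEncodeError above U+00FF
    return len(encoded) == 1 and encoded[0] == ord(c)

# Check every code point from 0 to 255 (hypothetical variable name).
all_match = all(latin1_matches_ord(chr(i)) for i in range(256))
```

If the claim holds, `all_match` ends up `True`.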

So to test whether a string has at least one character outside the [0-255] range, just try to encode it as Latin-1. If it contains none, the encoding will succeed; otherwise you will get a UnicodeEncodeError:

    # s is the string to test
    no_special = True
    try:
        s.encode('latin1')
    except UnicodeEncodeError:
        # at least one character is above U+00FF
        no_special = False
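Applied to the list from the question, the same test could be wrapped in a function and used in a list comprehension. This is a minimal sketch; the function name `is_latin1` and the sample strings are assumptions for illustration.

```python
def is_latin1(s):
    """Return True if every character of s has a code in [0-255]."""
    try:
        s.encode('latin1')
    except UnicodeEncodeError:
        return False
    return True

# Sample data (made up): the last two contain characters above U+00FF.
strings = ['hello', 'caf\u00e9', 'snow \u2603', '\u4f60\u597d']

# Keep only the strings that fit entirely in Latin-1.
kept = [s for s in strings if is_latin1(s)]
```

Here `kept` holds only 'hello' and 'café', since the snowman and the Chinese characters cannot be encoded as Latin-1.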

BTW, as you were told in a comment, Unicode characters outside the [0-255] range are not special; they are simply not in the Latin-1 range.

Please note that the above accepts control characters such as \t, \r or \n, because they are legal Latin-1 characters. That may or may not be what you want here.
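If you really want the exact range from the question (U+0021 to U+00FF), which also excludes the space character (U+0020) and the ASCII control characters, one option is to compare each character directly. This is a sketch of an alternative, not the encoding trick above, and the name `in_requested_range` is made up.

```python
def in_requested_range(s):
    """Return True if every character of s lies in U+0021..U+00FF."""
    # Note: all() over an empty string is True, so '' is accepted.
    return all('\u0021' <= c <= '\u00ff' for c in s)

# Sample data (made up for illustration).
candidates = ['hello', 'caf\u00e9', 'two words', 'tab\there', 'snow\u2603']
strict = [s for s in candidates if in_requested_range(s)]
```

With this version 'two words' and 'tab\there' are rejected as well, because space and tab fall below U+0021.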
