Handle wrongly encoded character in Python unicode string
You have to convert your unicode string into a byte string (`str` in Python 2) using some encoding, e.g. UTF-8:
some_unicode_string.encode('utf-8')
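A small illustration of what that call does (the names `word` and `encoded` are just for the example; the snippet should run unchanged on Python 2.7 and 3). The unicode string holds five characters. Encoding it as UTF-8 yields six bytes, because `u'\xfc'` (ü) becomes the two bytes `\xc3\xbc`:

```python
word = u"Gl\xfcck"               # unicode text: 5 characters, one of them u'\xfc' (ü)
encoded = word.encode("utf-8")   # byte string; u'\xfc' becomes the two bytes \xc3\xbc

print(repr(encoded))
# Decoding with the same codec gives the original text back.
print(encoded.decode("utf-8") == word)
```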
Apart from that, this is a dupe of "BeautifulSoup findall with class attribute - unicode encode error" and at least ten other related questions on SO. Research first.
Your unicode string is fine:
>>> import unicodedata
>>> unicodedata.name(u"\xfc")
'LATIN SMALL LETTER U WITH DIAERESIS'
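The same check can be run over a whole string to find the characters the ASCII codec will reject. Below is a minimal sketch. `describe_non_ascii` is a hypothetical helper written for this example, not a library function. For `u"Gl\xfcck"` it reports position 2, which is the position the ASCII codec complains about in a UnicodeEncodeError for this string:

```python
import unicodedata

def describe_non_ascii(text):
    """Return (index, character, Unicode name) for every non-ASCII character in text."""
    return [(i, ch, unicodedata.name(ch, "<unnamed>"))
            for i, ch in enumerate(text)
            if ord(ch) > 127]  # ASCII covers code points 0..127 only

print(describe_non_ascii(u"Gl\xfcck"))
```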
The problem you see at the interactive prompt is that the interpreter doesn't know what encoding to use to output the string to your terminal, so it falls back to the "ascii" codec, but that codec only knows how to deal with ASCII characters. It works fine on my machine (because sys.stdout.encoding is "UTF-8" for me, likely because something like my environment variable settings differs from yours):
>>> print u'Gl\xfcck'
Glück
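You can check which codec your output stream reports and whether the text fits into it. The following is a rough sketch, not the interpreter's actual logic. `can_encode` is a hypothetical helper, and `getattr` is used because a replaced `sys.stdout` may not report an encoding at all. Under Python 2, `sys.stdout.encoding` is `None` when output is redirected, and printing unicode then falls back to ASCII:

```python
import sys

def can_encode(text, encoding):
    """Return True if every character of text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# The codec this stream reports (may be None, see above).
stream_encoding = getattr(sys.stdout, "encoding", None)
print("stdout encoding: {}".format(stream_encoding))

word = u"Gl\xfcck"
print(can_encode(word, "ascii"))   # False: u'\xfc' is outside range(128)
print(can_encode(word, "utf-8"))   # True
```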
At the beginning of your code, just after the imports, add these three lines:
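import sys  # import sys package, if not already imported
reload(sys)  # Python 2 only: site.py deletes sys.setdefaultencoding at startup; reloading sys restores it
sys.setdefaultencoding('utf-8')  # implicit str/unicode conversions now use UTF-8 instead of ASCII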
It will override the system default encoding (ascii) for the course of your program.
Edit: You shouldn't do this unless you are sure of the consequences; see the comment below. This post is also helpful: Dangers of sys.setdefaultencoding('utf-8')
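A commonly suggested alternative, which is a different technique from the one in this answer, avoids changing the interpreter-wide default. Instead, it wraps the output stream in a UTF-8 writer from the standard `codecs` module, so encoding happens explicitly at the output boundary. The sketch below writes to an in-memory `io.BytesIO` so the result can be inspected. The wrapper itself works the same on 2.7 and 3:

```python
import codecs
import io

# Wrap a byte stream in a UTF-8 StreamWriter: unicode text written to `writer`
# is encoded to UTF-8 bytes explicitly, without touching sys.setdefaultencoding.
buf = io.BytesIO()
writer = codecs.getwriter("utf-8")(buf)
writer.write(u"Gl\xfcck\n")

print(repr(buf.getvalue()))
```

For a real terminal, the same wrapper is often applied to the byte-level stdout. On Python 2 that is `sys.stdout` itself. On Python 3 it is `sys.stdout.buffer`, although Python 3 usually needs no wrapper at all.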
Do not cast what you've got from model fields to a string with str() if it is already a unicode string. (Oops, I totally missed that this is not Django-related.)
I stumbled upon this bug myself while processing a file of German words that, without my knowing it, had been encoded in UTF-8. The problem showed up when I started processing the words and some of them raised the encoding error below.
# python
Python 2.7.12 (default, Aug 22 2019, 16:36:40)
>>> utf8_word = u"Gl\xfcck"
>>> print("Word read was: {}".format(utf8_word))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 2: ordinal not in range(128)
I solved the error by calling the encode method on the string:
>>> print("Word read was: {}".format(utf8_word.encode('utf-8')))
Word read was: Glück
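Because the words came from a UTF-8 file, another option is to decode them when reading. `io.open` with an explicit `encoding` returns unicode strings on both 2.7 and 3, and you still encode at output time as shown above. In this minimal sketch, `read_words` is a hypothetical helper, and the temporary file with its contents stands in for the real input:

```python
import io
import os
import tempfile

def read_words(path, encoding="utf-8"):
    """Decode a text file with `encoding` and return its whitespace-separated words."""
    with io.open(path, encoding=encoding) as f:
        return f.read().split()

# Stand-in input file (hypothetical contents), written as raw UTF-8 bytes.
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "wb") as f:
    f.write(u"Gl\xfcck Haus\n".encode("utf-8"))

try:
    words = read_words(path)
finally:
    os.remove(path)

print(repr(words))
```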