Python 3.3 Encoding Issue


I have a SQL database that has encoding issues; it's returning results similar to this:

"cuvÃ©e"

From what I can tell, this is because the text was encoded as Latin-1 when it should have been encoded as UTF-8 (please correct me if I'm wrong). I'm processing these results in a Python script and have been getting a few encoding problems, and I have been unable to convert the text to what it's supposed to be:

"cuvée" 

I'm using Python 3.3 and codecs.decode to make the change from latin1 to utf-8, but I'm getting:

'str' does not support the buffer interface

I think I've tried everything I've found, to no avail. I'm not keen on going back to Python 2.7, because I've written the rest of the script on 3.3 and it would be quite a pain to rewrite. Is there a way to do this that I'm unaware of?

Yes, what you have is called a Mojibake; the wrong encoding is probably Latin-1, Windows codepage 1252, or a closely related codec.

You could try to encode to Latin-1, then decode again as UTF-8:

faulty_text.encode('latin1').decode('utf8') 
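As a minimal sketch of that round trip, assuming the value coming back from the database is already a `str` holding the Mojibake form of "cuvée" (the sample value and the variable names here are only illustrative): calling `codecs.decode` directly on a `str` fails with a `TypeError`, because there are no bytes to decode yet, while encoding to Latin-1 first recovers the original UTF-8 bytes.

```python
import codecs

# Illustrative sample: "cuvée" whose UTF-8 bytes were decoded as Latin-1.
faulty_text = "cuv\u00c3\u00a9e"

# Decoding a str with a bytes-to-text codec raises TypeError;
# this is the kind of failure described in the question.
decode_error = None
try:
    codecs.decode(faulty_text, "utf8")
except TypeError as exc:
    decode_error = exc

# Encode back to the original bytes, then decode them correctly.
fixed_text = faulty_text.encode("latin1").decode("utf8")
print(fixed_text)
```

This only works when every character in the faulty text is in the Latin-1 range (U+0000 to U+00FF); otherwise the `encode` step raises `UnicodeEncodeError`.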

However, sometimes, with CP1252 Mojibakes, the faulty decoding results in text that cannot legally be encoded back to bytes, because some UTF-8 bytes were 'decoded' forcefully even though the codec doesn't support those bytes.
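To illustrate the problem, here is a hand-rolled sketch (not ftfy's implementation; `undo_sloppy_cp1252` is a hypothetical helper name). It assumes the text was produced by a lenient Windows-1252 decoder that passed the five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) through as control characters. Such text can fail with both the `latin1` and the `cp1252` codec, so the sketch encodes character by character, trying CP1252 first and falling back to Latin-1.

```python
def undo_sloppy_cp1252(text):
    """Reverse a lenient cp1252 decode, then decode the bytes as UTF-8."""
    raw = bytearray()
    for ch in text:
        try:
            raw += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Bytes that cp1252 leaves undefined show up as U+0081, U+008D, ...
            # Latin-1 maps those straight back to the original byte.
            raw += ch.encode("latin1")
    return raw.decode("utf8")


# UTF-8 bytes C3 8D E2 82 AC ("Í€") after a lenient cp1252 decode.
mangled = "\u00c3\u008d\u00e2\u201a\u00ac"

# Neither plain codec can encode the whole string.
failures = []
for codec in ("latin1", "cp1252"):
    try:
        mangled.encode(codec)
    except UnicodeEncodeError:
        failures.append(codec)

repaired = undo_sloppy_cp1252(mangled)
print(failures, repaired)
```

The fallback still raises `UnicodeEncodeError` for characters outside both codecs, which is a reasonable signal that the text was not this kind of Mojibake to begin with.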

Your best bet is to install the ftfy library, which can automatically fix such Mojibake mistakes for you. It includes special codecs to undo CP1252 Mojibakes (as well as those from other related codepages), codecs that bypass the aforementioned problems.
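A minimal usage sketch, assuming ftfy has been installed (for example with `pip install ftfy`). `ftfy.fix_text` is the library's general entry point; the `repair` wrapper and its fallback to the plain Latin-1 round trip are hypothetical additions so that the snippet still runs when ftfy is missing.

```python
try:
    import ftfy  # third-party; assumed to be installed separately
except ImportError:
    ftfy = None


def repair(text):
    """Fix Mojibake with ftfy if available, else try the Latin-1 round trip."""
    if ftfy is not None:
        return ftfy.fix_text(text)
    # Fallback: only valid when every character fits in Latin-1.
    return text.encode("latin1").decode("utf8")


print(repair("cuv\u00c3\u00a9e"))
```

Unlike the manual round trip, ftfy is meant to leave text that is already correct alone, so it should be safer to apply to every row returned from the database.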

