Python plays nicely with Unicode nowadays. But what if you have to deal with conversions to old formats, or export ASCII files, for example? You may also depend on software that is out of date but would take too long to rewrite. This is where errors often occur. I got mine when users copy-pasted text from MS Word into the Django admin UI. Most of the website handled those fancy characters fine, but exporting to CSV failed because the export did not support non-ASCII characters.
Google turned up nothing special, and the Python docs on Unicode usage cover this kind of situation only briefly. So here is the result of a few hours of experiments: I decided to rewrite a bit of Python's functionality and create a decode function that behaves the way I need. Hopefully it will save you some time with the collisions you may run into in your Django apps...
Anyway, I started getting errors like this:
    Exception Type: UnicodeEncodeError
    Exception Value: 'ascii' codec can't encode character u'\u2013' in position 17: ordinal not in range(128)

So I had a list of u'' values, some of which contained special characters that trigger the "ordinal not in range(128)" error. The fix requires no imports, just pure Python:
    # Python 2 code: relies on unicode() and u'' literals
    values = [
        u'Some fancy text \u2013 something',
        u'some normal, easy convertible text',
        u'some more normal text',
    ]

    # HACK: entry cleanup for special characters (Fixing Bug #...)
    for i, value in enumerate(values):
        try:
            # if the string can be encoded to 'ascii', leave it alone
            unicode(value).encode('ascii')
        except UnicodeEncodeError:
            val_temp = unicode(value)
            # clean up the string by dropping characters that cannot be encoded
            result = []
            for symbol in val_temp:
                try:
                    symbol.encode('ascii')
                    result.append(symbol)
                except UnicodeEncodeError:
                    pass
            # overwrite the bad value in the values list
            values[i] = u''.join(result)

    # work with the list as usual... it's safe now
    # values == [
    #     u'Some fancy text  something',  # \u2013 dropped, a double space remains
    #     u'some normal, easy convertible text',
    #     u'some more normal text',
    # ]

This code is a bit complicated because of my specific task, with loops inside loops and so on, but it comes from a working app and has been checked to work. Here is the simplified example that cleans up a single string:
    value = u'Some fancy text \u2013 something'
    try:
        # if the string can be encoded to 'ascii', leave it alone
        value.encode('ascii')
    except UnicodeEncodeError:
        # clean up the string by dropping characters that cannot be encoded
        result = []
        for symbol in value:
            try:
                symbol.encode('ascii')
                result.append(symbol)
            except UnicodeEncodeError:
                pass
        # overwrite our variable with the safe version
        value = u''.join(result)

    # work with the unicode string as usual... it's safe now
    # value == u'Some fancy text  something'

So the technique here is simple. We check whether the unicode string can be encoded to 'ascii' without errors; if it can, we simply pass it through. If it can't, we convert it to 'ascii' symbol by symbol, and the symbols that fail are gracefully omitted. You can wrap all of this in a function, called 'my_decode_cleanup' or something similar, and use it wherever needed...
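A minimal sketch of such a function might look like the one below. The name is the one suggested above; everything else is an assumption for illustration: it expects the input to already be a unicode (text) string, and it avoids the unicode() builtin so that it runs on both Python 2 and Python 3. For comparison, the last lines run the same values through the built-in 'ignore' error handler of encode(), which also drops characters that cannot be encoded.

```python
def my_decode_cleanup(value):
    """Return value with every non-ASCII character dropped.

    Hypothetical helper: assumes value is a text (unicode) string.
    """
    try:
        # Fast path: the whole string is already ASCII-safe.
        value.encode('ascii')
        return value
    except UnicodeEncodeError:
        # Slow path: keep only the characters that encode cleanly.
        result = []
        for symbol in value:
            try:
                symbol.encode('ascii')
                result.append(symbol)
            except UnicodeEncodeError:
                pass
        return u''.join(result)


values = [
    u'Some fancy text \u2013 something',
    u'some normal, easy convertible text',
]
cleaned = [my_decode_cleanup(v) for v in values]

# The built-in 'ignore' error handler gives the same text in one call.
builtin = [v.encode('ascii', 'ignore').decode('ascii') for v in values]
```

Note that dropping the dash leaves two spaces in a row, so collapse them separately if that matters for your export.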
I hope this saves you some precious time in your Python development.
Did it help? Am I wrong somewhere? Please comment!











