'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)


Disclaimer: I'm not a Python developer :)
Yesterday I found a tool I needed, and it was written in Python.
Nothing special: it reads a list of links from a CSV file, logs in to a page, and follows a matching link on the page it opens.

The script was old (2 years) and needed some refinement, but I managed to get it working and started grinding through my 7000+ links. About 30 minutes later I saw that it had crashed with a strange error:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)

I started to investigate the issue and found some information on Stack Overflow, but I was confused, since I don't really know Python that well. I found two suggestions: change Python's configuration files, which I don't really know how to do and which was not recommended anyway, or call decode('utf-8') on the string variable at the point of use that caused the issue.
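To make the second suggestion more concrete, here is a minimal sketch of what the error means and what decode('utf-8') does about it. The sample bytes and the try_ascii helper are made up for illustration; they are not taken from the script. The byte 0xe2 is typically the first byte of a UTF-8 encoded punctuation character such as a curly quote, which the ASCII codec cannot handle.

```python
# Made-up sample: UTF-8 bytes containing a right single quotation mark
# (U+2019, encoded as the three bytes e2 80 99).
raw = b"It\xe2\x80\x99s a test"


def try_ascii(data):
    """Return the message of the ASCII decoding error, or None if it decodes."""
    try:
        data.decode("ascii")
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


# Decoding as ASCII fails on the first byte above 0x7f.
ascii_error = try_ascii(raw)
print(ascii_error)

# Decoding as UTF-8 turns the three bytes into one character.
text = raw.decode("utf-8")
print(len(raw), len(text))
```

The ASCII attempt reports byte 0xe2, just like the crash above, while the UTF-8 decode succeeds. This assumes the page really is UTF-8 encoded; for other encodings the right codec name would differ.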

As you can see in the full error log, listed below, I traced the error to the innermost call in the traceback, the last frame before the error message:
File "C:\Python27\lib\re.py", line 155, in sub return _compile(pattern, flags).sub(repl, string, count)
I opened the re.py file and changed the return statement of the sub function to:
return _compile(pattern, flags).sub(repl, string.decode('utf-8'), count)
It worked like a charm! No strange settings or other oddities were needed.
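One thing to watch with this change: it calls decode('utf-8') on every input. As far as I understand, on Python 2 calling decode on a value that is already a unicode string makes Python encode it with ASCII first, which can fail for non-ASCII text. A slightly more careful sketch, kept in your own script instead of re.py, decodes only byte strings. The names to_text and safe_sub are hypothetical helpers, not part of Python or mechanize, and the sample strings are made up.

```python
import re


def to_text(value, encoding="utf-8"):
    """Decode byte strings to text; return text values unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def safe_sub(pattern, repl, string, count=0):
    """Hypothetical wrapper: like re.sub, but decodes byte input first."""
    return re.sub(pattern, repl, to_text(string), count=count)


# Made-up inputs: the same text once as UTF-8 bytes, once as text.
raw = b"Tom &amp; Jerry \xe2\x80\x93 episode 1"
already_text = u"Tom &amp; Jerry \u2013 episode 1"

from_bytes = safe_sub(r"&amp;", "&", raw)
from_text = safe_sub(r"&amp;", "&", already_text)
print(from_bytes == from_text)
```

Note that in the traceback below the failing re.sub call comes from HTMLParser inside mechanize, so a wrapper like this only covers calls made from your own code. It is a sketch of the idea, not a drop-in replacement for the re.py edit.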

As I said, I'm not a pro Python programmer, and I'm sure there is a better solution. Because re.py is a Python library file, my change probably has implications for overall performance, could cause trouble when upgrading Python, and may have other side effects I don't know about. You can use it if you need a quick and dirty fix ;)
BTW, I didn't know it was this easy to make changes to the default libs.

Full error log:
Traceback (most recent call last):
  File "C:\Users\n_lac\Documents\python\udemy coupon.py", line 40, in <module>
    course_page = br.open(course_links)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 254, in open
    return self._mech_open(url_or_request, data, timeout=timeout)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 284, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 206, in open
    response = meth(req, response)
  File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 467, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 224, in error
    result = apply(self._call_chain, args)
  File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 340, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\site-packages\mechanize\_urllib2_fork.py", line 586, in http_error_302
    return self.parent.open(new)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 254, in open
    return self._mech_open(url_or_request, data, timeout=timeout)
  File "C:\Python27\lib\site-packages\mechanize\_mechanize.py", line 284, in _mech_open
    response = UserAgentBase.open(self, request, data)
  File "C:\Python27\lib\site-packages\mechanize\_opener.py", line 206, in open
    response = meth(req, response)
  File "C:\Python27\lib\site-packages\mechanize\_http.py", line 134, in http_response
    self.head_parser_class())
  File "C:\Python27\lib\site-packages\mechanize\_http.py", line 100, in parse_head
    parser.feed(data)
  File "C:\Python27\lib\HTMLParser.py", line 117, in feed
    self.goahead(0)
  File "C:\Python27\lib\HTMLParser.py", line 161, in goahead
    k = self.parse_starttag(i)
  File "C:\Python27\lib\HTMLParser.py", line 308, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "C:\Python27\lib\HTMLParser.py", line 475, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "C:\Python27\lib\re.py", line 155, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 50: ordinal not in range(128)
