On Errors in Repeated Functions
Recently I found myself parsing several similar XML files in Python. The XML
had a deeply nested structure I wanted to get stuff out of, which means using
Python's xml.etree.ElementTree
's find
and findall
methods:
for entry in root.find('this').find('this2').findall('entry'):
first_thing = entry.text
for next_entry in entry.find('this3').findall('next_entry'):
thing_i_want = next_entry.find('thing_i_want').text
other_thing = next_entry.find('junk').find('other_thing').text
With this bit of code, I want to grab a list of the text in a XML tree. It's
readable, succinct, and does no error handling. Each of those find
calls can
return None
if their element isn't found, and because I want to parse
multiple files I can't have that. Fortunately, for the most part, I want to
handle the errors in only two ways: assign None
to the specific thing or quit
the whole parse and try again with the next one (whether that's at the file
level or at the XML branch level). The other problem is how to repeat this
error handling non-awkwardly.
One way to do handle the errors is with a try/except
block:
for entry in root.find('this').find('this2').findall('entry'):
try:
first_thing = entry.text
for next_entry in entry.find('this3').findall('next_entry'):
thing_i_want = next_entry.find('thing_i_want').text
other_thing = next_entry.find('junk').find('other_thing').text
except AttributeError:
first_thing = None
thing_i_want = None
other_thing = None
# or `continue` if the rest of the info isn't worth processing on error
The problem with this construct in this situation is that it's not granular
enough. I'd like to get the first_thing
if possible and assign None
to the
others if I can't get them. I could you multiple try/except
blocks, but it
quickly becomes very unreadable.
Another way of doing it is to test everything before we find it:
this = root.find('this')
if this is not None:
this2 = this.find('this2')
if this2 is not None:
# iteration over None just skips the loop, so no need to check here
for entry in this2.findall('entry')
...
The problems with this is obvious. Look at that nesting! I'd be halfway across the screen before I got anything done! However, this pattern can be abstracted into a function.
def find_or_none(node, taglist):
for tag in taglist:
node = node.find(tag)
if node is None:
return node
return node
This compressed error handling could also be implemented using exceptions. With this punted error handling, the code becomes:
this2 = find_or_none(root, ['this', 'this2'])
if this2 is not None:
for entry in this2.findall('entry')
first_thing = entry.text
next_entries = entry.find('this3')
if next_entries is not None:
for ...
This works, but there's one way I think it can be improved. Instead of checking
if something is not None
, check if it is None
and break
, continue
, or
return
the heck away from the error.
With this change, I get fairly clean error handling:
this2 = find_or_none(root, 'this', 'this2')
if this2 is None:
return None
for entry in this2.findall('entry'):
first_thing = entry.text
this3 = entry.find('this3')
if this3 is None:
continue # or `break` or `return` if that's appropriate
for next_entry in this3.findall('next_entry'):
thing_i_want = next_entry.find('thing_i_want')
if thing_i_want is not None:
thing_i_want = thing_i_want.text
other_thing = find_or_none(next_entry, ['junk', 'other_thing'])
if other_thing is not None:
other_thing = other_thing.text
It's not pretty, but it works. Other techniques in other languages for this kind of thing are C#'s null-conditional operator and Functional Programming's monadic error handling. There's also an excellent video including a functional approach to this this (and other) problems. Rust also has a monadic approach to error handling. After I read/watch this stuff again, I'll probably be ashamed of this post, but until then, it's up :)
Update: I still find this useful for some things, but XML handling is best handled by XSLT! Use XSLT!