Python Tip: Convert XML Tree To A Dictionary
I was doing a simple XML integration with SOAP service today and it really struck me that a lot of the data manipulation would be easier if the data was a dictionary. In addition, the XML returned was guaranteed to be fairly small and have only a handful of schemas – so a full-blown SAX parser wasn’t really necessary as there was no risk of overflowing memory with the raw XML data. So I decided to write a simple recursive algorithm to do the conversion. I’m posting it here in the hopes it saves someone else a bit of time in the future:
import xml.etree.ElementTree def make_dict_from_tree(element_tree): """Traverse the given XML element tree to convert it into a dictionary. :param element_tree: An XML element tree :type element_tree: xml.etree.ElementTree :rtype: dict """ def internal_iter(tree, accum): """Recursively iterate through the elements of the tree accumulating a dictionary result. :param tree: The XML element tree :type tree: xml.etree.ElementTree :param accum: Dictionary into which data is accumulated :type accum: dict :rtype: dict """ if tree is None: return accum if tree.getchildren(): accum[tree.tag] = {} for each in tree.getchildren(): result = internal_iter(each, {}) if each.tag in accum[tree.tag]: if not isinstance(accum[tree.tag][each.tag], list): accum[tree.tag][each.tag] = [ accum[tree.tag][each.tag] ] accum[tree.tag][each.tag].append(result[each.tag]) else: accum[tree.tag].update(result) else: accum[tree.tag] = tree.text return accum return internal_iter(element_tree, {}) make_dict_from_tree(xml.etree.ElementTree.fromstring(xml_string))
This seems to “Do The Right Thing” — for example, if you give it the following test data:
<DATA> <Items> <Item> <Name>Ha</Name> <Name>Hu</Name> </Item> <Item> <Name>Da</Name> <Name>Du</Name> </Item> </Items> </DATA>
You get the following dictionary out:
{ 'DATA': { 'Items': { 'Item': [{'Name': ['Ha', 'Hu']}, {'Name': ['Da', 'Du']}] } } }
NOTE: For the CS geeks out there, this does an post-order traversal of the XML tree. Also, this does not handle attributes.
EDIT: There’s a pretty concise answer on StackOverflow, but the results it returns are different from what I wanted.