Python Tip: Convert XML Tree To A Dictionary

I was doing a simple XML integration with a SOAP service today, and it struck me that a lot of the data manipulation would be easier if the data were a dictionary. The XML returned was guaranteed to be fairly small and to follow only a handful of schemas, so a full-blown SAX parser wasn't necessary: there was no risk of running out of memory by loading the raw XML all at once. So I decided to write a simple recursive function to do the conversion. I'm posting it here in the hope that it saves someone else a bit of time in the future:

import xml.etree.ElementTree

def make_dict_from_tree(element_tree):
    """Traverse the given XML element tree to convert it into a dictionary.

    :param element_tree: The root element of an XML tree
    :type element_tree: xml.etree.ElementTree.Element
    :rtype: dict
    """
    def internal_iter(tree, accum):
        """Recursively iterate through the elements of the tree accumulating
        a dictionary result.

        :param tree: The XML element to convert
        :type tree: xml.etree.ElementTree.Element
        :param accum: Dictionary into which data is accumulated
        :type accum: dict
        :rtype: dict
        """
        if tree is None:
            return accum

        # Element.getchildren() was removed in Python 3.9; list(tree)
        # returns the direct children on both Python 2 and Python 3.
        children = list(tree)
        if children:
            accum[tree.tag] = {}
            for each in children:
                result = internal_iter(each, {})
                if each.tag in accum[tree.tag]:
                    # Repeated tag: turn the existing value into a list
                    # (if it isn't one yet) and append the new value.
                    if not isinstance(accum[tree.tag][each.tag], list):
                        accum[tree.tag][each.tag] = [
                            accum[tree.tag][each.tag]
                        ]
                    accum[tree.tag][each.tag].append(result[each.tag])
                else:
                    accum[tree.tag].update(result)
        else:
            # Leaf element: store its text (None if the element is empty).
            accum[tree.tag] = tree.text

        return accum

    return internal_iter(element_tree, {})

# xml_string holds the raw XML text, e.g. the body of the SOAP response.
make_dict_from_tree(xml.etree.ElementTree.fromstring(xml_string))

This seems to “Do The Right Thing”. For example, if you give it the following test data:

<DATA>
  <Items>
    <Item>
      <Name>Ha</Name>
      <Name>Hu</Name>
    </Item>
    <Item>
      <Name>Da</Name>
      <Name>Du</Name>
    </Item>
  </Items>
</DATA>

You get the following dictionary out:

{
  'DATA': {
    'Items': {
      'Item': [{'Name': ['Ha', 'Hu']}, {'Name': ['Da', 'Du']}]
    }
  }
}
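One consequence of this output shape: a tag that occurs once maps to a single value, while a repeated tag maps to a list, so code that consumes the dictionary has to cope with both. The sketch below works directly on the dictionary shown above (plus the shape the converter would produce for a single `<Item>` holding a single `<Name>`); the helper names `as_list` and `all_names` are hypothetical and not part of the original code.

```python
def as_list(value):
    """Wrap a single value in a list; return lists unchanged.

    Hypothetical helper: the converter yields a bare value for a tag
    that occurs once and a list for a tag that repeats.
    """
    if isinstance(value, list):
        return value
    return [value]


# The dictionary produced from the test data above.
data = {
    'DATA': {
        'Items': {
            'Item': [{'Name': ['Ha', 'Hu']}, {'Name': ['Da', 'Du']}]
        }
    }
}

# What the converter returns for <DATA><Items><Item><Name>Ha</Name>
# </Item></Items></DATA>: no lists at all.
single = {'DATA': {'Items': {'Item': {'Name': 'Ha'}}}}


def all_names(doc):
    """Collect every <Name> value, however many items and names there are."""
    names = []
    for item in as_list(doc['DATA']['Items']['Item']):
        names.extend(as_list(item['Name']))
    return names


print(all_names(data))    # ['Ha', 'Hu', 'Da', 'Du']
print(all_names(single))  # ['Ha']
```

Normalizing at the point of use keeps the converter itself simple; the alternative would be to always emit lists, which makes the common single-element case more verbose to access.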

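NOTE: For the CS geeks out there, this does a post-order traversal of the XML tree: each element's value is completed only after all of its children have been converted. Also, it does not handle attributes, and it discards the text of any element that has child elements.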
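If you do need attributes, one possible extension is sketched below. The key convention is an assumption of this sketch, not something from the original code: attributes go under keys prefixed with `@`, and the text of an element that also has attributes or children goes under `#text`. Leaf elements without attributes still map straight to their text, as in `make_dict_from_tree`. The function name `element_to_dict` is hypothetical.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an Element into {tag: value}, keeping attributes.

    Assumed convention (hypothetical): attributes are stored under
    '@name' keys, and non-whitespace text of an element that has
    attributes or children is stored under '#text'.
    """
    children = list(element)
    if not children and not element.attrib:
        # Plain leaf: same result as make_dict_from_tree.
        return {element.tag: element.text}

    value = {'@' + name: attr for name, attr in element.attrib.items()}
    for child in children:
        # Recurse first, then merge: the child's value is complete
        # before it is added to the parent (post-order).
        child_value = element_to_dict(child)[child.tag]
        if child.tag in value:
            if not isinstance(value[child.tag], list):
                value[child.tag] = [value[child.tag]]
            value[child.tag].append(child_value)
        else:
            value[child.tag] = child_value

    text = element.text.strip() if element.text else ''
    if text:
        value['#text'] = text
    return {element.tag: value}


doc = ET.fromstring(
    '<Item id="1"><Name lang="en">Ha</Name><Name>Hu</Name></Item>'
)
print(element_to_dict(doc))
# {'Item': {'@id': '1', 'Name': [{'@lang': 'en', '#text': 'Ha'}, 'Hu']}}
```

Because XML names cannot start with `@` or `#`, these keys cannot collide with child tags. The trade-off is that the same tag can now map to either a string or a dictionary depending on whether it carries attributes, and tail text in mixed content is still dropped.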

EDIT: There’s a pretty concise answer on StackOverflow, but the results it returns are different from what I wanted.