Python Tip: Convert XML Tree To A Dictionary
I was doing a simple XML integration with a SOAP service today, and it struck me that a lot of the data manipulation would be easier if the data were a dictionary. The XML returned was also guaranteed to be fairly small and to follow only a handful of schemas, so a full-blown SAX parser wasn’t really necessary: there was no risk of exhausting memory by loading the whole document. So I decided to write a simple recursive algorithm to do the conversion. I’m posting it here in the hope that it saves someone else a bit of time in the future:
import xml.etree.ElementTree


def make_dict_from_tree(element_tree):
    """Traverse the given XML element tree to convert it into a dictionary.

    :param element_tree: Root element of an XML tree
    :type element_tree: xml.etree.ElementTree.Element
    :rtype: dict
    """
    def internal_iter(tree, accum):
        """Recursively iterate through the elements of the tree accumulating
        a dictionary result.

        :param tree: The XML element to convert
        :type tree: xml.etree.ElementTree.Element
        :param accum: Dictionary into which data is accumulated
        :type accum: dict
        :rtype: dict
        """
        if tree is None:
            return accum
        # Element.getchildren() was removed in Python 3.9; list(tree) is the
        # equivalent way to get an element's direct children.
        children = list(tree)
        if children:
            accum[tree.tag] = {}
            for each in children:
                result = internal_iter(each, {})
                if each.tag in accum[tree.tag]:
                    # Repeated tag: collect the sibling values in a list.
                    if not isinstance(accum[tree.tag][each.tag], list):
                        accum[tree.tag][each.tag] = [
                            accum[tree.tag][each.tag]
                        ]
                    accum[tree.tag][each.tag].append(result[each.tag])
                else:
                    accum[tree.tag].update(result)
        else:
            # Leaf element: store its text (None if the element is empty).
            accum[tree.tag] = tree.text
        return accum

    return internal_iter(element_tree, {})


# xml_string holds the XML document as a string
make_dict_from_tree(xml.etree.ElementTree.fromstring(xml_string))
This seems to “Do The Right Thing”. For example, if you give it the following test data:
<DATA>
    <Items>
        <Item>
            <Name>Ha</Name>
            <Name>Hu</Name>
        </Item>
        <Item>
            <Name>Da</Name>
            <Name>Du</Name>
        </Item>
    </Items>
</DATA>
You get the following dictionary out:
{
    'DATA': {
        'Items': {
            'Item': [{'Name': ['Ha', 'Hu']}, {'Name': ['Da', 'Du']}]
        }
    }
}
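One thing to watch for in output like this: a tag only becomes a list when it repeats, so an Items element with a single Item child comes back as a plain dict rather than a one-element list. Below is a minimal sketch of a helper that smooths this over for callers. The name `as_list` is hypothetical and not part of the original code.

```python
def as_list(value):
    """Wrap a converted value in a list unless it already is one.

    Hypothetical helper: the converter yields a list only for repeated
    tags, so callers can use this to iterate over children uniformly.
    """
    if isinstance(value, list):
        return value
    return [value]


# Shapes the converter produces for one versus several <Item> children.
single = {'DATA': {'Items': {'Item': {'Name': 'Ha'}}}}
several = {'DATA': {'Items': {'Item': [{'Name': 'Ha'}, {'Name': 'Da'}]}}}

for data in (single, several):
    items = as_list(data['DATA']['Items']['Item'])
    print([item['Name'] for item in items])
```

With this in place, the calling code can loop over the items without first checking whether it received a dict or a list.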
NOTE: For the CS geeks out there, this does a post-order traversal of the XML tree. Also, this does not handle attributes, and any text sitting directly inside an element that also has children is dropped.
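If attributes do matter for your data, one possible extension is sketched below. It is not the function above: the name `element_to_dict` and the `'@attributes'` and `'#text'` keys are conventions assumed here purely for illustration. The return shape also differs slightly, since a leaf that carries attributes becomes a dict instead of a plain string.

```python
import xml.etree.ElementTree as ET


def element_to_dict(element):
    """Convert an element to {tag: value}, keeping attributes.

    Hypothetical variant: the '@attributes' and '#text' keys are
    assumed conventions, not part of the original function.
    """
    children = list(element)
    if not children and not element.attrib:
        # Plain leaf: same result as the attribute-free version.
        return {element.tag: element.text}

    value = {}
    if element.attrib:
        value['@attributes'] = dict(element.attrib)
    if not children:
        # Leaf with attributes: keep its text under a separate key.
        value['#text'] = element.text
    for child in children:
        child_value = element_to_dict(child)[child.tag]
        if child.tag in value:
            # Repeated tag: collect sibling values in a list.
            if not isinstance(value[child.tag], list):
                value[child.tag] = [value[child.tag]]
            value[child.tag].append(child_value)
        else:
            value[child.tag] = child_value
    return {element.tag: value}


root = ET.fromstring(
    '<Item id="7"><Name lang="en">Ha</Name><Name>Hu</Name></Item>'
)
print(element_to_dict(root))
```

Because `@` and `#` cannot appear at the start of an XML tag name, the two special keys should not collide with real child elements.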
EDIT: There’s a pretty concise answer on Stack Overflow, but the results it returns are different from what I wanted.