Getting Element Tree to read a SASXML file
Element Tree is a very nice and simple-to-use Python library for loading, reading, and writing XML files. However, there is one little trick that caught me out because I didn't understand enough about XML and namespaces. Having got some data in SAS XML format, it took me a long while to work out why I couldn't get the items tagged <Q> out. The simple problem was that I had to ask for the items tagged {cansas1d/1.0}Q. The wibbly brackets hold the namespace for the tag, which comes from the xmlns="cansas1d/1.0" attribute on the root element. In retrospect it is obvious that you need the namespace to parse and deal with XML properly.
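A minimal sketch of the problem, assuming a cut-down document that uses the same default namespace as my data file (the SAMPLE string and the "sas" prefix are made up for illustration):

```python
import xml.etree.ElementTree as ET

# Hypothetical, cut-down document with the same default namespace as the data file.
SAMPLE = """<SASroot version="1.0" xmlns="cansas1d/1.0">
  <SASentry name="Workspace_1">
    <SASdata>
      <Idata><Q unit="1/A"> 0.009000 </Q><I unit="1/cm"> 0.44416E+01 </I></Idata>
      <Idata><Q unit="1/A"> 0.011000 </Q><I unit="1/cm"> 0.37120E+01 </I></Idata>
    </SASdata>
  </SASentry>
</SASroot>"""

root = ET.fromstring(SAMPLE)

# ElementTree folds the default namespace into every tag name.
root_tag = root.tag

# Searching for the bare tag name matches nothing...
bare_matches = list(root.iter("Q"))

# ...while the namespace-qualified name matches both <Q> elements.
qualified_matches = list(root.iter("{cansas1d/1.0}Q"))

# findall() also accepts a prefix-to-namespace mapping; "sas" is an arbitrary prefix.
ns = {"sas": "cansas1d/1.0"}
prefixed_matches = root.findall(".//sas:Q", ns)

print(root_tag, len(bare_matches), len(qualified_matches), len(prefixed_matches))
```

Printing root.tag on an unfamiliar file is probably the quickest way to see which namespace, if any, you need to put in the wibbly brackets.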
Jason Winget also has a nice example of using Element Tree up.
A little of the data file looks like this:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cansasxml-html.xsl" ?>
<SASroot version="1.0"
    xmlns="cansas1d/1.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="cansas1d/1.0 http://svn.smallangles.net/svn/canSAS/1dwg/trunk/cansas1d.xsd">
  <SASentry name="Workspace_1">
    <Title> GluR0 + Gln 100 % D2O_SA </Title>
    <Run> 48664 </Run>
    <SASdata>
      <Idata><Q unit="1/A"> 0.009000 </Q><I unit="1/cm"> 0.44416E+01 </I><Idev unit="1/cm"> 0.14E+00 </Idev><Qdev unit="1/A"> 0.00E+00 </Qdev></Idata>
      <Idata><Q unit="1/A"> 0.011000 </Q><I unit="1/cm"> 0.37120E+01 </I><Idev unit="1/cm"> 0.86E-01 </Idev><Qdev unit="1/A"> 0.00E+00 </Qdev></Idata>
The source for the loader looks like this. xml.etree.ElementTree is imported as ‘ET’ at the top of the whole codeset:
import xml.etree.ElementTree as ET

def loadsasxml(file):
    """Loader for SASxml 1.0 format data.

    The loader uses xml.etree.ElementTree to parse the xml file and
    then searches through the file to find the {cansas1d/1.0}Q and
    {cansas1d/1.0}I tags and then extracts the text attribute from each
    of these. The list is then converted from text to floats and the Q
    and I lists passed to a new SasData object. Currently nothing else
    from the sas xml file is loaded. ElementTree is imported as 'ET'.
    """
    # Check that file is a sasxml file
    # assert (first line of file is what it should be) is True
    # Parse the xml file passed in and find the root element
    tree = ET.parse(file)
    elem = tree.getroot()
    # Iterate over all the <Q> tags and get the Q values
    # (iter() replaces getiterator(), which was removed in Python 3.9)
    q_list = []
    for element in elem.iter("{cansas1d/1.0}Q"):
        q_list.append(float(element.text))  # need to convert text to float
    # then do the same for the <I> tags and values
    i_list = []
    for element in elem.iter("{cansas1d/1.0}I"):
        i_list.append(float(element.text))
    # check everything is ok with q_list and i_list
    assert len(q_list) == len(i_list), 'different number of q and i values?'
    assert len(q_list) != 0, 'appear to be no q values'
    assert len(i_list) != 0, 'appear to be no i values'
    assert q_list[0] < q_list[-1], 'q values not in order?'
    # generate and return a SasData object
    return ExpSasData(q_list, i_list)
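One possible variation, sketched here rather than taken from my actual code: walk the Idata elements and read Q and I from each, so every Q stays paired with its own I. The function name read_q_and_i, the "sas" prefix, and the inline sample are all made up for illustration, and the sketch returns plain lists instead of a SasData object so that it runs on its own.

```python
import io
import xml.etree.ElementTree as ET

# "sas" is an arbitrary prefix chosen here; only the namespace string matters.
NS = {"sas": "cansas1d/1.0"}

def read_q_and_i(source):
    """Return (q_list, i_list), read pairwise from each <Idata> element.

    source can be a file name or an open file object, as accepted by ET.parse().
    """
    root = ET.parse(source).getroot()
    q_list, i_list = [], []
    for idata in root.iterfind(".//sas:Idata", NS):
        # findtext() returns the text of the first matching child element
        q_list.append(float(idata.findtext("sas:Q", namespaces=NS)))
        i_list.append(float(idata.findtext("sas:I", namespaces=NS)))
    return q_list, i_list

# Hypothetical in-memory sample standing in for a real data file.
sample = b"""<SASroot version="1.0" xmlns="cansas1d/1.0">
  <SASentry name="Workspace_1">
    <SASdata>
      <Idata><Q unit="1/A"> 0.009000 </Q><I unit="1/cm"> 0.44416E+01 </I></Idata>
      <Idata><Q unit="1/A"> 0.011000 </Q><I unit="1/cm"> 0.37120E+01 </I></Idata>
    </SASdata>
  </SASentry>
</SASroot>"""

q_values, i_values = read_q_and_i(io.BytesIO(sample))
print(q_values, i_values)
```

If an Idata element were missing its Q or I, findtext() would return None and float() would raise a TypeError, which is arguably a louder failure than two lists quietly ending up different lengths.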