Getting Element Tree to read a SASXML file


Element Tree is a very nice and simple-to-use Python library for loading, reading and writing XML files. However, there is one little trick that caught me out because I didn't understand enough about XML and namespaces. Having got some data in SAS XML format, it took me a long while to work out why I couldn't get the items tagged <Q> out. The simple problem was that I had to ask for the items tagged {cansas1d/1.0}Q. The wibbly brackets give the namespace for the tag, which in retrospect you would obviously need in order to parse and deal with XML properly.
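To make the namespace point concrete, here is a minimal sketch using a cut-down document of my own (the SAMPLE string is an assumption for illustration, modelled on the data file, not the real file). It shows that Element Tree stores every tag with the namespace folded in, so a search for a bare Q finds nothing while the fully qualified name works:

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down document in the same default namespace
# as the data file (illustration only, not the real data).
SAMPLE = """<SASroot version="1.0" xmlns="cansas1d/1.0">
  <SASentry name="Workspace_1">
    <SASdata>
      <Idata><Q unit="1/A"> 0.009000 </Q><I unit="1/cm"> 0.44416E+01 </I></Idata>
      <Idata><Q unit="1/A"> 0.011000 </Q><I unit="1/cm"> 0.37120E+01 </I></Idata>
    </SASdata>
  </SASentry>
</SASroot>"""

root = ET.fromstring(SAMPLE)

# The default namespace becomes part of every tag name.
root_tag = root.tag

# A bare tag name matches nothing, because no element is called just "Q".
bare = list(root.iter("Q"))

# The namespace-qualified name finds the elements.
qualified = list(root.iter("{cansas1d/1.0}Q"))

# float() tolerates the surrounding whitespace in the element text.
q_values = [float(el.text) for el in qualified]

print(root_tag, len(bare), q_values)
```

Printing root.tag on an unfamiliar file is a quick way to discover which namespace string you need to put inside the brackets.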

Jason Winget also has a nice example of using Element Tree up.

A little of the data file looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="cansasxml-html.xsl" ?>
 <SASroot version="1.0"
          xmlns="cansas1d/1.0"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="cansas1d/1.0 http://svn.smallangles.net/svn/canSAS/1dwg/trunk/cansas1d.xsd">
 <SASentry name="Workspace_1">
 <Title> GluR0 + Gln 100 % D2O_SA </Title>
 <Run> 48664 </Run>
 <SASdata>
  <Idata><Q unit="1/A"> 0.009000 </Q><I unit="1/cm"> 0.44416E+01 </I><Idev unit="1/cm"> 0.14E+00 </Idev><Qdev unit="1/A"> 0.00E+00 </Qdev></Idata>
  <Idata><Q unit="1/A"> 0.011000 </Q><I unit="1/cm"> 0.37120E+01 </I><Idev unit="1/cm"> 0.86E-01 </Idev><Qdev unit="1/A"> 0.00E+00 </Qdev></Idata>

The source for the loader looks like this. xml.etree.ElementTree is imported as ‘ET’ at the top of the whole codeset:

import xml.etree.ElementTree as ET  # done once at the top of the codeset


def loadsasxml(file):
    """Loader for SASxml 1.0 format data.

    The loader uses xml.etree.ElementTree to parse the xml file and
    then searches through the file to find the {cansas1d/1.0}Q and
    {cansas1d/1.0}I tags and then extracts the text attribute from each
    of these. The list is then converted from text to floats and the Q
    and I lists passed to a new SasData object. Currently nothing else
    from the sas xml file is loaded. ElementTree is imported as 'ET'.
    """

    # Check that file is a sasxml file
    # assert (first line of file is what it should be) is True

    # Parse the xml file and find the root element
    tree = ET.parse(file)
    elem = tree.getroot()

    # iterate over all the Q tags and get the Q values
    # (iter() replaces getiterator(), which was removed in Python 3.9)
    q_tags = elem.iter("{cansas1d/1.0}Q")

    q_list = []

    for element in q_tags:
        q_list.append(float(element.text))  # need to convert text to float

    # then do the same for the I tags and values
    i_tags = elem.iter("{cansas1d/1.0}I")
    i_list = []

    for element in i_tags:
        i_list.append(float(element.text))

    # check everything is ok with q_list and i_list
    assert len(q_list) == len(i_list), 'different number of q and i values?'
    assert len(q_list) != 0, 'appear to be no q values'
    assert len(i_list) != 0, 'appear to be no i values'
    assert q_list[0] < q_list[-1], 'q values not in order?'

    # generate and return a SasData object
    # (ExpSasData is defined elsewhere in the codeset)
    return ExpSasData(q_list, i_list)
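An alternative to spelling out the {cansas1d/1.0} prefix in every search is to hand Element Tree a prefix-to-namespace mapping. The sketch below is not the loader above, just a variation under my own assumptions: the function name read_q_i, the "sas" prefix and the sample bytes are all made up for illustration. It walks each Idata row so that every Q stays paired with the I from the same row:

```python
import io
import xml.etree.ElementTree as ET

# The prefix "sas" is an arbitrary local label; only the URI has to match.
NS = {"sas": "cansas1d/1.0"}


def read_q_i(source):
    """Return (q_list, i_list), read row by row from each Idata element.

    `source` may be a filename or a file object, as accepted by ET.parse.
    """
    root = ET.parse(source).getroot()
    q_list, i_list = [], []
    for row in root.iterfind(".//sas:Idata", NS):
        # "sas:I" matches only I, not Idev, because tag names match exactly.
        q_list.append(float(row.find("sas:Q", NS).text))
        i_list.append(float(row.find("sas:I", NS).text))
    return q_list, i_list


# Hypothetical in-memory stand-in for a data file (illustration only).
sample = b"""<SASroot version="1.0" xmlns="cansas1d/1.0">
  <SASentry name="Workspace_1">
    <SASdata>
      <Idata><Q unit="1/A"> 0.009000 </Q><I unit="1/cm"> 0.44416E+01 </I><Idev unit="1/cm"> 0.14E+00 </Idev></Idata>
      <Idata><Q unit="1/A"> 0.011000 </Q><I unit="1/cm"> 0.37120E+01 </I><Idev unit="1/cm"> 0.86E-01 </Idev></Idata>
    </SASdata>
  </SASentry>
</SASroot>"""

q, i = read_q_i(io.BytesIO(sample))
print(q, i)
```

Reading row by row means a row with a missing I raises an error at that row, rather than quietly producing two lists that no longer line up.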
