Assuming all three needed nodes (aixm:designator, aixm:type, and gml:pos) are always present, consider parsing the parent nodes, aixm:DesignatedPointTimeSlice and axim:Point and then join them. Finally, select the three final columns needed.
import pandas as pd
ab = {
'aixm':'http://www.aixm.aero/schema/5.1.1',
'adrext':'http://www.aixm.aero/schema/5.1.1/extensions/EUR/ADR',
'gml':'http://www.opengis.net/gml/3.2'
}
time_slice_df = pd.read_xml(
'file.xml', xpath=".//aixm:DesignatedPointTimeSlice", namespaces=ab
).add_prefix("time_slice_")
point_df = pd.read_xml(
'file.xml', xpath=".//aixm:Point", namespaces=ab
).add_prefix("point_")
time_slice_df = (
time_slice_df.join(point_df)
.reindex(
["time_slice_designator", "time_slice_type", "point_pos"],
axis="columns"
)
)
And in forthcoming pandas 1.5, read_xml will support iterparse allowing retrieval of descendant nodes not limited to XPath expressions:
time_slice_df = pd.read_xml(
'file.xml',
namespaces = ab,
iterparse = {"aixm:DesignatedPointTimeSlice":
["aixm:designator", "axim:type", "aixm:Point"]
}
)
Answer from Parfait on Stack OverflowBUG: iterparse on read_xml overwrites with attributes on child elements
BUG: iterparse on read_xml overwrites nested child elements
BUG: pd.read_xml does not support file like object when iterparse is used
Parsing XML into a Pandas dataframe
I am trying to parse an XML file into a Pandas DataFrame. It's a nicely formatted file that's not very deep, but whenever I work with XML it's like my brain goes blank and I never can remember all the goofy intricacies of dealing with it.
The file looks roughly like this
<?xml version="1.0" encoding="utf-8"?>
<diagnosticsLog type="db-profile" startDate="11/14/2022 23:31:12">
<!--Build 18.0.1.69-->
<columns>
<column friendlyName="time" name="time" />
<column friendlyName="Direction" name="Direction" />
<column friendlyName="SQL" name="SQL" />
<column friendlyName="ProcessID" name="ProcessID" />
<column friendlyName="ThreadID" name="ThreadID" />
<column friendlyName="TimeSpan" name="TimeSpan" />
<column friendlyName="User" name="User" />
<column friendlyName="HTTPSessionID" name="HTTPSessionID" />
<column friendlyName="HTTPForward" name="HTTPForward" />
<column friendlyName="SessionID" name="SessionID" />
<column friendlyName="SessionGUID" name="SessionGUID" />
<column friendlyName="Datasource" name="Datasource" />
<column friendlyName="Sequence" name="Sequence" />
<column friendlyName="LocalSequence" name="LocalSequence" />
<column friendlyName="Message" name="Message" />
<column friendlyName="AppPoolName" name="AppPoolName" />
</columns>
<rows>
<row>
<col name="time">11/14/2022 23:31:12</col>
<col name="TimeSpan">0 ms</col>
<col name="ThreadID">0x00000025</col>
<col name="User">USERNAME</col>
<col name="HTTPSessionID"></col>
<col name="HTTPForward">20.186.0.0</col>
<col name="SessionGUID">e4e51b-a64d-4b7b-9bfe-9612dd22b6cc</col>
<col name="SessionID">6096783</col>
<col name="Datasource">datasourceName</col>
<col name="AppPoolName">C 1801AppServer Ext</col>
<col name="Direction">Out</col>
<col name="sql">UPDATE SET </col>
<col name="Sequence">236419</col>
<col name="LocalSequence">103825</col>
</row>
<row>
<col name="time">11/14/2022 23:31:12</col>
<col name="TimeSpan">N/A</col>
<col name="ThreadID">0x00000025</col>
<col name="User">USERNAME</col>
<col name="HTTPSessionID"></col>
<col name="HTTPForward">20.186.0.0</col>
<col name="SessionGUID">e491b-a64d-4b7b-9bfe-9612dd22b6cc</col>
<col name="SessionID">6096783</col>
<col name="Datasource">datasourceName</col>
<col name="AppPoolName">C 1801AppServer Ext</col>
<col name="Direction">In</col>
<col name="sql">UPDATE SET</col>
<col name="Sequence">236420</col>
<col name="LocalSequence">103826</col>
</row>
</rows>
</diagnosticsLog>I want to convert that to the column names being the columns and each row being a row. I'm at a loss on how to do this.