Tuesday 18 June 2013

Loading XML files using Fastload - Using URL to access front-end data and moving data to Teradata

1. A standard xml format

Let us assume that below data is stored in Test.xml:

 <Response>
                <ResponseID>R_3JHyAiOCTXKq</ResponseID>
                <ResponseSet>Default Response Set</ResponseSet>
                <Name>Anonymous</Name>
                <ExternalDataReference></ExternalDataReference>
                <EmailAddress></EmailAddress>
                <IPAddress>96.224.174.233</IPAddress>
                <Status>0</Status>
                <StartDate>2013-01-11 13:55:53</StartDate>
                <EndDate>2013-01-11 13:56:18</EndDate>
                <Finished>1</Finished>
                <Flag></Flag>
 </Response>

So, anything that comes between <Response> and </Response> is our data set.
Each field data is represented within its corresponding tags.

For example : <ResponseID>R_3JHyAiOCTXKq</ResponseID>

So, now to parse this xml, we will have to write the corresponding xslt file

Let us now write the corresponding xslt file called Test.xslt

2. The sample xslt file for the above xml data will be as follows:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">

<xsl:for-each select="xml/Response">
<xsl:value-of select="ResponseID"/>|<xsl:value-of select="ResponseSet"/>|<xsl:value-of select="Name"/>|<xsl:value-of select="ExternalDataReference"/>|<xsl:value-of select="EmailAddress"/>|<xsl:value-of select="IPAddress"/>|<xsl:value-of select="Status"/>|<xsl:value-of select="StartDate"/>|<xsl:value-of select="EndDate"/>|<xsl:value-of select="Finished"/>|<xsl:value-of select="Flag"/>|
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

3. Extract the xml data into a flat file

xsltproc is the command line processor for xml

-o = redirect to output file. Otherwise, data will be redirected to standard output.
Place the Test.xml , Test.xslt in the same directory

Let us now run the below command:

xsltproc Test.xslt Test.xml -o Test.txt

The file Test.txt will contain "|" (pipe delimited) data parsed from the Test.xml file based on the Test.xslt layout file.

Testing row counts:

Data2=`wc -l Test.txt|  cut -d ' ' -f1`

Data1=`grep -i "</Response>" Test.xml | wc -l`

If $Data1 == $Data2, then all the rows were successfully parsed and record count in Test.txt and Test.xml match.




Let us know if this information helped. You could donate for a noble cause.

Also, help us understand what topics would you like covered in the future blogs.

4 comments:

  1. Where do i run this command
    xsltproc Test.xslt Test.xml -o Test.txt

    Did try in CMD did not work

    ReplyDelete
    Replies
    1. Check if xsltproc is installed on the linux box.
      Simply typing :
      $ xsltproc -version

      if this is installed on the server, it should work.

      Delete
  2. Thanks for the reply ...I am using windows or can have cygwin

    ReplyDelete
    Replies
    1. http://www.sagehill.net/docbookxsl/InstallingAProcessor.html

      This post sums up everything about xslt processor nicely.

      Delete

Please share your thoughts and let us know the topics you want covered