XML processing with PHP

Introduction
XML stands for Extensible Markup Language. It is a markup language, much like HTML, and was designed to describe data. XML uses tags, but unlike in HTML these tags are not predefined: in XML you define your own. XML documents are built from three types of components: elements, attributes and data.

Let's see a basic XML file:

<root>
  <child attribute='test'>
    <subchild attribute='sub'>testdata</subchild>
  </child>
</root>
   
Interpretation
The first line defines the root element of the XML document, <root>. The root element can contain any number of child elements, all of which appear between the opening <root> and closing </root> tags. In our example the root contains one child element. That child element has a <subchild> element of its own, which in turn could have its own children, and so on. There is no restriction on how many child elements you define; it is all up to you.
In the second line there is a new element, <child>, which is a child of the root element. It has an attribute, which you can interpret as a property of the element. In this example the property name is attribute and the property value is 'test'. This element again has a child (line 3), <subchild>, which also carries an attribute. In addition it contains data with the value 'testdata'.

As you can see from this small example:
    1. Each XML element has an opening and a closing tag.
    2. Attributes are defined inside the opening tag.
    3. The data is surrounded by opening and closing tags.

[newpage=Part 2]
PHP and XML
PHP has its own built-in functions for XML handling. To process an XML file you first need an XML parser. You can create one with the following code:

  $parser = xml_parser_create();


This code creates an empty XML parser object which will process our XML file later. The parser requires the XML document as a string so we read the file as follows:

  $document = file_get_contents("test.xml");


Now we have an XML document in string format and an XML parser. Let's process the file.

  xml_parse($parser, $document);


After processing we free up the resources:

  xml_parser_free($parser);


The complete code looks like this at the moment:

<?php
   $parser = xml_parser_create();
   $document = file_get_contents("test.xml");
   xml_parse($parser, $document);
   xml_parser_free($parser);
?>
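
The calls above do not check whether anything went wrong. As a minimal sketch (the parseOrError helper is a made-up name, not part of the tutorial), the return value of xml_parse() can be tested and the error reported with xml_get_error_code(), xml_error_string() and xml_get_current_line_number(). The third argument true tells the parser that this is the last piece of data, so an unclosed tag is reported as well:

```php
<?php
// Hypothetical helper (not from the original tutorial): parses a string and
// returns null on success or a readable error message on failure.
function parseOrError($document) {
    $parser = xml_parser_create();
    $message = null;
    // xml_parse() returns 1 on success and 0 on failure.
    if (xml_parse($parser, $document, true) !== 1) {
        $message = sprintf(
            'XML error: %s at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        );
    }
    // $parser is a local variable, so it is released when the function returns.
    return $message;
}

// file_get_contents() returns false if the file cannot be read.
$document = @file_get_contents("test.xml");
if ($document === false) {
    echo "Could not read test.xml";
} else {
    $error = parseOrError($document);
    echo $error === null ? "Parsed without errors" : $error;
}
```

With a check like this, a typo in the XML file shows up as a message instead of a silent empty page.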
  

And our test XML file (test.xml), which is located in the same directory as our PHP script:

<carlist>
  <car type="Mercedes">S 600</car>
  <car type="Mercedes">E 270 CDI</car>
  <car type="BMW">535 D</car>
  <car type="Lexus">IS 220</car>
</carlist>
  

Try to execute it!
What happened?
Nothing.
That is normal, as we have not yet defined what to do with the results or what the output should look like. So how, for example, can I list all the Mercedes models?
[newpage=Part 3]
XML processing refinement
To get a list of the Mercedes models we need to write some additional code. First, you can refine how the parser processes the XML by setting options with the xml_parser_set_option() function.
In our example I set the case folding option to false, which means that tag names are not converted to uppercase (by default they are).

xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
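
To see what case folding actually does, here is a small sketch. The reportedNames helper is only an illustration and does not belong to the tutorial script; it uses anonymous functions as handlers so that it needs no global variables. It parses the same string twice and collects the element names the parser reports:

```php
<?php
// Hypothetical helper for illustration: returns the element names exactly
// as the parser passes them to the start-element handler.
function reportedNames($xml, $caseFolding) {
    $names = array();
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, $caseFolding);
    xml_set_element_handler(
        $parser,
        function ($parser, $element, $attributes) use (&$names) {
            $names[] = $element;
        },
        function ($parser, $element) {
            // Closing tags are not interesting here.
        }
    );
    xml_parse($parser, $xml, true);
    return $names;
}

$xml = '<carlist><car type="BMW">535 D</car></carlist>';
echo implode(', ', reportedNames($xml, true)), "<br/>";
echo implode(', ', reportedNames($xml, false)), "<br/>";
```

The first line printed is CARLIST, CAR and the second is carlist, car. With case folding left on, a comparison such as $element == 'car' would never match.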


XML element handlers
Besides this, we have to tell the parser what to do when it finds an opening or closing tag. To do this I wrote two functions to handle these events. Both have a well-defined parameter list.

function openElement($parser, $element, $attributes)
function closeElement($parser, $element)


The first parameter in both cases is the parser object. The second is the element name. openElement has a third parameter, the attributes, which is an associative array of the element's attribute names and values.

You can assign these functions to the parser as follows:

xml_set_element_handler($parser, "openElement", "closeElement");


Note that the names passed to xml_set_element_handler must match the names of the functions defined above. As a result, when the parser finds an opening tag it calls the openElement function, and when it finds a closing tag it calls closeElement.
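
The handlers do not have to be plain functions: any PHP callable works, for example array($object, 'methodName'). The sketch below is an assumption-laden illustration (the ElementTracer class and its method names are invented here). It records the order in which the two handlers are called for the nested example from the introduction:

```php
<?php
// Hypothetical class, not from the original tutorial: the handlers are
// methods, passed to the parser as array($object, 'methodName') callables.
class ElementTracer {
    public $trace = array();
    private $depth = 0;

    public function open($parser, $element, $attributes) {
        // Indent by nesting depth, then go one level deeper.
        $this->trace[] = str_repeat('  ', $this->depth) . "open $element";
        $this->depth++;
    }

    public function close($parser, $element) {
        // Come back up one level before recording the closing tag.
        $this->depth--;
        $this->trace[] = str_repeat('  ', $this->depth) . "close $element";
    }

    public function run($xml) {
        $parser = xml_parser_create();
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, array($this, 'open'), array($this, 'close'));
        xml_parse($parser, $xml, true);
        return $this->trace;
    }
}

$xml = "<root><child attribute='test'><subchild attribute='sub'>testdata</subchild></child></root>";
$tracer = new ElementTracer();
echo "<pre>" . implode("\n", $tracer->run($xml)) . "</pre>";
```

Every open call is matched by a close call in reverse order, which is what makes it possible to keep track of "where we are" in the document with a counter or a flag.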
[newpage=Part 4]
XML data handler
One more thing is missing: a handler function to process the XML data values. For that I implemented a new function:

function characterData($parser, $data)


The first parameter is already known. The second contains the actual data value. To make the assignment you should use the following statement:

xml_set_character_data_handler($parser, "characterData");


Now the full code looks like this:

<?php

   function openElement($parser, $element, $attributes) {
   }

   function closeElement($parser, $element) {
   }

   function characterData($parser, $data) {
   }

   $parser = xml_parser_create();

   xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
   xml_set_element_handler($parser, "openElement", "closeElement");
   xml_set_character_data_handler($parser, "characterData");

   $document = file_get_contents("test.xml");
   xml_parse($parser, $document);

   xml_parser_free($parser);

?>



Of course this code will again result in an empty page. Just to have some output and to demonstrate how the functions are called, I added some code to the handler functions.

The new code is:

<?php

   function openElement($parser, $element, $attributes) {
      // $attributes is an array, so list its names and values by hand
      $attr = '';
      foreach ($attributes as $key => $value) {
         $attr .= $key." : ".$value." - ";
      }
      echo "-> openElement element: $element, attribute: Array ($attr) <br/>";
   }

   function closeElement($parser, $element) {
      echo "-> closeElement element is: $element<br/>";
   }

   function characterData($parser, $data) {
      echo "-> characterData data is: [$data]<br/>";
   }

   $parser = xml_parser_create();

   xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
   xml_set_element_handler($parser, "openElement", "closeElement");
   xml_set_character_data_handler($parser, "characterData");

   $document = file_get_contents("test.xml");
   xml_parse($parser, $document);

   xml_parser_free($parser);

?>


[newpage=Part 5]
And the output is:

-> openElement element: carlist, attribute: Array () 
-> characterData data is: [ ]
-> characterData data is: [ ]
-> openElement element: car, attribute: Array (type : Mercedes - ) 
-> characterData data is: [S 600]
-> closeElement element is: car
-> characterData data is: [ ]
-> characterData data is: [ ]
-> openElement element: car, attribute: Array (type : Mercedes - ) 
-> characterData data is: [E 270 CDI]
-> closeElement element is: car
-> characterData data is: [ ]
-> characterData data is: [ ]
-> openElement element: car, attribute: Array (type : BMW - ) 
-> characterData data is: [535 D]
-> closeElement element is: car
-> characterData data is: [ ]
-> characterData data is: [ ]
-> openElement element: car, attribute: Array (type : Lexus - ) 
-> characterData data is: [IS 220]
-> closeElement element is: car
-> characterData data is: [ ]
-> closeElement element is: carlist
   

In the output you can find the root element <carlist>, the four child elements <car>, the attribute lists and the data values.
The only interesting question is why characterData is called so many times with what looks like an empty string.
To understand this, open the XML file in an editor that displays hidden characters. You will see that there is a newline character after the <carlist> tag, and it is handled as a data value. Then, in the second line, there are some spaces before the <car> element, which again result in a characterData call.
Try removing all the hidden characters from test.xml.

The result will be the following:

-> openElement element: carlist, attribute: Array () 
-> openElement element: car, attribute: Array (type : Mercedes - ) 
-> characterData data is: [S 600]
-> closeElement element is: car
-> openElement element: car, attribute: Array (type : Mercedes - ) 
-> characterData data is: [E 270 CDI]
-> closeElement element is: car
-> openElement element: car, attribute: Array (type : BMW - ) 
-> characterData data is: [535 D]
-> closeElement element is: car
-> openElement element: car, attribute: Array (type : Lexus - ) 
-> characterData data is: [IS 220]
-> closeElement element is: car
-> closeElement element is: carlist
   

Now it is much clearer. Of course you don't have to remove these characters from the XML; I did it just to demonstrate how the processing works.
[newpage=Part 6]
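
Instead of editing the XML file, the whitespace-only calls can also be ignored in the data handler itself. This is a small sketch under that assumption, with collectText being a hypothetical helper name: it keeps a piece of data only if something is left after trim().

```php
<?php
// Hypothetical helper: collects the character data of a document, skipping
// pieces that contain only whitespace (newlines and indentation).
function collectText($xml) {
    $values = array();
    $parser = xml_parser_create();
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$values) {
            if (trim($data) !== '') {
                $values[] = $data;
            }
        }
    );
    xml_parse($parser, $xml, true);
    return $values;
}

// The same layout as test.xml, with newlines and indentation kept.
$xml = "<carlist>\n  <car type=\"Mercedes\">S 600</car>\n  <car type=\"BMW\">535 D</car>\n</carlist>";
print_r(collectText($xml));
```

Only the car names end up in the array; the newline and the leading spaces still trigger the handler, but they are dropped there.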
Last steps
We are almost ready. With some further modifications we will have the list of Mercedes models. To do that we have to change the code of the handler functions as follows:
  
function openElement($parser, $element, $attributes) {
    global $flag;
    if (($element == 'car') && ($attributes['type'] == 'Mercedes')) $flag = true;
}

function closeElement($parser, $element) {
    global $flag;
    $flag = false;
}

function characterData($parser, $data) {
    global $flag, $mblist;
    if ($flag) $mblist[] = $data;
}

  

The $flag and $mblist variables are defined outside of the functions and used inside them as global variables.
    - The openElement function sets the flag to true if a car with the type Mercedes was found.
    - The closeElement function resets the flag to indicate that we are no longer inside the relevant element.
    - The characterData function checks the flag and, if it is set (meaning that the current element is a car whose type attribute is Mercedes), adds the data value to the $mblist array.
At the end we have an array with the requested list. Just display it. The complete code looks like this:

<?php

    // $mblist must start as an array: appending with [] to a string
    // is a fatal error in current PHP versions
    $mblist = array();
    $flag   = false;

    function openElement($parser, $element, $attributes) {
        global $flag;
        if (($element == 'car') && ($attributes['type'] == 'Mercedes')) $flag = true;
    }

    function closeElement($parser, $element) {
        global $flag;
        $flag = false;
    }

    function characterData($parser, $data) {
        global $flag, $mblist;
        if ($flag) $mblist[] = $data;
    }

    $parser = xml_parser_create();

    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler($parser, "openElement", "closeElement");
    xml_set_character_data_handler($parser, "characterData");

    $document = file_get_contents("test.xml");
    xml_parse($parser, $document);

    xml_parser_free($parser);

    foreach ($mblist as $value) {
        echo $value.'<br/>';
    }

?>

And the output is:
  
S 600
E 270 CDI
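
The same idea can be written without global variables. The following is only a sketch of one possible variation, not the script above: carsOfType is a hypothetical function name, the handlers are anonymous functions, and the text is collected in a buffer until the closing tag, because the parser may deliver the data of one element in several pieces.

```php
<?php
// Hypothetical function, a sketch rather than the tutorial's own script:
// returns the text of every <car> whose type attribute equals $type.
function carsOfType($xml, $type) {
    $found  = array();
    $buffer = null; // null means "not inside a matching <car>"

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
    xml_set_element_handler(
        $parser,
        function ($parser, $element, $attributes) use (&$buffer, $type) {
            if ($element == 'car' && isset($attributes['type']) && $attributes['type'] == $type) {
                $buffer = ''; // start collecting text
            }
        },
        function ($parser, $element) use (&$buffer, &$found) {
            if ($element == 'car' && $buffer !== null) {
                $found[] = trim($buffer);
                $buffer = null; // stop collecting
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$buffer) {
            if ($buffer !== null) {
                $buffer .= $data; // text may arrive in several pieces
            }
        }
    );
    xml_parse($parser, $xml, true);
    return $found;
}

$xml = '<carlist>
  <car type="Mercedes">S 600</car>
  <car type="Mercedes">E 270 CDI</car>
  <car type="BMW">535 D</car>
  <car type="Lexus">IS 220</car>
</carlist>';

foreach (carsOfType($xml, 'Mercedes') as $value) {
    echo $value.'<br/>';
}
```

Because the type is a parameter, the same function can list the BMW or Lexus entries as well, and nothing leaks into the global scope.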
  
Final words
I hope that this small tutorial helps you understand the basics of XML processing in PHP. If everything is clear, you should have no trouble extending the script to process your own XML files.

Thanks for your time and attention!
