Parsing XML files in Java using the DOM library

by Ismail Sirma


Posted on 22.6.2015 10:56:36


Let’s read the content of an XML file, in this case the RSS feed of a website. Our aim is to get the new topics, which are stored in elements tagged “title”. Reading the content of the XML file takes four steps:

  • creating a DocumentBuilderFactory object using the DocumentBuilderFactory class’s newInstance() method

  • creating a DocumentBuilder object using the factory’s newDocumentBuilder() method

  • creating a Document object by calling the builder’s parse() method on the document

  • creating a NodeList object that holds all elements tagged “title”, using the getElementsByTagName() method


Please note that the NodeList, Element and Document types come from the W3C DOM API (the org.w3c.dom package). Also, running the code below may raise three different exceptions:

  • ParserConfigurationException

  • SAXException

  • IOException
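
As a rough sketch of where each of these exceptions comes from: ParserConfigurationException is declared by newDocumentBuilder(), while SAXException and IOException are declared by parse(). The class name ExceptionSourcesDemo and the helper tryParse are made up for illustration, and the XML is parsed from an in-memory string instead of a URL so that the sketch runs without a network connection.

```java
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;
import java.io.StringReader;

class ExceptionSourcesDemo {

    // Hypothetical helper: reports which kind of failure (if any) occurred
    // while parsing the given XML text.
    static String tryParse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // newDocumentBuilder() is the call that declares ParserConfigurationException
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Assumption: replace the default error handler, which in the JDK's
            // bundled parser usually prints fatal errors to stderr
            builder.setErrorHandler(new DefaultHandler());
            // parse() declares SAXException (malformed XML) and IOException (read failure)
            builder.parse(new InputSource(new StringReader(xml)));
            return "ok";
        } catch (ParserConfigurationException e) {
            return "configuration error";
        } catch (SAXException e) {
            return "malformed XML";
        } catch (IOException e) {
            return "I/O error";
        }
    }

    public static void main(String[] args) {
        // Well-formed input
        System.out.println(tryParse("<rss><title>Hello</title></rss>"));
        // The <title> element is never closed, so the parser rejects the input
        System.out.println(tryParse("<rss><title>Hello</rss>"));
    }
}
```

With a URL as the input, as in the program that follows, an IOException would typically point to a network or file problem and a SAXException to a response that is not well-formed XML.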


import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.IOException;

/**
* Created by ismailsirma on 22.6.2015.
*/
public class XmlParser {

    public static void main(String[] args) {

        try {
            // Create a DocumentBuilderFactory instance
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

            // Create a DocumentBuilder from the factory
            DocumentBuilder builder = factory.newDocumentBuilder();

            // parse(String uri) fetches the resource at the given URI, parses it,
            // and returns the result as a w3c DOM Document object
            Document doc = builder.parse("http://www.theverge.com/tech/rss/index.xml");

            // Collect every element named "title" into a NodeList
            NodeList list = doc.getElementsByTagName("title");

            // Print how many "title" elements were found
            System.out.println("There are " + list.getLength() + " items.");

            // Loop through the NodeList contents
            for (int i = 0; i < list.getLength(); i++) {
                // Element is an interface in the w3c DOM API; every entry in this list is an element node
                Element item = (Element) list.item(i);
                // getTextContent() returns the text inside the element and, unlike
                // getFirstChild().getNodeValue(), does not fail on an empty title element
                System.out.println(item.getTextContent());
            }

        } catch (ParserConfigurationException e) {
            e.printStackTrace();
        } catch (SAXException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
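
The same steps can also be tried without a network connection. The following is a minimal sketch, assuming the feed is already available as an InputStream; the class name TitleExtractor, the method extractTitles and the sample feed text are hypothetical and exist only for this illustration. It collects the titles into a List instead of printing them, which makes the result easier to reuse.

```java
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class TitleExtractor {

    // Hypothetical helper: returns the text of every "title" element, in document order.
    static List<String> extractTitles(InputStream in)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        NodeList nodes = doc.getElementsByTagName("title");

        List<String> titles = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() gives an empty string for an empty element
            titles.add(nodes.item(i).getTextContent().trim());
        }
        return titles;
    }

    public static void main(String[] args) throws Exception {
        // Made-up sample feed; the second item has an empty title on purpose
        String sample = "<rss><channel><title>Sample feed</title>"
                + "<item><title>First post</title></item>"
                + "<item><title></title></item>"
                + "</channel></rss>";
        InputStream in = new ByteArrayInputStream(sample.getBytes(StandardCharsets.UTF_8));
        for (String title : extractTitles(in)) {
            System.out.println("title: \"" + title + "\"");
        }
    }
}
```

Letting the helper declare the three checked exceptions, rather than catching them, leaves it to the caller to decide how a failed download or a malformed feed should be reported.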
