XML DOM parsing in Groovy

Recently I had to modify several XML files at once, so I took the chance to use Groovy and learn something new. I put an example project on GitHub.

Gradle project creation

I decided to create a Gradle project. NetBeans supports Gradle via an external plugin, but I created the project from the command line:

mkdir groovyDom
cd groovyDom
gradle init --type groovy-application
gradle build
gradle run

These commands create, build and run the project. The first run takes some time because Gradle has to download the project's dependencies.

Spock test framework

I used an XML file from Microsoft to create my examples, which are written as tests using the Spock framework. Spock is a common way to write tests in Groovy; I found it very straightforward to learn and very expressive.

A test class called AppTest was created automatically by Gradle; inside you can define methods like:

def "File exists"() {
    setup: // Initialization
    ClassLoader classLoader = getClass().getClassLoader()

    when: // Do something
    // The file is inside /src/test/resources
    File file = new File(classLoader.getResource("test.xml").getFile())

    then: // Condition
    file.isFile()
}

After the then: block it is possible to add further conditions, each introduced by the and: label. I used this feature a lot to test multiple conditions.
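
As a minimal sketch of chaining conditions with and:, the test below checks three properties of the same file. The class name ResourceSpec and the extra conditions are hypothetical, and the example assumes Spock is on the test classpath and that test.xml sits in src/test/resources.

```groovy
import spock.lang.Specification

// Hypothetical specification class, for illustration only
class ResourceSpec extends Specification {

    def "test.xml is a readable, non-empty file"() {
        when: // Locate the resource on the test classpath
        File file = new File(getClass().classLoader.getResource("test.xml").file)

        then: // First condition
        file.isFile()

        and: // Second condition
        file.canRead()

        and: // Third condition
        file.length() > 0
    }
}
```

Each and: block is reported separately from the others in the source, which makes a long then: section easier to read.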

There is also a dedicated folder for test resources, src/test/resources. I put my XML file there so I could load it easily.

Navigating the DOM

Back to the DOM: thanks to Groovy, loading the document takes only a few lines of code:

import groovy.xml.DOMBuilder

def document
// withReader closes the reader automatically when the closure returns
file.withReader { reader ->
    document = DOMBuilder.parse(reader)
}
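
To try the same call without a file on disk, here is a minimal sketch that parses an inline string. The XML snippet is a hypothetical, cut-down stand-in for the real test file.

```groovy
import groovy.xml.DOMBuilder

// Hypothetical inline sample standing in for the test.xml file
def xml = '''<catalog>
  <book id="bk101"><title>XML Developer's Guide</title></book>
</catalog>'''

// DOMBuilder.parse accepts any Reader, so a StringReader works like a file reader
def document = DOMBuilder.parse(new StringReader(xml))

// The result is a plain org.w3c.dom.Document from the Java API
assert document.documentElement.nodeName == 'catalog'
assert document.getElementsByTagName('book').length == 1
```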

Groovy reuses the Java APIs to parse the DOM, but enhances them with the Pimp My Library pattern. This pattern allows you to add methods to existing classes without extending them. In Groovy it is implemented through categories and, as far as I remember, similar mechanisms are also available in Scala and C#.
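
To show the mechanism in isolation, here is a minimal sketch of a hand-written category. ShoutCategory and its shout() method are hypothetical names invented for this example and have nothing to do with the XML classes.

```groovy
// Hypothetical category: each static method becomes an instance method
// on the type of its first parameter while the category is in use
class ShoutCategory {
    static String shout(String self) {
        self.toUpperCase() + '!'
    }
}

def shouted
use(ShoutCategory) {
    // String has no shout() method of its own; the category supplies it
    shouted = 'hello'.shout()
}
assert shouted == 'HELLO!'

// Outside the use block the method is not available any more
def failed = false
try {
    'hello'.shout()
} catch (MissingMethodException e) {
    failed = true
}
assert failed
```

The added methods are only visible inside the use block, so the rest of the program still sees the unmodified class.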

I searched for nodes and attributes using XPath. DOM nodes in the plain Java API have no xpath() method; Groovy injects it using the aforementioned Pimp My Library pattern. The code must be enclosed in a use block like:

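import groovy.xml.dom.DOMCategory
import static javax.xml.xpath.XPathConstants.NODESET

use (DOMCategory) { // Pimp My Library pattern
    // Inside this block I can search with the XPath syntax
    computerBooks = document.xpath("/catalog/book[genre='Computer']", NODESET)

    // Collect the title text of every matching book
    titles = computerBooks.collect { book ->
        book.title.text()
    } as ArrayList
}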

The NODESET parameter means that the result will be a collection of DOM nodes. We can build another collection from it using the standard Groovy syntax, extracting the text content of each node with the text() method.
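
Put together as a self-contained script, the search could look like the sketch below. The inline XML is a hypothetical three-book sample that only assumes the same catalog/book/genre layout as the Microsoft file.

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory
import static javax.xml.xpath.XPathConstants.NODESET

// Hypothetical, reduced sample with the same layout as the real file
def xml = '''<catalog>
  <book id="bk101"><title>XML Developer's Guide</title><genre>Computer</genre></book>
  <book id="bk102"><title>Midnight Rain</title><genre>Fantasy</genre></book>
  <book id="bk110"><title>Microsoft .NET: The Programming Bible</title><genre>Computer</genre></book>
</catalog>'''

def document = DOMBuilder.parse(new StringReader(xml))

def titles
use(DOMCategory) {
    // NODESET asks XPath for every matching node, not just a string value
    def computerBooks = document.xpath("/catalog/book[genre='Computer']", NODESET)

    // Map each <book> node to the text of its <title> child
    titles = computerBooks.collect { book -> book.title.text() }
}

assert titles == ["XML Developer's Guide", "Microsoft .NET: The Programming Bible"]
```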

One of the nicest aspects of Groovy is that it can resolve properties and methods on the fly: in this case, if we have an element called “book” with a sub-element called “title”, we can access the content of the title with book.title.text().

Without XPath, I could have just navigated through the nodes starting from the root element, which is retrieved as:

def catalog = document.documentElement
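
As a sketch of that alternative, the same titles can be collected by walking down from the root with property access and ordinary Groovy collection methods. The inline XML is again a hypothetical sample, not the real file.

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Hypothetical, reduced sample with the same layout as the real file
def xml = '''<catalog>
  <book id="bk101"><title>XML Developer's Guide</title><genre>Computer</genre></book>
  <book id="bk102"><title>Midnight Rain</title><genre>Fantasy</genre></book>
</catalog>'''

def document = DOMBuilder.parse(new StringReader(xml))
def catalog = document.documentElement

def titles
use(DOMCategory) {
    // catalog.book yields the <book> children of the root element
    def books = catalog.book

    // Filter and map with plain Groovy collection methods instead of XPath
    def computerBooks = books.findAll { it.genre.text() == 'Computer' }
    titles = computerBooks.collect { it.title.text() }
}

assert titles == ["XML Developer's Guide"]
```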

Reading and writing attributes

There is an easy way to read attribute values, using a syntax like computerBooks.item(0).'@id'. In this case, I extract the id attribute from the first element of computerBooks:

    computerBooks.item(0).'@id'

I found no way to set an attribute other than the original Java method, setAttribute().
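
A minimal sketch of reading an attribute with the '@id' syntax and changing it with the plain Java setAttribute() call. The inline XML and the new value bk999 are made up for this example.

```groovy
import groovy.xml.DOMBuilder
import groovy.xml.dom.DOMCategory

// Hypothetical one-book sample
def xml = '<catalog><book id="bk101"><title>XML Developer\'s Guide</title></book></catalog>'
def document = DOMBuilder.parse(new StringReader(xml))

// Plain Java DOM call to get the first <book> element
def first = document.getElementsByTagName('book').item(0)

def before
def after
use(DOMCategory) {
    before = first.'@id'               // read through the category syntax
    first.setAttribute('id', 'bk999')  // write through the standard Java API
    after = first.'@id'
}

assert before == 'bk101'
assert after == 'bk999'
```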
