Java XSLT Memory Leak



I ran into a strange memory (leak?) problem when I had to write a Java command line tool that used javax.xml to transform and JAXB to analyze an XML structure, and that finally stored the results in a CMS.

This tool had to parse quite a big XML file (> 300 MB), run an XSL transformation on it (using javax.xml.transform.Transformer.transform(Source xmlSource, Result outputTarget)) and then continue to work on that XML file.
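
For reference, the plain single-threaded call boils down to something like the following sketch. The class name DirectTransform and the in-memory strings are only for illustration (the real tool worked on file streams); the javax.xml.transform calls themselves are the standard ones.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.StringReader;
import java.io.StringWriter;

// Hypothetical helper for illustration, not part of the original tool.
class DirectTransform {
  // Applies the stylesheet to the XML on the calling thread and returns the result as a string.
  static String transform(String xsl, String xml) throws TransformerException {
    TransformerFactory factory = TransformerFactory.newInstance();
    Transformer transformer = factory.newTransformer(new StreamSource(new StringReader(xsl)));
    StringWriter out = new StringWriter();
    transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
    return out.toString();
  }
}
```

Everything here runs on the thread that calls transform(), which in a command line tool is usually the main thread.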

Transforming the file consumed a little more than 1 GB of RAM, and even after the file was transformed, that RAM (heap space) just wasn’t freed by the garbage collector. That meant that I had to provide an additional 1 GB of heap space just for the XSL transformation!
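
A rough way to watch this happen without a profiler is to log the used heap before and after the transformation. The sketch below is only an assumption about how one could do that, not something taken from the original tool; HeapProbe is a made-up name, and System.gc() is merely a hint to the JVM, so treat the numbers as approximate.

```java
// Hypothetical helper for illustration, not part of the original tool.
class HeapProbe {
  // Heap currently in use by the JVM, in bytes.
  static long usedHeapBytes() {
    Runtime runtime = Runtime.getRuntime();
    return runtime.totalMemory() - runtime.freeMemory();
  }

  // Asks the JVM to collect garbage first; the JVM is free to ignore the request.
  static long usedHeapBytesAfterGc() {
    System.gc();
    return usedHeapBytes();
  }

  public static void main(String[] args) {
    System.out.println("Used heap in MB: " + usedHeapBytesAfterGc() / (1024 * 1024));
  }
}
```

Calling usedHeapBytesAfterGc() once before and once after the transformation should show whether the memory comes back or not.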

The tool was designed in a flexible way: if you set the correct flag and a previous run of the tool had already created the transformed XML file, the transformation could be skipped, and in that case memory was not a problem at all! But when I performed the XSLT, 1 additional GB was needed. Since the parts of the code that ran afterwards needed some memory too, I had to give the JVM roughly 1.3 GB so that the tool was able to finish at all!

I double-checked whether all my input and output streams were closed properly, but that wasn’t the problem.

I debugged my code, googled, read the API docs, googled, looked into the (partly decompiled) sources of the javax.xml code, googled again, but just couldn’t figure out why the Java heap memory wasn’t freed.

Then I used profiling (the Java 7 JDK comes bundled with Java Mission Control) and found out that most of the memory was held by instances of cached map entries (java.lang.ThreadLocal$ThreadLocalMap$Entry).

It looks like the javax.xml implementation caches the XML chunks in a hash map for performance reasons. This might seem like a nice idea, but in fact it is quite memory-consuming and doesn’t help at all when you’re parsing a huge file exactly once! WTF do you need a cache for in this case?

Since the tool was a re-written version of an existing, very complex tool that had lots of dependencies (first you had to parse this, then analyze this structure, then …), it had been single-threaded. Single-threaded means everything runs on one thread, and that thread has exactly one ThreadLocalMap, which stays alive for as long as the thread does.

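So the solution was to move the transformation of the one XML file into the other into its own Thread. When that thread ends, its ThreadLocalMap becomes unreachable, so the garbage collector can reclaim everything that was cached in it.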
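
The underlying mechanism can be shown with a tiny sketch (Java 8 or later because of the lambda). CACHE is just a stand-in for whatever the XML implementation keeps per thread (I’m assuming here, not quoting its code), and the class name is made up. A value stored by a worker thread lives only in that worker’s ThreadLocalMap, so the calling thread never sees it.

```java
// Hypothetical demo for illustration, not part of the original tool.
class ThreadLocalLifetimeDemo {
  // Stand-in for per-thread data kept by the XML implementation (an assumption).
  static final ThreadLocal<byte[]> CACHE = new ThreadLocal<>();

  // Fills the thread-local on a worker thread and reports whether the caller can see the value.
  static boolean callerSeesCacheAfterWorker() throws InterruptedException {
    Thread worker = new Thread(() -> CACHE.set(new byte[1024 * 1024]));
    worker.start();
    worker.join(); // wait until the worker, and with it its ThreadLocalMap, is finished
    return CACHE.get() != null; // get() only looks into the calling thread's own map
  }
}
```

On a thread that never set CACHE itself, callerSeesCacheAfterWorker() returns false: the megabyte was only ever referenced from the worker’s map, and that map is gone together with the worker.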

Here’s the class I used:


package de.ahoiit.xml;

import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.Executors;

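/**
 * Runs an XSL transformation in a separate thread so that whatever the XML
 * implementation keeps in thread-local storage can be garbage-collected
 * once that thread has ended.
 */
public class ThreadedXslTransformer {
  // Interval in milliseconds at which transform() checks on the worker thread.
  private long sleepTime = 10000L;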
  public ThreadedXslTransformer() {
  }
  public ThreadedXslTransformer(long sleepTime) throws IllegalArgumentException {
    if (sleepTime < 1L) {
      throw new IllegalArgumentException("Invalid sleep time - must be a positive integer!");
    }
    this.sleepTime = sleepTime;
  }
  private static class TransformerRunner implements Runnable {
    private final InputStream xslInputStream;
    private final InputStream xmlInputStream;
    private final OutputStream xmlOutputStream;
    private TransformerException transformerException;

    TransformerRunner(InputStream xslInputStream, InputStream xmlInputStream, OutputStream xmlOut) {
      this.xslInputStream = xslInputStream;
      this.xmlInputStream = xmlInputStream;
      this.xmlOutputStream = xmlOut;
    }
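    @Override
    public void run() {
      try {
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        Transformer transformer = transformerFactory.newTransformer(new StreamSource(this.xslInputStream));
        transformer.transform(new StreamSource(this.xmlInputStream), new StreamResult(this.xmlOutputStream));
      } catch (TransformerException e) {
        // Remember the failure so that transform() can rethrow it in the calling thread.
        this.transformerException = e;
      } catch (RuntimeException e) {
        // Unexpected failures must not get lost in the worker thread either.
        this.transformerException = new TransformerException(e);
      } finally {
        // Only the stylesheet stream is closed here; the XML streams stay under the caller's control.
        try {
          this.xslInputStream.close();
        } catch (IOException e) {
          e.printStackTrace();
        }
      }
    }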
  }
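  public void transform(InputStream xsltIn, InputStream xmlIn, OutputStream xmlOut) throws TransformerException {
    TransformerRunner transformerRunner = new TransformerRunner(xsltIn, xmlIn, xmlOut);
    Thread transformerThread = Executors.defaultThreadFactory().newThread(transformerRunner);
    transformerThread.start();
    while (transformerThread.isAlive()) {
      try {
        // join() returns as soon as the thread ends, or after sleepTime milliseconds at the latest.
        transformerThread.join(this.sleepTime);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting instead of swallowing the interruption.
        Thread.currentThread().interrupt();
        throw new TransformerException("Interrupted while waiting for the transformation to finish", e);
      }
    }
    if (transformerRunner.transformerException != null) {
      throw transformerRunner.transformerException;
    }
  }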
}
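
The same idea can probably be written more compactly with an ExecutorService. The following is only a sketch of that variant, not what the tool actually used, and I haven’t measured it against the 300 MB file; ExecutorXslTransformer is a made-up name. Failures in the worker arrive wrapped in an ExecutionException, and the executor has to be shut down, otherwise its thread (and that thread’s ThreadLocalMap) stays around.

```java
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical variant for illustration, not the class used in the original tool.
class ExecutorXslTransformer {
  // Runs the transformation on a short-lived worker thread and waits for it to finish.
  static void transform(InputStream xslIn, InputStream xmlIn, OutputStream xmlOut)
      throws ExecutionException, InterruptedException {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<?> result = executor.submit(() -> {
        Transformer transformer = TransformerFactory.newInstance().newTransformer(new StreamSource(xslIn));
        transformer.transform(new StreamSource(xmlIn), new StreamResult(xmlOut));
        return null; // returning a value makes this a Callable, which may throw checked exceptions
      });
      result.get(); // blocks until the worker is done and rethrows its failure, if any
    } finally {
      executor.shutdown(); // lets the worker thread end once the task is finished
    }
  }
}
```

As in the class above, the caller stays responsible for opening and closing the streams.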


Hope this helps.