OSGi at LinkedIn: Java Compilation in OSGi

In this post I will describe how I was able to make LinkedIn's JSP compiler work within an OSGi container.

I guess the first question I need to answer is: why on earth does LinkedIn have its own JSP compiler? The answer is partly historical and partly feature-driven. Our JSP compiler (I am its author) extends the JSP standard in the following ways:

  • HTML escaping (e.g. '<' gets turned into '&lt;') is on by default, so web developers don't even have to think about it. This is actually a pretty big deal, because escaping HTML helps protect your web site against XSS attacks.
  • The expression language (EL) is more powerful and allows you to pass arguments to method calls.
  • JSP files can be located anywhere: inside the war, on the file system, inside a local jar file, inside a remote jar file or a combination of all those. This is quite nice in development because JSPs can be located in the source tree (instead of packaged in a WAR) and developers can be much more productive.
  • The compiler uses strong typing and does not rely on reflection at runtime (only at compile time). Depending on how you look at it, this could be considered as a downgrade instead of an enhancement :).
  • It handles non-web oriented content. For example, we use the JSP compiler to generate the emails that LinkedIn sends.
  • It has a built-in extensible pipeline which allows us to plug in any kind of processing before compiling the JSP. For example, we recently added an i18n processing step which injects all localized strings directly as local variables inside the compiled code, as opposed to doing runtime lookups in resource bundles.

The second question that comes to mind is: why is it challenging to run the JSP compiler within OSGi? To answer this question, there are two things to understand:

  • The first is the steps involved in the JSP processing:
    1. Parse JSP file (lexical analysis, grammar analysis using JavaCC/JJTree)
    2. Generate Java source code (a servlet)
    3. Compile generated Java source code into byte code
    4. Load compiled code
    5. Execute code
  • The second is how OSGi handles class loading: to achieve dependency management at the package level, support multiple versions of the same class and allow dynamic reloading of classes, OSGi needs to control class loading pretty tightly. In other words, the concept of "classpath" in an OSGi container becomes pretty much nonsensical.
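To make steps 2 to 5 concrete, here is a rough sketch of the same pipeline outside of OSGi. It is not LinkedIn's compiler: the class name MiniPipeline, the generated Supplier and the use of the Java 6 javax.tools API are all assumptions made for illustration, and step 1 (parsing) is skipped entirely. Note how the load step relies on a plain directory on the file system, which is exactly the kind of classpath thinking that does not carry over to OSGi.

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;

final class MiniPipeline {
    // Step 2: generate Java source. A real JSP compiler would emit a servlet;
    // this hypothetical stand-in emits a Supplier<String> returning fixed text.
    // (The text is assumed to contain no quotes or backslashes.)
    static String generateSource(String className, String text) {
        return "public class " + className
             + " implements java.util.function.Supplier<String> {\n"
             + "  public String get() { return \"" + text + "\"; }\n"
             + "}\n";
    }

    static String run(String className, String text) throws Exception {
        Path dir = Files.createTempDirectory("mini-pipeline");
        Path src = dir.resolve(className + ".java");
        Files.write(src, generateSource(className, text).getBytes(StandardCharsets.UTF_8));

        // Step 3: compile in-process; class files land next to the source.
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("a JDK (not just a JRE) is required");
        }
        int rc = compiler.run(null, null, null, "-d", dir.toString(), src.toString());
        if (rc != 0) {
            throw new IllegalStateException("compilation failed");
        }

        // Step 4: load the compiled class from the output directory.
        try (URLClassLoader loader = new URLClassLoader(new URL[] { dir.toUri().toURL() })) {
            Class<?> cls = loader.loadClass(className);
            @SuppressWarnings("unchecked")
            Supplier<String> page = (Supplier<String>) cls.getDeclaredConstructor().newInstance();
            // Step 5: execute.
            return page.get();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run("GeneratedPage", "hello from generated code"));
    }
}
```

The temporary directory is left behind for simplicity; a real implementation would manage its output directory explicitly.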

This is where there is a strong disconnect: step 3 needs to compile Java source code into byte code. If you are familiar with the standard Java compiler provided with the JDK (javac), you know that you must provide a classpath for the compilation, and that there are a lot of limitations on what that classpath can be (it cannot contain arbitrary URLs, for example).

For efficiency reasons, we do not want to fork a new external process every time we invoke the Java compiler. Unfortunately, with Java 5 there is no standard way of invoking javac in-process. There is an unsupported way of doing it, which is what we use:

String[] options = {
  "-source", "1.5",
  // coming from ServletContext.getAttribute("org.apache.catalina.jsp_classpath")
  "-classpath", _classPath,
  "-d", _baseClassDir.getPath()
};

StringWriter sw = new StringWriter();
PrintWriter out = new PrintWriter(sw);

// unsupported API: returns 0 on success, compiler messages are written to 'out'
if(com.sun.tools.javac.Main.compile(options, out) != 0)
{
  out.close();
  throw new CompileException(sw.toString());
}

Java 6 offers a standard way of invoking the compiler from within Java itself through the javax.tools API. I was very excited when I saw this, especially the fact that there is a JavaFileManager.getClassLoader(location) method. Since OSGi has the concept of a Bundle, which offers an API very similar to a ClassLoader, it is fairly easy to write an adapter:

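import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.Enumeration;

import org.osgi.framework.Bundle;

public class BundleClassLoader extends ClassLoader
{
  private final Bundle _bundle;

  public BundleClassLoader(Bundle bundle)
  {
    _bundle = bundle;
  }

  public BundleClassLoader(ClassLoader parent, Bundle bundle)
  {
    super(parent);
    _bundle = bundle;
  }

  // findClass (not loadClass) is the protected hook: ClassLoader.loadClass is
  // public, so it cannot be overridden with weaker access. It is called after
  // the parent class loader has failed to find the class.
  @Override
  protected Class<?> findClass(String name) throws ClassNotFoundException
  {
    return _bundle.loadClass(name);
  }

  @Override
  protected URL findResource(String name)
  {
    return _bundle.getResource(name);
  }

  @Override
  protected Enumeration<URL> findResources(String name) throws IOException
  {
    // Bundle.getResources may return null, findResources must not
    Enumeration<URL> resources = _bundle.getResources(name);
    return resources != null ? resources : Collections.enumeration(Collections.<URL>emptyList());
  }
}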

Unfortunately, my excitement was shattered when I realized that the ClassLoader is not used during the compilation, but only, as stated in the JavaDoc, "for loading plug-ins (ex: annotation processors) from the given location". I really thought for a minute that you could use the ClassLoader instead of the classpath. It would have been too nice. The only method that looked potentially promising was:

public Iterable<JavaFileObject> list(Location location,
                                     String packageName,
                                     Set<JavaFileObject.Kind> kinds,
                                     boolean recurse) throws IOException;

This method gets called for every single package that is referenced in the source code and is expected to return a list of all the classes that the package contains. Unfortunately, it is impossible to get this from a Bundle. There may be convoluted ways to get to it using the PackageAdmin service, but it was starting to get very hairy and seemed like a lot of work.
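If you want to see this behavior for yourself, here is a small sketch (outside of OSGi) that wraps the standard file manager and records every package name passed to list(). The names ListCallRecorder, StringSource and packagesListed are made up for the example, and the exact set of packages you get back will depend on your JDK version.

```java
import javax.tools.ForwardingJavaFileManager;
import javax.tools.JavaCompiler;
import javax.tools.JavaFileManager;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.StandardJavaFileManager;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

final class ListCallRecorder {
    // A source file held in memory, following the pattern from the JavaCompiler JavaDoc.
    static final class StringSource extends SimpleJavaFileObject {
        private final String code;

        StringSource(String className, String code) {
            super(URI.create("string:///" + className.replace('.', '/') + ".java"), Kind.SOURCE);
            this.code = code;
        }

        @Override
        public CharSequence getCharContent(boolean ignoreEncodingErrors) {
            return code;
        }
    }

    // Compiles the given source and returns the package names javac asked list() about.
    static Set<String> packagesListed(String className, String code) throws IOException {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("a JDK (not just a JRE) is required");
        }
        final Set<String> seen = new TreeSet<String>();
        StandardJavaFileManager standard = compiler.getStandardFileManager(null, null, null);
        JavaFileManager recording =
            new ForwardingJavaFileManager<StandardJavaFileManager>(standard) {
                @Override
                public Iterable<JavaFileObject> list(JavaFileManager.Location location,
                                                     String packageName,
                                                     Set<JavaFileObject.Kind> kinds,
                                                     boolean recurse) throws IOException {
                    seen.add(packageName); // record, then delegate to the real file manager
                    return super.list(location, packageName, kinds, recurse);
                }
            };
        // Send class files to a temporary directory instead of the working directory.
        String outDir = Files.createTempDirectory("list-recorder").toString();
        Boolean ok = compiler.getTask(null, recording, null, Arrays.asList("-d", outDir), null,
                Collections.singletonList(new StringSource(className, code))).call();
        recording.close();
        if (!ok) {
            throw new IllegalStateException("compilation failed");
        }
        return seen;
    }

    public static void main(String[] args) throws IOException {
        String code = "import java.util.List; public class Probe { List<String> xs; }";
        System.out.println(packagesListed("Probe", code));
    }
}
```

The compiler never asks for a single class here: it asks for the whole content of a package, which is precisely what a Bundle cannot easily answer.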

I then switched my focus away from the JDK, as I was not getting anywhere, and decided to explore the JDT compiler (org.eclipse.jdt.internal.compiler.Compiler). After all, Eclipse is built on top of OSGi, so there had to be a way to compile Java code with its compiler. Thankfully, I found the source code for Jasper, the Apache implementation of the standard JSP compiler, and this is exactly what it uses. If you look at the

org.apache.jasper.compiler.JDTCompiler

class, you can see a very good example of how to use the Eclipse compiler (and trust me, you need an example, as it takes over 300 lines of code to invoke the compiler!). Using this example, I was able to implement the compilation and get everything working. The big advantage of this solution is that, unlike javac, which expects the content of a whole package, you only need to locate a single class, which is totally possible with a ClassLoader. Below is the type lookup from my INameEnvironment implementation:

private NameEnvironmentAnswer findType(String className)
{
  // note that try/finally error handling has been removed for brevity of the example...
  String resourceName = className.replace('.', '/') + ".class";
  InputStream is = _classLoader.getResourceAsStream(resourceName);
  if(is != null)
  {
    // read bytes from input stream
    byte[] classBytes = ...;
    ClassFileReader classFileReader =
      new ClassFileReader(classBytes, className.toCharArray(), true);
    return new NameEnvironmentAnswer(classFileReader, null);
  }
  else
    return null;
}
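The part that matters for OSGi is the lookup itself: given only a ClassLoader, you can get the bytes of a single class by asking for it as a resource. Here is a standalone sketch of that idea using nothing but the JDK. ClassBytesLocator and findClassBytes are names I made up for the illustration (this is not the code elided above), and nested classes would need their binary name (Outer$Inner), which this simple version does not handle.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

final class ClassBytesLocator {
    // Returns the raw class file bytes for className, or null if the loader cannot find it.
    static byte[] findClassBytes(ClassLoader loader, String className) throws IOException {
        String resourceName = className.replace('.', '/') + ".class";
        InputStream is = loader.getResourceAsStream(resourceName);
        if (is == null) {
            return null;
        }
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = is.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return buffer.toByteArray();
        } finally {
            is.close();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = findClassBytes(ClassLoader.getSystemClassLoader(), "java.lang.String");
        System.out.println(bytes == null ? "not found" : bytes.length + " bytes");
    }
}
```

With an adapter like BundleClassLoader in place of the system class loader, the same lookup goes through the bundle, so no classpath is needed.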

It took me about a full day to investigate the Java 6 approach (which did not conclude successfully), whereas I had the code up and running with the Eclipse compiler in about 4 hours. To conclude, I would just like to say that I am happy to see that the JDK is finally offering a standard way of invoking the Java compiler from within Java code. However, it feels like there is more work to do. The API needs to offer the ability to use a ClassLoader instead of a classpath. It would also be great if the concept of classpath was expanded to support URLs, instead of being restricted to jar files or classes located on the file system.

I hope you enjoyed this post. Stay tuned for more posts on OSGi at LinkedIn. The next topic will be about extending Spring-DM using a fragment host.