Thursday, 28 June 2012

Introducing Polyglotted

My name is Shankar Vasudevan. I am a developer and architect working for a leading investment bank in the City of London, and I have been architecting large-scale enterprise applications for the past 15 years. You can visit my professional profile on LinkedIn or follow me @vshank77 on Twitter.

Over the past 15 years, I have developed a great fascination for data storage and analysis. The advent of Big Data and NoSQL in recent years has got me drooling, and my professional work in information retrieval, analytical processing, adword classification and scoring, and enterprise search has added fuel to the fire. More recently I have become a true believer in Polyglot Persistence, a term popularized by Martin Fowler, and I apply the philosophy in my day-to-day job.

Polyglotted.org is my way of giving back to the community the projects that I conceive in the area of information analysis and polyglot persistence.

If you are still interested, Martin Fowler has an interesting introduction to NoSQL Databases and Polyglot Persistence.


Cross-posted from Polyglotted Blog

Monday, 1 February 2010

JMockit & JavaAgent - Enterprise integration woes

We've been using JMockit as the preferred mocking utility at a large bank where I'm consulting. While the product and its non-intrusive mocking features are generally commendable, some of its under-the-hood mechanisms have turned our continuous integration into a nightmare. After upgrading to the latest version of JMockit, 0.996.0, ALL our unit tests failed with the following spurious error:
    java.lang.NoClassDefFoundError: org.junit.runner.Runner
        at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
        at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:24)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.<init>(JUnit4TestReference.java:29)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestClassReference.<init>(JUnit4TestClassReference.java:25)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestLoader.createTest(JUnit4TestLoader.java:40)
        at org.eclipse.jdt.internal.junit4.runner.JUnit4TestLoader.loadTests(JUnit4TestLoader.java:30)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:452)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
        at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
While my colleague, who had been advocating the utility, asked me for a way to change the order of libraries in the classpath, we smelled something rotten and started debugging under the hood. After quite a bit of digging, we found that the actual error was caused by a failure to load the JMockit agent dynamically. The following code shows the sequence of calls JMockit makes to load the agent.
    public class Startup {
       // other code omitted

       public static void verifyInitialization()
       {
          if (instrumentation == null) {
             new AgentInitialization().initializeAccordingToJDKVersion();
          }
       }
    }

    public final class AgentInitialization
    {
       public void initializeAccordingToJDKVersion()
       {
          String jarFilePath = discoverPathToJarFile();

          if (Startup.jdk6OrLater) {
             new JDK6AgentLoader(jarFilePath).loadAgent();
          }
          else if ("1.5".equals(Startup.javaSpecVersion)) {
             throw new IllegalStateException(
                "JMockit has not been initialized. Check that your Java 5 VM has been started " +
                "with the -javaagent:" + jarFilePath + " command line option.");
          }
          else {
             throw new IllegalStateException("JMockit requires a Java 5 VM or later.");
          }
       }

       private String discoverPathToJarFile()
       {
          CodeSource codeSource = AgentInitialization.class.getProtectionDomain().getCodeSource();

          if (codeSource == null) {
             return findPathToJarFileFromClasspath();
          }

          URI jarFileURI; // URI is needed to deal with spaces and non-ASCII characters

          try {
             jarFileURI = codeSource.getLocation().toURI();
          }
          catch (URISyntaxException e) {
             throw new RuntimeException(e);
          }

          return new File(jarFileURI).getPath();
       }

       private String findPathToJarFileFromClasspath()
       {
          String[] classPath = System.getProperty("java.class.path").split(File.pathSeparator);

          for (String cpEntry : classPath) {
             if (cpEntry.matches(".*jmockit[-.\\d]*.jar")) {
                return cpEntry;
             }
          }

          return null;
       }
    }
While this approach is non-intrusive and works out of the box in most cases, large enterprise organisations often impose a tightly restricted environment. Ours loads libraries from a network file system, and the following lines of discoverPathToJarFile() cause the class initialization error. The call
    jarFileURI = codeSource.getLocation().toURI();
returns
    file://remoterepository/libraries/jmockit/0.996.0/install/common/lib/jmockit.jar
and the subsequent call
    return new File(jarFileURI).getPath();
throws
    IllegalArgumentException: URI has an authority component
because the "remoterepository" part of that URI is an authority (host) component, which the File(URI) constructor rejects.
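The failure can be reproduced outside JMockit with plain JDK classes. The following is a minimal sketch, not JMockit code: the class name UncUriDemo and the tolerantPath() helper are hypothetical, and the hand-built "//host/path" form is only an assumption about what a workable fallback could look like on a network share.

```java
import java.io.File;
import java.net.URI;

final class UncUriDemo {

    // Mirrors the failing call: File(URI) rejects any URI that carries an authority (host) part.
    static String viaFileConstructor(URI uri) {
        try {
            return new File(uri).getPath();
        } catch (IllegalArgumentException e) {
            return "IllegalArgumentException: " + e.getMessage();
        }
    }

    // Hypothetical fallback (an assumption, not part of JMockit):
    // rebuild a UNC-style path by hand when a host is present.
    static String tolerantPath(URI uri) {
        if (uri.getAuthority() != null) {
            return "//" + uri.getAuthority() + uri.getPath();
        }
        return new File(uri).getPath();
    }

    public static void main(String[] args) {
        URI remote = URI.create(
            "file://remoterepository/libraries/jmockit/0.996.0/install/common/lib/jmockit.jar");

        System.out.println(viaFileConstructor(remote)); // reports the exception message
        System.out.println(tolerantPath(remote));       // UNC-style path built by hand
    }
}
```

Whether the hand-built path resolves depends on the operating system and how the share is mounted, so this only shows where the exception originates and one possible way around it.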
We have been using JMockit since version 0.991.0, and agent loading has always been a problem. In our original usage we had to do two things: create our own version of JDK6AgentLoader, because that class has package-level scope, and create our own JUnit4 runner that loads the agent from a hard-coded location, as below.
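    import org.junit.runners.BlockJUnit4ClassRunner;
    import org.junit.runners.model.InitializationError;

    public final class MyJMockitRunner extends BlockJUnit4ClassRunner
    {
       // Load the agent from a fixed location before any test class is processed.
       static
       {
          MyJDK6AgentLoader.loadAgent(hardCodedPathtoAgentJar);
       }

       public MyJMockitRunner(Class<?> testClass) throws InitializationError
       {
          super(testClass);
       }
    }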

Why the problem now?
The major change between the two versions is that JMockit now shadows the org.junit.runner.Runner class. The modified version loads the Java agent in a static initializer, as part of class initialization.

    public abstract class Runner implements Describable
    {
       static
       {
          Startup.initializeIfNeeded();
       }

       public abstract Description getDescription();

       public abstract void run(RunNotifier notifier);

       public int testCount()
       {
          return getDescription().testCount();
       }
    }
So even though we try to load the agent in our custom Runner, the shadowed Runner class is initialized first and aggressively loads the agent using its own strategy. While this is a possible approach, I strongly object to class shadowing for the following reasons:
  1. Open source APIs generally provide certain basic guarantees, and they are tested extensively by the community in different environments to ensure they work the way they are designed. By shadowing a public class you override this guarantee, and most developers are oblivious to the fact that they are using a modified class. This can lead to late-night debugging sessions and nightmares when things go wrong.
  2. In this particular case, only the tests that used JMockit should have failed, yet thousands of test cases across multiple projects failed at once. As a programmer, I believe in loading resources lazily, only when they are needed. Here, whether or not mocking is needed, the agent is loaded whenever a unit test is run. This is fundamentally wrong.
  3. Any workarounds that had been applied to the library fail miserably when it aggressively forces its own behaviour in this way.
  4. While I agree that the owner cannot foresee every environment in which the library will be used, performing gimmicks under the hood on an open library could have been avoided.
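The eager-loading concern in point 2 can be illustrated with a small sketch that uses only the JDK. All names here (EagerInitDemo, EagerRunner, PlainRunner, LazyAgent) are hypothetical stand-ins rather than JMockit or JUnit classes, and the "agent" is just a log entry. It shows that a static initializer in a base class fires for every subclass, and contrasts it with a lazy holder that is initialized only on first use.

```java
import java.util.ArrayList;
import java.util.List;

final class EagerInitDemo {

    static final List<String> EVENTS = new ArrayList<>();

    // Stand-in for the shadowed Runner: the static block fires when the class is initialized.
    abstract static class EagerRunner {
        static {
            EVENTS.add("agent loaded by static initializer");
        }

        abstract void run();
    }

    // A runner that never mocks anything still pays the cost, because
    // initializing a subclass initializes its superclass first.
    static final class PlainRunner extends EagerRunner {
        @Override
        void run() {
            EVENTS.add("plain test ran");
        }
    }

    // Lazy alternative: the nested holder is initialized only when get() is first called.
    static final class LazyAgent {
        private static final class Holder {
            static {
                EVENTS.add("agent loaded lazily");
            }

            static final Object AGENT = new Object();
        }

        static Object get() {
            return Holder.AGENT;
        }
    }

    public static void main(String[] args) {
        new PlainRunner().run();
        System.out.println(EVENTS); // the "agent" entry appears although no mocking was requested

        LazyAgent.get();
        System.out.println(EVENTS); // the lazy entry appears only after the explicit request
    }
}
```

The holder idiom is one common way to defer work until first use; it is offered here as an illustration of the principle, not as a description of how JMockit should be implemented.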

Solution
While I've criticized the usage of shadowing a class, unfortunately the only solution that worked for us was to shadow the JMockit class that loaded the agent as below.

    public final class AgentInitialization
    {
       public void initializeAccordingToJDKVersion()
       {
          MyJDK6AgentLoader.loadAgent(hardCodedPathtoAgentJar);
       }
    }

Better approach
Here are a couple of suggestions for the JMockit API to overcome the shortcomings described above:
  1. Please remove the shadowed JUnit class and make the agent loading explicit. This is already the case when you use JMockit.class as the runner.
  2. While the auto-detection of the agent is brilliant, there could be a hooking mechanism that allows developers to extend it with their own strategies.
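A hook of the kind suggested in point 2 could be as small as a chain of locator strategies. The sketch below is purely hypothetical: the AgentJarLocator interface, the "example.agent.jar" system property and the class names are invented for illustration, and JMockit does not expose any of them. The classpath locator reuses the pattern from the code quoted earlier, with the dot before "jar" escaped.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class AgentJarLocatorDemo {

    // Hypothetical extension point; not part of the JMockit API.
    interface AgentJarLocator {
        Optional<String> locate();
    }

    // Hypothetical override: an explicit path taken from a system property invented for this sketch.
    static final class SystemPropertyLocator implements AgentJarLocator {
        @Override
        public Optional<String> locate() {
            return Optional.ofNullable(System.getProperty("example.agent.jar"));
        }
    }

    // Default strategy: scan a classpath string for a jmockit jar.
    static final class ClasspathLocator implements AgentJarLocator {
        private final String classPath;

        ClasspathLocator(String classPath) {
            this.classPath = classPath;
        }

        @Override
        public Optional<String> locate() {
            for (String entry : classPath.split(File.pathSeparator)) {
                if (entry.matches(".*jmockit[-.\\d]*\\.jar")) {
                    return Optional.of(entry);
                }
            }
            return Optional.empty();
        }
    }

    // The first locator that yields a path wins; user-supplied strategies go first.
    static String resolve(List<AgentJarLocator> locators) {
        for (AgentJarLocator locator : locators) {
            Optional<String> path = locator.locate();
            if (path.isPresent()) {
                return path.get();
            }
        }
        throw new IllegalStateException("No locator could find the agent jar");
    }

    public static void main(String[] args) {
        List<AgentJarLocator> chain = Arrays.asList(
            new SystemPropertyLocator(),
            new ClasspathLocator(System.getProperty("java.class.path")));
        try {
            System.out.println(resolve(chain));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With such a chain, an environment like ours could register a locator that returns the known network path, while everyone else keeps the automatic detection.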
Last but not least, there was an obvious solution to the problem: explicitly specifying the agent path in the VM arguments. The issue for us was that too many operating systems, at least a few IDEs, a shared continuous integration server and a great many Ant build scripts would have needed updating. Getting all of that modified and maintained would have been far more involved than applying a code fix that worked "under the hood".

Sunday, 17 January 2010

Google Appengine and Maven - another hack for datanucleus plugin

While researching how to mavenize an AppEngine pet project, I found that most of the links Google search returned dated from the first AppEngine release and no longer matched expectations. Unfortunately there is also no Maven plugin that tracks the latest version as Google publishes it, nor one that lets you point to a local installation so that you can update as fast as Google releases AppEngine updates (as of this writing, there had been at least 3 updates within 2 weeks).

Thanks go to Salient Point for the best article on configuring a Maven project: it lets you target any AppEngine version by changing a couple of variables in a script file and rebuilding the Maven project. The only caveat, which cost a few hours of debugging, was the maven-datanucleus-plugin. The versions of the Jar files provided by the DataNucleus repository differ slightly from the ones published by Google, and it was simpler to follow the same philosophy of using only the Jars provided by Google rather than downloading them from a different repository.

On that note, the following are the only deviations from the configuration in the original post.
1) Define the local directory to which the AppEngine Jars have been downloaded. The preferred way is to use profiles, so that a default location can be specified for each platform (windows / mac)

        <profile>
            <id>gwt-dev-windows</id>
            <properties>
                <platform>windows</platform>
                <appengine.sdk.root>C:\Java\appengine-java-sdk-${appengineVersion}</appengine.sdk.root>
            </properties>
            <activation>
                <activeByDefault>false</activeByDefault>
                <os>
                    <family>windows</family>
                </os>
            </activation>
        </profile>
2) Specify an antrun task that performs the DataNucleus enhancement for entity objects, instead of using the datanucleus plugin
            <plugin>
                <artifactId>maven-antrun-plugin</artifactId>
                <executions>
                    <execution>
                        <id>datanucleus-enhance</id>
                        <phase>process-classes</phase>
                        <configuration>
                            <tasks>
                                <property name="appengine.tools.classpath"
                                    location="${appengine.sdk.root}/lib/appengine-tools-api.jar"/>
                                <property name="dependency_classpath" refid="maven.dependency.classpath"/>

                                <taskdef name="enhance" classname="com.google.appengine.tools.enhancer.EnhancerTask"
                                    classpath="${appengine.tools.classpath}"/>
                                <enhance failonerror="true" api="JPA">
                                    <classpath>
                                        <pathelement path="${appengine.tools.classpath}"/>
                                        <pathelement path="${dependency_classpath}"/>
                                        <pathelement path="target/classes"/>
                                    </classpath>
                                    <fileset dir="target/classes" includes="**/*.class"/>
                                </enhance>
                            </tasks>
                        </configuration>
                        <goals>
                            <goal>run</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
This works like magic for us, and we are able to upgrade AppEngine versions as and when Google releases them.

Also worth noting is the multi-project setup described in the maven-gwt-plugin documentation, which helps organize your code into multiple projects (separating the core API, UI, service implementation and AppEngine deployment). Refer to our sample project setup hosted at http://code.google.com/p/scrumsp