Friday, January 16, 2009

Avoid assumptions in infrastructure code

A few days back, while reviewing some code, I came across what I considered to be an overabundance of assumptions in infrastructure code. Such assumptions can make software buggy and difficult to change.

Let me first explain what I mean by infrastructure code. Frameworks always have interactions that we use when we extend their classes. For example, if you have used Struts, then the custom action classes we create use Struts' infrastructure code from the ActionServlet and the Struts RequestProcessor. These classes call methods which are overridden by our custom classes, thus allowing our code to get called.

Even when we do not use such frameworks, there are lots of places where we have hand-written infrastructure code in our projects. Typically these are methods in base classes that are invoked as part of a use case. These methods do a bunch of things that are determined by reading configuration files, decoding the request that invoked them, and perhaps other factors. While doing their work they also invoke abstract base class methods which have been overridden by other classes. This is almost a mini-framework, and without realizing it we all have such mini-frameworks in our code. The work done by the methods in the base classes is what I refer to as infrastructure code.
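To make this concrete, here is a minimal sketch of such a mini-framework. All the names (BaseCommand, GreetCommand, doExecute) and the "name=value" request format are made up for illustration; they are not from any real project. The point is only to show where the infrastructure code lives and where assumptions tend to hide.

```java
// Hypothetical mini-framework: the base class owns the flow,
// and subclasses only fill in the abstract hook.
abstract class BaseCommand {

    // Infrastructure code: runs for every use case that extends this class.
    final String execute(String request) {
        String decoded = decode(request);
        return render(doExecute(decoded));
    }

    // Built-in assumption: every request arrives as "name=value".
    private String decode(String request) {
        return request.substring(request.indexOf('=') + 1);
    }

    // Built-in assumption: every result is wrapped the same way.
    private String render(String result) {
        return "[" + result + "]";
    }

    // The hook that custom classes override so that their code gets called.
    protected abstract String doExecute(String value);
}

// A custom class plugged into the mini-framework.
final class GreetCommand extends BaseCommand {
    @Override
    protected String doExecute(String value) {
        return "Hello, " + value;
    }
}
```

Calling new GreetCommand().execute("name=World") goes through decode, doExecute and render in that order. Every subclass silently inherits both assumptions, whether it wants them or not.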

When we have such code, it is good to be careful not to make too many assumptions. If any of these assumptions change in the future, we may either have to override these methods in subclasses (creating code that is difficult to read and difficult to test), or we will have to change all the classes that are coupled with that infrastructure.

It is best to keep infrastructure code simple, and as assumption-free as possible. One idiom which results in a lot of assumptions is pushing common functionality to base classes. This does create reusable code, but it silently creates an assumption that this is what all subclasses will need. If a bug is introduced in such code, it affects large parts of the software. And if the assumption stops holding true, we either have to override base class methods in those subclasses for which it no longer holds, or we have to change the base class interactions.

Overriding base class methods is fine, but if overdone, it can lead to classes that are extremely difficult to understand. Such classes are also difficult to refactor, because even a small change affects a lot of other classes, turning a small refactoring into a large task.

Changing base class methods is quite a beast, as I am sure everyone will agree.

For these reasons, I prefer to avoid pushing common functionality into base class methods, especially when the base classes are part of infrastructure code. Instead I prefer to factor out the common functionality into helper and utility classes, and achieve reuse by composition.
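A rough sketch of what I mean by reuse through composition is below. The classes (AuditLog, RequestHandler, OrderHandler, PingHandler) are hypothetical and exist only to illustrate the idea: the base class defines the interaction and nothing else, while the common functionality sits in a helper that only the interested classes use.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper holding common functionality that might
// otherwise have been pushed into the base class.
final class AuditLog {
    private final List<String> entries = new ArrayList<>();

    void record(String message) {
        entries.add(message);
    }

    List<String> entries() {
        return entries;
    }
}

// The infrastructure base class only defines the interaction. It assumes
// nothing about auditing or anything else a handler may or may not need.
abstract class RequestHandler {
    final String handle(String request) {
        return process(request);
    }

    protected abstract String process(String request);
}

// A handler that needs auditing gets it by composition.
final class OrderHandler extends RequestHandler {
    private final AuditLog audit;

    OrderHandler(AuditLog audit) {
        this.audit = audit;
    }

    @Override
    protected String process(String request) {
        audit.record("order:" + request);
        return "ORDER " + request;
    }
}

// A handler that does not need auditing simply does not use the helper,
// so there is nothing for it to override or work around.
final class PingHandler extends RequestHandler {
    @Override
    protected String process(String request) {
        return "PONG";
    }
}
```

If auditing later changes, only AuditLog and the handlers that actually use it are affected. The helper can also be tested on its own, without going through the base class.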

Thursday, January 15, 2009

Make build scripts in GANT

I have always used ANT to create build scripts, and by and large it has served me well. ANT is simple, and it has a wide variety of tasks, which take care of almost all build requirements.

Some time back, when I came across a new build tool called GANT, I was curious as to what it would offer that ANT did not. GANT is really Groovy + ANT. For those of you who are not familiar with Groovy, it is a dynamic language which compiles to bytecode and interoperates very well with Java. So GANT uses Groovy as the language to create build scripts. However, all ANT tasks have been made available through Groovy's AntBuilder. So GANT can use ANT under the hood, but it is not limited to ANT.

If we need to write custom stuff for a build script, we can either create our own custom ANT task, or alternatively we can write a Groovy function or class. This, along with being able to easily add conditional logic in build scripts, is a very useful feature. Also, since we use Groovy for creating the build scripts, we move away from the cumbersome XML syntax which ANT requires. All this, in my opinion, is a big advantage for developers.

However, there are a few drawbacks to using GANT. First of all, you will have to spend some time getting familiar with GANT and learning Groovy. Granted, both have a fairly gentle learning curve for Java developers, but it's still time that must be spent. Also, tooling support for GANT is not as good as it is for ANT. That's at least true for Eclipse. I am not aware of the state of GANT support on NetBeans and IntelliJ (please share your thoughts in the comments if you have experience using GANT with either of these).

Even though I said that GANT uses Groovy to create the build scripts, what it really uses is a DSL (Domain Specific Language) built on top of Groovy. But within a GANT script we can use Groovy syntax freely. There may be some restrictions, but I am not yet aware of them.

A sample GANT script is shown on GANT's website.

Here is how you might actually compile your programs in GANT

And see this page for an example of using the javac task in ANT.

Give GANT a try. You might actually start liking it over ANT.

Tuesday, January 13, 2009

Unicode newline character in Java string

The other day I was trying to represent a String using Unicode escapes.

String s = new String("\u0041 \u000A");

What I wanted was this "A \n", and instead, what I got was a COMPILE ERROR

String literal is not properly closed by a double-quote

What the hell! I have represented characters as Unicode escapes earlier in my Java code. So what was wrong here? It seems the compiler did not like the Unicode newline character I had added. Here's why...

The compiler translates Unicode escapes at the very beginning of the compile cycle, before the source is even split into tokens. This means the above source first gets converted to

String s = new String("A 
");

before compilation starts. Now it is quite obvious why compilation would fail: the string literal is split across two lines. Check out section 3.2 on Lexical Translations in the Java Language Specification to understand what exactly happens in the translation phase of lexical analysis.

You might also enjoy reading this issue of the Java Specialists newsletter.

If you are trying to represent newline or carriage return characters as Unicode escapes in your Strings, don't bother. It will not work. Use "\n" and "\r" instead.
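As a small sketch of the working version (the class and method names are just for illustration), the Unicode escape for 'A' is fine because it translates to an ordinary character, while the line feed is written with the "\n" escape sequence, which the compiler only interprets later, inside the string literal. Note that I deliberately avoid writing the Unicode escape for the newline anywhere in the source, even in comments, since it would be translated there too and break the line.

```java
class UnicodeNewlineDemo {

    // Builds "A \n": a Unicode escape for 'A', a space, and the
    // escape sequence for line feed (code point U+000A).
    static String build() {
        return "\u0041 \n";
    }

    public static void main(String[] args) {
        String s = build();
        System.out.println(s.length());        // prints 3
        System.out.println((int) s.charAt(2)); // prints 10, the line feed
    }
}
```

The resulting String is exactly what I wanted in the first place: three characters, the last of which is the line feed.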

Sunday, January 11, 2009

XML attribute value normalization

A couple of days back I was debugging a failed test case which was testing the XML generated by a Servlet. We were using JDOM for generating the XML, and XMLUnit for testing. Testing involved comparing the generated XML with an XML file on disk.

The test case was failing on a '\n' character in one of the attributes of the generated XML. The XML generated by the Servlet was something like this:

<root att="test \n value">

but JDOM seemed to be putting some strange characters in place of '\n'.

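Now I had absolutely no idea about this, but the XML specification has something called "attribute-value normalization". A parser that follows it turns a literal newline inside an attribute value into a space, so to preserve the newline, JDOM replaces all '\n' with the character reference &#xA; when writing attributes. Look here for more details.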

The moment I replaced the '\n' in the expected data, the test worked like a charm.
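The normalization can be seen without JDOM or XMLUnit at all. The following is a minimal sketch using only the DOM parser that ships with the JDK; the class and method names are made up for illustration. It parses the same attribute twice, once with a literal newline and once with the &#xA; character reference.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

class AttributeNormalizationDemo {

    // Parses the given XML and returns the value of the root element's "att" attribute.
    static String parseAtt(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement().getAttribute("att");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // A literal newline in the attribute value is normalized to a space.
        String literal = parseAtt("<root att=\"test \n value\"/>");
        // A newline written as a character reference survives parsing.
        String reference = parseAtt("<root att=\"test &#xA; value\"/>");

        System.out.println(literal.contains("\n"));   // prints false
        System.out.println(reference.contains("\n")); // prints true
    }
}
```

This is why a serializer has to write the character reference if the newline is supposed to survive a round trip through a parser.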

Thursday, January 01, 2009

Quirks mode in browsers

Have you heard of the quirks mode in web browsers? Well, I had not until a couple of days back, when I ran into an issue with CSS not rendering properly in a web page I was working on.

When I made a request for the page and opened Firebug, I could see requests for the CSS files, and I could also see that they were being properly downloaded. In Firebug's CSS view, I could see that all the CSS classes which I was expecting to be applied were showing up. So what, then, was the problem? By pure chance I discovered that only the properties declared directly in the CSS class were being applied. Inherited properties were not being applied at all. I also discovered by chance that the DOCTYPE tag in the web page caused this behaviour. If I added a DOCTYPE, everything would render perfectly, but if I removed the DOCTYPE, then the inherited CSS properties would not get rendered. Now why would that happen?

I decided to Google a bit, and found out that browsers have a quirks mode which is triggered by the absence of the DOCTYPE in the HTML page.

To explain very briefly, old browsers did not conform to W3C standards for HTML or for CSS. As a result there were a lot of (non-compliant) web pages which were written for, and rendered fine by, these browsers. When browsers did start complying with web standards, there was a problem. All these pages which had been rendering perfectly would stop rendering correctly. To resolve this issue, browser makers decided to have a backwards compatibility mode, which they called the quirks mode. When this mode was triggered, the new (W3C compliant) browser would fall back to the old behaviour. Now the question was, what in a web page should trigger the quirks mode in a browser? Browser makers decided to use the DOCTYPE declaration. So, if the DOCTYPE was present then the browser would function in regular mode, whereas if the DOCTYPE was not present, it would function in quirks mode.

For more information on the quirks mode, check out this article and the Wikipedia article on quirks mode.