
Changes in Java to support supplementary Unicode characters

Supporting supplementary characters might require changes to the Java language as well as to its APIs. A few questions come to mind.

  • How do we support supplementary characters at the primitive level (a char is only 16 bits)?
  • How do we support supplementary characters in low-level APIs (such as the static methods of the Character class)?
  • How do we support supplementary characters in high-level APIs that deal with character sequences?
  • How do we support supplementary characters in Java literals?
  • How do we support supplementary characters in Java source files?

The expert committee that worked on JSR-204 dealt with all these questions and, I'm sure, many more. After deliberating and experimenting with how the changes would affect existing code, they came up with the following solution.

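The primitive char was left unchanged. It is still 16 bits, and no new type has been added to the Java language for the supplementary range of Unicode characters. Instead, a supplementary character is represented in UTF-16 as a pair of char values, called a surrogate pair.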
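To see what "no new type" means in practice, here is a small sketch of my own (not code from JSR-204). It uses the standard Character.toChars and Character.toCodePoint methods to split U+1D400 MATHEMATICAL BOLD CAPITAL A into the two char values Java stores for it, and then joins them back. The class and helper names are hypothetical.

```java
public class SurrogatePairDemo {
    // Returns the UTF-16 code units (one or two chars) that represent a code point
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int codePoint = 0x1D400; // MATHEMATICAL BOLD CAPITAL A, outside the 16-bit range
        char[] units = toUtf16(codePoint);

        System.out.println(units.length); // 2: one char is not enough
        System.out.printf("%04X %04X%n", (int) units[0], (int) units[1]); // D835 DC00
        System.out.println(Character.isHighSurrogate(units[0])
                && Character.isLowSurrogate(units[1])); // true

        // Recombine the surrogate pair into the original int code point
        System.out.println(Character.toCodePoint(units[0], units[1]) == codePoint); // true

        // Characters in the BMP still fit in a single char
        System.out.println(toUtf16('A').length); // 1
    }
}
```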

Before support for supplementary characters was added, low-level APIs such as the static methods of the Character class accepted only the char primitive type. Since Java 5.0, methods such as Character.isLetter(...) also have an overloaded version that accepts an int representing the code point, alongside the earlier version that accepts a char.
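The difference between the two overloads shows up as soon as you pass a supplementary character. In this minimal sketch (the class and method names are hypothetical), the char version of isLetter only sees the high surrogate, while the int version sees the whole code point.

```java
public class CodePointOverloads {
    // Uses the original char overload, which sees only the first UTF-16 unit
    static boolean firstCharIsLetter(String s) {
        return Character.isLetter(s.charAt(0));
    }

    // Uses the int overload added in Java 5.0, which sees the full code point
    static boolean firstCodePointIsLetter(String s) {
        return Character.isLetter(s.codePointAt(0));
    }

    public static void main(String[] args) {
        // U+1D400 MATHEMATICAL BOLD CAPITAL A, stored as a surrogate pair
        String boldA = new String(Character.toChars(0x1D400));

        System.out.println(firstCharIsLetter(boldA));      // false: a lone high surrogate
        System.out.println(firstCodePointIsLetter(boldA)); // true
        System.out.println(Character.isLetter(0x1D400));   // true
    }
}
```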

High-level APIs will continue to work "as is" for most developers. They represent character sequences as UTF-16 sequences. Some methods in String and StringBuffer now have parallel methods that work with code points, such as codePointAt(...), codePointBefore(...), and codePointCount(...). For example, codePointCount(...) returns the number of code points in a range of a String. This can differ from the number of char values returned by length() if some characters are from the supplementary range and are represented as surrogate pairs.
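Here is a minimal sketch of these String methods, using a string that mixes BMP characters with one supplementary character. The codePoints helper is a hypothetical name for illustration; it steps through the string with Character.charCount so that a surrogate pair is treated as a single unit.

```java
import java.util.Arrays;

public class CodePointCountDemo {
    // Collects the code points of s by stepping over surrogate pairs
    static int[] codePoints(String s) {
        int[] result = new int[s.codePointCount(0, s.length())];
        int n = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            result[n++] = cp;
            i += Character.charCount(cp); // 2 for supplementary, 1 otherwise
        }
        return result;
    }

    public static void main(String[] args) {
        // "a", U+1D400 (two chars), "b"
        String s = "a" + new String(Character.toChars(0x1D400)) + "b";

        System.out.println(s.length());                      // 4 chars
        System.out.println(s.codePointCount(0, s.length())); // 3 code points
        System.out.printf("U+%X%n", s.codePointAt(1));       // U+1D400
        System.out.printf("U+%X%n", s.codePointBefore(3));   // U+1D400
        System.out.println(Arrays.toString(codePoints(s)));  // [97, 119808, 98]
    }
}
```

Note that codePointAt(1) lands on the high surrogate and returns the whole supplementary code point, while codePointBefore(3) looks back across the pair from the final "b".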

Identifiers in Java can contain letters and digits, and many supplementary characters are letters or digits. To allow supplementary characters in identifiers, the Java compiler and other tools were modified to use the int-based methods isJavaIdentifierStart(int) and isJavaIdentifierPart(int).
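The same int-based methods are available to our own code. Here is a rough sketch of an identifier checker that walks a string by code point. IdentifierCheck and isIdentifier are hypothetical names, and this is not how javac itself is implemented. The sketch only checks characters, so it does not reject keywords such as class.

```java
public class IdentifierCheck {
    // Checks identifier characters only; keywords and literals are not rejected
    static boolean isIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Continue after the first code point, which may occupy one or two chars
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        String boldA = new String(Character.toChars(0x1D400)); // a supplementary letter
        System.out.println(isIdentifier("total" + boldA)); // true
        System.out.println(isIdentifier(boldA));           // true
        System.out.println(isIdentifier("1abc"));          // false
        // The old char overload rejects the lone high surrogate
        System.out.println(Character.isJavaIdentifierStart(boldA.charAt(0))); // false
    }
}
```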

Since supplementary characters need to be supported end to end, they also need to be supported in Java source files. In the next blog post, I will discuss how to include Unicode characters in Java source files and compile them using the Java compiler's -encoding option.
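Until then, one encoding-independent way to put a supplementary character into a literal is to write its surrogate pair as two Unicode escapes, since javac translates each escape into a single UTF-16 code unit. A minimal sketch, with hypothetical class and constant names:

```java
public class SupplementaryLiteral {
    // U+1D400 written as two Unicode escapes (its UTF-16 surrogate pair),
    // so this source file stays pure ASCII whatever -encoding is used
    static final String BOLD_A = "\uD835\uDC00";

    public static void main(String[] args) {
        System.out.println(BOLD_A.length());                           // 2 chars
        System.out.println(BOLD_A.codePointCount(0, BOLD_A.length())); // 1 code point
        System.out.printf("U+%X%n", BOLD_A.codePointAt(0));            // U+1D400
        System.out.println(BOLD_A.equals(new String(Character.toChars(0x1D400)))); // true
    }
}
```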

While reading about encoding, I came across an interesting blog post describing a situation where an I18N-enabled Java program stopped working after its build machine was moved from a Windows box to a Red Hat box. The cause, of course, was an encoding issue.

 



Note: This text was originally posted on my earlier blog at http://www.adaptivelearningonline.net
