Skip to main content

Changes in Java to support supplementary Unicode characters

Support for supplementary characters might need changes in the Java language as well as the API. A few questions come to mind.

  • How do we support supplementary characters at the primitive level (char is only 16 bits)?
  • How do we support supplementary characters in low level API's (such as the static methods of the Character class) ?
  • How do we support supplementary characters in high level API's that deal with character sequences?
  • How do we support supplementary characters in Java literals?
  • How do we support supplementary characters in Java source files?

The expert commitee that worked on JSR-204 dealt with all these questions and many more (I'm sure) . After deliberating as well as experimenting with how the changes would affect code, they came up with the following solution.

The primitive char was left unchanged. It is still 16 bits and no other type has been added to the Java language to support the supplementary range of unicode characters.

 Low level API's, such as static methods of the Character class, accepted the char primitive type before support for supplementary characters was provided in Java. However, since Java 5.0, methods such as isLetter(...) of the Character class provide an overloaded method that accepts an int representing the code point, along with the earlier method that accepted a char.

 
JavaCharacterAPI.JPG 

 

High level API's will continue to work "as is" for most developers. They represent character sequences as UTF-16 sequences. Some methods in String and StringBuffer now have parrallel methods to work with code points. Some such methods are codePointAt(...) , codePointBefore(...), and codePointCount(). For example the codePointCount() method returns the number of code points in a String, which may not be the same as the number of characters in the String, if some characters are from the supplementary range and are represented as surrogate pairs.

 

JavaStringMethodsForUnicode.JPG 

 

Identifiers in Java can contain any letter or digit. Many supplementary characters are letters or digits. To allow supplementary characters to be used in identifiers, the Java compiler and other tools were modified to use different API methods (isJavaIdentifierPart(int), isJavaIdentifierStart(int)).

Since we need to support supplementary characters all the way, they also need to be supported in Java source files. I will discuss how to include unicode characters in Java source files and get them to compile using the Java compilers -encode option, in the next blog post.

While I was reading about encoding, I came accross this interesting blog post that describes a situation when an I18N enables Java program ceased to work after the build machine was moved from a Windows box to a Red Hat box. The reason of course was encoding related issues.

 



Note: This text was originally posted on my earlier blog at http://www.adaptivelearningonline.net

Comments

Popular posts from this blog

Planning a User Guide - Part 3/5 - Co-ordinate the Team

Photo by  Helloquence  on  Unsplash This is the third post in a series of five posts on how to plan a user guide. In the first post , I wrote about how to conduct an audience analysis and the second post discussed how to define the overall scope of the manual. Once the overall scope of the user guide is defined, the next step is to coordinate the team that will work on creating the manual. A typical team will consist of the following roles. Many of these roles will be fulfilled by freelancers since they are one-off or intermittent work engagements. At the end of the article, I have provided a list of websites where you can find good freelancers. Creative Artist You'll need to work with a creative artist to design the cover page and any other images for the user guide. Most small to mid-sized companies don't have a dedicated creative artist on their rolls. But that's not a problem. There are several freelancing websites where you can work with great creative ar

Inheritance vs. composition depending on how much is same and how much differs

I am reading the excellent Django book right now. In the 4th chapter on Django templates , there is an example of includes and inheritance in Django templates. Without going into details about Django templates, the include is very similar to composition where we can include the text of another template for evaluation. Inheritance in Django templates works in a way similar to object inheritance. Django templates can specify certain blocks which can be redefined in subtemplates. The subtemplates use the rest of the parent template as is. Now we have all learned that inheritance is used when we have a is-a relationship between classes, and composition is used when we have a contains-a relationship. This is absolutely right, but while reading about Django templates, I just realized another pattern in these relationships. This is really simple and perhaps many of you may have already have had this insight... We use inheritance when we want to allow reuse of the bulk of one object in other

Running your own one person company

Recently there was a post on PuneTech on mom's re-entering the IT work force after a break. Two of the biggest concerns mentioned were : Coping with vast advances (changes) in the IT landscape Balancing work and family responsibilities Since I have been running a one person company for a good amount of time, I suggested that as an option. In this post I will discuss various aspects of running a one person company. Advantages: You have full control of your time. You can choose to spend as much or as little time as you would like. There is also a good chance that you will be able to decide when you want to spend that time. You get to work on something that you enjoy doing. Tremendous work satisfaction. You have the option of working from home. Disadvantages: It can take a little while for the work to get set, so you may not be able to see revenues for some time. It takes a huge amount of discipline to work without a boss, and without deadlines. You will not get the benefits (insuranc