Skip to main content

Changes in Java to support supplementary Unicode characters

Support for supplementary characters might need changes in the Java language as well as the API. A few questions come to mind.

  • How do we support supplementary characters at the primitive level (char is only 16 bits)?
  • How do we support supplementary characters in low level API's (such as the static methods of the Character class) ?
  • How do we support supplementary characters in high level API's that deal with character sequences?
  • How do we support supplementary characters in Java literals?
  • How do we support supplementary characters in Java source files?

The expert commitee that worked on JSR-204 dealt with all these questions and many more (I'm sure) . After deliberating as well as experimenting with how the changes would affect code, they came up with the following solution.

The primitive char was left unchanged. It is still 16 bits and no other type has been added to the Java language to support the supplementary range of unicode characters.

 Low level API's, such as static methods of the Character class, accepted the char primitive type before support for supplementary characters was provided in Java. However, since Java 5.0, methods such as isLetter(...) of the Character class provide an overloaded method that accepts an int representing the code point, along with the earlier method that accepted a char.

 
JavaCharacterAPI.JPG 

 

High level API's will continue to work "as is" for most developers. They represent character sequences as UTF-16 sequences. Some methods in String and StringBuffer now have parrallel methods to work with code points. Some such methods are codePointAt(...) , codePointBefore(...), and codePointCount(). For example the codePointCount() method returns the number of code points in a String, which may not be the same as the number of characters in the String, if some characters are from the supplementary range and are represented as surrogate pairs.

 

JavaStringMethodsForUnicode.JPG 

 

Identifiers in Java can contain any letter or digit. Many supplementary characters are letters or digits. To allow supplementary characters to be used in identifiers, the Java compiler and other tools were modified to use different API methods (isJavaIdentifierPart(int), isJavaIdentifierStart(int)).

Since we need to support supplementary characters all the way, they also need to be supported in Java source files. I will discuss how to include unicode characters in Java source files and get them to compile using the Java compilers -encode option, in the next blog post.

While I was reading about encoding, I came accross this interesting blog post that describes a situation when an I18N enables Java program ceased to work after the build machine was moved from a Windows box to a Red Hat box. The reason of course was encoding related issues.

 



Note: This text was originally posted on my earlier blog at http://www.adaptivelearningonline.net

Comments

Popular posts from this blog

My HSQLDB schema inspection story

This is a simple story of my need to inspect the schema of an HSQLDB database for a participar FOREIGN KEY, and the interesting things I had to do to actually inspect it. I am using an HSQLDB 1.8 database in one of my web applications. The application has been developed using the Play framework , which by default uses JPA and Hibernate . A few days back, I wanted to inspect the schema which Hibernate had created for one of my model objects. I started the HSQLDB database on my local machine, and then started the database manager with the following command java -cp ./hsqldb-1.8.0.7.jar org.hsqldb.util.DatabaseManagerSwing When I tried the view the schema of my table, it showed me the columns and column types on that table, but it did not show me columns were FOREIGN KEYs. Image 1: Table schema as shown by HSQLDB's database manager I decided to search on StackOverflow and find out how I could view the full schema of the table in question. I got a few hints, and they all pointed to

Fuctional Programming Principles in Scala - Getting Started

Sometime back I registered for the Functional Programming Principles in Scala , on Coursera. I have been meaning to learn Scala from a while, but have been putting it on the back burner because of other commitments. But  when I saw this course being offered by Martin Odersky, on Coursera , I just had to enroll in it. This course is a 7 week course. I will blog my learning experience and notes here for the next seven weeks (well actually six, since the course started on Sept 18th). The first step was to install the required tools: JDK - Since this is my work machine, I already have a couple of JDK's installed SBT - SBT is the Scala Build Tool. Even though I have not looked into it in detail, it seems like a replacement for Maven. I am sure we will use it for several things, however upto now I only know about two uses for it - to submit assignments (which must be a feature added by the course team), and to start the Scala console. Installed sbt from here , and added the path

Five Reasons Why Your Product Needs an Awesome User Guide

Photo Credit: Peter Merholz ( Creative Commons 2.0 SA License ) A user guide is essentially a book-length document containing instructions for installing, using or troubleshooting a hardware or software product. A user guide can be very brief - for example, only 10 or 20 pages or it can be a full-length book of 200 pages or more. -- prismnet.com As engineers, we give a lot of importance to product design, architecture, code quality, and UX. However, when it comes to the user manual, we often only manage to pay lip service. This is not good. A usable manual is as important as usable software because it is the first line of help for the user and the first line of customer service for the organization. Any organization that prides itself on great customer service must have an awesome user manual for the product. In the spirit of listicles - here are at least five reasons why you should have an awesome user manual! Enhance User Satisfaction In my fourteen years as a