Posts

Showing posts with the label unicode

Unicode newline character in Java string

The other day I was trying to represent a String using Unicode escapes. String s = new String ( "\u0041 \u000A" ) ; What I wanted was "A \n", and instead what I got was a COMPILE ERROR: String literal is not properly closed by a double-quote. What the hell! I have represented characters as Unicode escapes in my Java code before, so what was wrong here? It seems the compiler did not like the Unicode newline character I had added. Here's why... The compiler translates Unicode escapes at the very beginning of the compile cycle. This means the above source first gets converted to String s = new String ( "\u0041 " ) ; with a real line break where the escape used to be, so the string literal is split across two lines before compilation proper starts. Now it is quite obvious why compilation would fail. Check out section 3.2 on Lexical Translations in the Java Language Specification to understand what exactly happens in the translation phase of lexical analysis. You might also enjoy reading this issue of the Java Specialists newsletter. If you are trying to represent a newline or carriage return character...
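To make the failure mode concrete, here is a minimal sketch of one way to get the intended "A \n" string. It assumes the usual workaround of using the escape sequence \n, which is resolved after the literal has been tokenized, rather than the Unicode escape for U+000A, which is translated before tokenizing. The class and method names are made up for illustration. Note that the comments deliberately avoid spelling out the Unicode escape, because it would be translated inside a comment too.

```java
public class NewlineEscapeDemo {

    // Uses the escape sequence \n for the line feed. The Unicode escape
    // for 'A' is harmless because 'A' is not a line terminator.
    static String withEscapeSequence() {
        return "\u0041 \n";
    }

    // Alternative: build the line feed from its numeric value (U+000A)
    // at run time, so no line terminator ever appears inside a literal.
    static String withNumericValue() {
        return "A " + (char) 0x0A;
    }

    public static void main(String[] args) {
        String s = withEscapeSequence();
        System.out.println(s.length());                      // 3
        System.out.println((int) s.charAt(2));               // 10
        System.out.println(s.equals(withNumericValue()));    // true
    }
}
```

The same reasoning applies to the carriage return (U+000D), for which the escape sequence \r can be used.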

Compiling Java source files with supplementary characters

Java source files can also contain supplementary characters in strings, as well as in identifiers if the character is a letter or a digit. Here is a video that shows how we can compile Java source files that contain supplementary characters as Strings. Click on the image to download the video. Note: This text was originally posted on my earlier blog at http://www.adaptivelearningonline.net
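The letter-or-digit rule for identifiers can be checked at run time. The sketch below is only an illustration, not taken from the video: it asks the Character class whether three supplementary code points, chosen here as examples, may start or continue a Java identifier. The class name and constants are hypothetical.

```java
public class SupplementaryIdentifierCheck {

    // Example code points, all above U+FFFF (chosen for illustration).
    static final int DESERET_LONG_I = 0x10400; // a letter
    static final int MATH_BOLD_ZERO = 0x1D7CE; // a digit
    static final int G_CLEF         = 0x1D11E; // a symbol, neither letter nor digit

    static boolean canStartIdentifier(int codePoint) {
        return Character.isJavaIdentifierStart(codePoint);
    }

    static boolean canContinueIdentifier(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        int[] samples = { DESERET_LONG_I, MATH_BOLD_ZERO, G_CLEF };
        for (int cp : samples) {
            System.out.printf("U+%X start=%b part=%b%n",
                    cp, canStartIdentifier(cp), canContinueIdentifier(cp));
        }
    }
}
```

A letter can appear anywhere in an identifier, a digit anywhere except the first position, and a symbol such as the G clef nowhere, although it is still fine inside a string literal.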

Changes in Java to support supplementary Unicode characters

Support for supplementary characters might need changes in the Java language as well as the API. A few questions come to mind. How do we support supplementary characters at the primitive level (char is only 16 bits)? How do we support supplementary characters in low-level APIs (such as the static methods of the Character class)? How do we support supplementary characters in high-level APIs that deal with character sequences? How do we support supplementary characters in Java literals? How do we support supplementary characters in Java source files? The expert committee that worked on JSR-204 dealt with all these questions and many more (I'm sure). After deliberating as well as experimenting with how the changes would affect code, they came up with the following solution. The primitive char was left unchanged. It is still 16 bits, and no other type has been added to the Java language to support the supplementary range of Unicode characters. Low-level APIs, such as ...
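As a rough illustration of what this means in practice, the sketch below uses a few code-point-aware methods that have been in the standard library since Java 5 (StringBuilder.appendCodePoint, String.codePointAt, Character.charCount, Character.isSupplementaryCodePoint). It is not a summary of the full JSR-204 design. The class and helper names are hypothetical, and U+1D11E is just a convenient supplementary code point.

```java
public class CodePointApiDemo {

    // Builds "A", then U+1D11E, then "B" without placing a supplementary
    // character directly in the source file.
    static String sample() {
        return new StringBuilder()
                .append('A')
                .appendCodePoint(0x1D11E)
                .append('B')
                .toString();
    }

    // Counts code points by hand: read an int code point, then advance
    // by however many char values it occupies (1 or 2).
    static int countCodePoints(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); ) {
            int codePoint = s.codePointAt(i);
            i += Character.charCount(codePoint);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = sample();
        System.out.println(s.length());          // 4 char values
        System.out.println(countCodePoints(s));  // 3 code points
        System.out.println(Character.isSupplementaryCodePoint(s.codePointAt(1))); // true
    }
}
```

The code point is carried as an int, which is why char could stay at 16 bits; String.codePointCount(0, s.length()) returns the same count as the hand-written loop.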

Supplementary character support in Java

In the last post I wrote that supplementary characters in the Unicode standard are in the range above U+FFFF, which means they need more than 16 bits to represent them. Since the char primitive type in Java is a 16-bit type, we have to use two char values for each of them. I just finished reading some material on supplementary character support in Java, and, well, there are parts I understood right away and parts that are going to need further reading. I will try to share what I am learning on this blog. However, let us first clarify some terminology. Character: an abstract minimal unit of text. It doesn't have a fixed shape (that would be a glyph), and it doesn't have a value. "A" is a character, and so is "€", the symbol for the common currency of Germany, France, and numerous other European countries. Character Set: a collection of characters. Unicode is a coded character set that assigns a unique number to every character defined in the Unicode ...
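The "two char values" point can be worked through by hand. The following sketch applies the standard UTF-16 surrogate arithmetic to one example code point (U+1D11E, chosen arbitrarily) and compares the result with what Character.toChars returns. The class and method names are invented for this illustration.

```java
import java.util.Arrays;

public class SurrogatePairDemo {

    // Splits a supplementary code point into its high and low surrogates:
    // subtract 0x10000, then put the top 10 bits on 0xD800 and the
    // bottom 10 bits on 0xDC00.
    static char[] toSurrogates(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a supplementary code point");
        }
        int offset = codePoint - 0x10000;
        char high = (char) (0xD800 + (offset >> 10));
        char low = (char) (0xDC00 + (offset & 0x3FF));
        return new char[] { high, low };
    }

    public static void main(String[] args) {
        char[] pair = toSurrogates(0x1D11E);
        System.out.printf("%04X %04X%n", (int) pair[0], (int) pair[1]); // D834 DD1E
        System.out.println(Arrays.equals(pair, Character.toChars(0x1D11E))); // true
        System.out.println(new String(pair).length()); // 2
    }
}
```

In everyday code Character.toChars does this for you; the manual version is only there to show where the two 16-bit values come from.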