Thursday, June 30, 2016

Java can't display Unicode block characters despite properly configured terminal

I'm trying to print the Unicode full block character (U+2588) from a Java application run under Cygwin. Despite the terminal being set to UTF-8, and despite Bash and Python both being able to print the character, Java simply prints a ?.

$ echo $LANG
en_US.UTF-8

$ echo -e "\xe2\x96\x88"
█

$ python3 -c 'print("\u2588")'
█

$ cat Block.java
public class Block {
  public static void main(String[] args) {
    System.out.println('\u2588');
  }
}

$ javac Block.java

$ java -cp . Block
?

This appears to be Cygwin-specific; when the same class is run from cmd, the character is displayed:

>java -cp . Block
█

Is there anything I can do to get Cygwin/mintty to render Java's output correctly?

Update:

It appears Java on Windows/Cygwin doesn't use the LANG environment variable at all, and is therefore still using windows-1252 (cp1252):

$ cat Block.java
public class Block {
  public static void main(String[] args) {
    System.out.println("Default Charset=" + java.nio.charset.Charset.defaultCharset());
    System.out.println("u2588");
  }
}

$ java -cp . Block
Default Charset=windows-1252
?

But oddly I can't get iconv to work:

$ java -cp . Block | iconv -f WINDOWS-1252 -t UTF8
Default Charset=windows-1252
?
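
If I understand the failure correctly, this is expected: the damage is done inside the JVM before iconv ever sees the bytes. windows-1252 has no mapping for U+2588, so the charset encoder substitutes its replacement byte, a literal '?' (0x3F), and iconv then faithfully converts an ordinary question mark. A minimal sketch illustrating that (the class name Lost is mine, just for illustration):

import java.nio.charset.Charset;
import java.util.Arrays;

public class Lost {
  public static void main(String[] args) {
    // windows-1252 cannot represent U+2588, so the encoder falls back
    // to its replacement byte: a literal '?' (0x3F).
    byte[] bytes = "\u2588".getBytes(Charset.forName("windows-1252"));
    System.out.println(Arrays.toString(bytes)); // prints [63]
  }
}

So by the time the bytes reach the pipe there is only a 0x3F left, and no amount of charset conversion can recover the original character.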

Is there any way (short of specifying -Dfile.encoding=UTF-8 on every invocation) to get Java to respect Cygwin's encoding? I'm also aware of JAVA_TOOL_OPTIONS, but setting it causes a "Picked up JAVA_TOOL_OPTIONS: ..." notice to be written to stderr. Not the end of the world, but not ideal.
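
For completeness, one workaround I've sketched that avoids JVM flags entirely is to replace System.out with a PrintStream that encodes UTF-8 explicitly, regardless of what file.encoding resolved to (this is just a sketch, not an answer to the environment-level question above):

import java.io.FileDescriptor;
import java.io.FileOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Block {
  public static void main(String[] args) throws UnsupportedEncodingException {
    // Wrap stdout in a PrintStream that always encodes UTF-8,
    // independent of the platform default charset.
    PrintStream utf8Out =
        new PrintStream(new FileOutputStream(FileDescriptor.out), true, "UTF-8");
    System.setOut(utf8Out);
    System.out.println("\u2588");
  }
}

But that requires touching every program, which is why I'd prefer a fix at the environment level.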
