Saturday, June 18, 2016

JSP breaks UTF-8 encoding of non-English characters

I am trying to append additional content to the HttpServletResponse using a CharResponseWrapper (which is in turn used by my filter).

To support languages such as Chinese and Korean, I have to ensure that the resulting content (after appending) keeps the original charset. So I obtain the content type by calling super.getContentType() and parse the charset out of it.

For example, super.getContentType() might return text/html; charset=UTF-8, from which I extract UTF-8.
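The helper getCharsetFromContentType is not shown in the question. One possible way to write it is sketched below; the class name ContentTypeCharset and the exact parsing rules (case-insensitive parameter name, optional surrounding quotes) are assumptions for illustration, not the original code.

```java
public class ContentTypeCharset {

    /**
     * Hypothetical implementation of getCharsetFromContentType: returns the value
     * of the "charset" parameter of a Content-Type value, or null if there is none.
     */
    public static String getCharsetFromContentType(String contentType) {
        if (contentType == null) {
            return null;
        }
        // Parameters follow the media type and are separated by ';'
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue; // not a name=value parameter
            }
            String name = param.substring(0, eq).trim();
            if (name.equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding quotes, e.g. charset="UTF-8"
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(getCharsetFromContentType("text/html; charset=UTF-8")); // UTF-8
        System.out.println(getCharsetFromContentType("text/html"));                // null
    }
}
```

For what it's worth, ServletResponse.getCharacterEncoding() already returns the charset taken from the content type (or ISO-8859-1 when none has been set), so manual parsing may not be necessary inside a response wrapper.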

Then I pass the charset when creating the PrintWriter object (which wraps an OutputStreamWriter). (Note: exception handling is omitted for clarity.)

CharResponseWrapper.java:

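import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import javax.servlet.http.HttpServletResponseWrapper;

public class CharResponseWrapper extends HttpServletResponseWrapper
{
    ....

    @Override
    public PrintWriter getWriter() throws IOException
    {
        // Create the writer once and return the same instance on later calls
        if (pwriter == null) {
            // e.g. "text/html; charset=UTF-8" -> "UTF-8"
            String charEnc = getCharsetFromContentType(getContentType());
            if (charEnc != null) {
                pwriter = new PrintWriter(new OutputStreamWriter(getOutputStream(), charEnc), false);
            } else {
                // No charset in the content type: this constructor uses the platform default charset
                pwriter = new PrintWriter(getOutputStream());
            }
        }
        return pwriter;
    }

    ....
}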
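To check the charset branch of getWriter() outside a servlet container, the sketch below pushes text through the same PrintWriter over OutputStreamWriter chain into a ByteArrayOutputStream and prints the resulting bytes. The class and method names are made up for this example; \uC790 is the Java escape for 자, used so that the test source itself does not depend on the compiler's file encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class WriterEncodingDemo {

    // Writes text through PrintWriter -> OutputStreamWriter(UTF-8) into a byte buffer,
    // mirroring the charset branch of getWriter() without a servlet container.
    public static byte[] encodeViaWriter(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(bytes, StandardCharsets.UTF_8), false);
        writer.print(text);
        writer.flush(); // without a flush, the encoded bytes may still sit in the writer's buffer
        return bytes.toByteArray();
    }

    // Formats bytes as space-separated uppercase hex, e.g. "EC 9E 90"
    public static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(encodeViaWriter("\uC790"))); // EC 9E 90
    }
}
```

If the bytes come out as EC 9E 90, the writer itself encodes correctly, which would suggest looking at whatever later reads the captured output.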

  • I have a JSP containing Korean text.
  • In that JSP, I specified contentType="text/html; charset=UTF-8" and pageEncoding="UTF-8".

JSP source:

<%@ page language="java" contentType="text/html; charset=UTF-8"
    pageEncoding="UTF-8"%>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Insert title here</title>
</head>
<body>
<%
    out.println("한글자모 / 조선글");
%>
</body>
</html>

When I access the JSP, a few of the Korean characters come out garbled in the browser (shown as ��?), as in the response below:

한글��?모 / 조선글
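One explanation consistent with this output (an assumption, since the filter code that reads the wrapped content is not shown) is that the UTF-8 bytes pass through windows-1252 somewhere, for example via a platform default charset, and are then encoded back. In windows-1252 the bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned, so they do not survive that round trip. The UTF-8 encoding of 자 is EC 9E 90, and of the characters in 한글자모 / 조선글 it is the only one containing such a byte. The class name below is made up; the sketch only reproduces that round trip in plain Java.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252RoundTrip {

    // Simulates the suspected bug: UTF-8 bytes decoded as windows-1252, then re-encoded as windows-1252.
    public static byte[] roundTrip(byte[] utf8Bytes) {
        Charset cp1252 = Charset.forName("windows-1252");
        // Bytes that are unassigned in windows-1252 (such as 0x90) decode to U+FFFD
        String misdecoded = new String(utf8Bytes, cp1252);
        // U+FFFD has no windows-1252 mapping, so getBytes substitutes '?' (0x3F)
        return misdecoded.getBytes(cp1252);
    }

    public static void main(String[] args) {
        // "\uD55C\uAE00\uC790\uBAA8" is 한글자모; 자 (U+C790) is EC 9E 90 in UTF-8
        byte[] original = "\uD55C\uAE00\uC790\uBAA8".getBytes(StandardCharsets.UTF_8);
        byte[] damaged = roundTrip(original);
        for (int i = 0; i < original.length; i++) {
            System.out.printf("%02X -> %02X%n", original[i] & 0xFF, damaged[i] & 0xFF);
        }
    }
}
```

Only the last byte of 자 changes, from 0x90 to 0x3F ('?'). That leaves an incomplete UTF-8 sequence (EC 9E) followed by a literal ?, which a browser renders as replacement characters plus ?, much like the output above.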
