Java String Encoding Conversion and Its Use in Tomcat
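Recently, Chinese text came out garbled whenever our production system was accessed from a mobile phone. I took the opportunity to look into character sets.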
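String encoding conversion
When a Java source file is compiled, the compiler (javac, not the JVM) decodes the file into characters using the source encoding (the -encoding option, or the platform default if none is given) and stores string literals in the class file in a Unicode form. At run time a String holds UTF-16 code units. So whatever encoding the source file uses, the same string ends up as identical Unicode data, provided the compiler was told the correct source encoding. For display, the Unicode data is converted to the encoding used by the operating system or console.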
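GBK: one Chinese character takes 2 bytes
UTF-8: one Chinese character usually takes 3 bytes (ASCII characters take 1)
ISO-8859-1: every character takes 1 byte; Chinese characters cannot be represented and are encoded as ? (0x3F)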
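The byte counts above can be checked with a small sketch. The class and method names (ByteLengthDemo, byteLength) are made up for illustration, and the code assumes the running JDK ships the GBK charset, which full JDK distributions normally do. The Chinese character is written as a Unicode escape so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;

class ByteLengthDemo {

    // Number of bytes the text occupies when encoded in the named charset.
    static int byteLength(String text, String charsetName) {
        return text.getBytes(Charset.forName(charsetName)).length;
    }

    public static void main(String[] args) {
        String wo = "\u6211"; // one Chinese character, U+6211
        System.out.println("GBK:        " + byteLength(wo, "GBK"));
        System.out.println("UTF-8:      " + byteLength(wo, "UTF-8"));
        // ISO-8859-1 has no mapping for U+6211, so getBytes substitutes '?' (0x3F).
        System.out.println("ISO-8859-1: " + byteLength(wo, "ISO-8859-1"));
        // An ASCII letter stays at one byte.
        System.out.println("'a' UTF-8:  " + byteLength("a", "UTF-8"));
    }
}
```

The single byte reported for ISO-8859-1 is the replacement byte, not a real encoding of the character, which is why that charset shows up as lossy in the tests later on.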
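What the getBytes method does
In Java, String.getBytes() returns a byte array in the JVM's default charset, which is normally derived from the operating system. The same call can therefore return different bytes on different systems.
1. str.getBytes(); with no charset argument, the JVM default charset is used, i.e. Charset.defaultCharset(), which on JDK 8 follows System.getProperty("file.encoding"). This is the default of the running JVM, not the encoding of the source file. Eclipse typically launches a program with file.encoding set to the project's text file encoding, which is why the two appear to be the same thing here.
2. str.getBytes("charset"); with an explicit charset, the Unicode characters stored internally are encoded into a byte array in that charset.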
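As a rough illustration of the two variants, the sketch below compares the no-argument call with an explicit default charset and with an explicit UTF-8. The class and method names are hypothetical; only standard JDK calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class DefaultCharsetDemo {

    // True if the no-argument getBytes() matches encoding with the JVM default charset.
    static boolean noArgUsesDefault(String text) {
        return Arrays.equals(text.getBytes(), text.getBytes(Charset.defaultCharset()));
    }

    // Encoding with an explicit charset gives the same bytes on every platform.
    static byte[] utf8Bytes(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u6211\u4eec"; // two Chinese characters, U+6211 U+4EEC
        System.out.println("file.encoding    = " + System.getProperty("file.encoding"));
        System.out.println("defaultCharset() = " + Charset.defaultCharset());
        System.out.println("no-arg == default: " + noArgUsesDefault(text));
        System.out.println("UTF-8 byte count : " + utf8Bytes(text).length);
    }
}
```

Passing the charset explicitly, as in utf8Bytes, is the usual way to keep results independent of the machine the code runs on.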
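Garbled text
Garbled text is essentially always caused by a mismatch between the encoding the bytes were written in and the encoding used to decode them when they are read. For example: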
System.out.println("Current file.encoding: " + System.getProperty("file.encoding")); // GBK
String lm = null;
lm = new String("我们".getBytes("ISO-8859-1"), "UTF-8");
System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ?? 3F3F
lm = new String("我们".getBytes("GBK"), "UTF-8");
System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ???? 3F3F3F3F
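new String("我们".getBytes("ISO-8859-1"), "UTF-8") is evaluated in this order:
1. Because the source file is encoded in GBK, the compiler first converts "我们" to Unicode.
2. The Unicode string is encoded into an ISO-8859-1 byte array. ISO-8859-1 has no Chinese characters, so each one becomes ? (0x3F).
3. That byte array is decoded as UTF-8.
The result is garbled, and the original characters were already lost in step 2.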
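Not every mismatch destroys data. The sketch below contrasts the lossy case above with a reversible one, assuming the helper name transcode, which is introduced here only for illustration. Encoding to ISO-8859-1 throws the characters away, whereas UTF-8 bytes that were merely decoded as ISO-8859-1 can be recovered, because ISO-8859-1 maps every byte value to exactly one character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Encode with one charset and decode with another: the generic recipe for garbled text.
    static String transcode(String text, Charset encodeWith, Charset decodeWith) {
        return new String(text.getBytes(encodeWith), decodeWith);
    }

    public static void main(String[] args) {
        String original = "\u6211\u4eec"; // U+6211 U+4EEC

        // Lossy: ISO-8859-1 cannot encode the characters, so only "??" survives.
        String lost = transcode(original, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("lossy result is \"??\": " + lost.equals("??"));

        // Reversible: the six UTF-8 bytes become six ISO-8859-1 characters...
        String garbled = transcode(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        System.out.println("garbled length: " + garbled.length());

        // ...and undoing the wrong decode step restores the original string.
        String repaired = transcode(garbled, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8);
        System.out.println("repaired equals original: " + repaired.equals(original));
    }
}
```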
Example
package org.wxy.demo.test;

import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

import org.junit.jupiter.api.Test;

/**
 * Converting string encodings correctly in Java<br>
 * https://blog.csdn.net/h12kjgj/article/details/73496528
 *
 * @author wang
 */
public class CharsetTest {

    public static void main(String[] args) throws UnsupportedEncodingException {
        System.out.println("Current file.encoding: " + System.getProperty("file.encoding")); // GBK

        System.out.println("\n============GBK============");
        // Encode to GBK and decode again: back to the same Unicode string
        String gbk = new String("我们".getBytes("GBK"), "GBK");
        System.out.println(bytes2HexString(gbk.getBytes())); // CED2C3C7
        System.out.println(bytes2HexString(gbk.getBytes("UTF-8"))); // E68891E4BBAC
        // Deprecated overload: encodes with the platform default charset (GBK here)
        String encode = URLEncoder.encode(gbk);
        System.out.println(gbk + "(default)\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(gbk, "GBK");
        System.out.println(gbk + "GBK\t\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(gbk, "UTF-8");
        System.out.println(gbk + "UTF-8\t" + encode); // %E6%88%91%E4%BB%AC

        System.out.println("\n============UTF-8============");
        // Source in GBK => Unicode => UTF-8 bytes => Unicode again
        String utf = new String("我们".getBytes("UTF-8"), "UTF-8");
        System.out.println(bytes2HexString(utf.getBytes())); // CED2C3C7
        System.out.println(bytes2HexString(utf.getBytes("UTF-8"))); // E68891E4BBAC
        encode = URLEncoder.encode(utf);
        System.out.println(utf + "(default)\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(utf, "GBK");
        System.out.println(utf + "GBK\t\t" + encode); // %CE%D2%C3%C7
        encode = URLEncoder.encode(utf, "UTF-8");
        System.out.println(utf + "UTF-8\t" + encode); // %E6%88%91%E4%BB%AC

        System.out.println("\n============Garbled============");
        System.out.println("Current file.encoding: " + System.getProperty("file.encoding")); // GBK
        String lm = null;
        lm = new String("我们".getBytes("ISO-8859-1"), "UTF-8");
        System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ?? 3F3F
        lm = new String("我们".getBytes("GBK"), "UTF-8");
        System.out.println(lm + "\t" + bytes2HexString(lm.getBytes())); // ???? 3F3F3F3F
    }

    /*
     * Convert a byte array to a hexadecimal string
     */
    public static String bytes2HexString(byte[] b) {
        StringBuilder r = new StringBuilder();
        for (int i = 0; i < b.length; i++) {
            String hex = Integer.toHexString(b[i] & 0xFF);
            if (hex.length() == 1) {
                hex = '0' + hex;
            }
            r.append(hex.toUpperCase());
        }
        return r.toString();
    }

    /*
     * Convert a hexadecimal string to a byte array
     */
    public static byte[] hexString2Bytes(String hex) {
        if ((hex == null) || (hex.equals(""))) {
            return null;
        } else if (hex.length() % 2 != 0) {
            return null;
        } else {
            hex = hex.toUpperCase();
            int len = hex.length() / 2;
            byte[] b = new byte[len];
            char[] hc = hex.toCharArray();
            for (int i = 0; i < len; i++) {
                int p = 2 * i;
                b[i] = (byte) (charToByte(hc[p]) << 4 | charToByte(hc[p + 1]));
            }
            return b;
        }
    }

    /*
     * Convert one hexadecimal digit to its value
     */
    private static byte charToByte(char c) {
        return (byte) "0123456789ABCDEF".indexOf(c);
    }
}
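Tomcat
Many online sources say that Tomcat decodes GET request parameters as ISO-8859-1 by default, but the tests below show UTF-8 instead. That matches the Tomcat version used here: ISO-8859-1 was the default for URIEncoding up to Tomcat 7, and from Tomcat 8 the default is UTF-8 (unless strict servlet compliance is enabled). I still plan to repeat the tests on another machine to rule out anything specific to this environment.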
Environment
Windows 10 + JDK 8 + Eclipse (text file encoding GBK) + Tomcat 8.5
server.xml (default)
<Connector connectionTimeout="20000" port="8080" protocol="HTTP/1.1" redirectPort="8443" />
Encoding the parameter to send as GBK, UTF-8 and ISO-8859-1:
@Test
public void test1() throws UnsupportedEncodingException {
    String str = "{\"approveComment\":\"我们\",\"approveResult\":\"0\"}";
    // %7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
    System.out.println(URLEncoder.encode(str, "GBK"));
    // %7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
    System.out.println(URLEncoder.encode(str, "UTF-8"));
    // %7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
    System.out.println(URLEncoder.encode(str, "ISO-8859-1"));
}
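What the server has to do with such a value is the reverse step, and it only works if both sides agree on the charset. The following sketch shows the round trip with java.net.URLDecoder standing in for the server; the class and method names are hypothetical, and this is not how Tomcat is implemented internally.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

class UrlRoundTripDemo {

    // Percent-encode with one charset, then percent-decode with another.
    static String roundTrip(String text, String encodeWith, String decodeWith) {
        try {
            return URLDecoder.decode(URLEncoder.encode(text, encodeWith), decodeWith);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset name", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u6211\u4eec"; // U+6211 U+4EEC

        // Same charset on both sides: the text survives.
        System.out.println("GBK   -> GBK   ok: " + roundTrip(text, "GBK", "GBK").equals(text));
        System.out.println("UTF-8 -> UTF-8 ok: " + roundTrip(text, "UTF-8", "UTF-8").equals(text));

        // Different charsets: the bytes are reinterpreted and the text is garbled.
        System.out.println("GBK   -> UTF-8 ok: " + roundTrip(text, "GBK", "UTF-8").equals(text));
        System.out.println("UTF-8 -> GBK   ok: " + roundTrip(text, "UTF-8", "GBK").equals(text));
    }
}
```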
Servlet:
@Override
protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
    String keyword = req.getParameter("keyword");
    System.out.println("keyword=" + keyword);
    // Set the response content type
    resp.setContentType("text/html");
    PrintWriter out = resp.getWriter();
    out.println("<h1>test</h1>");
}
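1. GET request with the GBK-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword={"approveComment":"????","approveResult":"0"}
2. GET request with the UTF-8-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output:
keyword={"approveComment":"我们","approveResult":"0"}
3. GET request with the ISO-8859-1-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled). The cause is on the sending side: ISO-8859-1 cannot represent the Chinese characters, so URLEncoder had already replaced them with ? (%3F) before the request was sent, and no server-side charset can bring them back:
keyword={"approveComment":"??","approveResult":"0"}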
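One way to catch the ISO-8859-1 case before a request is ever sent is to ask the charset whether it can represent the text at all. This is a minimal sketch with a made-up helper name (canRepresent); it relies only on the standard CharsetEncoder.canEncode method and assumes the GBK charset is available in the running JDK.

```java
import java.nio.charset.Charset;

class EncodableCheck {

    // True if every character of the text has a mapping in the named charset.
    static boolean canRepresent(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String text = "\u6211\u4eec"; // U+6211 U+4EEC
        System.out.println("GBK:        " + canRepresent(text, "GBK"));
        System.out.println("UTF-8:      " + canRepresent(text, "UTF-8"));
        // false: encoding would silently substitute '?' for each character
        System.out.println("ISO-8859-1: " + canRepresent(text, "ISO-8859-1"));
    }
}
```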
server.xml (UTF-8)
<Connector URIEncoding="UTF-8" connectionTimeout="20000" port="8080" protocol="HTTP/1.1" redirectPort="8443"/>
The results are the same as with the default configuration, because Tomcat 8.5 already uses UTF-8 when URIEncoding is not set.
1. GET request with the GBK-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword={"approveComment":"????","approveResult":"0"}
2. GET request with the UTF-8-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output:
keyword={"approveComment":"我们","approveResult":"0"}
3. GET request with the ISO-8859-1-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword={"approveComment":"??","approveResult":"0"}
server.xml (GBK)
1. GET request with the GBK-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output:
keyword={"approveComment":"我们","approveResult":"0"}
2. GET request with the UTF-8-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword={"approveComment":"鎴戜滑","approveResult":"0"}
3. GET request with the ISO-8859-1-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword={"approveComment":"??","approveResult":"0"}
server.xml (ISO-8859-1)
1. GET request with the GBK-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%CE%D2%C3%C7%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled):
keyword={"approveComment":"????","approveResult":"0"}
2. GET request with the UTF-8-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%E6%88%91%E4%BB%AC%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled; each of the six UTF-8 bytes is decoded as a separate ISO-8859-1 character):
keyword={"approveComment":"??????","approveResult":"0"}
3. GET request with the ISO-8859-1-encoded parameter
http://localhost:8080/simpleServlet/test?keyword=%7B%22approveComment%22%3A%22%3F%3F%22%2C%22approveResult%22%3A%220%22%7D
Output (garbled; the ? characters were already in the URL as %3F, so the server charset makes no difference):
keyword={"approveComment":"??","approveResult":"0"}
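The whole set of results can be approximated without starting Tomcat. The sketch below feeds the three percent-encoded values from the URLs above through java.net.URLDecoder once per server charset. It rests on the assumption that the connector's URIEncoding behaves like the charset argument of URLDecoder.decode, which matches the observations in this post but is not taken from Tomcat's source. The class and method names are hypothetical, and non-ASCII results are printed as code points so the console encoding cannot distort them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class UriEncodingMatrix {

    // Decode a percent-encoded value the way a server configured with uriEncoding would.
    static String decode(String percentEncoded, String uriEncoding) {
        try {
            return URLDecoder.decode(percentEncoded, uriEncoding);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset name", e);
        }
    }

    // Render non-ASCII characters as code points, e.g. U+6211.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c < 0x80) {
                sb.append(c);
            } else {
                sb.append(String.format("[U+%04X]", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Values sent by the client: encoded as GBK, UTF-8 and ISO-8859-1 respectively.
        String[] sent = {"%CE%D2%C3%C7", "%E6%88%91%E4%BB%AC", "%3F%3F"};
        String[] serverCharsets = {"UTF-8", "GBK", "ISO-8859-1"};
        for (String charset : serverCharsets) {
            System.out.println("URIEncoding=" + charset);
            for (String value : sent) {
                System.out.println("  " + value + " -> " + describe(decode(value, charset)));
            }
        }
    }
}
```

Only the pairs where the client and server charsets agree, GBK with GBK and UTF-8 with UTF-8, give back U+6211 U+4EEC; the %3F%3F value decodes to ?? under every charset.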