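Only a single layer of garbling can be inferred this way. If text that is already garbled gets converted again, it becomes even more garbled and the cause can no longer be inferred.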
import java.io.UnsupportedEncodingException;

public class TestEncode {
    public static void main(String[] args) {
        // Naming: <encoding used by getBytes>_<encoding used to decode>.
        String utf_8_gbk = "";
        String gbk_utf_8 = "";
        String utf_8_gb2312 = "";
        String gb2312_utf_8 = "";
        String utf_8_iso8859_1 = "";
        String iso8859_1_utf_8 = "";
        String gbk_iso8859_1 = "";
        String iso8859_1_gbk = "";
        String iso8859_1_gb2312 = "";
        String gb2312_iso8859_1 = "";
        // Encodings covered: UTF-8, GBK, ISO8859-1, GB2312
        try {
            utf_8_gbk = new String("金永华2011".getBytes("UTF-8"), "GBK");
            gbk_utf_8 = new String("金永华2011".getBytes("GBK"), "UTF-8");
            utf_8_gb2312 = new String("金永华2011".getBytes("UTF-8"), "GB2312");
            gb2312_utf_8 = new String("金永华2011".getBytes("GB2312"), "UTF-8");
            utf_8_iso8859_1 = new String("金永华2011".getBytes("UTF-8"), "ISO8859-1");
            iso8859_1_utf_8 = new String("金永华2011".getBytes("ISO8859-1"), "UTF-8");
            gbk_iso8859_1 = new String("金永华2011".getBytes("GBK"), "ISO8859-1");
            iso8859_1_gbk = new String("金永华2011".getBytes("ISO8859-1"), "GBK");
            iso8859_1_gb2312 = new String("金永华2011".getBytes("ISO8859-1"), "GB2312");
            gb2312_iso8859_1 = new String("金永华2011".getBytes("GB2312"), "ISO8859-1");
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
        System.out.println("utf_8_gbk: " + utf_8_gbk);
        System.out.println("gbk_utf_8: " + gbk_utf_8);
        System.out.println("utf_8_gb2312: " + utf_8_gb2312);
        System.out.println("gb2312_utf_8: " + gb2312_utf_8);
        System.out.println("utf_8_iso8859_1: " + utf_8_iso8859_1);
        System.out.println("iso8859_1_utf_8: " + iso8859_1_utf_8);
        System.out.println("gbk_iso8859_1: " + gbk_iso8859_1);
        System.out.println("iso8859_1_gbk: " + iso8859_1_gbk);
        System.out.println("iso8859_1_gb2312: " + iso8859_1_gb2312);
        System.out.println("gb2312_iso8859_1: " + gb2312_iso8859_1);
    }
}
Output on Windows XP:
utf_8_gbk: 閲戞案鍗�011
gbk_utf_8: ������2011
utf_8_gb2312: ��案��011
gb2312_utf_8: ������2011
utf_8_iso8859_1: 金永åŽ2011
iso8859_1_utf_8: ???2011
gbk_iso8859_1: ½ðÓÀ»ª2011
iso8859_1_gbk: ???2011
iso8859_1_gb2312: ???2011
gb2312_iso8859_1: ½ðÓÀ»ª2011
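The way to use this is to compare how the Chinese characters and the digits in your garbled text look against the patterns above; the matching pattern tells you which encoding conversion produced the garbling. This helps somewhat in working out how the garbling arose, but sometimes it is no use, because the conversion is very likely irreversible (information can be lost along the way; decoding as ISO8859-1 is the exception, since every byte value maps to a character).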
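To illustrate the ISO8859-1 exception, here is a minimal sketch showing that bytes wrongly decoded as ISO8859-1 can be recovered exactly. The class name `Latin1RoundTrip` and the helpers `garble` and `repair` are made up for this illustration, and the source file is assumed to be compiled as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class, not part of the original test program.
class Latin1RoundTrip {
    // Decode UTF-8 bytes as ISO8859-1, producing garbled text on purpose.
    static String garble(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undo the mistake: encoding as ISO8859-1 yields the original bytes again,
    // which are then decoded with the correct charset.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "金永华2011"; // assumes the source file is compiled as UTF-8
        String garbled = garble(original);
        byte[] before = original.getBytes(StandardCharsets.UTF_8);
        byte[] after = garbled.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(before, after));       // true
        System.out.println(repair(garbled).equals(original));   // true
    }
}
```

Note that this only works in one direction: encoding Chinese text to ISO8859-1 replaces every unmappable character with `?`, which is why the `iso8859_1_*` lines in the output above show `???2011`.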
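So when fixing a garbling problem, always get hold of the original byte array (the actual underlying content of the string), and then, together with what the garbled text looks like, work out its encoding. A more precise approach is to take a non-garbled copy of the same text, call getBytes("xxx") with a candidate encoding, and compare the resulting array with the original byte array to determine the exact encoding. You can then do the following conversion:
String str = new String(originalBytes, exactEncodingOfOriginalBytes);
Now str is no longer garbled; in memory it is converted to the correct Unicode code points. That is the general approach to fixing garbled text.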
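The comparison step can be sketched as follows. This is only an illustration under stated assumptions: `EncodingProbe` and `findEncoding` are hypothetical names, the candidate list is whatever you suspect, and a non-garbled copy of the same text is assumed to be available.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class EncodingProbe {
    // Returns the first candidate charset name whose encoding of knownGood
    // is byte-for-byte equal to rawBytes, or null if none matches.
    static String findEncoding(byte[] rawBytes, String knownGood, String... candidates) {
        for (String name : candidates) {
            if (!Charset.isSupported(name)) {
                continue; // skip charsets this JVM does not provide
            }
            byte[] encoded = knownGood.getBytes(Charset.forName(name));
            if (Arrays.equals(encoded, rawBytes)) {
                return name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String knownGood = "金永华2011"; // assumes the source file is compiled as UTF-8
        // Stand-in for the "original byte array" obtained from a file or socket.
        byte[] rawBytes = knownGood.getBytes(StandardCharsets.UTF_8);

        String encoding = findEncoding(rawBytes, knownGood, "GBK", "GB2312", "ISO-8859-1", "UTF-8");
        System.out.println("detected: " + encoding);
        if (encoding != null) {
            // Decode the original bytes with the encoding that was found.
            System.out.println(new String(rawBytes, Charset.forName(encoding)));
        }
    }
}
```

Several candidates can match the same bytes. In the output above, GBK and GB2312 behave identically for this sample text, so a probe like this can only say which encodings are consistent with the bytes, and the first consistent one is returned.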
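To repeat the point: an encoding conversion may well lose information, so that it cannot be reversed. For example, take the UTF-8 encoded string "金永华2011", decode it as GBK, and then convert it back (from GBK to UTF-8). At that point the original UTF-8 byte array of "金永华2011" can no longer be recovered. See the code: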
String utf_8_gbk = "";
String target = "";
try {
    // Decode the UTF-8 bytes as GBK (garbling the text), then try to reverse it.
    utf_8_gbk = new String("金永华2011".getBytes("UTF-8"), "GBK");
    target = new String(utf_8_gbk.getBytes("GBK"), "UTF-8");
} catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}
System.out.println("utf_8_gbk: " + utf_8_gbk);
System.out.println("target: " + target);
The result is:
utf_8_gbk: 閲戞案鍗�011
target: 金永�?011
As you can see, only some of the Chinese characters are restored; the rest are still garbled.
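To see where the information is lost, it can help to compare the bytes before and after the round trip rather than the printed text. The following is a rough sketch; `RoundTripCheck`, `hex`, `roundTrip` and `survives` are hypothetical names, and GBK is assumed to be available in the JVM (the code skips that step if it is not).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper class for illustration only.
class RoundTripCheck {
    // Render a byte array as space-separated uppercase hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Decode the bytes with the wrong charset, then encode with that same charset.
    static byte[] roundTrip(byte[] original, Charset wrong) {
        return new String(original, wrong).getBytes(wrong);
    }

    // True only if the round trip gives back exactly the same bytes.
    static boolean survives(byte[] original, Charset wrong) {
        return Arrays.equals(original, roundTrip(original, wrong));
    }

    public static void main(String[] args) {
        // Assumes the source file is compiled as UTF-8.
        byte[] utf8 = "金永华2011".getBytes(StandardCharsets.UTF_8);
        System.out.println("original:   " + hex(utf8)); // E9 87 91 E6 B0 B8 E5 8D 8E 32 30 31 31

        if (Charset.isSupported("GBK")) {
            Charset gbk = Charset.forName("GBK");
            System.out.println("round trip: " + hex(roundTrip(utf8, gbk)));
            System.out.println("survives:   " + survives(utf8, gbk));
        }
    }
}
```

The UTF-8 form of the sample has an odd number of non-ASCII bytes, so the last one (0x8E) has no valid GBK partner. The decoder substitutes a replacement character for it, and once that happens the original byte value cannot be reconstructed.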