Converting GBK to UTF-8 in Java

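import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Converts a GBK-encoded txt file into a UTF-8 XML file.
 * @author SUNBIN
 *
 */
public class ConvertXML {
	public static void main(String[] args) {
		getXML("敏感詞庫大全.txt");
	}
	public static void getXML(String path){
		// The txt file to read
		File file = new File(path);
		// The XML file to write; FileOutputStream creates it if it does not exist
		File xmlFile = new File("src/sensitive.xml");
		// try-with-resources closes both streams even if an exception is thrown
		try (
			// InputStreamReader decodes the GBK file; BufferedReader allows line-by-line reads
			BufferedReader bufr = new BufferedReader(
					new InputStreamReader(new FileInputStream(file), Charset.forName("GBK")));
			// OutputStreamWriter encodes the output as UTF-8
			BufferedWriter bufw = new BufferedWriter(
					new OutputStreamWriter(new FileOutputStream(xmlFile), StandardCharsets.UTF_8))
		) {
			bufw.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"+"\n");// XML declaration
			bufw.write("<sensitiveWords>"+"\n");// root element
			int id=1;
			String line;
			while((line = bufr.readLine())!=null){// read one line, write one line
				if(!"".equals(line.trim())){// skip blank lines
					bufw.write("\t");
					bufw.write("<sensitiveWord id=\""+(id++)+"\">");
					bufw.write(escapeXml(line));
					bufw.write("</sensitiveWord>"+"\n");
				}
			}
			bufw.write("</sensitiveWords>");// closing root tag
		} catch (IOException e) {// also covers FileNotFoundException
			e.printStackTrace();
		}
	}
	// Escapes the characters that would otherwise break the XML markup
	private static String escapeXml(String s){
		return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
	}
}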

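Java represents text internally as Unicode. When the file is read as GBK, the bytes are decoded into Unicode characters held in memory; when it is written out as UTF-8, those Unicode characters are encoded into UTF-8 bytes.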
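The same decode-then-encode step can be seen on a byte array, without any files. The following is only a minimal sketch: the class name `GbkToUtf8Bytes` and the method `gbkToUtf8` are made up for illustration, and it assumes the running JDK ships the GBK charset (full JDK distributions normally do).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class GbkToUtf8Bytes {
    // Hypothetical helper: decode GBK bytes into a String (Unicode in memory),
    // then encode that String as UTF-8 bytes.
    static byte[] gbkToUtf8(byte[] gbkBytes) {
        String text = new String(gbkBytes, Charset.forName("GBK"));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character string "中文", written with
        // escapes so the example does not depend on the source file encoding.
        byte[] gbk = "\u4e2d\u6587".getBytes(Charset.forName("GBK"));
        byte[] utf8 = gbkToUtf8(gbk);
        System.out.println(gbk.length);  // 4: GBK uses two bytes per character here
        System.out.println(utf8.length); // 6: UTF-8 uses three bytes per character here
    }
}
```

The text is the same in both arrays; only the byte representation differs.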

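By the same token, this behaviour can be used to convert between any other pair of encodings: decode with the source charset when reading, and encode with the target charset when writing.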
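As a rough illustration of that generalization, the conversion can be written once with both charsets passed in as parameters. This is a sketch under assumptions, not the author's code: the class `Transcoder` and the method `transcode` are hypothetical names, and it uses `java.nio.file.Files` rather than the stream classes from the listing above.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class Transcoder {
    // Hypothetical helper: copies source to target, decoding with `from`
    // and encoding with `to`. The target file is created or overwritten.
    static void transcode(Path source, Charset from, Path target, Charset to) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(source, from);
             BufferedWriter writer = Files.newBufferedWriter(target, to)) {
            char[] buffer = new char[8192];
            int n;
            // Copy in character chunks so line endings are preserved as-is
            while ((n = reader.read(buffer)) != -1) {
                writer.write(buffer, 0, n);
            }
        }
    }
}
```

A call might look like `Transcoder.transcode(Paths.get("in.txt"), Charset.forName("GBK"), Paths.get("out.txt"), StandardCharsets.UTF_8)`, where both file names are placeholders. One design difference worth knowing: readers and writers obtained from `Files` throw an `IOException` on bytes that are invalid in the source charset or characters the target charset cannot represent, whereas `InputStreamReader` and `OutputStreamWriter` silently substitute a replacement character.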