
Java Notes: java.lang.String#trim


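String's trim() method is one I use very often. Until recently, though, I was not sure how it treats line breaks when it strips whitespace from both ends, so I opened the source to check. It turned out that String#trim is implemented quite differently from what I had imagined, and that I had been misunderstanding this method all along.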

I had pictured trim as something like this:

package cc11001100.trimStudy;

/**
 * @author CC11001100
 */
public class CustomString {

	private char[] values;

	public CustomString(char[] values) {
		this.values = values;
	}

	// ...

	public CustomString trim() {
		char[] localValues = values;
		int left = 0, right = localValues.length;
		while (left < right && isBlankChar(localValues[left])) {
			left++;
		}
		while (right > left && isBlankChar(localValues[right - 1])) {
			right--;
		}
		if (left != 0 || right != localValues.length) {
			char[] newValue = new char[right - left];
			System.arraycopy(localValues, left, newValue, 0, newValue.length);
			return new CustomString(newValue);
		} else {
			return this;
		}
	}

	// Only these four characters count as blank in this imagined version.
	private boolean isBlankChar(char c) {
		return c == ' ' || c == '\t' || c == '\r' || c == '\n';
	}

	@Override
	public String toString() {
		return new java.lang.String(values);
	}

	// ...

}

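That is, it would strip spaces, tabs, carriage returns and line feeds from both ends of the string. The actual implementation of String#trim, however, looks like this: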

/**
 * Returns a string whose value is this string, with any leading and trailing
 * whitespace removed.
 * <p>
 * If this {@code String} object represents an empty character
 * sequence, or the first and last characters of character sequence
 * represented by this {@code String} object both have codes
 * greater than {@code '\u005Cu0020'} (the space character), then a
 * reference to this {@code String} object is returned.
 * <p>
 * Otherwise, if there is no character with a code greater than
 * {@code '\u005Cu0020'} in the string, then a
 * {@code String} object representing an empty string is
 * returned.
 * <p>
 * Otherwise, let <i>k</i> be the index of the first character in the
 * string whose code is greater than {@code '\u005Cu0020'}, and let
 * <i>m</i> be the index of the last character in the string whose code
 * is greater than {@code '\u005Cu0020'}. A {@code String}
 * object is returned, representing the substring of this string that
 * begins with the character at index <i>k</i> and ends with the
 * character at index <i>m</i>-that is, the result of
 * {@code this.substring(k, m + 1)}.
 * <p>
 * This method may be used to trim whitespace (as defined above) from
 * the beginning and end of a string.
 *
 * @return  A string whose value is this string, with any leading and trailing white
 *          space removed, or this string if it has no leading or
 *          trailing white space.
 */
public String trim() {
    int len = value.length;
    int st = 0;
    char[] val = value;    /* avoid getfield opcode */

    while ((st < len) && (val[st] <= ' ')) {
        st++;
    }
    while ((st < len) && (val[len - 1] <= ' ')) {
        len--;
    }
    return ((st > 0) || (len < value.length)) ? substring(st, len) : this;
}

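It removes every character at either end of the string whose code is less than or equal to that of the space character. The \u005Cu0020 in the Javadoc can simply be read as ASCII 0x20, i.e. decimal 32, so every leading or trailing character whose ASCII code is 32 or lower gets removed.

[Figure: ASCII table]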
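To make the rule concrete, here is a small sketch of which characters fall inside and outside the range that trim removes. The class name TrimRangeDemo and the helper removedByTrim are made up for this illustration; only String#trim itself comes from the JDK.

```java
/** Illustrative sketch: which characters String#trim strips from the ends of a string. */
class TrimRangeDemo {

    /** Hypothetical helper mirroring the comparison used inside trim(): code <= U+0020. */
    static boolean removedByTrim(char c) {
        return c <= ' ';
    }

    public static void main(String[] args) {
        // Control characters below the space are stripped together with the blanks.
        String padded = "\u0001\t abc \r\n\u001F";
        System.out.println(padded.trim().equals("abc"));   // true

        // Characters above U+0020 stay, even if they look blank:
        // U+3000 is the ideographic space, U+007F is DEL.
        String untouched = "\u3000abc\u007F";
        System.out.println(untouched.trim().length());     // 5

        System.out.println(removedByTrim('\n'));            // true
        System.out.println(removedByTrim((char) 127));      // false
    }
}
```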

First, look at the characters that trim obviously has to remove:

\t is 9

\r is 13

\n is 10

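All of these are indeed smaller than the space. Codes 0 through 31 are all non-printing control characters and 32 is the space itself, so the approach does not seem to do much harm. From now on, though, before calling trim I should consider whether my data might contain a character with a code below 32 that is not a space, tab or line break and that needs to be preserved.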
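For data where such control characters must survive, one option is a helper that strips only an explicit set of blanks. The following is a minimal sketch under that assumption; BlankTrim, isBlank and trimBlanks are hypothetical names, and the choice of U+001E (record separator) as the character to preserve is only an example.

```java
/** Illustrative sketch: trim only space, tab, CR and LF, keeping other control characters. */
class BlankTrim {

    /** The explicit set of characters this sketch treats as blank. */
    static boolean isBlank(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    /** Strips blanks (as defined by isBlank) from both ends of s. */
    static String trimBlanks(String s) {
        int left = 0;
        int right = s.length();
        while (left < right && isBlank(s.charAt(left))) {
            left++;
        }
        while (right > left && isBlank(s.charAt(right - 1))) {
            right--;
        }
        return s.substring(left, right);
    }

    public static void main(String[] args) {
        // U+001E (record separator) frames the payload and is meant to be kept.
        String record = " \u001Epayload\u001E\r\n";
        System.out.println(trimBlanks(record).length());   // 9: separators kept
        System.out.println(record.trim().length());        // 7: separators lost
    }
}
```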

Here is a simple test of String#trim:

package cc11001100.trimStudy;

/**
 * @author CC11001100
 */
public class TrimStudy {

	public static void main(String[] args) {

		StringBuilder sb = new StringBuilder();
		for (int i = 0; i < 128; i++) {
			sb.append((char) i);
		}
		String s = sb.toString().trim();
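		// the trimmed string, wrapped in dashes so that its ends are visible
		System.out.println("-" + s + "-");
		// ASCII code of the first character after trim
		System.out.println((int) s.charAt(0));
		// the DEL character (ASCII 127)
		System.out.println((char) 127);
		// print the untrimmed string to see how the other blank characters render
		System.out.println(sb.toString());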

	}

}

Output:
[Figure: screenshot of the program output]

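Note that ASCII 127, the DEL character, could arguably also count as an invisible blank character, yet trim leaves it in place because its code is greater than 32.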

Still not convinced, I went on to look at StringUtils#trim in Apache commons-lang, a library that a huge number of projects depend on:

/**
 * <p>Removes control characters (char &lt;= 32) from both
 * ends of this String, handling <code>null</code> by returning
 * <code>null</code>.</p>
 *
 * <p>The String is trimmed using {@link String#trim()}.
 * Trim removes start and end characters &lt;= 32.
 * To strip whitespace use {@link #strip(String)}.</p>
 *
 * <p>To trim your choice of characters, use the
 * {@link #strip(String, String)} methods.</p>
 *
 * <pre>
 * StringUtils.trim(null)          = null
 * StringUtils.trim("")            = ""
 * StringUtils.trim("     ")       = ""
 * StringUtils.trim("abc")         = "abc"
 * StringUtils.trim("    abc    ") = "abc"
 * </pre>
 *
 * @param str  the String to be trimmed, may be null
 * @return the trimmed string, <code>null</code> if null String input
 */
public static String trim(String str) {
    return str == null ? null : str.trim();
}

But it simply delegates to String#trim, so it is not what I had imagined either.
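The Javadoc above separates trimming (characters <= 32) from stripping whitespace. As a rough illustration of that difference, the sketch below strips by Character.isWhitespace instead of by character code. It is an assumption that this is what "whitespace" should mean here, and the code is not taken from commons-lang; WhitespaceStrip and stripWhitespace are hypothetical names.

```java
/** Illustrative sketch: strip by Character.isWhitespace rather than by code <= U+0020. */
class WhitespaceStrip {

    /** Removes leading and trailing whitespace as defined by Character.isWhitespace. */
    static String stripWhitespace(String s) {
        if (s == null) {
            return null;
        }
        int start = 0;
        int end = s.length();
        while (start < end && Character.isWhitespace(s.charAt(start))) {
            start++;
        }
        while (end > start && Character.isWhitespace(s.charAt(end - 1))) {
            end--;
        }
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        // U+3000 (ideographic space) is whitespace, but its code is above U+0020.
        String wide = "\u3000abc\u3000";
        System.out.println(stripWhitespace(wide).equals("abc"));   // true
        System.out.println(wide.trim().length());                  // 5: trim keeps it

        // U+0001 is a control character, not whitespace.
        String control = "\u0001abc";
        System.out.println(stripWhitespace(control).length());     // 4: kept
        System.out.println(control.trim().equals("abc"));          // true: trim drops it
    }
}
```

The two rules overlap on space, tab, CR and LF but disagree at the edges, in both directions.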

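So I really had been misunderstanding trim all along. Trimming is a fairly universal concept in string handling, and I do not know how other languages actually implement it.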
