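JDK 1.8 Source Code (Part 3): The java.lang.String Class

I. Overview

1. Introduction

String is a final class, so it cannot be subclassed. It represents an immutable sequence of characters and is a reference type. Every string literal in a Java program, such as "abc", is implemented as an instance of this class, so "abc" is an object.
Strings are constants: once created, their contents cannot be changed, and none of the class's methods modify the object for as long as it lives. Methods that appear to change a string actually create and return a new one.
In JDK 1.8 the characters of a String are stored in a char array named value.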
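
As a minimal sketch of the point above, the following example calls a method that looks like it changes a string and shows that the original object is left untouched. The class name ImmutabilityDemo and the helper shout are made up for this illustration.

```java
class ImmutabilityDemo {
    // Hypothetical helper: returns an upper-cased version of s.
    // The argument itself is never modified.
    static String shout(String s) {
        return s.toUpperCase();
    }

    public static void main(String[] args) {
        String s = "abc";
        String t = shout(s);
        System.out.println(s);      // abc
        System.out.println(t);      // ABC
        System.out.println(s == t); // false: t is a different object
    }
}
```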

II. Class Source Code

1. Class Declaration

Source excerpt (the beginning of the class Javadoc is omitted):

/**
 * ...
 * @author  Lee Boynton
 * @author  Arthur van Hoff
 * @author  Martin Buchholz
 * @author  Ulf Zibis
 * @see     java.lang.Object#toString()
 * @see     java.lang.StringBuffer
 * @see     java.lang.StringBuilder
 * @see     java.nio.charset.Charset
 * @since   JDK1.0
 */
public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence {}

It implements Serializable, which marks the class as serializable.
It implements Comparable<String>, which defines how two strings are ordered.
It implements CharSequence, which means it is a readable, ordered sequence of char values.

2. Fields

Source excerpt (note the English comments in the original source):

/** The value is used for character storage. */
private final char value[];

/** Cache the hash code for the string */
private int hash; // Default to 0

/** use serialVersionUID from JDK 1.0.2 for interoperability */
private static final long serialVersionUID = -6849794470754667710L;

As the excerpt shows, String is backed by a final char[]. The value field stores the characters, hash caches the string's hash code (0 by default), and serialVersionUID is the version identifier used for serialization.

3. Constructors

The String class has many overloaded constructors.
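
The source does not list the constructors, so here is a small sketch of a few public overloads that exist in JDK 1.8. The class name ConstructorDemo and the helper fromChars are made up for this illustration, and the selection of overloads is only a sample.

```java
import java.nio.charset.StandardCharsets;

class ConstructorDemo {
    // Hypothetical helper: builds a String from a char[].
    // The public constructor copies the array.
    static String fromChars(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] chars = {'h', 'e', 'l', 'l', 'o'};
        String whole = fromChars(chars);
        String part = new String(chars, 1, 3); // offset 1, count 3
        String fromBytes = new String(new byte[] {104, 105}, StandardCharsets.UTF_8);
        String fromBuilder = new String(new StringBuilder("sb"));

        chars[0] = 'J'; // does not affect strings already built from the array
        System.out.println(whole);       // hello
        System.out.println(part);        // ell
        System.out.println(fromBytes);   // hi
        System.out.println(fromBuilder); // sb
    }
}
```

Because the public char[] constructors copy their input, later changes to the caller's array cannot leak into the String.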

4. The equals() Method

String overrides equals() to compare the two strings character by character. It returns true if every character matches and false otherwise.
Source excerpt:

public boolean equals(Object anObject) {
    // Same reference: the strings are equal
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = value.length;
        // The lengths must match before the contents are compared
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;

            // Compare the two strings one character at a time
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}
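
To make the difference between reference comparison and equals() concrete, here is a small sketch. The class name EqualsDemo and the helper compare are made up for this illustration.

```java
class EqualsDemo {
    // Hypothetical helper: returns { a == b, a.equals(b) }.
    static boolean[] compare(String a, Object b) {
        return new boolean[] { a == b, a.equals(b) };
    }

    public static void main(String[] args) {
        String literal = "abc";
        String copy = new String("abc");

        boolean[] r = compare(literal, copy);
        System.out.println(r[0]); // false: two distinct objects
        System.out.println(r[1]); // true: same characters

        // A non-String argument fails the instanceof check
        System.out.println(literal.equals(new StringBuilder("abc"))); // false
        System.out.println(literal.equals(null));                     // false
    }
}
```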

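5. The hashCode() Method

Source excerpt:

public int hashCode() {
    int h = hash;
    // Compute only if no hash is cached yet (0) and the string is not empty
    if (h == 0 && value.length > 0) {
        char val[] = value;

        // Every character takes part in the hash computation
        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i]; // why 31?
        }
        hash = h;
    }
    return h;
}

The method is easy to follow. For a string s of length n, the for loop evaluates:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

Why is 31 used as the multiplier, written as a literal rather than a named constant? There are two main reasons:
① 31 is a prime of moderate size and one of the preferred multipliers for hash codes.
② The JVM can optimize the multiplication, since 31 * i = (i << 5) - i. A shift and a subtraction have traditionally been cheaper than a multiplication.
See the referenced article for a detailed explanation.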
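
The formula can be checked with a short sketch that repeats the loop by hand and compares the result with String.hashCode(). The class name HashDemo and the helper manualHash are made up for this illustration.

```java
class HashDemo {
    // Hypothetical helper: repeats the loop from String.hashCode().
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // 'a'*31^2 + 'b'*31 + 'c' = 97*961 + 98*31 + 99
        System.out.println(manualHash("abc")); // 96354
        System.out.println("abc".hashCode());  // 96354

        // The shift-and-subtract form gives the same int result
        int i = 12345;
        System.out.println(31 * i == (i << 5) - i); // true
    }
}
```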

6. The charAt() Method

Source excerpt:

public char charAt(int index) {
    // Check that the index is within bounds
    if ((index < 0) || (index >= value.length)) {
        throw new StringIndexOutOfBoundsException(index);
    }

    // Return the character stored at that index of the array
    return value[index];
}

7. The compareTo() and compareToIgnoreCase() Methods

Source excerpt:

public int compareTo(String anotherString) {
    int len1 = value.length;
    int len2 = anotherString.value.length;

    // Only the first min(len1, len2) characters can be compared pairwise
    int lim = Math.min(len1, len2);
    char v1[] = value;
    char v2[] = anotherString.value;

    int k = 0;
    // Compare the two strings character by character
    while (k < lim) {
        char c1 = v1[k];
        char c2 = v2[k];

        // At the first mismatch, return the difference of the two char values
        if (c1 != c2) {
            return c1 - c2;
        }
        k++;
    }

    // The first lim characters are all equal: return the difference in length
    return len1 - len2;
}

public int compareToIgnoreCase(String str) {
    return CASE_INSENSITIVE_ORDER.compare(this, str);
}

compareToIgnoreCase() performs the same kind of comparison but ignores case, delegating to the CASE_INSENSITIVE_ORDER comparator. For ASCII letters, an uppercase letter has a value 32 lower than its lowercase counterpart.
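
The following sketch shows the values compareTo() returns and how the two orderings affect sorting. The class name CompareDemo and the helpers sortNatural and sortIgnoreCase are made up for this illustration.

```java
import java.util.Arrays;

class CompareDemo {
    // Hypothetical helper: sorts a copy using String's natural order (compareTo).
    static String[] sortNatural(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Hypothetical helper: sorts a copy with the comparator behind compareToIgnoreCase.
    static String[] sortIgnoreCase(String... words) {
        String[] copy = words.clone();
        Arrays.sort(copy, String.CASE_INSENSITIVE_ORDER);
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("apple".compareTo("apricot"));     // -2: 'p' - 'r'
        System.out.println("abc".compareTo("abcde"));         // -2: length 3 - length 5
        System.out.println("ABC".compareTo("abc"));           // -32: 'A' - 'a'
        System.out.println("ABC".compareToIgnoreCase("abc")); // 0

        System.out.println(Arrays.toString(sortNatural("banana", "Cherry", "apple")));
        // [Cherry, apple, banana]
        System.out.println(Arrays.toString(sortIgnoreCase("banana", "Cherry", "apple")));
        // [apple, banana, Cherry]
    }
}
```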

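8. The concat() Method

This method appends the specified string to the end of this string.
Source excerpt:

public String concat(String str) {
    int otherLen = str.length();
    // If the argument is empty, return this string itself
    if (otherLen == 0) {
        return this;
    }

    int len = value.length;
    // Copy value into a new array of length len + otherLen:
    // the characters of value come first, the remaining slots are still empty
    char buf[] = Arrays.copyOf(value, len + otherLen);

    // Copy the characters of str into the empty slots at the end of buf
    str.getChars(buf, len);

    // Create a new String with new; the original string is unchanged
    return new String(buf, true);
}

Note that the result is a new String created with new and the original string is left unchanged, which again reflects the immutability of the character sequence.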
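
A short sketch of both branches of concat(): the empty-argument shortcut that returns this, and the normal path that builds a new object. The class name ConcatDemo and the helper returnsSameObject are made up for this illustration.

```java
class ConcatDemo {
    // Hypothetical helper: true when concat handed back the receiver itself.
    static boolean returnsSameObject(String s, String suffix) {
        return s.concat(suffix) == s;
    }

    public static void main(String[] args) {
        String a = "foo";
        System.out.println(returnsSameObject(a, ""));    // true: empty argument returns this
        System.out.println(returnsSameObject(a, "bar")); // false: a new String is built
        System.out.println(a.concat("bar"));             // foobar
        System.out.println(a);                           // foo: the original is unchanged
    }
}
```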

9. The indexOf() Method

Returns the index within this string of the first occurrence of the specified character, or -1 if the character does not occur.
Source excerpt:

public int indexOf(int ch) {
    // Search from the first character
    return indexOf(ch, 0);
}

// Search starting at index fromIndex
public int indexOf(int ch, int fromIndex) {
    final int max = value.length;
    // A negative fromIndex is treated as 0
    if (fromIndex < 0) {
        fromIndex = 0;
    } else if (fromIndex >= max) {
        // Note: fromIndex might be near -1>>>1.

        // Past the end of the string: nothing can be found, return -1
        return -1;
    }

    // A char is two bytes wide; most characters are below 2^16 (65536) and fit in one char
    if (ch < Character.MIN_SUPPLEMENTARY_CODE_POINT) {
        // handle most cases here (ch is a BMP code point or a
        // negative value (invalid code point))
        final char[] value = this.value;

        // Scan from fromIndex, comparing each character with ch
        for (int i = fromIndex; i < max; i++) {
            if (value[i] == ch) {
                return i;
            }
        }

        // Not found: return -1
        return -1;
    } else {
        // For code points of 65536 and above, check validity and then compare surrogate pairs
        return indexOfSupplementary(ch, fromIndex);
    }
}
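
The sketch below exercises both branches: a BMP character found by the simple loop and a supplementary code point (U+1F600, stored as a surrogate pair) that goes through indexOfSupplementary. The class name IndexOfDemo and the constant SAMPLE are made up for this illustration.

```java
class IndexOfDemo {
    // Hypothetical sample: 'a', U+1F600 (two chars: \uD83D \uDE00), 'b'
    static final String SAMPLE = "a\uD83D\uDE00b";

    public static void main(String[] args) {
        System.out.println(SAMPLE.length());         // 4: the emoji occupies two chars
        System.out.println(SAMPLE.indexOf('b'));     // 3: BMP branch
        System.out.println(SAMPLE.indexOf(0x1F600)); // 1: supplementary branch
        System.out.println(SAMPLE.indexOf('a', -5)); // 0: negative fromIndex is treated as 0
        System.out.println(SAMPLE.indexOf('a', 1));  // -1: no 'a' at or after index 1
        System.out.println(SAMPLE.indexOf('a', 99)); // -1: fromIndex past the end
    }
}
```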

10. The split() Method

Splits this string around matches of the given regular expression. In split(String regex, int limit), the value of limit falls into one of three cases:
① limit > 0: the pattern is applied at most limit - 1 times, so the resulting array has at most limit elements.

String str = "a,b,c";
String[] c1 = str.split(",", 2);

System.out.println(c1.length); // 2
System.out.println(Arrays.toString(c1)); // [a, b,c]

② limit = 0: the pattern is applied as many times as possible and trailing empty strings are discarded.

String str = "a,b,c,,";
String[] c1 = str.split(",", 0);

System.out.println(c1.length); // 3
System.out.println(Arrays.toString(c1)); // [a, b, c]

③ limit < 0: the pattern is applied as many times as possible and trailing empty strings are kept.

String str = "a,b,c,,";
String[] c1 = str.split(",", -1);

System.out.println(c1.length); // 5
System.out.println(Arrays.toString(c1)); // [a, b, c, , ]

Source excerpt:

public String[] split(String regex) {
    return split(regex, 0);
}

public String[] split(String regex, int limit) {

    /* fastpath if the regex is a
     (1)one-char String and this character is not one of the
        RegEx's meta characters ".$|()[{^?*+\\", or
     (2)two-char String and the first char is the backslash and
        the second is not the ascii digit or ascii letter.
     */
    char ch = 0;
    if (((regex.value.length == 1 &&
         ".$|()[{^?*+\\".indexOf(ch = regex.charAt(0)) == -1) ||
         (regex.length() == 2 &&
          regex.charAt(0) == '\\' &&
          (((ch = regex.charAt(1))-'0')|('9'-ch)) < 0 &&
          ((ch-'a')|('z'-ch)) < 0 &&
          ((ch-'A')|('Z'-ch)) < 0)) &&
        (ch < Character.MIN_HIGH_SURROGATE ||
         ch > Character.MAX_LOW_SURROGATE))
    {
        int off = 0;
        int next = 0;

        // Is the number of pieces limited?
        boolean limited = limit > 0;
        ArrayList<String> list = new ArrayList<>();
        while ((next = indexOf(ch, off)) != -1) {
            // limit <= 0, or fewer than limit - 1 pieces collected so far
            if (!limited || list.size() < limit - 1) {
                list.add(substring(off, next));
                off = next + 1;
            } else {    // last one
                //assert (list.size() == limit - 1);
                // limit - 1 pieces collected: the rest of the string is the last piece
                list.add(substring(off, value.length));
                off = value.length;
                break;
            }
        }
        // If no match was found, return this
        // (as a one-element array containing this same string)
        if (off == 0)
            return new String[]{this};

        // Add remaining segment
        // (when limit <= 0, or when fewer than limit pieces were collected)
        if (!limited || list.size() < limit)
            list.add(substring(off, value.length));

        // Construct result
        // When limit == 0, drop empty strings from the end until the last piece is non-empty
        int resultSize = list.size();
        if (limit == 0) {
            while (resultSize > 0 && list.get(resultSize - 1).length() == 0) {
                resultSize--;
            }
        }
        String[] result = new String[resultSize];
        return list.subList(0, resultSize).toArray(result);
    }
    return Pattern.compile(regex).split(this, limit);
}
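
The fast-path condition explains a common surprise: a lone "." is a regex metacharacter, so it is handed to Pattern and matches every character, while the escaped "\\." qualifies for the fast path. The sketch below shows the visible results only; which internal path is taken is inferred from the source above. The class name SplitDemo and the helper show are made up for this illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Hypothetical helper: splits input by regex and formats the result.
    static String show(String input, String regex) {
        return Arrays.toString(input.split(regex));
    }

    public static void main(String[] args) {
        System.out.println(show("a,b,c", ","));      // [a, b, c]  one ordinary char
        System.out.println(show("a.b.c", "\\."));    // [a, b, c]  backslash + non-alphanumeric
        System.out.println(show("a.b.c", "."));      // []         "." matches every character
        System.out.println(show("a1b22c", "\\d+"));  // [a, b, c]  general regex
        System.out.println(show("abc", ","));        // [abc]      no match: the string itself
    }
}
```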

11. The replace() and replaceAll() Methods

① replace(char oldChar, char newChar) replaces every occurrence of oldChar in the string with newChar and returns a new string.
② replaceAll(String regex, String replacement) replaces every match of the regular expression regex with replacement and returns a new string.
Source excerpt:

public String replace(char oldChar, char newChar) {
    if (oldChar != newChar) {
        int len = value.length;
        int i = -1;
        char[] val = value; /* avoid getfield opcode */

        // Find the first occurrence of oldChar in value
        while (++i < len) {
            if (val[i] == oldChar) {
                break;
            }
        }

        if (i < len) {
            char buf[] = new char[len];
            // Copy the characters before the first occurrence into buf
            for (int j = 0; j < i; j++) {
                buf[j] = val[j];
            }
            // Walk through the characters from i onward
            while (i < len) {
                char c = val[i];
                // Store newChar in place of oldChar, other characters unchanged
                buf[i] = (c == oldChar) ? newChar : c;
                i++;
            }
            // Create a new String with new; the original string is unchanged
            return new String(buf, true);
        }
    }
    return this;
}
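
The two methods are easy to confuse, because replace() treats its arguments literally while replaceAll() interprets the first argument as a regular expression. A small sketch follows. The class name ReplaceDemo and the helper unchanged are made up for this illustration.

```java
class ReplaceDemo {
    // Hypothetical helper: true when replace() returned the receiver itself.
    static boolean unchanged(String s, char oldChar, char newChar) {
        return s.replace(oldChar, newChar) == s;
    }

    public static void main(String[] args) {
        String s = "a.b.c";
        System.out.println(s.replace('.', '-'));      // a-b-c
        System.out.println(unchanged(s, 'x', 'y'));   // true: no 'x', so this is returned
        System.out.println(s.replaceAll(".", "-"));   // -----  "." matches any character
        System.out.println(s.replaceAll("\\.", "-")); // a-b-c  escaped dot matches literally
        System.out.println(s);                        // a.b.c  the original is unchanged
    }
}
```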

12. The substring() Method

① substring(int beginIndex) returns the substring that starts at beginIndex and runs to the end of the string.
② substring(int beginIndex, int endIndex) returns the substring that starts at beginIndex and ends at endIndex - 1, so endIndex itself is excluded.
Source excerpt:

public String substring(int beginIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }

    // Length of the substring that starts at beginIndex
    int subLen = value.length - beginIndex;
    if (subLen < 0) {
        throw new StringIndexOutOfBoundsException(subLen);
    }

    // If beginIndex == 0, return the original string itself;
    // otherwise return a new String from beginIndex to the end
    return (beginIndex == 0) ? this : new String(value, beginIndex, subLen);
}
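
The boundary cases are easiest to see in a small sketch. The class name SubstringDemo and the helper isOutOfBounds are made up for this illustration, and the identity check on substring(0) reflects the JDK 1.8 source quoted in this section rather than a documented guarantee.

```java
class SubstringDemo {
    // Hypothetical helper: true when substring(beginIndex) rejects the index.
    static boolean isOutOfBounds(String s, int beginIndex) {
        try {
            s.substring(beginIndex);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hello world";
        System.out.println(s.substring(6));            // world
        System.out.println(s.substring(0, 5));         // hello (index 5 is excluded)
        System.out.println(s.substring(0) == s);       // true: beginIndex 0 returns this
        System.out.println(s.substring(11).isEmpty()); // true: beginIndex == length is allowed
        System.out.println(isOutOfBounds(s, 12));      // true
        System.out.println(isOutOfBounds(s, -1));      // true
    }
}
```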

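13. The intern() Method

This is a native method. It returns the reference that the string constant pool holds for this string's contents. See the referenced article for details.

public native String intern();

When intern() is called on a String object, the pool is checked for a string with equal contents:
If one exists, the reference held by the pool is returned. That reference may point to a literal or to a heap object that was interned earlier.
If none exists, the string is added to the pool and the pool's reference is returned.

String str1 = "hello"; // a literal is placed in the constant pool
String str2 = str1.intern();
System.out.println(str1 == str2); // true

String str3 = new String("world"); // the literal "world" is pooled; new creates a separate heap object
String str4 = str3.intern(); // returns the pooled literal, not str3
System.out.println(str3 == str4); // false

String str5 = str1 + str2; // concatenating variables creates "hellohello" on the heap only
String str6 = str5.intern(); // not in the pool yet, so the pool records a reference to str5 (JDK 7 and later)
System.out.println(str5 == str6); // true

String str7 = "hello1" + "world1"; // constants are folded at compile time into one pooled literal
String str8 = str7.intern();
System.out.println(str7 == str8); // true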
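
A compact sketch of the behavior that holds regardless of JDK version: two strings with equal contents always intern to the same reference, and a literal is already that reference. The class name InternDemo and the helper internedSame are made up for this illustration.

```java
class InternDemo {
    // Hypothetical helper: true when both strings intern to the same object.
    static boolean internedSame(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String heap = new String("hello"); // a distinct object with the same contents

        System.out.println(heap == literal);            // false
        System.out.println(heap.equals(literal));       // true
        System.out.println(heap.intern() == literal);   // true: the pooled literal is returned
        System.out.println(internedSame(heap, literal)); // true
    }
}
```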

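III. Is String Really Immutable?

A String is made up of individual characters stored in the char[] value array.
value is declared final, but final only guarantees that the reference cannot be reassigned. The actual data lives in the array on the heap that value points to, and anyone who can reach that array can still change the data. Array elements are always mutable, and although value is declared private, it can still be reached through reflection.
Code example:

import java.lang.reflect.Field;

public class Main {
    public static void main(String[] args) throws Exception {
        String str = "vae";
        System.out.println(str); // vae
        // Look up the field named value in the String class
        Field fieldStr = String.class.getDeclaredField("value");
        // value is private, so make it accessible
        fieldStr.setAccessible(true);

        // Read the value field of str (a char[] in JDK 1.8)
        char[] value = (char[]) fieldStr.get(str);

        // Change the first character to an uppercase V
        value[0] = 'V';
        System.out.println(str); // Vae
    }
}

Clearly the String has been changed. In practice, though, code almost never uses reflection to manipulate strings, so String is still regarded as immutable.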

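So why was the String class designed to be immutable?

Safety:
① Security. Database user names and passwords are passed as strings to obtain a connection, and in socket programming host names and ports are passed as strings. If the contents of a string could be changed after it was handed over, that would open security holes.
② Thread safety. When multiple threads read and write a shared resource concurrently, race conditions can occur. Because a String cannot be modified, it can be shared between threads safely without synchronization.
③ Hash code. A String's hash code is computed from value the first time hashCode() is called and is then cached. If a String were mutable, its hash code would change along with its contents. Containers such as Map and Set need keys that are unique and whose hash codes stay consistent, so immutability makes String better suited as a key than mutable objects.
Performance:
The string constant pool only makes sense because strings are immutable. The pool avoids creating duplicate objects for identical literals by letting different references point to the same pooled string, which saves a lot of heap memory at run time. If strings were mutable, the pool would be pointless, String.intern(), which relies on it, would no longer work, and every new String would occupy fresh heap space.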
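
To illustrate the hash code argument, the sketch below uses a mutable key (an ArrayList, whose hash code depends on its contents) and shows that the map can no longer find the entry once the key has been changed. A String key cannot run into this problem. The class name KeyDemo and the helper lookupAfterMutation are made up for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class KeyDemo {
    // Hypothetical helper: inserts a mutable key, mutates it, then looks it up again.
    static boolean lookupAfterMutation() {
        Map<List<String>, Integer> map = new HashMap<>();
        List<String> key = new ArrayList<>();
        key.add("a");
        map.put(key, 1);

        key.add("b"); // the key's hash code changes after insertion
        return map.containsKey(key);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterMutation()); // false: the stored hash no longer matches

        Map<String, Integer> safe = new HashMap<>();
        String k = "a";
        safe.put(k, 1);
        k = k + "b"; // builds a new String; the key inside the map is untouched
        System.out.println(safe.containsKey("a")); // true
    }
}
```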
