A Deep Dive into the Java Source Code: isSpaceChar() and reverseBytes()
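The Character class wraps a value of the primitive type char in an object. An object of type Character contains a single field whose type is char.
In addition, this class provides several methods for determining a character's category (lowercase letter, digit, and so on) and for converting characters from uppercase to lowercase and vice versa.
Character information is based on the Unicode Standard, version 6.2.0.
The methods and data of class Character are defined by the information in the UnicodeData file, which is part of the Unicode Character Database maintained by the Unicode Consortium. This file specifies various properties, including name and general category, for every defined Unicode code point or character range.
The char data type (and therefore the value that a Character object encapsulates) is based on the original Unicode specification, which defined characters as fixed-width 16-bit entities. The Unicode Standard has since been changed to allow for characters whose representation requires more than 16 bits. The range of legal code points is now U+0000 to U+10FFFF, known as Unicode scalar values. (Refer to the definition of the U+n notation in the Unicode Standard.)
The set of characters from U+0000 to U+FFFF is sometimes referred to as the Basic Multilingual Plane (BMP). Characters whose code points are greater than U+FFFF are called supplementary characters. The Java platform uses the UTF-16 representation in char arrays and in the String and StringBuffer classes. In this representation, supplementary characters are represented as a pair of char values, the first from the high-surrogates range (\uD800-\uDBFF), the second from the low-surrogates range (\uDC00-\uDFFF).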
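As a minimal sketch of the UTF-16 representation described above, the following splits a supplementary code point into its surrogate pair and joins the pair back into a code point. The class name `SurrogatePairDemo`, the helper `toUtf16`, and the sample code point U+2F81A are chosen only for illustration; the `Character` methods are from the standard API.

```java
class SurrogatePairDemo {
    // Returns the UTF-16 code units for a code point:
    // one char for a BMP code point, two chars (a surrogate pair) otherwise.
    static char[] toUtf16(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        int cp = 0x2F81A; // a supplementary CJK ideograph, used as an example
        char[] units = toUtf16(cp);

        System.out.println(units.length);                          // 2
        System.out.println(Integer.toHexString(units[0]));         // d87e (high surrogate)
        System.out.println(Integer.toHexString(units[1]));         // dc1a (low surrogate)
        System.out.println(Character.isHighSurrogate(units[0]));   // true
        System.out.println(Character.isLowSurrogate(units[1]));    // true

        // Joining the pair recovers the original code point.
        int joined = Character.toCodePoint(units[0], units[1]);
        System.out.println(Integer.toHexString(joined));           // 2f81a
    }
}
```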
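A char value, therefore, represents Basic Multilingual Plane (BMP) code points, including the surrogate code points, or code units of the UTF-16 encoding. An int value represents all Unicode code points, including supplementary code points. The lower (least significant) 21 bits of the int are used to represent Unicode code points, and the upper (most significant) 11 bits must be zero. Unless otherwise specified, the behavior with respect to supplementary characters and surrogate char values is as follows:
- The methods that only accept a char value cannot support supplementary characters. They treat char values from the surrogate ranges as undefined characters. For example, Character.isLetter('\uD840') returns false, even though this specific value, if followed by any low-surrogate value in a string, would represent a letter.
- The methods that accept an int value support all Unicode characters, including supplementary characters. For example, Character.isLetter(0x2F81A) returns true because the code point value represents a letter (a CJK ideograph).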
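The difference between the two overload families can be seen with a short sketch. The class `CodePointOverloadDemo` and the helper `isLetterAt` are hypothetical names introduced here; the example reuses the code point 0x2F81A mentioned above.

```java
class CodePointOverloadDemo {
    // Tests the whole code point that starts at index i,
    // so a surrogate pair is seen as a single character.
    static boolean isLetterAt(String s, int i) {
        return Character.isLetter(s.codePointAt(i));
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x2F81A));

        // char overload: only the lone high surrogate is visible.
        System.out.println(Character.isLetter(s.charAt(0)));  // false

        // int overload: the full supplementary code point is visible.
        System.out.println(isLetterAt(s, 0));                 // true
    }
}
```

When text may contain supplementary characters, reading code points with `codePointAt` and passing them to the int overloads avoids this pitfall.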
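In the Java SE API documentation, Unicode code point is used for character values in the range between U+0000 and U+10FFFF, and Unicode code unit is used for 16-bit char values that are code units of the UTF-16 encoding. For more information on Unicode terminology, refer to the Unicode Glossary.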
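To make the code unit versus code point distinction concrete, here is a small sketch that counts both for the same string. The class and method names are illustrative only.

```java
class CodeUnitCountDemo {
    // Number of 16-bit UTF-16 code units (char values) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts once.
    static int codePoints(String s) {
        return Character.codePointCount(s, 0, s.length());
    }

    public static void main(String[] args) {
        // One BMP character followed by one supplementary character.
        String s = "A" + new String(Character.toChars(0x2F81A));
        System.out.println(codeUnits(s));   // 3
        System.out.println(codePoints(s));  // 2
    }
}
```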
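Methods

| Modifier and Type | Method and Description |
|---|---|
| static int | charCount(int codePoint): Determines the number of char values needed to represent the specified character (Unicode code point). |
| char | charValue(): Returns the value of this Character object. |
| static int | codePointAt(char[] a, int index): Returns the code point at the given index of the char array. |
| static int | codePointAt(char[] a, int index, int limit): Returns the code point at the given index of the char array, where only array elements with index less than limit can be used. |
| static int | codePointAt(CharSequence seq, int index): Returns the code point at the given index of the CharSequence. |
| static int | codePointBefore(char[] a, int index): Returns the code point preceding the given index of the char array. |
| static int | codePointBefore(char[] a, int index, int start): Returns the code point preceding the given index of the char array, where only array elements with index greater than or equal to start can be used. |
| static int | codePointBefore(CharSequence seq, int index): Returns the code point preceding the given index of the CharSequence. |
| static int | codePointCount(char[] a, int offset, int count): Returns the number of Unicode code points in a subarray of the char array argument. |
| static int | codePointCount(CharSequence seq, int beginIndex, int endIndex): Returns the number of Unicode code points in the text range of the specified char sequence. |
| static int | compare(char x, char y): Compares two char values numerically. |
| int | compareTo(Character anotherCharacter): Compares two Character objects numerically. |
| static int | digit(char ch, int radix): Returns the numeric value of the character ch in the specified radix. |
| static int | digit(int codePoint, int radix): Returns the numeric value of the specified character (Unicode code point) in the specified radix. |
| boolean | equals(Object obj): Compares this object against the specified object. |
| static char | forDigit(int digit, int radix): Determines the character representation for a specific digit in the specified radix. |
| static byte | getDirectionality(char ch): Returns the Unicode directionality property for the given character. |
| static byte | getDirectionality(int codePoint): Returns the Unicode directionality property for the given character (Unicode code point). |
| static String | getName(int codePoint): Returns the Unicode name of the specified character, or null if the code point is unassigned. |
| static int | getNumericValue(char ch): Returns the int value that the specified Unicode character represents. |
| static int | getNumericValue(int codePoint): Returns the int value that the specified character (Unicode code point) represents. |
| static int | getType(char ch): Returns a value indicating a character's general category. |
| static int | getType(int codePoint): Returns a value indicating a character's general category. |
| int | hashCode(): Returns a hash code for this Character. |
| static int | hashCode(char value): Returns a hash code for a char value. |
| static char | highSurrogate(int codePoint): Returns the leading surrogate (a high surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. |
| static boolean | isAlphabetic(int codePoint): Determines if the specified character (Unicode code point) is alphabetic. |
| static boolean | isBmpCodePoint(int codePoint): Determines whether the specified character (Unicode code point) is in the Basic Multilingual Plane (BMP). |
| static boolean | isDefined(char ch): Determines if a character is defined in Unicode. |
| static boolean | isDefined(int codePoint): Determines if a character (Unicode code point) is defined in Unicode. |
| static boolean | isDigit(char ch): Determines if the specified character is a digit. |
| static boolean | isDigit(int codePoint): Determines if the specified character (Unicode code point) is a digit. |
| static boolean | isHighSurrogate(char ch): Determines if the given char value is a Unicode high-surrogate code unit (also known as a leading-surrogate code unit). |
| static boolean | isIdentifierIgnorable(char ch): Determines if the specified character should be regarded as an ignorable character in a Java identifier or a Unicode identifier. |
| static boolean | isIdentifierIgnorable(int codePoint): Determines if the specified character (Unicode code point) should be regarded as an ignorable character in a Java identifier or a Unicode identifier. |
| static boolean | isIdeographic(int codePoint): Determines if the specified character (Unicode code point) is a CJKV (Chinese, Japanese, Korean and Vietnamese) ideograph, as defined by the Unicode Standard. |
| static boolean | isISOControl(char ch): Determines if the specified character is an ISO control character. |
| static boolean | isISOControl(int codePoint): Determines if the referenced character (Unicode code point) is an ISO control character. |
| static boolean | isJavaIdentifierPart(char ch): Determines if the specified character may be part of a Java identifier as other than the first character. |
| static boolean | isJavaIdentifierPart(int codePoint): Determines if the character (Unicode code point) may be part of a Java identifier as other than the first character. |
| static boolean | isJavaIdentifierStart(char ch): Determines if the specified character is permissible as the first character in a Java identifier. |
| static boolean | isJavaIdentifierStart(int codePoint): Determines if the character (Unicode code point) is permissible as the first character in a Java identifier. |
| static boolean | isJavaLetter(char ch): Deprecated. Replaced by isJavaIdentifierStart(char). |
| static boolean | isJavaLetterOrDigit(char ch): Deprecated. Replaced by isJavaIdentifierPart(char). |
| static boolean | isLetter(char ch): Determines if the specified character is a letter. |
| static boolean | isLetter(int codePoint): Determines if the specified character (Unicode code point) is a letter. |
| static boolean | isLetterOrDigit(char ch): Determines if the specified character is a letter or digit. |
| static boolean | isLetterOrDigit(int codePoint): Determines if the specified character (Unicode code point) is a letter or digit. |
| static boolean | isLowerCase(char ch): Determines if the specified character is a lowercase character. |
| static boolean | isLowerCase(int codePoint): Determines if the specified character (Unicode code point) is a lowercase character. |
| static boolean | isLowSurrogate(char ch): Determines if the given char value is a Unicode low-surrogate code unit (also known as a trailing-surrogate code unit). |
| static boolean | isMirrored(char ch): Determines whether the character is mirrored according to the Unicode specification. |
| static boolean | isMirrored(int codePoint): Determines whether the specified character (Unicode code point) is mirrored according to the Unicode specification. |
| static boolean | isSpace(char ch): Deprecated. Replaced by isWhitespace(char). |
| static boolean | isSpaceChar(char ch): Determines if the specified character is a Unicode space character. |
| static boolean | isSpaceChar(int codePoint): Determines if the specified character (Unicode code point) is a Unicode space character. |
| static boolean | isSupplementaryCodePoint(int codePoint): Determines whether the specified character (Unicode code point) is in the supplementary character range. |
| static boolean | isSurrogate(char ch): Determines if the given char value is a Unicode surrogate code unit. |
| static boolean | isSurrogatePair(char high, char low): Determines whether the specified pair of char values is a valid Unicode surrogate pair. |
| static boolean | isTitleCase(char ch): Determines if the specified character is a titlecase character. |
| static boolean | isTitleCase(int codePoint): Determines if the specified character (Unicode code point) is a titlecase character. |
| static boolean | isUnicodeIdentifierPart(char ch): Determines if the specified character may be part of a Unicode identifier as other than the first character. |
| static boolean | isUnicodeIdentifierPart(int codePoint): Determines if the specified character (Unicode code point) may be part of a Unicode identifier as other than the first character. |
| static boolean | isUnicodeIdentifierStart(char ch): Determines if the specified character is permissible as the first character in a Unicode identifier. |
| static boolean | isUnicodeIdentifierStart(int codePoint): Determines if the specified character (Unicode code point) is permissible as the first character in a Unicode identifier. |
| static boolean | isUpperCase(char ch): Determines if the specified character is an uppercase character. |
| static boolean | isUpperCase(int codePoint): Determines if the specified character (Unicode code point) is an uppercase character. |
| static boolean | isValidCodePoint(int codePoint): Determines whether the specified code point is a valid Unicode code point value. |
| static boolean | isWhitespace(char ch): Determines if the specified character is white space according to Java. |
| static boolean | isWhitespace(int codePoint): Determines if the specified character (Unicode code point) is white space according to Java. |
| static char | lowSurrogate(int codePoint): Returns the trailing surrogate (a low surrogate code unit) of the surrogate pair representing the specified supplementary character (Unicode code point) in the UTF-16 encoding. |
| static int | offsetByCodePoints(char[] a, int start, int count, int index, int codePointOffset): Returns the index within the given char subarray that is offset from the given index by codePointOffset code points. |
| static int | offsetByCodePoints(CharSequence seq, int index, int codePointOffset): Returns the index within the given char sequence that is offset from the given index by codePointOffset code points. |
| static char | reverseBytes(char ch): Returns the value obtained by reversing the order of the bytes in the specified char value. |
| static char[] | toChars(int codePoint): Converts the specified character (Unicode code point) to its UTF-16 representation stored in a char array. |
| static int | toChars(int codePoint, char[] dst, int dstIndex): Converts the specified character (Unicode code point) to its UTF-16 representation. |
| static int | toCodePoint(char high, char low): Converts the specified surrogate pair to its supplementary code point value. |
| static char | toLowerCase(char ch): Converts the character argument to lowercase using case mapping information from the UnicodeData file. |
| static int | toLowerCase(int codePoint): Converts the character (Unicode code point) argument to lowercase using case mapping information from the UnicodeData file. |
| String | toString(): Returns a String object representing this Character's value. |
| static String | toString(char c): Returns a String object representing the specified char. |
| static char | toTitleCase(char ch): Converts the character argument to titlecase using case mapping information from the UnicodeData file. |
| static int | toTitleCase(int codePoint): Converts the character (Unicode code point) argument to titlecase using case mapping information from the UnicodeData file. |
| static char | toUpperCase(char ch): Converts the character argument to uppercase using case mapping information from the UnicodeData file. |
| static int | toUpperCase(int codePoint): Converts the character (Unicode code point) argument to uppercase using case mapping information from the UnicodeData file. |
| static Character | valueOf(char c): Returns a Character instance representing the specified char value. |
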
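
Of the methods listed above, isSpaceChar and isWhitespace are easy to confuse. The sketch below prints both results for a few sample code points; the class `SpaceCheckDemo` and the helper `describe` are names invented for this illustration. It relies on two facts about the standard API: isSpaceChar looks only at the Unicode general category (space, line or paragraph separator), while isWhitespace also accepts control characters such as tab but excludes non-breaking spaces.

```java
class SpaceCheckDemo {
    // Formats both classifications for one code point.
    static String describe(int cp) {
        return String.format("U+%04X isSpaceChar=%b isWhitespace=%b",
                cp, Character.isSpaceChar(cp), Character.isWhitespace(cp));
    }

    public static void main(String[] args) {
        int[] samples = {
            0x0020, // SPACE: true for both
            0x0009, // TAB: a control character, so only isWhitespace is true
            0x00A0, // NO-BREAK SPACE: a space separator, but not Java whitespace
            0x2028  // LINE SEPARATOR: true for both
        };
        for (int cp : samples) {
            System.out.println(describe(cp));
        }
    }
}
```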
Java source code
import java.util.Arrays;
import java.util.Map;
import java.util.HashMap;
import java.util.Locale;
public final
class Character implements java.io.Serializable, Comparable<Character> {
public static final int MIN_RADIX = 2;
public static final int MAX_RADIX = 36;
public static final char MIN_VALUE = '\u0000';
public static final char MAX_VALUE = '\uFFFF';
@SuppressWarnings("unchecked")
public static final Class<Character> TYPE = (Class<Character>) Class.getPrimitiveClass("char");
public static final byte UNASSIGNED = 0;
public static final byte UPPERCASE_LETTER = 1;
static final int ERROR = 0xFFFFFFFF;
public static final byte DIRECTIONALITY_UNDEFINED = -1;
public static final char MIN_HIGH_SURROGATE = '\uD800';
public static final char MAX_HIGH_SURROGATE = '\uDBFF';
public static final char MIN_SURROGATE = MIN_HIGH_SURROGATE;
private final char value;
private static final long serialVersionUID = 3786198910865385080L;
public Character(char value) {
this.value = value;
}
private static class CharacterCache {
private CharacterCache(){}
static final Character cache[] = new Character[127 + 1];
static {
for (int i = 0; i < cache.length; i++)
cache[i] = new Character((char)i);
}
}
public static boolean isJavaIdentifierPart(char ch) {
return isJavaIdentifierPart((int)ch);
}
public static boolean isJavaIdentifierPart(int codePoint) {
return CharacterData.of(codePoint).isJavaIdentifierPart(codePoint);
}
public static boolean isUnicodeIdentifierStart(char ch) {
return isUnicodeIdentifierStart((int)ch);
}
public static boolean isUnicodeIdentifierStart(int codePoint) {
return CharacterData.of(codePoint).isUnicodeIdentifierStart(codePoint);
}
public static boolean isUnicodeIdentifierPart(char ch) {
return isUnicodeIdentifierPart((int)ch);
}
public static boolean isUnicodeIdentifierPart(int codePoint) {
return CharacterData.of(codePoint).isUnicodeIdentifierPart(codePoint);
}
public static boolean isIdentifierIgnorable(char ch) {
return isIdentifierIgnorable((int)ch);
}
public static boolean isIdentifierIgnorable(int codePoint) {
return CharacterData.of(codePoint).isIdentifierIgnorable(codePoint);
}
public static char toLowerCase(char ch) {
return (char)toLowerCase((int)ch);
}
public static int toLowerCase(int codePoint) {
return CharacterData.of(codePoint).toLowerCase(codePoint);
}
public static char toUpperCase(char ch) {
return (char)toUpperCase((int)ch);
}
public static int toUpperCase(int codePoint) {
return CharacterData.of(codePoint).toUpperCase(codePoint);
}
public static char toTitleCase(char ch) {
return (char)toTitleCase((int)ch);
}
public static int toTitleCase(int codePoint) {
return CharacterData.of(codePoint).toTitleCase(codePoint);
}
public static int digit(char ch, int radix) {
return digit((int)ch, radix);
}
public static int digit(int codePoint, int radix) {
return CharacterData.of(codePoint).digit(codePoint, radix);
}
public static int getNumericValue(char ch) {
return getNumericValue((int)ch);
}
public static int getNumericValue(int codePoint) {
return CharacterData.of(codePoint).getNumericValue(codePoint);
}
@Deprecated
public static boolean isSpace(char ch) {
return (ch <= 0x0020) &&
(((((1L << 0x0009) |
(1L << 0x000A) |
(1L << 0x000C) |
(1L << 0x000D) |
(1L << 0x0020)) >> ch) & 1L) != 0);
}
public static boolean isSpaceChar(char ch) {
return isSpaceChar((int)ch);
}
public static boolean isSpaceChar(int codePoint) {
return ((((1 << Character.SPACE_SEPARATOR) |
(1 << Character.LINE_SEPARATOR) |
(1 << Character.PARAGRAPH_SEPARATOR)) >> getType(codePoint)) & 1)
!= 0;
}
public static boolean isWhitespace(char ch) {
return isWhitespace((int)ch);
}
public static boolean isWhitespace(int codePoint) {
return CharacterData.of(codePoint).isWhitespace(codePoint);
}
public static boolean isISOControl(char ch) {
return isISOControl((int)ch);
}
public static boolean isISOControl(int codePoint) {
// Optimized form of:
// (codePoint >= 0x00 && codePoint <= 0x1F) ||
// (codePoint >= 0x7F && codePoint <= 0x9F);
return codePoint <= 0x9F &&
(codePoint >= 0x7F || (codePoint >>> 5 == 0));
}
public static int getType(char ch) {
return getType((int)ch);
}
public static int getType(int codePoint) {
return CharacterData.of(codePoint).getType(codePoint);
}
public static char forDigit(int digit, int radix) {
if ((digit >= radix) || (digit < 0)) {
return '\0';
}
if ((radix < Character.MIN_RADIX) || (radix > Character.MAX_RADIX)) {
return '\0';
}
if (digit < 10) {
return (char)('0' + digit);
}
return (char)('a' - 10 + digit);
}
public static byte getDirectionality(char ch) {
return getDirectionality((int)ch);
}
public static byte getDirectionality(int codePoint) {
return CharacterData.of(codePoint).getDirectionality(codePoint);
}
public static boolean isMirrored(char ch) {
return isMirrored((int)ch);
}
public static boolean isMirrored(int codePoint) {
return CharacterData.of(codePoint).isMirrored(codePoint);
}
public int compareTo(Character anotherCharacter) {
return compare(this.value, anotherCharacter.value);
}
public static int compare(char x, char y) {
return x - y;
}
static int toUpperCaseEx(int codePoint) {
assert isValidCodePoint(codePoint);
return CharacterData.of(codePoint).toUpperCaseEx(codePoint);
}
static char[] toUpperCaseCharArray(int codePoint) {
// As of Unicode 6.0, 1:M uppercasings only happen in the BMP.
assert isBmpCodePoint(codePoint);
return CharacterData.of(codePoint).toUpperCaseCharArray(codePoint);
}
public static final int SIZE = 16;
public static final int BYTES = SIZE / Byte.SIZE;
public static char reverseBytes(char ch) {
return (char) (((ch & 0xFF00) >> 8) | (ch << 8));
}
public static String getName(int codePoint) {
if (!isValidCodePoint(codePoint)) {
throw new IllegalArgumentException();
}
String name = CharacterName.get(codePoint);
if (name != null)
return name;
if (getType(codePoint) == UNASSIGNED)
return null;
UnicodeBlock block = UnicodeBlock.of(codePoint);
if (block != null)
return block.toString().replace('_', ' ') + " "
+ Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
// should never come here
return Integer.toHexString(codePoint).toUpperCase(Locale.ENGLISH);
}
}
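The space test in isSpaceChar(int) above packs a set of general categories into one int, with one bit per category value, and checks membership by shifting that mask right by getType(codePoint). A minimal sketch of the same idea follows, with the mask hoisted into a constant; the class `CategoryMaskDemo` and the method `isSpaceCharSketch` are hypothetical names, not part of the JDK.

```java
class CategoryMaskDemo {
    // Bit n is set when general category n counts as a "space" category.
    private static final int SPACE_MASK =
            (1 << Character.SPACE_SEPARATOR)
          | (1 << Character.LINE_SEPARATOR)
          | (1 << Character.PARAGRAPH_SEPARATOR);

    // Shifts the mask so that the bit for this code point's category
    // lands in position 0, then tests that bit.
    static boolean isSpaceCharSketch(int codePoint) {
        return ((SPACE_MASK >> Character.getType(codePoint)) & 1) != 0;
    }

    public static void main(String[] args) {
        System.out.println(isSpaceCharSketch(' '));    // true
        System.out.println(isSpaceCharSketch('\t'));   // false: TAB is a control character
        System.out.println(isSpaceCharSketch(0x2029)); // true: PARAGRAPH SEPARATOR
    }
}
```

The deprecated isSpace(char) uses the same trick with a long mask indexed directly by the character value instead of by its category.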
abstract class CharacterData {
abstract int getProperties(int ch);
abstract int getType(int ch);
abstract boolean isWhitespace(int ch);
abstract boolean isMirrored(int ch);
abstract boolean isJavaIdentifierStart(int ch);
abstract boolean isJavaIdentifierPart(int ch);
abstract boolean isUnicodeIdentifierStart(int ch);
abstract boolean isUnicodeIdentifierPart(int ch);
abstract boolean isIdentifierIgnorable(int ch);
abstract int toLowerCase(int ch);
abstract int toUpperCase(int ch);
abstract int toTitleCase(int ch);
abstract int digit(int ch, int radix);
abstract int getNumericValue(int ch);
abstract byte getDirectionality(int ch);
//need to implement for JSR204
int toUpperCaseEx(int ch) {
return toUpperCase(ch);
}
char[] toUpperCaseCharArray(int ch) {
return null;
}
boolean isOtherLowercase(int ch) {
return false;
}
boolean isOtherUppercase(int ch) {
return false;
}
boolean isOtherAlphabetic(int ch) {
return false;
}
boolean isIdeographic(int ch) {
return false;
}
// Character <= 0xff (basic latin) is handled by internal fast-path
// to avoid initializing large tables.
// Note: performance of this "fast-path" code may be sub-optimal
// in negative cases for some accessors due to complicated ranges.
// Should revisit after optimization of table initialization.
static final CharacterData of(int ch) {
if (ch >>> 8 == 0) { // fast-path
return CharacterDataLatin1.instance;
} else {
switch(ch >>> 16) { //plane 00-16
case(0):
return CharacterData00.instance;
case(1):
return CharacterData01.instance;
case(2):
return CharacterData02.instance;
case(14):
return CharacterData0E.instance;
case(15): // Private Use
case(16): // Private Use
return CharacterDataPrivateUse.instance;
default:
return CharacterDataUndefined.instance;
}
}
}
}
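CharacterData.of above picks a table class from two shifts: `ch >>> 8 == 0` selects the Latin-1 fast path, and `ch >>> 16` yields the Unicode plane number used in the switch. The following sketch only reproduces that arithmetic so the dispatch can be checked by hand; `PlaneDemo`, `plane` and `isLatin1` are illustrative names.

```java
class PlaneDemo {
    // Plane number of a code point: 0 for the BMP, 1 to 16 for supplementary planes.
    static int plane(int codePoint) {
        return codePoint >>> 16;
    }

    // True for U+0000 to U+00FF, the range served by the fast path.
    static boolean isLatin1(int codePoint) {
        return codePoint >>> 8 == 0;
    }

    public static void main(String[] args) {
        System.out.println(isLatin1('A'));      // true
        System.out.println(isLatin1(0x4E2D));   // false: BMP, but outside Latin-1
        System.out.println(plane(0x4E2D));      // 0
        System.out.println(plane(0x2F81A));     // 2
        System.out.println(plane(0x10FFFF));    // 16
    }
}
```

Because both shifts are unsigned, a negative int never matches the fast path or any plane case and falls through to the default branch.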
class CharacterDataLatin1 extends CharacterData {
int getProperties(int ch) {
char offset = (char)ch;
int props = A[offset];
return props;
}
int getPropertiesEx(int ch) {
char offset = (char)ch;
int props = B[offset];
return props;
}
boolean isOtherLowercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0001) != 0;
}
boolean isOtherUppercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0002) != 0;
}
boolean isOtherAlphabetic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0004) != 0;
}
boolean isIdeographic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0010) != 0;
}
int getType(int ch) {
int props = getProperties(ch);
return (props & 0x1F);
}
boolean isJavaIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) >= 0x00005000);
}
boolean isJavaIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00003000) != 0);
}
boolean isUnicodeIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00007000);
}
boolean isUnicodeIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00001000) != 0);
}
boolean isIdentifierIgnorable(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00001000);
}
int toLowerCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if (((val & 0x00020000) != 0) &&
((val & 0x07FC0000) != 0x07FC0000)) {
int offset = val << 5 >> (5+18);
mapChar = ch + offset;
}
return mapChar;
}
int toUpperCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) != 0x07FC0000) {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
} else if (ch == 0x00B5) {
mapChar = 0x039C;
}
}
return mapChar;
}
int toTitleCase(int ch) {
return toUpperCase(ch);
}
int digit(int ch, int radix) {
int value = -1;
if (radix >= Character.MIN_RADIX && radix <= Character.MAX_RADIX) {
int val = getProperties(ch);
int kind = val & 0x1F;
if (kind == Character.DECIMAL_DIGIT_NUMBER) {
value = ch + ((val & 0x3E0) >> 5) & 0x1F;
}
else if ((val & 0xC00) == 0x00000C00) {
// Java supradecimal digit
value = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
}
}
return (value < radix) ? value : -1;
}
int getNumericValue(int ch) {
int val = getProperties(ch);
int retval = -1;
switch (val & 0xC00) {
default: // cannot occur
case (0x00000000): // not numeric
retval = -1;
break;
case (0x00000400): // simple numeric
retval = ch + ((val & 0x3E0) >> 5) & 0x1F;
break;
case (0x00000800) : // "strange" numeric
retval = -2;
break;
case (0x00000C00): // Java supradecimal
retval = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
break;
}
return retval;
}
boolean isWhitespace(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00004000);
}
byte getDirectionality(int ch) {
int val = getProperties(ch);
byte directionality = (byte)((val & 0x78000000) >> 27);
if (directionality == 0xF ) {
directionality = -1;
}
return directionality;
}
boolean isMirrored(int ch) {
int props = getProperties(ch);
return ((props & 0x80000000) != 0);
}
int toUpperCaseEx(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) != 0x07FC0000) {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
}
else {
switch(ch) {
// map overflow characters
case 0x00B5 : mapChar = 0x039C; break;
default : mapChar = Character.ERROR; break;
}
}
}
return mapChar;
}
static char[] sharpsMap = new char[] {'S', 'S'};
char[] toUpperCaseCharArray(int ch) {
char[] upperMap = {(char)ch};
if (ch == 0x00DF) {
upperMap = sharpsMap;
}
return upperMap;
}
static final CharacterDataLatin1 instance = new CharacterDataLatin1();
private CharacterDataLatin1() {};
static final int A[] = new int[256];
static final String A_DATA =
"\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800"+
"\u100F\u4800\u100F\u4800\u100F\u5800\u400F\u5000\u400F\u5800\u400F\u6000\u400F"+
"\u5000\u400F\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800\u100F\u4800"+
"\031\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002\201\u7002"+
"\u061D\u7002";
// The B table has 256 entries for a total of 512 bytes.
static final char B[] = (
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000").toCharArray();
// In all, the character property tables require 1024 bytes.
static {
{ // THIS CODE WAS AUTOMATICALLY CREATED BY GenerateCharacter:
char[] data = A_DATA.toCharArray();
assert (data.length == (256 * 2));
int i = 0, j = 0;
while (i < (256 * 2)) {
int entry = data[i++] << 16;
A[j++] = entry | data[i++];
}
}
}
}
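In toLowerCase and toUpperCase above, the expression `val << 5 >> (5+18)` reads a signed 9-bit case offset from bits 18 to 26 of the property word: the left shift discards the 5 bits above the field, and the arithmetic right shift sign-extends it. The value with all nine bits set (mask 0x07FC0000) is treated as an overflow marker and handled separately. The sketch below shows the extraction in isolation; `pack` is a hypothetical helper for building test values and does not reflect how the real tables are generated.

```java
class CaseOffsetDemo {
    // Hypothetical helper: stores a signed 9-bit offset in bits 18..26.
    static int pack(int offset) {
        return (offset & 0x1FF) << 18;
    }

    // Same expression as in the listing: drop the upper 5 bits,
    // then shift arithmetically so the field is sign-extended.
    static int unpack(int val) {
        return val << 5 >> (5 + 18);
    }

    public static void main(String[] args) {
        System.out.println(unpack(pack(32)));               // 32
        System.out.println(unpack(pack(-32)));              // -32
        // Bits outside the field do not affect the result.
        System.out.println(unpack(pack(32) | 0x8000001F));  // 32
    }
}
```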
package java.lang;
/**
* The CharacterData00 class encapsulates the large tables once found in
* java.lang.Character
*/
class CharacterData00 extends CharacterData {
int getProperties(int ch) {
char offset = (char)ch;
int props = A[Y[X[offset>>5]|((offset>>1)&0xF)]|(offset&0x1)];
return props;
}
int getPropertiesEx(int ch) {
char offset = (char)ch;
int props = B[Y[X[offset>>5]|((offset>>1)&0xF)]|(offset&0x1)];
return props;
}
int getType(int ch) {
int props = getProperties(ch);
return (props & 0x1F);
}
boolean isOtherLowercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0001) != 0;
}
boolean isOtherUppercase(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0002) != 0;
}
boolean isOtherAlphabetic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0004) != 0;
}
boolean isIdeographic(int ch) {
int props = getPropertiesEx(ch);
return (props & 0x0010) != 0;
}
boolean isJavaIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) >= 0x00005000);
}
boolean isJavaIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00003000) != 0);
}
boolean isUnicodeIdentifierStart(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00007000);
}
boolean isUnicodeIdentifierPart(int ch) {
int props = getProperties(ch);
return ((props & 0x00001000) != 0);
}
boolean isIdentifierIgnorable(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00001000);
}
int toLowerCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00020000) != 0) {
if ((val & 0x07FC0000) == 0x07FC0000) {
switch(ch) {
// map the offset overflow chars
case 0x212B : mapChar = 0x00E5; break;
// map the titlecase chars with both a 1:M uppercase map
// and a lowercase map
case 0x1F88 : mapChar = 0x1F80; break;
case 0x1F89 : mapChar = 0x1F81; break;
case 0xA77D : mapChar = 0x1D79; break;
case 0xA78D : mapChar = 0x0265; break;
case 0xA7AA : mapChar = 0x0266; break;
// default mapChar is already set, so no
// need to redo it here.
// default : mapChar = ch;
}
}
else {
int offset = val << 5 >> (5+18);
mapChar = ch + offset;
}
}
return mapChar;
}
int toUpperCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) == 0x07FC0000) {
switch(ch) {
// map chars with overflow offsets
case 0x00B5 : mapChar = 0x039C; break;
case 0x017F : mapChar = 0x0053; break;
case 0x2D2D : mapChar = 0x10CD; break;
// ch must have a 1:M case mapping, but we
// can't handle it here. Return ch.
// since mapChar is already set, no need
// to redo it here.
//default : mapChar = ch;
}
}
else {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
}
}
return mapChar;
}
int toTitleCase(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00008000) != 0) {
// There is a titlecase equivalent. Perform further checks:
if ((val & 0x00010000) == 0) {
// The character does not have an uppercase equivalent, so it must
// already be uppercase; so add 1 to get the titlecase form.
mapChar = ch + 1;
}
else if ((val & 0x00020000) == 0) {
// The character does not have a lowercase equivalent, so it must
// already be lowercase; so subtract 1 to get the titlecase form.
mapChar = ch - 1;
}
// else {
// The character has both an uppercase equivalent and a lowercase
// equivalent, so it must itself be a titlecase form; return it.
// return ch;
//}
}
else if ((val & 0x00010000) != 0) {
// This character has no titlecase equivalent but it does have an
// uppercase equivalent, so use that (subtract the signed case offset).
mapChar = toUpperCase(ch);
}
return mapChar;
}
int digit(int ch, int radix) {
int value = -1;
if (radix >= Character.MIN_RADIX && radix <= Character.MAX_RADIX) {
int val = getProperties(ch);
int kind = val & 0x1F;
if (kind == Character.DECIMAL_DIGIT_NUMBER) {
value = ch + ((val & 0x3E0) >> 5) & 0x1F;
}
else if ((val & 0xC00) == 0x00000C00) {
// Java supradecimal digit
value = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
}
}
return (value < radix) ? value : -1;
}
int getNumericValue(int ch) {
int val = getProperties(ch);
int retval = -1;
switch (val & 0xC00) {
default: // cannot occur
case (0x00000000): // not numeric
retval = -1;
break;
case (0x00000400): // simple numeric
retval = ch + ((val & 0x3E0) >> 5) & 0x1F;
break;
case (0x00000800) : // "strange" numeric
switch (ch) {
case 0x0BF1: retval = 100; break; // TAMIL NUMBER ONE HUNDRED
case 0x0BF2: retval = 1000; break; // TAMIL NUMBER ONE THOUSAND
case 0x1375: retval = 40; break; // ETHIOPIC NUMBER FORTY
case 0x0D71: retval = 100; break; // MALAYALAM NUMBER ONE HUNDRED
case 0x0D72: retval = 1000; break; // MALAYALAM NUMBER ONE THOUSAND
case 0x2186: retval = 50; break; // ROMAN NUMERAL FIFTY EARLY FORM
case 0x2187: retval = 50000; break; // ROMAN NUMERAL FIFTY THOUSAND
case 0x2188: retval = 100000; break; // ROMAN NUMERAL ONE HUNDRED THOUSAND
default: retval = -2; break;
}
break;
case (0x00000C00): // Java supradecimal
retval = (ch + ((val & 0x3E0) >> 5) & 0x1F) + 10;
break;
}
return retval;
}
boolean isWhitespace(int ch) {
int props = getProperties(ch);
return ((props & 0x00007000) == 0x00004000);
}
byte getDirectionality(int ch) {
int val = getProperties(ch);
byte directionality = (byte)((val & 0x78000000) >> 27);
if (directionality == 0xF ) {
switch(ch) {
case 0x202A :
// This is the only char with LRE
directionality = Character.DIRECTIONALITY_LEFT_TO_RIGHT_EMBEDDING;
break;
case 0x202B :
// This is the only char with RLE
directionality = Character.DIRECTIONALITY_RIGHT_TO_LEFT_EMBEDDING;
break;
case 0x202C :
// This is the only char with PDF
directionality = Character.DIRECTIONALITY_POP_DIRECTIONAL_FORMAT;
break;
case 0x202D :
// This is the only char with LRO
directionality = Character.DIRECTIONALITY_LEFT_TO_RIGHT_OVERRIDE;
break;
case 0x202E :
// This is the only char with RLO
directionality = Character.DIRECTIONALITY_RIGHT_TO_LEFT_OVERRIDE;
break;
default :
directionality = Character.DIRECTIONALITY_UNDEFINED;
break;
}
}
return directionality;
}
boolean isMirrored(int ch) {
int props = getProperties(ch);
return ((props & 0x80000000) != 0);
}
int toUpperCaseEx(int ch) {
int mapChar = ch;
int val = getProperties(ch);
if ((val & 0x00010000) != 0) {
if ((val & 0x07FC0000) != 0x07FC0000) {
int offset = val << 5 >> (5+18);
mapChar = ch - offset;
}
else {
switch(ch) {
// map overflow characters
case 0x00B5 : mapChar = 0x039C; break;
case 0x017F : mapChar = 0x0053; break;
case 0x2D27 : mapChar = 0x10C7; break;
case 0x2D2D : mapChar = 0x10CD; break;
default : mapChar = Character.ERROR; break;
}
}
}
return mapChar;
}
char[] toUpperCaseCharArray(int ch) {
char[] upperMap = {(char)ch};
int location = findInCharMap(ch);
if (location != -1) {
upperMap = charMap[location][1];
}
return upperMap;
}
/**
* Finds the character in the uppercase mapping table.
*
* @param ch the <code>char</code> to search
* @return the index location ch in the table or -1 if not found
* @since 1.4
*/
int findInCharMap(int ch) {
if (charMap == null || charMap.length == 0) {
return -1;
}
int top, bottom, current;
bottom = 0;
top = charMap.length;
current = top/2;
// invariant: top > current >= bottom && ch >= CharacterData.charMap[bottom][0]
while (top - bottom > 1) {
if (ch >= charMap[current][0][0]) {
bottom = current;
} else {
top = current;
}
current = (top + bottom) / 2;
}
if (ch == charMap[current][0][0]) return current;
else return -1;
}
static final CharacterData00 instance = new CharacterData00();
private CharacterData00() {};
static final char X[] = (
"\000\020\040\060\100\120\140\160\200\220\240\260\300\320\340\360\200\u0100"+
"\u0110\u0120\u0130\u0140\u0150\u0160\u0170\u0170\u0180\u0190\u01A0\u01B0\u01C0"+
"\u02B0\u02B0\u15A0\u15B0\040\u15C0\u15D0\u15E0\u15F0\u1600\u1610").toCharArray();
// The Y table has 5664 entries for a total of 11328 bytes.
static final char Y[] = (
"\000\000\000\000\002\004\006\000\000\000\000\000\000\000\010\004\012\014\016"+
"\020\022\024\026\030\032\032\032\032\032\034\036\040\042\044\044\044\044\044"+
"\224\224\224\362\224\224\224\362\224\u01BC\362\072\u039C\u01D4\u02AE\u02C4"+
"\u0162\u02D8\u01D6\362\362\362\362\u039E\u03A0\u016A\362").toCharArray();
// The A table has 930 entries for a total of 3720 bytes.
static final int A[] = new int[930];
static final String A_DATA =
"\u4800\u100F\u4800\u100F\u4800\u100F\u5800\u400F\u5000\u400F\u5800\u400F\u6000"+
"\u400F\u5000\u400F\u5000\u400F\u5000\u400F\u6000\u400C\u6800\030\u6800\030"+
"\u6800\030\u2800\u601A\u7800\000\u4800\u1010\u6800\031\u6800\033\u7800\000"+
"\u6800\u1010\u6800\u1010\u6800\u1010";
// The B table has 930 entries for a total of 1860 bytes.
static final char B[] = (
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+
"\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"+ "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000").toCharArray();
// In all, the character property tables require 19144 bytes.
static {
charMap = new char[][][] {
{ {'\u00DF'}, {'\u0053', '\u0053', } },
{ {'\u0130'}, {'\u0130', } },
{ {'\u0149'}, {'\u02BC', '\u004E', } },
{ {'\uFB17'}, {'\u0544', '\u053D', } },
};
{ // THIS CODE WAS AUTOMATICALLY CREATED BY GenerateCharacter:
char[] data = A_DATA.toCharArray();
assert (data.length == (930 * 2));
int i = 0, j = 0;
while (i < (930 * 2)) {
int entry = data[i++] << 16;
A[j++] = entry | data[i++];
}
}
}
}
class CharacterDataPrivateUse extends CharacterData {
int getProperties(int ch) {
return 0;
}
int getType(int ch) {
return (ch & 0xFFFE) == 0xFFFE
? Character.UNASSIGNED
: Character.PRIVATE_USE;
}
boolean isJavaIdentifierStart(int ch) {
return false;
}
boolean isJavaIdentifierPart(int ch) {
return false;
}
boolean isUnicodeIdentifierStart(int ch) {
return false;
}
boolean isUnicodeIdentifierPart(int ch) {
return false;
}
boolean isIdentifierIgnorable(int ch) {
return false;
}
int toLowerCase(int ch) {
return ch;
}
int toUpperCase(int ch) {
return ch;
}
int toTitleCase(int ch) {
return ch;
}
int digit(int ch, int radix) {
return -1;
}
int getNumericValue(int ch) {
return -1;
}
boolean isWhitespace(int ch) {
return false;
}
byte getDirectionality(int ch) {
return (ch & 0xFFFE) == 0xFFFE
? Character.DIRECTIONALITY_UNDEFINED
: Character.DIRECTIONALITY_LEFT_TO_RIGHT;
}
boolean isMirrored(int ch) {
return false;
}
static final CharacterData instance = new CharacterDataPrivateUse();
private CharacterDataPrivateUse() {};
}
class CharacterDataUndefined extends CharacterData {
int getProperties(int ch) {
return 0;
}
int getType(int ch) {
return Character.UNASSIGNED;
}
boolean isJavaIdentifierStart(int ch) {
return false;
}
boolean isJavaIdentifierPart(int ch) {
return false;
}
boolean isUnicodeIdentifierStart(int ch) {
return false;
}
boolean isUnicodeIdentifierPart(int ch) {
return false;
}
boolean isIdentifierIgnorable(int ch) {
return false;
}
int toLowerCase(int ch) {
return ch;
}
int toUpperCase(int ch) {
return ch;
}
int toTitleCase(int ch) {
return ch;
}
int digit(int ch, int radix) {
return -1;
}
int getNumericValue(int ch) {
return -1;
}
boolean isWhitespace(int ch) {
return false;
}
byte getDirectionality(int ch) {
return Character.DIRECTIONALITY_UNDEFINED;
}
boolean isMirrored(int ch) {
return false;
}
static final CharacterData instance = new CharacterDataUndefined();
private CharacterDataUndefined() {};
}
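Returning to reverseBytes in the Character listing: it moves the high byte down and the low byte up, and the cast to char discards everything above 16 bits. One place this is useful is reading a UTF-16 code unit stored in the opposite byte order. The sketch below is an illustration under that assumption; `ReverseBytesDemo`, `swap` and `readLittleEndianChar` are names made up here, while `ByteBuffer.getChar()` (big-endian by default) and `Character.reverseBytes` are standard API.

```java
import java.nio.ByteBuffer;

class ReverseBytesDemo {
    // Same expression as in the listing above.
    static char swap(char ch) {
        return (char) (((ch & 0xFF00) >> 8) | (ch << 8));
    }

    // Reads the first two bytes as a little-endian UTF-16 code unit.
    // ByteBuffer reads big-endian by default, so the bytes are swapped afterwards.
    static char readLittleEndianChar(byte[] bytes) {
        char bigEndian = ByteBuffer.wrap(bytes).getChar();
        return Character.reverseBytes(bigEndian);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(swap((char) 0x1234))); // 3412

        byte[] utf16le = {0x41, 0x00}; // 'A' encoded as UTF-16LE
        System.out.println(readLittleEndianChar(utf16le));            // A
    }
}
```

In practice, setting the buffer's byte order with `ByteBuffer.order(ByteOrder.LITTLE_ENDIAN)` achieves the same result without a manual swap.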