Java Source Code: The JDK (Part 3): String, StringBuffer, StringBuilder

In Java, apart from the eight primitive types, String is probably the most frequently used type. Let's look at how the JDK source builds String, StringBuffer, and StringBuilder.

java.lang.String

In Java, String is a rather special class. Let's see how it represents a string, starting with its most important fields. The source is as follows:

/** The value is used for character storage. */
private final char value[];

/** The offset is the first index of the storage that is used. */
private final int offset;

/** The count is the number of characters in the String. */
private final int count;

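So a String is backed by a char array. offset is the index of the first character used within the char[], and count is the length of the string, which is evident from length(), whose body is the single line return count;. (The offset and count fields belong to older JDK sources such as JDK 6; later versions removed them.)

All of these fields are final, and String never modifies or exposes the char[] after construction. Together, that is why the value of a String cannot change (it is not because of the final in final class String, which only prevents subclassing). The benefit is that String objects can be shared between different users and are inherently thread-safe.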
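One caveat about final can be made concrete with a small sketch. The class below is hypothetical (FinalArrayHolder and its method names are assumptions for illustration, not JDK code). It shows that a final array field fixes only the reference, so immutability also depends on never modifying or exposing the array.

```java
// Hypothetical class for illustration; not part of the JDK.
class FinalArrayHolder {
    private final char[] value;

    FinalArrayHolder(char[] chars) {
        // Defensive copy, so the caller cannot change our contents later.
        this.value = chars.clone();
    }

    // Legal even though 'value' is final: final fixes the reference, not the elements.
    void overwriteFirst(char c) {
        value[0] = c;
    }

    String asString() {
        return new String(value);
    }

    public static void main(String[] args) {
        FinalArrayHolder holder = new FinalArrayHolder(new char[] {'a', 'b'});
        holder.overwriteFirst('z');
        System.out.println(holder.asString()); // prints "zb"
    }
}
```

String avoids this simply by offering no method like overwriteFirst and by never handing out its internal array.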

Here is one of the constructors:

 1 public String(String original) {
 2     int size = original.count;
 3     char[] originalValue = original.value;
 4     char[] v;
 5     if (originalValue.length > size) {
 6         // The char[] backing original is longer than the string being built, so copy only the part original actually uses.
 7         int off = original.offset;
 8         v = Arrays.copyOfRange(originalValue, off, off+size);
 9     } else {
10         // The char[] backing original is exactly as long as the string being built, so no copy is needed.
11         v = originalValue;
12     }
13     this.offset = 0;
14     this.count = size;
15     this.value = v;
16 }

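A brief aside: the first time I saw line 2, I was surprised, because String's count field is private. After experimenting and recalling how private was explained when I first learned Java, it clicked: private restricts access to code inside the class, not to the current instance, so a String method may read the private fields of another String object. Seeing the private keyword in a new light was a gain in itself.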
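A minimal sketch of that rule, using a hypothetical Account class (the class and its members are assumptions for illustration only): a method can read the private field of a different instance of the same class.

```java
// Hypothetical class for illustration; not part of the JDK.
class Account {
    private final int balance;

    Account(int balance) {
        this.balance = balance;
    }

    // Reads the private field of another Account instance; this compiles
    // because access control is per class, not per object.
    boolean hasSameBalanceAs(Account other) {
        return this.balance == other.balance;
    }

    public static void main(String[] args) {
        Account a = new Account(100);
        Account b = new Account(100);
        System.out.println(a.hasSameBalanceAs(b)); // true
    }
}
```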

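Back to the constructor; the comments in the code are a rough rendering of the JDK's own. Two points are worth noting. First, the array copy on line 8 ultimately calls System.arraycopy(), which is generally the fastest way to copy an array in Java (it is a native method; I have not looked into how it is implemented underneath, but presumably it copies memory directly). Second, line 11, v = originalValue;, means that the String passed in and the newly constructed String have value fields pointing to the same char[]. Sharing is harmless here because neither String ever modifies the array; final by itself would not guarantee this, since it only fixes the reference.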
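To illustrate the copy branch, here is a small sketch. CopyOfRangeDemo and slice are hypothetical names introduced for this example; only Arrays.copyOfRange comes from the JDK. It shows that the copied array is independent of the original.

```java
import java.util.Arrays;

class CopyOfRangeDemo {
    // Copies 'count' chars starting at 'offset', as line 8 of the constructor does.
    static char[] slice(char[] source, int offset, int count) {
        return Arrays.copyOfRange(source, offset, offset + count);
    }

    public static void main(String[] args) {
        char[] backing = {'a', 'b', 'c', 'd', 'e'};
        char[] copy = slice(backing, 1, 3);
        backing[1] = 'X';                     // change the original afterwards
        System.out.println(new String(copy)); // prints "bcd": the copy is unaffected
    }
}
```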

The charset-related constructors are not covered here; all the converting back and forth is tedious.

Next, let's look at how several commonly used String methods are implemented:

public boolean equals(Object anObject)

public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String)anObject;
        int n = count;
        if (n == anotherString.count) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = offset;
            int j = anotherString.offset;
            while (n-- != 0) {
                if (v1[i++] != v2[j++])
                    return false;
            }
            return true;
        }
    }
    return false;
}

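The method first checks reference identity (this == anObject), then checks that the argument is a String of the same length, and only then walks both value arrays comparing characters. It does not compare hashCode. Nothing more to say.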
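The order of checks in equals can be observed from the outside. The sketch below (EqualsDemo is a hypothetical name) uses only public String behavior: two distinct objects with the same characters are equal, while a non-String argument or different content is not.

```java
class EqualsDemo {
    public static void main(String[] args) {
        String a = "hello";
        String b = new String("hello"); // distinct object, same characters

        System.out.println(a == b);                               // false: different objects
        System.out.println(a.equals(b));                          // true: same length and chars
        System.out.println(a.equals(new StringBuilder("hello"))); // false: not a String
        System.out.println(a.equals("help!"));                    // false: same length, chars differ
    }
}
```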

public boolean startsWith(String prefix, int toffset)

public boolean startsWith(String prefix, int toffset) {
    char ta[] = value;
    int to = offset + toffset;
    char pa[] = prefix.value;
    int po = prefix.offset;
    int pc = prefix.count;
    // Note: toffset might be near -1>>>1.
    if ((toffset < 0) || (toffset > count - pc)) {
        return false;
    }
    while (--pc >= 0) {
        if (ta[to++] != pa[po++]) {
            return false;
        }
    }
    return true;
}

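Also simple: it walks through the value arrays. The comment is the interesting part: -1>>>1 is -1 shifted right by one bit without sign extension, which is Integer.MAX_VALUE. Because toffset may be that large, the bounds check is written as toffset > count - pc rather than toffset + pc > count, which could overflow.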
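A short sketch of why the form of that bounds check matters. StartsWithBoundsDemo, safeInRange, and naiveInRange are hypothetical names for illustration; the first mirrors the comparison used in the JDK code above, the second is the overflow-prone alternative.

```java
class StartsWithBoundsDemo {
    // Same shape as the JDK check: count - pc cannot overflow for non-negative lengths.
    static boolean safeInRange(int toffset, int count, int pc) {
        return toffset >= 0 && toffset <= count - pc;
    }

    // Naive variant: toffset + pc wraps around when toffset is near Integer.MAX_VALUE.
    static boolean naiveInRange(int toffset, int count, int pc) {
        return toffset >= 0 && toffset + pc <= count;
    }

    public static void main(String[] args) {
        System.out.println((-1 >>> 1) == Integer.MAX_VALUE);            // true
        System.out.println(safeInRange(Integer.MAX_VALUE, 5, 2));       // false, as it should be
        System.out.println(naiveInRange(Integer.MAX_VALUE, 5, 2));      // true, wrong: the sum overflowed
        System.out.println("hello".startsWith("lo", Integer.MAX_VALUE)); // false
    }
}
```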

public int indexOf(String str)

static int indexOf(char[] source, int sourceOffset, int sourceCount,
                       char[] target, int targetOffset, int targetCount,
                       int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first  = target[targetOffset];
    // While scanning source, once fewer characters remain than target has, there is no point continuing; that is what max is for.
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Find the first matching character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Check whether the rest of target matches; if it does not and source is not exhausted, continue the outer for loop. */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j] ==
                     target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}

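There is a pitfall when using this method: sourceString.indexOf("") returns 0 when passed an empty string, not -1, so an empty search string always counts as found.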
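The pitfall, and one possible way around it, can be sketched as follows. IndexOfEmptyDemo and indexOfNonEmpty are hypothetical names; the helper is just one assumption about how a caller might want to treat an empty search string.

```java
class IndexOfEmptyDemo {
    // Hypothetical helper: treats an empty search string as "not found".
    static int indexOfNonEmpty(String source, String target) {
        return target.isEmpty() ? -1 : source.indexOf(target);
    }

    public static void main(String[] args) {
        System.out.println("abc".indexOf(""));          // 0
        System.out.println("abc".indexOf("", 2));       // 2: returns fromIndex
        System.out.println("abc".indexOf("", 10));      // 3: clamped to the length
        System.out.println(indexOfNonEmpty("abc", "")); // -1
        System.out.println(indexOfNonEmpty("abc", "c")); // 2
    }
}
```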

public String substring(int beginIndex, int endIndex)

public String substring(int beginIndex, int endIndex) {
    if (beginIndex < 0) {
        throw new StringIndexOutOfBoundsException(beginIndex);
    }
    if (endIndex > count) {
        throw new StringIndexOutOfBoundsException(endIndex);
    }
    if (beginIndex > endIndex) {
        throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
    }
    return ((beginIndex == 0) && (endIndex == count)) ? this :
        new String(offset + beginIndex, endIndex - beginIndex, value);
}

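Simple: unless the range covers the whole string, it constructs a new String. Note that the constructor used here receives value itself, so in this version of the JDK the new String shares the original's char[] and differs only in offset and count.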

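In short, these common methods are little more than various manipulations of the char[]. But no discussion of String is complete without the String Pool, a caching mechanism the JVM uses to improve efficiency (how the JVM implements it is a deeper topic that I expect to look into later). For now, it is important to be clear about when a String object is created in the String Pool, when it is created on the heap, and which of the two a reference ends up pointing to.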

String Pool

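When strings are created, the JVM behaves as follows:

String literals and compile-time constant expressions are looked up in the String Pool; if no string with the same content exists there, one is created in the pool.
Using the new keyword, or a concatenation that involves a non-constant variable, always creates a new object on the heap, and the reference points to that object. (Any literal that appears in the expression is still placed in the String Pool, but the result of a runtime concatenation is not added to the pool unless intern() is called.)
Assigning a literal directly, or concatenating only literals, touches only the String Pool: if the string is already there, the reference points to it; otherwise it is created and then referenced.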

String temp = "ab";
String str1 = new String("abcd");
String str2 = "abcd";
String str3 = "ab" + "cd";
String str4 = temp + "cd";

System.out.println(str1 == str2);//false
System.out.println(str2 == str3);//true
System.out.println(str3 == str4);//false

These are just a few examples; try them yourself if you are interested.

public native String intern();

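Finally, intern(). This is a native method. It returns a reference to the String Pool's copy of the string, adding the string to the pool first if it is not already there. The reference it is called on is not changed, so the result has to be assigned (for example s = s.intern();). Once nothing refers to the heap object any more, it can be garbage collected.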
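A minimal sketch of how intern() behaves (InternDemo is a hypothetical name; the string values are arbitrary). It shows that calling intern() without using its return value leaves the original reference pointing at the heap object.

```java
class InternDemo {
    public static void main(String[] args) {
        String literal = "abcd";          // comes from the String Pool
        String heap = new String("abcd"); // separate object on the heap

        System.out.println(heap == literal); // false

        heap.intern();                       // return value discarded: 'heap' is unchanged
        System.out.println(heap == literal); // false

        heap = heap.intern();                // now refers to the pooled string
        System.out.println(heap == literal); // true
    }
}
```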

java.lang.StringBuffer

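We often need strings that change, but String is immutable, so every change has to be met by concatenating and producing a new String object. Frequent concatenation creates a large number of objects, which hurts performance. This is where StringBuffer comes in. As the name suggests, it is a string buffer whose contents can be changed freely. The principal operations on a StringBuffer are the append and insert methods.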

public synchronized StringBuffer append(String str)

public synchronized StringBuffer append(String str) {
    super.append(str); // the superclass method is shown below
    return this;
}

public AbstractStringBuilder append(String str) {
    if (str == null) str = "null";
    int len = str.length();
    if (len == 0) return this;
    int newCount = count + len;
    if (newCount > value.length)
        expandCapacity(newCount); // the expansion method is shown below
    str.getChars(0, len, value, count);
    count = newCount;
    return this;
}

void expandCapacity(int minimumCapacity) {
    int newCapacity = (value.length + 1) * 2;
    if (newCapacity < 0) {
        newCapacity = Integer.MAX_VALUE;
    } else if (minimumCapacity > newCapacity) {
        newCapacity = minimumCapacity;
    }
    value = Arrays.copyOf(value, newCapacity);
}

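Two things stand out above. First, if the String being appended is null, it is treated as the text "null". Second, the capacity expansion. If we construct a StringBuffer without specifying a capacity, the JDK defaults to 16; if the final string turns out far longer than that, the StringBuffer has to expand (and copy its array) again and again. So passing an estimated capacity to new StringBuffer(...) is a good habit.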
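Both observations can be checked through the public API. In the sketch below, CapacityDemo and the variable names are hypothetical; capacity(), ensureCapacity(), and append() are standard StringBuffer methods. The value 34 follows from the documented rule that the new capacity is the larger of the requested minimum and twice the old capacity plus 2.

```java
class CapacityDemo {
    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer();
        System.out.println(sb.capacity());    // 16: the default capacity

        String missing = null;
        sb.append(missing);                   // a null String is appended as "null"
        System.out.println(sb);               // null
        System.out.println(sb.length());      // 4

        StringBuffer grown = new StringBuffer();
        grown.ensureCapacity(17);             // one more than the default forces a resize
        System.out.println(grown.capacity()); // 34

        StringBuffer sized = new StringBuffer(1024);
        System.out.println(sized.capacity()); // 1024: no resize needed up to 1024 chars
    }
}
```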

java.lang.StringBuilder

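The difference between this class and StringBuffer is the absence of synchronized, which makes it faster. In my test, StringBuffer took roughly 1.6 times as long as StringBuilder. The test code is as follows (shown with the StringBuffer lines commented out, as for the StringBuilder run):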

StringBuffer sb1 = new StringBuffer(1024);
StringBuilder sb2 = new StringBuilder(1024);
for(int i = 0; i < 10000000; i++){
    //sb1.append("abc");
    sb2.append("abc");
    if((i & 1023) == 0){
        //sb1 = new StringBuffer(1024);
        sb2 = new StringBuilder(1024);
    }
}

Test results:

StringBuffer: 399 ms on average

StringBuilder: 254 ms on average

That's all for this study session.

Learning is a happy and rewarding thing.

Posted on 2013-12-24 21:01 by Yancey.Han
