2016-07-02

集合容器学习之List

List是列表接口，内部元素可重复。ArrayList和LinkedList是最常用的两种List。这一次我们来学习这两种容器的源码。

ArrayList学习

ArrayList是最常用最基础的容器之一。ArrayList是一个动态数组，当容量不够时会自动扩容。api是这样说的：

Resizable-array implementation of the List interface. Implements all optional list operations, and permits all elements, including null. In addition to implementing the List interface, this class provides methods to manipulate the size of the array that is used internally to store the list. (This class is roughly equivalent to Vector, except that it is unsynchronized.)

构造函数

我们从构造函数出发：

public ArrayList(int initialCapacity) {
    super();
    if (initialCapacity < 0)
        throw new IllegalArgumentException("Illegal Capacity: "+
                                           initialCapacity);
    this.elementData = new Object[initialCapacity];
}

public ArrayList() {
    super();
    this.elementData = EMPTY_ELEMENTDATA;
}

public ArrayList(Collection<? extends E> c) {
    elementData = c.toArray();
    size = elementData.length;
    // c.toArray might (incorrectly) not return Object[](see 6260652)
    if (elementData.getClass() != Object[].class)
        elementData = Arrays.copyOf(elementData, size, Object[].class);
}

从这三个构造方法我们可以看出，实际上就是初始化elementData这个Object数组。而默认的数组大小是10。同时，可以将一个Collection作为参数传给构造方法，实际上是把里面的值copy到ArrayList。

add和remove方法

list主要的逻辑就是添加和删除。这是add方法的实现：

//默认添加到list最后一个元素
public boolean add(E e) {
	//判断是否需要扩容
    ensureCapacityInternal(size + 1);  // Increments modCount!!
    elementData[size++] = e;
    return true;
}
//将元素插入到index位置，原位置及其后面的元素向后移动一位
public void add(int index, E element) {
	//检验index的合法性
    rangeCheckForAdd(index);

    ensureCapacityInternal(size + 1);  // Increments modCount!!
    //将index后的元素后移一位
    System.arraycopy(elementData, index, elementData, index + 1,  size - index);
    elementData[index] = element;
    size++;
}

总而言之，add方法先检查是否到容量上限，然后将元素加到指定位置。如果在ArrayList中间添加元素，那么将会移动后面的所有元素，效率较低。
这里是get方法：

public E get(int index) {
        rangeCheck(index);
        return elementData(index);
    }

可以看出，get方法很简单，就是直接在数组的指定位置得到元素。get方法体现了ArrayList的优势：读取速度极快。

扩容ensureCapacityInternal和grow方法

我们注意到，在add方法中，每次都会检查容量是否足够，使用的都是ensureCapacityInternal方法，而这个方法就是实现 自动扩容机制 的核心。源码如下：

private void ensureCapacityInternal(int minCapacity) {
    if (elementData == EMPTY_ELEMENTDATA) {
        minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
    }
    ensureExplicitCapacity(minCapacity);
}
private void ensureExplicitCapacity(int minCapacity) {
    modCount++;
    // overflow-conscious code
    if (minCapacity - elementData.length > 0)
        grow(minCapacity);
}

而ensureCapacityInternal调用ensureExplicitCapacity方法，而ensureExplicitCapacity方法继续调用grow方法，grow方法如下：

//实际上进行扩容的方法
private void grow(int minCapacity) {
    int oldCapacity = elementData.length;
    //新容量为原来的1.5倍
    int newCapacity = oldCapacity + (oldCapacity >> 1);
    //再判断一下新数组的容量够不够，够了就直接使用这个长度创建新数组， 
    //不够就将数组长度设置为需要的长度
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    //判断是否超过最大长度
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    //将原数组copy到新数组中
    elementData = Arrays.copyOf(elementData, newCapacity);
}

也就是说，当增加数据时触发了扩容机制，那么ArrayList就会扩容至1.5倍，然后将旧的数据存入新的数组中，并将引用指向新的数组。因此，在ArrayList中添加元素性能很低，表明其应用场合应该是读操作远远大于写操作。如果需要频繁的增删，那么LinkedList更佳。

Arrays.copyof和System.arraycopy()

在ArrayList中复制操作基本用的都是这两个方法，有必要对这两个方法进一步的学习。
copyOf具有很多重载方法，但思路基本相同。这是泛型方法：

1
2
3

public static <T> T[] copyOf(T[] original, int newLength) {
    return (T[]) copyOf(original, newLength, original.getClass());
}

调用的另一个copyOf是这样的：

public static <T,U> T[] copyOf(U[] original, int newLength, Class<? extends T[]> newType) {
    T[] copy = ((Object)newType == (Object)Object[].class)
        ? (T[]) new Object[newLength]
        : (T[]) Array.newInstance(newType.getComponentType(), newLength);
    System.arraycopy(original, 0, copy, 0,
                     Math.min(original.length, newLength));
    return copy;
}

可以发现，内部创建了一个长度为newLength的Object数组，调用System.arraycopy方法将原数组元素复制到了新数组中。
接下来看看System.arraycopy方法：

1
2
3

public static native void arraycopy(Object src,  int  srcPos,
                                        Object dest, int destPos,
                                        int length);

可以看出，这是一个native方法。查找资料得知，该函数实际上最终调用了C语言的memmove()函数，因此它可以保证同一个数组内元素的正确复制和移动，比一般的复制方法的实现效率要高很多，很适合用来批量处理数组。 Java强烈推荐在复制大量数组元素时用该方法，以取得更高的效率 。

小结

ArrayList本质就是一个数组，add和remove都是对数组的操作。同时，具有扩容机制，当数组长度不够时容量扩大为1.5倍。ArrayList一切特性来自于数组：有序、元素可重复、插入慢、索引快等等。

参考
ArrayList其实就那么一回事儿之源码浅析
 ArrayList源码剖析

LinkedList学习

LinkedList本质是双向链表，具有链表的特点，快速增删，但读取元素较慢。api是这样说的：

Doubly-linked list implementation of the List and Deque interfaces. Implements all optional list operations, and permits all elements (including null).

构造方法

LinkedList的构造方法是这样的：

public LinkedList() {
}
public LinkedList(Collection<? extends E> c) {
    this();
    addAll(c);
}

无参构造方法没有传入参数，不需要指定长度之类的参数。而第二个构造方法，将集合传入，调用addAll方法将每个元素放入LinkedList。

链表Node

LinkedList由双向链表组成，节点Node是一个内部类：

private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;
    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}

每个节点内部包含3个元素：数据，前驱，后继，明显是一个双向链表。那总体链表的结构是怎样的呢？

//元素个数
transient int size = 0;
//头结点
transient Node<E> first;
//尾节点
transient Node<E> last;

整个链表维护着元素个数，头结点和尾节点。链表不成环，头结点的前驱为null，尾节点的后继为null。

add、addAll方法

add方法：

public boolean add(E e) {
    linkLast(e);
    return true;
}
public void add(int index, E element) {
    checkPositionIndex(index);
    if (index == size)
        linkLast(element);
    else
        linkBefore(element, node(index));
}

如果不指定位置，那么将数据插入到队尾；否则调用linkBefore方法，将其插入到index位置。这是linkBefore方法：

void linkBefore(E e, Node<E> succ) {
    // assert succ != null;
    final Node<E> pred = succ.prev;
    final Node<E> newNode = new Node<>(pred, e, succ);
    succ.prev = newNode;
    if (pred == null)
        first = newNode;
    else
        pred.next = newNode;
    size++;
    modCount++;
}

这是addAll方法，将一个集合插入到链表中：

//在index节点处将Collection c 的元素全部插入
//比如[3, 5, 7]在index=1处添加[9, 11]，那么链表变为[3, 9, 11, 5, 7]
public boolean addAll(int index, Collection<? extends E> c) {
	//判断index的合法性
    checkPositionIndex(index);
    //将a转变为Object数组a，numNew为其长度
    Object[] a = c.toArray();
    int numNew = a.length;
    if (numNew == 0)
        return false;

    //找到插入链表的前驱pred
    //node()方法是按顺序得到index位置的节点
    Node<E> pred, succ;
    if (index == size) {
        succ = null;
        pred = last;
    } else {
        succ = node(index);
        pred = succ.prev;
    }

    //为每个数据创建节点，并依次插入到pred后面
    for (Object o : a) {
        @SuppressWarnings("unchecked") E e = (E) o;
        Node<E> newNode = new Node<>(pred, e, null);
        if (pred == null)
            first = newNode;
        else
            pred.next = newNode;
        pred = newNode;
    }

    //指定插入数据的后继
    if (succ == null) {
        last = pred;
    } else {
        pred.next = succ;
        succ.prev = pred;
    }

    size += numNew;
    modCount++;
    return true;
}

get和remove方法

get方法直接调用node方法，返回index位置的元素

public E get(int index) {
    checkElementIndex(index);
    return node(index).item;
}

remove方法有3个，分别是：

public E remove() {
    return removeFirst();
}

public E remove(int index) {
    checkElementIndex(index);
    return unlink(node(index));
}

public boolean remove(Object o) {
    if (o == null) {
        for (Node<E> x = first; x != null; x = x.next) {
            if (x.item == null) {
                unlink(x);
                return true;
            }
        }
    } else {
        for (Node<E> x = first; x != null; x = x.next) {
            if (o.equals(x.item)) {
                unlink(x);
                return true;
            }
        }
    }
    return false;
}

remove()等同于removeFirst()，除掉第一个元素。remove(int index)是删除index处的元素。remove(Object o)是删除o这个对象，有几个删除几个。
需要注意的是 ：由于JDK1.5引入自动装箱拆箱机制，基本类型和包装类会在合适的时机互相转换。当LinkedList装的是Integer对象时，调用remove(Object o)会自动变成调用remove(int index)！这一点需要特别注意。

LinkedList的其他功能

LinkedList的数据结构是双向链表，以此为基础很容易开发队列和栈的功能。在LinkedList中直接提供了相关的方法。

栈操作：

//入栈
public void push(E e) {
//出栈
public E pop() {
//栈顶元素
public E peek() {

队操作：

//入队
public boolean add(E e) {
//出队
public E poll() {

小结

LinkedList其实是基于双向链表实现的，它的特性就是链表的特性。
LinkedList和ArrayList的区别：

ArrayList基于数组实现，因此具有：有序、元素可重复、插入慢、索引快这些数组的特性；
LinkedList 基于双向链表实现，因此具有链表插入快、索引慢的特性；

参考
LinkedList源码剖析
 LinkedList其实就那么一回事儿之源码分析