基于两个集合全量比较的性能初步性能分析

背景：最近项目中遇到一个需求，需要比较两个大集合中的数据是否一致。（一般情况下是相等的，但是得用代码分析出来）

分析：对于这样的问题，第一个想法是，相比较两个集合大小，进而循环比较集合中每个元素。第二个思路是，看看List源码，是否可以从其API中，利用现有的方法组合，达到同样的效果。

基于第一个想法，这边直接上代码，并列出耗时时间。

    public static void main(String[] args) {
        List<String> listA = new ArrayList<>();
        List<String> listB = new ArrayList<>();
        for (int i = 0; i < 90000000; i++) {
            listA.add("A");
            listB.add("A");
        }
        long startTime = System.currentTimeMillis();
        if (listA.size() == listB.size()) {
            for (String str : listA) {
                if (!listB.contains(str)) {
                    break;
                }
            }
            long endTime = System.currentTimeMillis();
            System.out.println("比较时间" + (endTime - startTime));
        }
    }

测试6次，时间分别是：71,78,76,76,72,71。

上述代码中如果（i < 90000000），时间分别是：120,120,123,121,121,119。

ArrayList中contains源码：

    /**
     * Returns <tt>true</tt> if this list contains the specified element.
     * More formally, returns <tt>true</tt> if and only if this list contains
     * at least one element <tt>e</tt> such that
     * <tt>(o==null&nbsp;?&nbsp;e==null&nbsp;:&nbsp;o.equals(e))</tt>.
     *
     * @param o element whose presence in this list is to be tested
     * @return <tt>true</tt> if this list contains the specified element
     */
    public boolean contains(Object o) {
        return indexOf(o) >= 0;
    }

indexOf(o)源码：

    /**
     * Returns the index of the first occurrence of the specified element
     * in this list, or -1 if this list does not contain the element.
     * More formally, returns the lowest index <tt>i</tt> such that
     * <tt>(o==null&nbsp;?&nbsp;get(i)==null&nbsp;:&nbsp;o.equals(get(i)))</tt>,
     * or -1 if there is no such index.
     */
    public int indexOf(Object o) {
        if (o == null) {
            for (int i = 0; i < size; i++)
                if (elementData[i]==null)
                    return i;
        } else {
            for (int i = 0; i < size; i++)
                if (o.equals(elementData[i]))
                    return i;
        }
        return -1;
    }

基于第二种思路，打算采用containsAll方法：listA.containsAll(listB)&& listB.containsAll(listA)

    public static void main(String[] args) {
        List<String> listA = new ArrayList<>();
        List<String> listB = new ArrayList<>();
        for (int i = 0; i < 50000000; i++) {
            listA.add("A");
            listB.add("A");
        }
        long startTime = System.currentTimeMillis();
        if (listA.containsAll(listB) && listB.containsAll(listA)) {
            long endTime = System.currentTimeMillis();
            System.out.println("比较时间" + (endTime - startTime));
        }
    }

测试6次，时间分别是：119,128,120,120,130,119。

上述代码中如果（i < 90000000），时间分别是：206,213,204,203,205,203。

通过上面数据，大胆推测，for循环的时间大概是后者的二分之一，那么是不是呢？接下来从源码分析一下：

上述代码中List的实现类都是ArrayList。跟了一下contailsAll源码，发现List和ArrayList都没有具体的实现方法，这边索性把：

        List<String> listA = new ArrayList<>();
　　　　　// 改成直接实现类定义
        ArrayList<String> listA = new ArrayList<>();

这样，发现实际调用的ArrayList的父类的父类方法：ArrayList<E> extends AbstractList<E>，AbstractList<E> extends AbstractCollection<E>

源码如下：

    /**
     * {@inheritDoc}
     *
     * <p>This implementation iterates over the specified collection,
     * checking each element returned by the iterator in turn to see
     * if it's contained in this collection.  If all elements are so
     * contained <tt>true</tt> is returned, otherwise <tt>false</tt>.
     *
     * @throws ClassCastException            {@inheritDoc}
     * @throws NullPointerException          {@inheritDoc}
     * @see #contains(Object)
     */
    public boolean containsAll(Collection<?> c) {
        for (Object e : c)
            if (!contains(e))
                return false;
        return true;
    }

循环调用的contains源码如下：（也在这个类下）

    /**
     * {@inheritDoc}
     *
     * <p>This implementation iterates over the elements in the collection,
     * checking each element in turn for equality with the specified element.
     *
     * @throws ClassCastException   {@inheritDoc}
     * @throws NullPointerException {@inheritDoc}
     */
    public boolean contains(Object o) {
        Iterator<E> it = iterator();
        if (o==null) {
            while (it.hasNext())
                if (it.next()==null)
                    return true;
        } else {
            while (it.hasNext())
                if (o.equals(it.next()))
                    return true;
        }
        return false;
    }

所以发现本质都是循环比较。只是这边用了iterator而已。

如果把ArrayList换成LinkedList，也是调用AbstractCollection<E>的 containsAll(Collection<?> c)方法。

所以第二种方法，其实是循环了两次（每次循环是两层for循环的嵌套），时间上大致是第一种方法的两倍。

综上：两种方法的优缺点：

第一种for循环效率较高，代码量较多。

第二种效率较低，更易于理解，代码量较少。

posted on 2019-08-04 01:04 betterLearing 阅读(631) 评论(0) 收藏举报

刷新页面返回顶部

betterLearing

基于两个集合全量比较的性能初步性能分析

导航

公告