One Thousand and One Nights: An Analysis of Several Ways to Check Whether an Array Contains a Target Element

阿新 • Published: 2019-02-01

I have recently been reading programcreek's "Simple Java" material. In the article How to Check if an Array Contains a Value in Java Efficiently, the author lists four solutions, using a List, a Set, a loop, and binarySearch, as shown below:

package atlas;

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * @author atlas
 */

//Four Different Ways to Check If an Array Contains a Value

public class checkArrayContailAValue {

    // use list
    public boolean useList(String[] arr, String targetValue) {
        return Arrays.asList(arr).contains(targetValue);
    }

    //use set
    public boolean useSet(String[] arr, String targetValue) {
        Set<String> set = new HashSet<String>(Arrays.asList(arr));
        return set.contains(targetValue);
    }

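    // use loop
    public boolean useLoop(String[] arr, String targetValue) {
        for (String s : arr) {
            // Null-safe comparison: a null element or a null target does not throw.
            if (s == null ? targetValue == null : s.equals(targetValue))
                return true;
        }
        return false;
    }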

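    // use binary search (the array must already be sorted)
    public boolean useArraysBinarySearch(String[] arr, String targetValue)
    {
        // binarySearch returns the index if found (possibly 0), a negative value otherwise
        int a = Arrays.binarySearch(arr, targetValue);
        return a >= 0;
    }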
}

The author also ran test cases with arrays of different sizes: 5, 1k, and 10k elements.
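A measurement like this can be sketched with System.nanoTime. The harness below is only an illustration: the class ContainsTiming, its helpers timeNanos and makeArray, the run count, and the target value are all assumptions of this sketch, not the original author's test code. It also ignores JIT warm-up, so a dedicated benchmarking tool such as JMH would be a better choice for serious numbers.

```java
import java.util.Arrays;
import java.util.function.BooleanSupplier;

// Hypothetical timing harness (not from the original article).
final class ContainsTiming {

    /** Runs the check `runs` times and returns the elapsed time in nanoseconds. */
    static long timeNanos(BooleanSupplier check, int runs) {
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) {
            check.getAsBoolean();
        }
        return System.nanoTime() - start;
    }

    /** Builds a test array of the given size holding "0", "1", ..., "size-1". */
    static String[] makeArray(int size) {
        String[] arr = new String[size];
        for (int i = 0; i < size; i++) {
            arr[i] = String.valueOf(i);
        }
        return arr;
    }

    public static void main(String[] args) {
        String target = "A"; // deliberately absent, so every scan visits all elements
        for (int size : new int[] {5, 1_000, 10_000}) {
            String[] arr = makeArray(size);
            long listNanos = timeNanos(() -> Arrays.asList(arr).contains(target), 10_000);
            System.out.println("size=" + size + " list: " + listNanos / 1_000_000 + " ms");
        }
    }
}
```

The other three approaches can be timed the same way by passing a different lambda to timeNanos.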

On my machine, the running times were:

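The result is clear: binary search is the fastest, which is easy to understand given its O(log(n)) complexity. But do not forget the precondition: binary search only works on a sorted array! Did you think the article ends here? No, it is not that simple. The other three approaches differ considerably in running time. Why? That is the focus of today's study.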
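To make the sorted-array precondition concrete, here is a minimal sketch. The class SortedContains and its method containsSorted are hypothetical names, and the sketch assumes the array contains no null elements (Arrays.sort would throw on them). It sorts a copy first, so the caller's array is left untouched.

```java
import java.util.Arrays;

// Hypothetical helper (not from the original article).
final class SortedContains {

    /** Sorts a copy of the array, then binary-searches the copy. */
    static boolean containsSorted(String[] arr, String target) {
        String[] sorted = Arrays.copyOf(arr, arr.length);
        Arrays.sort(sorted);                              // O(n log n)
        return Arrays.binarySearch(sorted, target) >= 0;  // O(log n); index 0 is a valid hit
    }

    public static void main(String[] args) {
        String[] arr = {"d", "b", "a", "c"};              // unsorted input
        System.out.println(containsSorted(arr, "a"));     // true
        System.out.println(containsSorted(arr, "z"));     // false
    }
}
```

Because the sort alone costs O(n log n), this only pays off when the same sorted array is searched many times; for a single lookup, a linear scan does less work.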

First, let's analyze the two approaches with similar running times: the List and the loop.

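The loop approach is easy to understand: it uses Java's enhanced for loop. (Over an array, the compiler turns this into a plain indexed loop; an Iterator is used only when iterating over an Iterable such as a collection.) Of these approaches, it is the fastest.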

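Next, look at the List approach. Why does it take somewhat longer than the loop? To answer that, we need to look at two things: (1) converting the array to a list has a cost; (2) how the list's contains method works. Let's take them one at a time. The array is converted to a list by calling Arrays.asList(); here is its implementation in the Arrays source code: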

    /**
     * Returns a fixed-size list backed by the specified array.  (Changes to
     * the returned list "write through" to the array.)  This method acts
     * as bridge between array-based and collection-based APIs, in
     * combination with {@link Collection#toArray}.  The returned list is
     * serializable and implements {@link RandomAccess}.
     *
     * <p>This method also provides a convenient way to create a fixed-size
     * list initialized to contain several elements:
     * <pre>
     *     List<String> stooges = Arrays.asList("Larry", "Moe", "Curly");
     * </pre>
     *
     * @param a the array by which the list will be backed
     * @return a list view of the specified array
     */
    public static <T> List<T> asList(T... a) {
	return new ArrayList<T>(a);
    }

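It calls a constructor of ArrayList, passing in the array. Note that this ArrayList is a private nested class of Arrays, not java.util.ArrayList: the list it returns is fixed-size and backed by the original array, not resizable.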

    private static class ArrayList<E> extends AbstractList<E>
	implements RandomAccess, java.io.Serializable
    {
        private static final long serialVersionUID = -2764017481108945198L;
	private final E[] a;

	ArrayList(E[] array) {
            if (array==null)
                throw new NullPointerException();
	    a = array;
	}
        ...
}
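The "fixed-size" and "write through" behavior described in the Javadoc above can be checked with a short sketch. The class AsListView and its helper isFixedSize are hypothetical names introduced for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class (not from the original article).
final class AsListView {

    /**
     * Returns true if the list rejects add() with UnsupportedOperationException.
     * Caution: on a resizable list this helper really does append "x".
     */
    static boolean isFixedSize(List<String> list) {
        try {
            list.add("x");
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c"};
        List<String> view = Arrays.asList(arr);

        view.set(0, "z");                       // writes through to the array
        System.out.println(arr[0]);             // z

        arr[1] = "y";                           // array changes are visible in the list
        System.out.println(view.get(1));        // y

        System.out.println(isFixedSize(view));  // true
    }
}
```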

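The conversion performed by Arrays.asList is therefore just an assignment of the array reference (the elements are not copied) plus the allocation of a small wrapper object, which costs a little time. Next, let's look at how contains is implemented: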

    /**
     * Returns <tt>true</tt> if this list contains the specified element.
     * More formally, returns <tt>true</tt> if and only if this list contains
     * at least one element <tt>e</tt> such that
     * <tt>(o==null ? e==null : o.equals(e))</tt>.
     *
     * @param o element whose presence in this list is to be tested
     * @return <tt>true</tt> if this list contains the specified element
     */
    public boolean contains(Object o) {
	return indexOf(o) >= 0;
    }

    /**
     * Returns the index of the first occurrence of the specified element
     * in this list, or -1 if this list does not contain the element.
     * More formally, returns the lowest index <tt>i</tt> such that
     * <tt>(o==null ? get(i)==null : o.equals(get(i)))</tt>,
     * or -1 if there is no such index.
     */
    public int indexOf(Object o) {
	if (o == null) {
	    for (int i = 0; i < size; i++)
		if (elementData[i]==null)
		    return i;
	} else {
	    for (int i = 0; i < size; i++)
		if (o.equals(elementData[i]))
		    return i;
	}
	return -1;
    }

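As we can see, contains also uses a for loop to compare the elements one by one, just like the loop approach. (The listing above is from java.util.ArrayList, which scans its elementData field; the nested ArrayList inside Arrays performs the same linear scan over its own array a.)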

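From this we can infer that the extra time taken by the List approach comes from converting the array to a list (plus the additional method calls), since the scan itself is the same.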
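The equivalence between contains and a plain loop can be illustrated by applying the null rule from the Javadoc above, (o==null ? e==null : o.equals(e)), directly to the array. This is only a sketch; the class NullSafeContains and its method containsLoop are hypothetical names.

```java
import java.util.Arrays;

// Hypothetical demo class (not from the original article).
final class NullSafeContains {

    /** Linear scan over the raw array, using the same null rule as List.contains. */
    static boolean containsLoop(String[] arr, String target) {
        for (String s : arr) {
            if (target == null ? s == null : target.equals(s)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] arr = {"a", null, "b"};
        // Both approaches agree, including for null targets and null elements.
        System.out.println(containsLoop(arr, "b") == Arrays.asList(arr).contains("b"));   // true
        System.out.println(containsLoop(arr, null) == Arrays.asList(arr).contains(null)); // true
        System.out.println(containsLoop(arr, "z") == Arrays.asList(arr).contains("z"));   // true
    }
}
```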

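Finally, let's look at the most time-consuming approach, the Set. Why is it the slowest? As you have probably guessed, the conversion is expensive, and there are two conversions here (array to list, then list to set):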

Set<String> set = new HashSet<String>(Arrays.asList(arr));

    private transient HashMap<E,Object> map;
    /**
     * Constructs a new set containing the elements in the specified
     * collection.  The <tt>HashMap</tt> is created with default load factor
     * (0.75) and an initial capacity sufficient to contain the elements in
     * the specified collection.
     *
     * @param c the collection whose elements are to be placed into this set
     * @throws NullPointerException if the specified collection is null
     */
    public HashSet(Collection<? extends E> c) {
	map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
	addAll(c);
    }

    /**
     * {@inheritDoc}
     *
     * <p>This implementation iterates over the specified collection, and adds
     * each object returned by the iterator to this collection, in turn.
     *
     * <p>Note that this implementation will throw an
     * <tt>UnsupportedOperationException</tt> unless <tt>add</tt> is
     * overridden (assuming the specified collection is non-empty).
     *
     * @throws UnsupportedOperationException {@inheritDoc}
     * @throws ClassCastException            {@inheritDoc}
     * @throws NullPointerException          {@inheritDoc}
     * @throws IllegalArgumentException      {@inheritDoc}
     * @throws IllegalStateException         {@inheritDoc}
     *
     * @see #add(Object)
     */
    public boolean addAll(Collection<? extends E> c) {
	boolean modified = false;
	Iterator<? extends E> e = c.iterator();
	while (e.hasNext()) {
	    if (add(e.next()))
		modified = true;
	}
	return modified;
    }

First a HashMap is allocated, then addAll() puts the list's elements into the map one by one through an iterator, and finally contains is called:

        public Iterator<Map.Entry<K,V>> iterator() {
            return newEntryIterator();
        }
        public boolean contains(Object o) {
            if (!(o instanceof Map.Entry))
                return false;
            Map.Entry<K,V> e = (Map.Entry<K,V>) o;
            Entry<K,V> candidate = getEntry(e.getKey());
            return candidate != null && candidate.equals(e);
        }
        public boolean remove(Object o) {
            return removeMapping(o) != null;
        }
        public int size() {
            return size;
        }
        public void clear() {
            HashMap.this.clear();
        }
    }

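Note that the listing above is the contains method of HashMap's inner entry-set class. HashSet.contains itself simply delegates to map.containsKey(o), which is a hash lookup taking O(1) time on average, not a loop over all elements. The cost of the Set approach is therefore dominated by building the set: every element of the array has to be hashed and inserted before the single lookup happens.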
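Since HashSet.contains is a hash lookup (it delegates to HashMap.containsKey), nearly all of the Set approach's cost lies in building the set. The sketch below assumes a different usage pattern from the article's benchmark: many lookups against the same array, so the set is built once and reused. The class ReusableLookup is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper (not from the original article).
final class ReusableLookup {
    private final Set<String> values;

    /** Pays the O(n) conversion cost exactly once. */
    ReusableLookup(String[] arr) {
        this.values = new HashSet<>(Arrays.asList(arr));
    }

    /** Hash lookup: O(1) on average, independent of the array size. */
    boolean contains(String target) {
        return values.contains(target);
    }

    public static void main(String[] args) {
        ReusableLookup lookup = new ReusableLookup(new String[] {"a", "b", "c"});
        System.out.println(lookup.contains("a")); // true
        System.out.println(lookup.contains("z")); // false
    }
}
```

When only a single lookup is needed, as in the article's test, rebuilding the set on every call is exactly what makes this approach the slowest.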

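That completes our analysis of how long each approach takes and why. In project development, when the amount of data is small, the loop approach is still the recommended choice. Got it?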
