Google Interview Question

How do you remove duplicates in a large array.

Interview Answers

Anonymous

Nov 11, 2009

Create a hash table with size equal to n, the number of items in the array. Set a counter, uniqueCount to 0. Then walk through the array. Check if a[i] is in the hash table. If it is not, add a[i] to the hash table with the unique count as its value. Increment the unique count. At the end, put the hash results into your array, using each key’s value as its index, and set all remaining elements to null. The worst-case for this solution is O(n), but it also requires O(n) memory to store the hash table. Is there a way to do this in place? removeDupes(int[] a){ int uniqueCount = 0; for (int i = 0; i < n; i++){ if (a[i] not in hash table) { add a[i] to hash table with value uniqueCount; uniqueCount++; } } set = get key-value pairs from hash table; for (each element e in set) { a[e.value] = e; } for (int i = uniqueCount; i < n; i++){ a[i] = null; } }

Anonymous

Nov 11, 2009

If we are allowed to modify the array, we could do an in-place sort, O(n log n), and then a walk-through, deleting the duplicates. This solution would run in O(n log n) time and require constant memory. If the array’s elements can be mapped to unique integers and the largest of these integers is guaranteed to be less than 2^31, then we can use a bit array to store presence, which may require less memory than a hash table.

Anonymous

Feb 18, 2010

since it is an array, when you remove, you'd think about shifting rest elements