Amazon Interview Question: How would you find a duplicat... | Glassdoor

## Interview Question

Software Developer Engineer II Interview Herndon, VA

# How would you find a duplicate number in a very large

unsorted array of ints.

1

findDuplicates(int array[])
{
int duplicates[] , index;
for (int i=0; ii; j--)
{
if (array[i] == array[j])
{
duplicates[index++] = array[i];
break;
}

}
print duplicates;

Yuvich86 on Jan 5, 2013
0

O(n^2) is the usual naive answer but there are properties that if true can reduce this to O(n) using bit ops:

In general, if the given array range can also be generated where the duplicated number you are trying to find gets no special treatment and is included just like all the rest a single time, then you can get the answer this way:

set total to 0
foreach (n in given array)
xor all n into total
foreach (n in generated range)
xor all n into total

This works because all the non-duplicated single entries will cancel out via xor with their single entry from the generated set (since they are all paired) and the duplicated number will have an extra odd entry (since it will have 2 entries already from the given array + 1 from the generated set = 3 entries).

And because of course xor is commutative; the order of the xor'ing doesn't matter:

6^6^5^5^4^4 = 0, as does 6^5^4^5^4^6

It is a variation of these problems:
- find a missing number in an unsorted array
- find an unduplicated number in an unsorted array of duplicates

Ed on Apr 20, 2013
0

I should have added t the above:

Ask the interviewer if the array of N has any special distribution. In particular, for the duplicate question here, ask if the array of N contains [0, N-2] or [1, N-1] values unsorted, in addition to one extra entry duplicated in that set duplicated.

Ed on Apr 20, 2013
1

[Apologies for multiple posts, I don't see any way to edit a post once made ...]

There is also a way to find 2 different numbers instead of just 1 resulting from any of the above variations.

If you first do the above xor O(n), your answer is actually A^B since you cannot separate the 2 numbers ... yet.

The trick here is to get A and B into different unsorted subarrays, with all the other pairings that self-cancel into either subarray with them (but still in pairs together).

That is done by noticing that even though you have no idea what A and B are individually, you do know that since the total is A^B that any 1-bit in the total A^B means that bit is different between A and B. So divide up *all* the numbers based on each having a 0 or 1 in one of those different bits.

pseudocode
-get the A^B total via the xor O(n) algorithm previously posted
-determine as pivot an Nth bit which is different between A and B (any 1-bit in the A^B total)
-run through all the same numbers again, but this pass accumulate xor's into an even or odd total depending on the chosen bit
- the final answer for A and B will be the 2 accumulated totals

Complexity: 2 O(n) which is just O(n)

Ed on Apr 20, 2013