Amazon Interview Question: List all anagrams in a file. ... | Glassdoor

Interview Question

Software Development Engineer Interview Seattle, WA

List all anagrams in a file. Assumptions: case-insensitive

 , a-z characters only, one word per line. For example, if the file contains dog, cat, ddd, goo, act, god -- output dog, god, act, cat
technical, java, programming

Interview Answer

10 Answers


Thankfully I was taking a theory course and one trick used in the course was "encoding" programs as a large natural number using product of primes.

1. Adapt the encoding as follows -- generate the first 26 primes and assign a prime to each letter
2a. For each word, take the product of each letter's prime. So #(act) = 2*5*prime(t)
2b. As you can see, #(cat) = 5*2*prime(t) = #(act)
3. Insert a handwavey argument about inserting the number/word pairing into a HashMap>

Interview Candidate on Jun 24, 2010

Sort the words. Anagrams appear next to each other in the sorted list.

Bill on Jul 8, 2010

Sorry, sort the letters in each word, then sort the words. Anagrams appear next to each other in the list. For example the sorted list for the input would be:


Bill on Jul 8, 2010

Thanks for sharing Bill

For this set of input, the expected output should contain only [cat, act, god, dog]. I'm curious to see what "next steps" your algorithm will perform to provide this expected output

Interview Candidate on Jul 8, 2010

You keep track of the mapping from the sorted word to the actual word in a pair, for example:

[act, cat]
[act, act]
[ddd, ddd]
[dgo, god]
[dgo, dog]
[goo, goo]

Then you go through this list and count if you have a duplicate entry or not. If you do, like for act, you print out those duplicate entries: cat, act.

Bill on Jul 8, 2010

Bill, your algorithm is O(n*log(n)) while the candidates would be O(n) - provided he uses a decent hash function

donutello on Jul 19, 2010


bills algo is not n log n it is n*log(k)
where as candidates algo is n * k again (multiplications for each word)
where k = length of the longest word
on top of that calculating primes is expensive anyway
I would go with bills answer

Anonymous on Feb 19, 2011

Bills algo is nlogk + nlgn.
After sorting the k letters for n times you also have to sort the n words.

deveffort on Feb 23, 2011

#Get inputs
a = []
f = open('input.txt','r')
for line in f:
    line = line.strip()

#Sort letters in a word
def sort_letter(s):
    r = []
    for i in s:
    t = sorted(r)
    v = ''.join(t)
    return v

#Find anagrams
d = {}
for v in a:
    sv = sort_letter(v)
    if sv in d:
        d[sv] = [v]

#Print results
for k,v in d.items():
    if len(v) > 1:
        for s in v:
            print s

Quoc on Dec 4, 2014

think of each line as a set of characters, not a word, then create set of sets of characters which you fill from the input. then print the set (order does not matter as its not specified)

Anonymous on Dec 24, 2014

Add Answers or Comments

To comment on this, Sign In or Sign Up.