This employer has taken extra steps to respond to reviews and provide job seekers with accurate company information, photos, and reviews. Interested for your company? Learn More.

# `Google` `Software Engineer` Interview Question

`"Develop an algorithm for finding the shortest distance between two words in a document. After the phone interview is over, take a few hours to develop a working example in C++ and send it to the manager."`

Tags: | c++ programming See more , See less 8 |

Part of a `Software Engineer` Interview Review - one of `3,031` `Google` Interview Reviews

Answers & Comments

▲

1

▼

of 1vote

Assume that we will be passed two Strings that represent the words we’d like to find. Assume “distance” means the number of characters between occurrences – that is, line breaks will be ignored. Assume the document is represented by a long string. We need to find the two occurrences of these words that are closest together.

Simple solution:

- Walk through the string and find each occurrence of word A. Store the middle index of this occurrence

- Walk through again and find each occurrence of word B. Store the middle index of this occurrence.

- Compare the arrays of indicies. Find the smallest difference between each middle position of A and each middle position of B.

- Running time = O(n + ab) where a and b are the number of occurrences of the input strings A and B.

Walk through the document, character by character, looking to match <space>word<space>. If we find such a match, put the index of the middle character into the array. Continue on our walk. Repeat for the second word. Now we have two arrays of integers. Our new task is to find the smallest difference between the elements in each array. Use the mergesort merge method, keeping track of the smallest difference. Return this distance.

Optimization:

Instead of walking one character at a time, when we find a mismatch, skip ahead by an intelligent number of characters – that is, to the next index at which we might find a good match. We can use the Boyer-Moore algorithm for this approach.

Further optimization:

Look for both words in one pass by using Boyer-Moore tables that account for both words. As we walk, keep track of the current shortest distance. Running time is O(n).

Does anyone have a simpler or clearer answer?

Simple solution:

- Walk through the string and find each occurrence of word A. Store the middle index of this occurrence

- Walk through again and find each occurrence of word B. Store the middle index of this occurrence.

- Compare the arrays of indicies. Find the smallest difference between each middle position of A and each middle position of B.

- Running time = O(n + ab) where a and b are the number of occurrences of the input strings A and B.

Walk through the document, character by character, looking to match <space>word<space>. If we find such a match, put the index of the middle character into the array. Continue on our walk. Repeat for the second word. Now we have two arrays of integers. Our new task is to find the smallest difference between the elements in each array. Use the mergesort merge method, keeping track of the smallest difference. Return this distance.

Optimization:

Instead of walking one character at a time, when we find a mismatch, skip ahead by an intelligent number of characters – that is, to the next index at which we might find a good match. We can use the Boyer-Moore algorithm for this approach.

Further optimization:

Look for both words in one pass by using Boyer-Moore tables that account for both words. As we walk, keep track of the current shortest distance. Running time is O(n).

Does anyone have a simpler or clearer answer?

▲

0

▼

of 0votes

@ ellemeno

Perfect!

Perfect!

▲

0

▼

of 0votes

Thank you! I have my first technical phone interview with Google on Monday. Wish me luck!

▲

2

▼

of 2votes

I would clarify with the interviewer, but it doesn't seem to me that finding the words in the document is the principal focus of the question, so would concentrate on the basic algorithm assuming an efficient word finder (which could be implemented independently).

So here's my take on the algorithm:

1. Find the first instance of word A or B (assume A going forward). Continue searching (e.g. iterating) and each time you find another A, discard last recorded position of A until you reach and record the first B. (If B is the first word reverse A and B in the last sentence). The distance is the first try.

2. Continue searching until you reach the next instance of A or B. Measure the distance to the last recorded instance of the other word. If it's less than the current smallest distance, discard the last recorded instance of the word and record the distance.

3. Repeat 2. until the distance between the words found is greater than the current smallest distance.

4. Keeping track of the distance, start 1 again from current index until you find the next smallest distance or until you run out of words. Replace the recorded smallest distance each time you find one smaller.

So here's my take on the algorithm:

1. Find the first instance of word A or B (assume A going forward). Continue searching (e.g. iterating) and each time you find another A, discard last recorded position of A until you reach and record the first B. (If B is the first word reverse A and B in the last sentence). The distance is the first try.

2. Continue searching until you reach the next instance of A or B. Measure the distance to the last recorded instance of the other word. If it's less than the current smallest distance, discard the last recorded instance of the word and record the distance.

3. Repeat 2. until the distance between the words found is greater than the current smallest distance.

4. Keeping track of the distance, start 1 again from current index until you find the next smallest distance or until you run out of words. Replace the recorded smallest distance each time you find one smaller.

▲

1

▼

of 5votes

The shortest distance between two words is a space. My algorithm would simply search for the first occurrence of a space.

▲

0

▼

of 2votes

Grant, I whole-heartedly agree. It doesn't say that they are two specific words, just the shortest distance between two words... Which would naturally be the one space character between each word. In this case, the algorithm could simply return 1, and not even bother with searching the document at all. After all, the question doesn't really ask where the shortest distance is located in the document.

▲

0

▼

of 0votes

Since this is labeled a C++ programming question asked at a top-notch software company solving difficult problems, handing in a program that always returns 1 probably isn't a good idea. If I were a hiring manager at Google I'd want to see the algorithms, data structures, & design skills you have at your fingertips. If I got this question during an interview at Google with no chance to ask additional questions, I'd assume they wanted the answer to a hard problem. And if I were interviewing for a C++ programming position but never asked such a question, I'd worry about the competence of current staff.

▲

2

▼

of 3votes

P.S. - If I could ask a question, it would be whether the document was already indexed ('cuz I'm interviewing at Google after all). Then try to show a bit of linguistic sensitivity, such as whether intervening sentence and paragraph boundaries should count as "extra" distance. After the interviewer left the room, I'd Google the Web for an answer - why reinvent the wheel? ;-)

▲

0

▼

of 0votes

index1 = -1

index2 = -1

minDistance = -1

Scan the string for words.

For each word:

1. If it matches 1st word, set index1 to index of word.

2. If it matches 2nd word, set index2 to index of word.

3. If it matches neither word, continue

4. If index1 == -1 or index2 == -1 continue

5. distance = abs( index1 - index2 )

6. minDistance = (minDistance == -1) ? distance : min( distance, minDistance )

index2 = -1

minDistance = -1

Scan the string for words.

For each word:

1. If it matches 1st word, set index1 to index of word.

2. If it matches 2nd word, set index2 to index of word.

3. If it matches neither word, continue

4. If index1 == -1 or index2 == -1 continue

5. distance = abs( index1 - index2 )

6. minDistance = (minDistance == -1) ? distance : min( distance, minDistance )

** To comment on this question, Sign In with Facebook or Sign Up **

votes

neakoronNov 11, 2009