Facebook

  www.facebook.com
  www.facebook.com

Interview Question

Software Engineer Interview

given a list of words with a same size and a big string

  that contains one of the permutation of all the words combined(say p), find the startindex of the string p in the big string
Answer

Interview Answer

6 Answers

0

//brute force solution is to find all permutations of the words, and find that string p in the big string or another solution is like find the positions of all the words in the big string and if you mod with the size of word(same for all words) should give the same value for all the word positions in the big string and the lowest position is the startindex. I think there are some string matching algos which to attack this problem any thoughts??

Interview Candidate on Oct 17, 2011
2

Hi Word are all of same size As given
So let words size be k and there be n words so so size of stream will nk

Now qusetion is how we store these n words I prefer stroing them as in a trie
so we start match stream if it follows the path of word "p" we go along if it doesn't
then we start matching it from kth position of stream, then 2k and so on

Anand on Oct 20, 2011
1

Assumtions
----------------
1. The characters are ascii
2. The string p appears in the big string at the beggining of a word

1. have an array of 255 chars. Find the the frequency of each char in the given set of words. update the array. This code be optimized to be a hash.
2. go through the large string and for each word find the frequency of the characters and see if they match the frequency of the words that was found previously.

There are other optimizations that can be done to the solution. O(length of large string).

none on Oct 24, 2011
0

import java.util.ArrayList;
import java.util.HashMap;

public class Solution {
    public ArrayList<Integer> findStartIndex(String S, String[] L) {
        int len = L[0].length();

        ArrayList<Integer> ret = new ArrayList<Integer>();
        HashMap<String, Integer> count = new HashMap<String, Integer>();
        HashMap<String, Integer> found = new HashMap<String, Integer>();

        for (String word : L) {
            if (!count.containsKey(word)) {
                count.put(word, 1);
            } else {
                count.put(word, count.get(word) + 1);
            }
        }

        loop: for (int i = 0; i <= S.length() - len * L.length; i++) {
            String sub = S.substring(i, i + len);
            if (!count.containsKey(sub)) {
                continue;
            }

            found.clear();
            found.put(sub, 1);

            for (int j = 1; j < L.length; j++) {
                String next_sub = S.substring(i + len * j, i + len * (j + 1));

                if (!count.containsKey(next_sub)) {
                    continue loop;
                }

                Integer foundVal = !found.containsKey(next_sub) ? 1 : found.get(next_sub) + 1;

                if (foundVal > count.get(next_sub)) {
                    continue loop;
                }

                found.put(next_sub, foundVal);
            }

            ret.add(i);
        }

        return ret;
    }
}

learned from others on May 17, 2012
0

#include <iostream>
#include <vector>
#include <map>
using namespace std;
int match_pos(string &str, vector<string> & wlist)
{
    if(wlist.size() < 0)
        return -1;
    else if(wlist.size() ==0 )
        return 0;
    int wlen = wlist[0].size();
    map<string, int> last_index;
    for(size_t i=0; i<wlist.size(); i++)
        last_index.insert(make_pair(wlist[i], -1));

    vector<int> start;
    start.resize(wlen);
    for(int i=0; i<wlen; i++)
        start[i] = i;

    int start_index = 0;
    int cur_start;
    int count = 0;
    while(true)
    {
        int real_start = start[start_index];
        if(real_start > str.size() - wlen*wlist.size())
            break;

        for(map<string, int>::iterator it = last_index.begin(); it != last_index.end(); ++ it)
            it -> second = -1;

        cur_start = real_start;
        while(count < wlist.size())
        {
            string word = str.substr(cur_start, wlen);
            map<string, int>::iterator it = last_index.find(word);
            if(it == last_index.end())
            {
                start[start_index] = cur_start + wlen;
                start_index = (start_index + 1)%wlen;
                count = 0;
                break;
            }
            else
            {
                if(it -> second == -1)
                {
                    it -> second = cur_start;
                    cur_start = cur_start + wlen;
                    count ++;
                }
                else
                {
                    start[start_index] = it -> second + wlen;
                    start_index = (start_index + 1) % wlen;
                    count = 0;
                    break;
                }
            }
        }
        if(count == wlist.size())
        {
            string res = str.substr(real_start, wlen * count);
            cout << res << endl;
            return real_start;
        }
    }
    return -1;
}
int main()
{
    vector<string> wlist;
    wlist.push_back("AAA");
    wlist.push_back("BBB");
    wlist.push_back("CCC");
    string s;
    while(cin >> s)
    {
        cout << match_pos(s, wlist) << endl;
    }
    return 0;
}

For job. on Jul 25, 2012
0

In above code, I used two kinds of optimization.
(1) if a word doesn't appear in the word list, then next time we only need to start from pos+wlen
(2) if a world has appeared before at pos, then next time we only need to start from pos + wlen.
By these two kinds of optimization, we can reduce a lot of effort.

For job. on Jul 25, 2012

Add Answers or Comments

To comment on this, Sign In or Sign Up.