Amazon.com

www.amazon.com
Employer Engaged

Interview Question

Software Engineer Interview Seattle, WA

If you have a file containing millions of integers, how

  would you sort the data in the file using extremely limited resources, such a s 1GB of memory?
Tags:
sorting algorithm, scaling
Answer

Interview Answer

8 Answers

0

What was the answer?

In my opinion, I would answer this way:

Sort each x integers (x is the number of integers that the memory can hold) then save it to the a file (n files) . Then for each file maintain a current index initiated with 0. Loop through n file and take the max integer, put it into the result file, and increase the file's current index to one. Continue doing it till the end.

Duy on Jan 2, 2012
0

An integer is only 4 bytes! I am just saying, a million integers should consume about 4 MB of memory, and easily fit in 1 GB! So "millions" had better be closer to a quarter billion if memory is an issue.

OK, to answer the intended question, I would solve it with statistics. Compute bucket ranges by using a random subset of the file, with bucket size chosen such that the expected portion of the entire file will fit within the memory available. Divide the file into these buckets, and sort the buckets individually. Concatenate the sorted buckets to create a single sorted file.

Kurt on Jan 8, 2012
0

To Kurt:
You missed one step. Concatenating buckets will not make a sorted list. You have to merge the buckets. However, for merging you do not necessarily have to load the whole bucket into memory.

Naiem on Jan 13, 2012
0

millions is not too much , you can build a int array which have millions of cells. The array take only several mega bytes . Initialize to zero for every cell, then traverse the file and keep track of every integer by array , eg . 4789 will make array[4789]++ , this could handle duplicate number . Finally scan the array ...

iatbst on Feb 2, 2012
1

the interviewer is looking for the word "divide and conquer".
read the n number of integers and write them in a binary tree. hold the left and right value in a datastructure which keeps track of the file an integer is written to.

something like this

class NumHolder {
  int min;
  int max;
  String fileName;
}
Map partInfo = new HashMap()
for (int i=0; i<1...; i++) {
  int[] array = readNNumbersFromFile()
  sortAndWrite(array) //updates the partInfo map with NumHolder data
}
readFromAllPartFilesAndSaveToNewFile(partInfo)

void sortAndWrite(array) {
   sort(array);
   if array[0] - array[last] > 100000 //100000 = sum number that determines how many part files will be created.
     writeToOtherFileToo
  writeToPartFile
  update partInfo with this write
}

dudes on Feb 17, 2012
0

its called 'external sorting'... read more on wiki...

Anonymous on Feb 19, 2012
0

divide the file in chunks, sort individuals chunks using quicksort and store in files. Then use merging step of merge sort on those saved files. This would be external sort.

Mandar on Apr 8, 2012
0

use external sorting algorithm similar to merge sort which splits and merges files instead of arrays in memory

Anonymous on Dec 12, 2014

Add Answers or Comments

To comment on this, Sign In or Sign Up.