Site Reliability Engineer Interview Questions | Glassdoor

Site reliability engineer interview questions shared by candidates

Top Interview Questions

There were two, and they both happened during the live-debugging portion of the interview. All of the live-debugging questions revolved around a simple website that had something broken in it. You had to fix the breakage to be able to move on to the next page. In total there were 4 questions, each progressively more difficult to debug.

The first question was a simple permissions problem on a file being requested by the client. The permissions on the file (a blank text file) were too restrictive, so the request raised an error. You could verify this in the Apache web logs.

The second error was also a permissions problem, but this time the file was hidden in a subdirectory of the main web site. You could only determine this by looking at the Apache configuration file to see that the shtml file was located somewhere else. After that, change the permissions to fix it.

The third was a head-scratcher. The file in question was raising a 500 error and showing URL-encoded characters in the filename in the web log. Looking at the name of the file on disk, though, showed nothing out of the ordinary. It turns out that the name contained Unicode characters that the terminal prints as if they were plain ASCII letters. The only way you can tell that this is the case is to open the file and search for the filename itself to see if it matches. For example, if the correct filename is "challenge1.shtml", you can search for that exact string and you will NOT find the Unicode version of it. Once you find the incorrect file name, delete it and type the correct file name (in this case "challenge3.shtml") into the file, and the page works.

The final question was a segfault occurring in Apache. It resulted in no information being returned to the client. You could see this in the Apache web logs as well as in the Chrome developer tools. The Apache logs noted that a core file was dumped. This challenge required that you know a little bit about gdb and C programming. Basically, you need to run the core dump through gdb:

    gdb /path/to/apache /path/to/core/dump

It will spew out a lot of output. In particular, it mentions that something is happening in an Apache module; mod_rewrite or something... it doesn't really matter. The output also points to the C source file for that module, which is conveniently on disk. Open that file in vi and jump to the line number mentioned in the gdb output (line 1861 or something). There you will see that if the filename matches challenge4.shtml, the code raises SIGSEGV; there's your smoking gun. They don't ask you to fix the final challenge, only to explain what the strstr is doing. The code in question basically looks like this:

    if (strstr(r->filename, "challenge4.shtml") != NULL) { SIGSEGV }

Just point out to them that, yeah, it's segfaulting when I ask for that file.
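
For the third challenge, one way to expose look-alike characters without eyeballing the terminal is to print every non-ASCII character in each file name. The following is only a sketch of that idea, not what the interviewers used; the function name and the directory being scanned are assumptions made for illustration.

```python
import os
import unicodedata


def suspicious_names(names):
    """Return (name, escaped_name, odd_chars) for names containing non-ASCII characters."""
    findings = []
    for name in names:
        # Collect each character outside the ASCII range together with its Unicode name.
        odd = [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in name if ord(ch) > 127]
        if odd:
            # unicode_escape makes the hidden code points visible, e.g. \u0430.
            escaped = name.encode("unicode_escape").decode("ascii")
            findings.append((name, escaped, odd))
    return findings


if __name__ == "__main__":
    # Assumption: the current directory stands in for the web root.
    for _name, escaped, odd in suspicious_names(os.listdir(".")):
        print(escaped, odd)
```

A name that looks like "challenge3.shtml" but contains, say, a Cyrillic letter would show up in the output with an escape sequence in place of that letter.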

9 Answers

In each of the questions posed in the interactive interview, they stressed that these are "not your everyday errors, nor representative of what you'd run into in your position." They are just curious to see how you go about debugging problems, what tools you use, what resources you go to, and what your thinking is like. I tried to keep talking with the interviewer the whole time so that we never got into one of those "silent room" interviews where you really start to feel pressured to perform. Keep your mouth running and you'll coast through it. They'll assist as needed in case you are not familiar with the tools, but the assistance will be like "so where would you go to find that answer?", not "OK, here's how you use gdb." You're expected to be self-sufficient in your debugging skills.

Did they ask you only two questions during onsite?

OP here. Yes, at my on-site interview they only asked two.

What's 2^32 ?

6 Answers
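
For the 2^32 question, the value is 4,294,967,296, which is the number of distinct values a 32-bit unsigned integer can hold. A quick check in Python, with the variable name chosen only for illustration:

```python
# 2 to the power of 32, computed two equivalent ways.
value = 2 ** 32
assert value == 1 << 32

print(value)                    # 4294967296
print(f"{value:,}")             # 4,294,967,296
# The largest 32-bit unsigned value is one less than 2^32.
print(value - 1 == 0xFFFFFFFF)  # True
```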

You need to distribute a terabyte of data from a single server to 10,000 nodes, and then keep that data up to date. It takes several hours to copy the data just to one server. How would you do this so that it didn't take 20,000 hours to update all the servers? Also, how would you make sure that the file wasn't corrupted during the copy?

6 Answers
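
One common line of reasoning for the distribution question is to let every node that already has the data serve it to others, so the number of copies grows geometrically instead of linearly, and to compare checksums after each copy. The sketch below only illustrates that arithmetic and the checksum step; the function names and the fan-out of one new node per holder per round are assumptions, and it is not a full distribution system.

```python
import hashlib


def rounds_to_reach(nodes, fanout=1):
    """Rounds needed when every holder copies the data to `fanout` new nodes per round.

    Starts with a single source server that already holds the data.
    """
    holders, rounds = 1, 0
    while holders < nodes + 1:  # +1 accounts for the source itself
        holders += holders * fanout
        rounds += 1
    return rounds


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so a terabyte-sized file never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


print(rounds_to_reach(10_000))  # 14
```

With doubling, 10,000 nodes are covered in 14 rounds. If one copy takes about 2 hours, as the 20,000-hour figure in the question implies, that is roughly 28 hours instead of 20,000. Comparing the digest computed on the source with the digest computed on each destination detects corruption during the copy.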

Find all sets of 3 elements (triplets) in an array that add up to n.

5 Answers
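
One standard approach to the triplet question is to sort the array and, for each element, sweep two pointers inward over the rest, which takes O(n^2) time. This is a sketch under the assumption that each distinct combination of values should be reported once; the posted question does not say how duplicates should be handled.

```python
def three_sum(values, target):
    """Return the unique value triplets (as sorted tuples) that sum to target."""
    nums = sorted(values)
    result = []
    for i in range(len(nums) - 2):
        # Skip repeated first elements so each triplet is reported once.
        if i > 0 and nums[i] == nums[i - 1]:
            continue
        lo, hi = i + 1, len(nums) - 1
        while lo < hi:
            total = nums[i] + nums[lo] + nums[hi]
            if total < target:
                lo += 1
            elif total > target:
                hi -= 1
            else:
                result.append((nums[i], nums[lo], nums[hi]))
                lo += 1
                hi -= 1
                # Step past duplicates of the values just used.
                while lo < hi and nums[lo] == nums[lo - 1]:
                    lo += 1
                while lo < hi and nums[hi] == nums[hi + 1]:
                    hi -= 1
    return result


print(three_sum([-1, 0, 1, 2, -1, -4], 0))  # [(-1, -1, 2), (-1, 0, 1)]
```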

Given a string, return true if its characters can be rearranged to form a palindrome, and false if not. E.g., the string "evlel" can be rearranged to "level", which is a palindrome, so return true. E.g., "1234" cannot be rearranged into a palindrome, so return false.

4 Answers
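
A string can be rearranged into a palindrome exactly when at most one character occurs an odd number of times, since every other character must pair up around the middle. A minimal sketch of that check, with a function name chosen for illustration:

```python
from collections import Counter


def can_form_palindrome(text):
    """True if some rearrangement of text reads the same forwards and backwards."""
    # Count how many distinct characters appear an odd number of times.
    odd_counts = sum(1 for count in Counter(text).values() if count % 2)
    return odd_counts <= 1


print(can_form_palindrome("evlel"))  # True
print(can_form_palindrome("1234"))   # False
```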

1. Review poorly written code and point out the places where it would fail. 2. Some things about scaling IT infrastructure.

3 Answers

Parse a log excerpt, producing a CSV of the count of each process per date-time (DTTM).

4 Answers
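
The log format used in the interview is not given, so the following sketch assumes a hypothetical syslog-like line of the form "YYYY-MM-DD HH:MM:SS proc[pid]: message". The regular expression, function names, and column names are all assumptions; only the overall shape (group by timestamp and process, then write CSV) comes from the question.

```python
import csv
import io
import re
from collections import Counter

# Assumed line format: "2024-01-01 12:00:01 sshd[101]: Accepted ..."
LINE_RE = re.compile(
    r"^(?P<dttm>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<proc>[^\s\[:]+)(?:\[\d+\])?:"
)


def count_procs(lines):
    """Count log lines per (timestamp, process name); unparseable lines are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match["dttm"], match["proc"])] += 1
    return counts


def to_csv(counts):
    """Render the counts as CSV text, sorted by timestamp and then process name."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["dttm", "proc", "count"])
    for (dttm, proc), n in sorted(counts.items()):
        writer.writerow([dttm, proc, n])
    return buf.getvalue()
```

Calling to_csv(count_procs(open(path))) on a real file would need the regular expression adjusted to that file's actual timestamp and process fields.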

Describe on a scale of 1 to 10 your familiarity with systems administration. Follow-up: Which system call returns inode information? What signal does the "kill" command send by default? How many IP addresses are usable on a /23 network? Can you describe connection setup in TCP?

4 Answers
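
Several of these follow-ups can be checked from a Python prompt. The usual answers are stat() for inode information, SIGTERM (15) for the default kill signal, 510 usable addresses on a /23 (512 minus the network and broadcast addresses), and the SYN, SYN-ACK, ACK three-way handshake for TCP. The sketch below verifies the first three; the 10.0.0.0/23 network and the function name are arbitrary examples.

```python
import ipaddress
import os
import signal


def usable_hosts(cidr):
    """Usable host addresses: total minus the network and broadcast addresses.

    Assumes a conventional IPv4 subnet (prefix length of /30 or shorter).
    """
    return ipaddress.ip_network(cidr).num_addresses - 2


print(usable_hosts("10.0.0.0/23"))  # 510

# os.stat wraps the stat() system call; st_ino is the inode number.
print(os.stat(".").st_ino)

# The signal that kill sends when none is specified.
print(int(signal.SIGTERM))          # 15
```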

What happens when I type "ps" into a UNIX prompt?

5 Answers
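
A typical answer to the "ps" question covers the shell locating the binary via PATH, forking, the child calling exec, and the parent waiting for it to exit. On Linux, ps then collects its data by reading the /proc filesystem. The sketch below imitates only that last step in a very reduced form; the function name is an assumption, and the real ps reads far more than the comm file.

```python
import os


def list_processes(proc_root="/proc"):
    """Return sorted (pid, command name) pairs found under a Linux-style /proc tree."""
    procs = []
    for entry in os.listdir(proc_root):
        # Process directories are the ones whose names are all digits.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as fh:
                name = fh.read().strip()
        except OSError:
            # The process may have exited between listdir() and open().
            continue
        procs.append((int(entry), name))
    return sorted(procs)
```

The proc_root parameter exists so the function can be pointed at a test directory; on a Linux host the default lists the live processes.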

"What is your favorite networking protocol?" Follow-on: "What do you like about it and what don't you like about it?"

2 Answers