Spectrum Interview Question

How do you benchmark LLM performance?