Go Vivace Interview Question

Why and when do we use the multi-head attention module in Natural Language Processing?
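
For reference, here is a minimal, dependency-free sketch of the module the question refers to, assuming the standard Transformer formulation (scaled dot-product attention run in parallel over slices of the model dimension, then concatenated and projected). All names (`multi_head_attention`, `attention_head`, `w_q`, `w_k`, `w_v`, `w_o`) and the random weights are illustrative assumptions, not taken from any particular library.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_head(q, k, v):
    """Scaled dot-product attention for a single head."""
    d_head = len(q[0])
    k_t = [list(col) for col in zip(*k)]  # transpose of K
    scores = matmul(q, k_t)               # (seq_len, seq_len)
    weights = [softmax([s / math.sqrt(d_head) for s in row]) for row in scores]
    return matmul(weights, v), weights


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Self-attention over x (seq_len x d_model), split into num_heads heads."""
    d_model = len(x[0])
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)

    head_outputs, head_weights = [], []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)  # this head's feature slice
        out, weights = attention_head(
            [row[cols] for row in q],
            [row[cols] for row in k],
            [row[cols] for row in v],
        )
        head_outputs.append(out)
        head_weights.append(weights)

    # Concatenate the per-head outputs for each position, then project.
    concat = [sum((head[i] for head in head_outputs), []) for i in range(len(x))]
    return matmul(concat, w_o), head_weights


def rand_matrix(rows, cols):
    """Random Gaussian matrix; stands in for learned weights (assumption)."""
    return [[random.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
seq_len, d_model, num_heads = 4, 8, 2
x = rand_matrix(seq_len, d_model)
w_q, w_k, w_v, w_o = (rand_matrix(d_model, d_model) for _ in range(4))
output, head_weights = multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads)
```

Each entry of `head_weights` is a separate `seq_len x seq_len` attention pattern computed over its own slice of the features, so the heads can attend to different positions for the same token. That independence is the property the question is asking about.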