Avanade Interview Question

How do you identify duplicate rows in a dataset using Spark SQL?
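
One common way to approach this is to group on the columns that define a "duplicate" and keep only the groups with more than one row. The sketch below is illustrative only: the `people` table, its `name` and `email` columns, and the `find_duplicates` helper are hypothetical names, and SQLite is used as a stand-in engine so the snippet runs without a Spark installation. The query text itself is plain `GROUP BY` / `HAVING` SQL, which Spark SQL also accepts.

```python
import sqlite3

# Hypothetical table and column names, chosen only for illustration.
# The query groups on the columns that define a duplicate and keeps
# only the groups that contain more than one row.
DUPLICATES_QUERY = """
SELECT name, email, COUNT(*) AS occurrences
FROM people
GROUP BY name, email
HAVING COUNT(*) > 1
ORDER BY name, email
"""


def find_duplicates(rows):
    """Return (name, email, occurrences) for each key seen more than once.

    SQLite stands in for Spark here (an assumption made so the example
    is self-contained); only the SQL text is meant to carry over.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE people (name TEXT, email TEXT)")
        conn.executemany("INSERT INTO people VALUES (?, ?)", rows)
        return conn.execute(DUPLICATES_QUERY).fetchall()
    finally:
        conn.close()


# Made-up sample data: "Ana" appears three times, "Ben" twice, "Cy" once.
sample_rows = [
    ("Ana", "ana@example.com"),
    ("Ben", "ben@example.com"),
    ("Ana", "ana@example.com"),
    ("Cy", "cy@example.com"),
    ("Ben", "ben@example.com"),
    ("Ana", "ana@example.com"),
]

duplicates = find_duplicates(sample_rows)
print(duplicates)
# [('Ana', 'ana@example.com', 3), ('Ben', 'ben@example.com', 2)]
```

In a Spark job, the same query string would typically be run by registering the DataFrame as a temporary view with `df.createOrReplaceTempView("people")` and then calling `spark.sql(DUPLICATES_QUERY)`. Which columns go in the `GROUP BY` depends on what counts as a duplicate for the dataset at hand: all columns for exact duplicates, or only the business key for logical duplicates.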