Binance Interview Question

How does a vision language model work?