SmolVLM2 VQA Demo (nuScenes multimodal QA dataset)

This is a demo of the SmolVLM/SmolVLM2 model family on the nuScenes multimodal QA dataset.

Select a model version to predict answers to questions based on the camera feeds.

Select Model Version

Camera views: CAM_FRONT_LEFT, CAM_FRONT, CAM_FRONT_RIGHT, CAM_BACK_LEFT, CAM_BACK, CAM_BACK_RIGHT
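A minimal Python sketch of how one sample's six camera images might be arranged in the front-row/back-row order the demo lists. It assumes each sample provides a mapping from camera channel name to an image (a path, a PIL image, or similar); the mapping and the helper names `order_views` and `grid_rows` are hypothetical, not the demo's actual loading code.

```python
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

# Camera channels in the order the demo lists them: front row, then back row.
CAMERA_ORDER: Sequence[str] = (
    "CAM_FRONT_LEFT", "CAM_FRONT", "CAM_FRONT_RIGHT",
    "CAM_BACK_LEFT", "CAM_BACK", "CAM_BACK_RIGHT",
)


def order_views(views: Dict[str, T]) -> List[T]:
    """Return the per-camera images in CAMERA_ORDER.

    `views` is a hypothetical mapping from channel name to an image;
    a KeyError lists any channels that are missing.
    """
    missing = [cam for cam in CAMERA_ORDER if cam not in views]
    if missing:
        raise KeyError(f"missing camera views: {missing}")
    return [views[cam] for cam in CAMERA_ORDER]


def grid_rows(views: Dict[str, T]) -> List[List[T]]:
    """Split the ordered views into a 2x3 grid: front row, then back row."""
    ordered = order_views(views)
    return [ordered[:3], ordered[3:]]
```

For example, with `views = {cam: f"{cam}.jpg" for cam in CAMERA_ORDER}`, `grid_rows(views)[0]` is `["CAM_FRONT_LEFT.jpg", "CAM_FRONT.jpg", "CAM_FRONT_RIGHT.jpg"]`. Failing loudly on a missing channel keeps a partial sample from silently shifting images into the wrong grid cells.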

Displayed fields: Question, Expected Answer, Predicted Answer, Correct?, Inference Time
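The demo does not state how "Correct?" is decided or how inference time is measured. A minimal Python sketch, assuming a case- and punctuation-insensitive exact match between the predicted and expected answers and wall-clock timing of a single prediction. `predict` is a hypothetical stand-in for the selected model (taking the camera images and the question), and `evaluate`, `normalize`, and `Result` are illustrative names.

```python
import string
import time
from dataclasses import dataclass
from typing import Any, Callable, Sequence


@dataclass
class Result:
    question: str
    expected: str
    predicted: str
    correct: bool
    inference_time_s: float


def normalize(answer: str) -> str:
    """Lowercase, drop ASCII punctuation, collapse whitespace (an assumed rule)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(answer.lower().translate(table).split())


def evaluate(
    images: Sequence[Any],
    question: str,
    expected: str,
    predict: Callable[[Sequence[Any], str], str],
) -> Result:
    """Time one call to the hypothetical `predict` and score its answer."""
    start = time.perf_counter()
    predicted = predict(images, question)
    elapsed = time.perf_counter() - start
    correct = normalize(predicted) == normalize(expected)
    return Result(question, expected, predicted, correct, elapsed)
```

With a dummy model, `evaluate([], "Are there any pedestrians?", "Yes", lambda imgs, q: "yes.")` is scored correct because both answers normalize to `"yes"`. Exact match is strict; free-form answers such as "two cars" vs. "2" would need a looser comparison.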