This is a demo for the SmolVLM-SmolVLM2 model family on the NuScenes multimodal QA dataset.
You can select different model versions and predict answers to questions based on the camera feed.
Check out the SmolVLM2 collection
Check out the SmolVLM collection
Check out the NuScenes multimodal QA dataset