Leveraging the Gemini Pro Vision model for image understanding, multimodal prompts and accessibility
Explore how you can use the new Gemini Pro Vision model with the Gemini API to handle multimodal input, combining text and image prompts to receive a text result. In this solution, you will learn how to access the Gemini API with image and text data, explore a variety of example prompts that use images with Gemini Pro Vision, and complete a codelab that applies the API to a real-world problem involving accessibility and basic web development.
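For example, a minimal Node.js sketch of such a multimodal call might look like the following. It assumes the @google/generative-ai package is installed, an API key is available in the API_KEY environment variable, and a local file named image.png exists; adjust these names for your own setup.

// Minimal sketch: send a text prompt plus an image to Gemini Pro Vision and
// print the text response. Assumes `npm install @google/generative-ai` and an
// API key in process.env.API_KEY; image.png is a placeholder file name.
const fs = require("fs");
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.API_KEY);

// Wrap a local file in the inline-data format the API expects.
function fileToGenerativePart(filePath, mimeType) {
  return {
    inlineData: {
      data: fs.readFileSync(filePath).toString("base64"),
      mimeType,
    },
  };
}

async function run() {
  const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });
  const prompt = "Describe what is happening in this image.";
  const imagePart = fileToGenerativePart("image.png", "image/png");

  const result = await model.generateContent([prompt, imagePart]);
  console.log(result.response.text());
}

run();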
Leveraging the Gemini Pro Vision model for image understanding, multimodal prompts and accessibility
Video
Learn how to use the multimodal features of the Gemini model in a Node.js script to analyze HTML documents and image files and add accessible descriptions to a webpage.
Quickstart: Get started with the Gemini API in Node.js applications
Article
Learn how to generate text from multimodal text-and-image input data using the Gemini Pro Vision model in Node.js.
Explore examples of how Gemini's multimodal image and text inputs can be combined to produce text output about images across a range of use cases.
Prompting with images and text using the Gemini API for accessibility
Codelab
In this codelab, you will write a Node.js script that uses the Gemini Pro Vision model to analyze a local HTML document and, where needed, generate accessible descriptions for the images on the page. With Gemini, the script can verify whether an existing description accurately reflects a given image and, if not, generate an entirely new one.
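As a rough sketch of the kind of script the codelab builds (not the codelab's actual code), the following scans a local HTML file for images that lack alt text and asks Gemini Pro Vision to propose a description. It assumes the same @google/generative-ai setup as above plus the jsdom package; index.html is a placeholder file name.

// Rough sketch: add Gemini-generated alt text to images that are missing it.
// Assumes `npm install @google/generative-ai jsdom` and an API key in
// process.env.API_KEY; index.html is a placeholder file name.
const fs = require("fs");
const path = require("path");
const { JSDOM } = require("jsdom");
const { GoogleGenerativeAI } = require("@google/generative-ai");

const genAI = new GoogleGenerativeAI(process.env.API_KEY);
const model = genAI.getGenerativeModel({ model: "gemini-pro-vision" });

// Ask Gemini Pro Vision for a short accessible description of one image file.
async function describeImage(imagePath) {
  const imagePart = {
    inlineData: {
      data: fs.readFileSync(imagePath).toString("base64"),
      mimeType: "image/png", // adjust to match the actual image type
    },
  };
  const prompt = "Write a short, accessible alt text for this image.";
  const result = await model.generateContent([prompt, imagePart]);
  return result.response.text().trim();
}

async function run() {
  const htmlPath = "index.html"; // placeholder local page
  const dom = new JSDOM(fs.readFileSync(htmlPath, "utf8"));
  const images = dom.window.document.querySelectorAll("img");

  for (const img of images) {
    if (!img.getAttribute("alt")) {
      // Resolve the image path relative to the HTML file, then generate alt text.
      const imagePath = path.join(path.dirname(htmlPath), img.getAttribute("src"));
      img.setAttribute("alt", await describeImage(imagePath));
    }
  }

  fs.writeFileSync(htmlPath, dom.serialize());
}

run();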
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Missing the information I need","missingTheInformationINeed","thumb-down"],["Too complicated / too many steps","tooComplicatedTooManySteps","thumb-down"],["Out of date","outOfDate","thumb-down"],["Samples / code issue","samplesCodeIssue","thumb-down"],["Other","otherDown","thumb-down"]],[],[[["\u003cp\u003eGemini supports multimodal prompts, accepting text and image inputs while providing text-only responses.\u003c/p\u003e\n"],["\u003cp\u003eMultimodal prompts allow for a variety of use cases like image classification, object recognition, and creative text generation based on images.\u003c/p\u003e\n"],["\u003cp\u003eGemini can interpret and reason about images, enabling tasks like counting objects, understanding handwriting, and inferring temporal information from scenes.\u003c/p\u003e\n"],["\u003cp\u003eAdvanced multimodal prompts can combine multiple skills like handwriting recognition, logical reasoning, and world knowledge for creative and practical applications.\u003c/p\u003e\n"],["\u003cp\u003eExperimentation with different multimodal prompts is encouraged to explore the full potential of Gemini's capabilities.\u003c/p\u003e\n"]]],["Multimodal prompts, combining text and images, enable LLMs like Gemini to perform diverse tasks. Key actions include entity recognition, classification, and counting objects in images. More advanced applications demonstrate text recognition from handwriting, reasoning, calculation, interpreting scene lighting for time inference, and creative tasks like haiku generation. Additionally, LLMs can identify logical progressions, understand object attributes, and infer real-world practicality. These capabilities highlight the power of multimodal prompts for understanding and extracting information from combined input.\n"],null,[]]