OpenAI Vision

OpenAI Vision lets you extract what an input image contains in seconds! ✨ In React Native Starter AI, the Image to Text AI generation flow is built on the OpenAI Vision API. In the default flow, we extract the nutrition info from the input image and show it on the screen. The codebase is flexible, so you can quickly replace that logic with your own, or keep the existing logic and publish your own calorie tracker mobile application!

Setup OpenAI Vision API

The AI Services Docs explain how to set up your OpenAI API key and get started with a working backend and mobile application. Check them out if you haven't already!

Running Image To Text Flow

In the image to text generation flow, we send the input image directly to OpenAI's GPT-4 model and ask it for data about the image. In our niche use case, we ask it for nutritional info about the meal in the image. The API takes a single argument in the request body:

{
  "inputImg": "<BASE_64_IMG>"
}
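As a minimal sketch of how a client might prepare this body: the helper below is hypothetical (the name `buildImageToTextBody` and the prefix-stripping behavior are assumptions, not part of the starter codebase). It assumes the backend expects raw base64 without a data URL prefix, since the prompts add their own `data:image/png;base64,` prefix.

```typescript
// Shape of the request body described above.
interface ImageToTextRequest {
  inputImg: string;
}

// Hypothetical helper (not from the starter codebase): builds the request body
// from either raw base64 or a full data URL.
function buildImageToTextBody(image: string): ImageToTextRequest {
  // Image pickers often return a full data URL. Strip any such prefix so the
  // backend does not end up with a doubled "data:image/...;base64," prefix.
  const inputImg = image
    .replace(/^data:image\/[a-zA-Z0-9.+-]+;base64,/, "")
    .trim();

  if (inputImg.length === 0) {
    throw new Error("inputImg must be a non-empty base64 string");
  }
  return { inputImg };
}
```

The returned object can be passed to `JSON.stringify` and sent as the body of the POST request to your backend endpoint.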

The code consists of 3 calls to the model. 2 of them ask for a plain string answer: one for the title and one for the description of the input image. We get the food's title and its nutritional description with the prompts below:

react-native-starter-backend/functions/src/utils/construct-image-to-text-body.ts

const messages = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: `tell me the name of this food. Strictly just tell the name as title and don't add anything else.`,
      },
      {
        type: "image_url",
        image_url: {
          url: `data:image/png;base64,${inputImg}`,
        },
      },
    ],
  },
];

react-native-starter-backend/functions/src/utils/construct-image-to-text-body.ts

const messages = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: `tell me the nutrition info in this food as a paragraph without bullet points. Mention about how healthy it is and the vitamins/minerals inside. The paragraph should not exceed 80 words.`,
      },
      {
        type: "image_url",
        image_url: {
          url: `data:image/png;base64,${inputImg}`,
        },
      },
    ],
  },
];
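Both prompts use the same message shape and differ only in the text. One possible way to avoid repeating it is a small builder like the one sketched below. The names `buildVisionMessages`, `VisionMessage` and `VisionContentPart`, as well as the optional `mimeType` parameter, are assumptions made for illustration and do not come from the starter codebase.

```typescript
// Content parts matching the shape used in the prompts above.
type VisionContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: { url: string } };

interface VisionMessage {
  role: "user";
  content: VisionContentPart[];
}

// Hypothetical helper: wraps a prompt and a base64 image in a single user message.
// mimeType defaults to "image/png" to mirror the hard-coded prefix in the prompts.
function buildVisionMessages(
  prompt: string,
  inputImg: string,
  mimeType: string = "image/png"
): VisionMessage[] {
  return [
    {
      role: "user",
      content: [
        { type: "text", text: prompt },
        {
          type: "image_url",
          image_url: { url: `data:${mimeType};base64,${inputImg}` },
        },
      ],
    },
  ];
}
```

With a helper like this, each prompt becomes a one-line call, and JPEG input could be handled by passing `"image/jpeg"` as the third argument.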

After getting the title and description of the food, we ask the model for JSON containing the carbohydrates, calories, protein and fats in the food, using the prompt below:

react-native-starter-backend/functions/src/utils/construct-image-to-text-body.ts

const messages = [
  {
    role: "user",
    content: [
      {
        type: "text",
        text: `tell me the nutrition info in this food. The output should be as below strictly without any additional text:

              {
              "carbohydrates": "10",
              "calories": "100",
              "protein": "100",
              "fats": "100"
              }
              `,
      },
      {
        type: "image_url",
        image_url: {
          url: `data:image/png;base64,${inputImg}`,
        },
      },
    ],
  },
];
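A model may not always follow the "strictly without any additional text" instruction, so it can be worth parsing the reply defensively. The sketch below is one possible approach, not the starter's actual parsing code: `parseNutritionInfo` and `NutritionInfo` are hypothetical names, and converting the string values to numbers is an assumption about what the UI needs.

```typescript
interface NutritionInfo {
  carbohydrates: number;
  calories: number;
  protein: number;
  fats: number;
}

// The four keys requested in the prompt above.
const NUTRITION_KEYS = ["carbohydrates", "calories", "protein", "fats"] as const;

// Hypothetical helper: extracts the JSON object from a model reply and
// validates that all four fields are present and numeric.
function parseNutritionInfo(reply: string): NutritionInfo {
  // Tolerate extra text around the JSON by slicing from the first "{" to the last "}".
  const start = reply.indexOf("{");
  const end = reply.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("No JSON object found in model reply");
  }

  const parsed: unknown = JSON.parse(reply.slice(start, end + 1));
  if (typeof parsed !== "object" || parsed === null) {
    throw new Error("Model reply is not a JSON object");
  }
  const record = parsed as Record<string, unknown>;

  const result: NutritionInfo = { carbohydrates: 0, calories: 0, protein: 0, fats: 0 };
  for (const key of NUTRITION_KEYS) {
    // parseFloat accepts "100" as well as values with a trailing unit such as "100g".
    const value = Number.parseFloat(String(record[key]));
    if (Number.isNaN(value)) {
      throw new Error(`Missing or non-numeric field: ${key}`);
    }
    result[key] = value;
  }
  return result;
}
```

Throwing on a malformed reply lets the backend decide whether to retry the call or return an error to the mobile application.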

After gathering all three results, we return the response to the mobile application, which renders the outputs in the UI. This is how we use the OpenAI Vision API directly, but as mentioned above, this is just a niche example; you can easily modify this logic to fit your needs!
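The overall flow can be sketched as follows. This is an illustration under stated assumptions, not the starter's implementation: `AskVision`, `FoodAnalysis` and `analyzeFood` are hypothetical names, the response shape is assumed, the nutrition prompt is abbreviated, and running the three calls concurrently is a design choice made here (the source does not say whether they run in sequence or in parallel).

```typescript
// Hypothetical abstraction: sends one prompt plus the image to the model
// and resolves to the model's text reply.
type AskVision = (prompt: string, inputImg: string) => Promise<string>;

// Assumed response shape returned to the mobile application.
interface FoodAnalysis {
  title: string;
  description: string;
  nutrition: string; // raw JSON text returned by the model
}

const TITLE_PROMPT =
  "tell me the name of this food. Strictly just tell the name as title and don't add anything else.";
const DESCRIPTION_PROMPT =
  "tell me the nutrition info in this food as a paragraph without bullet points. Mention about how healthy it is and the vitamins/minerals inside. The paragraph should not exceed 80 words.";
// Abbreviated here: the real prompt also includes the JSON template.
const NUTRITION_PROMPT =
  "tell me the nutrition info in this food. The output should be as below strictly without any additional text:";

async function analyzeFood(inputImg: string, ask: AskVision): Promise<FoodAnalysis> {
  // The three calls are independent of each other, so they can run concurrently.
  const [title, description, nutrition] = await Promise.all([
    ask(TITLE_PROMPT, inputImg),
    ask(DESCRIPTION_PROMPT, inputImg),
    ask(NUTRITION_PROMPT, inputImg),
  ]);
  return {
    title: title.trim(),
    description: description.trim(),
    nutrition: nutrition.trim(),
  };
}
```

Passing the model call in as the `ask` parameter keeps the flow easy to test with a fake function and makes it simple to swap in prompts for a different use case.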