Monday, May 20, 2024

Building an AI Image Recognition App Using Google Gemini

Previously, we provided a brief introduction to the Google Gemini APIs and demonstrated how to build a Q&A application using SwiftUI. You should have noticed how easy it is to integrate Google Gemini and enhance your apps with AI features. We have also developed a demo application to show how to build a chatbot app using the AI APIs.

The gemini-pro model discussed in the previous tutorial is limited to generating text from text-only input. However, Google Gemini also provides a multimodal model called gemini-pro-vision, which can generate text descriptions from images. In other words, this model can detect and describe the objects in an image.

In this tutorial, we will demonstrate how to use the Google Gemini APIs for image recognition. This simple app lets users pick an image from their photo library and uses Gemini to describe the contents of the image.


Before proceeding with this tutorial, please visit Google AI Studio and create your own API key if you haven't done so already.

Adding the Google Generative AI Package to Xcode Projects

Assuming you've already created an app project in Xcode, the first step in using the Gemini APIs is to import the SDK. To do this, right-click the project folder in the project navigator and select Add Package Dependencies. In the dialog, enter the following package URL:
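The original URL was lost during extraction. At the time of writing, Google's Swift SDK for the Gemini API is hosted in the google/generative-ai-swift repository on GitHub, so the URL should be:

```
https://github.com/google/generative-ai-swift
```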

You can then click the Add Package button to download and add the GoogleGenerativeAI package to the project.

Next, to store the API key, create a property list file named GeneratedAI-Info.plist. In this file, create a key named API_KEY and enter your API key as the value.


To read the API key from the property file, create another Swift file named APIKey.swift. Add the following code to this file:
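The original listing was stripped during extraction. A minimal sketch of such a helper, assuming the plist layout described above (the `fatalError` messages are illustrative), might look like this:

```swift
import Foundation

enum APIKey {
    /// Reads the API key from GeneratedAI-Info.plist.
    static var `default`: String {
        guard let filePath = Bundle.main.path(forResource: "GeneratedAI-Info", ofType: "plist") else {
            fatalError("Couldn't find file 'GeneratedAI-Info.plist'.")
        }
        let plist = NSDictionary(contentsOfFile: filePath)
        guard let value = plist?.object(forKey: "API_KEY") as? String else {
            fatalError("Couldn't find key 'API_KEY' in 'GeneratedAI-Info.plist'.")
        }
        return value
    }
}
```

With this in place, the key is available anywhere in the app as `APIKey.default`.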

Building the App UI


The user interface is simple. It features a button at the bottom of the screen that lets users access the built-in photo library. Once a photo is selected, it appears in the image view.

To bring up the built-in Photos library, we use PhotosPicker, a native photo picker view for managing photo selections. When presented, the PhotosPicker view displays the photo album in a separate sheet, rendered on top of your app's interface.

First, you need to import the PhotosUI framework in order to use the photo picker view:
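The import statement was lost during extraction; at the top of ContentView.swift it would read:

```swift
import SwiftUI
import PhotosUI
```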

Next, update the ContentView struct like this to implement the user interface:

To use the PhotosPicker view, we declare a state variable to store the photo selection and then instantiate a PhotosPicker view, passing it a binding to the state variable. The matching parameter lets you specify the asset type to display.

When a photo is selected, the photo picker automatically closes and stores the selected photo in the selectedItem variable, which is of type PhotosPickerItem. The loadTransferable(type:completionHandler:) method can be used to load the image. By attaching the onChange modifier, you can monitor changes to the selectedItem variable. Whenever it changes, we invoke the loadTransferable method to load the asset data and save the image to the selectedImage variable.
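The ContentView listing itself was stripped during extraction. Piecing together the description above, a minimal sketch might look like the following; the state variable names selectedItem and selectedImage come from the text, while the layout details and button label are assumptions:

```swift
struct ContentView: View {
    // Holds the picker selection and the loaded image, as described above.
    @State private var selectedItem: PhotosPickerItem?
    @State private var selectedImage: Image?

    var body: some View {
        VStack {
            selectedImage?
                .resizable()
                .scaledToFit()

            Spacer()

            // Brings up the built-in Photos library, filtered to images.
            PhotosPicker(selection: $selectedItem, matching: .images) {
                Label("Select Photo", systemImage: "photo")
            }
        }
        .padding()
        .onChange(of: selectedItem) { _, newItem in
            // Load the selected asset and store it for display.
            newItem?.loadTransferable(type: Image.self) { result in
                DispatchQueue.main.async {
                    if case .success(let image?) = result {
                        selectedImage = image
                    }
                }
            }
        }
    }
}
```

Note that the two-parameter onChange closure requires iOS 17; on earlier versions the single-parameter variant works the same way here.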

Because selectedImage is a state variable, SwiftUI automatically detects when its content changes and displays the image on the screen.

Image Analysis and Object Recognition

With an image selected, the next step is to use the Gemini APIs to perform image analysis and generate a text description of the image.

Before using the APIs, insert the following statement at the very beginning of ContentView.swift to import the framework:
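The statement was lost during extraction; it imports the SDK module added earlier:

```swift
import GoogleGenerativeAI
```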

Next, declare a model property to hold the AI model:

For image analysis, we use the gemini-pro-vision model provided by Google Gemini. We then declare two state variables: one for storing the generated text and another for tracking the analysis status.
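The declarations were stripped during extraction. Based on the description, they would look something like this; analyzedResult is named in the text, while isAnalyzing is an assumed name for the status flag:

```swift
// The multimodal model used for image analysis.
let model = GenerativeModel(name: "gemini-pro-vision", apiKey: APIKey.default)

// Stores the text generated by Gemini.
@State private var analyzedResult: String?

// Tracks whether an analysis is currently in progress.
@State private var isAnalyzing = false
```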

Next, create a new function named analyze() to perform the image analysis:

Before calling the model's API, we need to convert the image view into a UIImage. We then invoke the generateContent method with the image and a predefined prompt, asking Google Gemini to describe the image and identify the objects in it.

When the response arrives, we extract the text description and assign it to the analyzedResult variable.
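The body of analyze() was stripped during extraction. A sketch consistent with the steps just described might look like this; the exact prompt wording and the use of ImageRenderer to obtain a UIImage are assumptions:

```swift
@MainActor
func analyze() {
    analyzedResult = nil
    isAnalyzing = true

    // Convert the SwiftUI Image into a UIImage for the Gemini API.
    let renderer = ImageRenderer(content: selectedImage)
    guard let uiImage = renderer.uiImage else {
        isAnalyzing = false
        return
    }

    Task {
        do {
            // Send the image with a predefined prompt to gemini-pro-vision.
            let response = try await model.generateContent("Describe the image and list the objects in it.", uiImage)

            // Extract the text description from the response.
            if let text = response.text {
                analyzedResult = text
            }
        } catch {
            analyzedResult = error.localizedDescription
        }

        isAnalyzing = false
    }
}
```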

Next, insert the following code above the Spacer() view:

This scroll view displays the text generated by Gemini. Optionally, you can add an overlay modifier to the selectedImage view to display a progress view while an image analysis is in progress.
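Both snippets were stripped during extraction. Under the assumptions above (the analyzedResult and isAnalyzing variables), they might look like this; the font choice and overlay styling are illustrative:

```swift
// Goes above Spacer(): shows the generated description.
ScrollView {
    Text(analyzedResult ?? "")
        .font(.system(.title2, design: .rounded))
}
.padding()

// Optional: attach to the selectedImage view to show progress during analysis.
// .overlay {
//     if isAnalyzing {
//         ProgressView()
//             .frame(maxWidth: .infinity, maxHeight: .infinity)
//             .background(.black.opacity(0.4))
//     }
// }
```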

After making all these changes, the preview pane should display the newly designed user interface, which comprises the selected image, the image description area, and a button for selecting photos from the photo library.


Finally, insert a line of code in the onChange modifier to call the analyze() method once the selectedImage has been set. That's it! You can now test the app in the preview pane. Click the Select Photo button and choose a photo from the library. The app will then send the selected photo to Google Gemini for analysis and display the generated text in the scroll view.



This tutorial demonstrates how to build an AI image recognition app using the Google Gemini APIs and SwiftUI. The app lets users pick an image from their photo library and uses Gemini to describe its contents.

From the code we have just worked on, you can see that it takes only a few lines to prompt Google Gemini to generate text from an image. Although this demo illustrates the process with a single image, the API actually supports multiple images. For further details on how it works, please refer to the official documentation.


