Google's Image Problem

When Google bought the British company DeepMind, it also gained access to the team that effectively invented generative AI - and yet the company didn't really seem to understand what it had or how that would link to its core search business. As a result, many of its tools seemed to languish for a couple of years, allowing an upstart called OpenAI to release a tool you may have heard of. In the very brief period since it was first released, ChatGPT has swept all competitors before it, at least in the eyes of the general public, and when rivals release models that are easily as good, such as Gemini, they simply aren't used anywhere near as much.

Ever since I first used it in 2024, NotebookLM has been a revelation, not so much for its podcast ability - which I played with relentlessly until its lack of customisability became irritating - as for providing a pretty decent RAG (retrieval-augmented generation) application that tends to just work, in contrast to my local models, which seem to need endless fine-tuning to do what I really want. It's not perfect, but as an add-on to my research workflow it has become invaluable. Certainly when I introduce it to students, they realise very quickly just how much more flexible it is for their activities than Copilot (our officially sanctioned university tool).

AI Studio is, in some respects, an alternative interface for Gemini. Its interface when you first log in is certainly more streamlined than NotebookLM's, but this assumes that you have selected the option to work with Gemini directly, rather than with the APIs which make this a powerful tool for developers. We'll return to the potential for creating applications later.

One bonus in AI Studio is the inclusion of Veo 2, Google's latest video generator.

How does it work?

Sticking with the UI rather than API mode for a moment, the streamlined interface offers a chat window for Gemini in default mode, with four other tabs down the side: Stream, Video Gen, Starter Apps and History. Chat is the one that most (non-developer) users will be familiar with: a text box with some prompt suggestions to get you started. On the right, however, is a selection of options, the most important of which is the ability to select different models: Gemini 2.5, 2.0, or even 1.5, each in Pro or Flash versions. The Pro versions are the more capable thinking models, while the Flash versions offer lower latency and thus faster responses. It is also possible to adjust other settings such as model temperature and how strictly the model's output is filtered (a welcome improvement over the consumer Gemini app, which tends to treat all users with kid gloves).
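For anyone curious how those right-hand settings translate once you leave the UI, the sketch below builds (but does not send) a request in Python. The endpoint pattern, field names and safety category reflect my reading of Google's public REST documentation and should be treated as assumptions to check against the current docs; the `build_request` helper and the prompt are hypothetical.

```python
import json

# Assumed base URL for the Gemini REST API; verify against the current docs.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/models"


def build_request(model, prompt, temperature=1.0, relaxed_safety=False):
    """Return (url, body) for a generateContent call without sending it."""
    body = {
        # The prompt text goes inside a list of content "parts".
        "contents": [{"parts": [{"text": prompt}]}],
        # Lower temperature gives more predictable output, higher more varied.
        "generationConfig": {"temperature": temperature},
    }
    if relaxed_safety:
        # One illustrative category only; the UI exposes several such sliders.
        body["safetySettings"] = [
            {"category": "HARM_CATEGORY_HARASSMENT", "threshold": "BLOCK_ONLY_HIGH"}
        ]
    url = f"{BASE_URL}/{model}:generateContent"
    return url, body


if __name__ == "__main__":
    url, body = build_request(
        "gemini-2.0-flash",
        "Summarise this article in one sentence.",
        temperature=0.3,
    )
    print(url)
    print(json.dumps(body, indent=2))
```

Swapping the model name is the code equivalent of changing the model drop-down, and the temperature and safety fields mirror the sliders beneath it. Sending the request would also require an API key, which is deliberately left out here.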

Gemini is a multimodal model, meaning that it can handle text, image and video, and the choice of versions is particularly helpful for users who want either a more accurate, if sometimes slightly slower, model or one that zips through uploads and datasets more speedily. For non-developers, there's no real reason to switch away from the 2.5 Pro model (yes, it can be slower, but not to an unreasonable extent), but for those wanting to integrate Gemini into a web site or app the older models have one huge bonus: they are cheaper - sometimes much, much cheaper. If you stick to the UI version, AI Studio remains free, at least for the present (and in contrast to Gemini which, at the time of writing, costs £18.99 for the advanced upgrade). Clearly Google aims to gain revenue from developer integration, but is also pricing the API keenly to encourage more developers to use it.

Of the other features, Video Gen is good fun, making use of Google's Veo 2 engine to generate video, whether from a text prompt or by remixing an image. It is capable of producing up to 8 seconds of video at HD (720p) resolution. With tweaks to the prompt, it can be used to create quite good-looking content although, as with most video AI at the moment, it doesn't quite feel up to professional standards - yet. What did impress me, however, was the speed: no video took longer than sixty seconds and most were up and running in less than thirty. I haven't found any indication of limits - yet - on creating video in the UI, and files are saved to your Google Drive (which can be accessed from the History tab).

While Video Gen was what I spent most time playing around with at first, Stream is actually the one that feels most magical. Effectively, it's Google's answer to ChatGPT's advanced voice mode, allowing you to chat naturally to Gemini and have it understand and answer your queries using natural language. And it's good. Really good. But even that didn't prepare me for Gemini's ability to work with objects that you show it via your webcam, demonstrating not only how far image recognition has developed in recent years but also that AI can handle it with ease. At present, I am as hooked on Gemini Stream as I was on the podcast feature in NotebookLM, my only frustration being the ten-minute time restriction that is in place for each session.

For Developers

The UI features of AI Studio alone are reason enough to choose it over the competition, but it's also quite clear that Google is really pushing the Studio at developers. Using the API is as simple (almost) as clicking the Get API Key button, but in practice you'll also want to check the documentation to ensure that you can integrate it into your apps correctly. Also, as indicated previously, Google is quite up front about the various costs for using different models of Gemini, ranging from a few cents to $10 per million tokens.
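To give a feel for what per-million-token pricing means in practice, here is a minimal Python sketch of the arithmetic. The model names and prices are hypothetical placeholders chosen only to span the "few cents to $10" range mentioned above, not Google's actual rates, and the `estimate_cost` helper is my own illustration.

```python
# Hypothetical prices in US dollars per million tokens: (input, output).
# These are placeholders, not real Gemini pricing.
PRICES = {
    "budget-model": (0.10, 0.40),
    "premium-model": (1.25, 10.00),
}


def estimate_cost(model, input_tokens, output_tokens):
    """Estimate the cost in dollars for a given volume of tokens."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Assumed workload: 10,000 requests, each with 2,000 tokens in and 500 out.
    requests = 10_000
    tokens_in = requests * 2_000
    tokens_out = requests * 500
    for model in PRICES:
        cost = estimate_cost(model, tokens_in, tokens_out)
        print(f"{model}: ${cost:.2f}")
```

Even with made-up numbers, the gap between the two tiers shows why a developer might pick an older or lighter model for high-volume work and reserve the most capable one for tasks that need it.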

For experienced developers, access to the API and documentation will be most of what is required (and I didn't have an opportunity to test collaborative features, so can't comment on those here). But to encourage users who are new to coding with machine learning, there are some sample starter apps which, like Video Gen, are good fun as showcases for some of Gemini's capabilities. I think my particular favourite is the Video Toys section, which generates quirky web apps from YouTube URLs, and which I used to create a Star Wars original v. prequels trivia quiz game as well as one to learn basic piano chords. While the starter apps can soon run into limitations, they are also a sign of the real potential of generative AI far beyond vibe coding, repurposing content online for a whole range of interactive applications with a simple description.

AI Studio is not perfect, but some of its quirks (limited video capabilities, time restrictions on natural language processing) are as much due to the current state of AI models as to any flaws in the programme itself. Gemini is definitely an LLM that deserves much more use and attention than it gets at the moment, and even casual users can take advantage of the fact that Google's desire to see more people turning to it from ChatGPT has led it to offer a number of tools for free in the UI version. It has become my go-to choice for AI testing and development at the moment, mainly because it is so much easier to use than many comparable apps, while the fact that it is free to use in UI format will encourage a lot of people to experiment.

Rating

Features

Ease of use

Value

Overall
