Gen AI is Infrastructure that Constantly Changes

Generative AI is in an interesting place right now. Things are evolving quickly, at least for now, with new models and updates shipping at high frequency.

Let’s take the recent Google Gemini kerfuffle as an example. Users found that when they used Gemini to create images of certain historical contexts, the results were not as expected. Concretely, the people in the generated images were much more diverse than the historical context would allow for (non-white German WWII soldiers, non-white Founding Fathers, etc., see The Verge for some examples).

Of course the culture warriors jumped right on this. That was to be expected, but it’s non-news, a tempest in a teapot, so I’m not going to get into it.

It’s important to understand what’s going on under the hood: Gemini (like other text-to-image generative AI systems such as ChatGPT/DALL-E) uses a technique called prompt transformation. That just means it takes the text users type to describe the image they want and rewrites it to get better results. This is usually a quality-control mechanism, since users tend not to be great at prompting. It’s also supposed to cut costs for the AI companies: every image generation they run costs them money, and if users keep re-running their requests to get a decent result, that’s a financial strain on the company. So they try to shortcut the process and get to better results more quickly.

This is where things can get a little spicy, because better is subjective, and one of the things done at this stage is to counter some of the training data bias that lives on in the system. The training data, and hence the generative AI model that creates these images, is biased in quite predictable ways: a doctor is more likely to be pictured as male, a nurse is more likely to be pictured as female, for example. By rewriting the prompt behind the scenes (along the lines of “make the people more diverse”) you add some level of additional quality control. It’s not censorship, it’s a duct-taped attempt to fix some AI bias. But this can backfire, as when the system generates potentially offensive images like Asian, female Wehrmacht soldiers or Native American-looking Founding Fathers. Maybe this is the type of imagery a user wants, maybe it’s not, but it’s tricky terrain for sure.
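To make the mechanics concrete, here’s a minimal sketch of where such a rewrite sits in the pipeline. The function names and the appended hints are my own illustrative assumptions; real systems like Gemini rewrite prompts with a language model and keep the exact instructions private.

```python
# Illustrative sketch only: real prompt transformation is done by an LLM
# with undisclosed instructions, not by simple string concatenation.

def transform_prompt(user_prompt: str) -> str:
    """Rewrite the raw user prompt before it ever reaches the image model."""
    quality_hints = "high detail, coherent composition"          # quality control
    bias_hint = "show a diverse range of people where relevant"  # bias counterweight
    return f"{user_prompt}, {quality_hints}, {bias_hint}"

def generate_image(user_prompt: str, image_model):
    """The image model only ever sees the transformed prompt, not the original."""
    return image_model(transform_prompt(user_prompt))

# Even this toy version shows the failure mode: the diversity hint is
# appended blindly, whether or not the historical context calls for it.
print(transform_prompt("a group of German soldiers in 1943"))
```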

So, the culture war BS aside, prompt transformation and continuous service updates are where things get interesting.

There are three aspects here that I think warrant a closer look:

  1. AI is currently SaaS: AI companies continually tweak their services, including how they use prompt transformation. They do this based on new learnings, new user behaviors, new use cases, etc. If you use these tools in your work at a production level, you’re essentially using them as infrastructure. But this is Software-as-a-Service (SaaS) in its purest form, and as such the outputs will keep changing. It’s hard to build on software that will continuously produce different kinds of outputs.
  2. User expectations vs the State of Play: Generative AI is still in its infancy. These products are barely out of beta and evolving rapidly. For anything but the most trivial or low-risk tasks, they are not ready for prime time. Yet they have proliferated so quickly and gone so mainstream that many users simply aren’t aware of this. And the chasm between the State of Play (experimentation and learning!) and user expectations (professional, mature services!) appears to be growing rapidly.
  3. Search and generative AI have a complicated relationship: Generative AI is quickly finding its way into search engines; Google and Bing have both started incorporating it. However, this is not as obvious a fit as it may seem at first glance. Can AI summarize articles? Sure. Can it find answers quickly? Under certain circumstances, it can and it does. But it also hallucinates, creates images of BIPOC Founding Fathers, and so on. It’s just not entirely clear if, and under what circumstances, generative AI tools can navigate the incredibly nuanced demands of working within historically accurate contexts and knowing when to break out of those confines. And while I cannot guarantee it, I’d be surprised if this were the type of problem that can be solved through engineering. As we all know well, complex social problems are rarely solvable through technology.

Anyway. Generative AI: It’s complicated. Careful where you step.