Google has taken the credibility crisis of AI research to a new low

Illustration: Handout/Reuters/Ritzau Scanpix

There is currently no shortage of claims about technological advances in AI. The big commercial AI labs, at any rate, have thrown themselves into something resembling a schoolyard fight in which “my dad is stronger than your dad” has been replaced with “my AI is better than your AI”.

In April, OpenAI announced its latest text-to-image generator, DALL·E 2. A month and a half later, Google announced its counterpart to DALL·E 2—the text-to-image generator Imagen. Google 1–1 OpenAI.

Photorealism and cute animals

Imagen and DALL·E 2 do basically the same thing. Both generate images based on text descriptions of what the images should contain. Most often, these are cute animals in silly situations.

Images generated with Imagen (left) and DALL·E 2 (right) based on the description “Hovering cow abducting aliens”. Illustration: Google Brain / OpenAI

Both OpenAI and Google have made much of photorealism. Alongside its Imagen announcement, Google published a non-peer-reviewed article on the text-to-image generator, claiming in the very first line of the abstract that it has “an unprecedented degree of photorealism”. Google 2–1 OpenAI.

When OpenAI published DALL·E 2, its high-resolution photorealistic images gave rise to much media coverage around the world. Some of the publicity, however, was focused on the fact that when DALL·E 2 generates photorealistic images of people, the software tends to discriminate on the basis of, for example, gender and race. This led Maarten Sap, one of the external AI experts who tested DALL·E 2 in OpenAI’s “red team” process, to say to tech magazine Wired that “maybe it shouldn’t generate people or anything photorealistic”.

OpenAI has even documented the discrimination in a repository, in which the organization lays out the risks and limitations of DALL·E 2. DataTech has also previously written about this in the April edition of Ugens data.

For example, if you ask DALL·E 2 for an image of “an evil person”, you will only get images of men, and those men appear to be predominantly of Central American descent.

So it is probably quite sensible that OpenAI currently does not allow users to publish their own DALL·E 2-generated photorealistic images of people.

Images generated with DALL·E 2 with the prompt “An evil person”. Illustration: OpenAI

People are missing

The thing is, we know that DALL·E 2 is capable of generating photorealistic images of people. Sometimes the images are discriminatory, and sometimes the people are so deformed that you may start having doubts about the robustness of the technology, but the images are there. OpenAI itself has published a large number of them, and most can be found in the aforementioned repository.

Of course, there have also been many instances of people with access to DALL·E 2 who—against OpenAI’s wishes—have published photorealistic images of people that they have generated with the model.

One could level a lot of criticism at OpenAI’s biases and strange lack of openness (especially given the company’s name), but it cannot be disputed that DALL·E 2 is able to create reasonably compelling images of people. The same cannot be said for Imagen.

In fact, we have not seen a single Imagen-generated image depicting people. In Google’s non-peer-reviewed article, the developers do acknowledge that “Imagen exhibits serious limitations when generating images depicting people”. They also mention that Imagen, just like DALL·E 2, tends to generate depictions of people based on highly discriminatory stereotypes.

But it is nevertheless curious that we have not seen a single Imagen-generated photorealistic image of a human, a face, or even just of a little finger.

Show us at least a finger if the whole hand cannot yet be reproduced

Google has given only a handful of people the opportunity to test Imagen, and the tech giant does not expect to make the software available to more users. So that is unlikely to be how we find out how good (or bad) Imagen’s photorealistic images of people are.

According to Mohammad Norouzi, research scientist at Google Brain and co-author of the Imagen article, Google will not publish such images itself either.


And, as is usually the case with articles from commercial AI labs, Google’s article on Imagen does not provide enough detail for the model to be reproduced.

It therefore seems a bit odd that Google categorically refuses to prove that Imagen is actually capable of generating images of people. They could at least show us a little finger. After all, nobody can reproduce the whole hand.

Is Google missing the layers that Titian mastered?

Google hardly chooses at random what information about its products the public should and should not have access to. That is one reason why Google’s decision should raise questions about whether Imagen really is as good as Google claims.

It is well known that creating images that resemble people is not a purely technical matter. It is an art, and DALL·E 2 is not alone in finding it difficult to master. When the great painters, such as Titian, had to depict people, they painted layer upon layer upon layer to arrive at a depiction of human skin that is somewhat realistic, though it still cannot be called photorealistic.

When we talk about generating photorealistic images of people in all conceivable and inconceivable situations, we are talking about an age-old problem for which a general solution is very hard to find. A claim about “an unprecedented degree of photorealism” must be backed up by evidence, and Google has not provided that evidence. On the contrary.

Strange FID score

Google is trying to convince us of Imagen’s capabilities in other ways. In the non-peer-reviewed article on Imagen, the developers report that Imagen achieves a so-called FID score of 7.27 on the Common Objects in Context (COCO) dataset, which has become the industry standard for evaluating text-to-image generators.

The FID score, short for Fréchet inception distance, is a metric used to assess the quality of images created by image generators. It is calculated by comparing the statistical distribution of artificially generated images with that of real images, as both are represented by a pretrained image-recognition network. The lower the score, the more similar the artificial and the real images. An FID score of 0 would thus mean that the two sets of images are statistically indistinguishable by this measure.
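For readers who want to see what such a comparison of distributions involves, here is a minimal sketch of the Fréchet distance between two sets of feature vectors. It is an illustration under stated assumptions, not Google's evaluation code: a real FID computation first passes the images through an Inception network to obtain the feature vectors, whereas this sketch uses random numbers as stand-in features, and the function name `frechet_distance` is made up for the example.

```python
import numpy as np


def frechet_distance(feats_real, feats_gen):
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each argument is an array of shape (n_samples, n_features).
    In a real FID computation the features come from an Inception
    network; here they are assumed to be given.
    """
    mu_r = feats_real.mean(axis=0)
    mu_g = feats_gen.mean(axis=0)
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_g = np.atleast_2d(np.cov(feats_gen, rowvar=False))

    # Tr((cov_r cov_g)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_g. For covariance matrices these are
    # real and non-negative, up to floating-point noise.
    eigvals = np.linalg.eigvals(cov_r @ cov_g)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = np.sum((mu_r - mu_g) ** 2)
    return float(mean_term + np.trace(cov_r) + np.trace(cov_g) - 2.0 * trace_sqrt)


# Stand-in "features": random vectors, not real image statistics.
rng = np.random.default_rng(0)
real = rng.normal(size=(2000, 8))
shifted = real + 0.5  # same spread, different mean

same = frechet_distance(real, real)      # identical sets: distance near 0
moved = frechet_distance(real, shifted)  # shifted set: distance clearly above 0
print(same, moved)
```

The score only compares means and covariances of the features, so two different collections of images can score close to 0 without any single image being identical to another.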

Google’s FID score of only 7.27 is the world’s lowest and thus the best score so far. The interesting thing here is that COCO contains almost 330,000 photographs, and those photographs contain over 250,000 people. Since people make up quite a large part of the COCO dataset, getting a good FID score would be quite difficult if the image generator is not able to generate people.

A worsening of the credibility crisis

So there is good reason to question the capabilities of Imagen. But there is also good reason to ask why Google is announcing Imagen without giving access to independent external testers and experts. Why can no one try out the software with “an unprecedented degree of photorealism”?

At present, it seems that most of the talk about Imagen is nothing more than talk: a narrative that helps Google avoid being seen from the outside as losing ground to OpenAI.

Whatever the intentions behind the announcement of Imagen, it only adds to the current credibility crisis in AI, in which large commercial AI labs make grandiose claims about research advances that cannot really be called research because the results can be neither replicated nor even reproduced.

OpenAI certainly also contributes to the reproducibility crisis, but they at least allow external access, so it has been possible to confirm that DALL·E 2 works in accordance with OpenAI’s presentation of the technology.

With Imagen, Google has taken the credibility crisis in AI research to a new level. Google simply asks us to believe them. There is no evidence.

The match thus ends somewhat surprisingly with 1–0 for Google on the basis of the research.

DataTech has asked Google if Imagen is able to generate images of people. Google has declined to respond.