An AI-generated image of a 50s-style comic-book robot painting ones and zeroes.

It’s not art, it’s data

Up to this point in the Digital Detox, we’ve been exploring text-based generative AI tools. In today’s activity, we’ll be working with image-generating AI tools. If you’ve heard about these tools in the news, you’ve likely heard about their tendency to generate weird images (“this reindeer has five legs!” or “how many fingers does that person have?”) or about the controversies (and now legal cases) surrounding their use of artists’ intellectual property as data for what they generate.

AI bias?

You may have even heard about how these image-generating AI tools visualize bias, both in the data they draw from and in what they produce. Last year, Bloomberg used the generative AI image maker Stable Diffusion to produce 300 images for each of 7 “high-paying” jobs, 7 “low-paying” jobs, and 3 categories related to crime. Their analysis of the 5,100 images generated showed “that image sets generated for every high-paying job were dominated by subjects with lighter skin tones, while subjects with darker skin tones were more commonly generated by prompts like ‘fast-food worker’ and ‘social worker.’” They also found that “most occupations in the dataset were dominated by men, except for low-paying jobs like housekeeper and cashier.” (The entire article is worth a read!) These issues aren’t surprising given that the models were trained on data from the Internet and then refined by people with various biases. The troubling thing with most of these models is that we know there’s bias in the training data, but we can’t see or understand the scale and scope of that data. We also can’t see what additional bias is built into the human side of the training process or into the algorithms themselves. It’s important to remember that, despite the mathematical abstraction involved, these processes remain very flawed in strangely human ways in terms of what they produce.

A DALL-E image of a sea otter in the style of the Girl with a Pearl Earring and then outpainted in Photoshop to contain otter paws holding an oar.

AI image generation basics

It may be helpful to have a short primer on how AI image generation works. Without getting into too much technical detail (though if that interests you, read this Wired article, and don’t miss the comparison between the DALL-E 1 and DALL-E 2 versions of hedgehogs using calculators), image generators are “fed” a diet of billions of images, paired search terms, and image descriptions/captions collected from the internet. A model is trained using deep learning algorithms that help it identify shapes and patterns in those images. Once the model has been trained, it can generate new images based on the patterns it has learned. Image-generating models can vary quite a bit from one another, based on the input data, the training approach, and how they’re programmed to receive inputs (see this comparison between the Stable Diffusion model and GAN models). Even if you don’t read the article, you can get a hint of some of the complexity here. It’s not hard to imagine how our legal system (not to mention judges and politicians) will struggle to understand, let alone regulate, AI-related activities for many years to come.
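If a concrete (if very simplified) illustration helps, the core idea can be sketched in a few lines of code. This is a toy analogy, not a real diffusion model: here, “training” is just averaging example images, and “generation” nudges random noise toward the learned pattern step by step. Real systems use neural networks trained on billions of images, and the function and variable names below are purely illustrative.

```python
import random

def learn_pattern(training_images):
    """Toy 'training': average the pixel values across the example images."""
    n = len(training_images)
    size = len(training_images[0])
    return [sum(img[i] for img in training_images) / n for i in range(size)]

def denoise_step(image, pattern, strength=0.2):
    """One toy 'denoising' step: move each pixel a little toward the learned pattern."""
    return [px + strength * (target - px) for px, target in zip(image, pattern)]

def generate(pattern, steps=30):
    """Start from pure random noise and denoise step by step into an image."""
    image = [random.uniform(-1.0, 1.0) for _ in pattern]
    for _ in range(steps):
        image = denoise_step(image, pattern)
    return image

# Two tiny 3-pixel "training images" and one generated result.
training_images = [[0.0, 0.5, 1.0], [0.2, 0.5, 0.8]]
pattern = learn_pattern(training_images)
result = generate(pattern)
```

The real reverse-diffusion process is far more sophisticated (the model predicts and removes noise rather than blending toward an average, and a text prompt steers each step), but the overall shape is the same: noise in, learned structure out.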

As these image generators and their models become more sophisticated, so do the kinds of outputs they can generate. For example, some generators now support techniques like outpainting, which extends an existing image beyond its original frame.

Multiple techniques and software options can be combined to create increasingly divergent images like a sea otter in the style of “Girl with a Pearl Earring” from DALL-E and then expanded (outpainted) to include the otter holding an oar (as seen on the left) in Photoshop.

What is art?

With all of these capabilities at users’ fingertips, you can imagine why artists and photographers are concerned. Several lawsuits have been filed, including a class action lawsuit filed by a group of artists against several AI companies for alleged Digital Millennium Copyright Act (DMCA) violations and a lawsuit filed by Getty Images against Stability AI, the company behind Stable Diffusion, for copyright infringement. The artists’ lawsuit alleges that “When used to produce images from prompts by its users, Stable Diffusion uses the Training Images to produce seemingly new images through a mathematical software process. These ‘new’ images are based entirely on the Training Images and are derivative works of the particular images Stable Diffusion draws from when assembling a given output. Ultimately, it is merely a complex collage tool.” Yet others argue that “if trained properly, latent diffusion models always generate novel imagery and do not create collages or duplicate existing work—a technical reality that potentially undermines the plaintiffs’ argument of copyright infringement, though their arguments about ‘derivative works’ being created by the AI image generators is an open question without a clear legal precedent to our knowledge.” Copyright law was murky and complex before AI. The rapid rise in AI capabilities, and the technological complexity behind them, will only compound these issues.

Questions about how we define art and how we understand the creative processes that lead to art lay at the heart of these controversies. Even though humans have long used technologies to support the creation of art (paintbrushes, cameras, software, and yes, even AI), generative AI seems to further separate the human from the act of creation. At the same time, AI relies absolutely on humans to create the foundation of the process by creating the massive amount of original works required to train these models. Does writing a prompt for an image generator constitute an act of creation? Or does art require more than a prompt? It’s not a simple question and it grows more complex every day.

Technology, math, and art have been interwoven throughout history. Islamic geometric art is an early example of algorithmically-derived art that has strong parallels to today’s algorists and the mathematical foundations driving the art they create. Similar algorithms are at the heart of AI image generation. At the time of its invention, debates flared about whether photography could be considered art due to the technical, rather than human, aspect of the process. Charles Baudelaire claimed that “[i]f photography is allowed to supplement art in some of its functions, it will soon have supplanted or corrupted it altogether, thanks to the stupidity of the multitude which is its natural ally” (Revue Française, Paris, June 10-July 20, 1859). It would be interesting to get his opinion on modern issues around monkey selfies, whether photos of paintings are creative works, how to think through photography in virtual worlds, or the kind of image generation that’s now possible with AI. Whatever his opinion, it has become harder and harder to determine the creative act and to separate that act from the technology it’s entwined with.

Though there are no straightforward answers here, these discussions are worthwhile. We may decide, as the New York Times’ Jason Farago suggested, to adopt a higher standard for human creativity, rather than reducing our expectations to what generative AI can produce. He writes, “Insofar as A.I. threatens culture, it’s not in the form of some cheesy HAL-meets-Robocop fantasy of out-of-control software and killer lasers. The threat is that we shrink ourselves to the scale of our machines’ limited capabilities; the threat is the sanding down of human thought and life to fit into ever more standardized data sets.”

An AI-generated image of a robot painting a picture of a robot.

Activity: Creating increasingly complex generated images

We’ve found that different image generators perform very differently with the same prompts and that different systems have very different options. This activity will help you get the lay of the land.

Activity: Exploring AI image generation

Step 1: Pick your tool

Choose at least two AI image generators to use for this activity. Need some options? The following are all free (note: “free” usually means you’re paying with your data and that your capabilities will be somewhat limited; we encourage you to read the privacy policies for these tools).

  • Stable Diffusion has free and paid options. You are able to try it out without an account and the company states “We don’t collect and use ANY personal information, neither store your text or image.” Make sure you check out the advanced options for more control.
  • Craiyon also has free and paid options. You can try it without an account. Warning: There are a lot of ads!
  • Bing’s image generator is based on DALL-E 3. You will need a Microsoft account to generate images. Tom has found it to be pretty high quality. If you’re a Middlebury faculty, staff, or student, you will have to use a personal email address.
  • Adobe’s Firefly requires a free account. If you’re a Middlebury faculty, staff, or student, you can use your Midd credentials.
  • Midjourney is on the higher end of the quality scale, but it’s also on the higher end of the hassle scale. Their product runs in Discord, which requires you to download an app and create an account (directions). Tom has never had any luck creating free images here (it’s always too busy), but you can see what other people are generating, the prompts they’re using, etc. The scale and complexity are pretty impressive.

And, of course, check out the Nope! options if you’re looking for alternative ways to participate.

Step 2: Build your base

Establish a solid base prompt. On the two tools you chose, refine your prompt to generate a fairly simple image that you’re happy with. Try requesting, for example, an image of someone iconic (who is likely to be well-represented in the data being used by the model). For example, you could ask for an image of Marilyn Monroe or Superman. Remember our prompting activity?

Step 3: Add some flair

Now that you have a recognizable image on both platforms, let’s add some detail. Rather than just asking for “Michael Jackson,” consider adding context to your prompt to help the image generator with its computations (e.g., something like “close-up photo of singer Michael Jackson wearing his iconic red jacket”). Think about the language you’d use to search for the kind of image you want to create.

You may run into issues with a tool’s Code of Conduct blocking prompts about more contemporary celebrities. Bing can be particularly sensitive.

A number of the platforms also have built-in menu options that offer particular styles, aspect ratios, etc. For some, like Midjourney, you’ll need to learn the command language.

Step 4: Transform it

One particularly intriguing aspect of AI image generation is the ability to quickly and easily create really strange images. Now is your chance to mix and mash styles or to take that base prompt and create something you just aren’t going to find on the Internet. Consider asking for “Marilyn Monroe in the style of the Mona Lisa” or “Taylor Swift in the style of Edvard Munch’s The Scream,” but don’t be afraid to wander into even stranger territory. Maybe you’ve always wanted to see Abe Lincoln riding a unicorn (though you might have to settle for “an image of a tall, presidential man with a beard and a stovepipe hat riding on a glorious unicorn with rainbows” because Bing keeps blocking you). Have fun with the language.

Most services have a “top pics” or “recent images” type of page that allows you to see the image and the prompt that generated it. This is a great place to browse for keywords and techniques to improve your own images. Stable Diffusion even has a searchable database of images and prompts.

Step 5: Share your output

Share the image you felt was the worst. What did you ask for? How would you describe what you got?

Share the image you generated that was most impressive. What made this one work?

Reflection: Where do you perceive AI image generation as a threat to artists?

You can do this below.

Note: Posts will be moderated, so you should see your post appear within 24 hours of submission.

AI-generated image submission

An AI-generated image of a man in a stovepipe hat riding a unicorn with a rainbow in the background.

Nope! Options

There are a variety of ways to run image generators on your own computer. The easiest path we found on a Mac was to use Diffusion Bee. For PCs, you might want to try NMKD Stable Diffusion GUI. Depending on your concerns (and your technical interest), you can also browse the available Hugging Face text-to-image models, from models trained just on Mickey Mouse stills from 1928 to models made solely to create Lego mini-figures. There’s even a model trained solely on public domain images. There’s likely a model out there that meets your needs, and some of these models will also let you try them out on the page (please do consult Hugging Face’s privacy page).