On April 9th, I saw the images from OpenAI’s DALL-E 2 marketing, showing that high-quality text-to-image generation was possible. I was blown away! I quickly scoured Reddit and signed up for the DALL-E 2 beta program. Then, I waited. After a couple of weeks, I started looking for alternatives that were available right away. There was something called Midjourney, but it was also in closed beta. And then I discovered an open source project called Disco Diffusion. It is a Python notebook that you can run for free in Google Colab.

I watched a couple of YouTube videos by Quick Eyed Sky, and started my journey learning what I could about all the knobs you can turn in Disco Diffusion. The best part about it is that it is open source and configurable. I can script it, switch the models it uses, and turn flags on and off. But despite all of the configuration options, the most important thing is to choose a good prompt.

Prompts

A prompt is a string of text that the model translates into an image. The model is trained on captions and alt text associated with images scraped from the internet. The model tries to predict which of thousands of captions is best associated with the image. Surprisingly, this means you can sometimes append click-baity text like “Look at that detail!” and get great detail in your image.

Text-to-image explorers who started before me discovered that putting “trending on artstation” at the end of the prompt drastically increased the quality of the resulting image. I’ve found this to be true. You’ll get more detail, better compositions, and better color schemes.
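If you script your prompts, the trick of tacking a style phrase onto the end is easy to automate. Here is a minimal Python sketch of the idea. The build_prompt helper and its modifiers parameter are hypothetical names I made up for illustration, not part of Disco Diffusion, and the lighthouse subject is just a placeholder.

```python
def build_prompt(subject, modifiers=("trending on artstation",)):
    """Join a subject and style modifiers into one comma-separated prompt.

    Hypothetical helper for illustration only; it is not a Disco Diffusion API.
    """
    # Drop surrounding whitespace and any trailing period or comma from the
    # subject so the joined prompt does not end up with doubled punctuation.
    parts = [subject.strip().rstrip(".,")]
    # Skip modifiers that are empty or only whitespace.
    parts += [m.strip() for m in modifiers if m.strip()]
    return ", ".join(parts)


# Placeholder subject, chosen only to show the output shape.
prompt = build_prompt("A beautiful painting of a lighthouse at dusk")
print(prompt)
```

The resulting string is what you would paste into the notebook's prompt setting. Keeping the modifiers in a separate list makes it easy to try the same subject with and without a given phrase.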

Here’s the first prompt I tried in Disco Diffusion: “A beautiful painting of a green glowing ghost gliding through a dark fantasy forest with mystical creatures, Trending on artstation.” It worked! A couple of hours later, using a free K80 GPU on Google Colab, I had this image:

I wish I could end it there, but I actually spent the next month learning to use this software and, sometimes, making interesting pictures with it. As you change the words you use, you can get closer to your original vision. There is a random aspect to it: sometimes you write a good prompt but get an unlucky starting seed, and the image develops poorly. The AI can be frustratingly literal at times too. A prompt for “rain forest” is likely to render umbrella-shaped trees, for example.

But there is nothing more rewarding than seeing a beautiful image take shape, exceeding your expectations in some ways while always leaving room for improvement. A successful prompt will usually generate a pleasing image, rarely producing a dud.

I subscribed to Google Colab Pro, which lets me use a slightly faster GPU in the cloud and make pictures in roughly 20 minutes to an hour each. I quickly settled on a resolution of 1152 x 832, which works well for printing the images on cards at 300 DPI.
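To see what that resolution means on paper, divide the pixel dimensions by the print density. A small Python sketch of the arithmetic follows; print_size_inches is a hypothetical helper name used only for this example.

```python
def print_size_inches(width_px, height_px, dpi=300):
    """Return the printed (width, height) in inches at the given DPI."""
    # Each inch of print holds `dpi` pixels, so inches = pixels / dpi.
    return width_px / dpi, height_px / dpi


w, h = print_size_inches(1152, 832)
print(f"{w:.2f} x {h:.2f} inches")  # 3.84 x 2.77 inches
```

So at 300 DPI, a 1152 x 832 image prints at about 3.84 by 2.77 inches.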

Links

Here are some resources to learn more about how to choose artists and art styles for your prompts.