My main interest in DALL-E-2 is understanding how good it really is. What can it do? How well? Well enough to be useful? Consistently? This leads me to this weekend’s question: are all those great DALL-E images people keep showing us reproducible?
You probably already know that DALL-E generates 10 images per prompt, and those 10 often look very different from each other. Some are better, some are worse, and often each one looks like it was created by a different person. So I’m not going to ask if DALL-E can generate an image identical to an earlier one. Rather, can DALL-E generate an image that is as impressive and as useful as the earlier one, given the same prompt?
For today’s experiments, I worked mostly from the r/dalle2 subreddit, my favorite place to browse great DALL-E images. I found great images, then thought about what impressed me about them. I’ll comment on those aspects as we go. I issued each prompt once and chose the best reproduction from the 10 results.
Let’s start with “Painting of a family of tiny hippos inside of an old fashioned vintage suitcase” from u/Wiskkey.
Great painting! Aren’t they fun? Love the facial expressions, the vintage suitcase, and the consistent painting style.
I rated my best reproduction 4/10. Some of the hippos are poorly drawn, but I am impressed that there’s a family of hippos there at all, because I’ve seen DALL-E struggle with complexity. Their facial expressions have some character, but they aren’t as good as the original. I also had to choose a less fun vintage suitcase, and overall the painting has less style.
Food photography of “chili pepper and cucumber cheesecake”, 85mm f1.2, extremely detailed is my favorite of u/cench’s ongoing series of great DALL-E food photographs. It looks yummy, and very well prepared, with the pepper flakes on cucumber slices positioned all about, and it is, as specified, extremely detailed.
My best repro gets 6/10. It’s accurate and looks like a photo, but it’s a far less pretty cheesecake, and I don’t love the choice about how much blur to include.
Another image was much prettier, but is more pistachio-raspberry than cucumber-chili.
“Fantasy alchemist lab wallpaper 4k hd” from u/UnderSampled has a wonderful green foreground object and lots of supporting pieces. I wish DALL-E could create images like this all the time.
With this prompt, it mostly does. My best reproduction (7/10) gets most of the original’s virtues most of the way there. Its biggest problem: what might be an unrealistic crack on the front of the main flask.
“Hot water bottle in the shape of a brain” from u/danielbln does a great job of looking like a hot water bottle and a brain.
One of my good repros is here. I’d rate it 9/10. It doesn’t reproduce the cloth exterior that he didn’t ask for, of course. It is a flaw that the cap is clipped off. This is our best reproduction so far.
“rare steak with ice cream and cherries, cookbook photo”, also from u/danielbln, is another great food photograph. While the side of the steak and the ice cream are not completely convincing, the top of the steak, the cherries, and the sauce look great.
What happens when we try to reproduce? Again we do very well. It also has ice cream that’s not completely convincing, and it skips showing the raw cross-section, but those were both weak points of the original, so I’ll score this 10/10. Bravo.
Let’s close with “Unicorn on a moon digital art” from u/callidoradesigns. This is a fun cartoon, with a nice unicorn positioned nicely on a nice moon.
My reproduction is almost as fun. The felt moon is fine (and weird!). Like in the original, the unicorn isn’t quite facing us, and might be asleep, but let’s score it 9/10.
To sum up, we got scores of 4, 6, 7, 9, 10, and 9, for an average of 7.5 on our arbitrary scale of image quality as reproduced from an English prompt. This is impressive. I literally chose these as the most impressive images I found on r/dalle2 this week, so we might expect them to be very hard to reproduce. Common flaws included lack of prettiness and lack of detail, and the biggest single failure was that my family of hippos was not well drawn. I have to say, I went into this experiment expecting much weaker results. Great stuff.
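If you want to check the arithmetic, here is a minimal Python sketch. The scores are the ones given in this post; the short labels and variable names are just mine for illustration.

```python
from statistics import mean

# Scores from this post, in order of appearance.
# The short labels are informal names, not the full prompts.
scores = {
    "hippos in a suitcase": 4,
    "chili pepper and cucumber cheesecake": 6,
    "alchemist lab": 7,
    "brain hot water bottle": 9,
    "steak with ice cream": 10,
    "unicorn on a moon": 9,
}

average = mean(scores.values())
lowest = min(scores, key=scores.get)

print(f"average: {average:.1f}")
print(f"weakest reproduction: {lowest}")
```

Six images is a tiny sample, so treat the average as a rough impression rather than a measurement.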
If you see a DALL-E image you’d like to see reproduced from its original prompt, leave a link in a comment.
More Reading on DALL-E
A good short post on How to use Inpainting to specify style. “I started with a sticker illustration that Dalle made that I liked the style of, and erased parts of it leaving only strokes of color. This did much better.”
The “Secret Language” tweet and paper. Reply: no, there’s no secret language.
Yes, Randall Munroe uses DALL-E. But can DALL-E use Randall Munroe? Let’s end with Unicorn on a moon in the style of xkcd. Not quite xkcd, but at least on the same planet.