DALL-E 2: Animals Riding Bicycles
A real task for DALL-E: painting 'playable' jigsaw puzzle images
In 1965 Minsky supposedly said (surely apocryphal?) “we’ll have a grad student solve the robot vision problem over the summer.” Just 40 short years later, the robot vision problem was solved at the 2005 DARPA Grand Challenge. Not a decade too soon!
AI innovation in the last decade has been much brisker. Convolutional neural nets became very accurate (AlexNet, 2012), “Attention Is All You Need” was published (2017), AlphaFold 2 predicted protein structures almost as accurately as experimental measurements (Jumper, Tunyasuvunakool, 2020), and GPT-3 (2020) and other large language models do human-like extrapolation of arbitrary text.
The latest milestone is DALL-E 2 (#dalle), which arrived on April 6th with a bang! It extends GPT-3 to generate images from short text descriptions. Like this one DALL-E and I generated on Friday, with the input “giraffe mother and baby in the style of van gogh”:
That’s not Van Gogh’s style, but it is a fun style, and the giraffes are expressive. This is how generative AI programs amaze us: by creating things we don’t expect computers to create.
Let’s go a step further: can DALL-E do a useful task? As it happens, Maya Gupta and I, who co-founded the AI recommendation site Didero, also run the wooden jigsaw puzzle manufacturer Artifact Puzzles, and to make a great jigsaw puzzle you need the right kind of painting. A fun, ‘playable’ jigsaw puzzle starts with an interesting image that tells a ‘story’ or makes the viewer curious, is artistically appealing, has a variety of colors, and has interesting features throughout without boring parts (like huge empty skies). Can DALL-E paint such images? Let’s find out!
Let’s start with a few simplifications. Stories in real paintings can be highly complex: think of Noah’s Ark, or The Garden of Earthly Delights. DALL-E, at least in my hands, is at its most brilliant with descriptions that have two interacting elements, so let’s start there. Also, DALL-E is not for commercial use, and its images (1024x1024) have only enough resolution for a 3” x 3” puzzle. Let’s set those issues aside. Painting a good puzzle image is a meaty real-world challenge, even if we don’t plan to produce a physical puzzle.
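The 3-inch figure comes from dividing the pixel count by the print resolution. Here is a minimal sketch of that arithmetic, assuming a print target of 300 pixels per inch. That number is a common rule of thumb for print, not something taken from DALL-E or from our production process, and the function name is just for illustration.

```python
def max_print_size_inches(width_px: int, height_px: int, dpi: int = 300) -> tuple[float, float]:
    """Largest print size, in inches, that still gets `dpi` pixels per inch.

    The default of 300 dpi is an assumed print guideline, not a DALL-E setting.
    """
    return width_px / dpi, height_px / dpi


# DALL-E's output resolution, as stated in the post.
width_in, height_in = max_print_size_inches(1024, 1024)
print(f"{width_in:.1f} x {height_in:.1f} inches")  # 3.4 x 3.4 inches
```

At a lower assumed resolution the same image stretches further (about 6.8 inches per side at 150 pixels per inch), at the cost of a softer print.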
A troop of animals on bicycles would be a fun image. Let’s simplify that. Can DALL-E generate good animal-on-bike images one by one?
With DALL-E, I generated 12 sets of 10 images. My favorite is this bear. He is wonderfully huggable, with a storybook expression of uncertainty and teddy-bear legs, and the red bike pops nicely.
The bear painting is mostly three color regions, though, so it wouldn’t be ideal for a large puzzle.
I got a more puzzle-friendly image with a stained glass monkey. Those bright colors will make a fun puzzle, and they create a fun artistic mood in stained glass that fits well with the outlandish subject matter. And look at that friendly smile! Not sure how DALL-E managed such a specific facial expression in stained glass.
Notice how much heavy lifting the words “in stained glass” do. I happened to want more local color variation, and stained glass happens to be an artistic technique that relies on small patches of color.
Stained glass also worked well with “a hippo riding a bicycle, in stained glass”. Stained-glass Mr Hippo looks majestic. In contrast, paintings with hippos on bikes all looked a bit goofy.
I continued working to get DALL-E’s natural painting styles to work with large animals on bikes. My favorite is this “a painting of a giraffe riding a bicycle”. He does look pretty normal! Just riding his bike to the rural church where he’s minister!
This image of “a painting of a pelican riding a bicycle” is another favorite. Great pelican, and the added element of the gull is wonderful. But it does have way too much yellow sky to make a great puzzle.
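The prompts so far follow two patterns: a style prefix (“a painting of”) or a style suffix (“in stained glass”). Here is a small sketch of how one might list every animal and style combination before typing them into DALL-E. The helper name `build_prompts` and the exact animal list are illustrative assumptions, the templates are modeled on the prompts quoted above, and nothing here calls DALL-E itself.

```python
from itertools import product

# Animals that appear in this post.
ANIMALS = ["bear", "monkey", "hippo", "giraffe", "pelican"]

# Hypothetical templates modeled on the prompts quoted in the post.
TEMPLATES = [
    "a painting of a {animal} riding a bicycle",
    "a {animal} riding a bicycle, in stained glass",
]


def build_prompts(animals, templates):
    """Return one prompt string per (animal, template) pair."""
    return [t.format(animal=a) for a, t in product(animals, templates)]


prompts = build_prompts(ANIMALS, TEMPLATES)
for prompt in prompts:
    print(prompt)
```

Keeping the style phrase separate from the subject makes it easy to swap in a new style and see how much of the result it controls.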
Let’s close today by using DALL-E’s Edit and Variations features to add more story and more areas of color. Let’s put a pelican on his bike in front of a row of houses. I got the following image in three steps: (1) asked DALL-E for a “painting, colorful houses across a street”; (2) used the Edit feature and asked it to add “a pelican riding a bicycle”; (3) used the Variations feature. I generated 8 more sets of 10 images along the way, and this is my favorite.
This would make a great puzzle! The image reaches all four of the goals I laid out at the outset: it tells a story, has artistic appeal, is colorful, and has no boring parts. It’s playful, consistent in tone, and (unlike most of the images I didn’t choose to show) draws the foreground and background in a way that’s easy for the human eye to separate. If I had this puzzle in front of me, I’d be delighted to put it together. DALL-E has accomplished today’s task. DALL-E is not just astounding, it is ready to be useful.
This exercise has taught me a lot about DALL-E’s strengths and limits as a painter, and about what words and phrases work best to communicate the task, and I’m anxious to learn more. I’ll give DALL-E another task next week. If you have ideas for next week’s task, leave a comment on Substack or reply by email.
DALL-E is fantastic. I’d love to use it to rework the car-ridden streets of iconic city centers around the world: get rid of the cars and turn the roads into a pedestrian area, a park, or a garden. It would awaken people to the unique feeling of good pedestrian-oriented urban planning.
I understand you can select a region from an image to make something change. People seem to not be using that too much, but apparently it works really well. So start with a real picture of the city and add the park or garden!
• e.g. NY’s Times Square as a park
• e.g. Rome’s Colosseum surrounded by a garden
• e.g. London’s Oxford Circus pedestrianised