29 Nov Teaching AI to Doodle
In July 2018, Google released Quick Draw in China as a WeChat Mini Program called 猜画小歌 (“Guess My Sketch”). It brought out the hidden artist in many people, who sketched objects for the AI to guess. The English version of Quick Draw is at https://quickdraw.withgoogle.com
Sure, it’s impressive that AI can guess what we draw. But can AI draw something creatively, like a human?
Yes. In October 2018, Christie’s sold the first auctioned painting made by artificial intelligence for $432,500. It was created by a machine using a Generative Adversarial Network (GAN) that emulated the spirit of 15,000 human paintings from the 14th century to the 20th.
In a similar spirit, Zhechao Huang, a summer 2018 intern at Yimian, built a project that teaches AI to doodle cars, axes, baseballs, and even the Great Wall of China!
This blog post briefly explains how a Generative Adversarial Network (GAN) works, presents the AI doodles, and shares some other interesting applications of GANs.
1. From “guess my sketch” to “draw me a sketch”
Asking AI to “guess my sketch” and teaching AI to “draw me a sketch” are two different problems.
Guessing the motif of a sketch is a recognition task, handled by a so-called “discriminative algorithm”: the AI observes the data and identifies the label associated with it. Drawing a sketch of a given motif, however, is a creation task: the AI must produce data for a given label, hence the name “generative algorithm”.
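The two directions can be sketched in a few lines of plain Python. This is only a toy under loud assumptions: each doodle is reduced to a single made-up number (say, how round it looks), and the names CLASS_MEANS, guess_label, and draw_feature are hypothetical, not part of Quick Draw or of Zhechao's project.

```python
import random

# Assumed typical "roundness" of each motif (made-up numbers).
CLASS_MEANS = {"axe": 0.2, "baseball": 0.9}
SPREAD = 0.05  # assumed variation between individual doodles


def guess_label(feature):
    # Discriminative direction: data in, label out.
    # Pick the motif whose typical value is closest to the observed one.
    return min(CLASS_MEANS, key=lambda label: abs(feature - CLASS_MEANS[label]))


def draw_feature(label, rng):
    # Generative direction: label in, new data out.
    # Sample a fresh value around the motif's typical value.
    return rng.gauss(CLASS_MEANS[label], SPREAD)


rng = random.Random(0)
new_doodle = draw_feature("baseball", rng)  # create data for a label
guessed = guess_label(0.85)                 # recognize a label from data
```

Even in this toy, the generative side needs a model of what the data looks like (a center and a spread), while the discriminative side only needs a rule that separates the classes.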
Generation is much harder than discrimination. Creation calls for understanding. Deep understanding. In our own learning experience, only after we understand a class of objects thoroughly can we start to create one.
It was well put by the theoretical physicist and Nobel Prize winner Richard Feynman: “What I cannot create, I do not understand.”
Fortunately, recent developments in Generative Adversarial Networks can help us teach AI to create doodles of a given subject by emulating the style of human works.
In a nutshell, a GAN consists of two neural networks. One is a generative “student”, which diligently produces different drawings; the other is an adversarial “coach”, which judges whether or not the student’s work resembles the input data for a given theme. Over many iterations, the student-coach pair produces an output dataset that is increasingly similar in style to the input dataset.
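To make the student-coach loop concrete, here is a minimal sketch in plain Python. It is not the model Zhechao trained. The “drawings” are single numbers, the human data is assumed to follow a bell curve around 4.0, and both networks are shrunk to two parameters each. The names generate, judge, train_step, and train are made up for illustration, and the student uses the common “non-saturating” loss, which is our choice rather than something stated above.

```python
import math
import random

REAL_MEAN, REAL_STD = 4.0, 0.5  # assumed stand-in for the human doodles


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def generate(gen, z):
    # Student: turn random noise z into a fake sample.
    return gen["a"] * z + gen["b"]


def judge(coach, x):
    # Coach: estimated probability that x came from the real data.
    return sigmoid(coach["w"] * x + coach["c"])


def train_step(gen, coach, rng, batch=32, lr=0.01):
    real = [rng.gauss(REAL_MEAN, REAL_STD) for _ in range(batch)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(batch)]
    fake = [generate(gen, z) for z in noise]

    # 1) Coach update: push judge(real) toward 1 and judge(fake) toward 0.
    gw = gc = 0.0
    for xr, xf in zip(real, fake):
        pr, pf = judge(coach, xr), judge(coach, xf)
        gw += -(1.0 - pr) * xr + pf * xf
        gc += -(1.0 - pr) + pf
    coach["w"] -= lr * gw / batch
    coach["c"] -= lr * gc / batch

    # 2) Student update: push judge(fake) toward 1, using the coach's
    #    feedback as the only training signal.
    ga = gb = 0.0
    for z in noise:
        pf = judge(coach, generate(gen, z))
        dloss_dx = -(1.0 - pf) * coach["w"]
        ga += dloss_dx * z
        gb += dloss_dx
    gen["a"] -= lr * ga / batch
    gen["b"] -= lr * gb / batch


def train(steps=300, seed=0):
    rng = random.Random(seed)
    gen = {"a": 1.0, "b": 0.0}    # student starts far from the real data
    coach = {"w": 0.0, "c": 0.0}  # coach starts with no opinion
    for _ in range(steps):
        train_step(gen, coach, rng)
    return gen, coach
```

Calling train() runs 300 rounds. The student never sees the real data directly; it only learns from the coach’s verdicts, and its offset gen["b"] starts at 0 and is pushed toward the region the coach considers real. A toy this small is not guaranteed to settle down, which hints at why real GANs need many iterations and careful tuning.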
2. Let’s GAN it!
Zhechao trained a Generative Adversarial Network to doodle cars. Below, you can see two sheets of 50 cars each, one drawn by humans, and the other by AI using GAN. Can you tell which is which? Let’s hold the answer until the end of the blog.
The GAN reached the doodling skill shown above through many training iterations.
After only 100 iterations, it was drawing like this:
After 1,000, it was getting better:
The final result took the GAN 100,000 iterations.
Besides cars, Zhechao also taught the AI to draw other objects.
For example, the AI generated “axes” quite successfully.
The “baseballs” look decent too.
However, if the human-drawn training data is unreliable, the AI cannot produce anything better. See the amusing rendering of “the Great Wall of China”.
3. More GAN applications
In many fields other than drawing, researchers have found impressive applications of GANs.
For example, the authors of an ICLR 2018 paper (Progressive Growing of GANs for Improved Quality, Stability, and Variation) used a GAN to create human faces. The generated images are extremely realistic and high-resolution, yet none of the people shown actually exists.
In another example, SRGAN was used to increase the resolution of a picture, or to remove mosaic pixelation. (See Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network, by Christian Ledig et al.)
GANs can also learn two different styles and translate a picture from one style to the other. In the paper Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks (CycleGAN), researchers taught AI to imitate artistic styles: it could turn a realistic photo into a Monet-style painting.
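The “cycle-consistent” part of that title can be sketched in a few lines. The idea in the paper is that translating a photo into a painting and back again should return something close to the original photo. The sketch below measures that round-trip error as a mean absolute difference. Everything concrete here is an assumption for illustration: images are plain lists of pixel values, and brighten, darken, and darken_too_much are toy stand-ins for the two translators, not trained networks.

```python
def cycle_consistency_loss(photo, to_painting, to_photo):
    # Translate photo -> painting -> photo, then measure how far the
    # round trip landed from the original (mean absolute difference).
    restored = to_photo(to_painting(photo))
    return sum(abs(r - p) for r, p in zip(restored, photo)) / len(photo)


# Toy stand-ins for the two translators (hypothetical).
def brighten(image):
    return [v * 2.0 for v in image]


def darken(image):
    return [v * 0.5 for v in image]


def darken_too_much(image):
    return [v * 0.25 for v in image]


photo = [0.2, 0.4, 0.6]
good_cycle = cycle_consistency_loss(photo, brighten, darken)          # exact inverse
bad_cycle = cycle_consistency_loss(photo, brighten, darken_too_much)  # lossy inverse
```

A pair of translators that undo each other gets zero round-trip error, while a mismatched pair is penalized. This is what lets the method learn from unpaired photos and paintings.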
Now, let’s reveal the answer to the earlier question about the doodled cars. Picture A is the GAN’s work, and Picture B is from the training set drawn by humans. Did you get it right? Admittedly, both A and B have clear lines, structured composition, and varied patterns, with the unmistakable theme of “cars”.