Over the last few years Deep Learning has been applied to hundreds of problems, ranging from computer vision to natural language processing. In many cases Deep Learning outperformed previous work. Deep Learning is heavily used both in academia, to study intelligence, and in industry, to build intelligent systems that assist humans in various tasks.
The goal of this post is to share amazing applications of Deep Learning that I've seen. I hope this will excite people about the opportunities this field brings, as well as remind us that every new technology carries with it potential dangers. I believe the latter is especially true about Deep Learning, and I hope that by exposing people to all these amazing results I can encourage more discussion on the topic.
There are many different applications and the list below is in no way exhaustive. So if you know of other cool applications, I would appreciate it if you mentioned them in the comments. To make this easier to follow, I organized the different applications by category:
- Deep Learning in computer vision and pattern recognition
- Deep Learning in computer games, robots & self-driving cars
- Deep Learning creating sound
- Deep Learning doing art
- Computer hallucinations, predictions and other wild things
- The future of AI
Computer vision and pattern recognition
1. Deep Learning reenacts politicians in real-time
A group from Stanford created Face2Face, a system that captures a person's facial expressions and reenacts them on a face in a YouTube video in real-time. The video shows quite striking examples of known politicians. Elements of the same method can be used for 3D reconstruction of scenes from videos.
2. Restore colors in B&W photos and videos
Don't like black and white images? No worries, "Let there be color!" is a computer system that can automatically restore colors in B&W photos. You can read more about it here and see plenty of other examples here.
A similar approach can even be used to colorize old B&W films:
The Deep Learning network learned patterns that naturally occur in photos (the sky is usually blue, clouds are often white or gray, and grass is typically green) and uses them to restore the colors. It learned these patterns by itself from its past experience, without human intervention. It makes mistakes sometimes, but they are pretty hard to spot. For example, here are two black & white photos, each shown with both the computer's guess and the right answer. Which do you think is the real image? Feel free to write your guess in the comments below.
3. Pixel restoration CSI style
In the show CSI they often zoom into videos beyond the resolution of the actual video. This seemed completely unreliable and there are even a few videos on YouTube like the one below where people explain they don't watch CSI because that is unrealistic.
Well, it was unrealistic until Deep Learning. Early in 2017, Google Brain researchers trained a Deep Learning network to take very low resolution images of faces and predict what each face most likely looks like. They call the method Pixel Recursive Super Resolution, and it enhances the resolution of photos significantly. In the image below you can see the original 8x8 photos, the ground truth (the real face originally in the photos) and, in the middle, the computer's guess. Obviously it is not perfect, as it cannot be, but it is pretty unbelievable how well the computer can guess many of the features of the person in the photo.
4. Real-time multi-person pose estimation
Deep Learning networks can now greatly aid animators in estimating the poses of people. Nowadays they can even do it in real-time. A work by Zhe Cao et al. taught a neural network to estimate the position of a human's skeleton. In the video below you can see over a dozen people dancing, while the network knows where they are and how they move. This is done without having any devices on them, only by analyzing the video!
5. Describing photos
We are all used to seeing computers automatically classify our photos. For example, Facebook can automatically tag your friends. Similarly, Google Photos can automatically label your photos for an easier search. In fact, take a state-of-the-art network and train it on ImageNet, the biggest database of labelled images, and it will be able to classify objects better than a PhD student who trained on the same task for over 100 hours.
But these are just labels, and Deep Learning allows us to take it several steps forward and describe all the various elements in a photo. In a work by Andrej Karpathy and Li Fei-Fei, they trained a Deep Learning network to identify dozens of interesting areas in an image and write a sentence to describe what happens in each area. This means that the computer not only learned to classify the elements in the photo, but also to describe them with English grammar. You can play with hundreds of other examples in this demo.
6. Changing gazes of people in photos
This one is a little weird. Imagine you have a photo of someone, like a friend or a relative. In DeepWarp, Ganin et al trained a Deep Learning network to change the gaze of the person. You can even try it with your own photos here (warning: link doesn't always work).
7. Real-time analysis of behaviors
So Deep Learning networks know how to recognize and describe photos, and they can estimate people's poses. DeepGlint is a solution that uses Deep Learning to get real-time insights about the behavior of cars, people and potentially other objects. This is an application of Deep Learning that is on the sketchy side, but it is worth being familiar with.
8. Iterating photos to create new objects
A work by Nguyen et al let a Deep Learning network synthesize novel photos from existing ones. The results are beautiful and show how the network iteratively creates new photos of objects that were not in any way in the image before.
The network created gorgeous photos of erupting volcanoes as well as flowers, birds, faces and much more.
9. Generating photos of galaxies
We don't have to stop at terrestrial objects when studying the natural world using Deep Learning. Astronomers are now using Deep Learning to create photos of galaxies as well as volcanoes.
10. Real-time translation of text in images
The Google Translate app can now automatically translate images with text in real-time to a language of your choice. Just hold the camera on top of the object and your phone runs a deep learning network to read the image, OCR it (i.e. convert it to text) and then translate it. Languages will gradually stop being a barrier and we will be able to communicate with other humans universally.
11. Saving whales and classifying Plankton!
As we've seen, Convolutional Neural Networks are a Deep Learning architecture that learns to classify images amazingly well. This has thousands of applications in biology, astronomy, food and more. For example, by classifying photos of whales we can better study populations of endangered whales.
Other examples are Plankton classification and plant classification.
12. Create new images
The same idea as in "Let there be color!" can be used to let a Deep Learning network create other types of new images. In Pix2Pix, Isola et al. taught a Deep Learning network to perform multiple tasks: create real street scenes from colored blobs, create a map from a real aerial photo, turn day scenes into night and fill in the colors between edges of objects.
The last example is pretty cool: in many cases the computer gets pretty creative about the designs of the objects.
13. Reading text in the Wild
The Oxford Visual Geometry Group used Deep Learning to "read text in the wild". This is an attempt to read text from photos and videos to extend Google so we can search for text from BBC News videos. Try it out by tapping on the search below.
14. Estimate solar savings potential
Google Sunroof uses aerial photos from Google Earth to create a 3-D model of your roof. The project uses Deep Learning neural networks to separate your roof from surrounding trees and shadows. It then uses the sun's trajectory and weather patterns to predict how much energy can be produced by installing solar panels on your roof.
Computer games, robots & self-driving cars
15. Winning Atari Breakout
Google's DeepMind used a Deep Learning technique called Deep Reinforcement Learning to teach a computer to play the Atari game Breakout. The computer wasn't taught or programmed in any specific way to play the game. Instead, it was given control of the keyboard while watching the score, and its goal was to maximize the score. Initially it sucks, as its movements are mostly random. After two hours of playing, the computer is an expert. After four hours of playing, the computer realizes that digging a tunnel through the wall is the most effective technique to beat the game.
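For a feel of how "maximize the score" turns into learning, here is a minimal sketch of tabular Q-learning, a much simpler relative of Deep Reinforcement Learning. Everything in it is an assumption made for illustration: the toy corridor game, the function names train_q_table and greedy_policy, and the parameter values. It is not DeepMind's code, and it uses a small lookup table where their system uses a neural network.

```python
import random


def train_q_table(n_states=6, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.2, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical game).

    The agent starts in cell 0 and scores 1 point on reaching the last cell.
    """
    rng = random.Random(seed)
    # q[state][action]: action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the length of one game
            # Move randomly when exploring or when both actions look equal,
            # otherwise take the action with the higher estimated score
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state = max(0, state - 1) if action == 0 else state + 1
            reward = 1.0 if next_state == goal else 0.0
            # Nudge the estimate toward reward plus discounted future value
            best_next = 0.0 if next_state == goal else max(q[next_state])
            target = reward + gamma * best_next
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
            if state == goal:
                break
    return q


def greedy_policy(q):
    """Best-known move for every cell except the goal cell."""
    return ["right" if right > left else "left" for left, right in q[:-1]]


print(greedy_policy(train_q_table()))
```

Just like in the Breakout story, nothing in the code says "walk right". The agent starts with random moves, and the preference for moving toward the goal emerges only from the score it observes.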
16. Beating people in dozens of computer games
Don't like Breakout? The Deep Learning community is currently (March 2017) in a race to train computers to beat people at almost any game you can think of, including Space Invaders, Doom, Pong, Gathering and dozens of other games. In the majority of these games Deep Learning networks already outperform experienced players. The computers were not programmed to play the games. Instead, they just played the games for a few hours and learned the rules by themselves.
17. Playing Doom: example of violence in Computer Games
But there are also some red flags when we let Deep Reinforcement Learning networks play computer games. For example, when playing the game Doom, the computer kills twice as well as a human player and gets killed much less often. Not to get overly apocalyptic about this, but it somehow reminded me of the 1992 film Universal Soldier, where Van Damme and Dolph Lundgren were reanimated after getting killed in Vietnam. Deep Learning networks also adapted to be manipulative and aggressive in certain cases. When playing Gathering, a red and a blue agent compete to collect apples (in green) while they can also shoot at each other. When the apples are scarce, one of the agents becomes aggressive and constantly shoots towards the apple to prevent the other agent from collecting it.
18. Self-driving cars
Everybody has heard about this one, and now you can actually see them in action. In this video a Tesla electric vehicle drives without human intervention. Notice how it distinguishes different types of objects, including people and road signs.
19. Robotics
Deep Learning is also heavily used in robotics these days. This is a field in itself which I won't get into, but here are at least two examples of my favorite robots by Boston Dynamics: SpotMini and Atlas. The robots react to people pushing them around, they get up when they fall, and they can even take care of pretty elaborate tasks that require gentleness and care, like unloading a dishwasher.
20. Try it yourself!
Several of Silicon Valley's most renowned entrepreneurs recently launched a non-profit called OpenAI with the goal of democratizing AI and Deep Learning technology. They launched Universe, an open source platform that lets you test Deep Learning on hundreds of games and websites, so now you can train a Deep Learning network to play hundreds of different games by yourself!
Creating sound
21. Voice generation
Last year Google released WaveNet and Baidu released Deep Speech, both of which are Deep Learning networks that generate voice automatically. You may ask, what's the big deal? Siri and Alexa can talk as well. To date, text2voice systems were not completely autonomous in the way they created new voices; they were (manually) trained to do so. The systems created today learn to mimic human voices by themselves and improve with time. When an audience tries to tell them apart from a real human speaker, it is much harder to do so. While we are not there yet in terms of automatic voice generation, Deep Learning is taking us a step closer to giving computers the ability to speak like humans do.
If you like music, it doesn't end here: in this work, Merlijn Blaauw and Jordi Bonada even taught a Deep Learning network to sing!
22. Music composition
The same technology used for voice recognition can also be used to train a Deep Learning network to produce music compositions. Below is one example by Francesco Marchesani who trained the computer to compose music like my favorite classical composer Chopin. After the computer learns the patterns and statistics that are unique to the music of Chopin, it creates a completely new piece!
23. Restoring sound in videos
It almost sounds like it should not be possible to restore sound in muted videos, but remember there are people who can read other people's lips. In a work by Owens et al. a Deep Learning network was trained on videos in which people were hitting and scratching objects with a drumstick. After several learning iterations, the scientists muted the video and asked the computer to regenerate the sound it expects to hear, and the results are impressive:
If this is not enough for you, how about making computers read lips? This is what LipNet can do, in a work by Oxford and Google DeepMind scientists. LipNet reached 93% success in reading people's lips where an average lipreader succeeds 52% of the time.
Doing art
24. Transferring style from famous paintings
A 2016 paper by Gatys, Ecker and Bethge experimented with the following creative idea. Take your favorite work of art and let a Deep Learning network study the patterns in the strokes, colors, and shading. Plug into the network a new image and the network can transfer the style from the original artwork into your image.
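To make "study the patterns" a bit more concrete: in the Gatys et al. paper, the style of an image is summarized by how strongly the network's feature channels fire together, collected in a so-called Gram matrix. Below is a minimal sketch of that one ingredient in plain Python. The function names gram_matrix and style_loss and the tiny hand-written feature lists are assumptions for illustration only. A real implementation takes its features from a trained convolutional network and optimizes the image itself, which this sketch does not do.

```python
def gram_matrix(features):
    """Correlations between feature channels.

    features: list of channels, each a flat list of activations,
    one value per image position.
    """
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in features]
            for ci in features]


def style_loss(features_a, features_b):
    """Mean squared difference between the two Gram matrices."""
    ga, gb = gram_matrix(features_a), gram_matrix(features_b)
    n = len(ga)
    return sum((ga[i][j] - gb[i][j]) ** 2
               for i in range(n) for j in range(n)) / (n * n)


# Two channels over two positions (made-up numbers)
painting = [[1.0, 2.0], [3.0, 4.0]]
# The same activations with the two positions swapped
rearranged = [[2.0, 1.0], [4.0, 3.0]]

print(gram_matrix(painting))
print(style_loss(painting, rearranged))
```

Swapping the positions leaves the Gram matrix unchanged, which hints at why this statistic captures texture and brush strokes rather than where things are in the picture.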
The web is loaded with creative new ways of applying this technique. For example, @genekogan decided to go the other way around and applied style transfer to modify the Mona Lisa according to styles learned from Egyptian hieroglyphs, the Crab Nebula, and Google Maps. You can explore more of his and other artistic experiments.
The method of style transfer goes far beyond art and can even be used for photography. In this paper by Luan et al. they transformed photos of buildings, flowers and landscapes. The results you see below are stunning (more in this link). The photos are organized left to right (left=original, middle=style origin, right=result).
Want to try style transfer yourself? DeepArt.io creates apps that use Deep Learning to learn hundreds of different styles which you can apply to your photos.
25. Automatically writing Wikipedia articles, math papers, computer code and even Shakespeare
Another architecture of Deep Learning is called Long Short-Term Memory (LSTM) and performs amazingly well on textual input. In an appropriately titled blog post called "The Unreasonable Effectiveness of Recurrent Neural Networks", Andrej Karpathy let a Deep Learning network "read" Shakespeare, Wikipedia, math papers and computer code. The results? The computer wrote like Shakespeare and also wrote Wikipedia-like articles (note that the yahoo link doesn't really exist and the computer "hallucinated" it). The computer was also capable of writing fake math papers, and even computer code! This is a computer program that writes computer programs. Note that the text, code and math the computer writes don't necessarily make sense all the time, but it is only reasonable to expect it will get there.
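The core loop of such a text generator is easy to show: look at the last few characters, pick a plausible next character, repeat. The sketch below swaps the LSTM for something far simpler, a lookup table of which character followed each short context in the training text (a character-level Markov chain). It is not Karpathy's code or method. The function names build_model and generate and the tiny corpus are assumptions for illustration.

```python
import random
from collections import defaultdict


def build_model(text, order=3):
    """Map every `order`-character context to the characters seen after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(model, seed_text, length=80, rng=None):
    """Extend seed_text one character at a time by sampling from the model."""
    rng = rng or random.Random(0)
    order = len(next(iter(model)))  # context length used when building
    out = seed_text
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # context never seen in training: stop early
            break
        out += rng.choice(choices)
    return out


corpus = "to be or not to be that is the question "
model = build_model(corpus, order=3)
print(generate(model, "to ", length=40))
```

A table like this only remembers three characters of context. An LSTM learns what to remember over much longer stretches, which is why it can keep brackets balanced in code or stay in character in a play.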
26. Handwriting
I keep showing the computer generating digital text or art, but today the computer can also handwrite. Alex Graves from the University of Toronto taught a computer to have its own handwriting in a wide variety of styles. Tap on the handwriting below to write your own text in whichever style you like.
Computer hallucinations, predictions and other wild things
27. Predicting demographics and election results
Gebru et al. took 50 million Google Street View images and explored what a Deep Learning network can do with them. The results are outstanding, as the computer learned to localize and recognize cars. It detected over 22 million cars including their make, model, body type, and year. Why stop there? The model was actually able to predict the demographics of each area from its car makeup. There were many insights that are beyond the scope of this blog post, but here is one cool association it found, as a fun example: "if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%)."
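The quoted rule of thumb is simple enough to write down directly. The sketch below is only a restatement of that one sentence, with a hypothetical function name predict_vote. It is not the model from the paper, and the car counts in the example are made up.

```python
def predict_vote(sedans, pickups):
    """Restate the quoted rule: compare car counts from a 15-minute drive.

    Returns the likely party and the chance quoted for that outcome.
    """
    if sedans > pickups:
        return ("Democrat", 0.88)
    return ("Republican", 0.82)


# Made-up counts for two imaginary cities
print(predict_vote(sedans=120, pickups=80))
print(predict_vote(sedans=40, pickups=95))
```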
28. Deep dreaming
This next example is going to mess your brain up, so my apologies in advance. In late 2015 Google researchers found a way to use Deep Learning to let the computer enhance features in images. This technique can be used in different ways, one of which is called Deep Dreaming, which lets the computer hallucinate on top of an existing photo. The scientists called it Deep Dreaming because the photos that are generated often resemble dreams.
For example, in this photo the computer hallucinated structures and buildings on top of a mountain. The hallucinations vary depending on what the neural network was exposed to before, and there are hundreds of examples online where the computer is dreaming of animals, cars, people and buildings. Some of the Deep Dreams are actually Deep Nightmares and can be very disturbing.
YouTube is packed nowadays with videos of the computer Deep Dreaming Fear & Loathing in Las Vegas, Alice in Wonderland, imaginary cities, Vincent Van Gogh and even Donald Trump. But my two favorites and also potentially the wildest ones are the Pouff - a trip to the Grocery Store:
and this video about a Journey on the Deep Dream, which zooms further into other imaginative visions (play with sound).
29. AI invents and hacks its own crypto to avoid eavesdropping
Google Brain created two neural networks for security purposes: one creates its own cryptographic algorithm to protect its messages, and the other tries to crack it. The networks performed very well at devising new crypto mechanisms but not as well at hacking them.
30. Deep Learning networks creating Deep Learning networks
Neural Complete is deep learning code that can generate new deep learning networks. It is not only written in Python, but is also trained on generating Python code. Super cool, and it saves time for other (lazy =)) Deep Learning developers.
31. Predicting earthquakes
Harvard scientists used Deep Learning to teach a computer to perform viscoelastic computations, the computations used in predictions of earthquakes. Until their paper, such computations were very computationally intensive, but this application of Deep Learning improved calculation time by 50,000%. When it comes to earthquake calculation, timing is important and this improvement can be vital in saving lives.
The future of AI
I hope this post excited you about the applications of Deep Learning and about its potential to help solve some of the problems humanity is facing. At the same time it is important to remember and respect the fact that every new technology brings with it potential dangers. AI safety is really a huge topic that deserves its own blog post, which I will hopefully write in the future. For now, I would just like to mention that there are a lot of people collaborating to ensure AI is used in a way that will benefit humanity. I highly recommend following ventures like OpenAI, Partnership on AI and the Allen Institute for Artificial Intelligence, and being well aware of the concerns regarding AI safety as well as the optimistic vs. pessimistic views about it.
Feel free to join the discussion about AI and Deep Learning in the comments below.