Exclusive: The Tesla AutoPilot – An In-Depth Look At The Technology Behind the Engineering Marvel


A Brief Overview of DNNs and Autopilot Sensors

I think a (very) basic introduction to general Autopilot technology and Deep Neural Networks (DNNs) is in order. Those already familiar with these can skip this portion. Advanced Driver Assistance Systems (ADAS) are becoming more and more prevalent in our cars, but most of them are hidden just out of sight. While some autonomous vehicles, like Google's self-driving car, have obtrusive sensors on their roofs, not every semi-autonomous car is made the same.

Looking at the various approaches to achieving autopilot

A car's self-driving technology consists of three components: the sensors, the hardware back-end and the software back-end. There are three broad categories of sensors, various types of processors and various types of software back-ends as well. We will be focusing primarily on Tesla's approach.

Let's begin by comparing the different sensing devices: RADAR/ultrasonics, LIDAR and your average camera. All of these approaches have their own advantages and disadvantages. Until recently the LIDAR approach was the most popular one, albeit costly, but the trend has gradually begun shifting to a camera-based approach for various reasons.

Let's start with the RADAR. This piece of equipment can easily detect cars and moving objects, but unfortunately it is unable to detect lanes or motionless objects. This means that it is not very good at detecting pedestrians, particularly ones who are standing still. It is a very good sensor to have as a redundant device, but not the ideal primary sensor.

The LIDAR cannot detect lanes but is able to detect humans reasonably well. It comes at a much higher cost, though: the equipment has a large footprint and can break the bank at some price points. Models with high enough resolution to offer high reliability are usually even more expensive.

A Google Self-Driving car with a LIDAR mounted on top. @Google Public Domain.

The last (and latest) approach is the camera system. This is the primary sensor (in conjunction with a front-facing RADAR) used in Tesla vehicles. A camera system is your average wide-angle camera mounted on the front of the car or in a surround configuration. Unlike RADAR and LIDAR, a camera is only as good as the software processing its inputs (the camera resolution matters, but not as much as you would expect), and a primary component of that software is a DNN. You can think of a DNN as the virtual "brain" on the chip, which interprets the images from the camera and identifies lane markings, obstructions, animals and so on. A DNN is not only capable of doing pretty much everything a RADAR and LIDAR can do but is also able to do much more, like reading signs, detecting traffic lights, judging road composition and so on. We'll cover this in more depth in the later sections.

A short (high level) introduction to DNNs

Now let's talk about Deep Neural Networks, or DNNs for short. The inner workings of a neural network come down to low-level code, so I can only provide a very simplified explanation. Neural networks were first conceived as a way to simulate the human and animal nervous system, where a neuron fires for any object ‘recognized’. The reasoning went like this: if we could replicate the trigger process with virtual ‘neurons’, we should be able to achieve ‘true’ machine learning and eventually even Artificial Intelligence. One of the first large-scale DNNs was created by Google.

The project was called Google Brain and consisted of around 1,000 servers and some 2,000 CPUs. It consumed 600,000 watts of power (a drop in the ocean of server-level power consumption) and cost 5 million dollars to create. The project worked and met its objective: within the course of a few days the A.I. learned to tell humans apart from cats. It did this by watching YouTube videos. For three days. The project was eventually shelved due to the very high cost of scaling it up. Oh, it worked, but it was too slow.

In more recent times, Nvidia managed to accomplish what Google did with just 3 servers. Each server had only 4 GPUs running; that's 12 GPUs in total. The setup consumed 4,000 watts and cost only 33,000 dollars. This is a setup that an amateur with deep pockets, or a low-funded research lab, can recreate easily. Basically, you could now get Google Brain’s power for roughly 1/150th of the cost and 1/150th of the power consumption, with the added benefit of scalability.

Slides courtesy of Nvidia. @Nvidia Public Domain
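As a quick sanity check on the comparison above, the short sketch below divides the Google Brain figures by the Nvidia figures. It uses only the numbers quoted in this article; the variable and function names are my own and purely illustrative.

```python
# Figures as quoted in the article (originally from Nvidia's slides).
google_brain = {"servers": 1000, "power_watts": 600_000, "cost_usd": 5_000_000}
nvidia_setup = {"servers": 3, "power_watts": 4_000, "cost_usd": 33_000}


def ratio(key):
    """How many times larger the Google Brain figure is than the Nvidia one."""
    return google_brain[key] / nvidia_setup[key]


cost_ratio = ratio("cost_usd")
power_ratio = ratio("power_watts")

print(f"Cost:  about {cost_ratio:.0f} times cheaper")
print(f"Power: about {power_ratio:.0f} times less")
```

On these figures both the cost and the power draw come out at roughly 150 times lower for the GPU-based setup.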

But how exactly does a DNN function? Well, the human brain recognizes objects through their edges; it doesn’t see pixels, it sees edges. A DNN tries to recreate how a human brain functions by being set up to recognize only edges. A ton of code is added, and then the unsupervised ‘machine learning’ period begins. During this period the DNN is fed material, which can be images, videos or data in any other form.
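To make the idea of ‘seeing edges’ a little more concrete, here is a minimal sketch of a hand-written edge filter applied to a tiny grayscale image. The Sobel kernel used here is a classic image-processing filter that I have picked for illustration; it is an assumption on my part and not something taken from Tesla's or Nvidia's software. In a real DNN, filters like this are learned from the training material rather than written by hand.

```python
# Sobel kernel for horizontal gradients: it responds to vertical edges.
# Chosen for illustration only; a DNN would learn its own filters.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def apply_filter(image, kernel):
    """Slide a 3x3 kernel over a 2D grayscale image (no padding)."""
    height, width = len(image), len(image[0])
    output = []
    for row in range(height - 2):
        out_row = []
        for col in range(width - 2):
            total = 0
            for i in range(3):
                for j in range(3):
                    total += image[row + i][col + j] * kernel[i][j]
            out_row.append(total)
        output.append(out_row)
    return output


# Toy 5x6 image: dark (0) on the left half, bright (9) on the right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(5)]

edges = apply_filter(image, SOBEL_X)
for row in edges:
    print(row)
```

Each printed row is `[0, 36, 36, 0]`: the filter stays at zero over the flat dark and bright regions and only ‘fires’ where its window straddles the boundary between them.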

One by one, virtual neurons are created, unsupervised and unprogrammed, each of which recognizes a specific edge. When enough time has passed, the DNN can distinguish between whatever it was told to look out for. The ‘intelligence’ of the DNN depends on its processing power and the time spent ‘learning’. Now that we have that out of the way, let's move on to the insides of the Tesla Model X and S.