Traditional approaches to autonomous vehicles (AVs) feed millions of miles of real driving data, together with even more miles of simulated data, into supervised machine learning engines to create AI-based perception, localization, path planning and vehicle control capabilities. Waymo (and before that Google, where the program originated) has been doing this over the past decade and has reached impressive milestones with billions of dollars of investment. However, scalability is a significant issue: to date, in spite of these efforts, Waymo offers driverless ride hailing in only a limited number of markets, such as San Francisco, Phoenix, Austin and Los Angeles (all fair-weather locations). Expanding to other markets will require more training miles and consume even more money and time.
Data-intensive approaches are incredibly expensive and make it prohibitive for smaller players to innovate in the AoT™ (Autonomy of Things) arena. Alternative approaches are required to solve the scalability problem and level the playing field. For example, DARPA’s TIAMAT program (Transfer from Imprecise and Abstract Models to Autonomous Technologies) aims to learn and transfer autonomy capabilities across diverse platforms by leveraging their shared semantics or operational protocols (similar to how humans learn to drive). The emphasis is on using small amounts of real data and larger amounts of simulated data to locate corner cases and extract the underlying semantics needed to refine physics-based simulation models. This closes the SIM-REAL gap, making faster implementations of the driverless stack possible. TIAMAT’s goals are aggressive: transfer autonomy from simulation to reality within days, rather than the months or years that traditional approaches take.
Supervised vs Unsupervised Learning
Supervised learning uses labelled training data sets, pairs of inputs and expected outputs, to train a DNN (Deep Neural Network), which essentially produces a multi-dimensional curve fit for these data sets. Such a fit tends to over-fit the training data, and this creates brittleness: if the network encounters a situation it has never seen before (like a corner case), it does not know how to react or reacts in unpredictable ways. Human curation of these data sets is also required (hence “Supervised”), which is time-consuming, error-prone and expensive.
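The curve-fit analogy can be made concrete with a toy example. The sketch below is not a DNN and has nothing to do with any AV stack: it fits polynomials of two different degrees to a handful of noisy labelled samples and then queries each fit at a point outside the training range, which stands in for a "corner case". The function name, the sine target, the noise level and the query point are all assumptions chosen for illustration.

```python
import numpy as np


def fit_and_extrapolate(degree, seed=0):
    """Fit a polynomial to noisy labelled samples and probe an unseen input.

    Hypothetical toy setup: the "true" behavior is sin(2*pi*x), observed
    with noise at 12 labelled points inside [0, 1].
    """
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, 12)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.1, x_train.size)

    # Least-squares curve fit to the labelled (input, output) pairs.
    coeffs = np.polyfit(x_train, y_train, degree)

    # Error on the data the model was fitted to.
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)

    # Error at an input outside the training range (the "corner case").
    x_new = np.array([1.3])
    y_new = np.sin(2 * np.pi * x_new)
    new_err = np.mean((np.polyval(coeffs, x_new) - y_new) ** 2)
    return train_err, new_err


for degree in (3, 9):
    train_err, new_err = fit_and_extrapolate(degree)
    print(f"degree {degree}: train error {train_err:.4g}, "
          f"unseen-input error {new_err:.4g}")
```

The higher-degree fit always matches the training samples at least as closely as the lower-degree one, because it has more free parameters. Its behavior at the unseen input is not constrained by the data at all, and that is the brittleness the paragraph above describes.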
Unsupervised learning eliminates the need for human labelling in the creation of the AI engine. It uses unlabeled data and derives the underlying semantics and patterns, which are then used to make decisions. This is the approach followed by Helm.ai, a California-based AI software company that was founded in 2016 and is focused on L3 (conditional autonomy) and L4 (full autonomy in a designated ODD, or operational design domain) autonomous driving stacks. Waymo is apparently also on this bandwagon, as discussed in this recent research article. This reinforces the argument that supervised learning on large driving data sets does not scale.
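As a generic illustration of deriving structure from unlabeled data, the sketch below runs k-means clustering on synthetic 2-D points. This is a textbook technique swapped in for illustration only; it is not Helm.ai's Deep Teaching™, whose details are proprietary. The function name `kmeans`, the synthetic blobs and the farthest-point initialization are all assumptions.

```python
import numpy as np


def kmeans(points, k, iters=20):
    """Plain k-means with a deterministic farthest-point initialization."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from every centroid chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        nearest = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centroids], axis=0
        )
        centroids.append(points[np.argmax(nearest)])
    centroids = np.array(centroids, dtype=float)

    for _ in range(iters):
        # Assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)
        # Move each centroid to the mean of its assigned points.
        updated = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return labels, centroids


# Synthetic, unlabeled data: two well-separated groups of points.
rng = np.random.default_rng(0)
blob_a = rng.normal(loc=(0.0, 0.0), scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=(10.0, 10.0), scale=0.5, size=(50, 2))
points = np.vstack([blob_a, blob_b])

labels, centroids = kmeans(points, k=2)
print("cluster sizes:", np.bincount(labels))
print("centroids:", np.round(centroids, 2))
```

No labels are supplied at any point. The grouping emerges from the geometry of the data alone, which is the sense in which unsupervised methods "derive the underlying patterns".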
Helm.ai Approach: Unsupervised Learning and Generative AI
Helm has raised $102M to date, has a staff of 100 people and is led by CEO Vlad Voroninski. Dr. Voroninski is a mathematician with a Ph.D. from The University of California at Berkeley and M.I.T. faculty experience (also an avid rock climber in his spare time!). He founded Helm in 2016 to focus on solving AV scalability issues using unsupervised learning and offers this summary: “Helm.ai has pioneered a highly efficient approach to unsupervised machine learning called Deep Teaching™, which we used to develop the world’s first foundation model for semantic segmentation in 2017. In recent years, we have combined Deep Teaching™ with innovation in generative AI architectures to create a form of generative AI with improved scaling laws, achieving greater accuracy per dollar and enabling higher-fidelity AI-based simulation for autonomous driving”.
There are two major components to Helm’s proposition. The first is the use of generative AI techniques to synthesize lifelike simulations of sensor data, traffic and pedestrian flow, road infrastructure and obstacles (other vehicles or stationary objects). Figure 1 shows examples of AI-generated sensor data and path planning.
Helm’s WorldGen-1 software synthesizes sensor and perception data and predicts the behavior of the ego-vehicle and other agents in the driving environment. Generative AI makes it possible to simulate realistic data sets and to capture corner cases more efficiently, since these are difficult to encounter by physically driving a fleet of cars across different geographies and traffic conditions. It also allows the AI system to train on “relevant” data rather than on the huge amounts of data collected during driving, most of which is irrelevant or duplicative.
The second piece of Helm’s proposition is Deep Teaching™, a highly efficient unsupervised training technology that relies on Helm’s proprietary math and compressive sensing-based algorithms. WorldGen-1 is trained via these algorithms on thousands of hours of diverse driving data, covering every layer of the autonomous driving stack including vision, perception, lidar, and odometry. This allows it to predict (based on simulated sensor data) the behaviors of pedestrians, vehicles, and the ego-vehicle in relation to the surrounding environment. In essence, it can predict several minutes’ worth of temporal sequences for a given traffic situation. Different scenarios can be simulated along with the corresponding path planning and control actions for the ego-vehicle. Adapting what has been learned in one geography (terrain, driving behavior, traffic laws, weather) to another is also much faster and more resource-efficient.
Helm currently works with multiple car manufacturers that do not have large fleets of sensor-equipped cars with which to validate L3 and L4 autonomy. The business model is to license the WorldGen-1 software and work with the car companies to customize it for their particular sensor and compute stack. Honda is a flagship customer and recently announced the launch of L3 autonomy capability as part of its Honda 0 initiative: “original Honda AI technology that combines the unsupervised learning technology of the U.S.-based Helm.ai and the behavior models of experienced drivers, which enable AI to learn with smaller amounts of data, and provide highly accurate driver assistance”.
The autonomy revolution is progressing. As with any movement of this scale, there are varied approaches by entrenched players with enormous financial resources as well as leaner and more nimble players passionate about scaling, deployment speed and resource efficiency.