Self-driving cars and humanoid robots that can walk, talk and work alongside us are just two of the amazing ways in which AI promises to change the world in the near future.

But to operate safely and effectively, these physical AI tools and applications have to be able to understand the world.

At this year’s Consumer Electronics Show in Las Vegas, Nvidia announced the launch of its Cosmos platform, designed to accelerate the development of physical AI systems.

Described as a “ChatGPT moment for robotics,” Cosmos is capable of generating huge amounts of synthetic data. This is data that, despite being artificially created, is close enough to the real world that robots, self-driving cars, and other physical AI algorithms should be able to learn from it.

However, some people believe that no amount of synthetic data will ever be able to fully simulate every real-world scenario that machines will need to be prepared for. This is why Tesla, for example, has spent many years collecting real-world data with its sensor-packed cars. CEO Elon Musk tweeted, “Two sources of data scale infinitely: synthetic data, which has an ‘is it true?’ problem and real-world video, which does not.”

The argument is that synthetic data lacks the chaotic unpredictability and complexity of the real world and that this is essential for building comprehensive and safe AI systems. Let’s look into this in a bit more detail.

Synthetic Vs Real-World Data

In autonomous driving systems, visual data (pictures) is used to train algorithms that determine how vehicles will react to different conditions and situations on the road. This data can be captured with cameras attached to vehicles (real-world data). It can also be generated by AI algorithms according to rules learned from studying real-world data (synthetic data).

There are advantages and disadvantages to both methods.

Synthetic data can often be produced much more quickly and cost-effectively than real-world data can be collected. No one has to actually go out and gather it, because it is simply generated by machines.

This can also have safety benefits. Testing self-driving cars on roads, for example, clearly comes with some element of risk, which can be eliminated if journeys are simply simulated.

Situations, environments and many other variables can also be customized, rather than having to wait for the right circumstances to arise in the real world before data can be gathered. For example, researchers can simulate rare weather events, test autonomous vehicles in dangerous scenarios, or model complex manufacturing defects without real-world risks or delays.
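To make the idea of customizing variables a little more concrete, here is a minimal sketch of how a simulation team might deliberately over-represent rare conditions when choosing which scenarios to generate. Everything in it is an assumption made for illustration: the scenario fields, the category names and the weights are hypothetical and are not taken from Cosmos or any other real platform.

```python
import random
from dataclasses import dataclass

# Hypothetical scenario variables; names and values are illustrative only.
WEATHER = ["clear", "rain", "fog", "snow", "hail"]
HAZARDS = ["none", "pedestrian_crossing", "stalled_vehicle", "debris_on_road"]
TIMES_OF_DAY = ["day", "dusk", "night"]


@dataclass(frozen=True)
class Scenario:
    weather: str
    hazard: str
    time_of_day: str


def sample_scenarios(n, weather_weights, seed=0):
    """Draw n scenario descriptions, weighting weather by weather_weights."""
    rng = random.Random(seed)  # seeded so the same request gives the same set
    weathers = rng.choices(WEATHER, weights=weather_weights, k=n)
    return [
        Scenario(w, rng.choice(HAZARDS), rng.choice(TIMES_OF_DAY))
        for w in weathers
    ]


# Ask for fog, snow and hail three times as often as clear skies or rain,
# instead of waiting for them to occur on real roads.
scenarios = sample_scenarios(1000, weather_weights=[1, 1, 3, 3, 3])
rare = sum(s.weather in {"fog", "snow", "hail"} for s in scenarios)
print(f"{rare} of {len(scenarios)} scenarios use rare weather")
```

Each scenario description would then be handed to whatever simulator or generative model actually renders the data; that step is outside the scope of this sketch.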

Generating synthetic data can also reduce or eliminate concerns around privacy and data protection that might apply in the real world, as there’s no danger of sensitive personal data inadvertently being stored or compromised.

This could happen when collecting real-world data. Car license plates captured on camera by autonomous cars could conceivably be connected to their owner and used to identify and track them, for example.

Real-world data, on the other hand, as Musk points out, has the undeniable advantage of being more authentic. Chaotic and hard-to-predict human behaviors that are difficult to generate synthetically are more likely to be accounted for in the data.

Regulation could also be an issue. Laws around AI are evolving quickly, and it may be the case that regulators require certain models or applications to be trained on real-world data at some point in time or in some jurisdictions for safety reasons.

Weighing The Options

In truth, both real-world and synthetic data are likely to be vitally important to training the upcoming generation of physical AI vehicles and robots.

Both offer distinct advantages and challenges, and adopting a hybrid approach is likely to be the best path to success.

The trick will be identifying which is most appropriate for specific use cases. For example, it’s possible that synthetic data will be more useful for tasks or applications involving the processing of sensitive information or operating in dangerous conditions.

Real-world data, on the other hand, might be best when it comes to capturing dynamic human behaviors, or when there is a likelihood of encountering chaotic, unforeseen events.
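As a rough illustration of what a hybrid approach could mean in practice, the sketch below mixes samples from a real-world pool and a synthetic pool at a chosen ratio. It is only a sketch under stated assumptions: the function name `build_hybrid_set`, the `synthetic_fraction` parameter and the toy data are all hypothetical, and choosing the right ratio for a given use case is exactly the judgment call described above.

```python
import random


def build_hybrid_set(real, synthetic, synthetic_fraction, size, seed=0):
    """Return a shuffled training set with the requested share of synthetic samples."""
    if not 0.0 <= synthetic_fraction <= 1.0:
        raise ValueError("synthetic_fraction must be between 0 and 1")
    n_synthetic = round(size * synthetic_fraction)
    n_real = size - n_synthetic
    if n_synthetic > len(synthetic) or n_real > len(real):
        raise ValueError("not enough samples in one of the pools")
    rng = random.Random(seed)
    # Sample without replacement from each pool, then shuffle them together.
    mixed = rng.sample(synthetic, n_synthetic) + rng.sample(real, n_real)
    rng.shuffle(mixed)
    return mixed


# Toy stand-ins for real and generated samples (hypothetical data).
real_pool = [("real", i) for i in range(100)]
synthetic_pool = [("synthetic", i) for i in range(100)]

# A use case dominated by unpredictable human behavior might lean on real data.
training_set = build_hybrid_set(real_pool, synthetic_pool, synthetic_fraction=0.25, size=40)
n_synthetic_used = sum(source == "synthetic" for source, _ in training_set)
print(f"{n_synthetic_used} synthetic and {len(training_set) - n_synthetic_used} real samples")
```

A project working with sensitive information or dangerous conditions could push `synthetic_fraction` higher, while one focused on capturing human behavior could lower it.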

This means that AI projects that adopt a balanced approach, led by those who understand how synthetic and real-world information can complement rather than compete with each other, are more likely to create real business value.
