Amey Banarse, VP of Solutions Engineering at YugabyteDB.
In an era of pervasive connectivity, data has become an indispensable asset for enterprises, particularly in the automotive sector. According to a 2019 McKinsey & Company report, connected cars are projected to generate up to $750 billion annually by 2030, which shows how much value real-time vehicle data can unlock. Data generated in real time lets organizations derive actionable insights, enhance user experiences and create new revenue streams.
However, the automotive industry faces significant challenges, including the need for real-time telemetry, predictive analytics for maintenance and seamless integration of heterogeneous data sources. At the same time, the shift of data from passive storage to dynamic, real-time streams has given rise to connected vehicle platforms that integrate advanced telematics, predictive maintenance and intelligent mobility solutions, offering unprecedented opportunities for innovation.
In this article, I’ll share lessons from building a modern data fabric for connected vehicles: one that addresses data integration challenges, supports high-throughput streaming and provides the scalability and resilience that large automotive enterprises require.
Essential Attributes Of A Connected Vehicle Platform
A robust platform relies heavily on streaming, event-driven data generated by vehicle sensors, onboard diagnostics (OBD-II), IoT devices and other sources. This data captures critical information such as diagnostic readings, geolocation and environmental conditions.
A modern platform therefore needs a distributed, horizontally scalable and resilient data foundation. Technologies such as cloud-native databases and container orchestration make it possible to manage large-scale event streams and keep data continuously available.
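One common way a distributed database scales horizontally is hash sharding: each row is routed to a shard by hashing its key, so adding nodes spreads the load. The Python sketch below illustrates the idea using a vehicle ID as the shard key. `shard_for` and `NUM_SHARDS` are hypothetical names, and plain modulo hashing is a simplification; production systems typically split the hash space into ranges or use consistent hashing so that resizing the cluster moves only a fraction of the keys.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 8  # hypothetical cluster size, for illustration only


def shard_for(vehicle_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a vehicle ID to a shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is randomized per
    process for strings and would route the same vehicle differently after
    a restart.
    """
    digest = hashlib.sha256(vehicle_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


if __name__ == "__main__":
    # Spread 10,000 synthetic vehicle IDs and print how many land on each shard.
    vehicles = [f"VIN{i:06d}" for i in range(10_000)]
    load = Counter(shard_for(v) for v in vehicles)
    for shard in sorted(load):
        print(shard, load[shard])
```

Because every event for a given vehicle lands on the same shard, per-vehicle reads stay local while the fleet as a whole is spread across the cluster.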
Key components of the platform include:
• Event Sources: Connected vehicles and IoT sensors act as event sources, continuously generating data streams.
• Data Ingestion And Processing: Messaging systems handle ingestion, and stream processing filters, enriches and aggregates the data. For instance, raw telemetry data can be enriched with geospatial information or aggregated to derive fleet performance insights.
• Persistent Storage: Distributed SQL databases store both raw event data and processed insights, ensuring data consistency, scalability and fault tolerance.
• Data Consumers: Mobile applications, third-party services and analytics systems use this processed data for various purposes, including predictive maintenance, telematics and user-centric services.
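The flow through these components can be sketched in a few lines of Python. This is a toy, single-process model under stated assumptions: a `queue.Queue` stands in for the messaging system, a list and a dictionary stand in for persistent storage, and `region_for` is a hypothetical geospatial lookup. It illustrates filtering, enrichment and per-region aggregation, not a production pipeline.

```python
from collections import defaultdict
from dataclasses import dataclass, asdict
from queue import Queue


@dataclass
class TelemetryEvent:
    vehicle_id: str
    lat: float
    lon: float
    speed_kmh: float
    fuel_l_per_100km: float


def region_for(lat: float, lon: float) -> str:
    """Hypothetical geospatial lookup; a real system might query a geofencing service."""
    return "north" if lat >= 0 else "south"


def run_pipeline(events):
    ingest = Queue()  # stand-in for a messaging-system topic
    for event in events:  # event sources publish telemetry
        ingest.put(event)

    raw_store = []  # stand-in for persistent storage of enriched events
    totals = defaultdict(lambda: [0, 0.0])  # region -> [event count, fuel sum]
    while not ingest.empty():
        event = ingest.get()
        if event.speed_kmh < 0:  # filter: drop malformed readings
            continue
        # Enrich: attach a region derived from the coordinates.
        enriched = {**asdict(event), "region": region_for(event.lat, event.lon)}
        raw_store.append(enriched)
        # Aggregate: running count and fuel sum per region.
        count_and_sum = totals[enriched["region"]]
        count_and_sum[0] += 1
        count_and_sum[1] += event.fuel_l_per_100km

    # Processed insight for data consumers: average fuel consumption per region.
    insights = {region: s / n for region, (n, s) in totals.items()}
    return raw_store, insights
```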
To meet industry demands, the platform architecture must sustain high write throughput without sacrificing data accuracy, and it must scale and stay resilient under large-scale event streams. Keeping stateful data in robust distributed databases while applications run as stateless layers is crucial for the success of connected vehicle deployments.
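The split between stateless applications and a stateful database works best when writes are idempotent, so any worker can process or retry any event without corrupting state. The sketch below assumes a hypothetical `telemetry` table keyed by `(vehicle_id, seq)` and uses an in-memory SQLite database as a stand-in for a distributed SQL database. `INSERT OR IGNORE` is SQLite syntax; PostgreSQL-compatible databases express the same idea with `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    # In-memory SQLite stands in for a distributed SQL database (assumption).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE telemetry (
               vehicle_id  TEXT    NOT NULL,
               seq         INTEGER NOT NULL,
               odometer_km REAL    NOT NULL,
               PRIMARY KEY (vehicle_id, seq))"""
    )
    return conn


def handle_event(conn: sqlite3.Connection, event: dict) -> None:
    """Stateless handler: it keeps nothing in memory between events.

    The (vehicle_id, seq) primary key makes the write idempotent, so a
    redelivered event or a retry from another worker does not create a
    duplicate row.
    """
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO telemetry (vehicle_id, seq, odometer_km) "
            "VALUES (?, ?, ?)",
            (event["vehicle_id"], event["seq"], event["odometer_km"]),
        )
```

Handling the same event twice leaves a single row, so a redelivered message or a retry from another worker instance is harmless.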
Challenges In Building Streaming Data Systems
Architecting a resilient streaming data platform involves numerous challenges, especially when legacy databases can’t keep up with the demands of modern data environments. Many popular technologies fall short on critical requirements such as write throughput, scalability, data consistency and fault tolerance, all of which are essential for a seamless real-time user experience.
An effective streaming data system must handle millions of events per second, maintain strong consistency and high availability, and provide robust fault tolerance so that the failure of a single component does not cause data loss. Losing critical data is unacceptable in safety-critical applications.
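Distributed databases commonly provide this kind of fault tolerance by replicating each write and acknowledging it only after a majority of replicas have persisted it. The sketch below is a deliberately simplified model of that majority-quorum rule, with hypothetical `Replica` and `replicated_write` names. It does not model leader election, log repair or the handling of entries on the minority side, which consensus protocols such as Raft take care of.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    log: list = field(default_factory=list)

    def append(self, record) -> bool:
        """Persist the record if this replica is reachable."""
        if not self.up:
            return False
        self.log.append(record)
        return True


def replicated_write(replicas, record) -> bool:
    """Acknowledge a write only if a majority of replicas persisted it.

    With 3 replicas the write survives the loss of any one node; with 5,
    any two. If the quorum is missed, the record may still sit in the logs
    of the replicas that were up; a real protocol treats it as uncommitted.
    """
    acks = sum(1 for r in replicas if r.append(record))
    quorum = len(replicas) // 2 + 1
    return acks >= quorum
```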
Building A Modern Data Fabric For Connected Vehicles
Technology leaders must start by laying the right internal foundation. Here are some actionable steps to consider:
1. Assess Current Infrastructure And Personnel
Look for gaps in data handling capabilities such as storage limitations, scalability constraints and integration challenges.
Additionally, identify whether current teams have expertise in distributed data architectures, real-time processing and cloud technologies. Organizations can offer internal training programs or form partnerships with educational institutions to ensure the workforce is equipped to manage distributed data platforms.
2. Establish Cross-Functional Teams
These teams should involve IT, data engineering, operations and business units to ensure alignment on data strategy. Effective communication across these teams is crucial for implementing a cohesive data fabric.
3. Define Data Governance Policies
It’s important to develop comprehensive guidelines for data security, privacy and compliance. Given the complexity of connected vehicle ecosystems, a clear data governance framework keeps data both secure and accessible. This includes meeting regulatory requirements, protecting personal data and defining access controls.
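Access controls can start as a deny-by-default mapping from roles to the data fields each role may read. The policy below is a hypothetical example: the roles, field names and `authorize_view` function are illustrative assumptions. In practice such rules would more likely be enforced through database roles and permissions or a dedicated policy engine.

```python
# Hypothetical policy: which telemetry fields each role may read.
ROLE_FIELDS = {
    "fleet_ops": {"vehicle_id", "lat", "lon", "speed_kmh", "dtc_codes"},
    "insurer": {"vehicle_id", "speed_kmh"},
    "analyst": {"speed_kmh", "dtc_codes"},
}


def authorize_view(role: str, record: dict) -> dict:
    """Return only the fields the role may see; unknown roles get nothing."""
    allowed = ROLE_FIELDS.get(role, set())  # deny by default
    return {key: value for key, value in record.items() if key in allowed}
```

Denying by default means a new field, such as precise location, stays hidden until a policy explicitly grants it.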
4. Pilot Projects And Phased Implementation
Start with smaller proof-of-concept projects to validate the chosen data fabric solution. When researching distributed data platforms, organizations should prioritize features like scalability, resilience, multi-region support and ecosystem integration.
Avoid solutions that have single points of failure, have vendor or cloud platform lock-in, require extensive manual tuning, or do not guarantee regional resilience.
Real-World Automotive Applications
A leading global automotive manufacturer employs a modern data fabric for managing data from over 20 million connected vehicles. The architecture leverages Kafka for streaming data ingestion, Spark Streaming for real-time data processing and our distributed database as the system of record. This enables seamless telematics, real-time monitoring and long-term data retention, powering a wide array of vehicle services ranging from health diagnostics to remote control functions.
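A back-of-the-envelope calculation helps show why a fleet of this size demands a horizontally scalable design. The reporting interval, event size and replication factor below are assumptions chosen for illustration, not figures from the deployment described above.

```python
SECONDS_PER_DAY = 86_400


def ingest_sizing(vehicles: int, report_interval_s: float,
                  event_bytes: int, replication_factor: int = 3) -> dict:
    """Estimate event rate and daily storage (before compression) for a fleet."""
    events_per_s = vehicles / report_interval_s
    raw_bytes_per_day = events_per_s * event_bytes * SECONDS_PER_DAY
    return {
        "events_per_s": events_per_s,
        "raw_tb_per_day": raw_bytes_per_day / 1e12,
        "replicated_tb_per_day": raw_bytes_per_day * replication_factor / 1e12,
    }


# Assumed: each of 20 million vehicles sends one 500-byte event every 10 seconds.
estimate = ingest_sizing(vehicles=20_000_000, report_interval_s=10, event_bytes=500)
```

Under these assumptions, the fleet produces 2 million events per second and roughly 86 TB of raw data per day, or about 259 TB once stored with three replicas.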
Streaming data from these vehicles enables a range of industry-specific applications:
• Predictive Maintenance: Real-time analysis of vehicle health data facilitates the early identification of potential issues, allowing for proactive maintenance and the reduction of unplanned downtime.
• Intelligent Mobility: Leveraging data analytics enhances navigation, optimizes route planning and reduces fuel consumption, contributing to a more intelligent and efficient mobility ecosystem.
• Telematics And Usage-Based Insurance: Continuous data collection enables personalized insurance offerings based on driving behavior, improving risk assessment for insurers and providing cost-effective premiums for consumers.
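Two of these applications can be reduced to toy calculations for illustration. The sketch below flags a possible cooling-system issue when the rolling mean of coolant temperature exceeds a limit, and it computes a simple driving score from harsh-braking events. The window size, temperature limit and penalty weight are hypothetical; real predictive maintenance and insurance models are considerably more sophisticated.

```python
from collections import deque
from statistics import mean


def maintenance_alerts(readings, window: int = 3, limit: float = 105.0):
    """Yield indexes where the rolling mean of coolant temperature (deg C) exceeds limit.

    Window size and limit are illustrative assumptions, not manufacturer values.
    """
    recent = deque(maxlen=window)
    for i, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window and mean(recent) > limit:
            yield i


def driving_score(harsh_brakes: int, km_driven: float,
                  penalty_per_event_per_100km: float = 10.0) -> float:
    """Toy usage-based insurance score: 100 minus a penalty scaled by harsh brakes per 100 km."""
    if km_driven <= 0:
        raise ValueError("km_driven must be positive")
    rate_per_100km = harsh_brakes / km_driven * 100
    return max(0.0, 100.0 - penalty_per_event_per_100km * rate_per_100km)
```

Using a rolling mean rather than single readings keeps one noisy sensor sample from triggering a maintenance alert.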
Summary
Modern connected vehicles necessitate a data architecture capable of managing the complexity inherent in real-time data streaming and analysis. A 2018 Gartner Inc. report predicted that by 2025, 75% of all enterprise-generated data would be created and processed at the edge, emphasizing the need for distributed data platforms in managing connected vehicle ecosystems.
By implementing an effective data fabric, automotive companies can significantly enhance safety, improve user experiences and capitalize on new business opportunities.
Forbes Technology Council is an invitation-only community for world-class CIOs, CTOs and technology executives.