Assessing car health from OBD dataset
The central promise of Zoom’s car rental offering is to give our customers a great ride, and having each car in good working condition is critical to fulfilling that promise. This poses a very interesting maintenance scheduling problem. Unlike a privately owned car, each of our cars is driven by several different people every day, on trips ranging from a young college student’s joyride to a well-planned 5-day road trip. Each journey exhibits different driving patterns and puts unique stresses on the car.
So how do we figure out which of our cars are doing well? One obvious way is to have a mechanic look at each car at the end of every trip and give it a basic tune-up. But this doesn’t scale when you have a fleet of over 2,000 cars completing thousands of trips per day. What if we could use the data from the on-board diagnostics (OBD) device to spot performance outliers?
Enter the OBD
All Zoom vehicles are kitted out with an OBD device that streams data back to us on parameters such as location, engine data, vehicle speed, etc. This unique dataset lets us identify a ‘performance signature’ for each car on the platform. We conferred with our in-house car experts to understand whether any driving patterns are indicative of good or bad car performance, and narrowed the field down to detecting a few specific behaviours:
- More time spent hard accelerating/hard decelerating = more vehicle deterioration
- More time spent idling (engine speed > 0, vehicle speed = 0) = more vehicle deterioration
- Less time changing gears = less time riding the clutch = better clutch health
Detecting hard acceleration and idling
What you see below is a small snapshot of how the vehicle was driven over a month. Notice the sharp spikes up and down in the vehicle velocity over time. This is what we need to capture.
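So how do we do this? Let’s go back to high-school physics: acceleration is the rate of change of velocity, a = dv/dt.
But we don’t need just acceleration, we need harsh acceleration, so let’s take another derivative. The rate of change of acceleration is called jerk: j = da/dt = d²v/dt².
Since we’re dealing with discrete data (timestamped entries for engine speed and vehicle speed), we can approximate both derivatives with finite differences: divide the change between successive readings by the time elapsed between them.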
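Written out with v_i as the vehicle speed recorded at timestamp t_i (our own notation, not field names from the OBD feed), the backward-difference approximations are as follows. Reusing the gap t_i - t_{i-1} in the jerk step is a simplification that reduces to the standard second difference when readings arrive at a fixed interval Δt:

```latex
a_i = \frac{v_i - v_{i-1}}{t_i - t_{i-1}}, \qquad
j_i = \frac{a_i - a_{i-1}}{t_i - t_{i-1}}
\;\overset{\text{fixed } \Delta t}{=}\; \frac{v_i - 2v_{i-1} + v_{i-2}}{\Delta t^{2}}
```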
Voilà! We now have the jerk the vehicle experienced throughout the drive. Assuming jerk is normally distributed, any jerk more than 2 sigma above the mean counts as hard acceleration, and any jerk more than 2 sigma below the mean counts as hard deceleration.
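A minimal sketch of the jerk computation and the 2-sigma rule in plain Python, assuming vehicle speeds and timestamps arrive as parallel lists sorted by strictly increasing time. The helper names `finite_difference` and `flag_harsh_events` are hypothetical, and the mean ± `n_sigma` standard deviation threshold mirrors the normality assumption above.

```python
import statistics
from typing import List, Sequence, Tuple


def finite_difference(values: Sequence[float], times: Sequence[float]) -> List[float]:
    """Backward difference (x_i - x_{i-1}) / (t_i - t_{i-1}) for i = 1..n-1."""
    return [
        (values[i] - values[i - 1]) / (times[i] - times[i - 1])
        for i in range(1, len(values))
    ]


def flag_harsh_events(
    speeds: Sequence[float], times: Sequence[float], n_sigma: float = 2.0
) -> Tuple[List[float], List[str]]:
    """Label each jerk value as 'hard_accel', 'hard_decel' or 'normal'.

    Needs at least three readings, sorted by strictly increasing timestamp.
    """
    accel = finite_difference(speeds, times)    # accel[k] ends at times[k + 1]
    jerk = finite_difference(accel, times[1:])  # so difference it against times[1:]
    mu = statistics.fmean(jerk)
    sigma = statistics.pstdev(jerk)
    labels = []
    for j in jerk:
        if j > mu + n_sigma * sigma:
            labels.append("hard_accel")
        elif j < mu - n_sigma * sigma:
            labels.append("hard_decel")
        else:
            labels.append("normal")
    return jerk, labels
```

The share of the trip spent in harsh events is then the number of non-`normal` labels divided by `len(labels)` (assuming evenly spaced readings). Note that a single abrupt jump in speed produces a positive jerk followed by a negative one, so one event can be flagged both ways.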
Idling is a far simpler problem to solve: all the events with engine speed > 0 and vehicle speed = 0 can be assumed to be when the car was idling. Some idling is expected (e.g. waiting at traffic signals), but a lot of idling suggests people using the car’s electronics (e.g. A/C) while not moving. This may adversely impact the battery and should be kept in mind.
This is what the summarized driving behaviour looks like:
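A minimal sketch of the idling share, assuming engine speed and vehicle speed come as parallel lists sampled at a regular interval, so that the share of samples approximates the share of time. `idle_fraction` is a hypothetical helper name.

```python
from typing import Sequence


def idle_fraction(engine_rpm: Sequence[float], vehicle_speed: Sequence[float]) -> float:
    """Share of samples where the engine is running but the car is stationary."""
    if len(engine_rpm) != len(vehicle_speed):
        raise ValueError("engine_rpm and vehicle_speed must have the same length")
    if not engine_rpm:
        return 0.0
    idle = sum(1 for rpm, v in zip(engine_rpm, vehicle_speed) if rpm > 0 and v == 0)
    return idle / len(engine_rpm)
```

If readings are irregularly spaced, weighting each sample by the time until the next reading would give a truer share of time.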
Gears of rawr!
The OBD port is a wonderful little development in modern vehicles. But beyond a core set of standard parameters, much of its data is non-standardized and varies from OEM to OEM. Of the many things we learn about the vehicle, the one thing we don’t get explicitly is the gear being used while driving. We do know, however, that a change in gear alters the relationship between engine RPM (engine speed) and vehicle speed: at roughly the same RPM, a higher gear means a higher speed. Let’s plot one against the other and see whether any structure emerges.
Awesome! It appears that linear relationships are forming for some values. If we ignore the intercept, we can see five distinct lines, each with a (nearly) constant slope. Guess how many gears a typical car has? So let’s take the ratio of vehicle_speed to engine_speed and see how that looks.
Lovely! Now if only we could automatically find breakpoints that split this continuous graph into individual gears and the transient periods in between. To solve this problem, we turned to the super nifty kernel density estimation (KDE), which gives a smooth estimate of how often each speed-to-RPM ratio occurs.
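A minimal standard-library sketch of this step, assuming speed and RPM arrive as parallel lists. It keeps only samples where the car is moving and the engine is on, evaluates a Gaussian KDE of the speed-to-RPM ratio on an evenly spaced grid, and returns the interior local minima as candidate breakpoints between gears. All function names and the `bandwidth`/`grid_size` parameters are our own illustrative choices; in practice a library routine such as SciPy’s `gaussian_kde` could handle the density step.

```python
import math
from typing import List, Sequence


def speed_rpm_ratios(vehicle_speed: Sequence[float], engine_rpm: Sequence[float]) -> List[float]:
    """vehicle_speed / engine_speed for samples where the car moves and the engine runs."""
    return [v / rpm for v, rpm in zip(vehicle_speed, engine_rpm) if v > 0 and rpm > 0]


def gaussian_kde(samples: Sequence[float], grid: Sequence[float], bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate of `samples`, evaluated at each grid point."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


def local_minima(grid: Sequence[float], density: Sequence[float]) -> List[float]:
    """Interior grid points whose density is strictly below both neighbours."""
    return [
        grid[i]
        for i in range(1, len(density) - 1)
        if density[i] < density[i - 1] and density[i] < density[i + 1]
    ]


def gear_breakpoints(
    vehicle_speed: Sequence[float],
    engine_rpm: Sequence[float],
    bandwidth: float,
    grid_size: int = 501,
) -> List[float]:
    """Ratios at which to split the KDE into gears (hypothetical helper)."""
    ratios = speed_rpm_ratios(vehicle_speed, engine_rpm)
    if len(ratios) < 2 or grid_size < 3:
        raise ValueError("need at least two moving samples and three grid points")
    lo, hi = min(ratios), max(ratios)
    step = (hi - lo) / (grid_size - 1)
    grid = [lo + i * step for i in range(grid_size)]
    return local_minima(grid, gaussian_kde(ratios, grid, bandwidth))
```

The bandwidth is the main tuning knob: too wide and neighbouring gears merge into one peak, too narrow and a single gear splits into several.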
If we split the above graph at all the local minima (highlighted in red), we get five partitions, i.e. our five gears. But what about the quality of the gear shifts? We can assume that in a well-driven car the speed-to-RPM ratio barely varies within each detected gear, i.e. for a given detected gear, the Gini coefficient of the ratio will be low.
This is what the detected gear chart looks like, along with the Gini coefficients:
We can see that for this car, drivers showed more variance in the lower gears, while the higher gears were a lot smoother. Weighting each gear by the share of ride time spent in it, the average ‘gear quality’ comes to about 0.02.
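A minimal sketch of the per-gear scoring, assuming the ratios and the KDE breakpoints are already available as lists. It uses the closed-form Gini coefficient on sorted data, G = Σ(2i − n − 1)·x_i / (n·Σx). Unlike the analysis described above, this simplified version assigns every sample to a gear, including transient samples between gears. `assign_gears`, `gini` and `gini_by_gear` are hypothetical names.

```python
import bisect
from typing import Dict, List, Sequence


def assign_gears(ratios: Sequence[float], breakpoints: Sequence[float]) -> List[int]:
    """Gear number (1 = lowest ratio) for each sample, split at the KDE minima."""
    cuts = sorted(breakpoints)
    return [bisect.bisect_right(cuts, r) + 1 for r in ratios]


def gini(values: Sequence[float]) -> float:
    """Gini coefficient of non-negative values: 0 when all values are equal."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Closed form on ascending data: sum_i (2i - n - 1) * x_i / (n * sum(x)), i = 1..n
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


def gini_by_gear(ratios: Sequence[float], breakpoints: Sequence[float]) -> Dict[int, float]:
    """Group ratios by detected gear and compute each group's Gini coefficient."""
    groups: Dict[int, List[float]] = {}
    for gear, r in zip(assign_gears(ratios, breakpoints), ratios):
        groups.setdefault(gear, []).append(r)
    return {gear: gini(vals) for gear, vals in sorted(groups.items())}
```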
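The weighted average is a single line of arithmetic; here it is as a sketch, assuming per-gear Gini coefficients and per-gear time (seconds or sample counts) keyed by the same gear numbers. `weighted_gear_quality` is a hypothetical name.

```python
from typing import Mapping


def weighted_gear_quality(
    gini_by_gear: Mapping[int, float], time_in_gear: Mapping[int, float]
) -> float:
    """Time-weighted average of per-gear Gini coefficients (lower = smoother)."""
    total = sum(time_in_gear.get(gear, 0.0) for gear in gini_by_gear)
    if total <= 0:
        raise ValueError("no time recorded in any detected gear")
    weighted = sum(g * time_in_gear.get(gear, 0.0) for gear, g in gini_by_gear.items())
    return weighted / total
```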
Putting it all together
We now have three objective measures of ride quality:
- What % of the trip was spent in hard acceleration/deceleration?
- What % of the trip was spent idling?
- What was the ‘quality’ of the gear shifts?
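How the three measures get combined isn’t spelled out here, so the following is only one hypothetical way to do it: rank the cars on each measure (all three are ‘higher is worse’), average the ranks with equal weight, and list the cars most in need of attention first. `CarScorecard`, `rank_for_attention` and the equal weighting are our own assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CarScorecard:
    car_id: str
    harsh_share: float  # fraction of trip time in hard acceleration/deceleration
    idle_share: float   # fraction of trip time spent idling
    gear_gini: float    # time-weighted Gini coefficient across detected gears


def rank_for_attention(cars: List[CarScorecard]) -> List[str]:
    """Car IDs ordered from most to least in need of attention (hypothetical scheme)."""
    metrics = ("harsh_share", "idle_share", "gear_gini")
    avg_rank: Dict[str, float] = {car.car_id: 0.0 for car in cars}
    for metric in metrics:
        # Rank 0 = best (lowest value) on this metric; ties keep input order.
        ordered = sorted(cars, key=lambda car: getattr(car, metric))
        for rank, car in enumerate(ordered):
            avg_rank[car.car_id] += rank / len(metrics)
    return sorted(avg_rank, key=avg_rank.get, reverse=True)
```

Rank averaging sidesteps the fact that the three measures live on different scales; a real scheme might instead weight them by how strongly each predicts actual maintenance needs.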
The three measures allow us to compare the performance of different cars and assess which ones need our help most urgently. So in case somebody asks you, “How am I driving?”, remember: you now have a mathsy answer to that question!