Using Extended Kalman Filters to Estimate Battery SoH on a 2G Budget

[Figure: battery state-of-health curves over discharge cycles]

Why Not Just Use the BMS Reading

Every lithium battery pack has a battery management system that reports state-of-charge — a percentage indicating how much energy remains. Most fleet operators read this number and assume it's accurate. It isn't, or at least it isn't after a pack has been through several hundred cycles of real-world use in challenging conditions.

BMS SOC estimation uses one of two methods: Coulomb counting (integrating current over time from a known starting charge state) or open-circuit voltage lookup (mapping measured voltage to SOC from a characterization curve). Both methods accumulate error. Coulomb counting drifts because current sensors have small calibration offsets that integrate over thousands of cycles. Voltage lookup degrades because the SOC-to-voltage curve itself changes as a pack ages — a 60% SOC reading from a 500-cycle pack corresponds to a different actual energy reserve than a 60% reading from a 50-cycle pack with the same rated capacity.
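To make the Coulomb-counting drift concrete, here is a minimal Python sketch. The 20 Ah capacity, the 5 A load, the 20 mA sensor offset, and the function name are illustrative assumptions, not values from our fleet or from any particular BMS.

```python
def coulomb_count_soc(soc_start, currents_a, dt_s, capacity_ah, sensor_offset_a=0.0):
    """Integrate measured current (positive = discharge) to track SOC.

    sensor_offset_a models a constant calibration bias on the current sensor.
    """
    soc = soc_start
    capacity_as = capacity_ah * 3600.0  # capacity in amp-seconds
    for i_true in currents_a:
        i_measured = i_true + sensor_offset_a
        soc -= i_measured * dt_s / capacity_as
    return soc


# Hypothetical pack: 20 Ah, discharged at 5 A for one hour in 1 s steps.
currents = [5.0] * 3600
true_soc = coulomb_count_soc(1.0, currents, 1.0, 20.0)
biased_soc = coulomb_count_soc(1.0, currents, 1.0, 20.0, sensor_offset_a=0.02)
drift = true_soc - biased_soc  # SOC error accumulated in one hour
```

Under these assumptions the bias costs about 0.1% SOC per hour of operation. That is small on any single ride, but nothing in pure Coulomb counting ever removes it, so it keeps growing until the counter is re-anchored to a known charge state.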

The result is that BMS-reported SOC on aged packs in our monitoring fleet has an average error of ±9.4% when compared against load-test ground truth. For a 60Wh pack, ±9.4% is a ±5.6Wh error — enough to mean a driver runs out of battery 4–6 km earlier than expected. The BMS telling a driver they have "38% remaining" when they actually have 29% is the direct cause of mid-route cutouts.

What We Need: State-of-Health, Not Just State-of-Charge

State-of-charge tells you how much energy is in the pack right now. State-of-health tells you how much energy the pack can hold compared to when it was new. A pack at 90% SOC with a 0.70 SoH has 90% of 70% of its original capacity available — 63% of original rated energy. That distinction matters enormously for route planning and for predicting when the pack needs replacement.
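The arithmetic above can be written as a one-line helper. This is a sketch in Python; the function name is hypothetical, and the 60Wh figure is simply the example pack size used earlier in this post.

```python
def usable_energy_wh(rated_wh, soc, soh):
    """Energy available now = rated energy x state-of-health x state-of-charge."""
    if not (0.0 <= soc <= 1.0 and 0.0 <= soh <= 1.0):
        raise ValueError("soc and soh must be fractions between 0 and 1")
    return rated_wh * soh * soc


# A 60 Wh pack at 90% SOC and 0.70 SoH: about 37.8 Wh, i.e. 63% of rated energy.
energy = usable_energy_wh(60.0, soc=0.90, soh=0.70)
fraction_of_rated = energy / 60.0
```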

Estimating SoH accurately requires observing the pack's behavior over time, not just reading its current voltage. A pack's SoH manifests in three observable ways: reduced maximum capacity (can't hold as much charge as it once could), increased internal resistance (voltage drops faster under load), and degraded power delivery (can't sustain high current draw without significant voltage sag). None of these are directly reported by a standard BMS — they have to be inferred from the telemetry data.

The SEM-1's onboard SoH estimation uses a dual Extended Kalman Filter approach: one EKF estimates instantaneous SOC correcting for BMS drift, and a second slower EKF uses the SOC estimates from individual discharge events to build an evolving model of the pack's SoH over time. Here's how that works in practice.

The Battery Equivalent Circuit Model

To run an EKF, you need a system model — a mathematical description of how battery state evolves given inputs (current, temperature) and how that state maps to observable outputs (terminal voltage). We use the second-order RC equivalent circuit model, which represents a battery cell as an ideal voltage source (open-circuit voltage, a function of SOC) plus two RC pairs (representing the electrochemical diffusion dynamics at two different time constants) plus a series resistance.

The model has five parameters per temperature operating point: series resistance R0, and two pairs of R1/C1 and R2/C2. For LFP chemistry at 25°C, typical values are R0 ≈ 15mΩ, R1 ≈ 8mΩ, C1 ≈ 2000F, R2 ≈ 5mΩ, C2 ≈ 20000F. At higher temperatures, resistance values decrease (better conductivity) but recovery dynamics change. At lower temperatures, resistance increases significantly — relevant for Nairobi nights in the rainy season when ambient temperature can drop to 12°C.
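A discrete-time version of this model fits in a few lines of Python. The sketch below uses the 25°C values quoted above; the linear OCV curve, the 20 Ah capacity, and the function names are placeholder assumptions for illustration, not a measured LFP characterization.

```python
import math

# Typical LFP values at 25 degrees C quoted above (ohms and farads).
R0, R1, C1, R2, C2 = 0.015, 0.008, 2000.0, 0.005, 20000.0


def ocv(soc):
    """Placeholder open-circuit-voltage curve (assumption, not a real LFP curve)."""
    return 3.0 + 0.4 * soc


def step(state, current_a, dt_s, capacity_ah):
    """Advance (soc, v1, v2) by dt_s seconds; current_a > 0 means discharge."""
    soc, v1, v2 = state
    a1 = math.exp(-dt_s / (R1 * C1))  # RC pair 1, time constant 16 s
    a2 = math.exp(-dt_s / (R2 * C2))  # RC pair 2, time constant 100 s
    soc_next = soc - current_a * dt_s / (capacity_ah * 3600.0)
    v1_next = a1 * v1 + R1 * (1.0 - a1) * current_a
    v2_next = a2 * v2 + R2 * (1.0 - a2) * current_a
    return (soc_next, v1_next, v2_next)


def terminal_voltage(state, current_a):
    """OCV minus the series-resistance drop and the two RC pair voltages."""
    soc, v1, v2 = state
    return ocv(soc) - current_a * R0 - v1 - v2


state = (0.8, 0.0, 0.0)
for _ in range(600):  # ten minutes at a constant 10 A, 1 s steps
    state = step(state, 10.0, 1.0, capacity_ah=20.0)
v_loaded = terminal_voltage(state, 10.0)
```

With these numbers the fast RC pair settles at R1 x I = 80 mV within about a minute, while the slow pair is still approaching its 50 mV limit after ten minutes. That separation of time constants is the reason for using two RC pairs rather than one.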

The EKF state vector contains SOC plus the two RC pair voltages (V1, V2), making it a 3-dimensional state. The observation is terminal voltage, measured directly by the TI BQ76952 monitor IC in the SEM-1 every 100ms. Because that observation is a single scalar, the innovation covariance is 1x1 and the Kalman gain needs only a scalar division, not a matrix inverse; the most computationally expensive operations in the algorithm are the 3x3 matrix products that propagate and update the state covariance at each time step.
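Here is a minimal sketch of one predict/update cycle for this three-state filter, in plain Python with floating point for readability. It is not the SEM-1 firmware: the linear OCV curve, the 20 Ah capacity, the noise variances, and all function names are assumptions chosen for illustration. In this sketch the single voltage measurement makes the innovation variance a scalar, so the gain is computed with a division.

```python
import math

R0, R1, C1, R2, C2 = 0.015, 0.008, 2000.0, 0.005, 20000.0  # values quoted above
CAPACITY_AS = 20.0 * 3600.0  # hypothetical 20 Ah pack, in amp-seconds
OCV_SLOPE = 0.4              # placeholder linear OCV curve (assumption)


def ocv(soc):
    return 3.0 + OCV_SLOPE * soc


def predict_state(x, current_a, dt_s):
    """2RC model step for x = [soc, v1, v2]; current_a > 0 means discharge."""
    a1 = math.exp(-dt_s / (R1 * C1))
    a2 = math.exp(-dt_s / (R2 * C2))
    return [x[0] - current_a * dt_s / CAPACITY_AS,
            a1 * x[1] + R1 * (1.0 - a1) * current_a,
            a2 * x[2] + R2 * (1.0 - a2) * current_a]


def predicted_voltage(x, current_a):
    return ocv(x[0]) - current_a * R0 - x[1] - x[2]


def ekf_step(x, P, current_a, v_meas, dt_s, q_diag, r_var):
    """One predict/update cycle; P is a symmetric 3x3 list of lists."""
    f = [1.0, math.exp(-dt_s / (R1 * C1)), math.exp(-dt_s / (R2 * C2))]
    x_pred = predict_state(x, current_a, dt_s)
    # The state Jacobian is diagonal, so F P F^T is an element-wise scaling.
    P_pred = [[f[i] * P[i][j] * f[j] + (q_diag[i] if i == j else 0.0)
               for j in range(3)] for i in range(3)]
    H = [OCV_SLOPE, -1.0, -1.0]  # d(voltage) / d(soc, v1, v2)
    PHt = [sum(P_pred[i][j] * H[j] for j in range(3)) for i in range(3)]
    s = sum(H[i] * PHt[i] for i in range(3)) + r_var  # scalar innovation variance
    K = [PHt[i] / s for i in range(3)]
    residual = v_meas - predicted_voltage(x_pred, current_a)
    x_new = [x_pred[i] + K[i] * residual for i in range(3)]
    # (I - K H) P_pred, using the symmetry of P_pred so that H P_pred equals PHt.
    P_new = [[P_pred[i][j] - K[i] * PHt[j] for j in range(3)] for i in range(3)]
    return x_new, P_new


# Usage: the simulated pack is really at 80% SOC; the filter starts at a drifted 60%.
true_x, est_x = [0.8, 0.0, 0.0], [0.6, 0.0, 0.0]
P = [[0.04, 0.0, 0.0], [0.0, 1e-4, 0.0], [0.0, 0.0, 1e-4]]
for _ in range(600):  # ten minutes at a constant 10 A, one update per second
    true_x = predict_state(true_x, 10.0, 1.0)
    v_meas = predicted_voltage(true_x, 10.0)
    est_x, P = ekf_step(est_x, P, 10.0, v_meas, 1.0, [1e-10, 1e-8, 1e-8], 1e-4)
soc_error = abs(est_x[0] - true_x[0])
```

The usage loop is a noise-free simulation in which the filter and the simulated pack share the same model, so it only demonstrates the mechanics of pulling a drifted SOC back toward the value implied by terminal voltage. Real LFP cells have a much flatter OCV curve over the middle of the SOC range, which makes the SOC correction far weaker there than this placeholder curve suggests.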

Making It Run on 256KB RAM

The ARM Cortex-M4F in the SEM-1 has 256KB of RAM and runs at 120MHz. It shares that RAM with the operating system (FreeRTOS), the modem driver, the BLE stack, the telemetry ring buffer (described in the offline-first post), and the various other tasks running on the device. The EKF algorithm gets approximately 48KB of working memory.

The 3x3 matrix operations in the EKF are implementable in fixed-point arithmetic using 32-bit integers in Q16.16 format (16 integer bits, 16 fractional bits). Fixed-point keeps the matrix code off the FPU, avoiding the FPU context save and restore overhead that floating-point matrix code would add, and it uses less RAM than double-precision floating point. The Cortex-M4F has a hardware FPU for single precision, so we use single-precision float for temperature-dependent parameter lookups (where the values need the dynamic range of floating point) and fixed-point for the state estimation matrices.
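For readers who have not used Q16.16 before, the following Python sketch shows the representation and the multiply that the matrix operations are built from. The helper names are hypothetical; on the microcontroller the same logic would be written in C with a 64-bit intermediate for the product.

```python
Q_BITS = 16
ONE = 1 << Q_BITS  # the value 1.0 in Q16.16


def to_q16(value):
    """Convert a float to a raw Q16.16 integer, rounding to the nearest step."""
    return int(round(value * ONE))


def from_q16(raw):
    return raw / ONE


def q16_mul(a, b):
    """Multiply two Q16.16 values: the wide product is shifted back by 16 bits."""
    return (a * b) >> Q_BITS


def q16_matvec3(m, v):
    """3x3 matrix times 3-vector, all entries in Q16.16."""
    return [sum(q16_mul(m[i][j], v[j]) for j in range(3)) for i in range(3)]


identity = [[ONE if i == j else 0 for j in range(3)] for i in range(3)]
vec = [to_q16(0.8), to_q16(-0.08), to_q16(0.05)]
product = from_q16(q16_mul(to_q16(1.5), to_q16(0.25)))  # 0.375, exactly representable
```

One design consequence worth keeping in mind: the smallest step in Q16.16 is 1/65536, roughly 1.5e-5, so quantities smaller than that (small covariance entries, for example) need to be stored with a scale factor or they round to zero.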

The EKF runs at 1Hz (one state update per second), not at the 10Hz rate at which the BQ76952 samples voltage. We downsample the voltage measurements to 1Hz using a median filter over 10 consecutive samples. The median filter also reduces the impact of brief voltage spikes caused by high-current motor controller switching transients, which show up as single-sample outliers in the raw voltage stream and would corrupt EKF state estimates if not filtered.
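The downsampling step can be sketched in a few lines of Python. The sample values below are made up to show a single switching spike; the function name is hypothetical.

```python
from statistics import median


def downsample_median(samples, window=10):
    """Collapse each full window of raw samples into its median."""
    return [median(samples[i:i + window])
            for i in range(0, len(samples) - window + 1, window)]


# One second of 10 Hz voltage samples with a single-sample spike at index 4.
raw = [3.30, 3.30, 3.31, 3.30, 2.95, 3.30, 3.31, 3.30, 3.30, 3.31]
filtered = downsample_median(raw)
mean_value = sum(raw) / len(raw)
```

Here the median of the window is 3.30 V and the spike has no effect on it, whereas a plain average over the same ten samples would be pulled down by about 35 mV.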

The Outer EKF: Tracking SoH Over Cycles

The inner EKF estimates SOC within a single discharge event. The outer EKF — which runs at the timescale of complete discharge cycles rather than individual seconds — estimates SoH by tracking how the inner EKF's model parameters need to be adjusted over time to explain observed terminal voltages.

As a pack ages, its series resistance R0 increases. This increase is gradual — roughly 0.2mΩ per 100 cycles for LFP chemistry in normal operating conditions, faster under thermal stress. The outer EKF tracks R0 as a state variable and updates its estimate at the end of each discharge event using the residual between the inner EKF's predicted terminal voltage and the actual terminal voltage across the discharge curve.
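A simplified version of that per-cycle update is sketched below in Python. It substitutes a scalar linear Kalman filter with a random-walk model for R0 in place of the full outer EKF, and it assumes the voltage residual is dominated by the R0 error. The function names and the variances in the usage lines are illustrative assumptions.

```python
def r0_from_residuals(r0_model, residuals_v, currents_a):
    """Least-squares R0 implied by voltage residuals (measured minus predicted).

    Assumes residual = -I * (R0_true - r0_model), with discharge current positive.
    """
    num = sum(r * i for r, i in zip(residuals_v, currents_a))
    den = sum(i * i for i in currents_a)
    return r0_model - num / den


def update_r0(r0_est, p_est, r0_obs, process_var, obs_var):
    """Scalar Kalman update with a random-walk model for R0, once per cycle."""
    p_pred = p_est + process_var          # R0 may have grown since the last cycle
    gain = p_pred / (p_pred + obs_var)
    r0_new = r0_est + gain * (r0_obs - r0_est)
    p_new = (1.0 - gain) * p_pred
    return r0_new, p_new


# One discharge event: the model assumes 15 mOhm, the pack is really at 17 mOhm.
currents = [10.0, 12.0, 8.0]
residuals = [-i * 0.002 for i in currents]  # the model over-predicts voltage
r0_obs = r0_from_residuals(0.015, residuals, currents)
r0_est, p_est = update_r0(0.015, 1e-6, r0_obs, process_var=1e-8, obs_var=1e-6)
```

With equal prior and observation variances the estimate moves about halfway toward the observed value on the first cycle, and the shrinking variance makes later cycles move it less. That behavior matches the gradual convergence over a number of discharge cycles.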

The outer EKF's SoH estimate converges after approximately 15–20 observed discharge cycles per pack. During the first 15 cycles, the SoH estimate uses the population baseline (chemistry-specific aging curves from our training dataset) with high uncertainty. After 15 cycles, the pack-specific R0 trajectory is established, and the uncertainty bounds narrow significantly. This is why the SEM-1 installation guide recommends 15–20 monitoring cycles (typically 5–7 days of operation) before treating the SoH estimate as actionable for replacement decisions.

Accuracy Against Load-Test Ground Truth

We validated the dual-EKF SoH estimates against load test results on 47 packs across the pilot fleet. Load testing is the ground truth method: a pack is fully charged, then discharged at a fixed rate (C/5 for capacity measurement) while measuring total energy delivered until the pack voltage drops to the BMS cutoff voltage. The ratio of measured energy to rated energy is the true SoH.
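The load-test calculation itself is simple to express. The Python sketch below integrates delivered energy until the cutoff voltage is reached; the function name, the 12 V and 1 A figures, and the 10 V cutoff are illustrative assumptions, not our test-rig settings.

```python
def load_test_soh(voltages_v, currents_a, dt_s, cutoff_v, rated_wh):
    """Integrate delivered energy until voltage first reaches the cutoff."""
    energy_ws = 0.0
    for v, i in zip(voltages_v, currents_a):
        if v <= cutoff_v:
            break  # the BMS would disconnect the pack here
        energy_ws += v * i * dt_s
    return (energy_ws / 3600.0) / rated_wh


# Hypothetical 60 Wh pack that delivers 12 W for four hours before hitting cutoff.
voltages = [12.0] * 14400 + [9.9]
currents = [1.0] * 14401
soh = load_test_soh(voltages, currents, dt_s=1.0, cutoff_v=10.0, rated_wh=60.0)
```

In this example the pack delivers 48 Wh against a 60 Wh rating, giving an SoH of 0.80.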

Against this ground truth, the dual-EKF system achieves a mean absolute error of 4.1% for packs with more than 20 observed cycles, and 7.8% for packs in the 10–19 observed cycle range. The 4.1% MAE for converged estimates is a meaningful improvement over the BMS-reported SOC error of ±9.4%, and it measures a different (more useful) quantity: SoH captures the pack's degradation state, not just its current charge level.

The cases where the EKF estimate diverged significantly from load-test truth were almost always associated with abnormal operating events: one pack that had been externally shorted at some point in its history (with visible damage on the case, not a normal degradation event) and two packs that had been deep-discharged below 2.5V per cell, which causes irreversible damage (typically copper dissolution from the anode current collector) that our model doesn't capture. Both of those failure modes are identifiable from the SEM-1 event log, which flags sub-cutoff-voltage excursions as fault events, so the EKF uncertainty can be flagged and elevated for affected packs.

Filed under: Engineering, Battery Technology