Rohit Dwivedi

Introduction: Chasing AI Ghosts in the Glucose Machine

Let’s get one thing straight: managing glucose isn’t chess. In chess, you have perfect information. This is poker. You’re playing against a deck stacked with biology, behavior, and chaos. The latest paper from Cichosz et al., “Personalized Forecasting of Glycemic Control”, is a perfect example of academics trying to play chess with a poker hand.

Their goal was to use four machine learning models (CatBoost, XGBoost, AutoGluon, and tabPFN) to forecast a person’s week-ahead glucose metrics from their continuous glucose monitor (CGM) data. It’s a well-intentioned, mathematically sound effort. It’s also a complete dead end. Chasing predictive accuracy on a chaotic biological signal is like trying to predict the exact shape of a wave a minute before it hits the shore. You’re forecasting the symptom, not the storm.


The root cause of type 2 diabetes isn’t unpredictable glucose; it’s insulin resistance. The future isn’t in building ever-more-complex models to forecast the symptoms of a disease. It’s in delivering causal cures that reverse the disease itself.

— Rohit Dwivedi

This isn’t an academic critique. This is a breakdown from the trenches: an entrepreneur’s look at the business reality, the biological truth, and the only scalable path forward.

The Business Reality Check: Burning Compute for Crumbs

The paper’s own conclusion is that the fancier models, AutoGluon and the transformer-based tabPFN, offer “modest accuracy gains.” Let’s translate “modest” from academic-speak into dollars and sense: it’s a terrible return on investment.

When you’re building a product to serve millions, inference time (how fast the model makes a prediction) is money. It’s server cost, it’s energy, it’s your entire operational budget. I’ve put their numbers in a table, using the Mean Absolute Error for Time-in-Range (T1DM patients) as our accuracy benchmark, so the business malpractice is crystal clear.

Performance vs. Price: A Terrible ROI

Model	TIR MAE, T1DM (lower is better)	Inference time (seconds per 1,000 cases)
tabPFN	6.72	699
AutoGluon	6.77	2.7
CatBoost	6.81	0.04
XGBoost	6.91	0.04

Let’s do the math. To get a clinically meaningless 0.19-point improvement in Mean Absolute Error, from 6.91 to 6.72, you have to use tabPFN, a model that is over 17,000 times slower than XGBoost: roughly 0.7 seconds per prediction versus 0.04 milliseconds. Deploying that would mean spending millions in compute costs to achieve an accuracy gain that no patient or doctor would ever notice. This isn’t a marginal trade-off; it’s the difference between a real-time system that can serve millions of users and a research project that can serve one. This is the kind of “optimization” that looks good in a paper and bankrupts a company in the real world. It’s a classic case of academic metrics failing a basic business reality check.
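The arithmetic behind this paragraph is easy to check. The short Python sketch below uses only the four rows of the table above. It assumes inference time scales linearly with the number of predictions and that the reported timings (measured on whatever hardware the paper used) are comparable across models. The `compute_hours` helper and the one-million-prediction volume are illustrative assumptions, not figures from the paper.

```python
# Figures transcribed from the table above (T1DM, Time-in-Range MAE).
# Inference times are seconds per 1,000 cases, as reported in the paper.
RESULTS = {
    "tabPFN": {"mae": 6.72, "seconds_per_1000": 699.0},
    "AutoGluon": {"mae": 6.77, "seconds_per_1000": 2.7},
    "CatBoost": {"mae": 6.81, "seconds_per_1000": 0.04},
    "XGBoost": {"mae": 6.91, "seconds_per_1000": 0.04},
}


def slowdown(model: str, baseline: str = "XGBoost") -> float:
    """How many times slower `model` is than `baseline` per prediction."""
    return RESULTS[model]["seconds_per_1000"] / RESULTS[baseline]["seconds_per_1000"]


def mae_gain(model: str, baseline: str = "XGBoost") -> float:
    """Reduction in MAE relative to `baseline` (same units as the table)."""
    return RESULTS[baseline]["mae"] - RESULTS[model]["mae"]


def compute_hours(model: str, n_predictions: int) -> float:
    """Hypothetical helper: single-process compute time for n predictions,
    assuming time scales linearly with the number of predictions."""
    return RESULTS[model]["seconds_per_1000"] * n_predictions / 1000 / 3600


if __name__ == "__main__":
    for name in RESULTS:
        print(
            f"{name:>9}: {slowdown(name):>8.0f}x slower than XGBoost, "
            f"MAE gain {mae_gain(name):.2f}, "
            f"{compute_hours(name, 1_000_000):.2f} h per 1M predictions"
        )
```

At the reported rates, one million predictions come to roughly 194 hours of single-process compute for tabPFN, 0.75 hours for AutoGluon, and about 40 seconds for CatBoost or XGBoost, all to buy a 0.19-point MAE improvement.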

The Bio-Tech Mismatch: Predicting a Hurricane with a Barometer

The deeper flaw here is philosophical. The models are trying to predict the behavior of a hurricane using only a barometer. They take a week of CGM data, a lagging indicator, and try to guess the next week’s glucose metrics.
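To make the week-ahead setup concrete, here is a minimal Python sketch: summarize one week of CGM readings into Time-in-Range (TIR) and Time Below Range (TBR), then "forecast" the next week. The thresholds (70 to 180 mg/dL for TIR, below 70 mg/dL for TBR) are the widely used consensus cut-offs and are an assumption here; the paper’s exact feature definitions may differ. `persistence_forecast` is a hypothetical naive baseline (next week equals this week), not one of the paper’s models. It simply shows that the only input is the glucose trace itself, with no meals, insulin, activity, or illness.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumed consensus thresholds in mg/dL; the paper may define them differently.
LOW_MG_DL = 70
HIGH_MG_DL = 180


@dataclass
class WeeklyMetrics:
    tir_pct: float  # % of readings within [70, 180] mg/dL
    tbr_pct: float  # % of readings below 70 mg/dL


def weekly_metrics(readings_mg_dl: Sequence[float]) -> WeeklyMetrics:
    """Summarize one week of CGM readings (typically one every few minutes)."""
    if not readings_mg_dl:
        raise ValueError("need at least one CGM reading")
    n = len(readings_mg_dl)
    in_range = sum(LOW_MG_DL <= g <= HIGH_MG_DL for g in readings_mg_dl)
    below = sum(g < LOW_MG_DL for g in readings_mg_dl)
    return WeeklyMetrics(tir_pct=100 * in_range / n, tbr_pct=100 * below / n)


def persistence_forecast(last_week: Sequence[float]) -> WeeklyMetrics:
    """Hypothetical naive baseline: next week's metrics = this week's metrics.

    The only input is the lagging glucose trace; meals, insulin changes,
    physical activity, and illness never enter the forecast.
    """
    return weekly_metrics(last_week)


if __name__ == "__main__":
    # Toy readings, not from the paper: ten values standing in for a week.
    week = [65, 90, 120, 150, 185, 200, 110, 95, 60, 140]
    print(persistence_forecast(week))
```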

The authors even admit the fatal flaw in their “limitations” section, stating the feature set “omitted contextual predictors with potential additional predictive value (for example, precise meal timing, unlogged insulin changes, physical activity, or acute illness).” This isn’t a limitation; it’s the whole game. You cannot predict a biological system by ignoring its primary inputs.

The paper’s own data proves this catastrophic failure. Look at the error rates for Time Below Range (TBR), or hypoglycemia. This is the single most dangerous metric for a person with diabetes. And it’s where the models are the most blind.

T1DM Patients: The mean MARD (mean absolute relative difference) for predicting Time Below Range (TBR) was approximately 48%.
T2DM Patients: The mean MARD for predicting Time Below Range (TBR) was a staggering 78%.

Being off by 78% on average when forecasting how much time someone will spend in a potentially life-threatening low is not a “modest gain.” It’s a total system failure.

The models are effectively guessing in precisely the situations where patients are in the most danger. This is the predictable, inevitable outcome when you ignore biology and chase correlation.
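For readers unfamiliar with the metric, MARD averages |predicted minus observed| divided by observed across cases and expresses the result as a percentage. The Python sketch below shows the calculation; the sample TBR values are made up for illustration and are not the paper’s data. How the paper handled observed values of zero is not stated in the figures quoted here, so this sketch simply skips them (an assumption). Because each error is divided by the observed value, the same absolute miss weighs more when the observed value is small, which is worth keeping in mind for a metric like TBR that is usually a low percentage.

```python
from typing import Sequence


def mard(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute relative difference, in percent.

    Pairs whose observed value is 0 are skipped because the relative error is
    undefined there (an assumption; studies may handle zeros differently).
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    rel_errors = [abs(p - o) / abs(o) for p, o in zip(predicted, observed) if o != 0]
    if not rel_errors:
        raise ValueError("no non-zero observed values")
    return 100 * sum(rel_errors) / len(rel_errors)


if __name__ == "__main__":
    # Made-up week-ahead TBR values (% of time below range), not from the paper.
    observed_tbr = [2.0, 4.0, 1.0]
    predicted_tbr = [3.0, 3.0, 2.0]
    print(f"MARD: {mard(predicted_tbr, observed_tbr):.1f}%")
```

In this toy example every forecast misses by just one percentage point of time, yet the MARD comes out at about 58%.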

My Scalability Verdict: A Fleet of Lab Toys

Based on the paper’s own findings, here’s my verdict on whether these models could ever leave the lab and help a real person.

Verdict: Red (Lab Toy) - tabPFN & AutoGluon

The paper itself points to their “markedly higher computational cost and slower inference.” These models are academically interesting but commercially useless for this problem. No sane company would deploy a system with these economics for gains described as “small to medium.” They’re destined to remain on a researcher’s laptop.

Verdict: Yellow (Pilot Purgatory) - CatBoost & XGBoost

They’re fast, I’ll give them that. But speed doesn’t fix a broken premise. They are still built on the wrong foundation: forecasting a symptom. They might perform okay in the clean, “internal validation” world of a curated dataset. But that “unlogged insulin change” or “acute illness” the paper admits it ignores? That’s not an edge case; that’s Tuesday for a real patient. These models aren’t just brittle; they’re designed to fail as soon as they leave the lab.

The Pivot: Stop Forecasting, Start Reversing

This is where we, as an industry, need a hard pivot. The goal should not be to get 5% better at predicting hyperglycemia. The goal must be to eliminate the root cause of that hyperglycemia: insulin resistance.

Forecasting is a defensive game of symptom management. We are focused on an offensive game of disease reversal. Instead of feeding an AI a stream of lagging indicators, we build a causal model of an individual’s metabolism: AI that understands how a specific person responds to fat, protein, and stress, not just their last 7 days of glucose data. We use AI to understand the why behind their numbers and deliver interventions that fix the underlying biological dysfunction.

Our approach at Sterlites AI has already shown what’s possible. In a recent non-clinical AI pilot with 6 willing acquaintances under medical consultation, we saw an average 12.7% drop in HbA1c, an outcome that forecasting can’t even dream of. We didn’t predict their disease; we guided them toward reversing it.

It’s time for the AI and diabetes community to mature. We must stop celebrating marginal improvements in predictive accuracy on flawed premises and start demanding AI that understands and reverses the underlying disease. We need to stop burning money on better forecasts and start investing in scalable cures. It’s time to build AI that delivers cures, not just correlations.