Meet AMIE: An AI That Manages Patients Over Time — and What It Means for ObGyn
Google’s medical AI matched 21 doctors across 100 cases over three visits each, but the doctors were judged against guidelines they do not use and limited to a text box. Here is what the study shows.
A patient comes back for her second visit. Her symptoms have shifted. Her lab results are in. A good doctor remembers what happened last time, reads the new numbers, and adjusts the plan. That thinking over time is the hard part of medicine. The one-shot diagnosis is the easy part.
A new study in Nature asked a simple question: can an artificial intelligence do the hard part?
AMIE stands for Articulate Medical Intelligence Explorer. It is a medical AI system built by Google, running on its Gemini model. Earlier work showed it could take a patient history and suggest a diagnosis. This new paper pushes further, into what doctors call management: deciding what tests to order, what to prescribe, what to watch, and how to change course as a patient returns over weeks.
The design is clever, and worth understanding in plain terms. AMIE is built from two parts that work like the two speeds of human thinking.
One part is a fast talker. It chats with the patient, sounds warm, and keeps track of the conversation.
The other part is a slow thinker. While the conversation continues, it quietly reads through hundreds of pages of clinical guidelines and writes a careful plan, with every recommendation tied to the exact guideline it came from. The idea comes straight from Daniel Kahneman’s Thinking, Fast and Slow: quick intuition on the surface, slow reasoning underneath.
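For readers who like to see an idea as code, here is a toy sketch of the two-speed design in Python. It is my illustration, not the paper's implementation: the class name, the method names, the keyword matching, and the "Example Guideline" entries are all made up for the example, and the real system uses a large language model rather than a lookup table. The point is only the shape: one part replies at once and keeps the transcript, the other rereads everything and ties each recommendation to a source.

```python
from dataclasses import dataclass, field

# Toy illustration only. Every name here is hypothetical and none of it
# comes from the AMIE paper; the "guidelines" are placeholders.


@dataclass
class Recommendation:
    action: str
    guideline: str  # the source this recommendation is tied to


@dataclass
class TwoSpeedAgent:
    # keyword -> (guideline name, recommended action); a stand-in for
    # the hundreds of pages of guidelines the slow part reads
    guidelines: dict[str, tuple[str, str]]
    transcript: list[str] = field(default_factory=list)
    plan: list[Recommendation] = field(default_factory=list)

    def fast_reply(self, patient_message: str) -> str:
        """Fast part: answer right away and keep track of the conversation."""
        self.transcript.append(patient_message)
        return f"Thank you. I have noted message {len(self.transcript)}."

    def slow_update_plan(self) -> list[Recommendation]:
        """Slow part: reread the whole transcript and rebuild a cited plan."""
        text = " ".join(self.transcript).lower()
        self.plan = [
            Recommendation(action=action, guideline=source)
            for keyword, (source, action) in self.guidelines.items()
            if keyword in text
        ]
        return self.plan


agent = TwoSpeedAgent(
    guidelines={
        "bleeding": ("Example Guideline 1", "order a blood count"),
        "fever": ("Example Guideline 2", "review again in 48 hours"),
    }
)

# Visit 1
reply_1 = agent.fast_reply("I have had some bleeding.")
plan_1 = agent.slow_update_plan()

# Visit 2: the transcript from visit 1 is still there, so the plan grows
reply_2 = agent.fast_reply("Now I also have a fever.")
plan_2 = agent.slow_update_plan()

for rec in plan_2:
    print(f"{rec.action} (source: {rec.guideline})")
```

Because the transcript persists between calls, the second plan still carries the item from the first visit. That carried-over memory is the behavior the study was testing.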
To test it, the researchers ran a blinded, randomized exam. They wrote 100 made-up cases, each unfolding over three visits a day or two apart. The cases spanned five specialties, and obstetrics and gynecology made up 20 of the 100. Trained actors played the patients. For each case, the same patient was seen once by AMIE and once by one of 21 primary care doctors, all by text chat. Specialists then graded both, without knowing which one was the machine.
AMIE did at least as well as the doctors on overall management, and clearly better on two things. First, it was more precise. Instead of “prescribe an antibiotic,” it named the drug, the dose, the route, and the duration. On treatment precision it scored around 95 out of 100; the doctors scored in the 60s. Second, it stuck closer to the guidelines and showed its work, citing the source for each recommendation. When graders did prefer one over the other, they picked AMIE far more often than the doctor — though in about half the cases they saw no real difference at all.
Here is the part the headlines will skip. The doctors were set up to lose. They came from Canada and India but were judged against United Kingdom guidelines they do not use every day. They worked through a text box, not the phone or video they normally use. The cases had clean, correct answers, which real patients rarely provide. And the visits were one or two days apart, not the weeks the cases described, so human memory was tested in a way that does not match real care. AMIE, meanwhile, was built and tuned for exactly this setup. This is what I call the missing comparator: doctors working the way they actually work. When you compare a system to people working with one hand tied behind their backs, “the AI won” is not the finding. The finding is narrower, and still useful: the AI is very good at reading guidelines and writing precise plans.
Why should an ObGyn care? One in five of these cases was ours. The things AMIE did best — remembering the last visit, citing the right guideline, writing a precise plan — are exactly the things that get sloppy on a busy, tired labor floor. A tool that never forgets and always cites the source has real value there. But the things AMIE cannot do are also ours: the woman in front of you whose values, fears, and body do not match the guideline. The study’s own authors say plainly that AMIE is not ready for real patients. They are right.
My take. This is careful, honest work, and I do not say that lightly about an AI paper. The two-speed design is smart, grounding the plan in guidelines is the right instinct, and the authors were more open about their limits than most. But “AI matches doctors” is not what this shows. It shows that AI beats a distracted human at the one thing computers have always been better at: holding everything in memory and quoting the manual exactly. That is useful. It is not judgment. For ObGyn the lesson is neither fear nor worship. Let the machine carry the memory and the guideline lookup, and keep the thinking about the specific woman where it belongs — with you. The real risk is not that AMIE replaces us. It is that we stop practicing the parts it does well and forget how to do them ourselves.
One footnote, for the attentive reader. The study was funded by Google, and the authors may own Google stock. I mention this not as an accusation but as arithmetic.
AMIE is real, it is impressive, and it is a research demonstration, not a doctor. Read the result, not the headline.
References
1. Liévin V, Palepu A, Weng W-H, Saab K, Stutz D, Cheng Y, et al. Towards conversational AI for disease management. Nature. 2026. doi:10.1038/s41586-026-10764-5.
2. Tu T, et al. Towards conversational diagnostic artificial intelligence. Nature. 2025;642:442–450.



