Talk to AI Like a Consultant, Not Like Google

Jul 05, 2026

In a randomized trial, doctors using GPT-4 did no better than doctors without it. The AI alone beat both groups. The machine was not the problem. The questions were.

A colleague stopped me at a meeting last month.

He had tried one of the AI chatbots.

He typed in "preeclampsia management," got back something that read like a textbook page, and closed the tab.

"Overrated," he said. "I'll stick with UpToDate."

He is a superb clinician. And he had just done something he would never do in his own hospital: he asked for a consult without giving a history.

Let me define the term everyone uses and nobody explains.

A "prompt" is simply the question you type into an AI tool like ChatGPT or Claude. That is all it is. There is nothing technical about it. No coding, no settings, no manual. But how you ask determines almost everything about what you get back. And most doctors are asking the way they type into Google: two or three keywords, then judgment.

The evidence on this is uncomfortable. In 2024, researchers at Stanford, Beth Israel Deaconess, and the University of Virginia published a randomized trial of 50 physicians working through difficult diagnostic cases. Half got GPT-4 plus their usual resources. Half got the usual resources alone. The doctors with the AI had a median score of 76 out of 100. The doctors without it had a median of 74. No meaningful difference. Here is the finding that should keep us up at night: the AI by itself, with no doctor attached, scored about 16 points higher than the doctors using their standard tools.

The machine was not the weak link.

The way the doctors talked to it was.

Most typed a few words, took the first answer, and moved on. A follow-up trial by the same group, this time with physicians who got basic instruction in how to work with the tool, told a different story: doctors using the AI scored about 6 points higher than doctors without it. Same technology. Different conversation.

I published the first paper on ChatGPT in obstetrics and gynecology in a major journal in March 2023. I have spent the three years since then using these tools daily. What I have learned is that talking to AI well is not a technical skill. It is a clinical skill you already have. It is history-taking, run in reverse. Instead of extracting the history from the patient, you give the history to the machine.

Think about the difference between a lazy curbside and a real consult.

"Thirty-nine weeker, induce or not?" gets you a shrug. A real consult includes the parity, the cervix, the comorbidities, the patient's wishes, and the question you actually need answered.

AI works exactly the same way. The quality of the answer tracks the quality of the history you provide. Garbage in, garbage out is not just a computer science principle. It is a triage principle.

So here is how to talk to it, in plain language, no technicalities required.

First, say who you are and who the answer is for. "I am an obstetrician" produces a different answer than no introduction at all. "Explain this so I can discuss it with a colleague" is different from "explain this so I can counsel a frightened patient at an eighth-grade level."

The tool cannot see your diploma. Tell it.

Second, give it the goal, not just the task. Do not ask "summarize this paper."

Say why: "I need to decide whether this study should change how I counsel patients about induction at 39 weeks. Tell me whether the evidence supports that." When the tool knows what decision you face, it stops writing book reports and starts helping you think.

Third, and this is the single most useful sentence I can give you: end your question with "Before you answer, ask me everything you need to know to get this right."

This one line turns a monologue into a consult. The AI will come back with questions, often surprisingly good ones, about the details you forgot to mention. You answer, and then it answers. Every clinician reading this has watched a good consultant do exactly that. Now the consultant is available at 2 a.m. and does not sigh.

Fourth, push back the way you would with a resident.

The first answer is a draft, not a verdict. Ask: "What is the strongest argument against what you just said?"

Ask: "What am I missing?"

Ask: "How certain are you, and where is the evidence weakest?"

These tools are agreeable by nature. They will tell you what you want to hear unless you order them not to. You would never accept a resident's first differential without a single question. Extend the machine the same courtesy.

Fifth, verify anything you would stake your name on.

AI tools can invent references that look flawless: real journal, real authors, plausible title, fabricated paper. Treat the tool like a brilliant new fellow who has read everything and still needs supervision. The fellow does the memory work. You do the judgment. That division of labor is the entire point.

One last habit for those who use these tools regularly: if you find yourself typing the same introduction every time, most tools now let you save standing instructions so you only say it once. Worth five minutes of setup.

My take: the doctors dismissing AI after one lazy question are not grading the machine. They are grading their own history-taking, and the grade is not flattering.

We spend years teaching medical students that the history is 80 percent of the diagnosis, then we sit down at a tool with most of the medical literature in reach and give it three keywords. Our patients are already having long, detailed conversations with these tools before they ever reach our offices. A physician who cannot do the same is not protecting the profession. He is falling behind it. Learning to ask well is fast becoming part of professional responsibility, the same way learning to read a fetal monitor once was. The tool is ready. The question is whether we are.

Bottom line: stop typing keywords.

Give a history, state the goal, ask it to ask you questions, push back on the first answer, and verify what matters. You already know how to do this. You have been doing it on rounds for your entire career.
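None of this requires code. For readers who do reach these tools through a script rather than a chat window, the same habits can be sketched as a small reusable template. This is a minimal illustration in Python, not a feature of any AI product: the class name `ConsultPrompt` and its fields are hypothetical, chosen here only to mirror the steps above.

```python
from dataclasses import dataclass

# The closing sentence and follow-up questions are quoted from the article.
CLOSING_REQUEST = (
    "Before you answer, ask me everything you need to know to get this right."
)

FOLLOW_UPS = (
    "What is the strongest argument against what you just said?",
    "What am I missing?",
    "How certain are you, and where is the evidence weakest?",
)


@dataclass
class ConsultPrompt:
    """Hypothetical container for the pieces of a well-formed question."""

    role: str          # who you are, e.g. "an obstetrician"
    audience: str      # who the answer is for
    goal: str          # the decision you face, not just the task
    history: str = ""  # clinical details, if any

    def render(self) -> str:
        """Assemble the pieces into one message, ending with the request
        that the tool ask its own questions before answering."""
        parts = [f"I am {self.role}.", f"The answer is for {self.audience}."]
        if self.history:
            parts.append(f"History: {self.history}")
        parts.append(f"Goal: {self.goal}")
        parts.append(CLOSING_REQUEST)
        return "\n".join(parts)


if __name__ == "__main__":
    prompt = ConsultPrompt(
        role="an obstetrician",
        audience="a colleague",
        goal=(
            "I need to decide whether this study should change how I counsel "
            "patients about induction at 39 weeks. Tell me whether the "
            "evidence supports that."
        ),
    )
    print(prompt.render())
    # After the first answer, push back with each follow-up in turn.
    for question in FOLLOW_UPS:
        print(question)
```

The template only assembles text. Sending it to a tool, reading the reply, and verifying what matters remain the clinician's job.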

If this was useful, subscribe to ObGyn Intelligence at obmd.com. This is where I write what the data show, whether or not it is what the profession wants to hear.

References

1. Grünebaum A, Chervenak J, Pollet SL, Katz A, Chervenak FA. The exciting potential for ChatGPT in obstetrics and gynecology. Am J Obstet Gynecol. 2023;228(6):696-705.

2. Goh E, Gallo R, Hom J, et al. Large language model influence on diagnostic reasoning: a randomized clinical trial. JAMA Netw Open. 2024;7(10):e2440969.

3. Goh E, Gallo RJ, Strong E, et al. GPT-4 assistance for improvement of physician performance on patient care tasks: a randomized controlled trial. Nat Med. 2025;31(4):1233-1238.

ObGyn Intelligence: The Evidence of Women’s Health
