The Messy Frontier: Why Open Surgery Is Where Surgical AI Will Actually Grow Up

Date

May 21, 2026

Author

Prashanth Ray

I've been asking a quiet question inside surgical AI circles for a while now: who else is seriously thinking about open surgery video?

Robotic surgery gets most of the attention, and for good reason: clean optics, structured data, controlled environments. It's a natural fit for the first wave of surgical AI. But over the past year, I've been watching a parallel push gain real momentum: point-of-view (POV), first-person-view (FPV), or egocentric capture for open cases; overhead rigs; step recognition across unstructured fields; temporal modeling in environments where nothing is guaranteed. And I think this is where surgical AI is going to do some of its most important growing up.

The Structural Advantage Nobody Talks About

Open surgery is messy. The field shifts. Instruments pass in and out of frame. Tissue behaves differently depending on the patient, the angle, the hour into the case. There's no robotic arm translating motion into clean coordinate data.

A model that learns to recognize a surgical step in a robotic cholecystectomy has learned something. A model that learns to recognize the same step under the conditions of an open complex abdominal wall reconstruction (variable lighting, a dynamic field of view, analog instrument handling) has learned something harder and therefore more transferable.

Open surgery is the stress test. The models that survive it generalize better. That's been a quiet thesis of mine for a while, and I'm seeing more researchers arrive at the same place.

What's Actually Being Built

I've been developing a high-definition POV capture protocol for complex general surgery cases, partly out of clinical curiosity and partly because I believe the data from these cases contains reasoning signals that structured surgical environments simply cannot replicate. Along the way, I've been approached for collaborations across a surprisingly wide range of use cases.

The questions I get asked range from the grounded to the genuinely ambitious:

"Can this help build more robust step recognition for teaching programs?"
"Could this feed multimodal training pipelines for surgical decision support?"
"Could we map surgical decisions back to long-term patient value like outcomes, complications, recovery?"
"Could this help train general-purpose embodied AI?"

The breadth of that question set tells you something. Open surgery video sits at an interesting intersection: it's clinically rich, temporally complex, and structurally unconstrained. That makes it hard to work with and uniquely valuable at the same time.

The Problem Isn't the Data. It's the Pipeline.

Here's the gap I keep running into: the limiting factor isn't interest in open surgery AI. It's capture infrastructure and data readiness.

Most surgical AI development has relied on laparoscopic and robotic video because the camera is already there, built into the system, producing clean output. For open surgery, you have to build the capture layer deliberately. And most attempts at this either sacrifice image quality, disrupt surgical workflow, or produce footage that's clinically unusable for downstream AI work.

Getting the capture layer right (head-mounted, high-definition, workflow-integrated, annotation-friendly) is a harder problem than it appears. And it's the prerequisite for everything else. You cannot build serious open surgery AI on poor footage any more than you can build serious computer vision on blurred images.
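To make "poor footage" concrete, here is a minimal sketch of one common heuristic for flagging blurred frames: the variance of a Laplacian filter response. It is an illustration only, not a description of any particular capture system. The input format (a grayscale frame as a list of rows), the function names, and the threshold of 100.0 are all assumptions chosen for the example; a real threshold would have to be tuned on actual footage.

```python
from statistics import pvariance


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    `gray` is a grayscale frame given as a list of rows of intensities
    (an assumed, simplified input format; needs at least 3x3 pixels).
    Sharp frames have strong edges and a high variance; blurred or
    featureless frames score low.
    """
    height, width = len(gray), len(gray[0])
    responses = []
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return pvariance(responses)


def is_sharp(gray, threshold=100.0):
    """Hypothetical gate: keep a frame only if its score clears the threshold."""
    return laplacian_variance(gray) >= threshold


# Toy frames: a checkerboard (maximal edge content) and a flat gray field.
checkerboard = [[255 if (x + y) % 2 == 0 else 0 for x in range(8)] for y in range(8)]
flat = [[128] * 8 for _ in range(8)]

print(is_sharp(checkerboard), is_sharp(flat))
```

A check like this only catches one failure mode. Occlusion, glare, and a field that has drifted out of frame need their own tests, which is part of why the capture layer is harder than it looks.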

This is the infrastructure gap that interests us most at Nuevata. Before you can model a decision, you have to faithfully capture it.

Why Multimodal Reasoning Changes the Stakes

The cases I find most compelling aren't pure computer vision problems. They're reasoning problems.

Open surgery is not a sequence of discrete, classifiable steps. It's a continuous stream of micro-decisions made in response to tissue behavior, patient anatomy, bleeding, time pressure, and accumulated intraoperative judgment. Modeling that well requires systems that can integrate visual input with procedural context: what just happened, what's expected next, what the deviation from expectation signals.
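The simplest possible version of "visual input plus procedural context" can be sketched with a toy example. Assume a frame-level classifier emits a probability for each step, and a transition table encodes what is expected to come next. Viterbi decoding then finds the most plausible step sequence, and frames where the raw visual guess disagrees with that sequence are the "deviation from expectation" worth a second look. Everything here is hypothetical: the three steps, the probabilities, and the forward-only transition table are invented for illustration, and treating classifier outputs as emission scores is a common approximation rather than a principled model. Real open cases revisit steps and resist this kind of discretization, which is exactly the point of the paragraph above.

```python
import math


def _log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(frame_probs, transition, initial):
    """Most likely step sequence given per-frame scores and a transition prior."""
    n = len(initial)
    scores = [_log(initial[s]) + _log(frame_probs[0][s]) for s in range(n)]
    backpointers = []
    for probs in frame_probs[1:]:
        new_scores, pointers = [], []
        for s in range(n):
            # Best previous step to have come from, given the transition prior.
            prev = max(range(n), key=lambda p: scores[p] + _log(transition[p][s]))
            new_scores.append(scores[prev] + _log(transition[prev][s]) + _log(probs[s]))
            pointers.append(prev)
        scores = new_scores
        backpointers.append(pointers)
    state = max(range(n), key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical steps: 0 = exposure, 1 = dissection, 2 = closure.
initial = [1.0, 0.0, 0.0]
transition = [            # assumed prior: steps persist and only move forward
    [0.9, 0.1, 0.0],
    [0.0, 0.9, 0.1],
    [0.0, 0.0, 1.0],
]
frame_probs = [           # invented classifier outputs, one row per frame
    [0.80, 0.10, 0.10],
    [0.70, 0.20, 0.10],
    [0.20, 0.70, 0.10],
    [0.60, 0.30, 0.10],   # visual flicker back toward step 0
    [0.10, 0.80, 0.10],
    [0.05, 0.05, 0.90],
]

raw = [max(range(3), key=lambda s: probs[s]) for probs in frame_probs]
decoded = viterbi(frame_probs, transition, initial)
deviations = [t for t, (r, d) in enumerate(zip(raw, decoded)) if r != d]

print(raw)         # [0, 0, 1, 0, 1, 2]
print(decoded)     # [0, 0, 1, 1, 1, 2]
print(deviations)  # [3]
```

In this toy run the context overrides the frame-4 flicker (index 3). Whether that disagreement is classifier noise or a real intraoperative event is the kind of question a fixed transition table cannot answer, and where richer reasoning would have to take over.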

True multimodal reasoning for open cases is still early. But the groups working seriously on temporal modeling and decision-mapping in open surgery are, I think, working on some of the most interesting problems in surgical AI right now. Not the most marketed. The most interesting.

Open Surgery Isn't Going Anywhere

It's worth saying clearly: open surgery is not a legacy modality waiting to be replaced by robotics. For a large and growing category of cases (complex reconstruction, resource-constrained settings, emergency presentations, anatomically demanding procedures), open surgery is the gold standard and will remain so.

Building AI that serves these cases isn't a niche play. It's clinical coverage for the majority of the world's surgical volume.

And in the process of building it, the models trained on this data may turn out to be among the most capable we produce, precisely because the environment demanded it.


Nuevata is building surgical intelligence infrastructure for open surgery: Iris, a head-mounted 4K capture system, and an AI-powered platform for surgical video, analytics, and training.

Connect on LinkedIn · nuevata.com