Frame of Reference: The AI Problem & What it Actually Takes to Solve It

Author: Kedar Ganta, Chief Product, Technology, and AI Officer, Caregility

In my post on Navigating Polanyi’s Paradox, I argued that AI cannot replace tacit knowledge but can augment it. A camera can see but does not understand. This raises the natural question of what we can do to bridge the gap.

Many of us are familiar with the Abbott and Costello routine, Who’s on First, which has made audiences laugh for nearly a century.

What makes it work is not that either man is wrong. Both are reasoning perfectly. The problem is that they are operating from entirely different frames of reference, and neither knows it.

The same dynamic is now showing up in AI systems.

For instance, suppose a patient hasn’t moved in fourteen minutes. A Computer Vision system generates a fall alert. The clinical staff respond, only to find the patient in the restroom.

The system reasoned correctly within its own frame of reference. The problem is that the frame is now decoupled from the world it was supposed to describe. The camera saw no presence. It had no concept of where the patient had gone. It processed the signal correctly but interpreted it incorrectly.

We have spent an incredible amount of time asking whether AI makes errors.

We have spent considerably less time examining something more subtle and consequential.

AI may be reasoning correctly from a frame of reference it cannot fully share with us.

And when it fails, we often end up diagnosing the wrong problem.

This is not a hallucination problem. It is not a bias problem, at least not in the conventional sense.

It is a frame of reference problem. It may be the most underestimated design challenge for leaders building AI products.

Signals Are Not Meaning

A frame of reference error occurs when a measurement or observation is made from the wrong reference point, producing an interpretation that is internally consistent but factually wrong in the world where it has to operate.

This idea shows up across different domains.

In physics, motion is always relative. A person walking inside a moving train appears slow relative to the train, and fast relative to someone standing on the platform. Neither observer is wrong. But confuse the frames, and your conclusion becomes meaningless.

In Computer Vision, a camera detects a position in pixel coordinates: an x and y in image space. But the real question is spatial: is the patient in bed, on the floor, or out of the room?

If the camera shifts slightly, for example, during cleaning, or if its tilt changes, the same pixel coordinates no longer correspond to the same physical reality.

The AI system continues to reason correctly within its frame of reference. But that frame has changed from the reality it is supposed to describe. The result is false alerts, missed events, and an erosion of trust in the AI system that is very difficult to diagnose if you don’t know what you are looking for.
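A minimal sketch of this failure, under assumed numbers: the bed zone, the patient position, and the 35-pixel shift are hypothetical values chosen for illustration, and `PixelBox` and `classify` are made-up names rather than part of any real system. The rule itself never changes; only the camera moves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PixelBox:
    """Axis-aligned region in image (pixel) coordinates."""
    left: int
    top: int
    right: int
    bottom: int

    def contains(self, u: int, v: int) -> bool:
        return self.left <= u <= self.right and self.top <= v <= self.bottom


# Hypothetical bed zone, drawn in pixel space when the camera was installed.
BED_ZONE = PixelBox(left=200, top=150, right=440, bottom=330)


def classify(u: int, v: int) -> str:
    """Label a detection using only its pixel position."""
    return "in_bed" if BED_ZONE.contains(u, v) else "out_of_bed"


# The patient lies still; only the camera is nudged.
patient_before = (420, 240)
camera_shift = (35, 0)  # assumed: the scene moves 35 px to the right in the image
patient_after = (patient_before[0] + camera_shift[0],
                 patient_before[1] + camera_shift[1])

print(classify(*patient_before))  # in_bed
print(classify(*patient_after))   # out_of_bed
```

Both answers are correct in pixel space. The second is wrong in the room, which is the only place it matters.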

Large language models operate at a more abstract level of the same condition. They have processed more human-described experiences than any individual will encounter in a lifetime. The model generates a fluent, confident account of situations it has never experienced. The words are not attached to anything felt. Follow the chain of reasoning far enough, and it never quite touches reality.

The Inverse of Polanyi

Michael Polanyi observed that we know more than we can tell. The experienced machinist, craftsman, or clinician cannot fully articulate what guides their hands. Tacit knowledge, the kind that is embedded in the body and accumulated through years of practice, cannot be explained in simple language.

AI inverts this paradox almost perfectly.

AI can tell more than it will ever know. It produces a detailed and confident account of situations it has never seen, decisions it has never made, and consequences it will never bear.

Polanyi showed us that language falls short of describing experience. The ‘frame of reference’ problem shows us that language, no matter how sophisticated, cannot replace experience.

This is the paradox your organization is building on.

This is not a flaw to be patched.

It is a structural condition of the technology. Not a model failure. Not a data quality problem, but rather a frame of reference misalignment.

It shapes what AI can and cannot be trusted to do, regardless of benchmark performance. This is a fundamental condition in which AI operates. The question is what we do about it.

Where Leaders are Already Feeling This

You probably haven’t labelled it a ‘frame of reference’ problem. You may have called it something else.

If the AI tools you are using aced their benchmarks and then confused your users at scale, that was a frame misalignment. We evaluated our Computer Vision in our AI lab. In other words, it was evaluated by people whose context matched its output. When it was deployed into a context where that alignment was gone, it still processed every signal available to it, but it interpreted those signals within the frame it was evaluated in.

Stop Designing as If the Gap Does Not Exist

The response to ‘frame of reference’ problems is not a smarter model. It is a more honest system design that treats reference frame alignment as a first-class engineering problem rather than an assumption.

Here is what we built to address the problem.

Make the frame explicit. For spatial orientation, we examined the camera’s field of view and transformed the sensor output before drawing any inferences. In physics, this is a coordinate transformation. In Computer Vision, this is calibration and homography. Using a homography transformation, we handled the 2D mapping from the image plane to the floor plane, accounting for camera height, tilt, and resolution.
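To make the image-plane-to-floor-plane mapping concrete, here is a minimal sketch using NumPy. It assumes calibration from four floor markers whose real-world positions are known, and it estimates the homography with the standard direct linear transform. The marker coordinates and the function names `fit_homography` and `image_to_floor` are hypothetical, and this is not necessarily how our production calibration works. Camera height, tilt, and resolution are not modeled separately here; they are absorbed into the fitted matrix.

```python
import numpy as np


def fit_homography(image_pts, floor_pts):
    """Estimate the 3x3 homography mapping pixels to floor coordinates (DLT)."""
    rows = []
    for (u, v), (x, y) in zip(image_pts, floor_pts):
        # Each correspondence gives two linear constraints on the 9 entries of H.
        rows.append([-u, -v, -1, 0, 0, 0, x * u, x * v, x])
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
    A = np.asarray(rows, dtype=float)
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]


def image_to_floor(H, pixel):
    """Map one pixel (u, v) to floor-plane coordinates (x, y)."""
    u, v = pixel
    x, y, w = H @ np.array([u, v, 1.0])
    return np.array([x / w, y / w])


# Hypothetical calibration: four floor markers as seen in the image (pixels)
# and their measured positions on the floor (meters).
image_markers = [(100, 400), (540, 400), (420, 150), (220, 150)]
floor_markers = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0), (0.0, 4.0)]

H = fit_homography(image_markers, floor_markers)
print(image_to_floor(H, (320, 400)))  # a pixel on the near edge of the marked area
```

Once every detection passes through `image_to_floor`, questions such as "in bed or on the floor" are asked in room coordinates rather than in pixels.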

Build the translation layer. We created a mechanism for moving between different frames of reference by combining multiple sensing modalities. Consider a scenario where radar and computer vision operate together to monitor occupancy. Radar detects presence in room coordinates, while computer vision detects a person in image coordinates.

In a restroom, for example, radar can confirm continued presence where camera coverage is intentionally absent for privacy reasons. When these signals are properly aligned, they provide a coherent and actionable understanding of the situation. When they are not aligned, the opposite happens. The systems are effectively speaking different spatial languages, with no translation between them. The result is noise: alerts that lack context, which erode clinical confidence and increase alarm fatigue.
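The restroom scenario can be sketched as a small fusion rule. This is an illustration under stated assumptions, not our production logic: the room layout, zone boundaries, status labels, and the names `Zone`, `locate`, and `assess` are all hypothetical, and both sensor readings are assumed to have been transformed into the same room coordinates beforehand.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]  # (x, y) in shared room coordinates, meters


@dataclass(frozen=True)
class Zone:
    name: str
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, p: Point) -> bool:
        return self.x_min <= p[0] <= self.x_max and self.y_min <= p[1] <= self.y_max


# Hypothetical layout; the restroom has no camera coverage by design.
ZONES = (
    Zone("bed", 0.0, 2.0, 0.0, 2.5),
    Zone("restroom", 3.0, 4.5, 0.0, 2.0),
)


def locate(p: Point) -> str:
    """Name the zone containing p, or 'room' if it is in none of them."""
    for zone in ZONES:
        if zone.contains(p):
            return zone.name
    return "room"


def assess(camera_xy: Optional[Point],
           radar_xy: Optional[Point]) -> Tuple[str, Optional[str]]:
    """Combine both sensors into one (status, location) reading."""
    if camera_xy is not None:
        return ("visible", locate(camera_xy))
    if radar_xy is not None:
        # The camera sees nobody, but radar still confirms presence.
        return ("present_unseen", locate(radar_xy))
    return ("absent", None)


print(assess(None, (3.6, 1.0)))  # ('present_unseen', 'restroom')
print(assess(None, None))        # ('absent', None)
```

With the camera alone, both calls would look identical: nobody in view. The radar reading, expressed in the same frame, is what separates a patient in the restroom from a patient who is missing.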

Recalibrate proactively. Frames drift when the camera shifts or other conditions change. Using computer vision, we take operational control of the camera to automatically recalibrate its position and restore an accurate field of view before any interpretation takes place.
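One way to notice drift before it corrupts interpretation is to track fixed landmarks in the image and compare them with where they appeared at calibration time. The sketch below covers only that detection step, assuming NumPy. The landmark positions, the 3-pixel tolerance, and the function names are hypothetical, and the recalibration itself (repositioning the camera or refitting the mapping) is outside its scope.

```python
import numpy as np


def mean_drift_px(reference_px, observed_px) -> float:
    """Average pixel displacement of fixed landmarks since calibration."""
    ref = np.asarray(reference_px, dtype=float)
    obs = np.asarray(observed_px, dtype=float)
    return float(np.linalg.norm(obs - ref, axis=1).mean())


def needs_recalibration(reference_px, observed_px,
                        tolerance_px: float = 3.0) -> bool:
    """Flag the frame as stale when landmarks moved more than the tolerance."""
    return mean_drift_px(reference_px, observed_px) > tolerance_px


# Hypothetical landmarks recorded at calibration time (pixels).
reference = [(100, 400), (540, 400), (420, 150), (220, 150)]
# The same landmarks after the camera was nudged 35 px during cleaning.
after_cleaning = [(u + 35, v) for u, v in reference]

print(mean_drift_px(reference, after_cleaning))        # 35.0
print(needs_recalibration(reference, after_cleaning))  # True
```

The check runs before any inference, so a stale frame is caught as a calibration issue instead of surfacing later as an unexplained false alert.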

This is not an algorithm problem. It is a systems problem of establishing a shared canonical frame into which every sensor output is transformed before any inference is attempted.

Those who navigate this well will not be the ones who wait for the problem to be resolved for them. They will be the ones who design explicitly for the gap, who treat reference frame alignment as infrastructure rather than assumption, and who recognize that the most dangerous AI failure mode is not the one that looks like an error.

The Shift

We are moving from model-centric AI to system-centric AI. In this next phase, success will not be defined by who has the most advanced model or the largest dataset.

Success will be defined by who can align signals across modalities, ground outputs in real-world context, and operate reliably within the correct frame of reference.

AI does not fail simply because it lacks intelligence.

It fails because it interprets the world without sharing our frame of reference. Bridging that gap is not just a technical challenge. It is a product challenge and a systems challenge.

And solving it is what will separate AI that works in controlled demonstrations from AI that works reliably in the real world.
