NIST AI RMF Decoded: Map 3.1 – Assessing Intended AI Benefits

NIST AI RMF Map 3.1: "Potential benefits of intended AI system functionality and performance are examined and documented"

A double-exposed photograph showing Tesla in his Colorado Springs laboratory

This famous 1899 photo of Nikola Tesla sitting amidst millions of volts of electricity sent one clear message: "Look how much power this machine can generate - and how safe it is!"

But if Tesla had actually sat there while that machine was running, he would have been killed instantly. The photo was a double exposure - a publicity stunt designed to showcase his ambitious work on wireless energy transmission.

In my recent post on Map 2.3 of the NIST AI Risk Management Framework, I broke down construct validity and the risks of flawed data. Today, we move to the next milestone, Map 3.1:

"Potential benefits of intended AI system functionality and performance are examined and documented."

Most organisations think they have this covered: "Our model optimises workflow, increases efficiency, cuts costs. These are the benefits - Map 3.1 checked!"

But the NIST AI RMF is not looking for a glossy marketing pitch. If you document AI benefits based solely on controlled lab data without accounting for real-world chaos, you are doing exactly what Tesla did: creating a beautiful illusion.

NIST AI RMF Map 3 full description

🛑 The biggest trap: Documenting theoretical benefits without real-world verification

Never assume high performance in a lab automatically translates to benefits in production.

During the pandemic, hundreds of predictive AI tools were built to diagnose COVID-19 from lung scans. On paper, the benefits were massive. In reality, almost none were clinically useful.

Why? The "lazy" models were gaming the data: instead of learning medical pathology, they learned to recognize specific hospital X-ray machine fonts, text labels, and even patient positions! Their documented "benefits" were completely meaningless in practice.

Source

🟢 The solution: grounding functionality in human impact

Examining benefits is a continuous process where technical metrics must be balanced against real-world human impacts.

Consider staff scheduling. An automated tool designed solely to optimise labor efficiency might produce unpredictable shifts and back-to-back "clopenings" (forcing an employee to close a store late at night and open it a few hours later). A responsible team can re-examine the core benefit, realise that operational efficiency cannot come at the expense of workforce stability, and update the model's constraints to explicitly prioritise employee well-being.

This is no longer just a best practice - it is fast becoming the law.

In 2026, New South Wales became the first Australian jurisdiction to regulate algorithmic management and worker surveillance by inserting Section 21A into the Work Health and Safety Act 2011 (NSW). This turns abstract compliance checks into a legally binding enforcement mechanism to protect workers from the unintended fallout of automated dispatch and performance metrics.

Lessons from the field: The PreHaRM project

The PreHaRM project (Predictive Harm Response Management) is a predictive tool co-developed in South Australia to forecast patient falls, medication errors, and workplace violence. While the benefits looked impressive on paper, the real-world rollout exposed massive institutional hurdles:

  • Governance bottlenecks forced the team through a 443-day journey to clear ethics approvals, driven by institutional unfamiliarity with AI risk.
  • Progress stalled because the technical and medical teams worked in complete isolation. Breakthroughs only happened midway through the project when the two groups merged their meetings into a single, collaborative unit.

🛠️ Your Map 3.1 action plan

  1. How do we verify the benefit empirically?

You cannot claim an AI "improved" things if you didn't measure the starting point. Establish baselines: for example, if your AI promises to reduce patient falls, your baseline is the historical number of falls per month.

Then check the side effects. If an AI lowers patient falls but does so by constantly firing high-stress alerts, it triggers alert fatigue. Nurses burn out, call in sick, and patient care actually degrades. Calibrate the algorithm to trigger high-level alerts only for genuine, high-risk situations.
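As a rough sketch of both checks, the Python below compares a hypothetical pre-deployment baseline with post-deployment counts and, alongside it, measures how many alerts a given risk threshold fires and how many of them were genuine. Every function name, every number, and the 0.7 precision target are invented for illustration; none of them come from NIST or from a real deployment.

```python
from statistics import mean


def benefit_vs_baseline(baseline_counts, post_counts):
    """Compare mean monthly incident counts before and after deployment."""
    if not baseline_counts or not post_counts:
        raise ValueError("need at least one month of data on each side")
    baseline = mean(baseline_counts)
    observed = mean(post_counts)
    change = observed - baseline
    # Relative change is undefined when the baseline is zero.
    relative = change / baseline if baseline else None
    return {"baseline": baseline, "observed": observed,
            "absolute_change": change, "relative_change": relative}


def alert_stats(scores, outcomes, threshold):
    """Return (alerts fired, precision, recall) for one alert threshold."""
    fired = [outcome for score, outcome in zip(scores, outcomes)
             if score >= threshold]
    true_alerts = sum(fired)
    total_events = sum(outcomes)
    precision = true_alerts / len(fired) if fired else None
    recall = true_alerts / total_events if total_events else None
    return len(fired), precision, recall


def lowest_threshold_meeting_precision(scores, outcomes, candidates,
                                       min_precision):
    """Pick the lowest candidate threshold whose alerts are precise enough."""
    for threshold in sorted(candidates):
        _, precision, _ = alert_stats(scores, outcomes, threshold)
        if precision is not None and precision >= min_precision:
            return threshold
    return None  # no candidate meets the target


# Hypothetical data, invented for illustration only.
falls_before = [12, 15, 11, 14]   # monthly falls before deployment
falls_after = [10, 9, 11, 10]     # monthly falls after deployment
summary = benefit_vs_baseline(falls_before, falls_after)

risk_scores = [0.2, 0.4, 0.55, 0.6, 0.8, 0.9]   # model risk scores
fell = [0, 0, 1, 0, 1, 1]                        # 1 = patient actually fell
chosen = lowest_threshold_meeting_precision(
    risk_scores, fell, candidates=[0.3, 0.5, 0.7], min_precision=0.7)

print(summary)
print(chosen, alert_stats(risk_scores, fell, chosen))
```

A plain before-and-after comparison like this does not prove the AI caused the change (seasonality or staffing changes could explain it too), so treat it as a starting point. The alert count is reported next to the headline metric on purpose: a benefit that looks good on falls but floods the ward with alerts is not yet a benefit.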

  2. Does the functionality survive real-world noise?

Lab datasets are pristine. Real-world data is messy, filled with typos, late entries, and missing fields. Intentionally inject broken or missing data during testing to see if the model holds up.
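One way to try this, sketched in Python under loud assumptions: `toy_predict` is a made-up stand-in for your real model, the records are invented, and "broken data" is simplified to randomly blanked fields. The idea is only to compare the same metric on clean and deliberately corrupted copies of the test set.

```python
import random


def corrupt_records(records, missing_rate, seed=0):
    """Return a copy of the records with some field values set to None."""
    rng = random.Random(seed)  # seeded so the test run is repeatable
    return [
        {key: (None if rng.random() < missing_rate else value)
         for key, value in record.items()}
        for record in records
    ]


def accuracy(predict, records, labels):
    """Fraction of records where the prediction matches the label."""
    hits = sum(1 for record, label in zip(records, labels)
               if predict(record) == label)
    return hits / len(records)


def toy_predict(record):
    """Hypothetical stand-in model; treats missing values as 0."""
    age = record.get("age") or 0
    prior_falls = record.get("prior_falls") or 0
    return int(age >= 75 or prior_falls >= 2)


# Invented test set, for illustration only.
records = [
    {"age": 82, "prior_falls": 0},
    {"age": 60, "prior_falls": 3},
    {"age": 45, "prior_falls": 0},
    {"age": 70, "prior_falls": 1},
]
labels = [1, 1, 0, 0]

clean_accuracy = accuracy(toy_predict, records, labels)
for rate in (0.0, 0.25, 0.5, 1.0):
    noisy = corrupt_records(records, missing_rate=rate)
    print(rate, accuracy(toy_predict, noisy, labels))
```

If the metric collapses at a missing-data rate you already see in production, the documented benefit needs a caveat or the model needs a fallback.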

Additionally, run a "silent trial" where the AI processes live data feeds in the background without showing alerts to staff, verifying real-world performance before going live.
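A minimal sketch of the mechanics, assuming a hypothetical `SilentTrial` class and a made-up risk model: predictions are logged but never returned to the caller, and real outcomes are attached later so the two can be compared before go-live.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class SilentTrial:
    """Hypothetical shadow-mode wrapper: log predictions, show nothing."""
    predict: Callable[[dict], Any]
    log: list = field(default_factory=list)

    def observe(self, case_id, features):
        """Score a live case and log it. Deliberately returns nothing."""
        self.log.append({"case_id": case_id,
                         "prediction": self.predict(features),
                         "outcome": None})

    def record_outcome(self, case_id, outcome):
        """Attach the real-world outcome once it is known."""
        for entry in self.log:
            if entry["case_id"] == case_id:
                entry["outcome"] = outcome

    def agreement(self):
        """Share of resolved cases where prediction matched outcome."""
        resolved = [e for e in self.log if e["outcome"] is not None]
        if not resolved:
            return None
        matches = sum(e["prediction"] == e["outcome"] for e in resolved)
        return matches / len(resolved)


# Invented cases and a made-up model, for illustration only.
trial = SilentTrial(predict=lambda f: int(f["risk"] > 0.5))
shown_to_staff = trial.observe("a", {"risk": 0.9})
trial.observe("b", {"risk": 0.2})
trial.observe("c", {"risk": 0.7})
trial.observe("d", {"risk": 0.6})   # outcome not known yet

trial.record_outcome("a", 1)
trial.record_outcome("b", 0)
trial.record_outcome("c", 0)

print(trial.agreement())
```

A real silent trial would also need data governance, a pre-agreed evaluation window, and criteria for what counts as "good enough"; this only shows the log-then-compare pattern.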

More on silent trials in clinical AI apps: https://pmc.ncbi.nlm.nih.gov/articles/PMC9424628/pdf/fdgth-04-929508.pdf
💡 For more papers on silent trials, check out this scoping review: A scoping review of silent trials for medical artificial intelligence

  3. What happens when benefits conflict?

Address the trade-offs - because maximizing a primary benefit (like speed or cost-cutting) often degrades a secondary factor (like safety or employee retention).
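One hedged way to make a trade-off explicit is to write it down as a guardrail: the primary benefit only counts if named secondary metrics stay within an agreed tolerance. The sketch below assumes higher values are better for every guardrail metric; the metric names, numbers, and tolerances are all invented.

```python
def evaluate_tradeoff(primary_gain, guardrails):
    """Accept a benefit only if no guardrail metric drops too far.

    guardrails: list of (name, baseline, observed, max_allowed_drop),
    where higher values are assumed to be better.
    """
    breaches = [name
                for name, baseline, observed, max_drop in guardrails
                if baseline - observed > max_drop]
    return {"accept": primary_gain > 0 and not breaches,
            "breaches": breaches}


# Invented figures, for illustration only.
decision = evaluate_tradeoff(
    primary_gain=0.12,  # e.g. a 12% cost reduction
    guardrails=[
        ("staff_retention_rate", 0.90, 0.84, 0.03),
        ("safety_audit_score", 0.95, 0.94, 0.02),
    ],
)
print(decision)
```

Here the cost saving is real, but the drop in retention exceeds its tolerance, so the benefit is not accepted as documented. The tolerances themselves are a governance decision, not something code can choose for you.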

  4. Implement participatory approaches - engage end-users and gatekeepers early

Don't build in a vacuum. Work directly with system end-users to understand and document the system’s potential benefits, efficacy, and the interpretability of the AI’s task output. Crucially, as learned in the PreHaRM project, extend this human-centered framing to include administrative gatekeepers, data custodians, and institutional approvers. Treat them as active co-contributors and co-creators of AI systems, rather than peripheral compliance checkpoints.

Iterative co-design process for the AI system's graphical user interface
Workshop activities (https://www.researchprotocols.org/2023/1/e47717/PDF)
Screenshot from a web-based survey: https://www.researchprotocols.org/2023/1/e47717/PDF

  5. Upskill your team

Hire or train internal team members to gather feedback from users and operators, and turn those insights directly into AI design requirements.

  6. Actively dismantle organizational silos

Mirror the PreHaRM framework by bringing your developers and engineers together with your front-line operators and domain heads into unified, joint working structures to foster shared understanding and agile risk-mitigation.

  7. Establish active communication mechanisms

Create formal, interpersonal mechanisms for regular, real-time communication between relevant AI actors and stakeholders regarding system design or deployment decisions, replacing passive email chains with direct dialogue.

  8. Enforce transparency and documentation

To fully satisfy Map 3.1, your organisation should explicitly document and answer the following three governance questions:

  • Communication: Have the benefits of the AI system been clearly communicated to end-users?
  • Enablement: Have the appropriate training material and disclaimers about how to adequately use the AI system been provided to end-users to avoid automation bias?
  • Systemic risk: Has your organization implemented a risk management system to address risks involved in deploying the identified AI system (e.g., personnel risk or changes to commercial objectives)?

Mapping your benefits is not about writing a corporate wish list for your AI: it is about defining the objective boundaries of its success. True responsible AI innovation requires realizing that early engagement, structured cross-functional dialogue, and trust are not optional administrative hurdles - they are the core infrastructure that enables a safe system of work.

What about Tesla?

On paper, his Colorado Springs lab was a perfectly isolated testing ground. But when he finally cranked his transmitter to full capacity, the real world pushed back: he had ignored the interconnected grid around him. The massive high-frequency feedback surged straight into the local infrastructure, setting the town’s primary generator on fire and plunging all of Colorado Springs into pitch darkness.

Colorado Springs Laboratory (Image source)

Down the hill, sparks jumped from the dirt to pedestrians' feet, water taps sent sparks, and horses panicked as current traveled through their metal shoes.

NIST AI RMF Map 3.1 reminds you: build for the ecosystem, not the lab.