The alert fires. The supervisor reads it. Figures out what it means. Opens a different system. Creates a work order. Notifies the operator. Tries to remember to document it. Sometimes does.

That sequence, data to human to decision to action, is where most manufacturing AI investments stop.

For most discrete manufacturers, the first wave of AI investment focused on better dashboards, clearer anomaly alerts, and more data visible from more places. But the same supervisors still find themselves doing the same manual triage every morning, now with a longer queue to work through.

The problem is architectural. When AI surfaces an insight but a human must still translate that insight into every subsequent action (opening a work order, checking the material cert, notifying the operator, logging the response), the AI has improved the information layer, but has left the execution layer unchanged.

In industries like Aerospace, Medical Devices, and Automotive, where a single untracked deviation can cascade into a full CAPA investigation or a recall review, that gap can quickly become a compliance exposure.

Agentic AI has the potential to close that gap. The deciding factor then becomes where in the stack the agent lives.

Going From Insight Engine to Digital Worker

For many, Generative AI acts as a digital consultant. It’s effective at answering questions, producing summaries, surfacing recommendations, and drafting documentation when you ask it to. The output lands in a human's hands. The human decides what to do. The chain of action starts and ends with a person.

Agentic AI can go much further. Given a defined goal and a set of operational guardrails, an agentic system perceives conditions, evaluates options, executes across connected systems, and logs the outcome. No human decision required at each step.

In manufacturing, that distinction is sharp. An MES with “AI capabilities” may see a torque deviation at Station 4, and respond: "Last three units measured 22.3 Nm against a nominal of 24–26 Nm. Recommended action: review recent assemblies." The operator then reads the alert, decides whether to hold the work order, logs the deviation, and maybe emails a quality technician.

An agentic system, seeing the same deviation, can identify the last conforming unit by serial number, hold the downstream work orders that depend on this lot, route a rework instruction to the quality technician's interface with the evidence package pre-assembled (measurements, material certificate, operator ID, revision-controlled work instruction), and log its own action in the traceable execution record.

The distinction is what happens between "the system detected something" and "the problem was addressed."

Deloitte projects a fourfold increase in manufacturing agentic AI adoption, from 6% today to 24% by 2027. The adoption gap reflects less a shortage of interest than a shortage of clarity on what production-ready agentic deployment requires.

Early AI adopters are focused on deploying solutions that tell manufacturers what's happening. Few are exploring solutions that can actually take action.

Why Agentic AI Falls Short When It Sits Too Far from the Work

ERP-level agents (SAP Joule, Oracle's embedded AI, Infor's Industry AI Agents) address real planning problems. They can adjust production schedules in response to supply disruptions, optimize capacity allocation, and surface order management decisions with more context than a static dashboard could ever provide. Their deployment layer is the enterprise planning system.

That layer sits above where the work happens. These solutions are not designed to capture frontline execution context in real time.

When an agent operates on ERP data, it makes decisions about plans. When an agent operates on frontline execution data, it makes decisions about work. In discrete manufacturing, the compliance record (the as-built/as-inspected record) reflects the work as it was performed.

The Data the Agent Needs

For an agentic system to make a safe, autonomous decision in a complex manufacturing environment, it needs more information than can be found on just a work order. It needs the material lot that was pulled, the data collected at the station during execution, the operator identity and qualification, the work instruction that governed the step, and the downstream assemblies that depend on this unit passing.

This operational context lives at the execution layer: the guided workflow, the point-of-work data capture, the step-level records a frontline platform assembles as work proceeds. Without it, an agent reasons about conditions it doesn't fully see. Its decisions may be fast. They may even be right most of the time. But they cannot be defended in a regulatory audit, because the evidence chain that supports the decision doesn't exist.

The gap between where ERP-layer agents operate and where defensible autonomous decisions must be made is the architectural problem most agentic AI deployments haven't yet solved. What closes it is the context graph: the real-time web of operational data that connects every action to the material, the workflow, the operator, and the specification it was held to.


The Context Graph: Why Frontline Data Makes Agent Decisions Defensible

A context graph is the live web of operational data that links a specific unit of work to the things that matter for execution and compliance: the material lot, process parameters, operator identity and qualification, active work-instruction revision, timestamps, and downstream dependencies.

In Tulip's platform, it's assembled continuously as work proceeds: every step a guided workflow captures, every parameter a sensor records, every material lot an operator scans, every decision point logged with its timestamp and context. By the time an agent needs to act, the context it's reasoning from reflects the live state of that specific unit, on that specific line, as of this moment.

A well-instrumented frontline platform captures six dimensions that help determine whether an agent's autonomous decision holds up under scrutiny:

  • Material certifications and lot traceability linked to the specific serial or lot number

  • Process parameters measured at the station in real time (the actual data captured)

  • Operator identity and qualification status for each step performed

  • Work instruction revision that was active and governed at the time of execution

  • Step-level timestamps for every action in the sequence

  • Downstream dependencies, specifically which assemblies or operations rely on this unit passing
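The six dimensions above can be sketched as a single record an agent would reason from. This is an illustrative shape only; the field names and structure are hypothetical, not Tulip's actual data model.

```python
from dataclasses import dataclass
from datetime import datetime

# Illustrative sketch of the execution context an agent reasons from.
# All field names are hypothetical, not Tulip's actual schema.
@dataclass
class ExecutionContext:
    serial_number: str
    material_lot: str             # lot traceability, linked to a material cert
    process_parameters: dict      # actual measurements captured at the station
    operator_id: str
    operator_qualified: bool      # qualification status for this step
    work_instruction_rev: str     # revision active at time of execution
    step_timestamps: dict         # step name -> completion time
    downstream_work_orders: list  # assemblies that depend on this unit passing

ctx = ExecutionContext(
    serial_number="SN-0412",
    material_lot="TL-2209-A",
    process_parameters={"torque_nm": 22.3},
    operator_id="OP-117",
    operator_qualified=True,
    work_instruction_rev="WI-4421 Rev C",
    step_timestamps={"torque_check": datetime(2025, 11, 3, 9, 14)},
    downstream_work_orders=["WO-4432", "WO-4437"],
)
```

A record like this is what turns "the agent flagged a deviation" into an evidence chain: every dimension the auditor will ask about is attached to the unit at the moment of the decision.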

When an agent has access to this level of detail, its decisions are grounded.

The decision trail follows from the data. An auditor reviewing a flagged work order doesn't have to reconstruct what happened; the context graph shows the conditions that triggered the decision, the data the agent acted on, and what it did.

That is what distinguishes AI at the execution layer from AI at the planning layer.

Evidence by Default

In Aerospace, Medical Device, and any ISO- or FDA-regulated discrete manufacturing environment, compliance documentation is never "done". Every non-conformance is a potential record request. Every work order completion is a potential as-built document. Audit readiness is a posture you hold continuously, and the question to ask is always the same: can you show exactly what happened?

When AI agents operate outside of the execution layer (in ERP, BI platforms, or standalone AI tools), the answer to that question becomes harder to give.

The agent's actions don't automatically appear in the as-built record. Someone has to log them manually, or they're not there. In practice, that means the record has gaps wherever the agent touched something without a structured capture mechanism beneath it.

When agents operate inside the execution layer, the record builds itself. Every action the agent takes (enforcing a quality check, pausing a downstream work order, routing a rework instruction, flagging a deviation) is captured in the same traceable execution record as the operator's own steps. Not in a separate AI log that compliance has to reconcile later. In the record. Linked to the triggering condition, the data the agent acted on, and the outcome it produced.
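To make the "in the record" point concrete, here is a sketch of the shape such a log entry might take: the triggering condition, the data acted on, the actions, and the outcome in one linked entry. The field names are hypothetical, not Tulip's actual schema.

```python
# Illustrative shape of an agent action as it might land in the traceable
# execution record; all field names are hypothetical, not Tulip's schema.
agent_action = {
    "actor": "agent:quality-deviation-responder",
    "trigger": {
        "condition": "torque_below_lower_control_limit",
        "work_order": "WO-4421",
        "readings_nm": [22.3, 22.1, 22.4],
        "lower_limit_nm": 24.0,
    },
    "data_acted_on": {
        "material_lot": "TL-2209-A",
        "wi_revision": "Rev C",
    },
    "actions": [
        "hold_downstream_work_orders",
        "route_rework_instruction",
    ],
    "outcome": "containment_complete",
    "timestamp": "2025-11-03T09:14:22Z",
}
```

The key property is that this entry lives in the same record as the operator's own steps, so nothing has to be reconciled from a separate AI log later.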

This is what "evidence by default" looks like. Compliance documentation becomes a byproduct of normal production.

Tulip's Composable AI Agents are built around this model. Every agent action (the triggering event, the decision, the execution, the outcome) is recorded within the same platform that captured the underlying execution data.

The regulatory stakes around AI use have never been higher. The EU AI Act, now in force, imposes penalties up to €35 million or 7% of global annual turnover for violations involving high-risk AI applications. Manufacturing quality control systems are likely in scope. "The AI did it" is not an audit explanation. "Here is every decision the agent made, the data it acted on, and the outcome it produced" is.

Human-in-the-Loop: Governing the 90% So You Can Trust the Rest

What happens when the agent gets it wrong, and the team didn't know it was making the decision? That question stops more agentic AI pilots from reaching production than any capability gap.

It's a legitimate concern. According to Deloitte, only 21% of companies currently deploying agentic AI have a mature governance model. The risk in most early deployments is less the agent itself than the absence of a defined boundary around what the agent is and isn't authorized to do.

The 90/10 Model

Most of what an agent encounters on a discrete manufacturing floor is routine. A measurement within a known parameter range. A scheduling adjustment within established capacity limits. A shift summary compiled from structured execution data. These conditions follow predictable patterns, have bounded decision logic, and produce outcomes the compliance team has already reviewed. Call this the 90%: situations where the agent has sufficient context, clear rules, and the authority to act.

The 10% is where escalation belongs. Novel failure modes the agent hasn't encountered in its configured scope. Decisions that cross compliance thresholds requiring an authorized signature. Actions with significant downstream production impact (scrapping a batch, stopping a line). Any decision that generates a regulatory record requiring human attestation. These don't belong in autonomous execution, and a well-configured agent knows the difference.

In practice, defining the 90/10 boundary means three things:

  1. which decisions the agent can make without notifying anyone;

  2. which conditions trigger a human review before the agent acts; and

  3. which decisions require supervisor or quality technician approval before execution.

These configurations live in the platform, not the model. They are set by the team responsible for the process, reviewable by compliance, and adjustable as the operation matures.
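One way to picture those three tiers is as plain configuration with a default that escalates anything unrecognized. This is a minimal sketch under assumed names; neither the rule names nor the structure reflect Tulip's actual configuration API.

```python
# Illustrative sketch of a 90/10 governance boundary as configuration.
# Tier and action names are hypothetical, not Tulip's actual API.
GOVERNANCE = {
    "autonomous": [            # ~90%: agent acts, action is logged
        "log_in_spec_measurement",
        "compile_shift_summary",
        "reschedule_within_capacity",
    ],
    "notify_before_acting": [  # human is alerted before the agent proceeds
        "hold_single_work_order",
        "route_rework_instruction",
    ],
    "require_approval": [      # ~10%: supervisor or quality sign-off required
        "scrap_batch",
        "stop_line",
        "close_deviation_record",
    ],
}

def authorization(action: str) -> str:
    """Return the governance tier for a proposed agent action."""
    for tier, actions in GOVERNANCE.items():
        if action in actions:
            return tier
    return "require_approval"  # unknown actions escalate by default

authorization("stop_line")  # -> "require_approval"
```

The design choice worth noting is the default: an action the team never classified escalates rather than executes, which is how a well-configured agent "knows the difference" between the 90% and the 10%.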

Tulip's Composable AI Agents are configurable across all three. Teams define the scope, the escalation triggers, and the interface through which human decisions surface. When a decision requires human input, it comes through the operator's Tulip interface, not a separate AI portal the operator has to check in parallel. The workflow is the governance mechanism.

What Good Governance Looks Like

McKinsey's November 2025 State of AI report found that organizations with the strongest agentic AI deployments share one common characteristic: human-in-the-loop oversight frameworks where humans supervise, validate, and intervene.

That looks like:

  • A complete audit trail: every agent decision logged with its triggering condition, the data it acted on, the action it took, and the outcome

  • Human override at the operator level: any agent action reviewable and reversible by an authorized user

  • Scope controls: agents configured to specific action sets; any expansion requires an explicit configuration change

  • Escalation through the operator interface: decisions requiring human judgment surface in the workflow the operator is already using

  • Periodic performance review: agent behavior evaluated against quality and compliance metrics on a defined cadence

The teams that deploy agents confidently are the ones who designed the governance model first and the automation second.

How Agentic AI Integrates With Your Existing Tech Stack

If you run a 15-year-old SAP installation, you are not replacing it to deploy AI agents.

Manufacturers with deeply customized ERP or legacy MES systems have too much operational logic embedded in those platforms to remove them on a timeline that makes business sense.

Agentic AI at the execution layer doesn't require rip and replace.

Agents connect to existing systems via APIs. A quality agent working within Tulip can query SAP for the material certification linked to the current work order, pull the control limits from the work instruction specification, read the measurement from corresponding equipment, compare them, and act. All in a single orchestrated workflow. No system replaced. No core infrastructure migrated. The ERP still holds the records it was designed to hold; the execution platform captures the frontline context ERP was never designed to see; the agent bridges the two at the decision point.
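The orchestrated check described above can be sketched as a single function. The client objects and their methods (`material_certification`, `control_limits`, `latest_measurement`) are hypothetical stand-ins for API calls; they are not functions from Tulip's, SAP's, or any vendor's actual interface.

```python
# Hypothetical sketch of the orchestrated quality check described above.
# erp, spec, and station stand in for real API clients; none of these
# method names come from Tulip's or SAP's actual interfaces.

def quality_check(work_order: str, erp, spec, station) -> dict:
    cert = erp.material_certification(work_order)     # query ERP (e.g. SAP)
    low, high = spec.control_limits(work_order)       # limits from the WI spec
    reading = station.latest_measurement(work_order)  # read station equipment

    in_spec = low <= reading <= high
    result = {
        "work_order": work_order,
        "material_cert": cert,
        "reading": reading,
        "limits": (low, high),
        "in_spec": in_spec,
    }
    if not in_spec:
        # hold downstream work and route rework, per configured governance
        result["action"] = "hold_downstream_and_route_rework"
    return result  # written to the traceable execution record
```

No system in this sketch is replaced; the agent reads from each system's existing API and acts at the decision point, which is the augmentation pattern the section describes.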

McKinsey named this the "great AI agent and ERP divide." The finding is that the divide closes through augmentation, not replacement. Platforms like Tulip are the bridge.

For IT architects evaluating this decision, the relevant questions are:

  • What APIs exist for the ERP and MES systems currently in production?

  • How structured is the data those systems expose?

  • What does governed integration look like?

Adding a layer that reads from and writes to production systems requires security review and change management, even when it doesn't require rearchitecting the core stack.

Tulip integrates with SAP, Oracle, historians, PLCs, and a range of MES systems through standard APIs. The integration work is real. The infrastructure replacement isn't.

Agentic AI Use Cases in Discrete Manufacturing

The difference between how agents are described and how they perform in production gets clearer in operational terms: what condition triggered the agent, what it did, and what ended up in the record. The examples below show what that looks like in practice:

Quality Deviation Response

An operator measures three consecutive torque readings below the lower control limit on Work Order 4421, material lot TL-2209-A. On a line without agents, that measurement fires an alert on the quality dashboard. The quality technician reviews it during the next monitoring cycle, opens the MES, decides whether to hold the lot, notifies the operator, and tries to log the response in their QMS. The time between detection and containment is measured in hours. The audit trail, if it exists, is assembled from emails and notes.

When shifted to the execution layer, the agent identifies the last conforming unit by serial number. It holds the downstream work orders that depend on this lot. It routes a rework instruction to the quality technician's interface, pre-assembled with serial numbers, measurements, the material certificate, and the operator ID. The agent action is logged in the traceable execution record alongside the original measurement. By the time the quality technician reaches the queue, the evidence package is already there.

Response time drops from hours to minutes. The as-built record doesn't need reconstruction.

NPI Transfer

New product, 47-step assembly process, BOM changes from the previous revision. The three previously separate work instructions need reconciliation before first article. Without agents, process engineers do that manually: comparing revision histories, building training packages, finding gaps during first articles rather than before them.

With an agent, the delta between the new BOM and the active work instruction revision is identified before production begins. Changed steps are surfaced for engineer review and approval. The updated guided workflow reflects the new revision from Unit 1. Operator training points to the specific changed steps, not the full 47-step procedure. The traceable record of which work instruction revision governed which unit is established at the first article, not retroactively.

Shift Handover

Three open deviations. Two machines with maintenance flags. One work order behind schedule. The outgoing supervisor's written summary may or may not capture all of it, depending on how the shift went. The incoming supervisor's first fifteen minutes are usually spent getting up to speed rather than managing.

An agent compiles the handover automatically from execution data: open deviations with status and timestamp, machine flags with last known condition, production status against plan. The outgoing supervisor reviews and approves. The incoming supervisor sees a structured summary before reaching the floor. Open items don't fall through the transfer.

Validation Documentation Generation

New application built for a regulated production line in a medical device facility. GxP validation guide required before deployment. Without an agent, the validation team reviews the application structure manually, documents intended use, maps risk controls, writes test cases. Depending on application complexity, this takes days to weeks.

Tulip's Validation Guide Generator reads the application structure, maps it against GxP documentation requirements, and produces a draft guide automatically. The validation team reviews, adjusts, and approves. The work shifts from documentation generation to documentation review.

Evaluating Agentic AI Platforms for Discrete Manufacturing

Most agentic AI vendors will tell you their system is purpose-built for manufacturing. A few questions help separate platforms designed for the execution layer from those built for planning, documentation, or generic process automation.

Where does the agent operate in the stack? An agent that lives in ERP improves the insight pipeline. An agent that lives at the execution layer, where work happens, where process parameters are captured, where operator steps are recorded, can close the execution loop. Ask vendors to walk through a specific quality deviation scenario: where the agent gets its data, which systems it touches, and what ends up in the record.

What operational context can the agent see? Quality decisions in discrete manufacturing require material lot data, real-time process parameters, work instruction revision history, and operator-level execution records. If the agent is reasoning from ERP data that syncs nightly, it's working from yesterday's picture of a situation that may have changed in the last hour. Ask what the agent's data latency looks like and whether it can access frontline execution records.

What does the governance model look like in practice? Any vendor can describe a human-in-the-loop approach. Ask them to show it. Can your compliance team configure exactly when agents act autonomously and when they escalate? Is every agent decision logged with its triggering condition and outcome in a format that satisfies a regulatory review? Can an operator override an agent action at the floor level? If the governance model requires a separate portal to review, it will be ignored in practice.

Does it require new infrastructure, or does it connect to what exists? Integration work is unavoidable; wholesale platform replacement is a different conversation. Ask vendors to describe specifically which APIs they use to connect to ERP and MES, what the integration approach requires, and what the maintenance model looks like when the underlying enterprise system updates.

These aren't gotcha questions. They're the questions your compliance team will ask before any production deployment, and vendors who can't answer them clearly are telling you something.

The Difference Between a Pilot and a Production-Ready Deployment

Agentic AI adoption in manufacturing is projected to quadruple within the next two years. Most of that growth will produce pilots.

What separates pilots from production-ready deployments is the architecture: where the agent lives, what context it can see, what governance model bounds its actions, and whether the record it produces can survive regulatory scrutiny.

The execution layer is where the answers to those questions live. The context graph it maintains, the evidence it captures by default, the human-in-the-loop governance model it configures and enforces. These are the conditions under which autonomous action becomes defensible. Without them, agents are just sophisticated alert systems with extra steps.

Tulip's Composable AI Agents are built for this environment: composable, transparent, human-in-the-loop by design, and operating at the frontline execution layer where the work and the evidence live.

If you're ready to explore how Tulip's Composable Agents are helping discrete manufacturers automate their production processes, reach out to a member of our team today!

Orchestrate shop floor decisions with agentic AI

Use Tulip to connect data and workflows so AI agents execute tasks in context, improving coordination and real-time performance.

Frequently Asked Questions
  • How is agentic AI used in discrete manufacturing?

    Discrete manufacturers use agentic AI solutions like Tulip to autonomously execute multi-step responses to production conditions: detecting a dimensional deviation, tracing it to the specific work order and material lot, pausing downstream operations, and routing a rework instruction to the appropriate operator, all without a supervisor manually connecting each step.

    Unlike generative AI, which produces recommendations for humans to act on, agentic AI closes the loop between detection and response within the production workflow itself.

  • What is the difference between agentic AI and generative AI in manufacturing?

    Generative AI creates content: summaries, reports, recommendations, documentation.

    Agentic AI takes action. In manufacturing, generative AI might produce a shift summary or suggest a corrective action. Agentic AI detects the triggering condition, executes the corrective workflow, and logs the action in the traceable execution record, autonomously, within defined operational boundaries. The practical difference is loop closure: generative AI requires a human to act on its output; agentic AI executes the action itself.

  • What does human-in-the-loop governance mean for manufacturing AI agents?

    Human-in-the-loop governance in manufacturing AI means defining precisely when agents act autonomously and when they must escalate to a human. For routine, well-defined decisions (enforcing a standard quality check, rescheduling a production task within established parameters, compiling a shift handover), agents act independently. For decisions that cross compliance thresholds, involve novel failure modes, or require regulatory sign-off, the agent surfaces the decision through the operator interface for human review before acting. Teams configure the boundary; agents respect it. Every decision, whether autonomous or escalated, is logged with its triggering context and outcome.

  • Can Tulip's Composable AI Agents work with existing ERP and MES systems?

    Yes. Tulip's composable agents don't require replacing your existing ERP or MES infrastructure.

    Agents connect to legacy systems via APIs, reading work order data, material certifications, and BOM structures from SAP or Oracle, and process parameters from MES systems and historians, while executing actions within frontline workflow platforms. Tulip acts as an orchestration layer that bridges enterprise data with real-time execution context, giving manufacturers agentic capability without a wholesale infrastructure replacement program.

  • How do AI agents create audit-defensible records in regulated industries?

    When AI agents operate within a frontline operations platform like Tulip, every action they take (enforcing a quality check, pausing a work order, routing a rework instruction, generating a validation document) is captured in the same traceable execution record as the operator's own actions.

    The as-built and as-inspected record accumulates as work proceeds, rather than being assembled retrospectively before an audit. This "evidence by default" model means compliance documentation is a continuous byproduct of production, not a separate preparation task.

  • What manufacturing use cases are most deployment-ready for agentic AI today?

    The highest-value, most operationally bounded agentic AI use cases in discrete manufacturing are:

    • quality deviation response (detecting out-of-spec conditions and automatically triggering rework workflows)
    • production scheduling adjustment (rerouting work orders in response to equipment downtime or capacity changes)
    • validation documentation generation (auto-producing GxP or ISO validation guides from application structure)
    • shift handover automation (compiling production status from execution data)
    • new product introduction workflow setup (aligning work instructions to updated BOMs before first article production).

    These use cases share well-defined trigger conditions, bounded decision logic, and clear compliance value.

  • What context does an AI agent need to make safe decisions on the shop floor?

    Safe autonomous decisions in discrete manufacturing require agents to access operational context that ERP alone cannot provide: the specific material lot used at the station, the process parameter readings taken in real time, the revision of the work instruction that was active, the operator who performed the step, the step-level timestamps, and the downstream work orders that depend on this unit.

    This interconnected web of frontline data, what Tulip calls the context graph, is what grounds agent decisions in operational reality and makes them auditable. Agents operating without this context are reasoning from incomplete information, and their decisions cannot be defended in a regulatory review.