What is a workflow audit for agent readiness?

A workflow audit decomposes an HR process into its individual decision points and classifies each as human-only, rule-based, or AI-eligible. The output is at once an agent design map, a works-council agreement template, a Decision Layer specification, and the EU AI Act documentation.

How many decision points does a typical HR workflow have?

Typical counts: time-tracking validation 4 - 6, sick-leave processing 10 - 15, onboarding 20 - 30, multi-entity payroll 40+. Experienced HR staff make most of these decisions unconsciously and therefore systematically underestimate the count.

What is the difference between a rule-based and an AI-eligible decision?

A rule-based decision follows deterministic if-then logic from a law, collective agreement, or works agreement. An AI-eligible decision interprets unstructured information against known categories - classify, extract, match. Under the EU AI Act both require the same audit trail because an affected person must be able to challenge either.

How long does a workflow audit take?

Auditing a single workflow takes 2 - 4 hours with three participants: a process expert, a domain owner for the business rules, and a documenter. The top-10 workflows take roughly two weeks.

How does the audit relate to the EU AI Act?

AI systems used for recruiting, performance management, and promotion decisions are high-risk systems under Annex III of the EU AI Act. Under current law, they require documented risk assessment, human oversight, and challengeability of every individual decision from 2 August 2026; following the provisional Digital Omnibus agreement of 7 May 2026, that deadline is set to be postponed to 2 December 2027 (formal adoption still pending as of June 2026). The audit provides the fact base. Without classification, EU AI Act documentation is guesswork.

How to Audit an HR Workflow for Agent Readiness

“Which HR processes are AI-eligible?” is the wrong question

An HR leader sits in the steering committee of her AI program. The slide says: “Phase 1 - automate sick-leave processing.” Six weeks later the pilot is paused. The reason: “Doesn’t work reliably enough.” What happened? The project treated the workflow as a single unit. But sick-leave processing is not a unit. It is a sequence of twelve decision points - and two of them should have remained with the human.

That is not an anecdote. It is the pattern. AI projects in HR rarely fail at model quality and almost always at the granularity of the question. Whoever asks “Which workflows do we automate?” gets unusable answers. Whoever asks “At which decision points within a workflow?” gets a buildable architecture.

This methodology describes how to ask the right question. It is technology-agnostic - no model preference, no vendor, no stack. It produces four artefacts from one audit: an agent design, a works-agreement template, a Decision Layer specification, and the EU AI Act documentation due for the high-risk deadline of 2 August 2026 (which the provisional Digital Omnibus agreement of 7 May 2026 is set to move to 2 December 2027, formal adoption still pending).

McKinsey estimates that 60 - 70 % of administrative HR activities are automatable with existing technology. Adoption sits at 3 %. The gap is not skepticism. It is the absence of a method that breaks workflows down to individual decision points.

At a Glance

"Automating the workflow" fails. Classifying individual decision points produces an auditable architecture.
A typical HR process has 10 - 30 decision points. Experienced staff make most of them unconsciously - which is why they are missing from the requirements document.
Every decision point is human, rule-based, or AI-eligible - with three explicit tests, not gut feel. Ambiguity is the most common audit trap.
The audit output is simultaneously agent design, works-agreement template, Decision Layer specification, and EU AI Act documentation. Four stakeholders, one document.
Challengeability is architecture, not a compliance afterthought. Bolting it on later means no auditable AI - regardless of model quality.

A typical HR process has twelve decision points, not one

Mapping precedes classification, and it often fails because of a quirk of experienced employees: they make many decisions unconsciously. “Check whether the sick note has a start date” is not perceived as a decision. They just do it. To an agent, every such check is an explicit decision that must be specified.

The method that surfaces these implicit decisions is the “what could go wrong” technique: for each step, ask what could fail or require a different action. Each answer reveals a decision point.

Before mapping, draw the workflow boundary. A workflow has a trigger, a sequence of processing steps, and one or more endpoints. Common mistake: drawing it too wide. “Onboarding” is not one workflow - it is a collection of five to eight sub-workflows (contract generation, IT provisioning, compliance documentation, workspace setup, training enrollment, buddy assignment, probation tracking). Audit each sub-workflow separately.

Take sick-leave processing - trigger: employee submits a sick note. Endpoints: SAP record updated, manager notified, payroll adjusted, return-to-work action scheduled if threshold exceeded. What most organizations describe as one step is a chain of twelve:

#	Decision point	Question the process answers
1	Document classification	Sick note, doctor’s certificate, rehabilitation notice, or something else?
2	Completeness check	Employee name, incapacity period, doctor’s name, doctor’s signature present?
3	Employee identification	Which employee does this belong to?
4	Entity assignment	Which legal entity is the employee in?
5	Collective agreement lookup	Which collective agreement applies?
6	Continued pay eligibility	Within the six-week period of § 3 EFZG (DE)? Four-week waiting period served?
7	Duration assessment	Single day, short absence, or extended absence?
8	Pattern detection	Threshold for the BEM obligation (occupational integration management, § 167(2) SGB IX, DE) crossed?
9	Payroll impact	Overtime cancellation, shift differential, bonus proration?
10	System update	What changes in SAP/SuccessFactors?
11	Notification routing	Who is informed (manager, HRBP, payroll, works council)?
12	Follow-up scheduling	Return date, BEM invitation, occupational-health referral?

Twelve decision points in a process most organizations describe as “employee submits sick note, we process it.” The gap between perceived simplicity and actual complexity is typical - and the reason “we automate sick leave” as a project goal produces no architecture.

Three decision types, three tests, one classification

Each decision point falls into exactly one of three types. There is no in-between: ambiguity points to an audit error, not a special case.

Type H - human decides. Empathy, individual judgment, legal risk if automated, works-council co-determination mandate, or ethical sensitivity. Test question: “Would two different experienced professionals reliably reach the same conclusion on the same case?” If no - Type H. The human stays where law requires it. Not because they are better at it.

Type R - rule-based, deterministic. The rule exists in writing (law, collective agreement, works agreement, documented procedure). Inputs are structured. The output is deterministic. Exceptions are themselves rule-based - or they are separate Type-H decision points. Test question: “Could I write this decision as a spreadsheet formula?” If yes - Type R.

Type A - AI-eligible, probabilistic with bounds. The task is classification, extraction, or matching - not generation, judgment, or evaluation. The set of outcomes is known and finite. The result is verifiable. A confidence threshold can be set; uncertain cases escalate. Test question: “Am I interpreting information against known categories, or judging a unique situation?” If categories - Type A.
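The three test questions can be applied in a fixed order. The following Python sketch assumes the audit team records a yes/no answer per test for each decision point; the names `AuditAnswers` and `classify` are hypothetical, and checking the Type H test first is an assumption chosen so that any doubt lands with the human.

```python
from dataclasses import dataclass
from enum import Enum


class DecisionType(Enum):
    H = "human decides"
    R = "rule-based, deterministic"
    A = "AI-eligible, probabilistic with bounds"


@dataclass(frozen=True)
class AuditAnswers:
    experts_agree: bool        # Would two experienced professionals reliably reach the same conclusion?
    formula_expressible: bool  # Could I write this decision as a spreadsheet formula?
    known_categories: bool     # Interpreting information against known, finite categories?
    verifiable: bool           # Can the result be checked against a correct answer?


def classify(answers: AuditAnswers) -> DecisionType:
    """Apply the three tests in a fixed order (the order is an assumption of this sketch)."""
    if not answers.experts_agree:
        return DecisionType.H  # no reliable agreement: the human decides
    if answers.formula_expressible:
        return DecisionType.R  # deterministic rule from a written source
    if answers.known_categories and answers.verifiable:
        return DecisionType.A  # classify, extract, match with a checkable result
    return DecisionType.H      # assumption: anything else stays with the human
```

For the document-classification point in the sick-leave example, the answers would be `AuditAnswers(True, False, True, True)`, which yields Type A. The code does not replace the judgment behind each answer; it only makes the order of the tests explicit and repeatable.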

The sick-leave table classifies as follows:

#	Decision point	Type	Reasoning
1	Document classification	A	Unstructured input, classified into known categories, confidence-scoreable
2	Completeness check	A	Field extraction with known required fields, verifiable
3	Employee identification	A	Fuzzy name matching, confidence-scoreable
4	Entity assignment	R	Employee-ID → entity, deterministic lookup
5	Collective agreement lookup	R	Entity + employee category → CA, deterministic
6	Continued pay eligibility	R	Start date + history + § 3 EFZG, pure calculation
7	Duration assessment	R	Calendar arithmetic
8	Pattern detection	R	Threshold calculation (response to threshold is a separate H point)
9	Payroll impact	R	Absence type + pay rules
10	System update	R	Executes the results of points 4 - 9
11	Notification routing	R	Routing rules from entity, type, threshold
12	Follow-up scheduling	R	Rule-based (the conversation itself is H, separate workflow)

Nine Type R, three Type A, no Type H. The follow-up actions that require human judgment (the BEM conversation, the return-to-work interview) are separate workflows with their own classification - they sit outside sick-leave processing.
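Row 6 shows what “pure calculation” means for a Type R point. The Python sketch below computes the continued-pay days for a single sick-leave episode, assuming a four-week waiting period (§ 3(3) EFZG) and a six-week cap counted in calendar days (§ 3(1) EFZG). It is deliberately simplified: the function name is hypothetical, and the history check for repeated episodes of the same illness, which the table lists as an input, is left out.

```python
from datetime import date, timedelta

WAITING_PERIOD = timedelta(weeks=4)     # § 3(3) EFZG: four weeks of uninterrupted employment
MAX_CONTINUED_PAY = timedelta(weeks=6)  # § 3(1) EFZG: up to six weeks per incapacity


def continued_pay_days(employment_start: date, sick_from: date, sick_to: date) -> int:
    """Employer-paid calendar days for one sick-leave episode (simplified Type R rule).

    Not covered: repeated episodes of the same illness and collective-agreement extensions.
    """
    if sick_to < sick_from:
        raise ValueError("sick_to must not precede sick_from")
    # Assumption: if the incapacity began during the waiting period,
    # the six weeks run from the day the entitlement arises.
    entitlement_start = max(sick_from, employment_start + WAITING_PERIOD)
    if entitlement_start > sick_to:
        return 0  # the episode ended before the entitlement arose
    last_paid_day = min(sick_to, entitlement_start + MAX_CONTINUED_PAY - timedelta(days=1))
    return (last_paid_day - entitlement_start).days + 1
```

Same inputs, same output, no confidence score: exactly the property the spreadsheet-formula test asks for.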

The score doesn’t say “if”, it says “where the human stays”

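Out of the classification comes a simple ratio: (Type R + Type A) ÷ total × 100. In the sick-leave table all twelve decision points are Type R or Type A, so sick leave scores 100 %. That is as high as it gets. It does not mean “less human.” It means every individual decision inside this workflow can be taken architecturally, while human oversight focuses on what law and empathy actually require: the BEM conversation and the return-to-work interview in their own workflows.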

The score thresholds are deployment logic, not a scale:

Score	What it says	What to do
> 80 %	High agent readiness	Implement now. Governance design focuses on the few H handoffs.
60 - 80 %	Moderate readiness	Phased. R and A first, H stays manual. Meaningful human-in-the-loop load.
40 - 60 %	Mixed readiness	Automate only the rule-based sub-processes. A full agent doesn’t pay off yet.
< 40 %	Low readiness	Not recommended. Process documentation and standardisation first.
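The score and the threshold table translate directly into code. This Python sketch uses hypothetical function names. Because the table's bands touch at 40 %, 60 %, and 80 %, it assumes that exactly 80 % counts as moderate (the table requires more than 80 % for high), while exactly 60 % and 40 % fall into the higher of the two adjacent bands.

```python
from collections import Counter
from typing import Iterable


def readiness_score(types: Iterable[str]) -> float:
    """(Type R + Type A) / total * 100 over the classified decision points."""
    counts = Counter(types)
    unknown = set(counts) - {"H", "R", "A"}
    if unknown:
        raise ValueError(f"unknown decision types: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no decision points to score")
    return 100 * (counts["R"] + counts["A"]) / total


def readiness_band(score: float) -> str:
    """Map a score to the deployment logic of the threshold table."""
    if score > 80:
        return "High agent readiness"
    if score >= 60:  # assumption: 60 % and 80 % both count as moderate
        return "Moderate readiness"
    if score >= 40:  # assumption: 40 % counts as mixed
        return "Mixed readiness"
    return "Low readiness"
```

`readiness_score(["R", "R", "A", "H"])` returns 75.0, which `readiness_band` maps to moderate readiness: a phased rollout in which the single Type H point stays manual.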

Typical scores by HR domain: payroll & compensation 85 - 95 %, time & attendance 80 - 90 %, expense processing 75 - 85 %, onboarding administration 60 - 75 %, benefits enrollment 65 - 80 %, recruiting screening 40 - 55 %, performance management 20 - 35 %, employee relations 15 - 30 %. Recruiting and performance management score lower not because automation is technically impossible, but because AI systems used there are high-risk systems under Annex III of the EU AI Act, and human oversight carries correspondingly more weight.

One audit output, four stakeholders

Per decision point, a structured record is produced: ID and description, classification with reasoning, rule source (for R) with version and validity period, confidence threshold (for A), escalation path with deadline, audit-trail specification, and challenge path. That record has four recipients:

The engineering team reads the classification as an architecture specification: Type R goes to the rules engine, Type A to a model call with confidence capture, Type H to a task in the human queue. The record directly specifies the Decision Layer - the layer between agent and target system that orchestrates rule-based and AI-based decisions, generates the audit trail, and routes escalations.

The works council reads the same data as a co-determination template. They see which decisions are automated, with what reasoning, with what escalation. A framework works agreement can be derived directly from the audit, often faster than negotiating domain-specific agreements one by one.

The auditor reads rule versions and audit-trail specifications as evidence: which rule applied when, to what data, decided by whom. The answer is in the record, not in a separate compliance document.

The data protection authority reads the same data as EU AI Act and GDPR documentation. Art. 11 (technical documentation), Art. 12 (record-keeping), and Art. 86 (right to explanation) of the EU AI Act, together with Art. 22 GDPR (automated individual decision-making), call for exactly this information.

Four functions from one document. The audit doesn’t produce them as a side effect - it is designed for them.
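The per-decision-point record can be expressed as a typed structure that refuses incomplete entries. The Python sketch below is one possible shape, assuming one dataclass per record; field names are hypothetical and mirror the field list above, and the validation rules (a versioned rule source for Type R, a confidence threshold for Type A) follow from the classification definitions.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class RuleSource:
    reference: str                   # law, collective agreement, works agreement, or procedure
    version: str
    valid_from: date
    valid_to: Optional[date] = None  # None: still in force


@dataclass(frozen=True)
class DecisionPointRecord:
    point_id: str
    description: str
    classification: str             # "H", "R" or "A"
    reasoning: str
    escalation_path: str
    escalation_deadline: timedelta
    audit_trail_spec: str           # what gets logged per decision
    challenge_path: str             # how an affected person files an objection
    rule_source: Optional[RuleSource] = None      # required for Type R
    confidence_threshold: Optional[float] = None  # required for Type A

    def __post_init__(self) -> None:
        if self.classification not in {"H", "R", "A"}:
            raise ValueError(f"unknown classification: {self.classification!r}")
        if self.classification == "R" and self.rule_source is None:
            raise ValueError(f"{self.point_id}: Type R needs a versioned rule source")
        if self.classification == "A":
            t = self.confidence_threshold
            if t is None or not 0 < t <= 1:
                raise ValueError(f"{self.point_id}: Type A needs a confidence threshold in (0, 1]")
```

Rejecting incomplete records at creation time means a gap surfaces during the audit rather than during a challenge.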

Challengeability is architecture, not a compliance afterthought

An automated decision about a person is permissible under the EU AI Act and GDPR only if the affected person can challenge it. Challengeability is not a duty bolted on later - it is an architectural requirement planned in from the first audit.

Per decision point, the audit answers four questions that together make a challenge possible. Which rule or model decided, in which version? Without versioning, later reproduction is impossible. What data underpinned the decision - the exact inputs at the time of the decision, not today’s? Who decided - human, rule, or AI - and at what confidence? How does the person file an objection - with concrete address, deadline, and next instance?
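The four questions map onto one log entry per decision. This Python sketch assumes the Decision Layer writes such an entry at decision time; all names are hypothetical. It copies the inputs and stores a SHA-256 hash of their canonical JSON form, so a later reviewer can check that the data shown is the data the decision was based on, not today's state.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class ChallengePath:
    contact: str        # concrete address for the objection
    deadline_days: int  # time the affected person has to object
    next_instance: str  # who reviews the objection


@dataclass(frozen=True)
class AuditEntry:
    point_id: str
    decided_by: str              # "human", "rule" or "ai"
    logic_id: str                # which rule or model decided ...
    logic_version: str           # ... and in which version
    inputs: Dict[str, Any]       # snapshot of the inputs at decision time
    inputs_sha256: str           # fingerprint of that snapshot
    outcome: Any
    confidence: Optional[float]  # required when decided_by == "ai"
    challenge: ChallengePath
    recorded_at: datetime


def _fingerprint(data: Dict[str, Any]) -> str:
    canonical = json.dumps(data, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def record_decision(point_id: str, decided_by: str, logic_id: str, logic_version: str,
                    inputs: Dict[str, Any], outcome: Any, challenge: ChallengePath,
                    confidence: Optional[float] = None) -> AuditEntry:
    if decided_by not in {"human", "rule", "ai"}:
        raise ValueError(f"unknown decider: {decided_by!r}")
    if decided_by == "ai" and confidence is None:
        raise ValueError("AI decisions must carry a confidence value")
    snapshot = copy.deepcopy(inputs)  # later changes to the source data do not alter the record
    return AuditEntry(point_id, decided_by, logic_id, logic_version, snapshot,
                      _fingerprint(snapshot), outcome, confidence, challenge,
                      datetime.now(timezone.utc))


def inputs_unchanged(entry: AuditEntry) -> bool:
    """True if the stored inputs still match the fingerprint taken at decision time."""
    return _fingerprint(entry.inputs) == entry.inputs_sha256
```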

A consequence of the architecture: the agent itself does not make “decisions” in the legal sense. It executes operations the Decision Layer has authorised. That is not semantic hair-splitting. It is the foundation that keeps decisions auditable when the AI model is replaced, the vendor is switched, or a flaw in the model is discovered.
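How the Decision Layer might authorise operations can be sketched in a few lines of Python. The sketch assumes that Type R results are always executable, Type A results only at or above their confidence threshold, and Type H points never by the agent itself; `route`, `execute`, and the destination names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Route:
    destination: str  # hypothetical names: "rules_engine", "model", "human_queue"
    authorised: bool  # may the agent execute the resulting operation?
    reason: str


class NotAuthorised(Exception):
    """Raised when the agent tries to act on a decision the Decision Layer did not authorise."""


def route(decision_type: str, confidence: Optional[float] = None,
          threshold: Optional[float] = None) -> Route:
    if decision_type == "R":
        return Route("rules_engine", True, "deterministic rule")
    if decision_type == "A":
        if confidence is None or threshold is None:
            raise ValueError("Type A routing needs a confidence value and a threshold")
        if confidence >= threshold:
            return Route("model", True, f"confidence {confidence:.2f} >= threshold {threshold:.2f}")
        return Route("human_queue", False,
                     f"confidence {confidence:.2f} < threshold {threshold:.2f}, escalated")
    if decision_type == "H":
        return Route("human_queue", False, "human decision point")
    raise ValueError(f"unknown decision type: {decision_type!r}")


def execute(operation: Callable[[], T], decision: Route) -> T:
    """The agent only runs operations the Decision Layer has authorised."""
    if not decision.authorised:
        raise NotAuthorised(decision.reason)
    return operation()
```

Because the authorisation lives in `route` rather than in the model call, swapping the model or the vendor changes what produces the confidence value, not who is allowed to act on it.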

The high-risk classification of the EU AI Act applies directly to many HR workflows: recruiting screening, performance management, promotion and transfer decisions, shift routing where it affects personnel benefits (Annex III no. 4 lit. a and b). Art. 26(7) additionally requires informing workers’ representatives and affected workers before deployment - not afterwards. The audit must therefore answer per workflow: does this workflow fall under Annex III? If yes - the obligations from Art. 11, Art. 14, Art. 15, and Art. 26 attach, and the prior-information duty is a precondition for going live.
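The per-workflow Annex III answer can act as a go-live gate. In this Python sketch, whether a workflow falls under Annex III is an input recorded by the audit team, not something the code decides; the list of articles is taken from the paragraph above, while the structure and names are hypothetical and the checklist is a simplification, not a legal assessment.

```python
from dataclasses import dataclass
from typing import FrozenSet, List

# Obligations named in the text for workflows that fall under Annex III.
HIGH_RISK_OBLIGATIONS = ("Art. 11", "Art. 14", "Art. 15", "Art. 26")


@dataclass(frozen=True)
class WorkflowAssessment:
    name: str
    annex_iii: bool            # audit answer, not computed here
    workers_informed: bool     # Art. 26(7) prior information completed?
    documented: FrozenSet[str] # obligations whose documentation is in place


def go_live_blockers(w: WorkflowAssessment) -> List[str]:
    """Reasons the workflow may not go live yet (empty list: this gate raises no objection)."""
    if not w.annex_iii:
        return []
    blockers = [f"{art}: documentation missing"
                for art in HIGH_RISK_OBLIGATIONS if art not in w.documented]
    if not w.workers_informed:
        blockers.append("Art. 26(7): workers' representatives and affected workers not yet informed")
    return blockers
```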

For organisations outside the EU: in many legal systems, the requirements the EU AI Act lists explicitly can already be derived from general duties of care - just without an explicit list of obligations. Whoever builds the Decision Layer to be EU-AI-Act-compliant is therefore well placed to meet what Californian, Brazilian, and UK data-protection authorities are likely to ask for in an inspection.

Four classification errors distort every audit

A rule with judgment-bound exceptions is classified as Type R. “Overtime at 150 %” sounds rule-based. But: holiday at 200 %? Shift crossing midnight into a holiday? Individual contract overriding the collective agreement? When exceptions require judgment, the exception path is Type H - even if the standard case is Type R. Clean fix: the standard case as R, a separate decision point for exception handling as H or A.
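The clean fix can be sketched as a rule that handles only the standard case and hands everything else to a separate decision point. In this Python sketch the 150 % rate comes from the example above; the exception triggers, the `Escalate` result, and the decision-point ID `OT-EXC` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, datetime
from typing import FrozenSet, Union

STANDARD_OVERTIME_RATE = 1.5  # standard case from the text: overtime at 150 %


@dataclass(frozen=True)
class Escalate:
    decision_point: str  # hypothetical ID of the separate exception decision point
    reason: str


def overtime_rate(shift_start: datetime, shift_end: datetime,
                  holidays: FrozenSet[date], contract_override: bool) -> Union[float, Escalate]:
    """Type R for the standard case only; exceptions leave this decision point."""
    if contract_override:
        return Escalate("OT-EXC", "individual contract may override the collective agreement")
    # Any holiday touched by the shift, including a shift crossing midnight into a holiday.
    if {shift_start.date(), shift_end.date()} & holidays:
        return Escalate("OT-EXC", "shift touches a public holiday")
    return STANDARD_OVERTIME_RATE
```

The rule never guesses at the holiday rate; it only recognises that it has left its standard case, which keeps it deterministic.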

An unverifiable AI output is classified as Type A. “The AI evaluates whether the candidate is a cultural fit.” That is not Type A - there is no verifiable correct answer. Cultural fit is subjective. Type A requires verifiability: “the AI classified this document as a sick note” can be validated by two humans on the same document. Subjective evaluations are Type H - always.
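Verifiability can be checked before a point is classified Type A: let two people label the same sample of documents and measure how often they agree beyond chance. This Python sketch uses Cohen's kappa, a standard agreement statistic that the methodology itself does not prescribe; the function name is hypothetical, and what value counts as reliable enough is a threshold the audit team has to set.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Agreement between two raters on the same items, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equally long, non-empty label sequences")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)
```

For six sample documents labelled `sick_note, sick_note, rehab, other, sick_note, rehab` by one reviewer and `sick_note, sick_note, rehab, sick_note, sick_note, rehab` by the other, the function returns approximately 0.7.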

The trigger decision is conflated with the downstream action. “Pattern detection triggers a BEM process” - the detection is Type R (threshold), but the BEM process contains Type H decisions (return-to-work planning). The audit must separate the trigger from the triggered workflow.
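The separation can be made visible in code: the Type R trigger returns only a yes/no on the threshold and starts nothing itself. This Python sketch assumes that the BEM threshold of § 167(2) SGB IX (more than six weeks of incapacity within one year, continuous or repeated) is counted as more than 42 calendar days in a rolling 365-day window, and that sick-leave episodes do not overlap; both counting conventions are assumptions the domain owner has to confirm, and the function names are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

BEM_THRESHOLD_DAYS = 42         # assumption: "more than six weeks" counted as calendar days
LOOKBACK = timedelta(days=365)  # assumption: rolling one-year window


def sick_days_in_window(episodes: Iterable[Tuple[date, date]], as_of: date) -> int:
    """Count sick calendar days in the window ending on as_of (inclusive).

    Episodes are (first_day, last_day) pairs and are assumed not to overlap.
    """
    window_start = as_of - LOOKBACK + timedelta(days=1)
    total = 0
    for first_day, last_day in episodes:
        lo, hi = max(first_day, window_start), min(last_day, as_of)
        if lo <= hi:
            total += (hi - lo).days + 1
    return total


def bem_threshold_crossed(episodes: Iterable[Tuple[date, date]], as_of: date) -> bool:
    """Type R trigger only: the BEM process it may start is a separate workflow with Type H points."""
    return sick_days_in_window(episodes, as_of) > BEM_THRESHOLD_DAYS
```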

Implicit decisions are ignored. “We check the sick note” sounds like one step. It contains at least three decisions: valid document, complete, matches the employee. The audit must decompose until each point has exactly one question and one classification.

What this methodology does not solve

The audit delivers agent readiness at decision-point level and the Decision Layer specification. It does not deliver: sequencing across workflows (that depends on transaction volume, governance complexity, organisational appetite; see the H1-H4 logic in the HR Agent Catalog), technology selection, organisational change, or the works-council negotiation strategy. The concrete implementation of the Decision Layer - rules engine, confidence capture, challenge endpoint - is its own topic.

The audit provides the fact base. Strategy and implementation build on it. Whoever starts with strategy and looks at the fact base later builds the pyramid on its tip.