“Which HR processes are AI-eligible?” is the wrong question

An HR leader sits in the steering committee of her AI program. The slide says: “Phase 1 - automate sick-leave processing.” Six weeks later the pilot is paused. The reason: “Doesn’t work reliably enough.” What happened? The project treated the workflow as a single unit. But sick-leave processing is not a unit. It is a sequence of twelve decision points - and two of them should have remained with the human.

That is not an anecdote. It is the pattern. AI projects in HR rarely fail on model quality; they almost always fail on the granularity of the question. Whoever asks “Which workflows do we automate?” gets unusable answers. Whoever asks “At which decision points within a workflow?” gets a buildable architecture.

This methodology describes how to ask the right question. It is technology-agnostic - no model preference, no vendor, no stack. It produces four artefacts from one audit: an agent design, a works-agreement template, a Decision Layer specification, and the EU AI Act documentation due for the high-risk deadline of 2 August 2026 (which the provisional Digital Omnibus agreement of 7 May 2026 is set to move to 2 December 2027, formal adoption still pending).

McKinsey estimates that 60 - 70 % of administrative HR activities are automatable with existing technology. Adoption sits at 3 %. The gap is not skepticism. It is the absence of a method that breaks a workflow down to the level of individual decision points.

At a Glance

  • “Automating the workflow” fails. Classifying individual decision points produces an auditable architecture.
  • A typical HR process has 10 - 30 decision points. Experienced staff make most of them unconsciously - which is why they are missing from the requirements document.
  • Every decision point is human, rule-based, or AI-eligible - with three explicit tests, not gut feel. Ambiguity is the most common audit trap.
  • The audit output is simultaneously agent design, works-agreement template, Decision Layer specification, and EU AI Act documentation. Four stakeholders, one document.
  • Challengeability is architecture, not compliance afterthought. Bolting it on later means no auditable AI - regardless of model quality.

A typical HR process has twelve decision points, not one

Mapping precedes classification. It usually fails for one reason: experienced employees make many decisions unconsciously. “Check whether the sick note has a start date” is not perceived as a decision. They just do it. To an agent, every such check is an explicit decision that must be specified.

The method that surfaces these implicit decisions is the “what could go wrong” technique: for each step, ask what could fail or require a different action. Each answer reveals a decision point.

Before mapping, draw the workflow boundary. A workflow has a trigger, a sequence of processing steps, and one or more endpoints. Common mistake: drawing it too wide. “Onboarding” is not one workflow - it is a collection of five to eight sub-workflows (contract generation, IT provisioning, compliance documentation, workspace setup, training enrollment, buddy assignment, probation tracking). Audit each sub-workflow separately.

Take sick-leave processing - trigger: employee submits a sick note. Endpoints: SAP record updated, manager notified, payroll adjusted, return-to-work action scheduled if threshold exceeded. What most organizations describe as one step is a chain of twelve:

| # | Decision point | Question the process answers |
|---|---|---|
| 1 | Document receipt | Sick note, doctor’s certificate, rehabilitation notice, or something else? |
| 2 | Completeness check | Employee name, diagnosis period, doctor’s name, doctor’s signature present? |
| 3 | Employee identification | Which employee does this belong to? |
| 4 | Entity assignment | Which legal entity is the employee in? |
| 5 | Collective agreement lookup | Which collective agreement applies? |
| 6 | Continued pay eligibility | 6-week period § 3 EFZG (DE) satisfied? Waiting period? |
| 7 | Duration assessment | Single day, short absence, or extended absence? |
| 8 | Pattern detection | Threshold for BEM obligation (DE) crossed? |
| 9 | Payroll impact | Overtime cancellation, shift differential, bonus proration? |
| 10 | System update | What changes in SAP/SuccessFactors? |
| 11 | Notification routing | Who is informed (manager, HRBP, payroll, works council)? |
| 12 | Follow-up scheduling | Return date, BEM invitation, occupational-health referral? |

Twelve decision points in a process most organizations describe as “employee submits sick note, we process it.” The gap between perceived simplicity and actual complexity is typical - and the reason “we automate sick leave” as a project goal produces no architecture.
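The boundary rule above (one trigger, a sequence of steps, at least one endpoint) can be written down as data before any classification happens. The following is a minimal sketch, not a prescribed format: the class names `Workflow` and `DecisionPoint` and the method `boundary_problems` are assumptions made for illustration, and only the first three of the twelve points are spelled out.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DecisionPoint:
    number: int
    name: str
    question: str  # one decision point answers exactly one question


@dataclass
class Workflow:
    name: str
    trigger: str
    endpoints: list[str]
    decision_points: list[DecisionPoint] = field(default_factory=list)

    def boundary_problems(self) -> list[str]:
        """Return a list of problems with the workflow boundary (empty if none)."""
        problems = []
        if not self.trigger.strip():
            problems.append("no trigger defined")
        if not self.endpoints:
            problems.append("no endpoint defined")
        numbers = [dp.number for dp in self.decision_points]
        if numbers != list(range(1, len(numbers) + 1)):
            problems.append("decision points are not numbered 1..n in order")
        return problems


# Trigger and endpoints are taken from the sick-leave example in the text.
sick_leave = Workflow(
    name="Sick-leave processing",
    trigger="Employee submits a sick note",
    endpoints=[
        "SAP record updated",
        "Manager notified",
        "Payroll adjusted",
        "Return-to-work action scheduled if threshold exceeded",
    ],
    decision_points=[
        DecisionPoint(1, "Document receipt", "Which kind of document is this?"),
        DecisionPoint(2, "Completeness check", "Are all required fields present?"),
        DecisionPoint(3, "Employee identification", "Which employee does this belong to?"),
        # ... points 4 to 12 follow the same pattern
    ],
)

print(sick_leave.boundary_problems())
```

A workflow that is drawn too wide, such as “Onboarding”, would be split into several `Workflow` objects, each with its own trigger and endpoints.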

Three decision types, three tests, one classification

Each decision point falls into exactly one of three types. The classification is unambiguous: a point either passes a type’s test or it does not. Ambiguity points to an audit error, not a special case.

Type H - human decides. Empathy, individual judgment, legal risk if automated, works-council co-determination mandate, or ethical sensitivity. Test question: “Would two different experienced professionals reliably reach the same conclusion on the same case?” If no - Type H. The human stays where law requires it. Not because they are better at it.

Type R - rule-based, deterministic. The rule exists in writing (law, collective agreement, works agreement, documented procedure). Inputs are structured. The output is deterministic. Exceptions are themselves rule-based - or they are separate Type-H decision points. Test question: “Could I write this decision as a spreadsheet formula?” If yes - Type R.

Type A - AI-eligible, probabilistic with bounds. The task is classification, extraction, or matching - not generation, judgment, or evaluation. The set of outcomes is known and finite. The result is verifiable. A confidence threshold can be set; uncertain cases escalate. Test question: “Am I interpreting information against known categories, or judging a unique situation?” If categories - Type A.
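The three test questions can be sketched as one function. This is an illustration under assumptions, not part of the methodology itself: the order in which the tests are applied (H first, then R, then A) and all names (`classify`, `DecisionType`, `AmbiguousClassification`) are choices made for this example. A point that passes no test raises an error, which mirrors the statement that ambiguity signals an audit error.

```python
from enum import Enum


class DecisionType(Enum):
    HUMAN = "H"
    RULE = "R"
    AI = "A"


class AmbiguousClassification(Exception):
    """No test gave a clear answer: the decision point needs further decomposition."""


def classify(
    professionals_agree: bool,      # H test: would two experienced professionals agree?
    writable_as_formula: bool,      # R test: could this be a spreadsheet formula?
    known_finite_categories: bool,  # A test: interpreting against known categories?
    verifiable: bool,               # A requires a result that can be checked
) -> DecisionType:
    if not professionals_agree:
        return DecisionType.HUMAN
    if writable_as_formula:
        return DecisionType.RULE
    if known_finite_categories and verifiable:
        return DecisionType.AI
    raise AmbiguousClassification("split this decision point and classify the parts")


# Entity assignment: deterministic lookup
print(classify(True, True, False, False).value)   # R
# Document classification: known categories, verifiable
print(classify(True, False, True, True).value)    # A
```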

The sick-leave table classifies as follows:

| # | Decision point | Type | Reasoning |
|---|---|---|---|
| 1 | Document classification | A | Unstructured input, classified into known categories, confidence-scoreable |
| 2 | Completeness check | A | Field extraction with known required fields, verifiable |
| 3 | Employee identification | A | Name matching with fuzzy match, confidence-scoreable |
| 4 | Entity assignment | R | Employee-ID → entity, deterministic lookup |
| 5 | Collective agreement lookup | R | Entity + employee category → CA, deterministic |
| 6 | Continued pay eligibility | R | Start date + history + § 3 EFZG, pure calculation |
| 7 | Duration assessment | R | Calendar arithmetic |
| 8 | Pattern detection | R | Threshold calculation (response to threshold is a separate H point) |
| 9 | Payroll impact | R | Absence type + pay rules |
| 10 | System update | R | Execution step from 4 - 9 |
| 11 | Notification routing | R | Routing rules from entity, type, threshold |
| 12 | Follow-up scheduling | R | Rule-based (the conversation itself is H, separate workflow) |

Nine Type R, three Type A, no Type H. The follow-up actions that require human judgment (the BEM conversation, the return-to-work interview) are separate workflows with their own classification - they sit outside sick-leave processing.

The score doesn’t say “if”, it says “where the human stays”

Out of the classification comes a simple ratio: (Type R + Type A) ÷ total × 100. With nine Type R and three Type A out of twelve points, sick-leave processing scores 100 %. That is high. It does not mean “less human.” It means every decision point inside this workflow can be taken architecturally - while human oversight focuses on what law and empathy actually require: the separately audited follow-up workflows.

The score thresholds are deployment logic, not a scale:

| Score | What it says | What to do |
|---|---|---|
| > 80 % | High agent readiness | Implement now. Governance design focuses on the few H handoffs. |
| 60 - 80 % | Moderate readiness | Phased. R and A first, H stays manual. Meaningful human-in-the-loop load. |
| 40 - 60 % | Mixed readiness | Automate only the rule-based sub-processes. A full agent doesn’t pay off yet. |
| < 40 % | Low readiness | Not recommended. Process documentation and standardisation first. |
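The ratio and the thresholds can be sketched in a few lines. The example below uses a hypothetical workflow with 20 decision points (11 R, 4 A, 5 H), not one from this article. The function names are illustrative, and the handling of exact boundary values (80, 60, and 40 fall into the band that starts at them, except that “high” requires strictly more than 80) is an assumption, since the table leaves the edges open.

```python
def readiness_score(type_r: int, type_a: int, type_h: int) -> float:
    """(Type R + Type A) / total * 100, as defined in the text."""
    total = type_r + type_a + type_h
    if total == 0:
        raise ValueError("workflow has no decision points")
    return (type_r + type_a) / total * 100


def readiness_band(score: float) -> str:
    """Map a score to the deployment bands; boundary handling is an assumption."""
    if score > 80:
        return "High agent readiness"
    if score >= 60:
        return "Moderate readiness"
    if score >= 40:
        return "Mixed readiness"
    return "Low readiness"


# Hypothetical workflow: 11 rule-based, 4 AI-eligible, 5 human decision points.
score = readiness_score(type_r=11, type_a=4, type_h=5)
print(f"{score:.1f} % -> {readiness_band(score)}")  # 75.0 % -> Moderate readiness
```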

Typical scores by HR domain: payroll & compensation 85 - 95 %, time & attendance 80 - 90 %, expense processing 75 - 85 %, onboarding administration 60 - 75 %, benefits enrollment 65 - 80 %, recruiting screening 40 - 55 %, performance management 20 - 35 %, employee relations 15 - 30 %. Recruiting and performance score lower not because they are technically impossible, but because they are high-risk systems under Annex III of the EU AI Act and human oversight carries more weight.

One audit output, four stakeholders

Per decision point, a structured record is produced: ID and description, classification with reasoning, rule source (for R) with version and validity period, confidence threshold (for A), escalation path with deadline, audit-trail specification, and challenge path. That record has four recipients:

The engineering team reads the classification as architecture spec. Where R, the rules engine; where A, a model call with confidence capture; where H, a task in the human queue. The record directly specifies the Decision Layer - the layer between agent and target system that orchestrates rule-based and AI-based decisions, generates the audit trail, and routes escalations.

The works council reads the same data as a co-determination template. They see which decisions are automated, with what reasoning, with what escalation. A framework works agreement can be derived directly from the audit, often faster than negotiating domain-specific agreements one by one.

The auditor reads rule versions and audit-trail specs as evidence integrity. Which rule applied when, on what data, by whom. The answer is in the record, not in a separate compliance document.

The data protection authority reads the same data as EU AI Act and GDPR documentation. Art. 11 (technical documentation), Art. 12 (record-keeping), Art. 86 (right to explanation), and Art. 22 GDPR require exactly this information.

Four functions from one document. The audit doesn’t produce them as a side effect - it is constructed to produce them.
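The per-decision-point record described above could look roughly like this. The field list follows the text (ID and description, classification with reasoning, rule source with version and validity, confidence threshold, escalation path with deadline, audit-trail specification, challenge path). Everything else is an assumption for illustration: the class names, the field types, the hour-based deadline, and the sample values such as the version string.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RuleSource:
    reference: str                    # law, collective agreement, works agreement, procedure
    version: str
    valid_from: date
    valid_to: Optional[date] = None   # None means: still in force


@dataclass(frozen=True)
class DecisionRecord:
    point_id: str
    description: str
    classification: str               # "H", "R" or "A"
    reasoning: str
    escalation_path: str
    escalation_deadline_hours: int
    audit_trail_spec: str
    challenge_path: str
    rule_source: Optional[RuleSource] = None      # required for Type R
    confidence_threshold: Optional[float] = None  # required for Type A

    def __post_init__(self) -> None:
        if self.classification not in {"H", "R", "A"}:
            raise ValueError("classification must be H, R or A")
        if self.classification == "R" and self.rule_source is None:
            raise ValueError("a Type R point needs a rule source")
        if self.classification == "A":
            t = self.confidence_threshold
            if t is None or not 0 < t <= 1:
                raise ValueError("a Type A point needs a confidence threshold in (0, 1]")


# Illustrative record for decision point 4 of the sick-leave example.
entity_assignment = DecisionRecord(
    point_id="SL-04",
    description="Entity assignment",
    classification="R",
    reasoning="Employee-ID -> entity, deterministic lookup",
    escalation_path="HR operations queue",
    escalation_deadline_hours=24,
    audit_trail_spec="log rule version, input employee ID, resulting entity",
    challenge_path="objection via HR service desk",
    rule_source=RuleSource("HR master data mapping", "hypothetical-v1", date(2025, 1, 1)),
)
```

Because the validation sits in the record itself, an incomplete entry (a Type A point without a threshold, a Type R point without a rule source) cannot reach any of the four recipients.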

Challengeability is architecture, not compliance afterthought

An automated decision about a person is permissible under the EU AI Act and GDPR only if the affected person can challenge it. Challengeability is not a duty bolted on later - it is an architectural requirement planned in from the first audit.

Per decision point, the audit answers four questions that together make a challenge possible. Which rule or model decided, in which version? Without versioning, later reproduction is impossible. What data underpinned the decision - the exact inputs at the time of the decision, not today’s? Who decided - human, rule, or AI - and at what confidence? How does the person file an objection - with concrete address, deadline, and next instance?
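The four questions translate into an audit-trail entry that is written at decision time. The sketch below is one possible shape, with assumed names (`AuditEntry`, `record_decision`, `inputs_digest`) and placeholder values. It copies the inputs so that later changes to live data cannot alter what was recorded, and it stores a SHA-256 digest over a canonical JSON form so the snapshot can be checked for integrity. It assumes the inputs are JSON-serialisable.

```python
import copy
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


def inputs_digest(inputs: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the digest."""
    canonical = json.dumps(inputs, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AuditEntry:
    point_id: str
    decided_by: str               # who decided: "human", "rule" or "ai"
    decider_version: str          # which rule or model version decided
    inputs: dict                  # exact inputs at the time of the decision
    digest: str
    confidence: Optional[float]   # None for human and rule decisions
    objection_address: str        # how to object: address ...
    objection_deadline_days: int  # ... deadline ...
    next_instance: str            # ... and next instance
    decided_at: str


def record_decision(point_id: str, decided_by: str, decider_version: str,
                    inputs: dict, confidence: Optional[float],
                    objection_address: str, objection_deadline_days: int,
                    next_instance: str) -> AuditEntry:
    if decided_by not in {"human", "rule", "ai"}:
        raise ValueError("decided_by must be human, rule or ai")
    snapshot = copy.deepcopy(inputs)  # freeze the inputs as they were
    return AuditEntry(point_id, decided_by, decider_version, snapshot,
                      inputs_digest(snapshot), confidence, objection_address,
                      objection_deadline_days, next_instance,
                      datetime.now(timezone.utc).isoformat())


live_data = {"employee_id": "E-1001", "note_start": "2025-03-03"}
entry = record_decision("SL-03", "ai", "matcher-hypothetical-1.2", live_data, 0.94,
                        "hr-objections@example.com", 14, "HR business partner")
live_data["note_start"] = "2025-03-04"  # a later correction in the source system
print(entry.inputs["note_start"])       # still the value used for the decision
```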

A consequence of the architecture: the agent itself does not make “decisions” in the legal sense. It executes operations the Decision Layer has authorised. That is not semantic hair-splitting. It is the foundation that keeps decisions auditable when the AI model is replaced, the vendor is switched, or a flaw in the model is discovered.
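A minimal sketch of that separation, under assumptions: the `DecisionLayer` class, its method names, and the stub rule and model below are invented for illustration and do not describe a specific product. The agent asks the layer for a decision; rules answer deterministically, model outputs pass only above their confidence threshold, and everything else goes to the human queue.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Authorisation:
    point_id: str
    outcome: Optional[str]      # None when the case was escalated
    decided_by: str             # "rule", "ai" or "human-queue"
    confidence: Optional[float] = None


class DecisionLayer:
    def __init__(self) -> None:
        self._rules: dict[str, Callable[[dict], str]] = {}
        self._models: dict[str, tuple[Callable[[dict], tuple[str, float]], float]] = {}

    def register_rule(self, point_id: str, rule: Callable[[dict], str]) -> None:
        self._rules[point_id] = rule

    def register_model(self, point_id: str,
                       model: Callable[[dict], tuple[str, float]],
                       threshold: float) -> None:
        self._models[point_id] = (model, threshold)

    def decide(self, point_id: str, inputs: dict) -> Authorisation:
        if point_id in self._rules:
            return Authorisation(point_id, self._rules[point_id](inputs), "rule")
        if point_id in self._models:
            model, threshold = self._models[point_id]
            label, confidence = model(inputs)
            if confidence >= threshold:
                return Authorisation(point_id, label, "ai", confidence)
            # Uncertain cases escalate instead of being executed.
            return Authorisation(point_id, None, "human-queue", confidence)
        # Unregistered decision points are never decided automatically.
        return Authorisation(point_id, None, "human-queue")


layer = DecisionLayer()
layer.register_rule("SL-04", lambda d: {"E-1001": "Entity North"}[d["employee_id"]])
layer.register_model("SL-01", lambda d: ("sick_note", d["score"]), threshold=0.90)

print(layer.decide("SL-04", {"employee_id": "E-1001"}))
print(layer.decide("SL-01", {"score": 0.97}))
print(layer.decide("SL-01", {"score": 0.62}))   # escalated to the human queue
```

Swapping the model behind `SL-01` changes one registration, not the record of who authorised what.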

The high-risk classification of the EU AI Act applies directly to many HR workflows: recruiting screening, performance management, promotion and transfer decisions, shift routing where it affects personnel benefits (Annex III no. 4 lit. a and b). Art. 26(7) additionally requires informing workers’ representatives and affected workers before deployment - not afterwards. The audit must therefore answer per workflow: does this workflow fall under Annex III? If yes - the obligations from Art. 11, Art. 14, Art. 15, and Art. 26 attach, and the prior-information duty is a precondition for going live.

For organisations outside the EU: the requirements the EU AI Act lists explicitly are already enforceable in nearly every legal system as the interpretation of general duties of care - just without an explicit obligation list. Whoever builds the Decision Layer to be EU-AI-Act-compliant also satisfies what Californian, Brazilian, and UK data-protection authorities will demand in case of inspection.

Four classification errors distort every audit

A rule with judgment-bound exceptions is classified as Type R. “Overtime at 150 %” sounds rule-based. But: holiday at 200 %? Shift crossing midnight into a holiday? Individual contract overriding the collective agreement? When exceptions require judgment, the exception path is Type H - even if the standard case is Type R. Clean fix: the standard case as R, a separate decision point for exception handling as H or A.
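The clean fix can be sketched as two separate decision points. The 150 % multiplier comes from the example above; the flag names, the `OvertimeCase` class, and the choice to route every flagged case to exception handling are assumptions for illustration. In a real audit, an exception that turns out to be rule-based itself (a documented holiday rate, for instance) would become its own Type R point.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

STANDARD_MULTIPLIER = Decimal("1.50")  # "Overtime at 150 %" from the example


@dataclass(frozen=True)
class OvertimeCase:
    hours: Decimal
    on_holiday: bool = False
    crosses_midnight_into_holiday: bool = False
    individual_contract_override: bool = False


def overtime_decision(case: OvertimeCase) -> tuple[str, Optional[Decimal]]:
    """Standard case is Type R; anything flagged goes to a separate decision point."""
    if (case.on_holiday or case.crosses_midnight_into_holiday
            or case.individual_contract_override):
        # The rule does not guess. The exception point has its own classification.
        return ("exception-handling", None)
    return ("standard", case.hours * STANDARD_MULTIPLIER)


print(overtime_decision(OvertimeCase(Decimal("8"))))
print(overtime_decision(OvertimeCase(Decimal("8"), individual_contract_override=True)))
```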

An unverifiable AI output is classified as Type A. “The AI evaluates whether the candidate is a cultural fit.” That is not Type A - there is no verifiable correct answer. Cultural fit is subjective. Type A requires verifiability: “the AI classified this document as a sick note” can be validated by two humans on the same document. Subjective evaluations are Type H - always.

The trigger decision is conflated with the downstream action. “Pattern detection triggers a BEM process” - the detection is Type R (threshold), but the BEM process contains Type H decisions (return-to-work planning). The audit must separate the trigger from the triggered workflow.

Implicit decisions are ignored. “We check the sick note” sounds like one step. It contains at least three decisions: valid document, complete, matches the employee. The audit must decompose until each point has exactly one question and one classification.

What this methodology does not solve

The audit delivers agent readiness at decision-point level and the Decision Layer specification. It does not deliver: sequencing across workflows (that depends on transaction volume, governance complexity, organisational appetite; see the H1-H4 logic in the HR Agent Catalog), technology selection, organisational change, or the works-council negotiation strategy. The concrete implementation of the Decision Layer - rules engine, confidence capture, challenge endpoint - is its own topic.

The audit provides the fact base. Strategy and implementation build on it. Whoever starts with strategy and looks at the fact base later builds the pyramid on its tip.

Bert Gogolin

CEO & Founder, Gosign
