#Score aggregation

Multi-policy evaluations (chains, graphs, and multi-policy batches) roll up into a single weighted aggregate score across all contributing policies. Operators declare per-policy weights at authoring time and per-tenant thresholds in engine settings; the engine returns a uniform aggregate block on every response.

At a glance:

For each step that ran successfully:
    score = map(step.mode, step.outcome, step.score_mapping)
                              │
                              ▼
    contribution = score × weight
                              │
                              ▼
weighted_score = Σ contributions / Σ weights      ← renormalised
                              │
                              ▼
threshold = bucket(weighted_score, tenant.thresholds)
                              │
                              ▼
action = tenant.actions[threshold]              ← may be null

Skipped and failed steps are excluded from both numerator AND denominator — the score renormalises across what actually ran. Branches that didn't fire (because their conditional edge wasn't true) don't drag the aggregate down.
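The renormalisation can be sketched in a few lines of Python. This is a minimal illustration, not the engine's implementation: the `Step` dataclass, its field names, the `"skipped"` status string, and the `weighted_score` helper are assumptions made for the example, and scores are taken as already mapped to the 0-1 range.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    # Hypothetical shape for illustration; not the engine's real step type.
    step_id: str
    status: str             # "ok" means the step ran; anything else is excluded
    score: Optional[float]  # already mapped to 0-1; None if the step did not run
    weight: float


def weighted_score(steps: list[Step]) -> Optional[float]:
    """Sum of score * weight over steps that ran, divided by the sum of their weights."""
    ran = [s for s in steps if s.status == "ok" and s.score is not None]
    total_weight = sum(s.weight for s in ran)
    if total_weight == 0:
        return None  # nothing contributed, so there is no aggregate
    return sum(s.score * s.weight for s in ran) / total_weight


steps = [
    Step("privacy_check", "ok", 1.0, 0.4),
    Step("geo_licensing", "ok", 0.85, 0.4),
    Step("customer_tier", "skipped", None, 0.2),  # dropped from both sums
]
print(round(weighted_score(steps), 3))  # 0.925
```

With the third step skipped, the result is (0.40 + 0.34) / 0.8 = 0.925 rather than 0.74: the skipped step's weight is removed from the denominator instead of counting as a zero.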

#Aggregate response block

The aggregate block is returned on every chain / graph / multi-policy response. It is null when no policy contributed: either every step was generate-mode and therefore excluded, or every step failed or was skipped.

"aggregate": {
  "weighted_score": 0.84,
  "threshold":      "review",
  "contributions": [
    { "step_id": "privacy_check",  "mode": "validate", "score": 1.0,  "weight": 0.4, "contribution": 0.40 },
    { "step_id": "geo_licensing",  "mode": "score",    "score": 0.85, "weight": 0.4, "contribution": 0.34 },
    { "step_id": "customer_tier",  "mode": "classify", "score": 0.5,  "weight": 0.2, "contribution": 0.10 }
  ],
  "action": {
    "kind":   "queue_for_review",
    "params": { "queue_id": "compliance-tier-2" }
  }
}

#Per-mode score mapping

Each policy's mode-specific outcome is mapped to a score between 0 and 1, which is then multiplied by the policy's weight. Operators author overrides on the policy via the admin (or via POST /api/admin/policies with the score_mapping field); defaults apply otherwise.

| Mode | Default mapping | Operator override |
|---|---|---|
| validate | passed → 1.0, failed → 0.0 | {"type": "passed_to_score", "invert": true} flips polarity |
| score | Passthrough (outcome.score directly) | None needed |
| decide | Unmapped action → neutral 0.5 | {"type": "decide_to_score", "actions": {"approve": 1.0, "reject": 0.0, ...}} |
| classify | Unmapped label → neutral 0.5 | {"type": "classify_to_score", "labels": {"low": 1.0, "high": 0.0, ...}} |
| generate | Excluded from aggregation | {"type": "generated_text_present"} → 1.0 if text exists, 0.0 otherwise |

Convention: higher score = more compliant / lower risk / better outcome. Operators whose policies use the opposite polarity (for example, a risk model where a high value is bad) flip it through the score mapping, or with "invert": true for validate.
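The default and override rules can be read as a single dispatch on mode. The sketch below is illustrative only: `map_score` is a hypothetical helper, the outcome keys `passed`, `score`, and `primary_label` follow the example response on this page, and the keys `action` (decide) and `text` (generate) are assumptions.

```python
from typing import Any, Optional

NEUTRAL = 0.5  # default for unmapped decide actions and classify labels


def map_score(mode: str, outcome: dict[str, Any],
              score_mapping: Optional[dict[str, Any]] = None) -> Optional[float]:
    """Return a 0-1 score for one step, or None if the step is excluded."""
    mapping = score_mapping or {}
    if mode == "validate":
        score = 1.0 if outcome["passed"] else 0.0
        return 1.0 - score if mapping.get("invert") else score
    if mode == "score":
        return outcome["score"]  # passthrough
    if mode == "decide":
        # "action" as the outcome key is an assumption for this sketch.
        return mapping.get("actions", {}).get(outcome["action"], NEUTRAL)
    if mode == "classify":
        return mapping.get("labels", {}).get(outcome["primary_label"], NEUTRAL)
    if mode == "generate":
        if mapping.get("type") != "generated_text_present":
            return None  # excluded from aggregation by default
        # "text" as the outcome key is an assumption for this sketch.
        return 1.0 if outcome.get("text") else 0.0
    raise ValueError(f"unknown mode: {mode}")


print(map_score("classify", {"primary_label": "premium"}))  # 0.5
print(map_score("validate", {"passed": True},
                {"type": "passed_to_score", "invert": True}))  # 0.0
```

Returning `None` for an unmapped generate step keeps "excluded" distinct from "scored 0.0", so the caller can drop it from both the numerator and the denominator.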

#Per-tenant thresholds + actions

Thresholds and actions are stored in the org's settings and configured on the admin engine-settings page (PUT /api/admin/engine-settings/aggregation).

thresholds:
  pass:   0.9     # score >= 0.9 → "pass"
  review: 0.7     # 0.7 <= score < 0.9 → "review"
                  # below 0.7 → "block"

actions:
  pass:   { kind: "auto_approve",      params: {} }
  review: { kind: "queue_for_review",  params: { queue_id: "compliance-tier-2" } }
  block:  { kind: "reject",            params: { message: "Compliance denied." } }

Default thresholds are pass=0.9, review=0.7. Default actions are empty — aggregate.action returns null until the tenant wires them. The engine never executes actions; the payload is returned for the caller's stack to fire (same "pure reasoning" stance as decide-mode actions).
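As a rough sketch of the bucketing and action lookup described above (the function names `bucket` and `resolve_action` are hypothetical, and the real engine may differ in detail):

```python
from typing import Any, Optional

DEFAULT_THRESHOLDS = {"pass": 0.9, "review": 0.7}


def bucket(weighted_score: float,
           thresholds: dict[str, float] = DEFAULT_THRESHOLDS) -> str:
    """Map a weighted score to "pass", "review", or "block"."""
    if weighted_score >= thresholds["pass"]:
        return "pass"
    if weighted_score >= thresholds["review"]:
        return "review"
    return "block"


def resolve_action(threshold: str,
                   actions: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Look up the tenant's action payload; None when nothing is configured."""
    return actions.get(threshold)


actions = {
    "review": {"kind": "queue_for_review",
               "params": {"queue_id": "compliance-tier-2"}},
}
threshold = bucket(0.84)
print(threshold)                          # review
print(resolve_action(threshold, actions))
print(resolve_action(bucket(0.95), {}))   # None: no actions configured
```

The lookup only returns the payload. Acting on it (queueing, rejecting, approving) remains the caller's job, which matches the engine's stance of never executing actions itself.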

#Example

A graph with one validate + one score + one classify policy, weighted 0.4 / 0.4 / 0.2:

curl -X POST https://aiengine.velgent.com/api/v1/policies/graph \
  -H "Authorization: Bearer velgent_live_..." \
  -H "Content-Type: application/json" \
  -d '{
    "graph_slug": "compliance/transaction-review",
    "inputs": { "amount": 50000, "country": "US", "customer_tier": "premium" }
  }'

Response (abbreviated):

{
  "request_id": "...",
  "steps": [
    { "id": "privacy_check", "status": "ok", "outcome": { "passed": true } },
    { "id": "geo_licensing", "status": "ok", "outcome": { "score": 0.85, "reasons": [...] } },
    { "id": "customer_tier", "status": "ok", "outcome": { "primary_label": "premium", "labels": ["premium"], "confidence": 0.92 } }
  ],
  "aggregate": {
    "weighted_score": 0.84,
    "threshold":      "review",
    "contributions": [
      { "step_id": "privacy_check", "mode": "validate", "score": 1.0,  "weight": 0.4, "contribution": 0.40 },
      { "step_id": "geo_licensing", "mode": "score",    "score": 0.85, "weight": 0.4, "contribution": 0.34 },
      { "step_id": "customer_tier", "mode": "classify", "score": 0.5,  "weight": 0.2, "contribution": 0.10 }
    ],
    "action": { "kind": "queue_for_review", "params": { "queue_id": "compliance-tier-2" } }
  }
}

(customer_tier scores 0.5 because the classify policy has no explicit score_mapping, so the "premium" label falls back to the neutral default. An operator could add a classify_to_score mapping with labels such as {"premium": 1.0, "standard": 0.7, "trial": 0.3} to lift the tier contribution.)
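To see how much that mapping would matter, the arithmetic can be replayed by hand. The snippet below is only a worked check using the scores and weights from the response above; `aggregate` is a hypothetical helper, and the "pass" outcome assumes the default thresholds.

```python
# (score, weight) per step, taken from the response above
default = [(1.0, 0.4), (0.85, 0.4), (0.5, 0.2)]
# Same run, but with "premium" mapped to 1.0 by an explicit labels mapping
mapped = [(1.0, 0.4), (0.85, 0.4), (1.0, 0.2)]


def aggregate(pairs):
    """Weighted mean: sum of score * weight divided by the sum of weights."""
    return sum(s * w for s, w in pairs) / sum(w for _, w in pairs)


print(round(aggregate(default), 2))  # 0.84
print(round(aggregate(mapped), 2))   # 0.94
```

With the default thresholds (pass=0.9, review=0.7), 0.94 lands in "pass" rather than "review", so the missing mapping alone decides whether this transaction is queued for review.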

#Chain vs Graph audit ops

Chain runs write operation: "policy_chain" to audit_logs; graph runs write operation: "policy_graph". Filter activity dashboards on the operation value to measure chains and graphs separately. Both ops still write one policy_evaluations row per executed step.

