Every time a gated function is called, Attesta’s DefaultRiskScorer evaluates the action across 5 independent factors and produces a weighted composite score between 0.0 and 1.0. This score determines which challenge the human operator must complete before execution proceeds.

The 5 Scoring Factors

| Factor | Weight | What It Analyzes |
|---|---|---|
| function_name | 0.30 | Verb classification: destructive, mutating, or read-only |
| arguments | 0.25 | Sensitive patterns in argument values (credentials, SQL, shell commands) |
| docstring | 0.20 | High-risk or caution keywords in the function’s docstring |
| hints | 0.15 | Caller-supplied risk hints (boolean flags, numeric values) |
| novelty | 0.10 | How many times this function has been called before |
The final score is the weighted sum of individual factor scores:
risk_score = (function_name × 0.30) + (arguments × 0.25) + (docstring × 0.20) + (hints × 0.15) + (novelty × 0.10)
All factor scores are clamped to the [0.0, 1.0] range before weighting. The final composite score is also clamped, ensuring it never exceeds 1.0.
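The formula can be sketched in a few lines of Python. This is a minimal illustration, not Attesta's source: the names WEIGHTS, clamp, and composite_score are hypothetical, and treating a missing factor as 0.0 is an assumption of the sketch.

```python
# Factor weights from the table above; together they sum to 1.0.
WEIGHTS = {
    "function_name": 0.30,
    "arguments": 0.25,
    "docstring": 0.20,
    "hints": 0.15,
    "novelty": 0.10,
}


def clamp(value: float, low: float = 0.0, high: float = 1.0) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def composite_score(factors: dict[str, float]) -> float:
    """Weighted sum of clamped factor scores, clamped again to [0.0, 1.0].

    Assumption of this sketch: a factor missing from `factors` counts as 0.0.
    """
    total = sum(
        weight * clamp(factors.get(name, 0.0)) for name, weight in WEIGHTS.items()
    )
    return clamp(total)


# Raw factor scores taken from the worked example on this page.
worked_example = {
    "function_name": 0.95,
    "arguments": 0.70,
    "docstring": 0.85,
    "hints": 0.00,
    "novelty": 0.90,
}
print(round(composite_score(worked_example), 3))
```

Feeding in the raw scores from the worked example further down reproduces its 0.72 total. Because the five weights sum to 1.0 and every factor is clamped first, the final clamp only changes the result if the weights are altered.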

Factor 1: Function Name (weight: 0.30)

The scorer extracts the verb from the function name and classifies it into one of three tiers.

Destructive Verbs — score 0.95

These verbs indicate irreversible or highly impactful operations:
delete, remove, drop, destroy, purge, truncate, kill

Mutating Verbs — score 0.55

These verbs modify state but are typically reversible:
write, update, modify, set, create, send, deploy, push, execute, run

Read Verbs — score 0.10

These verbs indicate read-only operations with minimal risk:
read, get, list, fetch, search, find, check
If the verb does not match any category, a default mid-range score is assigned.
from attesta import gate

# function_name factor: 0.95 ("delete" is a destructive verb)
@gate
def delete_database(name: str) -> str:
    """Delete an entire database."""
    return f"Deleted {name}"

# function_name factor: 0.55 ("deploy" is a mutating verb)
@gate
def deploy_service(service: str) -> str:
    """Deploy to production."""
    return f"Deployed {service}"

# function_name factor: 0.10 ("get" is a read verb)
@gate
def get_status(service: str) -> dict:
    """Check service health."""
    return {"service": service, "status": "healthy"}
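One way to implement the three-tier lookup is sketched below. The page does not say how the verb is extracted or what the default score is, so two things here are assumptions: the verb is taken to be the first underscore-separated word of the name, and DEFAULT_SCORE = 0.5 stands in for the unspecified mid-range default. The names extract_verb and function_name_score are hypothetical.

```python
# Verb tiers and their scores, as listed above.
DESTRUCTIVE = {"delete", "remove", "drop", "destroy", "purge", "truncate", "kill"}
MUTATING = {"write", "update", "modify", "set", "create", "send",
            "deploy", "push", "execute", "run"}
READ = {"read", "get", "list", "fetch", "search", "find", "check"}

# Hypothetical placeholder: the docs only promise "a default mid-range score".
DEFAULT_SCORE = 0.5


def extract_verb(function_name: str) -> str:
    """Assume the verb is the first underscore-separated token, lowercased."""
    return function_name.lower().split("_", 1)[0]


def function_name_score(function_name: str) -> float:
    """Map the extracted verb to its tier score."""
    verb = extract_verb(function_name)
    if verb in DESTRUCTIVE:
        return 0.95
    if verb in MUTATING:
        return 0.55
    if verb in READ:
        return 0.10
    return DEFAULT_SCORE


for name in ("delete_database", "deploy_service", "get_status", "rotate_keys"):
    print(name, function_name_score(name))
```

Under this first-token assumption, rotate_keys falls through to the default, and a name like bulk_delete_users would not be recognized as destructive; the real scorer may search the name differently.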

Factor 2: Arguments (weight: 0.25)

The scorer scans all argument values (converted to strings) for patterns that indicate sensitive or dangerous content.

Sensitive Patterns

| Category | Patterns |
|---|---|
| Credentials | production, .env, secret, password, token, key, credential |
| Dangerous SQL | DROP, DELETE, TRUNCATE, ALTER |
| Shell dangers | rm -rf, sudo, chmod 777 |
| Network | URLs, email addresses, IP addresses |
Each detected pattern contributes to the argument score. Multiple matches compound the risk.
Arguments containing production or .env in any position will significantly elevate the risk score. This is by design — accidentally targeting production environments is one of the most common causes of AI agent incidents.
# High argument score — contains "production" and a URL
deploy("api-gateway", env="production", url="https://api.example.com")

# High argument score — dangerous SQL
run_query("DROP TABLE users;")

# High argument score — shell danger
execute_command("sudo rm -rf /var/data")

# Low argument score — no sensitive patterns
get_user("usr_12345")
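The detection step can be approximated as follows. This sketch only reports which patterns match; it does not try to reproduce how Attesta turns matches into an argument score, since the per-match contribution is not documented on this page. Case-insensitive substring matching and the three network regexes are assumptions, and find_sensitive_patterns is a hypothetical name.

```python
import re

# Literal substrings from the table above, matched case-insensitively
# (case-insensitive matching is an assumption of this sketch).
LITERAL_PATTERNS = {
    "credentials": ["production", ".env", "secret", "password", "token",
                    "key", "credential"],
    "dangerous_sql": ["drop", "delete", "truncate", "alter"],
    "shell": ["rm -rf", "sudo", "chmod 777"],
}

# Deliberately simple regexes for the "Network" row; hypothetical, not Attesta's.
NETWORK_PATTERNS = {
    "url": re.compile(r"https?://\S+"),
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "ipv4": re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),
}


def find_sensitive_patterns(*args, **kwargs) -> list[tuple[str, str]]:
    """Return (category, pattern) pairs found in the stringified argument values."""
    values = [str(v) for v in args] + [str(v) for v in kwargs.values()]
    hits = []
    for value in values:
        lowered = value.lower()
        for category, patterns in LITERAL_PATTERNS.items():
            hits.extend((category, p) for p in patterns if p in lowered)
        for name, regex in NETWORK_PATTERNS.items():
            if regex.search(value):
                hits.append(("network", name))
    return hits


print(find_sensitive_patterns("api-gateway", env="production",
                              url="https://api.example.com"))
print(find_sensitive_patterns("usr_12345"))
```

Plain substring checks err toward false positives: key also matches monkey, and delete matches any argument that merely mentions the word. Each hit is kept in the list, so several matches on one call are all visible to whatever scoring step follows.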

Factor 3: Docstring (weight: 0.20)

The scorer inspects the function’s docstring for keywords that signal risk.

High-Risk Keywords — score 0.85

irreversible, permanent, destructive, dangerous, production, critical

Caution Keywords — score 0.50

careful, warning, caution
If multiple keywords match, the highest score wins.
@gate
def wipe_storage(bucket: str) -> str:
    """Permanently and irreversibly delete all objects in a storage bucket.

    This is a destructive operation that cannot be undone.
    """
    return f"Wiped {bucket}"
# Docstring factor: 0.85 (high-risk keywords such as "permanent" and "destructive" appear)
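The tier rule above translates into a short function. Substring matching on the lowercased docstring, and a score of 0.0 when no keyword appears, are assumptions of this sketch; docstring_score is a hypothetical name. Checking the high-risk list first is what makes the highest tier win.

```python
from typing import Optional

HIGH_RISK_KEYWORDS = ("irreversible", "permanent", "destructive",
                      "dangerous", "production", "critical")
CAUTION_KEYWORDS = ("careful", "warning", "caution")


def docstring_score(docstring: Optional[str]) -> float:
    """Return the highest tier matched by the docstring (0.0 if none, assumed)."""
    text = (docstring or "").lower()
    # High-risk keywords are checked first, so they win over caution keywords.
    if any(word in text for word in HIGH_RISK_KEYWORDS):
        return 0.85
    if any(word in text for word in CAUTION_KEYWORDS):
        return 0.50
    return 0.0


print(docstring_score("Permanently and irreversibly delete all objects."))
print(docstring_score("Careful: rotates the signing key."))
print(docstring_score("Check service health."))
print(docstring_score(None))
```

With plain substring matching, "Permanently" is caught by the keyword permanent, but "irreversibly" is not caught by irreversible; whether Attesta normalizes word forms is not specified here.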

Factor 4: Hints (weight: 0.15)

Callers can supply explicit risk hints to influence the score. Hints come in two forms:

Boolean Hints

Each boolean hint set to True adds 0.30 to the hint score (cumulative, clamped to 1.0).
@gate(risk_hints={"production": True, "affects_billing": True})
def update_pricing(plan: str, price: float) -> str:
    """Update subscription pricing."""
    return f"Updated {plan} to ${price}"
# Hint score: 0.60 (two True booleans × 0.30 each)

Numeric Hints

Numeric hints are scaled as min(value / 10000, 1.0) * 0.8:
@gate(risk_hints={"affected_rows": 50000})
def batch_update(query: str) -> int:
    """Bulk update user records."""
    return 50000
# Hint score: 0.80 (min(50000/10000, 1.0) × 0.8 = 1.0 × 0.8)
Use boolean hints for categorical flags like production, affects_pii, or requires_backup. Use numeric hints for quantities like affected_rows, file_count, or dollar_amount.
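The two hint rules can be combined as sketched below. How boolean and numeric contributions interact is not spelled out above, so adding them together before the final clamp is an assumption; hint_score is a hypothetical name.

```python
from collections.abc import Mapping
from typing import Union

HintValue = Union[bool, int, float]


def hint_score(risk_hints: Mapping[str, HintValue]) -> float:
    """Combine risk hints into one score clamped to [0.0, 1.0].

    Summing boolean and numeric contributions is an assumption of this sketch.
    """
    total = 0.0
    for value in risk_hints.values():
        # bool is a subclass of int in Python, so it must be checked first.
        if isinstance(value, bool):
            if value:
                total += 0.30
        elif isinstance(value, (int, float)):
            total += min(value / 10000, 1.0) * 0.8
    return max(0.0, min(total, 1.0))


print(hint_score({"production": True, "affects_billing": True}))
print(hint_score({"affected_rows": 50000}))
print(hint_score({"affected_rows": 2500}))
```

The isinstance(value, bool) branch comes first because True would otherwise be treated as the number 1 and contribute only 0.00008 through the numeric rule.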

Factor 5: Novelty (weight: 0.10)

The novelty factor captures how familiar a particular function call is. The first invocation of any gated function is treated as highest novelty; the score decreases linearly with subsequent calls.
| Call Number | Novelty Score |
|---|---|
| 1st call | 0.90 |
| 2nd call | 0.81 |
| 3rd call | 0.72 |
| 5th call | 0.54 |
| 10th call | 0.10 |
| 11th+ call | 0.10 (floor) |
The formula is:
novelty = max(0.9 - (call_count - 1) * (0.8 / 9), 0.1)
This means the novelty score starts at 0.9 for the first call and linearly decreases to 0.1 by the 10th call, remaining at 0.1 thereafter.
Novelty tracking is per-function, per-session. A new session resets all novelty counters.
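A per-function counter that applies the formula might look like this. NoveltyTracker and novelty_score are hypothetical names, and creating a new tracker (or calling reset()) stands in for starting a new session.

```python
def novelty_score(call_count: int) -> float:
    """Linear decay from 0.9 on the 1st call to a floor of 0.1 from the 10th on."""
    return max(0.9 - (call_count - 1) * (0.8 / 9), 0.1)


class NoveltyTracker:
    """Counts calls per function name within one session."""

    def __init__(self) -> None:
        self._counts: dict[str, int] = {}

    def record_call(self, function_name: str) -> float:
        """Register one call and return its novelty score."""
        count = self._counts.get(function_name, 0) + 1
        self._counts[function_name] = count
        return novelty_score(count)

    def reset(self) -> None:
        """Forget all counts, as when a new session begins."""
        self._counts.clear()


tracker = NoveltyTracker()
print([round(tracker.record_call("delete_user"), 2) for _ in range(11)])
# A different function has its own counter, so it starts fresh.
print(tracker.record_call("get_status"))
```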

Worked Example

Consider a function delete_user("usr_123", env="production") with docstring "Permanently remove a user account." called for the first time, with no hints:
| Factor | Raw Score | Weight | Contribution |
|---|---|---|---|
| function_name: “delete” is destructive | 0.95 | 0.30 | 0.285 |
| arguments: “production” detected | ~0.70 | 0.25 | 0.175 |
| docstring: “permanently” detected | 0.85 | 0.20 | 0.170 |
| hints: none provided | 0.00 | 0.15 | 0.000 |
| novelty: first call | 0.90 | 0.10 | 0.090 |
| Total | | | 0.720 |
A score of 0.72 maps to risk level HIGH, which triggers a quiz challenge.

Custom Scorers

DefaultRiskScorer covers general-purpose use cases. For domain-specific scoring, you can use a CompositeRiskScorer to blend the default scorer with custom logic, or replace it entirely with your own implementation.

Risk Levels: see how scores map to LOW, MEDIUM, HIGH, and CRITICAL.

Risk Scorers: combine or replace scorers with Composite, Max, and Fixed.