Scenario Overview

Atelier uses behave (BDD) to capture platform decisions as executable specifications. Every scenario answers a concrete question: Does the config load? Can the runtime start? Does the classification pipeline converge?

These aren’t just tests. They’re the design context that connects architectural choices to the deployment realities of Cloudera AI.

Active Domains

155 scenarios across 35 features, 4 domains.

Infrastructure (infra)

Health checks and configuration lifecycle for the services Atelier depends on.

| Feature | Tag | Tier | Scenarios | What it validates |
|---|---|---|---|---|
| Config lifecycle | @config | 0 | 3 | HOCON load, CLI override precedence, materialize + validate |
| PostgreSQL health | @postgres | 1 | 2 | Connection with pgvector extension, migration state |
| Qdrant health | @qdrant | 1 | 1 | Vector store HTTP health endpoint |
| PGlite process | @pglite | 0 | 2 | Node.js script existence, npm dependency declarations |
| Preflight | @preflight | 0 | 3 | Structured deny/warn checks, GPU detection |
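A Tier-1 health check like the Qdrant one can be sketched as a plain predicate that a step implementation calls. This is an illustrative sketch, not Atelier's actual step code; the `/healthz` path and the timeout value are assumptions:

```python
# Hedged sketch of an HTTP health-check helper for a vector store.
# Assumes the service exposes a GET /healthz endpoint returning 200 when up.
from urllib.request import urlopen
from urllib.error import URLError


def service_is_healthy(base_url: str, timeout: float = 2.0) -> bool:
    """Return True iff the service answers its health endpoint with HTTP 200."""
    try:
        with urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, or timeout all count as unhealthy.
        return False
```

A `Given Qdrant is reachable` step would call this and `assert` on the result, so the scenario fails fast with a readable message when the stack isn't up.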

Deployment

CAI deployment modalities and the runtime profile that catches failures before pushing.

| Feature | Tag | Tier | Scenarios | What it validates |
|---|---|---|---|---|
| Runtime profile | @runtime-profile | 0 | 6 | Import chain, script executability, config resolution, migration parsing |
| AMP lifecycle | @amp | 0 + cai | 5 | .project-metadata.yaml structure, task patterns, install + start |
| Application modality | @application | 0 + 1 | 3 | HOST binding logic, full local stack startup |
| Studio modality | @studio | 0 | 2 | IS_COMPOSABLE root directory routing |
| Embeddings integration | @embeddings | 0 | 4 | npm dependency, page component, React Router, preparation script |
| Naming conventions | @naming | 0 | 2 | User-facing surfaces say “Embeddings”, no Apache Atlas confusion |

Gateway

HTTP gateway endpoints, gRPC bridge, and live service integration.

| Feature | Tag | Tier | Scenarios | What it validates |
|---|---|---|---|---|
| API endpoints | @api | 0 + 1 | 8 | REST endpoint contracts, response shapes |
| API testclient | @testclient | 0 | 7 | FastAPI TestClient integration (no running server) |
| Status endpoint | @status | 0 + 1 | 4 | Aggregated health report, config state |
| Pipeline integration | @pipeline | 1 | 2 | Classification pipeline via gateway |
| SPA routes | @spa | 0 | 1 | Client-side routing fallback |

Agent

Classification pipeline, DST evidence fusion, ML classifiers, and agent orchestration.

| Feature | Tag | Tier | Scenarios | What it validates |
|---|---|---|---|---|
| Classification pipeline | @gpu | 0 | 28 | DST belief, Dempster combination, features, patterns (+ Luhn/IPv4/date/currency validation), name matching, pipeline E2E, Monte Carlo sampling |
| Bootstrap convergence | @bootstrap | 0 | 11 | LLM sweep, ML validation, targeted revisit, convergence criteria, frontier SVM |
| Agent convergence loop | @gpu | 0 | 6 | 6-tool agent loop, conflict reports, convergence, mock client |
| Agent smoke test | @agent | 0 | 6 | Agent metadata, tool definitions, state formatting |
| LLM backends | @backend | 0 | 8 | Backend factory, Anthropic/Bedrock/Cerebras/OpenAI clients |
| ML classifiers | @ml | 0 | 4 | CatBoost + SVM training, inference, virtual ensemble UQ |
| ML E2E | @ml-e2e | 0 | 2 | Full synth → train → classify → evaluate cycle |
| Belief path | @belief-path | 0 | 3 | Hierarchical navigation, cautious classification |
| SAGE importance | @sage | 0 | 1 | Permutation-based feature importance |
| SHAP explanations | @shap | 0 | 2 | TreeSHAP + PermutationSHAP attribution |
| Synth generation | @synth | 0 | 2 | Synthetic data + reference-label generation |
| Synth framework | @synth-framework | 0 | 2 | Generator registry, coverage reporting |
| Meta-tagging | @meta-tagging | 0 | 2 | META_TO_ICE mappings, coverage |
| Experimentation | @experimentation | 0 | 3 | Discount tuning, comparative evaluation |
| Real data | @real-data | 0 | 3 | Production annotation validation (requires build/data/) |
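For readers new to DST, the Dempster combination the pipeline scenarios exercise can be sketched in a few lines. This is a generic illustration of the rule, not Atelier's implementation; the hypothesis names and mass values are invented:

```python
# Hedged sketch of Dempster's rule of combination over frozenset focal elements.
from itertools import product


def combine(m1: dict, m2: dict) -> dict:
    """Fuse two mass functions; keys are frozensets of hypotheses, values sum to 1."""
    combined: dict = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass falling on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: sources are incompatible")
    k = 1.0 - conflict  # renormalize by the non-conflicting mass
    return {s: w / k for s, w in combined.items()}


# Two evidence sources weighing "email" vs "phone" for a column:
m1 = {frozenset({"email"}): 0.7, frozenset({"email", "phone"}): 0.3}
m2 = {frozenset({"email"}): 0.6, frozenset({"phone"}): 0.4}
fused = combine(m1, m2)
```

Agreement between sources concentrates mass (here on "email"), while disagreement is absorbed by renormalization — the behavior the belief scenarios pin down with exact expected masses.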

By Tier

| Tier | Requires | Scenarios | Pass locally |
|---|---|---|---|
| 0 | Python only | ~120 | Yes |
| 1 | devenv stack | ~15 | Yes (with devenv up) |
| cai | Live CAI session | ~5 | Skipped (documentation-only) |

Additional tags: @slow (~17 scenarios requiring extended runtime), @gpu (GPU detection/acceleration scenarios — run on CPU too, just slower).

Why BDD for a Deployment Platform?

CAI deployment has four modalities — Project, Application, AMP, and Studio — each with different constraints on networking, filesystem layout, and process lifecycle. Traditional unit tests verify module behavior in isolation. BDD scenarios verify that the system hangs together across these modalities.

Consider the Application modality: when CDSW_APP_PORT is set, the startup script must bind to 127.0.0.1 because CAI’s reverse proxy handles external traffic. Bind to 0.0.0.0 instead and you bypass the proxy’s auth layer. This isn’t a bug in any single module — it’s a deployment contract that only a scenario can express clearly:

```gherkin
Scenario: start-app.sh binds to 127.0.0.1 when CDSW_APP_PORT is set
  Given CDSW_APP_PORT is set to "8090"
  When I parse bin/start-app.sh for the HOST variable
  Then HOST is "127.0.0.1"
```

The scenario is the spec. A colleague reading this knows exactly what the constraint is, why it matters, and can verify it passes with just `behave`.
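One way the "parse bin/start-app.sh" step could be implemented (a sketch; the real step definition and the script's actual contents may differ) is to scan the script text for the HOST assignment rather than execute it:

```python
# Hedged sketch: extract HOST from the CDSW_APP_PORT branch of a startup script.
import re

# Illustrative excerpt of a start-app.sh — not the real file.
SCRIPT = '''#!/usr/bin/env bash
if [ -n "${CDSW_APP_PORT:-}" ]; then
  HOST="127.0.0.1"   # CAI reverse proxy handles external traffic
else
  HOST="0.0.0.0"     # local dev: expose directly
fi
'''


def host_when_port_set(script: str) -> str:
    """Return the HOST value assigned inside the CDSW_APP_PORT branch."""
    branch = re.search(r'if .*CDSW_APP_PORT.*?\n(.*?)\nelse', script, re.S)
    assignment = re.search(r'HOST="([^"]+)"', branch.group(1))
    return assignment.group(1)
```

Parsing instead of executing keeps the check at Tier 0: it asserts the deployment contract without needing a CAI session or even a shell.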