Skip to content

Roadmap

A running inventory of what the workspace covers today, gaps surfaced during use, and the near-term plan. Pre-1.0, no deployed users; every item is a judgment call about priority, not a commitment.

Current state

Workspace surface

Package Role
dhis2w-client Async HTTP client, pluggable auth, typed responses via generated models. Retry policy, task awaiter, connection-pool tuning all first-class.
dhis2w-codegen /api/schemas → pydantic emitter + OAS spec-patches framework (synthesises Jackson discriminators upstream DHIS2 omits)
dhis2w-core Plugin runtime + shared services (profiles, CLI errors, task watch, client context)
dhis2w-cli Typer root that discovers every plugin (first-party + entry-point)
dhis2w-mcp FastMCP server that mounts the same plugins
dhis2w-browser Playwright session helpers (auth through the DHIS2 login form)

CLI surface

Sixteen top-level domains: analytics, apps, browser, data, dev, doctor, files, maintenance, messaging, metadata, profile, route, system, user, user-group, user-role. Each plugin shares a service.py between the CLI and MCP sides; the same typed call from both surfaces.

dhis2 metadata has the full workflow surface:

  • Core CRUD: list / get / patch (RFC 6902) / rename (bulk name/shortName/description add + strip prefix/suffix, --dry-run) / retag (bulk ref-field + enum rewrites: categoryCombo, optionSet, legendSets, aggregationType, domainType) / share (bulk apply one sharing block to many UIDs, with --public-access / --user-access UID:access / --user-group-access UID:access, stdin UID input via -, --dry-run) / merge-bundle (import a saved JSON bundle file into a target profile — sibling to the source-profile merge verb).
  • Cross-resource search: dhis2 metadata search <query> — fans out three concurrent /api/metadata?filter=<field>:ilike:<q> calls (id, code, name) and merges by UID. Full UID, partial UID, business code, or name fragment all flow through one verb.
  • Bundle operations: export / import / diff (file-vs-file and file-vs-live) with per-resource filters + dangling-reference warning on export; diff-profiles for staging-vs-prod drift.
  • Authoring sub-apps: options get / find / sync for OptionSet sync; attribute get / set / delete / find for cross-resource AttributeValue workflows; program-rule get / vars-for / validate-expression / where-de-is-used; sql-view list / get / execute / refresh / adhoc; viz list / get / create / clone / delete; dashboard list / get / add-item / remove-item; map list / get / create / clone / delete; legend-sets list / get / create / clone / delete; four full X / XGroup / XGroupSet authoring triples with canonical DHIS2 naming — organisation-units / organisation-unit-groups / organisation-unit-group-sets (plus organisation-unit-levels for per-depth rename), data-elements / data-element-groups / data-element-group-sets, indicators / indicator-groups / indicator-group-sets, and category-options / category-option-groups / category-option-group-sets; plus the program-indicators + program-indicator-groups pair (DHIS2 has no programIndicatorGroupSet). Aggregate data-set surface: data-sets list / get / create / add-element / remove-element / delete + sections list / get / create / add-element / remove-element / reorder / delete. Authoring flip side of maintenance runs: validation-rules {list,show,create,delete} + validation-rule-groups + predictors {list,show,create,delete} + predictor-groups. Tracker-schema authoring complete end-to-end: tracked-entity-attributes + tracked-entity-types (with TETA linkage) + programs {list,show,create,rename,add-attribute,remove-attribute,add-to-ou,remove-from-ou,delete} + program-stages {list,show,create,rename,add-element,remove-element,reorder,delete}. Category-dimension authoring complete end-to-end: categories {list,show,create,rename,add-option,remove-option,delete} + category-combos {list,show,create,rename,add-category,remove-category,wait-for-cocs,delete,build} (the build verb is the one-pass create-or-reuse helper for the full stack, fed a JSON CategoryComboBuildSpec) + read-only category-option-combos {list,show,list-for-combo}.

dhis2 doctor runs ~100 checks on a live instance (20 metadata-health probes + 81 DHIS2 integrity checks + BUGS tripwires).

MCP surface

334 tools across 13 plugin groups (analytics_*, apps_*, customize_*, data_*, doctor_*, files_*, maintenance_*, messaging_*, metadata_* (230), profile_*, route_*, system_*, user_*). Every CLI command has an MCP tool equivalent and vice versa; both share one typed service call.

Typed models shipped

Via /api/schemas codegen (generated/v{40,41,42,43,44}/schemas/):

  • 100+ metadata resources (DataElement, DataSet, OrganisationUnit, Indicator, Program, …) with full CRUD accessors including the RFC 6902 patch(uid, ops) method
  • 77+ StrEnums for CONSTANT properties (ValueType, AggregationType, DataElementDomain, …)
  • A shared Reference with both id and code fields

Via /api/openapi.json codegen (generated/v{N}/oas/, currently populated on v42 + v43):

  • Every components/schemas entry — 562 classes + 260 StrEnums + 104 aliases on v42; 984 classes on v43.
  • Consumers in dhis2w-client: envelopes.py, auth_schemes.py, aggregate.py, system.py, maintenance.py, and generated/v42/tracker.py are all thin shims over the OAS output.
  • Emitter is deterministic + version-scoped; dhis2 dev codegen oas-rebuild --version v{N} regenerates from the committed openapi.json without network.
  • Spec-patches framework for known-upstream OAS gaps (dhis2w_codegen.spec_patches). Each patch is idempotent + carries a bugs_ref pointer; the rebuild log names which gap was worked around. Current patches: *AuthScheme discriminators (BUGS.md #14 — still unfixed in v43).

Remaining hand-written in dhis2w-client (by design):

  • WebMessageResponse subclass + DataIntegrityReport / DataIntegrityResult / Me / Notification — helper methods and client-side convenience shapes that aren't in OpenAPI.
  • AnalyticsMetaData — typed parser helper over Grid.metaData (a bare dict[str, Any] on the wire). Grid / GridHeader come straight from the OAS codegen.
  • TrackerBundle — the POST /api/tracker envelope isn't in OpenAPI under that name. Thin wrapper on OAS tracker models.
  • PeriodType + RelativePeriod StrEnums (24 period frequencies + 45 rolling windows; upstream Java enums the OpenAPI schema doesn't expose — see BUGS.md #28).

Typing posture

The four-PR typing sweep (#71-#74) plus the codegen discriminator synthesis (#76) eliminated every dict[str, Any] signature that crosses module boundaries outside the explicit HTTP-boundary carveouts. Every service-layer function returns a typed pydantic model; MCP tools dump at the edge via _dump_model; CLI handlers dump for JSON output or Rich tables. The CLAUDE.md "no dict[str, Any] across module boundaries" rule is enforced workspace-wide.

Runtime features

  • --profile/-p global override + ~/.config/dhis2/profiles.toml or ./.dhis2/profiles.toml auto-discovery
  • --debug/-d global flag → stderr HTTP trace lines via dhis2w_client.http logger
  • --watch/-w on job-kicking commands (analytics refresh, maintenance dataintegrity run) + standalone maintenance task watch with Rich progress UI
  • --json opt-in on every write command; concise one-line summary by default
  • Typed Dhis2ApiError.web_message parses the envelope on 4xx so the CLI surfaces conflicts[] / importCount / rejectedIndexes[] detail
  • Client-side UID generation (generate_uid, generate_uids); no /api/system/id round-trip
  • External plugin loading via importlib.metadata.entry_points(group="dhis2.plugins") — see examples/plugin-external/ for a minimal runnable reference
  • Retry policy with exponential backoff + jitter + Retry-After header honouring. Idempotent-only by default; opt in for POST/PATCH per policy. Threads through Dhis2Client(retry_policy=...) and open_client(profile, retry_policy=...).
  • Library-level task awaiterclient.tasks.await_completion(task_ref) blocks until DHIS2 reports completed=True; iter_notifications for streaming renderers.
  • Connection-pool tuningDhis2Client(http_limits=httpx.Limits(...)) / open_client(profile, http_limits=...) for sizing against the real DHIS2 capacity.
  • Data-integrity streaming iteratorclient.maintenance.iter_integrity_issues(...) yields IntegrityIssueRows (issue + owning check's name / displayName / severity) as a flat stream.
  • Files plugindhis2 files documents {list,get,upload,upload-url,download,delete} + dhis2 files resources {upload,get,download}.
  • System metadata cache — TTL-bounded per-client in-memory cache on client.system for info() / default_category_combo_uid() / setting(key). 300 s default TTL.
  • Bulk delete on client.metadatadelete_bulk(resource_type, [uids]) + delete_bulk_multi({...}) wrap POST /api/metadata?importStrategy=DELETE.
  • client.metadata.search — cross-resource UID / code / name search; three concurrent /api/metadata?filter=<field>:ilike:<q> calls merged client-side with UID dedup. Typed SearchResults(query, hits: {resource: [SearchHit, ...]}, total).
  • client.visualizationsVisualizationSpec typed builder (chart type, data elements, indicators, periods, relative periods, legend set, placement overrides) + create_from_spec / clone / list / delete accessor. RelativePeriod StrEnum covers the 45 rolling windows upstream OpenAPI exposes as boolean flags.
  • client.mapsMapSpec + MapLayerSpec typed builder with indicators, legend_set, thematic / boundary / facility layer kinds; parallels the viz accessor.
  • client.dashboardsDashboardSlot + add_item / remove_item on the dashboards accessor, no round-trip of the whole dashboard.
  • Streaming data-value-set importclient.data_values.stream(source, content_type=...) feeds httpx's chunked transfer directly from a Path, bytes, sync / async iterable, or async generator. JSON / XML / CSV / ADX.
  • Streaming analytics exportclient.analytics.stream_to(destination, *, params, endpoint="/api/analytics.json") pipes httpx's chunked response straight to disk via aiter_bytes.
  • Multi-instance metadata diffdhis2 metadata diff-profiles <a> <b> -r <resource> exports two registered profiles concurrently and diffs them structurally.

Seed fixture

The committed e2e dump (infra/v42/dump.sql.gz) mirrors DHIS2 Play's Sierra Leone immunization demo with workspace-local additions: 1332 org units with GeoJSON geometries, 67 data elements, 3 indicators, 3 programs (Child Programme + Antenatal = tracker; Supervision visit = event), 2 datasets, 3 dashboards, 23 visualizations built programmatically via VisualizationSpec + 1 EventVisualization for the supervision program attached to the Immunization data dashboard, 8 maps built via MapSpec, 188k aggregate data values, 500 tracker entities, 12 sample supervision events covering 2024 monthly, 6 program rules + 10 program indicators. Workspace fixtures layered on top (infra/scripts/seed/workspace_fixtures.py): SNOMED_CODE attribute, VACCINE_TYPE option set with 5 fixed-UID options, 3 SqlViews (VIEW / QUERY / MATERIALIZED_VIEW), 2 BCG predictors + PredictorGroup + 2 output DEs, 2 BCG validation rules + ValidationRuleGroup, 4 named OrganisationUnitLevel records (Country / Province / District / Facility), 1 LegendSet (LsDoseBand1) attached to the Measles + Penta-1 monthly column charts. make refresh-and-verify wipes the stack, rebuilds the dump, runs every non-interactive example, and reports a pass/fail summary — 125 pass, 0 fail, 11 skipped (OIDC-login flows including the Playwright-driven variant, browser screenshot captures, long-running analytics jobs, outlier detection, external httpbin dep) on the current main.

CI

  • .github/workflows/ci.yml runs make lint && make test && make docs-build on every PR
  • .github/workflows/e2e.yml nightly — full DHIS2 stack + seeded fixtures + slow integration tests

Public distribution (PyPI, tagged releases, CHANGELOG.md) is explicitly out of scope — this workspace is internal-only.

Docs

  • Auto-generated CLI reference (docs/cli-reference.md, ~3700 lines from the Typer app) + MCP reference (docs/mcp-reference.md, 313 tools across 13 groups from the FastMCP server). Both regenerated on every make docs-build.
  • Narrative tutorials: docs/guides/cli-tutorial.md, docs/guides/client-tutorial.md, docs/guides/visualizations.md (step-by-step viz + dashboard composition).
  • Examples index (docs/examples.md) catalogues 167 runnable examples (74 client, 54 CLI, 39 MCP) with descriptions + cross-links to concept docs. Tracker-schema authoring examples (steps 1 / 2 / 3 under examples/cli/tracker_*.sh) round-trip the full chain end-to-end.
  • Architecture docs cover every plugin, the client, auth, profiles, codegen, typed schemas, plugins runtime, external plugins, MCP, versioning, browser automation.
  • BUGS.md — 29 upstream DHIS2 quirks with live curl repros + v43 re-audit status.

Test coverage

872 tests run via make test (903 collected including 31 slow-marked for the nightly integration stack). Unit + CliRunner + respx-mocked HTTP. Slow tests exercise live-stack workflows (--watch job polling, Playwright PAT creation, dashboard screenshot capture, Playwright-driven OIDC login). make coverage runs branch-coverage locally + on every CI run (produces coverage.xml as an artifact), fails CI if the run drops under 70% (current baseline 73%).

Detailed test gaps + the planned next moves are in Testing roadmap below.

Upstream quirks tracked

38 entries in the repo-root BUGS.md. Recent additions cover the seed / workflow cycle: DataSet Hibernate flush ordering (#23), Person-TET built-in name collisions (#24), /api/.../metadata leaking computed fields (#25), admin OU scope cached per session (#26), fresh-install flakiness on first metadata import (#27), RelativePeriods OAS schema shape (#28), /api/metadata ignoring rootJunction (#29 — the reason metadata search has to fan out N requests instead of one), App Hub versions[*].created returning epoch-millis ints instead of ISO-8601 strings (#30), and the predictor-expression parser rejecting uppercase aggregators (#31 — forces avg() / sum() lowercase even though DHIS2 docs use uppercase).

Gaps surfaced during use

Authoring surfaces

The organisation-unit PR (#174) set a template — canonical DHIS2 resource names, hand-written accessors, per-item membership shortcuts, no *Spec. The triples sweep (#174 / #175 / #176 / #180 / #181), aggregate data-set surface (#185), validation-rule + predictor CRUD (#186), the full tracker-schema stretch (TET + TEA #188, Program + PTEA + OU #189, ProgramStage + PSDE #194), and the category-dimension stack (Category #205, CategoryCombo + read-only CategoryOptionCombo #208, the one-pass CategoryComboBuilder helper #209) have all landed on top of it. No metadata-authoring gaps remain on the main workflow paths.

Optional ProgramStageSection grouping (rarely used in practice) is still unauthored; reach for metadata patch for it. That's the only known absence and it stays parked unless a concrete caller surfaces.

OIDC / OAuth2 polish

  • Token refresh is tested in code but undocumented for end users.
  • Local OIDC login-page button is non-functional for browser clicks (CLI-only redirect_url); no per-provider "hide from login UI" flag in DHIS2 v42 — documented in docs/architecture/auth.md.
  • Bearer-to-JSESSIONID path for browser workflows on OIDC profiles is unverified (flagged in authenticated_session docstring).

Near-term plan (next 3–5 PRs)

Latest cycle closed the category-dimension strategic option (Category #205, CategoryCombo + read-only CategoryOptionCombo #208, the one-pass CategoryComboBuilder create-or-reuse helper #209) plus the smaller metadata merge-bundle verb (#206). With every authoring path on the main workflow now covered, the codegen emitters fully regen-stable, and bulk verbs (rename / retag / share) shipped on top of patch_bulk / apply_sharing_bulk, the obvious tactical sweep is complete.

The near-term slate is once again open. One item has been carried over without action across the last few cycles: a multi-version CI integration matrix (pure YAML / docker-compose, no Python). Deferred until it becomes the highest-leverage thing to do. The *Spec-class audit also previously sat here; it is now resolved (keep VisualizationSpec / MapSpec + MapLayerSpec / LegendSetSpec + LegendSpec; the rule for when a spec is justified is documented on api/legend-sets.md).

The natural next direction is one of:

  • Pick one of the two remaining strategic options below and commit to a multi-PR body of work (data approval workflow, or audit log reader).
  • Promote a medium-term tactical item (CLI startup latency, property-based DSL tests) for a focused 1-PR cycle.

Carried over:

  1. Multi-version CI integration matrix — slow-marked tests run against v42 only; committed codegen for v40 / v41 / v43 / v44 is never exercised against a live stack. Stand up compose-managed stacks in the nightly workflow (e2e.yml) and run the slow suite against each. Catches codegen drift before it reaches users. Pure YAML + docker-compose work; no Python changes.

Demoted / parked:

  • apps snapshot example + CI hook — the feature works, just the restore --dry-run demo still isn't in examples/cli/apps.sh. Low value without an active need.
  • ProgramStageSection grouping — rarely used in practice; metadata patch covers the occasional need. Promote if a concrete caller surfaces.

BUGS.md #15 (undiscriminated JobConfiguration.jobParameters + WebMessage.response unions) stays off the near-term list: the sibling-field discriminator pattern doesn't fit the AuthScheme-style spec-patches approach, and the scheduler plugin isn't an active workflow. Revisit when someone hits a real-world need.

Strategic options (pick one before the next cycle)

Two independent directions — the right order depends on where the pain is. Each would be a multi-PR body of work.

1. Data approval workflow plugin

/api/dataApprovals + /api/dataApprovalLevels + /api/dataApprovalWorkflows cover multi-level aggregate approval (district → zone → ministry sign-off). Common in humanitarian + government reporting pipelines. Surface:

  • dhis2 dataapproval status <ds> <pe> <ou> — which level is this cell at? APPROVED_HERE / APPROVED_ABOVE / UNAPPROVED_READY / UNAPPROVED_WAITING.
  • dhis2 dataapproval approve / unapprove / accept / unaccept — the four write verbs.
  • dhis2 dataapproval bulk-status <ds> <pe> — every org unit for one dataset-period, exit-on-incomplete mode for CI.
  • Typed DataApprovalStatus enum + level-aware state machine.

2. Audit log reader

DHIS2's /api/audits/* endpoints track every write by user / timestamp / entity-uid (for DE values, tracker payloads, metadata changes). No wrapper today; integrations that need a "who changed X and when" history have to hand-build URLs.

  • dhis2 audit data-values --de <uid> [--ou <uid>] [--pe <pe>] — stream every change for a cell.
  • dhis2 audit metadata --klass DataElement --uid <uid> — metadata edit history for one resource.
  • dhis2 audit tracker-entity <uid> — tracker write audit.
  • client.audit.iter_data_values(...) / iter_metadata(...) async iterators for library callers.

Niche but valuable for compliance + forensics use cases.

Medium-term

  • Multi-version CI matrix — integration tests run against v42 only. Stand up v40 / v41 / v43 / v44 nightly jobs against compose-managed stacks so codegen drift gets caught before release.
  • CLI startup latencydhis2 --help takes ~2s to render, which is noticeable on every shell invocation. An explicit attempt in a perf/lazy-oas-init branch proved where the cost sits + why a clean fix is harder than it looked: python -X importtime shows ~900 ms goes into pydantic model_rebuild across the 562 OAS-emitted classes, triggered by 16 callers in dhis2w-client/*.py doing from dhis2w_client.generated.v42.oas import X at module top. A lazy PEP 562 __getattr__ on oas/__init__.py is correct in isolation but every caller that imports a class by the from oas import X form still triggers the full load. A templates-only change saves ~80 ms; converting every caller to from oas.<submodule> import X saves ~1 s but breaks FastMCP's list[Model] tool-return serialisation (MockValSer instead of SchemaSerializer — pydantic's forward-ref resolver can't find siblings when only one submodule has been loaded, and skipping the eager model_rebuild() loop leaves the schemas deferred). Branch was closed rather than merged. A proper fix needs: (a) per-class on-demand model_rebuild() when the class is first touched for validate/serialize, (b) a namespace provider pydantic can consult lazily (maybe via _types_namespace + a dict proxy that imports submodules on-demand), or © accept that OAS load is unavoidable on any real command and focus on lazy plugin discovery + the Typer app-construction cost instead. Target: dhis2 --help under 400 ms.
  • Property-based testing on filter / order DSL parsing.

Long-term / exploratory

  • Further dhis2w-browser workflows, layered on authenticated_session: Maintenance app driving (actions that don't have REST), Org-unit-tree drag-drop edits. Dashboard creation is covered by the REST DashboardsAccessor.add_item; layout drag-drop is UI-only but deferred until a concrete need appears.
  • Scheduled jobs plugin (/api/jobConfigurations) — blocked on BUGS.md #15 (undiscriminated jobParameters + WebMessage.response unions). Revisit when the OAS discriminator is fixed upstream, or when a concrete scheduling workflow forces us to hand-roll typed payloads for the common job types.
  • Interactive aggregate-data-entry TUIdhis2 data entry <ds> <pe> <ou> launches a terminal spreadsheet bound to one data set × period × org unit. Questionary or textual for the UI; posts via client.data_values.stream on save. Powerful offline-capable data-entry fallback when the UI is down.

Testing roadmap

The unique shape of this project — we generate code from a moving REST API, then hand-write CLI / MCP / auth layers on top — dictates the testing surface. Bugs slip in at five layers, each best caught with a different tool.

Layered overview

Layer What can break Today Strongest tool
Static Type errors, unused imports, dead code ruff + mypy + pyright (good) + add deptry for unused / missing deps
Unit Pure logic, parsers, builders 872 tests, respx-mocked HTTP (good) + property-based + mutation
Codegen Generator emits wrong code None (relies on humans reviewing the diff) Golden snapshots of the generated tree
Schema contract Generated code stops matching live API None Wire-vs-model + manifest drift
Live integration End-to-end against real DHIS2 v42 only, slow tests Multi-version matrix + per-PR read-only
Examples Documented usage drifts from reality verify_examples exists but v42-only Snapshot outputs + parallel both-version
Upstream bugs Workaround breaks; fix lands and we don't notice Manual BUGS.md retest @pytest.mark.upstream_bug(<id>) lifecycle

Tier A — high leverage, ~1 PR each

A1. Schema contract tests against the live play instances (per-PR, read-only). Pick ~20 representative resources (DataElement, OrganisationUnit, DataSet, Program, ProgramStage, Indicator, User, …). For each: fetch one real instance from play.im.dhis2.org/dev-2-{42,43}, run it through the generated pydantic model, assert it validates without extra (or with only known-safe extras). The single highest-value addition — catches DHIS2 ship-day API changes before users do, and adds ~5 s to CI.

A2. BUGS.md regression-suite scaffolding. A custom pytest marker @pytest.mark.upstream_bug("BUGS.md#7", state="present"). Each entry gets two paired tests:

  • Workaround test — passes when our shim does its job. Breaks if we accidentally remove the workaround.
  • Bug-still-present test — currently passes (asserts the bug is observable). When DHIS2 fixes it, this fails — the signal to delete the workaround.

Couple this with a small dhis2-utils bug-status reporter so the next BUGS retest is just pytest -m upstream_bug --tb=line. Replaces the manual curl spreadsheets.

A3. Multi-version CI matrix (carried-over from the near-term list). .github/workflows/e2e.yml matrix on dhis2_version: [42, 43]. Blocked today only by the missing v43 e2e dump. Build that once (make refresh-and-verify DHIS2_VERSION=43, ~45 min one-time) and the matrix unlocks itself.

A4. Property-based tests for the parser-shaped code paths. Hypothesis is overkill for happy-path business logic but devastatingly effective for parsers. Targets:

  • generate_uid — distribution properties (no character bias, all 11 chars, 62-symbol alphabet).
  • Period parsing (LAST_3_MONTHS, 202403, 2024Q1, 2024S2, 2024W12, …).
  • Filter DSL (name:ilike:foo, code:in:[a,b], nested attributeValues.attribute.id:eq:UID).
  • JSON Patch RFC 6902 round-trip — apply then invert; the composition should be a no-op.
  • URL construction — no double-slashes, correct encoding, .json suffix on /api/analytics/* (BUGS.md #1).

One PR per parser, ~50 lines of hypothesis strategies + 5 properties each.

A5. Generated-code golden snapshots. Today, codegen drift produces a giant diff that humans review by eye. Add tests/codegen/test_snapshots.py:

  1. Load committed schemas_manifest.json.
  2. Run emit() into a tmp dir.
  3. Diff against committed generated/v{N}/.
  4. Fail on any diff.

Combined with the existing dhis2 dev codegen diff CLI: codegen changes that drift the output must be deliberate (and update the snapshot). Catches the "innocent emit.py refactor that silently changes 200 files" class of bug.

Tier B — medium leverage, ~2-3 PRs

B1. Mutation testing nightly. mutmut or cosmic-ray against packages/dhis2w-client/src/ and packages/dhis2w-core/src/plugins/*/service.py. Surface mutations that survive — each survivor is either a missing test or dead code. Run weekly (it's slow); fail when survivor count goes up vs baseline.

B2. Per-package coverage gates. make coverage is workspace-wide at 70 %. That hides the case where dhis2w-client is at 95 % and a peripheral plugin is at 30 %. Split into per-package thresholds; show a coverage diff in PR comments via codecov / coveralls / a simple gh-action. Pin dhis2w-client higher than the rest since it's the public-API surface.

B3. Tracker write end-to-end test suite. Tracker is the most error-prone area (envelope shapes, atomic / non-atomic modes, importStrategy semantics, soft-delete behaviour). An integration suite that creates a tracked entity with enrollment + events, updates each via PATCH, deletes them, verifies cleanup. Run nightly against both v42 and v43 to catch tracker-specific drift between versions.

B4. MCP tool catalogue contract test. Walk every tool registered by FastMCP, assert:

  • Tool input schema is valid JSON Schema.
  • Docstring is non-empty.
  • Tool name follows <plugin>_<resource>_<verb> convention.
  • Return-type annotation is a BaseModel.

Stops the MCP surface from quietly degrading (missing docstrings, untyped returns).

B5. Live-instance smoke tests against play, parallel matrix. Beyond contract tests (A1) — actual dhis2 system whoami, dhis2 metadata list dataElements --limit 5, etc., run against play.im.dhis2.org/dev-2-{42,43} in parallel. Catches "we shipped a release that actually works against real DHIS2."

Tier C — exotic / specialty

C1. Snapshot example stdout. make verify-examples reports PASS / FAIL but doesn't pin output. Add --snapshot mode that records stdout into examples/.snapshots/. CI fails when output drifts unexpectedly. Catches "still passes" examples that produce subtly different / wrong output.

C2. Schema drift watcher (weekly cron). Cron job that runs dhis2 dev codegen diff against the live play instances. If the committed manifest no longer matches what live reports, post an issue. The "DHIS2 just shipped 2.43.2" early-warning system.

C3. Performance benchmarks + regression detection. pytest-benchmark for:

  • CLI startup time (already on roadmap, ~2 s today, target < 400 ms).
  • MCP list_tools latency.
  • Generated-code import time (the 562 OAS classes pydantic-rebuilds).
  • Bulk fetch (1 k metadata items).

Store baselines in CI; fail PRs that regress > 20 %.

C4. Hypothesis-driven fuzzer for the OAS generator. Generate adversarial OpenAPI specs (deeply nested oneOf, missing discriminators, recursive refs). Run oas_emit against them; assert it doesn't crash; collect cases where it does. One-time investment that finds latent oas_emit.py bugs.

C5. Browser / UI tests. Playwright is a runtime dep (for screenshot capture, OIDC login automation), not a test surface. The screenshot output IS the test today — compare PNG to a golden — and that's enough.

What we're explicitly skipping

  • Load testing. Not a server; the bottleneck is always the upstream DHIS2 instance, not our client. Premature.
  • Contract testing via Pact / Schemathesis. The OpenAPI spec is too unreliable (BUGS.md #14, #15, #28 are spec-quality issues). Our own contract tests against live instances pay better.
  • Hypothesis-jsonschema for the OAS models. Tempting, but the extra="allow" shapes spin Hypothesis on impossible negative cases.
  • Mutation testing on generated code. Mechanically derived; mutations there don't tell us anything we can fix.

The first three unblock everything else; A4 / A5 chase quickly behind:

  1. A3 — multi-version CI matrix (after a one-time v43 e2e dump build). Unblocks v43 in CI, makes B3 / B5 actually possible.
  2. A1 — live-schema contract tests against play, per-PR. Cheapest highest-leverage thing in this list.
  3. A2 — BUGS.md regression suite scaffolding. Stops the manual BUGS retest cycles.
  4. A4 + A5 — property-based + codegen snapshots. Independent; either order.

Tier B and C defer until A1–A5 are paying off.

Reference: dhis2-java-client

Apache-2.0 Java client maintained by the DHIS2 org (dhis2/dhis2-java-client). Targeted comparison against this workspace as of this writing:

Already covered here

  • Typed /api/sharingSharing, SharingBuilder, ACCESS_* constants, apply_sharing / get_sharing helpers. Full parity with the Java client's sharing builder.
  • User administration — dhis2 user list / get / me / invite / reinvite / reset-password. User-group + user-role plugins covering membership + authority-bundle flows.
  • Branding / theming — dhis2 dev customize logo-front/banner/style/set/apply/show + Dhis2Client.customize accessor. No equivalent in the Java client.
  • Auth providers (Basic, PAT, OAuth2); ours is async-first with a typed AuthProvider Protocol.
  • Generated resource CRUD across v40/v41/v42/v43/v44 (Java is hand-maintained).
  • WebMessageResponse envelope parsing; .import_count(), .conflicts(), .rejected_indexes(), .task_ref(), .created_uid().
  • Full metadata query surface; repeatable --filter, --order, rootJunction=AND|OR, --page/--page-size, --all, --translate/--locale, every fields selector form.
  • Metadata bundle export / import / diff + RFC 6902 patch with per-resource filters + dangling-reference warning on export.
  • Paging; list_raw(..., paging=True) returns the pager; list(..., paging=False) walks the full catalog.
  • Typed filter values on enum fields; ValueType.NUMBER is a StrEnum, substitutable into filter strings directly.
  • Client-side UID generation; matches the Java CodeGenerator algorithm exactly.
  • Typed tracker writes; TrackerBundle + TrackerTrackedEntity / TrackerEnrollment / TrackerEvent models for POST /api/tracker.
  • Event + enrollment analytics; outlier detection + tracked-entity analytics.

Considered, not adopted

  • Fluent query builder (.addFilter(Filter.eq("name", "Penta"))): the Java client wraps DHIS2's property:operator:value string syntax in a chainable builder. Deliberately skipped — Python f-strings make f"name:like:{name}" already readable; the builder doesn't buy type safety on the stringly-typed value side; DHIS2's own docs teach the string form.

Worth evaluating later (Java parity)

  • Domain-specific response types beyond WebMessageResponse: Java has distinct PagedResponse, Stats, Response for different endpoint shapes. We collapse into WebMessageResponse + helpers. The OAS codegen already emits the specific shapes (TrackerImportReport, ImportReport, etc.) — swap on-demand when a specific call site hits friction.

Beyond Java parity (already shipped)

Items that don't exist in the Java client and now exist here:

  • Retry / backoffRetryPolicy on Dhis2Client + open_client with exponential backoff, jitter, Retry-After honoured, idempotent-only by default.
  • Library-level task awaiterclient.tasks.await_completion(task_ref, ...) + client.tasks.iter_notifications(...).
  • Connection-pool tuninghttp_limits kwarg on Dhis2Client and open_client.
  • Typed codegen across five DHIS2 versions via schema-driven emission; Java is hand-maintained.
  • OAS spec-patches framework — synthesises the Jackson discriminators DHIS2's OpenAPI generator omits (Route.auth et al.).
  • Data-integrity streaming iteratorclient.maintenance.iter_integrity_issues() yields a flat stream of IntegrityIssueRows tagged with owning-check metadata.
  • System metadata cache — TTL-bounded in-memory cache on client.system for info() / default_category_combo_uid() / setting(key).
  • Bulk metadata deleteclient.metadata.delete_bulk(resource_type, uids) + delete_bulk_multi({...}).
  • Cross-resource metadata searchclient.metadata.search(query) returns typed SearchResults grouped by resource; handles UID / partial UID / code / name in one verb.
  • Typed Visualization + Map + Dashboard buildersVisualizationSpec, MapSpec + MapLayerSpec, DashboardSlot. Chart-type-aware dimension placement, typed data dimensions (DEs + indicators), RelativePeriod enum for rolling windows, legend-set support. CLI + MCP surfaces on top.
  • Per-viz + per-dashboard PNG capturedhis2 browser viz screenshot + dhis2 browser dashboard screenshot + dhis2 browser map screenshot, Chromium-driven via Playwright session helpers.
  • Typed bulk-save on every generated resourceclient.resources.<resource>.save_bulk(items). Supports import_strategy + atomic_mode + dry_run.
  • client.metadata.dry_run(by_resource) — cross-resource importMode=VALIDATE entry point.
  • Streaming analytics exportclient.analytics.stream_to(destination, *, params, endpoint="/api/analytics.json").
  • Messaging plugindhis2 messaging {list,get,send,reply,mark-read,mark-unread,delete} + messaging_* MCP tools + client.messaging accessor.
  • Validation + predictors workflowdhis2 maintenance validation {run,result,validate-expression,send-notifications} + dhis2 maintenance predictors run.
  • Streaming dataValueSets importclient.data_values.stream(source, content_type=...).
  • Multi-instance metadata diffdhis2 metadata diff-profiles exports two profiles concurrently + diffs them.
  • Files plugin — CLI + MCP + client.files accessor over /api/documents + /api/fileResources.
  • SQL views runnerclient.sql_views + dhis2 metadata sql-view {list, get, execute, refresh, adhoc}.
  • Tracker authoring workflowsdhis2 tracker {register, enroll, add-event, outstanding} verbs + the matching client.tracker helpers for operator flows beyond generic CRUD.
  • Rich conflict rendererdhis2 metadata import / dhis2 data aggregate import render /api/metadata and /api/dataValueSets error envelopes as a normalised ConflictRow table (object UID → offending property → server message).
  • Apps plugindhis2 apps {list, add, remove, update, update --all, reload, snapshot, restore, hub-list, hub-url} + apps_* MCP tools + client.apps accessor over /api/apps and /api/appHub. update --all --dry-run previews available hub updates before installing; bundled core apps update in place. hub-list --search filters the catalog client-side. hub-url read/writes the keyAppHubUrl system setting so self-hosted hubs can be wired via CLI. snapshot --output pins an instance's app inventory to a portable JSON manifest; restore <manifest> reinstalls every hub-backed entry via install_from_hub, with a --dry-run preview that mirrors update --all --dry-run.
  • Metadata cross-instance mergedhis2 metadata merge <source-profile> <target-profile> --resource ... [--dry-run] orchestrates export+import in one pass, returning typed per-resource export counts plus the target import's WebMessageResponse. Pairs with diff-profiles (same resource+filter shape): diff to preview, merge to apply. Sharing blocks are stripped by default to avoid false-positive conflicts from per-instance user/group UIDs. Conflicts on the dry-run and applied paths render through the shared ConflictRow Rich table used by metadata import (#177), so preview output is immediately actionable without reaching for --json | jq.
  • Canonical X / XGroup / XGroupSet authoring triples — sub-apps under dhis2 metadata, one client accessor per resource, following a single canonical-naming rule (lowercase + hyphenate the DHIS2 resource path). Shipped for organisation-units (#174), data-elements (#175), indicators (#176), and category-options (#181), plus the program-indicators pair (#180 — DHIS2 has no programIndicatorGroupSet). Each PR adds 15–19 MCP tools, full CLI verbs (list / get / create / rename-like / per-item membership), and hand-written accessors that return typed generated models. No *Spec builders — keyword args on the accessor (continues the spec-audit data point). The indicator accessor exposes validate_expression(context="indicator"), the program-indicator accessor validate_expression(context="program-indicator"), so callers can pre-flight numerator / denominator / expression references before a failed create. category-options additionally ships set_validity_window(uid, start_date, end_date) for the validity-window knob unique to that resource.
  • Aggregate data-set authoringdhis2 metadata data-sets + sections sub-apps (#185). DataSetElement + Section.dataElements[] are handled as join tables with round-trip helpers: add_element(ds_uid, de_uid, category_combo_uid=...) carries the per-set CC override; sections.reorder(section_uid, [de_uids]) replaces the ordered DE list in one PUT. Docstring calls out the DSE self-ref strip for DHIS2's read/write asymmetry.
  • Validation-rule + predictor CRUDdhis2 metadata validation-rules + predictors + their groups (#186). Closes the author-then-run gap — dhis2 maintenance validation run / predictors run shipped long ago, but rules + predictors themselves couldn't be authored from CLI. Surface assembles leftSide / rightSide / generator Expression sub-objects from plain kwargs.
  • Bulk RFC 6902 patchclient.metadata.patch_bulk(resource, [(uid, ops), ...], concurrency=8) + patch_bulk_multi(...) (#187). Client-side fan-out under a semaphore; per-UID failures land in BulkPatchResult.failures (with uid / resource / status_code / message) instead of raising. Building block for future CLI-level bulk verbs.
  • Bulk sharingclient.metadata.apply_sharing_bulk(resource_type, uids, sharing) + apply_sharing_bulk_multi(by_resource, sharing) fan out one SharingBuilder payload across many UIDs under a concurrency semaphore. CLI surface as dhis2 metadata share <type> [UID...] with --public-access / --user-access UID:access / --user-group-access UID:access (repeatable) + stdin UID input via - so metadata list ... \| jq -r .id \| xargs metadata share composes. Per-UID failures land in BulkSharingResult.failures with the same row-level table renderer used by rename / retag.
  • Category dimension authoring (complete end-to-end)dhis2 metadata categories (#205) + category-combos + read-only category-option-combos (#208) + the one-pass CategoryComboBuilder helper (#209). Categories accept ordered --option UID flags on create + per-item add-option / remove-option shortcuts. CategoryCombos accept ordered --category UID flags + a wait-for-cocs --expected N matrix-poll barrier handling DHIS2's async COC regeneration (cold-start can take tens of seconds, especially under arm64 emulation). The category-combos build --spec FILE verb walks a declarative CategoryComboBuildSpec (JSON or stdin) and ensures every CategoryOption -> Category -> CategoryCombo exists; idempotent, returning a typed CategoryComboBuildResult with per-layer created-vs-reused breakdown.
  • Bundle-source metadata mergedhis2 metadata merge-bundle <target> <bundle.json> (#206) imports a saved JSON bundle into a target profile. Sibling to the source-profile merge verb; same --strategy / --atomic / --include-sharing / --dry-run knobs. Useful when the bundle came from a saved metadata export, was hand-crafted, or was produced by a non-DHIS2 tool. MergeResult.source_base_url is bundle:<path> for traceability.
  • Tracker-schema authoring (complete end-to-end)dhis2 metadata tracked-entity-attributes + tracked-entity-types (#188) covers the leaf resources; tracked-entity-types add-attribute --mandatory --searchable round-trips the TETA join table. dhis2 metadata programs {list, get, create, rename, add-attribute, remove-attribute, add-to-ou, remove-from-ou, delete} (#189) covers the middle layer — WITH_REGISTRATION / WITHOUT_REGISTRATION program flavours, PTEA enrollment form linkage, per-item OU shortcuts. dhis2 metadata program-stages {list, get, create, rename, add-element, remove-element, reorder, delete} (#194) covers the inner layer — each stage's ordered programStageDataElements[] join table with compulsory / displayInReports / allowFutureDate / allowProvidedElsewhere flags. Documents the DHIS2 mergeMode=REPLACE requirement on Program + ProgramStage PUT (nested-list removal is additive without it) as a typed client-side workaround.
  • Codegen + base-client gap closure (#190–#192, #197). Generated create(item, *, merge_mode, import_strategy, skip_sharing, skip_translation) + update(item, ...) forward the write-flag query params. Every generated resource exposes add_collection_item(parent_uid, collection, item_uid) / remove_collection_item(...) for per-item POST/DELETE shortcuts. Base Dhis2Client ships typed post(path, body, model=T) + put(path, body, model=T) wrappers (parallels the existing typed get). Hand-written accessor sweep across 28+24+10 files replaced _put_with_replace / per-item loops / duplicated _uid_from_webmessage helpers / single-object get_raw + model_validate / paged list_all via resources.X.list(...) with the new surface. ~700 lines of duplication removed with no behavior change.
  • dhis2 metadata rename + metadata retag verbs — bulk CLI verbs on top of client.metadata.patch_bulk (#195, #199, #200). rename handles label-field add / strip prefix + suffix (idempotent both directions — won't double-apply, won't no-op-fail). retag handles ref-field rewrites (categoryCombo, optionSet, legendSets) + enum field rewrites (aggregationType, domainType). Both take --filter (repeatable, same DSL as metadata list) + --dry-run + --concurrency. Per-UID failures land in the shared ConflictRow renderer used by metadata import, so operators see row-level detail on partial failures.
  • CI coverage gate + failure threshold (#196, #202). make coverage replaces make test in the CI test step; every run uploads coverage.xml as an artifact retained 14 days, and fails the build if coverage drops under 70%. Current baseline 73% (85k statements / 7.5k branches).
  • Playwright-driven OIDC logindhis2 profile login --no-browser prints the auth URL for copy-paste; dhis2w_browser.drive_oauth2_login(profile, user, pw) drives the full flow via Chromium (React login → Spring AS consent → loopback redirect) for CI + headless use cases. examples/cli/profile_oidc_login.sh + examples/client/oidc_login.py auto-dispatch to the Playwright path when DHIS2_USERNAME / DHIS2_PASSWORD are in env.
  • Predictor + validation seed fixtures — the Sierra Leone play42 snapshot now ships 2 BCG predictors (avg + sum over 3-month windows) + a PredictorGroup + 2 output DEs, plus 2 BCG validation rules + a ValidationRuleGroup that reliably produce violations. dhis2 maintenance predictors run --group and dhis2 maintenance validation run --group have concrete targets out of the box.
  • Interactive CLI pickersdhis2 profile default launches an arrow-key menu via questionary.

Beyond Java parity (not yet)

(Empty — major Java-parity gaps are closed.)

Explicit non-goals

  • Python < 3.13. New typing features (StrEnum, TypeAliasType, PEP 604 unions, PEP 695 generics) justify the bump.
  • DHIS2 < v42. Every backport fork splits the code; no deployed users so no reason.
  • Flask / argparse / raw stdio MCP loops / hand-rolled TOML parsers; every slot has a chosen standard per the CLAUDE.md hard-requirements list in the repo root.
  • A second filter DSL layered on top of DHIS2's property:operator:value string syntax. See the dhis2-java-client comparison above for the rationale.
  • Synchronous client variant. async throughout is a hard requirement.
  • dict[str, Any] crossing module boundaries. CLAUDE.md hard rule; enforced workspace-wide as of the typing sweep (#71-#74, #76). New code that proposes dict-in-signature needs explicit justification referencing a specific HTTP-boundary carveout.
  • Public distribution — no PyPI, no tagged releases, no CHANGELOG.md. Workspace stays internal.
  • dhis2 program-rule trace / rule simulator — explicitly declined.

How this file gets updated

Greenfield voice; edits describe the current state of the plan, not its history. When a near-term item ships, delete it from the "near-term" list (don't rewrite to "already shipped"). Use the PR's own description for the history; this file is always about what's next.