Language semantics¶
This page is the precise reference for how d2ql and d2path behave — the rules that the tutorial and reference lean on. Read it when a program does something you didn't expect.
The two layers¶
- d2ql is the pipeline layer: a source,
|-separated stages, an optional>>sink, and top-leveldefines. - d2path is the expression layer used inside stages (
where,select,transform,order,group by,fold). Every value in d2path is a collection (a list); see d2path → Collection semantics.
The | character only ever separates pipeline stages. Collection union is the union() function,
never |, so the two layers never collide on it.
Focus modes¶
A d2path expression always evaluates against a focus. Which focus is in play depends on the stage — there are three modes:
| Mode | Where | Focus is… | Variables |
|---|---|---|---|
| Row | where, select, transform, order |
one row at a time | $this (the row), $index (its position) |
| Group | the { … } of group by |
the rows in one bucket | aggregations (sum(value)) run over the bucket |
| Whole-stream | the { … } of fold |
all rows at once | $rows (the whole stream) |
So value means different things by mode: a field of the current row (row mode), the set of values
in the bucket (sum(value) in group mode), and is not meaningful bare in fold (use $rows).
Focus variables¶
| Variable | Resolves to | Valid in |
|---|---|---|
$this |
the current row (row mode) or current item (inside a collection method like .where(…)) |
row-mode stages, method bodies, and a define function body (a helper sees the caller's $this/$index) |
$index |
the current row's zero-based position | row-mode stages. In order comparators it is fixed to 0 (position is meaningless while sorting) |
$rows |
the entire row stream | fold only |
$Name |
a scalar define's value (define Min: 3 → $Min) |
anywhere; resolves scalar defines only, not query defines |
$param |
a define function's argument |
that function's body |
Sources and name resolution¶
A source is one of: a resource (a bare identifier like dataElements), a named query (a
defined pipeline used by name), a call source (analytics(…) / dataValues(…)), a file
(read("…")), or a bare expression.
Resolution priority for a bare identifier is: a bound resource first, then a define. So if
the instance exposes a dataElements resource, a define dataElements: … would be shadowed — name
your defines distinctly. An identifier that matches neither is an error.
name(...) — two call forms¶
name(...) is disambiguated by its arguments, not a reserved list:
- Named arguments (
key: value) → a call source:analytics(dx: "…", pe: "…", ou: "…"),dataValues(dataSet: "…", period: "…", orgUnit: "…"). - Positional arguments or none → a d2path function call (an expression source):
today(),iif(active, 1, 0), a userdefine function.
This is why define Today: today() is a scalar define (an expression), not a call source.
define — two kinds¶
define NAME: <body> is a query define when the body is a pipeline (define A: dataElements |
where …) and a scalar define when the body is an expression (define Min: 3, define Today:
today()). The parser decides by the body's first tokens: a bare identifier / named-arg call /
read(...) begins a pipeline source; anything that continues into an expression (a member ., an
operator, an infix keyword, or a positional call) is a scalar expression. Use a query define as a
source; reference a scalar define as $NAME.
Definition rules. Names must be unique — a duplicate define/define function name is rejected,
not silently overwritten. A define function may not reuse a built-in function name (upper, sum,
…), since the built-in would shadow it, and its parameters must be distinct (f(x, x) is rejected).
A definition or function that references itself, directly or through a cycle (define A: B /
define B: A, or define function f(x): f($x)), is rejected as recursive rather than recursing until
it crashes. A define function body sees the caller's $this/$index, so helpers can read the
current row without taking it as a parameter.
select vs transform¶
Both project each row, but differ in output shape and naming:
select e1, e2 as alias, …produces a flat object. Each column is named by itsasalias, or derived from the expression: a bare name or a member uses the last segment (categoryCombo.name→name), and anything else becomescolumn1,column2, … If two columns derive the same name the program is rejected — add an alias. Useselectfor tabular output.transform { … }produces an arbitrary object you spell out in full — nested objects, arrays, computed fields. Keys are exactly what you write. Usetransformto build foreign shapes (FHIR, GeoJSON) or anything non-tabular. A duplicate key in any object literal ({ id: id, id: name }) is rejected at parse time, so a repeated key never silently overwrites.
group by key { … } names the key column with the same derivation rule (group by
categoryCombo.name → key column name; a computed key → column1); alias by grouping on a
transformed field if you need a specific name. The key name must not collide with an aggregation
name (the program is rejected if it does).
Operators¶
Precedence, lowest to highest:
| Tier | Operators |
|---|---|
| implication | implies |
| or | or, xor |
| and | and |
| equality / match | =, !=, ~, !~, like |
| relational | <, <=, >, >=, in, contains, is |
| additive | +, - |
| multiplicative | *, /, div (integer divide), mod |
| unary | prefix -, + |
| postfix | .member, [index], ["key"], .method(...) |
Notes:
likeand~are the same operator —likeis the readable spelling, normalised to~in the AST. There is no symbolic operator fornot; use thenot()function or!=/!~.containsis both an infix operator (tags contains "x") and a method (name.contains("x")).impliesis right-associative:a implies b implies cisa implies (b implies c). Every other binary operator is left-associative.- Comparisons are existential over collections (see d2path):
items.qty > 2is true if any collectedqtyexceeds 2.
String matching¶
like / ~ is case-insensitive substring matching — not SQL LIKE (no %/_ wildcards) and
not a regular expression. For full regex use the matches(regex)
function. For prefix/suffix use startsWith / endsWith.
null, missing, and presence¶
A missing field and an explicit JSON null both evaluate to the empty collection, so field = null
matches nothing. Test presence with field.exists() and absence with field.empty() — see
d2path → Presence and absence.
[] is overloaded by position¶
- At a source,
resource[predicate]is an inline filter:dataElements[domainType = "AGGREGATE"]is shorthand fordataElements | where domainType = "AGGREGATE". - In an expression,
collection[n]is integer indexing (coding[0]) andnode["key"]is member access by key (see below).
Field names that aren't identifiers¶
Member access (a.b) accepts identifiers and reserved words only. To reach a field whose name has
hyphens, spaces, or a leading digit (e.g. a FHIR extension key), use a string subscript — [...]
is integer indexing when given a number and member-access-by-key when given a string:
extension["us-core-race"] # nested awkward key
items["bad name"].v # then keep navigating
$this["code-x"] # a top-level awkward key on the current row
Pushdown — the equivalence it does and doesn't promise¶
d2ql pushes a leading run of where filters, then order, then paging, to DHIS2's list endpoint and
runs the rest locally; d2w query explain shows the split. The operators are mapped so a pushed
filter selects the same rows as the same filter evaluated locally: =→eq, !=→ne,
~→ilike (both case-insensitive substring), </<=/>/>=→lt/le/gt/ge, in→in. An
audit against the demo server confirmed pushed and local agree for literal values across these
operators (including != over rows where the field is absent).
Three corners are handled or called out explicitly:
~with%or_: DHIS2iliketreats these as SQL wildcards, but d2path~is a literal case-insensitive substring. A~value containing%/_is therefore kept local (not pushed) so the result matches the literal meaning.explainshows it as a localwhere.- Server-side value validation: DHIS2 validates a pushed filter's value, so an invalid value
(e.g. an enum in the wrong case like
domainType = "aggregate") raises a400where local evaluation would simply return no rows. Use the exact value DHIS2 expects. - Date / datetime comparisons: when pushed, DHIS2's date semantics apply (correct). The local
fallback compares values as-is, and a date/datetime field does not compare against a bare string
literal — so a date comparison that is not pushed can under-match. Pin date handling is tracked
with date-literal support (
@2026-06-23).
explain always tells you exactly what ran where.
Libraries¶
A .d2ql file is a self-contained library of defines and a terminal pipeline; run a specific
definition with --define <name>. There is no cross-file import/export or versioning — a
"library" is one file, shared by copying or running it directly.
Every rule on this page is asserted in packages/dhis2w-ql/tests/test_semantics.py, so the spec and
the implementation cannot drift apart silently.