d2ql — the query + transform language¶
d2ql is a pipeline language for querying and reshaping DHIS2 data. A program reads as a source feeding a chain of stages, optionally ending in a sink:
dataElements
| where domainType = "AGGREGATE" and name ~ "ANC"
| select id, name, categoryCombo.name as combo
| transform { code: id, label: name }
| order name asc
| limit 20
>> "elements.csv"
Expressions inside where, select, order, and transform are written in
d2path, the embedded path/expression language.
This page is the reference. New to d2ql? Start with the tutorial, then come back here to look things up; the cookbook has ready-to-run recipes.
Running a program¶
d2w query eval 'dataElements | where domainType = "AGGREGATE" | select id, name | limit 20'
d2w query run report.d2ql # run a program from a file
d2w query explain 'dataElements | ...' # show what is pushed to DHIS2 vs. run locally
d2w query ast 'dataElements | ...' # print the parsed AST (offline)
The same engine is available as MCP tools (query_eval, query_explain, query_d2path).
A library of runnable, commented programs lives in
examples/d2ql/ — run any
with d2w query run examples/d2ql/<name>.d2ql, inspect with d2w query ast "$(cat <file>)", or load
from Python via parse(open(<file>).read()).
Sources¶
- A resource name — any DHIS2 metadata resource:
dataElements,indicators,organisationUnits, … An inline filter is shorthand for a leadingwhere:dataElements[domainType = "AGGREGATE"]. read("path.json")— read rows from a local JSON or NDJSON file (FHIR bundles, fixtures, the output of an earlier query).- A definition — reference a named query as a source (see Definitions).
- A scalar expression —
define Total: 1 + 2(used by definitions, not usually run directly). - A call source for aggregate data:
analytics(dx: "...", pe: "LAST_12_MONTHS", ou: "...")— rows from/api/analytics, one dict per row keyed by dimension (dx,pe,ou,value, ...). An optionalfilter: "..."arg maps to an analytics filter.dataValues(dataSet: "...", period: "...", orgUnit: "...")— raw aggregate values from/api/dataValueSets(navigatedataElement,period,orgUnit,value).
Stages¶
| Stage | Purpose |
|---|---|
where <predicate> |
Keep rows where the d2path predicate is true. |
select <expr> [as name], … |
Project columns; name with as or let the path name it. |
transform { key: <expr>, … } |
Build a new object per row (native reshaping — see below). |
order <expr> [asc\|desc], … |
Sort by one or more keys. |
limit <n> / skip <n> |
Take / drop rows. |
count |
Replace the stream with its length (a scalar result). |
group by <expr> { name: agg, … } |
Group rows by a key and reduce each group. |
fold { … } |
Collapse the whole stream into one object (FHIR Bundle, GeoJSON FeatureCollection). |
group by¶
group by <group> { total: sum(value), n: count() } groups rows by the group expression and
emits one object per group: the group key (named like a select column) plus each aggregation.
Aggregation expressions are evaluated against the group's rows, so sum(value) gathers value
across the group. Works over any source — metadata, analytics, or data values:
analytics(dx: "fbfJHSPpUQD;cYeuwXTCPkU", pe: "LAST_12_MONTHS", ou: "ImspTQPwCqd")
| where value > 1000
| group by dx { total: sum(value), periods: count() }
| order total desc
fold¶
group by/count reduce per group; fold { … } reduces the whole stream into one object —
an envelope like a FHIR Bundle or a GeoJSON FeatureCollection. The template is built once with
the entire stream in focus: $rows is the rows as a list, and select(...) / aggregate functions
see all rows. Pair it with define functions to keep the per-item shape readable:
define function observation(de): { resourceType: "Observation", status: "final",
code: { coding: [ { system: "dhis2", code: $de.id, display: $de.name } ] } }
dataElements | where domainType = "AGGREGATE"
| transform { resource: observation($this) }
| fold { resourceType: "Bundle", type: "collection", entry: $rows }
fold (like count) yields a single value, so --json / a .json sink emit the object itself,
not a one-element array.
transform¶
transform builds a new value per row — an object literal { … } or any expression that
evaluates to one, e.g. a define function call transform feature($this). Nested objects, arrays,
and computed values are all allowed. It depends on nothing FHIR-specific, but it is exactly what you
use to emit FHIR-shaped output:
dataElements
| where domainType = "AGGREGATE"
| transform {
resourceType: "Observation",
status: "final",
code: { coding: [ { system: "dhis2", code: id, display: name } ] }
}
Sinks¶
End a pipeline with >> to write the result instead of returning it. The format is inferred from
the file extension:
dataElements | select id, name >> "elements.csv" # csv
dataElements | transform { … } >> "out.json" # json
dataElements | select id, name >> "elements.ndjson" # ndjson
On the CLI, --out FILE is the equivalent of an in-program sink.
Definitions¶
A program may begin with defines, making a .d2ql file a reusable library. Reference a scalar
definition or a function parameter with the $ sigil.
define MinLevel: 3
define function isAnc(de): $de.name ~ "ANC"
define Aggregates: dataElements | where domainType = "AGGREGATE"
Aggregates
| where isAnc($this) and level >= $MinLevel
| select id, name
define NAME: <pipeline>— a named query; reference it as a source.define NAME: <expression>— a scalar value; reference it as$NAME.define function NAME(params): <expression>— a reusable function; parameters are read as$paraminside the body.$thisis the current row insidewhere/select/transform.
Pushdown — what runs where¶
d2ql does not fetch everything and filter in memory when it does not have to. The planner pushes a
leading run of where filters, then order, then paging, down to DHIS2's list endpoint (compiled
to filter=, order=, pageSize), and runs everything it cannot express — transforms, function
predicates, cross-field expressions — locally over the rows DHIS2 returns. d2w query explain
shows the split:
$ d2w query explain 'dataElements | where domainType = "AGGREGATE" | transform { code: id }'
source: dataElements (resource)
pushed down: filter[AND] domainType:eq:AGGREGATE
order (none); skip None; limit None
local stages: transform
A predicate the server cannot express (for example where name.substring(0, 3) = "ANC") simply
stays local — the result is identical, only the work moves.
One predicate is not just local but meaningless: where field = null. A missing or null field is the
empty collection, so = null matches nothing and is never pushed. Use where field.exists() /
where field.empty() to test presence and absence — see d2path.
See also¶
- d2ql tutorial — learn the language step by step.
- d2path — the expression language used inside every stage.
- Cookbook — ready-to-run recipes (FHIR, GeoJSON, reports).
- API reference:
dhis2w_ql.