Bulk Data Export ($export)¶

Overview¶

The FHIR server implements the FHIR Bulk Data Access IG for asynchronous export of large datasets in NDJSON format.

FHIR Specification¶

Export Types¶

System Export¶

Export all resources from the server.

GET /$export
Accept: application/fhir+ndjson
Prefer: respond-async

Patient Export¶

Export Patient resources and all related clinical data.

GET /Patient/$export
Accept: application/fhir+ndjson
Prefer: respond-async

Group Export¶

Export data for all patients in a specified Group.

GET /Group/{id}/$export
Accept: application/fhir+ndjson
Prefer: respond-async

Required Headers¶

Header	Value	Description
`Accept`	`application/fhir+ndjson`	Request NDJSON format
`Prefer`	`respond-async`	Required for async processing

Parameters¶

Parameter	Type	Description
`_type`	string	Comma-separated list of resource types to export
`_since`	instant	Only export resources updated after this datetime

Workflow¶

1. Initiate Export¶

curl -X GET "http://localhost:8080/baseR4/\$export" \
  -H "Accept: application/fhir+ndjson" \
  -H "Prefer: respond-async"

Response (202 Accepted):

Content-Location: http://localhost:8080/baseR4/bulk-status/abc123-uuid

2. Poll for Status¶

curl http://localhost:8080/baseR4/bulk-status/abc123-uuid

In Progress (202 Accepted):

X-Progress: 50%
Retry-After: 1

Complete (200 OK):

{
  "transactionTime": "2024-12-17T10:00:00Z",
  "request": "http://localhost:8080/baseR4/$export",
  "requiresAccessToken": false,
  "output": [
    {
      "type": "Patient",
      "url": "http://localhost:8080/baseR4/bulk-output/abc123-uuid/Patient.ndjson",
      "count": 100
    },
    {
      "type": "Observation",
      "url": "http://localhost:8080/baseR4/bulk-output/abc123-uuid/Observation.ndjson",
      "count": 500
    }
  ],
  "error": []
}

3. Download Output Files¶

curl http://localhost:8080/baseR4/bulk-output/abc123-uuid/Patient.ndjson \
  -o patients.ndjson

Returns NDJSON (one JSON object per line):

{"resourceType":"Patient","id":"p1","name":[{"family":"Smith"}]}
{"resourceType":"Patient","id":"p2","name":[{"family":"Doe"}]}
{"resourceType":"Patient","id":"p3","name":[{"family":"Johnson"}]}

4. Delete Job (Optional)¶

curl -X DELETE http://localhost:8080/baseR4/bulk-status/abc123-uuid

Returns 204 No Content on success.

Examples¶

Export All Patients¶

curl -X GET "http://localhost:8080/baseR4/Patient/\$export" \
  -H "Prefer: respond-async"

Export Specific Resource Types¶

curl -X GET "http://localhost:8080/baseR4/\$export?_type=Patient,Condition,Observation" \
  -H "Prefer: respond-async"

Export Resources Updated Since Date¶

curl -X GET "http://localhost:8080/baseR4/\$export?_since=2024-12-01T00:00:00Z" \
  -H "Prefer: respond-async"

Export Group Members¶

curl -X GET "http://localhost:8080/baseR4/Group/diabetes-cohort/\$export" \
  -H "Prefer: respond-async"

Full Export Workflow Script¶

#!/bin/bash
BASE_URL="http://localhost:8080/baseR4"

# Start export
RESPONSE=$(curl -sI -X GET "$BASE_URL/\$export?_type=Patient,Observation" \
  -H "Prefer: respond-async")

# Extract job URL
STATUS_URL=$(echo "$RESPONSE" | grep -i "Content-Location" | cut -d' ' -f2 | tr -d '\r')
echo "Job started: $STATUS_URL"

# Poll until complete
while true; do
  STATUS=$(curl -s -o /dev/null -w "%{http_code}" "$STATUS_URL")
  if [ "$STATUS" = "200" ]; then
    echo "Export complete!"
    break
  elif [ "$STATUS" = "202" ]; then
    echo "In progress..."
    sleep 1
  else
    echo "Error: HTTP $STATUS"
    exit 1
  fi
done

# Get manifest and download files
MANIFEST=$(curl -s "$STATUS_URL")
echo "$MANIFEST" | jq -r '.output[].url' | while read URL; do
  FILENAME=$(basename "$URL")
  echo "Downloading $FILENAME..."
  curl -s "$URL" -o "$FILENAME"
done

Default Resource Types¶

Patient Export Types¶

Resource Type
Patient
Observation
Condition
Encounter
MedicationRequest
Procedure
DiagnosticReport
AllergyIntolerance
Immunization
CarePlan
Goal
ServiceRequest
DocumentReference

System Export Types¶

Includes all patient export types plus:

Resource Type
Practitioner
Organization
Medication
Measure
MeasureReport
Group

Response Status Codes¶

Status	Endpoint	Meaning
202	`$export`	Export job started
202	`bulk-status`	Job in progress
200	`bulk-status`	Job complete (returns manifest)
200	`bulk-output`	File download
400	`$export`	Missing Prefer header or invalid parameters
404	`bulk-status`	Job not found
404	`bulk-output`	Job or file not found
422	`Group/$export`	Group has no patient members
500	`bulk-status`	Job failed

NDJSON Format¶

NDJSON (Newline Delimited JSON) contains one JSON object per line:

{"resourceType":"Patient","id":"1","name":[{"family":"Smith"}]}
{"resourceType":"Patient","id":"2","name":[{"family":"Doe"}]}

Benefits: - Streamable - process line by line without loading entire file - Memory efficient for large datasets - Standard format for bulk data interchange

Group Export Filtering¶

When exporting via Group, only resources related to the patients in the group are included:

Patient resources matching group member IDs
Resources with subject or patient references to those patients

{
  "resourceType": "Group",
  "type": "person",
  "actual": true,
  "member": [
    {"entity": {"reference": "Patient/p1"}},
    {"entity": {"reference": "Patient/p2"}}
  ]
}

Error Handling¶

Missing Prefer Header¶

curl http://localhost:8080/baseR4/\$export

Response (400):

{
  "resourceType": "OperationOutcome",
  "issue": [{
    "severity": "error",
    "code": "required",
    "diagnostics": "Bulk export requires 'Prefer: respond-async' header"
  }]
}

Invalid _since Format¶

curl "http://localhost:8080/baseR4/\$export?_since=invalid" \
  -H "Prefer: respond-async"

Response (400):

{
  "resourceType": "OperationOutcome",
  "issue": [{
    "severity": "error",
    "code": "invalid",
    "diagnostics": "Invalid _since datetime: invalid"
  }]
}

Job Failed¶

curl http://localhost:8080/baseR4/bulk-status/failed-job-id

Response (500):

{
  "resourceType": "OperationOutcome",
  "issue": [{
    "severity": "error",
    "code": "exception",
    "diagnostics": "Export job failed: [error details]"
  }]
}

Python API¶

from fhirkit.server.api.bulk import (
    create_export_job,
    run_export,
    get_export_job,
    delete_export_job,
    resources_to_ndjson,
    PATIENT_EXPORT_TYPES,
    ALL_EXPORT_TYPES,
)

# Create and run export
job = create_export_job(
    resource_types=["Patient", "Observation"],
    patient_ids=["p1", "p2"],  # Optional filter
    since=datetime(2024, 1, 1),  # Optional since filter
)

await run_export(job, store)

# Check status
job = get_export_job(job.id)
if job.status == "complete":
    for resource_type, resources in job.output_files.items():
        ndjson = resources_to_ndjson(resources)
        print(f"{resource_type}: {len(resources)} resources")

# Cleanup
delete_export_job(job.id)

Notes¶

Export jobs are stored in memory and cleared on server restart
The server processes exports synchronously (fast for in-memory store)
For large exports, consider using pagination with _since to export incrementally
Binary resources are included as base64-encoded NDJSON entries