Skip to content

OpenFn/apollo

Repository files navigation

Apollo Server

Apollo is OpenFn's knowledge, AI and data platform, providing services to support the OpenFn toolchain.

Apollo is known as the God of (among other things) truth, prophecy and oracles.

This repo contains:

  • A bunjs-based webserver
  • A number of python-based AI services
  • A number of Typescript-based data services

Documentation

This README covers running, debugging and deploying the server. Deeper docs live alongside the code:

  • Architecture — see Server Architecture below, and each service's own README for service-specific detail.
  • Contributing & service conventionsCONTRIBUTING.md explains how to add and structure a Python service (entry.py, imports, logging, code quality).
  • Services — every service has its own README with payload specs and examples. See the Services index below.
  • Instance authplatform/src/auth/README.md covers authenticating /services/* per client and managing per-client Anthropic keys.
  • Testingservices/testing/README.md documents the shared acceptance-test harness (YAML specs + LLM-as-judge); per-service guides live in each service's tests/.

Requirements

To run this server locally, you'll need the following dependencies to be installed:

  • python 3.11 (yes, 3.11 exactly, see Python Setup)
  • poetry
  • bunjs

We recommend using asdf with the python plugin installed.

Getting Started

To run the server locally, you need to install the python dependencies

poetry install

Then start the server (note that bun install is not needed, see below)

bun start

To start a hot-reloading development server which watches your typescript, run:

bun dev

To see an index of the available language services, head to localhost:3000.

Python Setup

This repo uses poetry to manage dependencies.

We use an "in-project" venv , which means a .venv folder will be created when you run poetry install.

All python is invoked through entry.py, which loads the environment properly so that relative imports work.

You can invoke entry.py directly (ie, without HTTP or any intermediate js) through bun from the root:

bun py echo --input tmp/payload.json

Bun installation

Bun does not require an installation, like npm does. You can run bun start right after cloning the repo.

Bun will then install dependencies against the global cache on your machine. This still uses the lockfile (bun.lockb).

To update a module version, run bun add <module>@<version>, which will update your lockfile.

One drawback of this is that there is no intellisense, because IDEs rely on node_modules to load d.ts files. You are welcome to run bun install to run from a node_modules. None of this affects python.

See bun's install docs for more details.

CLI

To communicate with and test the server, you can use @openfn/cli.

Use the apollo command with your service name and pass a json file:

openfn apollo echo tmp/payload.json

Pass --staging, --production or --local to call different deployments of apollo.

To default to using your local server, you can set an env var:

export OPENFN_APOLLO_DEFAULT_ENV=local

Or pass an explicit URL if you're not running on the default port:

export OPENFN_APOLLO_DEFAULT_ENV=http://localhost:6666

Output will be shown in stdout by default. Pass -o path/to/output/.json to save the output to disk.

You can get more help with:

openfn apollo help

Note that if a service returns a { files: {} } object in the payload, and you pass -o with a folder, those files will be written to disk.

API Keys & Env vars

Some services require API keys.

Rather than coding these into your JSON payloads directly, keys can be loaded from the .env file at the root. See .env.example for the full list of keys and env vars Apollo reads.

Also note that tmp dirs are untracked, so if you do want to store credentials in your json, keep it inside a tmp dir and it'll remain safe and secret.

Debugging

The server defaults to port 3000. You can test any service directly with curl to confirm Apollo is working independently of Lightning (or any other client).

For example, to trigger a workflow_chat stream:

curl -N -X POST http://localhost:3000/services/workflow_chat/stream \
    -H "Content-Type: application/json" \
    -d '{"content":"make a simple http workflow","history":[],"api_key":"<your-anthropic-api-key>"}'

The api_key field is your Anthropic API key. If ANTHROPIC_API_KEY is already set in your .env, you can omit it. In Lightning, this is configured via the ANTHROPIC_API_KEY environment variable and passed through to Apollo on each request.

The -N flag disables buffering so SSE events appear as they arrive. You should see a stream of event: log lines followed by event: complete. An event: error response means the issue is inside Apollo.

If the stream returns successfully here but Lightning isn't receiving it, the issue is on the Lightning side -- check that APOLLO_ENDPOINT=http://localhost:3000 is set correctly in Lightning's environment (no trailing slash).

To check API key connectivity (Anthropic, OpenAI, Pinecone), hit the status service:

curl http://localhost:3000/services/status

Troubleshooting

If you get errors like poetry: command not found (error code 127), and poetry is set up on your machine, you may need to add these env vars to your .bashrc (or whatever you use):

export BUN_INSTALL="$HOME/.bun"
export PATH="$BUN_INSTALL/bin:$PATH"

Server Architecture

The Apollo server uses bunjs with the Elysia framework.

It is a very lightweight server. By default it includes no authentication, but instance auth can be enabled (see Database below).

Python services are hosted at /services/<name>. Each service expects a POST request with a JSON body, and will return JSON.

There is very little standard for formality in the JSON structures to date. The server may soon establish some conventions for better interopability with the CLI.

Python scripts are invoked through a child process. Each call to a service runs in its own context.

Python modules are pretty free-form but must adhere to a minimal structure. See the Contribution Guide for details.

Services

Each service lives in services/<name>/ and is auto-mounted by service discovery. Every mounted service has its own README with payload specs and examples — start there for anything service-specific.

Chat & orchestration

Service What it does
global_chat Orchestrator and single entry point for OpenFn AI chat; routes to subagents or escalates to a planner. Also see its PAYLOAD_SPEC.md.
job_chat AI chat for OpenFn job code, with a code-suggestions/auto-patch mode.
workflow_chat Generates and edits OpenFn workflow YAML, preserving job code and IDs.
doc_agent_chat Agentic chat over a project's uploaded documents (RAG).

Docs, search & RAG

Service What it does
search_docsite Semantic search over the OpenFn docs (docsite Pinecone index).
embed_docsite Downloads and indexes the OpenFn docs into the docsite index.
doc_agent_upload Fetches and indexes project documents into the doc-agent index.

Adaptors

Service What it does
load_adaptor_docs Parses adaptor function docs into Postgres.
search_adaptor_docs Queries adaptor docs back out of Postgres by version.
latest_adaptors Fetches the latest adaptor versions from the OpenFn repo.
adaptor_apis TypeScript service: produces a JSON schema of an adaptor's API.

Medical vocab & embeddings

Service What it does
vocab_mapper Maps medical vocabularies (LOINC/SNOMED) against the apollo-mappings index. (No README yet.)
embeddings Vector-store wrapper used by the vocab services.
embed_loinc_dataset Embeds the LOINC dataset into apollo-mappings.
embed_snomed_dataset Embeds the SNOMED dataset into apollo-mappings.
embeddings_demo Standalone embeddings demo (Zilliz).

Utilities & support

Service What it does
status Health check: validates Anthropic, OpenAI and Pinecone keys.
echo Test service that returns its input; useful for verifying the pipeline.
auth Instance-auth hook + provisioning (server layer, under platform/, not a mounted service). See Database.
testing Shared acceptance-test harness (not a mounted service).

Database

Apollo uses Postgres for two tables: adaptor_function_docs (parsed adaptor docs, used by load_adaptor_docs / search_adaptor_docs) and lightning_clients (the instance-auth allow-list).

There is no migration framework. The schema is just two schema.sql files you apply with psql, both written with CREATE TABLE IF NOT EXISTS so re-running them is safe:

  • services/load_adaptor_docs/schema.sqladaptor_function_docs. This table is also created lazily the first time load_adaptor_docs runs, so applying it by hand is optional.
  • lightning_clients — created and kept current by the migration runner (platform/src/db/migrate.ts, migrations under platform/migrations/). It is applied automatically at Apollo startup when POSTGRES_URL is set; no manual psql step is needed.

First, make sure you've configured your desired POSTGRES_URL in your .env file.

Create the DB

Create a Postgres DB matching your POSTGRES_URL from the .env file

To reset the DB

set -a; . ./.env; set +a; psql "$POSTGRES_URL" -c "DROP TABLE IF EXISTS lightning_clients, adaptor_function_docs CASCADE;"

Run the migrations

lightning_clients is migrated automatically at Apollo startup (see platform/src/db/migrate.ts). Apply the adaptor-docs schema separately:

set -a; . ./.env; set +a; psql "$POSTGRES_URL" -f services/load_adaptor_docs/schema.sql

Instance authentication (optional)

/services/* can be authenticated so that only known clients (e.g. specific Lightning instances) may call it, with Apollo using each client's own Anthropic API key for that client's requests.

  • It is transparent and backward compatible (map-if-known-else-forward): the auth hook is always active but only swaps in a key when it recognises the caller. Clients are looked up in the lightning_clients table via POSTGRES_URL; if that table can't be reached, known-client swaps simply don't resolve and every caller degrades to the forward path (it does not blanket-reject).
  • The credential is the api_key the caller already sends in the request body — there is no bearer token, no Authorization header, and no Lightning-side change. Apollo stores only a SHA-256 hash of it.
  • On a match, the inbound api_key is treated purely as a credential and is never forwarded to the LLM: it is replaced with the client's stored Anthropic key (so LLM usage bills to that client), or stripped — falling back to the global ANTHROPIC_API_KEY — if the client has no stored key.
  • An unrecognised key is forwarded unchanged only if it looks like an Anthropic key (prefix sk-ant-) — this is the bring-your-own-key path. An unrecognised key that is not sk-ant--shaped is a likely Lightning credential, so it is rejected (401) rather than forwarded, which would leak it to the LLM. A request with no api_key falls back to the global key.
  • Health/root endpoints (/livez, /status, /) are outside /services/* and never subject to the auth hook. Internal Apollo-to-Apollo apollo() calls are exempt via a per-process internal token (APOLLO_INTERNAL_TOKEN), not by network position.

To enable it and provision clients, see platform/src/auth/.

Deploying: pin APOLLO_INTERNAL_TOKEN in production

APOLLO_INTERNAL_TOKEN is the mechanism that lets internal apollo() self-calls through the auth hook: a self-call carries it in the x-apollo-internal header and the hook matches it. Because the global ANTHROPIC_API_KEY is dev-only, a token that fails to match is a dead end (a 401), not a soft fallback to the global key — so the match has to work in every topology.

  • Always set APOLLO_INTERNAL_TOKEN to a shared value across the deployment in production. When it is set, the per-process minting path never runs.
  • The per-process random mint is a dev-only convenience. Apollo assumes one Bun process per host; the apollo() self-call relies on loopback calls landing on the same process, so a minted token only works single-process-per-host.
  • If reusePort clustering is ever enabled, a shared APOLLO_INTERNAL_TOKEN is required, not optional: a self-call can otherwise be routed to a sibling process that minted a different token and will 401. Startup logs the token's provenance (env vs minted) and warns when this dangerous combination is detected; a mismatch at the hook is logged as a distinct, greppable warning.

Websockets

Every service can receive connections in one of two ways:

  • HTTP POST method
  • Websocket connection

The same URL is used for both connections, clients must request upgrade to a websocket.

Websocket connections will receive a live log stream.

Websockets use the following events:

start: sent by the client with a JSON payload in the data key.

complete: sent by the server when the python script has completed. The result is a JSON payload in the data key.

log: sent by the server whenever the python process logs a line through a logger object.

Note that print() statements do not get sent out to the web socket, as these are intended for local debugging. Only logs from a logger object are diverted.

Docker

To build the docker image:

docker build .  -t openfn-apollo

To run it on port 3000

docker run -p 3000:3000 openfn-apollo

Contributing

See the Contribution Guide for more details about how and where to contribute to the Apollo platform.

Release

New releases are assembled as Docker images whenever a version tag of the form @openfn/apollo@x.y.x is pushed to GitHub.

This tag is automatically generated upon merging to main.

Github's main should represent the latest production version of apollo. Ideally, releases should be assembled on a branch - usually release/next or release/1.2.3. But this is not required - releases can be cut straight from a fix or feature branch, or even from main.

To release a new apollo version:

  • Checkout the branch that contains the release
  • Run bun changeset version
  • (if there are no changesets, you can either run bun changeset to create one, or manually bump package.json and update changelog.md)
  • Sanity check the new version number and changelog updates, just to be sure there's no funny stuff.
  • Commit changes and push
  • When the PR is merged to main, a new tag is generated and a new Docker image is built

About

No description, website, or topics provided.

Resources

License

Contributing

Stars

Watchers

Forks

Packages

 
 
 

Contributors