03:00 CEST. The house is asleep. The laptop has been shut for hours. On a self-managed VPS in Stockholm - somewhere in a rack I will never see, in a building I have never visited - eleven small jobs start to wake in a quiet rotation.
The first one is a sentinel. It walks the unit list, checks that yesterday's twenty-odd timers all reported home, writes a single row to a SQLite table, and goes back to sleep. Two minutes later a zombie cleanup sweeps the agent socket directory, kills a Claude Code worker that should have exited an hour earlier, and leaves a note. By 03:30 the database housekeeping job has vacuumed last week's research cache out of three SQLite files and stamped a backup. By 04:00 Newton has finished an autoresearch sweep and queued five paragraphs into a Notion page. By 06:00 there is a 400-word briefing on my phone - overnight system health, anything new from Newton, the Hermes backtest summary - and the day has begun before I have opened the laptop.
None of this is dramatic. That is the whole point.
Every agentic system that lives longer than a demo grows an operational layer around itself. Some of it self-heals - a watcher notices the watcher has failed and restarts it. Some of it is scheduled intelligence - research that runs at the same hour every morning so the operator has something to read with their coffee. And some of it is housekeeping - backups, log rotation, retention, the boring work that keeps the disk from filling up at the worst possible moment. The interesting question is not "does the system have these jobs?" - every production system does. The interesting question is "what shape do they take when the workers are agents calling language models instead of stored procedures calling a warehouse?"
What follows is the operational layer as Nexus actually runs it: eleven scheduled jobs in three buckets, three design rules they all obey, and a refactor that reclaimed roughly a hundred and thirty thousand LLM-seconds a week by being honest about what an LLM is good for. The shape is recognisable to anyone who has run an overnight batch in a bank. The internals are not. The translation from one to the other is where this case study goes.
Three buckets, eleven jobs
The scheduling layer is native systemd user timers, not the orchestrator's built-in cron primitive - which did not survive a rebuild and tended to lose its job list whenever the gateway restarted. Timers survive gateway restarts cleanly; journalctl gives per-job history out of the box; and a timer firing does not depend on the gateway being responsive - the timer triggers a script that calls the gateway, and if the gateway is down the script logs the failure to nexus.db.cron_run_history and exits cleanly. One extra layer in exchange for an independent failure mode.
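A minimal sketch of the script a timer might trigger, assuming Python and the standard library only. The column layout of cron_run_history and the call_gateway callable are illustrative assumptions; the source names the table but not its schema. The point is that a gateway failure becomes a logged row and a clean exit rather than a crash.

```python
import sqlite3
import time
from typing import Callable

# Hypothetical schema: the real cron_run_history layout is not given in the text.
SCHEMA = """
CREATE TABLE IF NOT EXISTS cron_run_history (
    job TEXT NOT NULL,
    started_at REAL NOT NULL,
    duration_s REAL NOT NULL,
    exit_status TEXT NOT NULL,
    detail TEXT
)
"""


def run_job(conn: sqlite3.Connection, job: str, call_gateway: Callable[[], str]) -> str:
    """Call the gateway once and record the outcome, whatever it is."""
    conn.execute(SCHEMA)
    started = time.time()
    try:
        detail = call_gateway()
        status = "ok"
    except Exception as exc:  # gateway down, timeout, malformed response
        detail = f"{type(exc).__name__}: {exc}"
        status = "gateway_error"
    conn.execute(
        "INSERT INTO cron_run_history VALUES (?, ?, ?, ?, ?)",
        (job, started, time.time() - started, status, detail),
    )
    conn.commit()
    return status
```

Because the failure is written as a row rather than raised, a later check can tell "the timer fired and the gateway was down" apart from "the timer never fired".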
The jobs themselves fall into three groups. Self-healing watches the system. Scheduled intelligence does the work the operator would otherwise do at the start of every day. Housekeeping is the unglamorous floor-sweeping that prevents an operational layer from quietly becoming the source of its own outages.
Self-healing - three jobs that watch the system
Sentinel is the meta-job - a watcher of the watchers. It runs hourly, walks every other timer, checks that each fired within its expected window, that its cron_run_history row landed, and that no script is holding a stale lock file from a previous run. If any check fails it sends one paired alert card to the operational channel. The alert is small, the bar to fire is high, and the result is that operational chatter does not drown out the real signal when the real signal arrives.
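The window check at the heart of the sentinel can be sketched roughly as follows. Expectation, find_overdue, and alert_card are hypothetical names, and the stale-lock check is left out; this only illustrates comparing the expected schedule against the last recorded fire.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Expectation:
    job: str
    max_gap: timedelta  # longest acceptable silence between fires (assumed per job)


def find_overdue(
    expectations: List[Expectation],
    last_runs: Dict[str, datetime],
    now: datetime,
) -> List[str]:
    """Return jobs with no recorded run, or whose last run is older than max_gap."""
    overdue = []
    for exp in expectations:
        last = last_runs.get(exp.job)
        if last is None or now - last > exp.max_gap:
            overdue.append(exp.job)
    return overdue


def alert_card(overdue: List[str]) -> Optional[str]:
    """One alert for the whole sweep, or None when there is nothing to say."""
    if not overdue:
        return None
    return "sentinel: " + ", ".join(sorted(overdue)) + " missed expected window"
```

Returning None for a clean sweep keeps the bar to fire high: the caller sends nothing unless at least one job is overdue.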
Zombie cleanup walks the ACP socket directory at 03:00, finds Claude Code workers that should have exited but did not, and kills them. The bug it exists for is real and recurring - a session can hang on a malformed pipe inside a bash probe, leave its socket open, and quietly hold a slot until the gateway restarts. The job is a confession: production systems are full of small failures like this, and the right answer is usually a cheap recurring sweep, not a heroic root-cause fix.
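As a rough illustration of the sweep's first half, the sketch below picks out socket files that have not been touched for longer than a threshold. The .sock suffix, the use of modification time as the staleness signal, and the threshold itself are all assumptions; how a socket maps back to a worker process is not described in the text, so the kill step is deliberately left out.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_stale_sockets(
    socket_dir: Path,
    max_age_s: float,
    now: Optional[float] = None,
) -> List[Path]:
    """List socket files whose last modification is older than max_age_s seconds."""
    now = time.time() if now is None else now
    stale = []
    for path in sorted(socket_dir.glob("*.sock")):
        if now - path.stat().st_mtime > max_age_s:
            stale.append(path)
    return stale
```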
Weekly review fires Sunday at 03:00, reads seven days of cron history and nightly-maintenance digests, and writes a short report. Trends - a job whose duration is creeping up, a model that has drifted to its fallback four nights in five, an integration timing out - surface here before they surface anywhere else. Most weeks the report is boring. The week it is not boring is the week it has earned its keep.
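One way the "duration creeping up" check could look, as a sketch only: compare the mean duration of the later half of the week's runs against the earlier half. The function name, the 1.5x ratio, and the minimum run count are assumptions, not values from the system described here.

```python
from statistics import mean
from typing import Dict, List


def creeping_jobs(
    durations: Dict[str, List[float]],
    ratio: float = 1.5,
    min_runs: int = 4,
) -> List[str]:
    """Flag jobs whose later runs are markedly slower than their earlier runs.

    durations maps a job name to its run durations in seconds, oldest first.
    """
    flagged = []
    for job, runs in sorted(durations.items()):
        if len(runs) < min_runs:
            continue  # too little history to call it a trend
        half = len(runs) // 2
        earlier, later = mean(runs[:half]), mean(runs[half:])
        if earlier > 0 and later > ratio * earlier:
            flagged.append(job)
    return flagged
```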
Scheduled intelligence - five jobs that do the operator's work for them
Morning briefing at 06:00 is the daily catch-up over coffee. The orchestrator reads three sources - overnight agent-health rows, the last completed Newton sweep, and the most recent Hermes backtest - and writes a 400-word Telegram message in three sections. Sections with no new content are skipped, not padded. The output is short on purpose; a briefing that takes ten minutes to read is one that gets archived unread.
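The "skipped, not padded" rule is small enough to show directly. A minimal sketch, with build_briefing as a hypothetical name and sections passed as ordered (title, body) pairs:

```python
from typing import List, Tuple


def build_briefing(sections: List[Tuple[str, str]]) -> str:
    """Join the sections that have content; drop empty ones entirely."""
    parts = [
        f"{title}\n{body.strip()}"
        for title, body in sections
        if body and body.strip()
    ]
    return "\n\n".join(parts)
```

A section with nothing new leaves no heading behind, so a quiet night produces a shorter message rather than three headings followed by "nothing to report".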
Newton autoresearch runs every two hours from 08:00 to 22:00 and despatches the research swarm against a rotating set of queries - senior AI engineering roles in the Nordics, regulatory developments, market watchlists. Each completed sweep writes a full Notion page and pings the operational channel; the briefing reads the page, not the channel.
Leonardo daily charts at 08:00 generates the FX, equity, and risk-asset chart deck for the day and pages it into a date-stamped Notion database. Hermes nightly backtest at 01:00 runs the trading-intelligence agent against the previous day's market data and writes the result into the database; Hermes balance poller runs hourly during market hours, checking exchange balances and posting alerts only on movement. Five jobs, between them, replace what would otherwise be the first hour of the operator's morning.
Housekeeping - three jobs that prevent the future
Backup runs at 02:00, takes a restic snapshot of the configuration tree, agent state, and SQLite databases to a remote repository, and writes a row with the snapshot ID and size delta. Log rotation at 02:30 trims journal files and per-agent logs against a retention policy. Nightly maintenance is the broadest of the three - twenty staggered scripts across the 02:00–04:00 window covering task hygiene, database vacuums, scheduled-task health, plugin and config audit, agent health, credentials, MCP server health, Telegram delivery, upgrade-pipeline status, cost monitor, and OS-level VPS housekeeping. Each writes structured findings into a nightly_maintenance table; a cross-system report at 03:30 composes a digest and posts it. Housekeeping is dull and indispensable. Without it, the operational layer becomes the source of its own outages within a month.
Three design rules every job obeys
The eleven jobs differ in what they do; they agree on how they do it. Three rules apply to every script in the directory, and the rules are what makes the layer survivable.
Idempotency. Every job has to be safe to run twice. A timer can misfire, a manual systemctl start during debugging can collide with a scheduled run, the 02:00 backup can spill into a 02:30 window when the previous snapshot was unusually large. None of those scenarios should produce a wrong result. The discipline shows up at the script level - every write goes via a primitive using an idempotent SQL pattern, every Notion page keys off a deterministic date-stamped slug, every Telegram send carries a content hash so a re-send is a no-op. A script that does not have the property gets caught at smoke-test time.
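The three idempotency devices named above can be sketched with SQLite and the standard library. Table names, column names, and function names here are hypothetical; the sketch shows the general pattern (keyed writes, deterministic slugs, hash-guarded sends), not the system's actual primitive.

```python
import hashlib
import sqlite3
from datetime import date
from typing import Callable


def init(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS job_results ("
        "job TEXT NOT NULL, run_date TEXT NOT NULL, payload TEXT NOT NULL, "
        "PRIMARY KEY (job, run_date))"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sent_messages ("
        "content_hash TEXT PRIMARY KEY, sent_at TEXT NOT NULL)"
    )


def record_result(conn: sqlite3.Connection, job: str, run_date: str, payload: str) -> None:
    """Keyed write: a second run for the same job and date overwrites, never duplicates."""
    conn.execute(
        "INSERT OR REPLACE INTO job_results (job, run_date, payload) VALUES (?, ?, ?)",
        (job, run_date, payload),
    )
    conn.commit()


def page_slug(job: str, run_date: date) -> str:
    """Deterministic slug, so a re-run targets the same page instead of creating a new one."""
    return f"{job}-{run_date.isoformat()}"


def send_once(conn: sqlite3.Connection, text: str, send: Callable[[str], None]) -> bool:
    """Send text unless an identical message was already sent; return True if sent."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    cur = conn.execute(
        "INSERT OR IGNORE INTO sent_messages (content_hash, sent_at) "
        "VALUES (?, datetime('now'))",
        (digest,),
    )
    if cur.rowcount == 0:  # hash already present: this is a re-send
        conn.rollback()
        return False
    try:
        send(text)
    except Exception:
        conn.rollback()  # do not record a send that never happened
        raise
    conn.commit()
    return True
```

Claiming the hash before sending and rolling back on failure means a crashed send can be retried, while a successful send cannot be repeated.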
Dual-delivery. Every meaningful output ships to two destinations: the human channel and the audit channel. Operational alerts route to one Telegram bot, final outputs and briefings to another. Reports also write to Notion as full pages and to nexus.db.cron_run_history as structured rows. The pattern matters because the two consumers want different things - the operator wants a thirty-character pager line at 03:14 in the morning; the auditor wants a complete, structured, timestamped record six months later. A single channel cannot serve both well. Two channels with a shared formatter and idempotent writes can.
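A sketch of what a dual-delivery primitive might look like: one call, two renderings of the same event. The notify callable stands in for the Telegram send, and the table layout is an assumption. The thirty-character limit comes from the pager line described above.

```python
import json
import sqlite3
from typing import Callable, Dict


def init(conn: sqlite3.Connection) -> None:
    # Hypothetical layout; the real cron_run_history schema is not given in the text.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cron_run_history ("
        "job TEXT NOT NULL, started_at REAL NOT NULL, duration_s REAL NOT NULL, "
        "exit_status TEXT NOT NULL, detail TEXT)"
    )


def pager_line(job: str, status: str, limit: int = 30) -> str:
    """Human channel: short enough to read at a glance."""
    return f"{job}: {status}"[:limit]


def audit_payload(job: str, status: str, findings: Dict) -> str:
    """Audit channel: the full structured record, with stable key order."""
    return json.dumps({"job": job, "status": status, "findings": findings}, sort_keys=True)


def deliver(
    conn: sqlite3.Connection,
    notify: Callable[[str], None],
    job: str,
    status: str,
    findings: Dict,
    started_at: float,
    duration_s: float,
) -> None:
    """Ship one event to both consumers from a single call."""
    notify(pager_line(job, status))
    conn.execute(
        "INSERT INTO cron_run_history VALUES (?, ?, ?, ?, ?)",
        (job, started_at, duration_s, status, audit_payload(job, status, findings)),
    )
    conn.commit()
```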
Sentinel. Without a meta-job that watches the other jobs, the operational layer can silently fail in the exact mode it is supposed to prevent - a timer goes inactive, no row lands, no alert fires because the alert script was the one that died. The sentinel is the cheap insurance: it knows what the schedule should look like, it knows where the rows should land, and it alerts when reality and expectation diverge. The day a regulator asks "how do you know your batch ran last night?" the right answer is not "because nothing complained" - it is "because the sentinel said it did, and here is the row."
Narration, not computation
A discipline that landed late and changed the shape of everything: do not ask a language model to do what code can do. Use language models for narration, not computation.
The first versions of the scheduled-intelligence jobs were single-prompt monoliths. The morning briefing was one large prompt that asked the orchestrator to query three databases, summarise the results, and format the output. The weekly review was the same shape against a longer window. The jobs worked - but they were slow, expensive, and brittle. A model that fetches its own data, reasons about it, and formats the output in one breath does all three jobs worse than a system that separates them.
The refactor split every Tier 1 cron into three thin layers. compose_data.py queries the databases and shapes the result into a structured Python dictionary - no LLM. narrate.py takes that dictionary, makes a single call to a cheap model - at the time of writing, deepseek-v4-flash:cloud - and asks for a one-paragraph narration in a fixed register. assemble.py stitches the narration into the final output and routes it through the dual-delivery primitive. A thin shell driver, around a hundred and fifty lines, calls the three in order and writes one cron-history row.
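The three layers can be sketched in one file to show where the model call sits. The function bodies, the query, and the call_model callable are illustrative assumptions, not the contents of the real compose_data.py, narrate.py, or assemble.py; the model is injected so the sketch runs without any LLM client.

```python
import json
import sqlite3
from typing import Callable, Dict, Optional


def compose_data(conn: sqlite3.Connection) -> Dict[str, int]:
    """Layer 1: plain SQL and Python, no LLM. All arithmetic happens here."""
    runs, failed = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(exit_status != 'ok'), 0) FROM cron_run_history"
    ).fetchone()
    return {"runs": runs, "failed": failed}


def narrate(data: Dict[str, int], call_model: Callable[[str], str]) -> str:
    """Layer 2: exactly one model call, asked only to describe the given figures."""
    prompt = (
        "In one short paragraph, in a neutral register, describe these figures "
        "without computing anything new: " + json.dumps(data, sort_keys=True)
    )
    return call_model(prompt).strip()


def assemble(data: Dict[str, int], narration: Optional[str]) -> str:
    """Layer 3: stitch the narration and the deterministic figures together."""
    body = narration if narration else "(narration unavailable, figures only)"
    return f"Overnight system health\n{body}\nruns={data['runs']} failed={data['failed']}"


def run_pipeline(conn: sqlite3.Connection, call_model: Callable[[str], str]) -> str:
    data = compose_data(conn)
    try:
        narration = narrate(data, call_model)
    except Exception:
        narration = None  # the data is already computed, so only this step needs a retry
    return assemble(data, narration)
```

If the model call fails, the output still carries the correct figures, which is the practical meaning of keeping computation out of the prompt.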
The arithmetic that fell out: roughly a hundred and thirty-two thousand LLM-seconds reclaimed per week across six cron jobs. Cost down by an order of magnitude. Reliability up - because most of the work is now plain Python, the cheap LLM call is the only thing that can fail in an interesting way, and when it does the deterministic data is already on disk waiting for a retry.
What translates to a bank
Banks have run an overnight batch since before any of the regulators currently writing AI law were born. Sanctions list refreshes against the customer base. Risk position roll-ups for the morning meeting. Reconciliations between front-office systems and the general ledger. The shape of the work is familiar. The novelty is what runs inside the slot now - not a stored procedure hitting a warehouse, but an agent calling a tool calling a model. The operational disciplines transfer in a specific way.
| Nexus pattern | Translates to |
|---|---|
| Sentinel + self-healing - meta-monitor, zombie cleanup, weekly anomaly review | DORA Article 17 ICT-related incident management, together with DORA's digital operational resilience testing requirements. The supervisory expectation is not "the batch worked" - it is "you can prove the batch worked, name what would have caught it failing, and exercise that capability against synthetic failure on a calendar." Sentinel patterns become a control-plane service with named owners and tested runbooks. |
| cron_run_history with structured exit, duration, and payload per fire | DORA Article 18 ICT-related incident classification - the table is the system of record from which any incident report has to be reconstructable; AI Act Article 12 record-keeping for Annex III high-risk AI systems, for the production-decisioning jobs that sit inside this layer. |
| Scheduled intelligence - briefing, autoresearch, monitoring, balance polling | AMLR and sanctions batch cycles - overnight re-screenings, watchlist refresh, transaction-monitoring alert generation. Same shape, regulated content. The narration-not-computation discipline matters more here, not less - an AML alert generated by an LLM doing arithmetic is an AML alert no MLRO can defend. |
| Housekeeping - backup, log rotation, retention-policy maintenance | Internal-audit and regulatory-reporting windows - the unglamorous discipline that makes the rest of the layer auditable on demand and survivable across an outage. The retention policy is enforced by the housekeeping job, not by hope. |
| Dual-delivery - operational chatter to one channel, final outputs to another | Separation of operational audit and business audit trails, with different retention rules, different access controls, and different consumers. Most pilots fold the two together; they need separating from day one. |
None of this is exotic. All of it is what a regulated firm's risk function would expect to see in the control narrative for any scheduled automation, agentic or otherwise. What is new is that the workers are non-deterministic, the costs are metered in tokens not CPU-seconds, and the audit artefact has to capture the model version and the prompt at the time of the fire - not just the script version and the input parameters. The shape of the answer is recognisable. The contents have changed.
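What such an audit artefact might carry per fire can be sketched as a small record. The field names are assumptions; the idea is simply that the model identifier and the exact prompt (plus a hash for cheap comparison) are stored next to the script version.

```python
import hashlib
from dataclasses import asdict, dataclass
from typing import Dict


@dataclass(frozen=True)
class FireRecord:
    job: str
    script_version: str  # e.g. a commit hash of the driver script
    model: str           # model identifier as sent to the provider
    prompt: str          # the exact prompt text used for this fire
    prompt_sha256: str   # hash of the prompt, for cheap change detection


def record_fire(job: str, script_version: str, model: str, prompt: str) -> Dict[str, str]:
    """Build the audit row for one fire, hashing the prompt as it was sent."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return asdict(FireRecord(job, script_version, model, prompt, digest))
```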
What I would do differently at bank scale
Three things. First, the scheduler. Native systemd user timers are excellent for one VPS with a single operator who can run journalctl --user -u cron-newton-autoresearch.service at three in the morning when something looks wrong. At bank scale the scheduler has to be a proper control-plane service - Airflow, Argo Workflows, Temporal, or the firm's preferred equivalent - with role-based access controls, separated environments, audit logging independent of operator shell access, and a UI both the operations team and the auditor can work from. The sentinel pattern survives the transition; it just becomes a control-plane health endpoint rather than a script.
Second, the audit trail. The single cron_run_history table is the right shape but the wrong retention. At bank scale the operational table and the business-audit table separate cleanly: operational events retained ninety days, business-impacting events retained per the regulatory rule that governs them - typically five years or more for AML and MiFID II records, depending on jurisdiction and supervisor, and potentially longer for credit decisioning that falls under AI Act Annex III. Dual-delivery becomes dual-retention, with the routing rule explicit at write time and enforced by the housekeeping job that already exists.
Third, the narration-not-computation rule belongs in procurement. Most bank pilots of agentic stacks land with monolithic prompts that ask a frontier model to do work that ought to be cheap deterministic Python with a thin narration layer on top. The cost looks reasonable in a demo and unreasonable in production. The rule should be a procurement constraint: any production agentic workload over a defined frequency has to ship its narration layer separately from its computation layer, with the LLM call sized to the narration scope. That single discipline closes the gap between a £200,000 pilot and an £8m production line item more reliably than any vendor negotiation will.
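A sketch of making the retention rule explicit at write time. The class names and the policy mapping are hypothetical; only the ninety-day operational figure comes from the text, and business retention periods would be supplied by whichever rule actually governs the record.

```python
from datetime import date, timedelta
from typing import Dict


def expiry_date(written_on: date, retention_class: str, policy: Dict[str, int]) -> date:
    """Compute when a row may be purged, from its class and the policy in days.

    An unknown class raises KeyError, so a row cannot be written without a rule.
    """
    return written_on + timedelta(days=policy[retention_class])


def is_expired(written_on: date, retention_class: str, policy: Dict[str, int], today: date) -> bool:
    """What the housekeeping job would ask before deleting a row."""
    return today >= expiry_date(written_on, retention_class, policy)
```

For example, with policy = {"operational": 90, "business": 1825}, an operational row written on 2025-01-01 becomes purgeable on 2025-04-01, while a business row written the same day is kept for roughly five years. The 1825 figure is a placeholder, not a regulatory statement.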
Those three changes are the gap between "eleven jobs on one VPS" and "the scheduled-agent layer a bank could run alongside its existing batch." Everything else - the three buckets, the design rules, the dual-delivery, the audit row per fire, the narration-not-computation refactor - generalises cleanly. The shape is already familiar. The contents are the work.