Pittsburgh · GMT−4 · he / him · previously Forecast · Linnea · Halcyon

Marcus Tobin. Distributed systems engineer, consulting since 2024.

I work with high-performance teams on database internals, query planning, and concurrency. Available for short engagements — usually 4 to 12 weeks, hands on keyboard, embedded with one team.

resume.pdf · 2 pages · 84 kb
Areas
Query planning, Concurrency, Postgres internals, CockroachDB, Rust · async, Tokio, CRDTs, Geo-spatial · k-d trees, Distributed tracing, SRE · incident review, Load & capacity modeling, Bevy ECS
Won't help with
Frontend, Hiring 5 mid-level engineers, Greenfield monoliths, Pure ML systems, Vibes
§ Worked with · 2014–2026
Forecast · Linnea · Halcyon · Stratos Protocol · Atrium · Postgres · Tokio · Bevy · Quartermast · Northwind

Selected work

Five roles. Each row links to a longer write-up where it exists; otherwise the bullet points are the whole story.

  1. 2024 — Now
    2.2 yr

    Forecast

    · Founding engineer

    Built the realtime query engine for a financial scenario-planning product. Three engineers; I led data & storage.

    • Designed a column-store + incremental-view planner; P95 1.18s on the median dashboard at 12,408 scenarios.
    • Wrote the Rust async runtime adapter on top of Tokio that the team still uses (1,402 LOC, 0 unsafe).
    • Shipped to the first paying customer 11 weeks after I joined; ARR crossed $1.4M at the close of Q1.
    Rust · Tokio · column-store · Postgres
  2. 2021 — 2024
    3.0 yr

    Linnea

    · Staff engineer

    Led a 14-month Postgres-to-Cockroach migration for the analytics tier. Zero customer-visible downtime.

    • Cut warehouse cost 42% versus the rolling 6-month baseline by replanning hot dashboards.
    • Authored Linnea's internal "on-call playbook for query slowness"; still mandatory reading for staff+ candidates.
    • Drove the SLO program from idea to operating discipline; 0.0064% error budget burn on the read path through 2023.
    Postgres · CockroachDB · SLO · migration
  3. 2018 — 2021
    3.0 yr

    Halcyon

    · Senior engineer

    Owned the distributed tracing platform serving 412 internal services.

    • Reduced ingest cost-per-span to $0.0064 while doubling retention to 14 days.
    • Open-sourced our sampling library (halcyon-rs/sample); 2,184 GitHub stars, used in production at four named companies.
    • Mentored five engineers from mid to senior — three are now staff-level elsewhere.
    OpenTelemetry · Rust · Kafka
  4. 2016 — 2018
    2.0 yr

    Stratos Protocol

    · Engineer

    Designed and shipped a custom k-d tree index for geo queries.

    • Replaced an off-the-shelf R-tree, dropping P99 nearest-neighbor latency from 84ms → 11ms on the production fleet.
    • Patent: US 11,408,124 — Coarse-to-fine geospatial bucketing for moving point sets. (Assignee not me; thoughts on patents available on request.)
    k-d tree · C++ · geo
  5. 2014 — 2016
    2.0 yr

    Atrium

    · Engineer

    First job out of CMU. Wrote a lot of CRUD, shipped the first production GraphQL gateway, learned the trade.

    • Built the company's first read-replica failover system — still in service eleven years later.
    • Took the on-call beeper, on average, every fifth weekend. Learned what production actually feels like.
    Go · GraphQL · first job
Bench · forecast
A flame profile from the realtime planner running 50k scenarios. The expensive frame is exactly the one you would expect.
[Flame graph · fn main · 50,000 scenarios · 1.18 s P95 · cargo bench plan_50k. Frames: plan_50k 1,180 ms; build_plan_tree 850 ms; cost_model_eval 496 ms; stat_estimate 330 ms; hash_join 284 ms; probe 260 ms; spill_to_disk 210 ms; io 210 ms; alloc 210 ms; card_est 166 ms; flush 118 ms; sort 70 ms.]

Talks, writing & open source

Conference talks I have recordings of, essays I still stand behind, and the OSS work that survived its own pull requests.

Talks
  1. 2025 · 10
    SREcon · "Six P99s that lied to me, and what I did about it"
    A 28-minute talk on misleading latency histograms, with the corrected dashboards open in a second window. Slides + recording linked.
  2. 2024 · 09
    PgCon · "Custom indexes you can actually keep in production"
    On building and operating GiST and SP-GiST opclasses, drawing on the Stratos k-d tree work and three regrets I had at Linnea.
  3. 2023 · 09
    Strange Loop · "Concurrency for people who hate Rust"
    A reluctant defense of async/await for non-systems engineers. The most viewed talk I've given (218,408 views and counting).
  4. 2022 · 05
    PostgresOpen Silicon Valley · "Reading EXPLAIN like a novel"
    A tutorial on plan trees, joins, and the four lies the planner tells you on a Tuesday afternoon.
Writing
  1. 2026 · 02
    Your error budget is a budget
    On treating SLO error budgets like real money, with a worked example from Forecast's first quarter.
  2. 2025 · 04
    The wrong P95 is worse than no P95
    On weighted vs. simple histograms; the talk above is the short version of this essay.
  3. 2024 · 11
    Notes on running Cockroach on bare metal
    Capacity tuning, NUMA, RAID layout: what survived from the Linnea migration, and what I'd do differently today.
  4. 2023 · 08
    Don't hire me to do this
    A short essay on when consulting is the wrong answer. I republished it when I went independent.
Open source
postgres/postgres 14 commits
Mostly planner stat-estimate corrections, plus one selectivity fix that landed in v16.
core · planner
tokio-rs/tokio 8 commits
Backpressure work on the unbounded channel, plus a small docs cleanup that survived review.
runtime · channels
bevyengine/bevy 19 commits
ECS scheduler — parallel system ordering and a stage-merge optimization in v0.13.
ECS · scheduler
halcyon-rs/sample maintainer
Distributed tracing sampler I wrote at Halcyon. 2,184 stars, MIT.
tracing · Rust

About

I have written code professionally for about twelve years. I am picky about the work I take on because most companies don't actually need someone like me: they need to hire five mid-level engineers and run a real on-call rotation. If you have a genuinely hard distributed-systems problem and want someone with deep query-engine and concurrency experience for four to twelve weeks, this is the page to start from.

I prefer being one of three on an engagement, not the lone outside expert. I don't ghost-write code or attend status meetings. I will tell you when the answer is to hire, not to consult.

Outside of work I cycle a lot, repair old typewriters, and run a small reading group for engineers on database papers from the 1980s. All of this is on Tuesday nights.

§ Education
  • 2010 — 2014
    Carnegie Mellon · BS Computer Science
    honors thesis · query optimization
  • 2014
    Postgres summer of code · planner
    mentor: Peter Geoghegan
§ Operating principles
  • → Write down the question before the code.
  • → A profiler beats a hot take.
  • → If it doesn't fit on a 2-page resume, it didn't ship.
  • → Two engineers, not five, beats five not three.
§ Now
dated · 2026.04.18 · Pittsburgh

Booked through Q3 2026. Available for a 6-week engagement starting October 6.

Current rotation: 3 days a week on the Forecast planner, 2 days reviewing query plans for a Series C team in Berlin (NDA). Open to a fourth project starting Q4; please write before September.

§ Contact — preferred
marcus@tobin.engineering

Write a paragraph. Tell me the team size, the problem in plain prose, the deadline, and the budget range. I read every email and reply within two business days; if I'm not the right fit, I'll usually suggest someone who is.
