A four-week intensive on consensus, replication, and failure — taught live by Yusuf Abara to 28 senior engineers over April 2026. Reading, code, conversation. No videos.
Most online courses are videos of someone explaining what they already know. Forge Lab is not that. For four Monday evenings in April, twenty-eight senior engineers meet live with Yusuf Abara — formerly a principal at Forecast and Cinder, lecturer at CMU — to argue about Raft, build a small replicated state machine, break it on purpose, and read the papers that started the conversation.
The first half of each session is structured: a 45-minute lecture on the week's topic, then a worked example projected live from a real codebase. The second half is open — engineers bring real problems from work, the cohort argues, Yusuf moderates. Between sessions, there's a Slack and a small amount of homework: read the papers, implement the exercise, write up your reasoning.
By the end, you'll have built a working consensus-backed key-value store from scratch, hand-tuned through three planned failure scenarios. More importantly, you'll have read deeply with twenty-seven peers, and you'll know who to text when something breaks at work.
Most recently, Yusuf was principal engineer at Forecast, where he led the team behind their global write path: a Paxos-backed replication layer serving 4.2M writes/second across 11 regions. Before that he spent six years at Cinder building their billing ledger, and four years at CMU teaching distributed computing to graduate students.
He has been paged at 2am, at 4am, and at 6am on the morning of his wedding. He has run a 41-day migration. He wrote one of the better blog posts on quorum reconfiguration, one still cited in the Raft literature. He started Forge Lab because he could not find a course at this depth that did not also try to sell him a vendor.
"By Week 3 you'll be reading the Spanner paper and arguing about clock skew at midnight. That's what I want from this."
About three hours of homework a week: 1–2 papers (with Yusuf's margin notes), one code exercise, and a 200-word write-up of what you tried. The Slack is busiest on Wednesday evenings.
Three Wednesday slots (rotating). Groups of four. Forty-five minutes. Bring something specific.
Permanent. Alumni from cohorts 01–06 (n=142) still chat in #consensus and #pager-stories. You get full access from Day 1.
If you finish the capstone (Week 4), you get a hand-numbered letterpress certificate, mailed. Not for a resume — for the wall.
Apply with a short form: who you are, what you ship, why this. We read every application personally. Decisions go out April 1; you'll hear back either way.
"Week 3 saved my career. I'd been chasing the wrong replication bug for nine weeks. Yusuf drew the diagram on a Wednesday; I shipped the fix on a Thursday."
"The reading load is real. The first week I almost dropped. By Week 4 it felt like the only place anyone was talking to me as a peer."
"I have the certificate framed above my desk. My team thinks it's a joke. It is not a joke."
"Twelve weeks after the cohort ended I rewrote our quorum reconfiguration and removed 1,840 lines of code. The reasoning came from the Wednesday office hours."
"It is the only $1,499 my company has ever paid for me that I would have paid myself."
"The Slack alone is worth the fee. A year on, I still text three people from my cohort about real incidents at 2am."
Still curious? Email Yusuf directly at [email protected] — usually a same-day reply.
You should have at least three years of production engineering experience — running services in real environments, owning an on-call rotation, debugging at least one outage. We assume Linux, TCP, basic SQL. You don't need to know Raft or Paxos coming in; you'll know both cold by the end of Week 2. Cohort 06 ranged from senior backend engineers (year 4) to a CTO (year 17).
Weeks 1–3 are live-only. The format depends on real-time debate, and we found that recordings hurt it. Week 4 (capstone) is recorded for review. If you can't commit to live attendance on all of the first three Mondays, even because of a one-off conflict, please don't apply for this cohort; we run another in October.
Full refund any time before Week 2 begins (April 13, 18:00 UTC): no questions, no awkward emails. Once Week 2 begins, the seat is yours. If something serious happens (medical, family), we'll talk; in three years of running this, we've prorated refunds for four people and refunded one in full mid-cohort.
Twenty-eight is the largest group where everyone can speak in a 3-hour session and Yusuf can remember everyone's name and the system they work on. Past 30, the open half devolves into broadcast. We tried 36 in Cohort 02, and the post-cohort survey was clear: it was too many.
You'll leave with a working consensus-backed key-value store you built yourself (in Go, but the patterns translate), with three planned-failure runbooks. It is not production-ready, on purpose — it's instructive. The judgment you'll have built about replication, partitions, and clock skew is what you'll bring to work.
Yes, your employer can pay. Reply to your acceptance email with a billing contact and a PO if you have one; we send a Stripe invoice (Net 15 by default) and a W-9 on request. About 68% of Cohort 06 was company-expensed.
Four Mondays in April, online, capped at 28. Mailed certificate optional, lifetime Slack included.