Engineering, published April 28, 2026

Why Our Emails Never Worked: 4 Root Causes in One Debug Session

We shipped a complete email system: 17 React Email templates, a BullMQ queue, Resend integration, i18n in 5 languages. Tests passed. Build was clean. We went live.

41 emails failed. Every single one.

The architecture

Next.js (Vercel) → queueEmail() → Redis (BullMQ) → Worker (Railway)

Inside the worker: import template.tsx → render() → HTML → Resend → inbox

Simple. Except the worker is plain Node.js, and the templates are React components written for Next.js.

Root Cause 1: Templates didn't exist in the worker

The worker's email-queue.ts loaded templates with a dynamic import: import(`../../emails/${template}`). That path resolved to apps/worker/emails/, a directory that never existed. The templates lived in apps/web/emails/.

Nothing flagged the broken import before production. At runtime, every job threw "Email template not found."

This bug doesn't surface during development. The web app (where templates live) and the worker (where templates render) are separate services. TypeScript compiled the worker cleanly because the dynamic import path isn't type-checked at build time. The failure waits for a real job in production.

Fix: Copy templates to the worker. Add react, @react-email/components, @react-email/render as worker dependencies.
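One way to make this class of failure loud is to resolve the template path explicitly and report the exact path that was checked. This is a minimal sketch, not the worker's actual code: `resolveTemplatePath` and its parameters are hypothetical names, and it assumes each compiled template is emitted as <name>.js.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Hypothetical helper: returns the absolute path of a compiled template,
// or throws an error that names the exact path that was checked.
function resolveTemplatePath(emailsDir: string, template: string): string {
  // Reject names that could escape the templates directory.
  if (!/^[\w-]+$/.test(template)) {
    throw new Error(`Invalid template name: ${template}`);
  }
  const file = path.resolve(emailsDir, `${template}.js`);
  if (!fs.existsSync(file)) {
    throw new Error(`Email template not found: ${file}`);
  }
  return file;
}
```

Calling a check like this for every known template when the worker starts would surface a missing directory at deploy time instead of on the first real job.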

Root Cause 2: Wrong JSX transform

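Worker's tsconfig: "jsx": "react". This is the classic transform. It compiles JSX to React.createElement calls, so every file needs import React from 'react'.

Email templates: written for Next.js, where the automatic transform ("jsx": "react-jsx") applies and no React import is needed.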

Result: React is not defined on every template render.

Same source code, different compiler settings, different JSX output.

Fix: Change worker tsconfig to "jsx": "react-jsx". This required changing rootDir too, which led to Root Cause 3.
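To make the difference concrete, here is a small sketch that evaluates hand-written approximations of what each transform emits for <p>Welcome</p>. The emitted strings are simplified, `_jsx` is a stand-in for the function the automatic transform imports from react/jsx-runtime, and `evaluate` and `tryRender` are hypothetical helpers for this demo only.

```typescript
// Hand-written approximations of compiler output for <p>Welcome</p>.
const classicEmit = 'React.createElement("p", null, "Welcome")'; // "jsx": "react"
const automaticEmit = '_jsx("p", { children: "Welcome" })'; // "jsx": "react-jsx"

// Evaluates emitted code with only the given names in scope.
function evaluate(code: string, scope: Record<string, unknown>): unknown {
  const names = Object.keys(scope);
  const run = new Function(...names, `return ${code};`);
  return run(...names.map((name) => scope[name]));
}

// Stand-in for the jsx function that the automatic transform imports by itself.
const _jsx = (type: string, props: Record<string, unknown>) => ({ type, props });

// Returns the rendered element as JSON, or the error message if evaluation fails.
function tryRender(code: string, scope: Record<string, unknown>): string {
  try {
    return JSON.stringify(evaluate(code, scope));
  } catch (error) {
    return (error as Error).message;
  }
}
```

In Node, tryRender(classicEmit, {}) returns the message React is not defined, because the classic output references an identifier that the template never imported. tryRender(automaticEmit, { _jsx }) succeeds, because the automatic output only needs the runtime function the compiler wires in.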

Root Cause 3: The path that moved

Changing rootDir from "./src" to "." moved the compiled output from dist/index.js to dist/src/index.js. The Railway entry point still pointed to dist/index.js.

We tried cp dist/src/* dist/ to flatten the output. That made things worse. The copied files now sat one directory level higher, so ../../emails/ from dist/jobs/ resolved to the raw .tsx source files instead of the compiled .js in dist/emails/.

Three-way path confusion: source paths, compiled paths, and flattened paths all pointed to different directories. Two of those directories contained .tsx files that Node.js can't execute.

Fix: A 1-line shim at dist/index.js:

require("./src/index.js")

All relative paths preserved. Compiled templates load from the right directory.
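The path arithmetic behind this root cause can be checked with Node's path module. In this sketch the /app/apps/worker root is an assumed deploy location and `emailsDirFrom` is a hypothetical helper; only the relative layout comes from the description above.

```typescript
import * as path from "node:path";

// Resolves the worker's relative template import from a given directory.
// path.posix keeps the result identical on every platform.
function emailsDirFrom(compiledJobsDir: string): string {
  return path.posix.resolve(compiledJobsDir, "../../emails");
}

const root = "/app/apps/worker"; // assumed location, for illustration only

// After the rootDir change, jobs compile into dist/src/jobs/.
const afterRootDirChange = emailsDirFrom(`${root}/dist/src/jobs`);
// resolves to /app/apps/worker/dist/emails (compiled .js)

// After flattening with cp, the same file also sits in dist/jobs/.
const afterFlattening = emailsDirFrom(`${root}/dist/jobs`);
// resolves to /app/apps/worker/emails (raw .tsx)
```

The shim works because it leaves every compiled file where the compiler put it, so the first resolution is the only one that ever runs.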

Root Cause 4: Missing translations

Templates rendered. But they showed raw i18n keys: email.welcome.heading instead of "Welcome."

The i18n helper loads locale JSON from ../messages/ relative to __dirname. In the compiled worker, that resolves to dist/messages/, which didn't exist.

Fix: Copy messages/ to the worker, add cp -r messages dist/messages to the build script.
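A helper that throws instead of falling back to the raw key would have turned this into a visible failure. The sketch below is one possible shape, not the project's actual i18n helper: `loadMessages` and `translate` are hypothetical names, and the one-file-per-locale layout (<locale>.json) is an assumption.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type Messages = { [key: string]: string | Messages };

// Hypothetical loader: throws if the locale file is missing, so templates
// cannot quietly render raw keys.
function loadMessages(messagesDir: string, locale: string): Messages {
  const file = path.join(messagesDir, `${locale}.json`);
  if (!fs.existsSync(file)) {
    throw new Error(`Locale file not found: ${file}`);
  }
  return JSON.parse(fs.readFileSync(file, "utf8")) as Messages;
}

// Looks up a dotted key such as "email.welcome.heading".
function translate(messages: Messages, key: string): string {
  let node: string | Messages | undefined = messages;
  for (const part of key.split(".")) {
    node = typeof node === "object" ? node[part] : undefined;
  }
  if (typeof node !== "string") {
    throw new Error(`Missing translation for key: ${key}`);
  }
  return node;
}
```

Whether a missing key should throw or only log is a design choice; throwing in the worker at least makes the job fail visibly rather than send an email full of keys.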

Bonus: Zombie Processes

After fixing all four causes, emails still failed locally. Eight old tsx watch worker processes from previous dev sessions were still running and competing for jobs on the same Redis queue. BullMQ hands each job to whichever worker on the queue picks it up first, so the stale workers often grabbed emails before the fixed worker could, and crashed with the old React is not defined error.

Some jobs succeeded. Others failed. The pattern was random enough to waste an hour before we checked ps aux.

pkill -f "tsx watch.*worker" fixed it instantly.
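The same check can be scripted as a guard that runs before the dev worker starts. The sketch below only covers the parsing step: `findMatchingProcesses` is a hypothetical helper, and it assumes input shaped like the output of ps -A -o pid,args, with one process per line and the PID first.

```typescript
interface ProcessInfo {
  pid: number;
  command: string;
}

// Parses "pid command..." lines and returns the processes whose command
// matches the pattern. The pattern must not use the global (g) flag.
function findMatchingProcesses(psOutput: string, pattern: RegExp): ProcessInfo[] {
  const matches: ProcessInfo[] = [];
  for (const line of psOutput.split("\n")) {
    const parsed = /^\s*(\d+)\s+(.*)$/.exec(line);
    const pid = parsed?.[1];
    const command = parsed?.[2];
    // Header lines and blank lines do not start with a PID and are skipped.
    if (pid !== undefined && command !== undefined && pattern.test(command)) {
      matches.push({ pid: Number(pid), command });
    }
  }
  return matches;
}
```

Called with the pattern /tsx watch.*worker/, a non-empty result would mean stale workers are still attached to the queue.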

What we learned

"Build passes" means nothing about runtime. TypeScript compiled the worker for months. The dynamic import, the JSX transform, the i18n path: all three are runtime-only failures.

Dynamic imports across service boundaries will betray you. A path that works in apps/web/ resolves to nothing in apps/worker/. No error at compile time. No error at deploy time. Only a failed job once the import actually runs.

Test the deployed artifact. tsx in dev mode and node dist/index.js in production behave differently. Different module resolution, different JSX transform, different __dirname.

Kill your zombies. Multiple workers on the same BullMQ queue produce inconsistent results that look like intermittent bugs. Check ps aux before debugging application code.
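For the "test the deployed artifact" lesson, a post-build check could be as small as asserting that the paths the worker depends on exist in the build output. This is a sketch under assumptions: `missingArtifacts` is a hypothetical helper, and the expected list contains only the paths named in this post, which may not be the complete set.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Returns the expected paths that are absent from the build output.
function missingArtifacts(workerRoot: string, expected: string[]): string[] {
  return expected.filter((relative) => !fs.existsSync(path.join(workerRoot, relative)));
}

// Paths named in this post; adjust to the real build layout.
const expectedArtifacts = [
  "dist/index.js", // shim entry point
  "dist/src/index.js", // compiled worker entry
  "dist/emails", // compiled templates
  "dist/messages", // locale JSON
];
```

Failing the build when missingArtifacts returns a non-empty list would catch three of the four root causes before deploy; the JSX transform still needs a real render against the compiled output.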