We had a file called photo-styles.ts with 40+ hardcoded scene descriptions. Each one mapped to an industry. A food brand got:
Morning light through large windows, warm oak table, a ceramic bowl, scattered herbs, a linen cloth pushed aside. Steam rising, cast-iron nearby.
A fashion brand got:
Late afternoon in a raw concrete room, bare factory windows casting long shadows across the floor, one wooden chair against the wall, a jacket thrown over it.
A tech brand got a minimalist desk. A jewelry brand got dark marble and brass. We wrote a scene for every industry we could think of, and the system worked. For a while.
The identical coffee brand problem
We generated two coffee brands in the same week. One was a Thai-inspired roastery with warm, communal energy. The other was a Nordic minimalist single-origin operation. Both brands had different names, different color palettes, different voice descriptors.
Both got the same lifestyle photo: morning light through large windows, warm oak table, ceramic bowl, scattered herbs, linen cloth, steam rising.
The scene description was hardcoded per industry. Both brands matched the "coffee" keyword, so both got the same lifestyleScene string. The result was two coffee brands that looked like siblings when they should have looked like strangers.
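A minimal sketch of how a lookup like this behaves. The entry shape, the keyword lists, and the pickScene helper are assumptions made for illustration; only the lifestyleScene name and the scene text come from the description above.

```typescript
// Hypothetical reconstruction: SceneEntry, the keyword lists, and pickScene
// are assumptions, not the actual contents of photo-styles.ts.
interface SceneEntry {
  industry: string;
  keywords: string[];
  lifestyleScene: string;
}

const PHOTO_STYLES: SceneEntry[] = [
  {
    industry: "food",
    keywords: ["coffee", "restaurant", "bakery"],
    lifestyleScene:
      "Morning light through large windows, warm oak table, a ceramic bowl, scattered herbs, a linen cloth pushed aside. Steam rising, cast-iron nearby.",
  },
  {
    industry: "fashion",
    keywords: ["fashion", "apparel"],
    lifestyleScene:
      "Late afternoon in a raw concrete room, bare factory windows casting long shadows across the floor, one wooden chair against the wall, a jacket thrown over it.",
  },
];

// Returns the scene of the first entry whose keyword appears in the idea.
function pickScene(businessIdea: string): string | undefined {
  const idea = businessIdea.toLowerCase();
  const entry = PHOTO_STYLES.find((e) => e.keywords.some((k) => idea.includes(k)));
  return entry ? entry.lifestyleScene : undefined;
}

const thaiScene = pickScene("Thai-inspired coffee roastery with warm, communal energy");
const nordicScene = pickScene("Nordic minimalist single-origin coffee");

// Nothing about the individual brand survives the lookup.
console.log(thaiScene === nordicScene); // true
```

The lookup only ever sees a keyword, so any two businesses that share one collapse into the same string.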
The human bottleneck
The hardcoded descriptions had a second problem. A human (me) wrote them. Each scene represented one person's idea of what a "food brand" looks like. I defaulted to what I'd seen in Cereal magazine and Kinfolk. Every food brand got my personal aesthetic preferences, regardless of whether the founder was building a BBQ cart in Austin or a fine-dining omakase in Tokyo.
The system was supposed to create unique identities. The hardcoded scenes made them uniform.
The category mismatch
Some businesses don't fit clean categories. "Muay Thai gym that also does nutrition coaching" matched the "gym" keyword in the fitness entry and got a wellness studio aesthetic: clean white surfaces, soft natural light, someone doing a gentle stretch.
A Muay Thai gym should smell like sweat and chalk. The keyword match picked the closest category, and the scene was wrong.
The fix: visualDirection
We added one field to Claude's brand synthesis output:
{
  "visualDirection": "One sentence describing the visual direction for this brand's photography and design."
}
Claude reads the birth data, the belief statement, the business idea, the voice descriptors. From all of that, it writes a single sentence:
Warm Mediterranean light filtering through linen curtains, terracotta and aged brass, shot with nostalgic film grain.
Or:
Clean Scandinavian minimalism with warm natural light, shot on medium format film.
Or:
Raw industrial concrete and steel under harsh fluorescent, documentary grain, nothing staged.
This sentence goes into every image prompt. It's the brand's visual DNA. More specific than an archetype, more personal than a per-industry template.
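As a rough sketch of that injection step: the BrandSynthesis interface and the buildImagePrompt helper below are hypothetical names, and the real prompt assembly may order or weight things differently. Only the visualDirection field and the example sentences come from the text above.

```typescript
// Hypothetical names: BrandSynthesis and buildImagePrompt are assumptions.
// Only the visualDirection field is part of the described synthesis output.
interface BrandSynthesis {
  visualDirection: string;
}

// Appends the brand-specific direction to whatever scene text is in use.
function buildImagePrompt(lifestyleScene: string, brand: BrandSynthesis): string {
  return `${lifestyleScene.trim()} ${brand.visualDirection.trim()}`;
}

const sharedScene =
  "Morning light through large windows, warm oak table, a ceramic bowl.";

const brandA: BrandSynthesis = {
  visualDirection:
    "Warm Mediterranean light filtering through linen curtains, terracotta and aged brass, shot with nostalgic film grain.",
};
const brandB: BrandSynthesis = {
  visualDirection:
    "Clean Scandinavian minimalism with warm natural light, shot on medium format film.",
};

// Same hardcoded scene, two different prompts.
const promptA = buildImagePrompt(sharedScene, brandA);
const promptB = buildImagePrompt(sharedScene, brandB);
console.log(promptA);
console.log(promptB);
```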
The voice descriptor trick
Brand personality needed to appear in photos. You can't tell FLUX "this brand is unhurried and tactile." Those are abstract concepts.
The trick: pipe voice descriptors through the person's gesture.
The gesture feels unhurried and tactile.
vs.
The gesture feels bold and precise.
FLUX understands gestures. "Unhurried" produces slow, deliberate hand movements: pouring coffee, turning a page. "Bold" produces definitive actions: placing an object down, gripping with intent.
Two food brands with the same lifestyleScene now produce different images because one founder has "unhurried, tactile" descriptors and the other has "bold, precise." Same kitchen. Different person. Different energy. Different brand.
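One way to build that gesture sentence, as a sketch. The function names are hypothetical, and the exact template used in production may differ; the sentence pattern itself is the one shown above.

```typescript
// Hypothetical helpers: joinWithAnd and gestureSentence are illustrative names.

// Joins descriptors as "a", "a and b", or "a, b, and c".
function joinWithAnd(words: string[]): string {
  if (words.length <= 1) return words.join("");
  if (words.length === 2) return `${words[0]} and ${words[1]}`;
  return `${words.slice(0, -1).join(", ")}, and ${words[words.length - 1]}`;
}

// Turns abstract voice descriptors into a concrete instruction about the gesture.
function gestureSentence(voiceDescriptors: string[]): string {
  const cleaned = voiceDescriptors
    .map((d) => d.trim().toLowerCase())
    .filter((d) => d.length > 0);
  if (cleaned.length === 0) return ""; // nothing to say, add nothing to the prompt
  return `The gesture feels ${joinWithAnd(cleaned)}.`;
}

console.log(gestureSentence(["unhurried", "tactile"]));
// The gesture feels unhurried and tactile.
console.log(gestureSentence(["bold", "precise"]));
// The gesture feels bold and precise.
```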
The "Modern, contemporary" anchor
We still inject "Modern, contemporary interior" into every scene prompt. This phrase acts as a quality floor.
Without it, a tattoo studio gets a dingy basement from FLUX's training data. With the prefix, every space looks aspirational. You can have any theme, any material palette, any mood. But it has to look modern.
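A sketch of the prefixing step, assuming the anchor is simply prepended as its own sentence. The constant and function names are hypothetical, and the tattoo studio scene text is made up for the example.

```typescript
// The anchor phrase is from the post; the names and the "prefix as a
// separate sentence" format are assumptions.
const QUALITY_ANCHOR = "Modern, contemporary interior";

// Prepends the anchor unless the scene already starts with it.
function withQualityAnchor(scene: string): string {
  const trimmed = scene.trim();
  if (trimmed.toLowerCase().startsWith(QUALITY_ANCHOR.toLowerCase())) {
    return trimmed;
  }
  return `${QUALITY_ANCHOR}. ${trimmed}`;
}

// Illustrative scene text, not taken from the real prompts.
const tattooScene = "Tattoo studio with black walls, brass lamps, framed flash sheets.";
console.log(withQualityAnchor(tattooScene));
// Modern, contemporary interior. Tattoo studio with black walls, brass lamps, framed flash sheets.
```

Making the helper idempotent means a scene can pass through it more than once without the anchor stacking up.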
Two descriptions per product
We also split product descriptions into two fields:
- product1Desc: "Cold-pressed Argan Face Oil, Atlas Mountains harvest." Marketing copy for the brand book.
- product1PhotoDesc: "Small amber glass dropper bottle with minimal label, placed on raw linen." Photographer direction for the image prompt.
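In code, the split might look like the sketch below. The field names are the ones listed above; the ProductCopy interface, the two helper functions, and the "Product photograph:" prefix are assumptions for illustration.

```typescript
// Field names come from the list above; everything else here is hypothetical.
interface ProductCopy {
  product1Desc: string; // marketing copy, brand book only
  product1PhotoDesc: string; // photographer direction, image prompt only
}

// Only the literal, physical description reaches the image model.
function productShotPrompt(product: ProductCopy): string {
  return `Product photograph: ${product.product1PhotoDesc}`;
}

// The brand book keeps the evocative copy.
function brandBookLine(product: ProductCopy): string {
  return product.product1Desc;
}

const faceOil: ProductCopy = {
  product1Desc: "Cold-pressed Argan Face Oil, Atlas Mountains harvest.",
  product1PhotoDesc:
    "Small amber glass dropper bottle with minimal label, placed on raw linen.",
};

console.log(productShotPrompt(faceOil));
console.log(brandBookLine(faceOil));
```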
"Atlas Mountains harvest" produced a landscape of mountains instead of a product shot. FLUX renders what you describe. Give it a metaphor and it renders the metaphor.
The result
Before visualDirection: Two coffee brands, same photos. A Muay Thai gym with yoga studio lighting. Every food brand looked like Cereal magazine because I like Cereal magazine.
After: Two coffee brands, different visual worlds. The Thai-inspired roastery gets warm tropical light and terracotta. The Nordic one gets cool Scandinavian minimalism and pale birch. Both use the same archetype (heritage-craft), same camera, same photographer references. But the visualDirection sentence makes them feel like they come from different continents.
We deleted zero lines of the hardcoded scene descriptions. We added one field to the synthesis output and one sentence to every image prompt. The system went from "good images, wrong personality" to "good images, right personality" in about three hours.
Don't write prompts for categories. Write prompts for individuals. Let the AI that knows the most about the brand write the creative direction for the brand.
