How gpt-image-2 Creates a Book Illustration

You click “generate” and thirty seconds later, an illustration appears. What happens in those thirty seconds is more complicated - and more interesting - than it looks. Edwin, the data engineer who built the pipeline at MyOwnChildbook, explains it step by step. Not marketing copy - a technical explanation for anyone who wants to understand what is actually happening.

What gpt-image-2 is

gpt-image-2 is OpenAI’s most recent image generation model, released in 2025. Unlike earlier generations of models, it combines GPT-level text comprehension with a diffusion-based image decoder. In practice, this means the model can follow nuanced text instructions far more reliably than its predecessors. “An illustration of a girl with red curly hair, a yellow raincoat, and a brown dog, standing in a rain shower on a cobbled street” - that works. Not always perfectly, but significantly more reliably than DALL-E 2 or Stable Diffusion on comparably complex prompts.

How diffusion works: from noise to image

The core principle of diffusion-based models was described in 2020 by Ho et al. in the influential paper “Denoising Diffusion Probabilistic Models” (NeurIPS 2020). The idea: progressively add noise to training images - the “forward process” - then train a model to undo that noise step by step, from noise back to a coherent image - the “reverse process”.
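To make the forward process concrete, here is a minimal sketch of the closed-form noising step from Ho et al.: x_t = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps, where abar_t is the cumulative product of (1 - beta). The linear schedule from 1e-4 to 0.02 over 1000 steps is the one used in the paper. The four-value “image” and all function names are illustrative assumptions, and nothing here describes gpt-image-2’s actual internals.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule, as used in Ho et al. (2020)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the share of signal left at step t."""
    out, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        out.append(running)
    return out

def add_noise(x0, t, abar, rng):
    """Sample x_t directly from x_0 without looping through earlier steps."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
abar = alpha_bars(linear_betas(1000))
pixels = [0.9, -0.3, 0.5, -0.8]  # toy "image", values scaled to [-1, 1]
for t in (0, 499, 999):
    # The signal weight shrinks towards zero as t grows.
    print(t, round(math.sqrt(abar[t]), 4), add_noise(pixels, t, abar, rng))
```

By the last step almost none of the original signal remains, which is why generation can later start from pure noise.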

At inference, the model starts with pure noise, an image of entirely random pixels. Over dozens of steps, it denoises that image, guided at each step by the text prompt. Edwin: “Around step 27 of 50, the character started becoming recognisable. Before that: noise. That transition is still impressive every time I see it.”
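The denoising loop can be sketched in a few lines, with one large caveat: a real system uses a trained neural network, conditioned on the prompt, to predict the noise at each step. The sketch below swaps that network for a hypothetical “oracle” that already knows the single target image, which makes the noise exactly solvable and keeps the example runnable. The 50 steps mirror the figure in Edwin’s quote; the schedule values and all names are assumptions for illustration.

```python
import math
import random

def make_schedule(num_steps, beta_start=1e-4, beta_end=0.2):
    """Illustrative linear schedule, steep enough for a 50-step toy run."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def oracle_noise(x_t, t, alpha_bars, target):
    """Stand-in for the trained, prompt-conditioned network (an assumption):
    with one known target image, the noise can be solved for exactly."""
    a = alpha_bars[t]
    return [(x - math.sqrt(a) * x0) / math.sqrt(1.0 - a)
            for x, x0 in zip(x_t, target)]

def sample(target, num_steps=50, seed=0):
    """DDPM-style ancestral sampling: start from noise, denoise step by step."""
    rng = random.Random(seed)
    betas, alpha_bars = make_schedule(num_steps)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # pure noise
    trajectory = [list(x)]
    for t in reversed(range(num_steps)):
        eps = oracle_noise(x, t, alpha_bars, target)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * e) / math.sqrt(1.0 - betas[t])
                for xi, e in zip(x, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no fresh noise on the last step
        x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        trajectory.append(list(x))
    return x, trajectory

target = [0.9, -0.3, 0.5, -0.8]  # toy "image" the prompt is assumed to describe
final, trajectory = sample(target)
for step in (0, 27, 50):
    error = max(abs(a - b) for a, b in zip(trajectory[step], target))
    print(f"after {step} denoising steps: max error {error:.4f}")
```

The structure is the point here, not the oracle: each step removes a little predicted noise and, except on the final step, adds a smaller amount of fresh noise back.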

[Image: code on a laptop screen visualising the AI pipeline for illustrations]

The hardest problem: character consistency across eleven pages

Generating one beautiful illustration is not the hardest part. The hardest part is keeping the same character consistent across eleven pages - in eleven different scenes. The same red curls on page three as on page nine. The same yellow raincoat.

Our solution: reference passing. The first approved character illustration is included as a reference image alongside every subsequent prompt. Combined with detailed written character specifications in each prompt, this keeps the character recognisably the same from page to page.
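As a rough sketch of what reference passing might look like in code, the example below builds one request per page, each carrying the same approved reference image and the same written character specification. It deliberately stops short of calling any image API. The class names, the character name “Mila”, the file path, and the prompt wording are all hypothetical and not taken from the MyOwnChildbook pipeline.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CharacterSpec:
    name: str
    traits: tuple  # written details repeated verbatim in every prompt

@dataclass(frozen=True)
class PageRequest:
    page: int
    prompt: str
    reference_images: tuple  # paths sent alongside the prompt

def build_prompt(spec, scene):
    """Combine the scene with the full character specification."""
    traits = ", ".join(spec.traits)
    return (f"Children's book illustration. Scene: {scene} "
            f"The character {spec.name} must match the reference image: {traits}.")

def build_page_requests(spec, scenes, approved_reference, first_page=2):
    """One request per page after the approved one, all sharing the same reference."""
    return [
        PageRequest(page=first_page + i,
                    prompt=build_prompt(spec, scene),
                    reference_images=(approved_reference,))
        for i, scene in enumerate(scenes)
    ]

spec = CharacterSpec("Mila", ("red curly hair", "yellow raincoat", "brown dog at her side"))
scenes = ["Mila jumps in a puddle on a cobbled street.",
          "Mila shelters under a market stall."]
requests = build_page_requests(spec, scenes, "approved/character.png")
for request in requests:
    print(request.page, request.reference_images, request.prompt)
```

Repeating the traits in every prompt looks redundant, but it gives the model two independent anchors for each page: the image and the text.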

It is not flawless. Edwin: “On roughly one in eight pages, we automatically regenerate because character drift is noticeable. We have built in basic checks that flag when the hair colour has suddenly changed.”
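A check of this kind could be as simple as comparing an average hair colour against the approved reference and regenerating when the two are too far apart. The sketch below assumes the hair colour has already been extracted as an RGB triple (how that happens is out of scope here). The threshold, the retry limit, and every name are illustrative assumptions, not the checks Edwin describes.

```python
import math

def has_drifted(reference_rgb, candidate_rgb, threshold=60.0):
    """Flag drift when the colours are further apart than the threshold (Euclidean RGB)."""
    return math.dist(reference_rgb, candidate_rgb) > threshold

def generate_until_consistent(generate, reference_rgb, max_attempts=3):
    """Call generate() until the hair colour check passes or attempts run out.

    generate is a hypothetical callable returning (image, hair_rgb).
    """
    for attempt in range(1, max_attempts + 1):
        image, hair_rgb = generate()
        if not has_drifted(reference_rgb, hair_rgb):
            return image, attempt
    return None, max_attempts  # still drifting: leave for manual review

# Fake generator: the first result has brown hair, the second matches the red reference.
results = iter([("page-9-v1", (90, 60, 40)), ("page-9-v2", (200, 70, 45))])
image, attempts_used = generate_until_consistent(lambda: next(results),
                                                 reference_rgb=(205, 65, 40))
print(image, attempts_used)
```

At one in eight, an eleven-page book averages 11/8, or roughly 1.4, flagged pages on the first pass, so the retry limit matters more than it might seem.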

The photo you upload becomes a second reference: the model uses it to approximate your child’s hair type, colour, and skin tone. After processing, the photo is automatically deleted. We do not store personal photos on our servers beyond the point of generation.
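One way to guarantee that kind of deletion is to tie the photo’s lifetime to the generation step itself, so the file is removed whether generation succeeds or fails. The sketch below does this with a context manager around a temporary file. It is an assumption about how such a guarantee could be implemented, not a description of the MyOwnChildbook servers, and it only covers the local copy of the file.

```python
import os
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_photo(photo_bytes, suffix=".jpg"):
    """Write the upload to a temp file and delete it when the block exits,
    whether generation succeeded or raised."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(photo_bytes)
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)

with temporary_photo(b"fake image bytes") as path:
    existed_during = os.path.exists(path)
    # A hypothetical generate_pages(path) call would run here.
existed_after = os.path.exists(path)
print(existed_during, existed_after)
```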

[Image: colourful code on a monitor - the building blocks behind AI illustrations]

When gpt-image-2 falls short

Honesty about limitations is part of a proper technical explanation. gpt-image-2 has known weak points.

Text within images rarely works well: we avoid prompts that require readable text inside the illustration itself. Photo-realistic likeness is not a strength of the model: if an accurate photographic resemblance to your child is the key requirement, a photo book or a hand-illustrated portrait by a professional illustrator is a better choice. Our honest comparison of a personalised book and a photoshoot explains when each format fits best.

Culturally specific clothing and settings tend to be less reliable, most likely because the training data skews towards Western material. Very specific traditional dress or architectural styles can come out inconsistently.

There are also situations where a personalised book is not the right choice regardless of the technology. Our guide on when a personalised book does not fit covers this honestly.

Why it still works

Children do not require photographic accuracy. They require recognition. Red curly hair. A yellow coat. The dog from home. These recognisable markers - combined with their name in the story - trigger the “that’s me!” response that makes a personalised book feel different from a standard picture book.

[Image: a grandfather and grandson sharing a book together - the result the pipeline is aimed at]

Edwin: “I tested the prototype first with a photo of my own daughter - red hair, freckles, blue eyes. When she recognised herself on the cover, I understood what we were actually building. She said: ‘that’s me, daddy.’ She did not look critically at the eye shape or the angle of the nose. She just saw herself.”

That is exactly where the technology is aimed. Interactive reading techniques build even more engagement once a child recognises themselves in the story - the personalised character becomes a prompt for real conversation.

Not magic, but careful engineering

gpt-image-2 does not “just” produce a children’s book. Behind each illustration are deliberate choices: which style descriptions to include, how to maintain character consistency, how the photo is processed as a reference, what automated checks run against each result. The technology delivers the pixel output; the pipeline determines whether the result is an illustration a child actually recognises themselves in.

The technology disappears when it works. That is the only thing that matters.
