Unlike the widespread , this task was constrained to adhere to the original text, and thus to keep in check some peculiarities of generative models, such as . When working on my task, I quickly discovered the obvious: descriptions are somewhat easier to generate for (for example, RedCap). However, fairy tales contain far fewer descriptions than one might recall from childhood memories. So the challenge of the task was to generate illustrations also for . While it might be too ambitious to try to illustrate a sequence of events (which would make a narrative), even depicting a single event requires an interaction or a scene composition. However, interactions and compositions were notoriously difficult for generative models to get right, until in mid-2022 Google's Parti (arxiv.org/abs/2206.10789) made a notable breakthrough by linking image generation to (text) transformer models.

Other models followed: Midjourney v4 in November and Structured Diffusion Guidance (arxiv.org/abs/2212.05032) in December. Better composition became notably easier to achieve. This allowed me to proceed with (and complete) my task of illustrating fairy tales. All resulting images can be seen in the paper, but the other important outcome is the definition of a preliminary process for generating images aligned with the original story text.

The first stage of the process is converting the intended text to a prompt, without deviating from the original vocabulary. This means condensing the content into a single phrase and removing words that are not meant to be visualised (e.g. "this", "here", "he"), substituting them with what they refer to.
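
As a rough sketch of this first stage (not the exact procedure used for the paper), the substitution and removal can be written as a small Python function. The function name `text_to_prompt`, the `DROP_WORDS` list, the example sentence and the referent mapping are all assumptions made for illustration; in practice the referents are chosen by reading the story.

```python
import re

# Hypothetical list of words that are not meant to be visualised.
DROP_WORDS = {"this", "here", "there", "then"}

def text_to_prompt(sentence, referents):
    """Condense a sentence into a prompt while keeping its original vocabulary.

    Words found in `referents` (e.g. pronouns) are replaced by what they
    refer to; words in DROP_WORDS are removed; everything else is kept as-is.
    """
    words = re.findall(r"[A-Za-z']+", sentence)
    kept = []
    for word in words:
        key = word.lower()
        if key in referents:
            kept.append(referents[key])
        elif key not in DROP_WORDS:
            kept.append(word)
    return " ".join(kept)

# Illustrative sentence and mapping (assumed, not quoted from a tale).
prompt = text_to_prompt("Then he knocked at the door here", {"he": "the wolf"})
print(prompt)
```

The mapping is passed in by hand on purpose: deciding what "he" refers to is a reading task, and the code only applies that decision consistently.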

In the second stage, considering the outcome of the first, I aimed to isolate parts of the prompt to be removed, added or replaced in order to improve the composition of the image: adding important elements and removing unwanted ones.
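
The three kinds of edit in this second stage (remove, replace, add) can be sketched as one helper that is re-run after each look at the generated image. This is only an illustration under stated assumptions: `revise_prompt`, its parameters and the example phrases are hypothetical, and which parts to change is still a manual judgment.

```python
def revise_prompt(prompt, remove=(), replace=None, add=()):
    """Return a revised prompt after one round of composition fixes.

    remove  -- phrases to delete because they produce unwanted elements
    replace -- mapping of phrases to rewordings that compose better
    add     -- extra phrases appended for important missing elements
    """
    for phrase in remove:
        prompt = prompt.replace(phrase, "")
    for old, new in (replace or {}).items():
        prompt = prompt.replace(old, new)
    # Collapse any double spaces left behind by removals.
    prompt = " ".join(prompt.split())
    return ", ".join([prompt, *add])

# Illustrative round of edits (all phrases are assumed examples).
revised = revise_prompt(
    "the wolf knocked at the door of the old house",
    remove=["old "],
    replace={"knocked at": "knocking on"},
    add=["grandmother visible in the window"],
)
print(revised)
```

Keeping each round as an explicit set of removals, replacements and additions also leaves a record of how far the final prompt has moved from the original text.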

Once the composition is at least roughly right, the third stage is to choose a style that helps efficiency: one that possibly reduces hallucinations, yet eases interpretation by the viewer. For fairy tales, "book illustration" is a possibility.
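
One simple way to try out styles is to append each candidate to the same composed prompt and compare the resulting images. The sketch below assumes the style is given as a trailing phrase; `with_style`, the example prompt and the second candidate style are hypothetical, and only "book illustration" comes from the process described here.

```python
def with_style(prompt, style):
    """Append a style phrase to a prompt, trimming trailing punctuation first."""
    return f"{prompt.rstrip(' ,.')}, {style}"

# "book illustration" is the style mentioned above; the other is an assumed alternative.
CANDIDATE_STYLES = ["book illustration", "pencil sketch"]

variants = [with_style("the wolf knocking on the door.", s) for s in CANDIDATE_STYLES]
for variant in variants:
    print(variant)
```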
