Unlike the widespread #AIart, this task came with the constraint of adhering to the original text, and thus of keeping in check some peculiarities of generative models, such as #hallucinations. When working on my task, I quickly discovered the obvious: #DescriptiveText-s are somewhat easier to generate for (example: RedCap). However, fairy tales contain far fewer descriptions than one might recall from childhood memories. So the challenge of the task was to generate illustrations also for #narrative. While it might be too ambitious to try to illustrate a whole sequence of events (which is what makes a narrative), even depicting a single event requires an interaction or a scene composition. Interactions and compositions, however, were notoriously difficult for generative models to get right, until in mid-2022 Google's Parti (https://arxiv.org/abs/2206.10789) made a notable breakthrough by linking image generation to (text) transformer models.
Other models followed: Midjourney v4 in November and Structured Diffusion Guidance (https://arxiv.org/abs/2212.05032) in December. Better composition became notably easier to achieve, which allowed me to proceed with (and complete) my task of illustrating fairy tales. All resulting images can be seen in the paper, but the other important outcome is the definition of a preliminary process for generating images aligned with the original story text.
In the second stage, considering the outcome of the first one, I aimed to isolate parts of the prompt to be removed, added, or replaced in order to improve the composition of the image: adding important elements and removing unwanted ones.
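A minimal sketch of how this second stage could be mechanized, assuming the prompt is kept as a list of fragments and systematically re-rendered after each edit. The fragments and the `generate_image` call are hypothetical illustrations, not the actual prompts or backend used in the experiments:

```python
# Sketch of stage two: generate prompt variants by removing, adding,
# or replacing fragments, then render each variant for comparison.

BASE_FRAGMENTS = [
    "a girl in a red hooded cape",      # hypothetical example fragments
    "walking through a dark forest",
    "carrying a wicker basket",
]

def build_prompt(fragments):
    """Join prompt fragments into a single comma-separated prompt."""
    return ", ".join(fragments)

def edit_variants(fragments, add=None, remove=None, replace=None):
    """Yield the base prompt plus one variant per requested edit."""
    yield list(fragments)
    if remove is not None:
        yield [f for f in fragments if f != remove]
    if add is not None:
        yield list(fragments) + [add]
    if replace is not None:
        old, new = replace
        yield [new if f == old else f for f in fragments]

for variant in edit_variants(
    BASE_FRAGMENTS,
    add="a wolf watching from behind a tree",
    remove="carrying a wicker basket",
    replace=("walking through a dark forest", "standing at the forest edge"),
):
    prompt = build_prompt(variant)
    print(prompt)
    # image = generate_image(prompt)  # hypothetical text-to-image backend
```

Keeping the prompt as a fragment list rather than a free-form string makes each edit a single, comparable change, so the effect of every fragment on the composition can be judged in isolation.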
Once the composition is at least roughly right, the third stage is to choose a style that works in the model's favour: one that reduces hallucinations, yet keeps the image easy for the viewer to interpret. For fairy tales, "book illustration" is one possibility.
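The third stage amounts to appending a style descriptor to the already-composed prompt and comparing the results. Only "book illustration" is named in the text; the alternative styles below are assumptions added for comparison, and `generate_image` remains a hypothetical backend call:

```python
# Sketch of stage three: try candidate style suffixes on a fixed composition.

STYLE_CANDIDATES = [
    "book illustration",          # style mentioned in the text
    "watercolor storybook art",   # assumption: alternative for comparison
    "woodcut print",              # assumption: alternative for comparison
]

def styled_prompt(composition, style):
    """Append a style descriptor to a composition prompt."""
    return f"{composition}, {style}"

composition = (
    "a girl in a red hooded cape standing at the forest edge, "
    "a wolf watching from behind a tree"
)
for style in STYLE_CANDIDATES:
    print(styled_prompt(composition, style))
    # image = generate_image(styled_prompt(composition, style))  # hypothetical
```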