Two monkeys are piloting an airplane.
Two cats watering roses in a greenhouse.
a toy poodle as a rocket scientist.
a tower of cheese.
A painting of a koala wearing a princess dress and crown, with a confetti background.
Harry potter as a cat, pixar style, octane render, HD, high-detail.
Gnomes are playing music during Independence Day festivities in a forest near Lake George.
paw patrol. ’This is some serious gourmet’. 2 dogs holding mugs.
Disease Monitoring: Through big data technology, trends in specific diseases can be monitored and predicted, thus improving disease prevention and treatment effectiveness.
A small green dinosaur toy with orange spots standing on its hind legs and roaring with its mouth open.
Award-winning Kawaii illustration of a cat samurai, holding two swords, background cyberpunk Styles, 4k, golden hour, cinematic light.
A slime monster.
crop top skinny russian 12 years old teen girl at the water mountain, HDR magazine photo.
A young woman witch cosplaying with a magic wand and broom, wearing boots, and posing in a full body shot with a detailed face.
A happy daffodil with big eyes, multiple leaf arms and vine legs, rendered in 3D Pixar style.
A 3D Rendering of a cockatoo wearing sunglasses. The sunglasses have a deep black frame with bright pink lenses. Fashion photography, volumetric lighting, CG rendering.
The image is a portrait of Homer Simpson as a Na’vi from Avatar, created with vibrant colors and highly detailed in a cinematic style reminiscent of romanticism by Eugene de Blaas and Ross Tran, available on Artstation with credits to Greg Rutkowski.
Anthropomorphic beagle dog wearing steampunk time traveller outfit, clocks and large round window above, photoreal epic composition, old world deco, tv commercial, sebastian kruger, artem, epic lighting, by Heinz Anger, wow factor, aardman animations, blocking the sun, very artistic pose, alexander abdulov.
Chic Fantasy Compositions, Ultra Detailed Artistic, Midnight Aura, Night Sky, Dreamy, Glowing, Glamour, Glimmer, Shadows, Oil On Canvas, Brush Strokes, Smooth, Ultra High Definition, 8k, Unreal Engine 5, Ultra Sharp Focus, Art By magali villeneuve, rossdraws, Intricate Artwork Masterpiece, Matte Painting Movie Poster.
Full Portrait of Consort Chunhui by Giuseppe Castiglione, symmetrical face, ancient Chinese painting, single face, insanely detailed and intricate, beautiful, elegant, artstation, character concept in the style illustration by Miho Hirano, Giuseppe Castiglione --ar 9:16.
Aligning text-to-image diffusion models is critical for bringing generated images closer to human preferences. Training-based methods are constrained by high computational costs and dataset requirements, while training-free alignment methods remain underexplored and are often limited by inaccurate guidance.
We propose DyMO, a plug-and-play, training-free method that aligns generated images with human preferences during inference. In addition to text-aware human preference scores, we introduce a semantic alignment objective that strengthens semantic alignment in the early stages of diffusion, relying on the fact that attention maps effectively reflect the semantics of noisy images. We further propose dynamic scheduling of the multiple objectives and intermediate recurrent steps to reflect the requirements at different steps.
Experiments with diverse pre-trained diffusion models and metrics demonstrate the effectiveness and robustness of the proposed method.
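The dynamic scheduling of multiple objectives described above can be illustrated with a minimal sketch. The cosine ramp, the function names `objective_weights` and `combined_guidance`, and the sign convention are assumptions made for illustration; they are not taken from the DyMO paper or its code. The sketch only shows the general idea of weighting a semantic alignment term more heavily in early denoising steps and a preference score more heavily in later ones.

```python
import math


def objective_weights(step: int, num_steps: int) -> tuple[float, float]:
    """Return (semantic_weight, preference_weight) for one denoising step.

    Assumption: a cosine ramp chosen for illustration, not DyMO's schedule.
    step = 0 is the first (noisiest) step, step = num_steps - 1 the last.
    """
    if num_steps < 1 or not 0 <= step < num_steps:
        raise ValueError("step must satisfy 0 <= step < num_steps")
    # Fraction of the denoising process completed so far, in [0, 1].
    progress = step / max(num_steps - 1, 1)
    # Preference weight rises smoothly from 0 to 1 as denoising progresses.
    preference_weight = 0.5 * (1.0 - math.cos(math.pi * progress))
    # Semantic weight is the complement, so it dominates the early steps.
    semantic_weight = 1.0 - preference_weight
    return semantic_weight, preference_weight


def combined_guidance(
    semantic_loss: float, preference_score: float, step: int, num_steps: int
) -> float:
    """Scalar to minimize: low semantic loss and high preference score are good."""
    w_sem, w_pref = objective_weights(step, num_steps)
    return w_sem * semantic_loss - w_pref * preference_score


if __name__ == "__main__":
    for t in (0, 12, 25, 37, 49):
        w_sem, w_pref = objective_weights(t, 50)
        print(f"step {t:2d}: semantic={w_sem:.2f} preference={w_pref:.2f}")
```

In a real guidance loop, the gradient of such a combined scalar with respect to the noisy latent would steer each denoising step; that part depends on the diffusion backbone and is omitted here.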
The framework of our method. (a) Given a user prompt, we use LLMs to identify the entities and their corresponding attributes and construct a knowledge graph. We then design a semantic alignment objective that aligns cross-attention maps according to this graph; it works together with a pre-trained preference model to dynamically guide the denoising process for high-quality image generation. (b) The entire denoising process, shown as one-step predicted clean images, under the guidance of our method.
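The caption above describes an entity-attribute graph and an objective that aligns cross-attention maps along that graph. The following is a rough sketch of how such a structure and loss might look. The `PromptGraph` class, the cosine-based loss, and the hand-written graph are hypothetical; the source does not specify the actual loss, and in the described framework the graph would come from an LLM rather than being typed in by hand.

```python
import math
from dataclasses import dataclass, field


@dataclass
class PromptGraph:
    """Entities mapped to their attributes (hypothetical structure)."""

    attributes: dict[str, list[str]] = field(default_factory=dict)

    def edges(self) -> list[tuple[str, str]]:
        """List every (entity, attribute) pair in the graph."""
        return [(e, a) for e, attrs in self.attributes.items() for a in attrs]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two flattened attention maps (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_alignment_loss(
    graph: PromptGraph, attention_maps: dict[str, list[float]]
) -> float:
    """Mean (1 - cosine similarity) over entity-attribute edges.

    Assumption: an attribute token should attend to the same image region
    as the entity it modifies. This is an illustrative stand-in, not the
    objective defined in the paper.
    """
    edges = graph.edges()
    if not edges:
        return 0.0
    total = sum(
        1.0 - cosine_similarity(attention_maps[e], attention_maps[a])
        for e, a in edges
    )
    return total / len(edges)


if __name__ == "__main__":
    # Hand-written graph for "A small green dinosaur toy with orange spots".
    graph = PromptGraph({"dinosaur": ["green"], "spots": ["orange"]})
    # Toy 2x2 attention maps, flattened row by row.
    maps = {
        "dinosaur": [1.0, 0.0, 0.0, 0.0],
        "green": [1.0, 0.0, 0.0, 0.0],   # overlaps its entity
        "spots": [0.0, 1.0, 0.0, 0.0],
        "orange": [0.0, 0.0, 1.0, 0.0],  # attends elsewhere
    }
    print(semantic_alignment_loss(graph, maps))
```

Here the attribute that overlaps its entity contributes no loss, while the attribute attending to a different region contributes the maximum, so the average penalizes attribute misbinding.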
Methods | PickScore | HPSv2 | ImageReward | Aesthetics |
---|---|---|---|---|
SD V1.5 | 20.73 | 0.2341 | 0.1697 | 5.337 |
DNO | 20.05 | 0.2591 | -0.3212 | 5.597 |
PromptOpt | 20.26 | 0.2490 | -0.3366 | 5.465 |
FreeDom | 21.96 | 0.2605 | 0.3963 | 5.515 |
AlignProp | 20.56 | 0.2627 | 0.1128 | 5.456 |
Diffusion-DPO | 20.97 | 0.2656 | 0.2989 | 5.594 |
Diffusion-KTO | 21.15 | 0.2719 | 0.6156 | 5.697 |
SPO | 21.46 | 0.2671 | 0.2321 | 5.702 |
SD V1.5+Ours | 23.07 | 0.2755 | 0.7170 | 5.831 |
Methods | PickScore | HPSv2 | ImageReward | Aesthetics |
---|---|---|---|---|
SDXL | 21.91 | 0.2602 | 0.7755 | 5.960 |
DNO | 22.14 | 0.2725 | 0.9053 | 6.042 |
PromptOpt | 21.98 | 0.2708 | 0.8671 | 5.881 |
FreeDom | 22.13 | 0.2719 | 0.7722 | 5.908 |
SDXL+Ours | 24.90 | 0.2839 | 1.074 | 6.138 |
Diffusion-DPO | 22.30 | 0.2741 | 0.9789 | 5.891 |
Diff-DPO+Ours | 24.46 | 0.2836 | 1.049 | 6.116 |
SPO | 22.81 | 0.2778 | 1.082 | 6.319 |
SPO+Ours | 23.85 | 0.2821 | 1.166 | 6.278 |
SD V3.5 | 21.93 | 0.2726 | 0.9697 | 5.775 |
FLUX | 22.04 | 0.2760 | 1.011 | 6.077 |
The entire denoising process of SD V1.5 and DyMO, showing the noisy image and the one-step predicted clean image at each step t.
@article{xin2024dymo,
title={DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling},
author={Xie, Xin and Gong, Dong},
journal={arXiv preprint arXiv:2412.00759},
year={2024}
}