Methods for Ensuring Consistent Generation in Diffusion Models

Кулик, Л. Р.; Мокін, О. Б.; Kulyk, L. R.; Mokin, O. B.

dc.contributor.author	Кулик, Л. Р.	uk
dc.contributor.author	Мокін, О. Б.	uk
dc.contributor.author	Kulyk, L. R.	en
dc.contributor.author	Mokin, O. B.	en
dc.date.accessioned	2024-11-16T12:37:41Z
dc.date.available	2024-11-16T12:37:41Z
dc.date.issued	2024
dc.identifier.citation	Кулик Л. Р., Мокін О. Б. Методи забезпечення консистентності генерації в дифузійних моделях // Вісник Вінницького політехнічного інституту. 2024. Вип. 4. С. 75–85.	uk
dc.identifier.issn	1997-9266
dc.identifier.uri	https://ir.lib.vntu.edu.ua//handle/123456789/43563
dc.description.abstract	The article investigates the problem of consistent generation in diffusion models. Modern generative diffusion models can create high-precision images, but maintaining consistency between related generation results remains a challenging task. The key methods for ensuring generation consistency are analyzed, and their advantages and disadvantages are identified. Additionally, a new type of consistency is introduced: conceptual consistency, which makes it possible to assess a model’s ability not only to reproduce existing styles and objects but also to generate entirely new visual ideas that it never encountered during training. The image-to-image generation method, which is based on an input reference image, has the advantage of being simple to implement. Fine-tuning methods such as DreamBooth and LoRA DreamBooth provide broader control over object consistency. ControlNet models ensure shape consistency by using a special input image that serves as a guide shape in the reverse diffusion process. Noise inversion methods allow more precise control and iterative refinement of the resulting images through manipulations of the noise space, enabling the generation of more stylistically and conceptually consistent images. The StyleAligned method, which uses a shared attention mechanism, can ensure the stylistic consistency of generated images. Understanding the capabilities and limitations of these methods makes it possible to select the most effective set of tools for the task at hand. Diffusion models continue to evolve and expand into new areas, so achieving reliable and universal consistency in them could pave the way for more creative and effective solutions.	en
dc.description.abstract	Досліджено проблему консистентної генерації в дифузійних моделях. Сучасні генеративні дифузійні моделі здатні створювати зображення високої точності, але підтримання консистентності між спорідненими результатами генерації залишається складним завданням. Проаналізовано ключові методи забезпечення консистентності генерації. При цьому введено додатковий тип консистентності: консистентність концепції, що дозволяє оцінити здатність моделей не тільки відтворювати існуючі стилі та об'єкти, а й генерувати абсолютно нові візуальні ідеї, з якими модель ніколи не стикалася під час навчання. Проведено аналіз наявних методів забезпечення консистентності та визначено їхні переваги та недоліки. Метод генерації на базі вхідного еталонного зображення image-to-image має перевагу в простоті реалізації. Такі методи дотренування, як DreamBooth і LoRA DreamBooth, забезпечують ширший контроль над консистентністю об'єктів. Моделі ControlNet за допомогою спеціального вхідного зображення забезпечують консистентність форми. Методи інверсії шуму дозволяють здійснити точніший контроль та ітеративне вдосконалення підсумкових зображень за рахунок маніпуляцій з шумовим простором, що дозволяє генерувати стилістично та концептуально консистентніші зображення. Завдяки механізму спільної уваги, що застосовується в методі StyleAligned, може забезпечуватись стилістична консистентність згенерованих зображень. Розуміння можливостей та обмежень методів забезпечення консистентності дифузійної генерації дозволяє обрати найефективніший набір інструментів відповідно до задачі. Дифузійні моделі продовжують активно розвиватися та поширюватися на нові галузі, тому досягнення надійної та універсальної консистентності в дифузійних моделях може дати шлях для креативніших та ефективніших рішень.	uk
dc.language.iso	uk_UA	uk_UA
dc.publisher	ВНТУ	uk
dc.relation.ispartof	Вісник Вінницького політехнічного інституту. Вип. 4 : 75–85.	uk
dc.relation.uri	https://visnyk.vntu.edu.ua/index.php/visnyk/article/view/3070
dc.subject	глибоке навчання	uk
dc.subject	генерація зображень	uk
dc.subject	генеративні дифузійні моделі	uk
dc.subject	консистентність генерації	uk
dc.subject	консистентність концепції	uk
dc.subject	deep learning	en
dc.subject	image generation	en
dc.subject	generative diffusion models	en
dc.subject	generation consistency	en
dc.subject	conceptual consistency	en
dc.title	Методи забезпечення консистентності генерації в дифузійних моделях	uk
dc.title.alternative	Methods for Ensuring Consistent Generation in Diffusion Models	en
dc.type	Article, professional native edition
dc.type	Article
dc.identifier.udc	004.054:[004.032.26+004.85]
dc.relation.references	Chenshuang Zhang, Chaoning Zhang, et al., “Text-to-image Diffusion Models in Generative AI: A Survey,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2303.07909 . Accessed on: April 29, 2024.	en
dc.relation.references	Dustin Podell, Zion English, et al., “SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2307.01952 . Accessed on: April 29, 2024.	en
dc.relation.references	Ling Yang, Zhilong Zhang, et al., “Diffusion Models: A Comprehensive Survey of Methods and Applications,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2209.00796 . Accessed on: April 29, 2024.	en
dc.relation.references	Omri Avrahami, Amir Hertz, et al., “The Chosen One: Consistent Characters in Text-to-Image Diffusion Models,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2311.10093 . Accessed on: April 29, 2024.	en
dc.relation.references	Jonathan Ho, Ajay Jain, and Pieter Abbeel, “Denoising Diffusion Probabilistic Models,” in arXiv e-prints, 2020. [Online]. Available: https://arxiv.org/abs/2006.11239 . Accessed on: April 29, 2024.	en
dc.relation.references	Yong-Hyun Park, Mingi Kwon, et al., “Understanding the Latent Space of Diffusion Models through the Lens of Riemannian Geometry,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2307.12868 . Accessed on: April 29, 2024.	en
dc.relation.references	Olaf Ronneberger, Philipp Fischer, and Thomas Brox, “U-Net: Convolutional Networks for Biomedical Image Segmentation,” in arXiv e-prints, 2015. [Online]. Available: https://arxiv.org/abs/1505.04597 . Accessed on: April 29, 2024.	en
dc.relation.references	Diederik P. Kingma and Max Welling, “An Introduction to Variational Autoencoders,” in arXiv e-prints, 2019. [Online]. Available: https://arxiv.org/abs/1906.02691 . Accessed on: April 29, 2024.	en
dc.relation.references	Judith E. Fan, Wilma A. Bainbridge, et al., “Drawing as a versatile cognitive tool,” Nature Reviews Psychology, 2023. https://doi.org/10.1038/s44159-023-00212-w .	en
dc.relation.references	G. Greenberg, “Semantics of pictorial space,” Springer Link, 2021. https://doi.org/10.1007/s13164-020-00513-6 .	en
dc.relation.references	Gihyun Kwon and Jong Chul Ye, “Diffusion-based Image Translation using Disentangled Style and Content Representation,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2209.15264 . Accessed on: April 29, 2024.	en
dc.relation.references	Aaron Hertzmann, “Toward a theory of perspective perception in pictures,” Journal of Vision, 2024. https://doi.org/10.1167/jov.24.4.23 .	en
dc.relation.references	Chenlin Meng, Yutong He, et al., “SDEdit: Guided Image Synthesis and Editing with Stochastic Differential Equations,” in arXiv e-prints, 2021. [Online]. Available: https://arxiv.org/abs/2108.01073 . Accessed on: April 29, 2024.	en
dc.relation.references	Nataniel Ruiz, Yuanzhen Li, et al., “DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2208.12242 . Accessed on: April 29, 2024.	en
dc.relation.references	Edward J. Hu, Yelong Shen, et al., “LoRA: Low-Rank Adaptation of Large Language Models,” in arXiv e-prints, 2021. [Online]. Available: https://arxiv.org/abs/2106.09685 . Accessed on: April 29, 2024.	en
dc.relation.references	Lvmin Zhang, Anyi Rao, et al., “Adding Conditional Control to Text-to-Image Diffusion Models,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2302.05543 . Accessed on: April 29, 2024.	en
dc.relation.references	Ron Mokady, Amir Hertz, et al., “Null-text Inversion for Editing Real Images using Guided Diffusion Models,” in arXiv e-prints, 2022. [Online]. Available: https://arxiv.org/abs/2211.09794 . Accessed on: April 29, 2024.	en
dc.relation.references	Inbar Huberman-Spiegelglas, Vladimir Kulikov, et al., “An Edit Friendly DDPM Noise Space: Inversion and Manipulations,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2304.06140 . Accessed on: April 29, 2024.	en
dc.relation.references	Amir Hertz, Andrey Voynov, et al., “Style Aligned Image Generation via Shared Attention,” in arXiv e-prints, 2023. [Online]. Available: https://arxiv.org/abs/2312.02133 . Accessed on: April 29, 2024.	en
dc.relation.references	Xun Huang and Serge Belongie, “Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization,” in arXiv e-prints, 2017. [Online]. Available: https://arxiv.org/abs/1703.06868 . Accessed on: April 29, 2024.	en
dc.identifier.doi	https://doi.org/10.31649/1997-9266-2024-175-4-75-85

Files in this item

Name: 164174.pdf
Size: 1.169Mb
Format: PDF

This item appears in the following collection(s)

Research papers of the SAIT Department [428]
articles, conference proceedings
Вісник Вінницького політехнічного інституту. 2024. No. 4 [8]
