Generative artificial intelligence (AI) has long grappled with producing accurate images, often stumbling over details such as fingers and facial symmetry. These models also tend to falter when asked to generate images at sizes and resolutions other than those they were trained on.
Rice University computer scientists have now developed a new approach to generating images with pre-trained diffusion models, which “learn” by adding layer after layer of random noise to training images and then generating new images by removing that noise. The technique shows promise in addressing these shortcomings.
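For readers who want a concrete picture, here is a minimal sketch in Python (using PyTorch) of the noising-and-denoising recipe described above. It illustrates the general diffusion idea, not the actual code of Stable Diffusion, DALL-E, or the Rice team's method; `model` is a hypothetical noise-prediction network.

```python
import torch

# Illustrative sketch of the diffusion idea, not production model code.
# Training: blend an image with Gaussian noise; the network learns to
# predict that noise so it can later be removed.
def add_noise(image, t, alpha_bar):
    """Forward process: mix the image with random noise at timestep t."""
    noise = torch.randn_like(image)
    noisy = alpha_bar[t].sqrt() * image + (1 - alpha_bar[t]).sqrt() * noise
    return noisy, noise

def denoise_step(model, x_t, t, alphas, alpha_bar):
    """Reverse process: one step of removing the model's predicted noise
    (DDPM-style posterior mean; the variance term is omitted for brevity)."""
    eps = model(x_t, t)  # the network's estimate of the noise in x_t
    coef = (1 - alphas[t]) / (1 - alpha_bar[t]).sqrt()
    return (x_t - coef * eps) / alphas[t].sqrt()
```

Generation starts from pure noise and applies `denoise_step` repeatedly until a clean image emerges.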
“Diffusion models like Stable Diffusion, Midjourney, and DALL-E create impressive results, generating fairly lifelike and photorealistic images,” said Moayed Haji Ali, a Rice computer science doctoral student. “But they have a weakness: They can only generate square images. So, in cases where you have different aspect ratios, like on a monitor or a smartwatch, that’s where these models become problematic.”
Ask a model like Stable Diffusion for a non-square image, say one with a 16:9 aspect ratio, and it tends to fill the extra space with repeated elements, producing strange deformities such as people with six fingers or oddly elongated objects. How these models are trained is part of the problem.
According to Vicente Ordóñez-Román, an associate professor of computer science, and Guha Balakrishnan, an assistant professor of electrical and computer engineering, a model trained only on images of a single resolution will struggle to generate images at other resolutions because of overfitting. Overfitting occurs when an AI model becomes too specialized in reproducing data similar to what it was trained on, limiting its ability to generalize beyond it.
“You could solve that by training the model on a wider variety of images, but it’s expensive and requires massive amounts of computing power: hundreds, maybe even thousands of graphics processing units,” Ordóñez-Román said.
Haji Ali’s research indicates that the digital noise used by diffusion models can be separated into two kinds of signal: local and global. The local signal carries specific pixel-level information, such as the details of an eye or the texture of a dog’s fur, while the global signal captures the overall outline of the image.
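One way to picture the distinction is sketched below, under the assumption that the global signal is roughly the low-frequency content of the model's noise estimate and the local signal is the fine-detail residual; the paper's exact construction may differ.

```python
import torch
import torch.nn.functional as F

# Illustrative split of a noise estimate into a low-frequency "global"
# component (overall layout) and a high-frequency "local" residual
# (pixel-level detail). Splitting by downsampling and re-upsampling is
# an assumption made for illustration only.
def split_signal(eps: torch.Tensor, scale: int = 8):
    h, w = eps.shape[-2:]
    coarse = F.interpolate(eps, size=(h // scale, w // scale), mode="bilinear")
    global_part = F.interpolate(coarse, size=(h, w), mode="bilinear")
    local_part = eps - global_part  # what remains is fine detail
    return global_part, local_part
```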
“One reason diffusion models need help with non-square aspect ratios is that they usually package local and global information together,” said Haji Ali, who worked on synthesizing motion in AI-generated videos before joining Ordóñez-Román’s research group at Rice for his Ph.D. studies. “When the model tries to duplicate that data to account for the extra space in a non-square image, it results in visual imperfections.”
Haji Ali’s ElasticDiffusion method takes a different approach: it keeps the two signals separate by routing them along conditional and unconditional generation paths. Because the global layout is never mixed with local detail, the model can fill a non-square canvas without the repetition that causes visual imperfections.
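In spirit, the separation resembles classifier-free guidance: the unconditional prediction supplies the local detail, while the difference between the conditional and unconditional predictions isolates the global layout, which can then be handled separately. The sketch below is a hedged illustration of that framing; `model`, `prompt_emb`, and `null_emb` are hypothetical placeholders, not ElasticDiffusion's actual API.

```python
def elastic_style_guidance(model, x_t, t, prompt_emb, null_emb, scale=7.5):
    # Unconditional path: carries the local, pixel-level signal.
    eps_local = model(x_t, t, null_emb)
    # Conditional minus unconditional isolates the global signal (overall
    # layout), which could, e.g., be computed at the model's training
    # resolution and resized to the target canvas before being combined.
    eps_global = model(x_t, t, prompt_emb) - eps_local
    return eps_local + scale * eps_global
```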
ElasticDiffusion then applies the unconditional path, which carries the local pixel-level detail, to the image one quadrant at a time. This yields a cleaner image whose quality no longer depends on the aspect ratio, and it requires no additional training, streamlining generation while preserving image quality.
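A rough sketch of that quadrant-wise idea, continuing the placeholders above: the target canvas can be any aspect ratio, while each patch handed to the model stays at the square resolution it was trained on. Tiling details such as overlap, blending, and edge handling are simplified away here.

```python
import torch

def local_signal_by_patches(model, x_t, t, null_emb, patch=512):
    # Assumes height and width are multiples of `patch` for simplicity;
    # a real implementation would overlap and blend patches at the seams.
    eps = torch.zeros_like(x_t)
    _, _, height, width = x_t.shape
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = x_t[..., top:top + patch, left:left + patch]
            eps[..., top:top + patch, left:left + patch] = model(tile, t, null_emb)
    return eps
```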
“This approach is a successful attempt to leverage the intermediate representations of the model to scale them up so that you get global consistency,” Ordóñez-Román said.
ElasticDiffusion’s main drawback is speed: at present, it takes six to nine times longer than other diffusion models to create an image. Haji Ali’s objective is to bring that down to match the inference speed of models such as Stable Diffusion or DALL-E.
“Where I’m hoping that this research is going is to define…why diffusion models generate these more repetitive parts and can’t adapt to these changing aspect ratios and come up with a framework that can adapt to exactly any aspect ratio regardless of the training, at the same inference time,” said Haji Ali.