Since ChatGPT reached the public a few years ago, several companies have grown rapidly in the AI field. Apple’s work in this area has so far been low-key, but that does not mean the company has made no progress. Apple recently released a new open-source AI model called “MGIE”, short for MLLM-Guided Image Editing, which edits images based on natural language instructions. It uses a multimodal large language model (MLLM) to interpret user instructions and perform pixel-level operations. MGIE can understand natural language commands and carry out Photoshop-style modifications, global photo optimization, and local editing.
Apple worked with researchers from the University of California, Santa Barbara on the MGIE research. The results will be presented at the 2024 International Conference on Learning Representations (ICLR), one of the world’s top conferences for AI research.
What is an MLLM?
To properly understand MGIE, we must first discuss MLLMs, because MGIE is built on one. An MLLM is an AI model that can process text and images together, which is what strengthens instruction-based image editing. MLLMs have shown excellent capabilities in cross-modal understanding and in generating responses grounded in visual input. However, they have not yet been widely used in image editing tasks.
MGIE integrates MLLMs into the image editing process in two ways. First, it uses MLLMs to derive expressive instructions from user input. The instructions are concise and provide clear guidance for the editing process.
For example, given the input “make the sky bluer”, MGIE can generate the instruction “increase the saturation of the sky area by 20%”.
Second, it uses the MLLM to generate a visual imagination, i.e., a latent representation of the desired edit. This representation captures the essence of the edit and can be used to guide pixel-level operations. MGIE employs a novel end-to-end training scheme that jointly optimizes the instruction derivation, visual imagination, and image editing modules.
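The two steps described above can be pictured with a small toy sketch. This is not Apple’s code and not the MGIE API: every name in it (EXPRESSIVE_RULES, EditPlan, derive_expressive_instruction, imagine, apply_edit, edit) is hypothetical. A lookup table stands in for the MLLM, a small data class stands in for the learned latent representation, and pixels are plain RGB tuples with a per-pixel region label assumed to be given.

```python
import colorsys
from dataclasses import dataclass

# Hypothetical stand-in for the MLLM: maps terse user input to a more
# explicit ("expressive") instruction. A real model generates this text.
EXPRESSIVE_RULES = {
    "make the sky bluer": "increase the saturation of the sky area by 20%",
}


@dataclass
class EditPlan:
    """Toy stand-in for the latent 'visual imagination' that guides the edit."""
    region: str
    saturation_scale: float


def derive_expressive_instruction(user_input: str) -> str:
    # Step 1: turn terse input into an explicit instruction.
    return EXPRESSIVE_RULES.get(user_input.strip().lower(), user_input)


def imagine(instruction: str) -> EditPlan:
    # Step 2: a real system would produce a learned latent; this sketch
    # only recognises the single instruction from the example above.
    if instruction == "increase the saturation of the sky area by 20%":
        return EditPlan(region="sky", saturation_scale=1.2)
    return EditPlan(region="none", saturation_scale=1.0)


def apply_edit(pixels, labels, plan):
    # Pixel-level step: scale saturation only where the label matches.
    # pixels are (r, g, b) floats in [0, 1]; labels name each pixel's region.
    out = []
    for (r, g, b), label in zip(pixels, labels):
        if label == plan.region:
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            s = min(1.0, s * plan.saturation_scale)
            r, g, b = colorsys.hsv_to_rgb(h, s, v)
        out.append((r, g, b))
    return out


def edit(user_input, pixels, labels):
    instruction = derive_expressive_instruction(user_input)
    plan = imagine(instruction)
    return instruction, apply_edit(pixels, labels, plan)


pixels = [(0.5, 0.6, 0.8), (0.2, 0.5, 0.2)]
labels = ["sky", "grass"]
instruction, edited = edit("Make the sky bluer", pixels, labels)
```

In this sketch only the pixel labelled "sky" changes, and the grass pixel passes through untouched. In MGIE itself the three stages are trained jointly rather than wired together by hand as they are here.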
Features of MGIE
MGIE can handle a variety of editing situations, from simple colour adjustments to complex object manipulation. The model can also perform global and local editing based on the user’s preferences. Some of the features and functionality of MGIE include:
Expressive Instruction-Based Editing: MGIE can generate concise and clear instructions to effectively guide the editing process. This not only improves editing quality but also enhances the overall user experience.
Photoshop-Style Editing: MGIE can perform common Photoshop-style edits such as cropping, resizing, rotating, flipping and adding filters. The model can also apply more advanced edits, such as changing the background, adding or removing objects, and blending images.
Global Photo Optimization: MGIE can improve the overall quality of a photo by adjusting properties such as brightness, contrast, sharpness, and colour balance. The model can also apply artistic effects such as sketching, painting and caricature.
Local Editing: MGIE can edit specific areas or objects in an image, such as the face, eyes, hair, clothes, and accessories. The model can also modify the properties of these areas or objects, such as shape, size, colour, texture, and style.
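The difference between a global and a local edit can be shown with a minimal sketch. It is an illustration only and not how MGIE works internally, since MGIE learns its edits end to end. The function name adjust_brightness and the box format are assumptions made for this example.

```python
def adjust_brightness(image, delta, box=None):
    """Return a copy of a grayscale image (rows of 0-255 ints) with
    brightness shifted by delta.

    box=(top, left, bottom, right), with bottom and right exclusive,
    restricts the change to that region (a local edit). box=None changes
    every pixel (a global edit).
    """
    out = []
    for y, row in enumerate(image):
        new_row = []
        for x, value in enumerate(row):
            inside = box is None or (
                box[0] <= y < box[2] and box[1] <= x < box[3]
            )
            # Clamp to the valid 0-255 range when the pixel is edited.
            new_row.append(max(0, min(255, value + delta)) if inside else value)
        out.append(new_row)
    return out


image = [[100, 100, 100], [100, 100, 250]]
global_edit = adjust_brightness(image, 20)                  # every pixel
local_edit = adjust_brightness(image, 20, box=(0, 0, 1, 2))  # top-left 1x2 area
```

The global call brightens all six pixels, while the local call touches only the two pixels inside the box and leaves the rest of the image as it was.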
MGIE is an open-source project on GitHub, where users can find the code, data and pre-trained models. The project also provides a demo notebook showing how to use MGIE to complete various editing tasks.
Implications and Future Prospects
The release of MGIE highlights Apple’s growing prowess in AI research and development. This new tool not only has practical applications for personal and professional image editing purposes, such as social media, e-commerce, education, entertainment, and art, but also represents a significant advance in multimodal AI. The model’s open-source nature and availability on platforms like GitHub and Hugging Face Spaces indicate its potential for further research and development beyond its current state.
In conclusion, Apple’s recent release of the MGIE (MLLM-Guided Image Editing) model marks a significant milestone in the field of artificial intelligence and image editing. Leveraging the power of multi-modal large language models (MLLMs), MGIE enables users to perform sophisticated image editing tasks through natural language instructions. This innovative approach, developed in collaboration with researchers from the University of California, Santa Barbara, demonstrates Apple’s commitment to advancing AI technology and its practical applications.
The integration of MLLMs into the image editing process not only enhances user experience but also opens up new possibilities for creative expression and productivity. MGIE’s ability to understand and execute complex editing commands, from simple colour adjustments to intricate object manipulations, sets a new standard for AI-driven image editing tools. Furthermore, its open-source nature fosters collaboration and innovation within the research community, paving the way for future advancements in multimodal AI and image-processing techniques.
As MGIE continues to evolve and gain traction among developers and users alike, its implications extend beyond personal and professional image editing scenarios. Its availability on platforms like GitHub and Hugging Face Spaces underscores its potential for broader applications across various domains, including social media, e-commerce, education, entertainment, and digital art.
In essence, the release of MGIE underscores Apple’s dedication to pushing the boundaries of AI technology while empowering users with intuitive and powerful tools for creative expression and visual storytelling. As AI-driven innovations continue to shape the digital landscape, MGIE stands as a testament to the transformative potential of collaborative research and interdisciplinary innovation in the pursuit of technological excellence.
Efe Udin is a seasoned tech writer with over seven years of experience. He covers a wide range of topics in the tech industry from industry politics to mobile phone performance. From mobile phones to tablets, Efe has also kept a keen eye on the latest advancements and trends. He provides insightful analysis and reviews to inform and educate readers. Efe is very passionate about tech and covers interesting stories as well as offers solutions where possible.