Apple researchers have released a new model that lets users describe in plain language what they want to change in a photo, without ever touching photo editing software.
The MGIE model, which Apple worked on with the University of California, Santa Barbara, can crop, resize, flip and add filters to images via text prompts.
MGIE, which stands for MLLM-Guided Image Editing, can be applied to both simple and more complex image editing tasks, such as modifying specific objects in a photo to give them a different shape or make them brighter. The model blends two different uses of multimodal language models. First, it learns to interpret user prompts. Then, it “imagines” what the edit would look like (asking for a bluer sky in a photo becomes increasing the brightness of the sky portion of the image, for example).
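The two-step idea can be illustrated with a toy sketch. This is not MGIE's code: the lookup table stands in for the multimodal language model, the "image" is a small grid of grayscale values, and every name here (`ExplicitEdit`, `derive_explicit_edit`, `apply_edit`, the region and gain values) is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ExplicitEdit:
    """An explicit edit derived from a terse prompt (hypothetical structure)."""
    description: str        # the expanded, unambiguous instruction
    rows: tuple             # (top, bottom) row range to edit, bottom exclusive
    brightness_gain: float  # multiplier applied to pixel values in that range


# Assumption: a fixed table stands in for the language model that
# interprets the prompt. The real model learns this mapping.
HYPOTHETICAL_EXPANSIONS = {
    "make the sky bluer": ExplicitEdit(
        description="increase the brightness of the sky region",
        rows=(0, 2),
        brightness_gain=1.5,
    ),
}


def derive_explicit_edit(prompt):
    """Step 1: turn a brief prompt into an explicit edit."""
    key = prompt.strip().lower()
    if key not in HYPOTHETICAL_EXPANSIONS:
        raise ValueError(f"no expansion known for prompt: {prompt!r}")
    return HYPOTHETICAL_EXPANSIONS[key]


def apply_edit(image, edit):
    """Step 2: apply the explicit edit, returning a new image."""
    top, bottom = edit.rows
    edited = []
    for r, row in enumerate(image):
        if top <= r < bottom:
            # Brighten pixels in the target rows, clamped to the 8-bit maximum.
            edited.append([min(255, round(p * edit.brightness_gain)) for p in row])
        else:
            edited.append(list(row))
    return edited


# Top two rows play the role of the sky, the last row the ground.
image = [[100, 120], [110, 200], [40, 50]]
edit = derive_explicit_edit("Make the sky bluer")
edited = apply_edit(image, edit)
print(edit.description)
print(edited)
```

The point of the sketch is only the separation of concerns: an ambiguous request is first expanded into an explicit instruction, and only then turned into a pixel-level change confined to the relevant part of the image.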
When editing a photo with MGIE, users simply type what they want to change about the image. The paper used the example of editing an image of a pepperoni pizza: typing the prompt “make it healthier” adds vegetable toppings. A photo of tigers in the Sahara looks dark, but after asking the model to “add more contrast to simulate more light,” the image appears brighter.
“Instead of brief but ambiguous guidance, MGIE derives explicit visual-aware intention and leads to reasonable image editing. We conduct extensive studies from various editing aspects and demonstrate that our MGIE effectively improves performance while maintaining competitive efficiency. We also believe the MLLM-guided framework can contribute to future vision-and-language research,” the researchers said in the paper.
Apple has made MGIE available for download through GitHub, and it has also released a web demo on Hugging Face Spaces, reports VentureBeat. The company did not say what its plans are for the model beyond research.
Some image generation platforms, like OpenAI's DALL-E 3, can perform simple photo editing tasks, through text inputs, on the images they create. Photoshop maker Adobe, which most people turn to for image editing, also has its own AI editing model. Its Firefly AI model powers generative fill, which adds generated backgrounds to photos.