Sunday, January 26, 2025

Google introduces Whisk, an experiment to remix ideas using images and AI: How it works

Google has launched Whisk, its latest generative AI experiment. The tool aims to change the creative process by letting users prompt with images instead of text.

Unlike traditional image generation tools that rely on detailed text descriptions, Whisk lets users drag and drop one image each for the subject, the scene, and the style, then remix them to create unique visuals, Google said in a blog post.

According to Google, the process is powered by its Gemini model, which automatically writes a detailed caption for each input image. These captions are then fed into Imagen 3, the company’s latest image generation model. Because the images pass through a text caption, Whisk captures the essence of the subject rather than producing an exact replica, which lets users experiment with combinations in novel ways.
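The caption-then-generate flow described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Google’s implementation: the names `RemixInputs`, `build_prompt`, `remix`, `fake_caption`, and `fake_generate` are all hypothetical, the prompt template is an assumption, and the two stand-in functions merely mark where a captioning model such as Gemini and an image model such as Imagen 3 would sit.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RemixInputs:
    """The three images a user supplies (hypothetical container)."""
    subject: bytes
    scene: bytes
    style: bytes


def build_prompt(inputs: RemixInputs, caption: Callable[[bytes], str]) -> str:
    """Caption each image, then merge the captions into one text prompt.

    The template below is an assumption made for illustration only.
    """
    subject = caption(inputs.subject)
    scene = caption(inputs.scene)
    style = caption(inputs.style)
    return f"{subject}, in {scene}, in the style of {style}"


def remix(
    inputs: RemixInputs,
    caption: Callable[[bytes], str],
    generate: Callable[[str], bytes],
    edited_prompt: Optional[str] = None,
) -> Tuple[str, bytes]:
    """Return the prompt that was used and the generated image bytes.

    If the user has edited the prompt, the edited text is used instead of
    the automatically built one.
    """
    prompt = edited_prompt if edited_prompt is not None else build_prompt(inputs, caption)
    return prompt, generate(prompt)


# Stand-ins so the sketch runs without any model access.
def fake_caption(image: bytes) -> str:
    # Pretend the "image" bytes already contain their own description.
    return image.decode("utf-8")


def fake_generate(prompt: str) -> bytes:
    # Pretend to render an image by wrapping the prompt text.
    return ("IMAGE<" + prompt + ">").encode("utf-8")
```

Because only the caption text reaches the image model, details that the caption leaves out cannot appear in the output. The optional `edited_prompt` argument mirrors the prompt editing that Google says Whisk offers.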

Google describes Whisk as a tool for rapid visual exploration, designed for users to quickly create and iterate on a wide range of visual concepts. The platform is not intended as a traditional image editor but as a space for creatives to explore ideas in a flexible, iterative manner, added the California-based company. The result is a mix of new possibilities, from digital plush toys to enamel pins and stickers.

However, Whisk may not reproduce its source images faithfully. Because it extracts only a few key characteristics from the uploaded images, the final results may not always match users’ expectations. For example, the generated subject might differ subtly in attributes such as height, weight, or skin tone. Google acknowledges that these features can be important to users and lets them view and edit the underlying prompts as needed.

The launch of Whisk follows Google’s introduction earlier this year of its video generation model, Veo, and the subsequent release of Veo 2 and the latest iteration of Imagen 3. Both Veo and Imagen 3 have been lauded for achieving state-of-the-art results in their respective fields and are now available in various Google tools, including VideoFX, ImageFX, and Whisk.


