Generative AI’s supercharging goes beyond just text to images

A sequence of events, set in motion in July last year when Stability AI’s Stable Diffusion, Midjourney and OpenAI’s Dall-E text-to-image models were opened up for users to try out, shows no signs of slowing down. A few months later, OpenAI’s release of the large language model (LLM) chatbot ChatGPT triggered the conversational AI battles that have since seen Microsoft, Google and Amazon push for a response.

Canva’s newly updated AI suite includes nifty tools such as Magic Expand.

As we dissect the generative artificial intelligence (AI) space a year later, its theme is unambiguous. Imaging AI tools have become significantly more capable and can do a lot more than generate images. Video is covered too, with tools such as Runway AI and Synthesia needing only simple text inputs for video generation. Chatbots are smarter than ever, as the likes of Google Bard challenge the incumbents.

Alongside, more productivity-focused AI models are leading to a unification of capabilities and more relevant use cases. AI is certainly getting better at what it does.

Case in point, an AI-generated version of actor Tom Hanks was seen in a social media advertisement promoting a dental plan, much to Hanks’ annoyance, as he made clear in an Instagram post this week to his 9.5 million followers. The actor is no stranger to digital versions of himself: he appeared as one in the 2004 movie ‘The Polar Express’, and was artificially de-aged in parts of the 2022 film ‘A Man Called Otto’.

Canva, a popular multi-platform design suite that has completed 10 years, is widening the scope of utility for users with a generative AI update called Magic Studio. New tools include video editing, a text-to-image generator, the ability to transform a presentation into a document or email, translation of a presentation or document into more than 100 languages including Hindi, and a powerful image editor.

“We are seeing with AI today, there are different products for photos, videos, text and all other components. With Canva we’ve been able to integrate that all into one simple platform and make it accessible,” Melanie Perkins, co-founder and CEO of Canva, tells HT.

Globalisation is a developing theme for chatbots and AI tools, in search of a wider user base. Google’s Bard chatbot can understand and reply in 40 languages, with Indian language support including Hindi, Tamil, Telugu, Bengali, Kannada, Malayalam, Marathi, Gujarati and Urdu.

Adobe is using its popular Creative Cloud suite, with apps including Photoshop, Lightroom and Express, to integrate its Firefly generative AI tool. Firefly can work with 100 languages, including Gujarati, Hindi, Malayalam, Marathi, Nepali, Punjabi, Tamil and Telugu.

The need for safety mechanisms with image generators is becoming clear. There are enough examples of realistic-looking but AI-generated images of public figures doing the rounds on social media, including Elon Musk’s robot wife and the Pope in a puffer jacket.

Getty Images’ new Generative AI commercial art tool will compete with Midjourney and Stable Diffusion. What stands out are claims that it is “commercially safer” than rival platforms. By that, the company means the tool won’t allow a user to generate images that may be classified as misinformation.

“We’ve listened to customers about the swift growth of generative AI – and have heard both excitement and hesitation – and tried to be intentional around how we developed our own tool,” said Grant Farhall, Chief Product Officer at Getty Images. The tool is powered by an AI model provided by tech company Nvidia.

It may not be the only tool taking this approach. With the Dall-E 3 update, OpenAI promises more accurate and realistic images, as well as the ability to tweak an image further with a few words. There are important safety mechanisms in place, such as blocking the generation of images resembling public figures or any images that include hateful content.

OpenAI recently updated the GPT-4 LLM to accept images of whatever a user sees and describe them in detail. That’s a functionality third-party app developer Be My Eyes latched on to, developing an AI assistant called Be My AI. The app is gradually rolling out for iPhones and Android phones. Users with vision difficulties can take a photo of something around them, and the assistant will provide a detailed description.

“We are entering the next wave of innovation for accessibility technology powered by AI. We believe this new tool will provide people who are blind or have low vision with a better way to address everyday needs and acquire tools and visual descriptions never before possible,” says Mike Buckley, CEO of Be My Eyes.

Some of the use cases Be My AI is already proficient with include decoding difficult-to-read buttons and keys on gadgets or appliances, reading instruction manuals, describing outfits or patterns including colours, reading restaurant menus and bus numbers, describing images and posts on social media, and detailing outdoor environments.

The need for an AI marketplace

While there are examples of companies developing tools using another’s AI models, such as Microsoft’s use of OpenAI’s GPT language model for the Bing chatbot, users still do not have a wide choice of models within a single tool. Once that changes, utility will increase.

Canva’s Perkins talks about a three-pronged approach, which begins with significant investment in AI research, followed by integration of the best tools available and building an app ecosystem. That makes Canva an early mover in the marketplace format, with generative AI apps including Dall-E, Imagen by Google, MurfAI and Soundraw, now available to users.

This is something Microsoft is attempting to achieve too, with a wide spectrum of apps. Just days after making the Copilot assistant for Windows 11 available to millions of PC users globally, the company has updated the image generator residing within the Bing AI chatbot to use OpenAI’s Dall-E 3 model.

It is an unusual development, since OpenAI’s own ChatGPT chatbot will get the model update only later this month, and only for premium subscribers at first.

With a long-term view on AI, Microsoft hopes that adopting the open plugin standard OpenAI introduced for ChatGPT plugins will make it easier to bolt more capabilities on to Copilot. Developers will utilise one platform to build add-ons that work across consumer and business apps, including ChatGPT, Bing, Dynamics 365 Copilot and Microsoft 365 Copilot.

“We are entering a new era of AI, one that is fundamentally changing how we relate to and benefit from technology. With the convergence of chat interfaces and large language models you can now ask for what you want in natural language and the technology is smart enough to answer, create it or take action,” says Yusuf Mehdi, Corporate Vice President and Consumer Chief Marketing Officer at Microsoft.

Microsoft’s vision of an AI layer is outlined by how the Copilot implementation unfolds. AI in productivity apps including Word, PowerPoint and Excel will add tools for generating drafts, adding context, summarising and replying to emails, and analysing data.

Big numbers accompany a generative AI space that has caught everyone’s attention. Data released earlier this year by Next Move Strategy Consulting suggests that the global AI market will be worth around $207 billion in 2023, and register a nearly ninefold increase to almost $1.8 trillion by 2030.

It should come as no surprise. Millions have since had their first tryst with the broad umbrella called generative AI. Dall-E, for example, opened its first beta version to a million-strong waitlist last year. Within two months of ChatGPT going online for consumers in late November, 100 million had tried it already. That user base has widened significantly since.
