
Pictures with perspective

Bringing local languages to GenAI images

Suraj Amonkar

Chief AI Research & Platforms Officer, Fractal

Ritesh Thakur

Principal Research Architect AI Research & Platform, Kalaido, Fractal

Bringing local languages and contexts into GenAI images

A picture really can speak a thousand words. That’s the secret of a real masterpiece, whether its purpose is art or advertising. By layering elements like subject, background, actions and expressions, the artist creates something that stimulates the senses, emotions and intellect.

Recently, generative AI (GenAI) has provided various tools that can generate images based on a few descriptive words. It provides a paintbrush for unbounded creativity – as long as you know how to make the right brushstrokes.

That can be harder than it seems. Many people find it difficult to come up with the right idea for their image. Even when they do, they may not know how to provide the most effective text prompt for the GenAI.

On top of that, the quality of the images can depend on where in the world you are. Much of the internet developed in Western, English-speaking regions, and text-to-image GenAI models are trained on datasets drawn from the internet, so the results often lean towards those Western contexts. For users in other regions, prompting in a different language or trying to inject some local color often yields poorer pictures.

These are the challenges we set out to address with Kalaido, Fractal’s own text-to-image platform, which is available in beta for anyone to use, free of charge.

A wider GenAI world

India provided a logical starting point for broadening the technology’s linguistic and contextual range. As well as being home to Fractal and many of its employees, this vast country has 22 constitutionally recognized languages and several different writing systems. If Kalaido can generate high-quality images from these inputs, it should be well placed to do the same for other regions that are not English-first.

Right now, Kalaido can generate images with Indian context, from text prompts in 17 Indian languages as well as English. This capability is a huge hit with users of all ages. However, it is just the first step in a longer journey.

Currently, the platform’s results show more familiarity with tourist spots than with lesser-known streets and locations. We have achieved our goal of making the technology more accessible to Indian users. Now we must continue our work to make sure the platform understands Indian languages and contexts as well as possible.

Kick-starting the creative conversation

Another key goal was to make Kalaido accessible to everyone. One way to do that was to make the platform simple and comfortable to use on a mobile phone, which has attracted people from all age groups. An email address is all that is needed to log in. It is also the only piece of data we collect, which keeps data security concerns to a minimum.

Next, we want to help users to blast through any creative blocks. Behavioral science tells us that finding a place to start the creative process is often the hardest part. We developed Kalaido’s ‘Enhance your prompt’ feature to provide that jumping-off point.

Say, for example, you want to create an image to promote a vintage clothing sale in aid of a local charity. You tell Kalaido to create an image of a cat in a hat. The results lack a certain something, but you can’t think what it is. No problem – simply click the option to enhance your prompt and Kalaido will generate details such as the setting, weather conditions, the cat’s mood and appearance, and the type of hat it is wearing. By refining those details – adding some jewelry, perhaps, or changing the suggested background from a green field to a local landmark – you can create an image that perfectly suits your purpose. Kalaido has helped to kick-start the creative conversation so you can continue it.

An ongoing journey

Ultimately, we are developing Kalaido with users, for users – and so far, their response has been overwhelmingly positive. Many are comparing this beta-stage release with other high-profile text-to-image generation tools and finding it just as good and, in some cases, better.

Users are also helping us to refine the platform. This doesn’t just come from the feedback they provide on what works well and what can be improved. We’re seeing interesting trends in the way they use Kalaido and the concepts they are fusing together in their images. Those trends suggest exciting possibilities for future use cases.

As Kalaido’s journey continues, our immediate focus is on perfecting the platform’s Indianization, as well as tackling issues that are common across the GenAI industry.

One ongoing challenge is ensuring that the images can’t be used in inappropriate ways. Kalaido has guardrails to protect against violent or explicit material in pictures and prompts, for example, and it watermarks images of well-known people to identify them as AI-generated. As GenAI develops, so do the potential issues, so we’ll continue to refine these measures and introduce new ones as necessary.

We are also exploring ways to improve the platform’s handling of very complex prompts. Looking further ahead, capabilities like video generation could also become possible. By perfecting our model and growing it organically with our users, we are building a foundation for greater creativity in the future.

Visit https://kalaido.ai/ to generate your own images with Kalaido.
