By Patrick Mullady, Creative Engineer, Experience Design at FCB Health New York and Joseph Graiff, Director, Creative Engineering, Experience Design at FCB Health New York
An artificial intelligence anecdote
We often underestimate artificial intelligence (AI). Here is a Tweet that captures that:
What is Text-to-Image (TTI)?
In short, Text-to-Image (TTI) AI, or text-to-image “prompting,” turns a description written in natural language into an image. You describe the image, and the AI makes it for you. It’s that simple.
One example is DALL-E 2 from OpenAI, an AI platform that can create realistic images from a description in natural language, as shown in the video below.
Current (popular) AI Text-to-Image platforms
There are other TTI platforms as well:
Wombo Dream (desktop/mobile), NightCafe, Lexica (reverse prompt lookup), ArtBreeder Collage, Craiyon, Pinegraph, TikTok AI, and the list goes on. Take a look at this comprehensive list of platforms.
Since early 2021, there have been a lot of developments with deep-learning models that can create images from natural language text. Researchers from Stability AI, OpenAI, Midjourney, Google, Facebook and others are developing TTI platforms. These tools essentially remove the requirement for traditional skills like painting and drawing, and instead rely on imagination, creativity, language and taste to make images.
The idea of "painting with words" represents a unique cultural shift. With TTI AI platforms on the rise, what will the future be like if every person can be an artist? If AI-generated art becomes accessible to everyone, will we eventually reject it and long for the imperfection that only humans are capable of? Will we even be able to tell the difference? Will there be regulations because of deepfakes?
There are still a lot of unanswered questions about these platforms, and these tools could have consequences that have not yet been foreseen or resolved. The video below is a great summary that answers some of these questions.
Training the neural networks
The image datasets are curated from multiple sources. Models like Stable Diffusion “look at” these datasets only during training; they do not store the original images and are not designed to reproduce them. So, when you create an image with your text prompt, the result is inspired by what the model learned, and each generation is typically unique. This article explains how Stable Diffusion is trained.
One could argue that an artist does something similar: you look at a lot of art to be inspired (a kind of self-training), and you base your style on what inspired you.
Number of images used to train each platform
Life Architect provides information on text-to-image training dataset sizes in this chart.
An “AI Renaissance” - augmenting the creative process
Currently, the popular TTI platforms are led by “small-tech” companies. Tech giants such as Google and Meta have announced platforms, but they are not currently open to public use. What will happen if Google, Meta, Microsoft, Adobe and Amazon release TTI platforms? Whether we like it or not, the way we create and experience content is changing. These tools will have an impact on the artist and creator community, and it has already started. Jason Allen's AI-generated art won first place in the Colorado State Fair arts competition.
From stock photography to storyboarding for movies to video post-production, these tools will augment our workflow and enable collaboration at a much faster rate. For example, stock art and TTI platforms will allow users to quickly create custom stock art by using natural language in real-time. Using modifiers such as lens type, camera type, aspect ratio and art style will give users unlimited possibilities.
Below is an example of how to write a prompt for the TTI platforms. This example is from Midjourney and has three parts. The Image Prompt comes first and gives the AI an example image to start with. This is optional but nice if you have existing art that you want the AI to reference. The Text Prompt is all about your creative writing and the description of your imagination. The Parameters can get technical, but this is where you can set aspect ratio, styles, etc. There are tons of different Parameters to customize your creation.
The full prompt would appear as: https://example/tulip.jpg a field of tulips in the style of Mary Blair --no farms --iw .5 --ar 3:2
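To make the three-part structure concrete, here is a minimal Python sketch that assembles a prompt string in the order described above: image prompt, text prompt, then parameters. The build_prompt helper is hypothetical and exists only for illustration; it is not part of Midjourney or any other platform's API. The parameter names (no, iw, ar) are taken from the example prompt itself.

```python
from typing import Optional


def build_prompt(text: str, image_url: Optional[str] = None, **params: object) -> str:
    """Join the prompt parts in order: image prompt, text prompt, parameters.

    Hypothetical helper for illustration only; it just builds a string.
    """
    parts = []
    if image_url:
        parts.append(image_url)         # optional reference image comes first
    parts.append(text.strip())          # the natural-language description
    for name, value in params.items():  # e.g. ar="3:2" becomes "--ar 3:2"
        parts.append(f"--{name} {value}")
    return " ".join(parts)


prompt = build_prompt(
    "a field of tulips in the style of Mary Blair",
    image_url="https://example/tulip.jpg",
    no="farms",   # exclude farms from the image
    iw=".5",      # weight of the image prompt
    ar="3:2",     # aspect ratio
)
print(prompt)
```

Leaving out image_url yields a text-only prompt, which reflects that the image prompt is optional.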
TTI technology is also being built directly into mobile apps, and there are now plugins for Figma and Adobe Photoshop. The image below was created with the Stable Diffusion plugin in Photoshop in 30 seconds! These technologies will fundamentally change the art world and the way we work.
And with the introduction of “in-painting” and “out-painting,” imagine being able to edit an image or video after it has been created. You can zoom, pan and introduce new elements just by using your words.
What are the Terms of Service/copyright?
“Technology moves faster than the law, and artificial intelligence is no exception. Therefore, we currently sit at a crossroads, with many different options available to us. Authorities are unsure on how to move forward with AI copyright, and more cases like Thaler’s will be in front of the U.S. Copyright Board sooner rather than later.”1
Creators are using these platforms despite the lack of clarity surrounding copyright. But whether to use an image generated by these platforms should be evaluated on a case-by-case basis. The decision to use the platform is not a “one size fits all” decision. After an analysis of the proposed usage and risk, someone with authority to decide at the applicable agency should make the decision to move forward.
We have become accustomed to putting our content on social platforms. Here is an excerpt from Facebook's Terms of Service: “To provide our services, though, we need you to give us some legal permissions to use that content. Specifically, when you share, post, or upload content that is covered by intellectual property rights (like photos or videos) on or in connection with our products, you grant us a non-exclusive, transferable, sub-licensable, royalty-free and worldwide license to host, use, distribute, modify, run, copy, publicly perform or display, translate and create derivative works of your content.”
So basically, to use Facebook, you grant it a broad license to use your photos, which could include uses such as creating an image dataset for a text-to-image generation platform.
Should we stop putting our content on social platforms?
There is no doubt that quickly advancing TTI AI technology is paving the way for unprecedented opportunities for instant editing and generated creative output.
As these platforms mature, we will see more use cases in healthcare and mental health. For example, Emad Mostaque, the founder of Stability AI (Stable Diffusion), has aphantasia, a condition in which you cannot create mental images in your mind. He can now use his words to imagine.
Watch this TED Talk, "Aphantasia: Seeing the world without a mind's eye."
The examples below show how you can use TTI to create healthcare-related art with just words.
Lungs made out of flowers
The impact of amplifying the immune system in the universe
There are also many challenges ahead, ranging from questions about ethics and bias to issues around copyright and ownership. The sheer amount of GPU power required to train these massive models may also limit this work to a small number of well-financed, well-resourced companies.
There is also no question that these TTI AI platforms stand on their own as a way for anyone to let their imaginations run wild. The ancient Greek philosopher Heraclitus reportedly said, "The only constant is change."