OpenAI today announced GPT-4, its next-generation AI language model, which can read images and explain what is in them, according to a research blog post.
Axar.az reports, citing foreign media, that ChatGPT has taken the world by storm, but until now the underlying deep-learning language model accepted only text inputs. GPT-4 will accept images as prompts too.
“It generates text outputs given inputs consisting of interspersed text and images,” OpenAI writes today. “Over a range of domains - including documents with text and photographs, diagrams, or screenshots - GPT-4 exhibits similar capabilities as it does on text-only inputs.”
What this means in practice is that the AI chatbot will be able to analyze what is in an image. For example, it can tell the user what is unusual about a photo of a man ironing clothes on a board attached to the back of a moving taxi.
Last week, Microsoft Germany Chief Technical Officer Andreas Braun said that GPT-4 will “offer completely different possibilities - for example, videos.”
However, today’s announcement makes no mention of video in GPT-4; the only multimodal element is image input - less than many had expected.