There is no need to stress how much excitement the arrival of ChatGPT has stirred up around the world. So what awaits us next? Anticipation is building in the community, as the CTO of Microsoft Germany, Andreas Braun, recently announced that GPT-4 will be launched next week. If Google isn’t worried yet, it should start worrying now.

Are you ready for GPT-4?

In a statement a few days ago, he said: “We will introduce GPT-4 next week, there we will have multimodal models that will offer completely different capabilities – for example videos.”

It has not yet been announced where GPT-4 will be available, but Azure-OpenAI was specifically mentioned. GPT-4 is expected to be a significant improvement over GPT-3.5, with a much larger context window, greater reliability, and new features to come.

GPT-4 is multimodal

The big takeaway from the announcement is that GPT-4 is multimodal. GPT-3 and GPT-3.5 worked in only one modality: text. According to a German newspaper report, GPT-4 could function in at least four modalities: image, sound, text and video.

Braun went on to add that GPT-4 is a “game changer” because machines are now learning to understand natural language and, through statistics, to make sense of what was previously readable and comprehensible only to humans. Meanwhile, the technology has come so far that it basically “works in all languages”: you can ask a question in German and get an answer in Italian. With multimodality, Microsoft(-OpenAI) will “make modalities inclusive”.

What can we expect?

Although it is one of the most anticipated pieces of AI news, there is little public information about GPT-4. What will it be like, and what will its characteristics and abilities be? According to Braun, GPT-4 will be multimodal, meaning the tool will not only analyze and produce text, and it will offer new capabilities such as video processing.

The future of deep learning is multimodality. The human brain is multisensory, because we live in a multimodal world. Seeing the world through a single modality greatly limits the ability of artificial intelligence to navigate or understand it. Good multimodal models are also much more difficult to build than those based only on language or only on visual content.

It remains to be seen whether GPT-4 will be multimodal in the truest sense of the word: that you can give it verbal instructions, upload images, or provide any other input, and it will understand it and produce whatever you want in that context. If it really works on that principle, then an even more exciting period awaits us.

