Many artificial intelligence (AI) tools on the market today can take a user's text or images and turn them into images or videos that match the initial prompt. A newly published patent application suggests that audio could soon be another input option for bringing your ideas to life.
As spotted by MSPoweruser, the 20-page United States Patent and Trademark Office (USPTO) document was filed by Microsoft on April 5, 2023 and published on October 10, 2024. It details a new AI-powered system that converts live audio into images.
The system takes a live audio stream, such as a meeting or lecture, and converts it into a live text transcript. The transcript is then summarized by a large language model (LLM), and the summary is fed into a text-to-image model, which generates an image and displays it on screen, as shown in the image below.
The system repeats this process for as long as the audio stream runs, continuously generating new images. According to Microsoft, showing images in real time gives listeners visual aids that can hold their interest, help them understand concepts, and make communication more effective.
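For readers who think in code, the pipeline described above can be sketched roughly as follows. This is only an illustration of the flow from audio to transcript to summary to image, not Microsoft's implementation: the function names, the `Frame` type, and the rolling `window` of recent transcript chunks are all assumptions made for this example, and the three model calls are passed in as placeholders rather than tied to any real service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Frame:
    """One step of output: what was heard, its summary, and the image."""
    transcript: str
    summary: str
    image: str  # stand-in for real image data


def audio_to_images(
    chunks: Iterable[bytes],
    transcribe: Callable[[bytes], str],      # hypothetical speech-to-text step
    summarize: Callable[[str], str],         # hypothetical LLM summary step
    text_to_image: Callable[[str], str],     # hypothetical image model step
    window: int = 3,                         # assumed: summarize the last few chunks
) -> Iterator[Frame]:
    """Yield a new image for each incoming audio chunk."""
    recent: List[str] = []
    for chunk in chunks:
        recent.append(transcribe(chunk))
        recent = recent[-window:]            # keep only the most recent chunks
        transcript = " ".join(recent)
        summary = summarize(transcript)
        yield Frame(transcript, summary, text_to_image(summary))


# Stand-in models so the sketch runs without any external service.
def fake_transcribe(chunk: bytes) -> str:
    return chunk.decode("utf-8")


def fake_summarize(text: str) -> str:
    return " ".join(text.split()[:5])        # "summary" = first five words


def fake_text_to_image(prompt: str) -> str:
    return f"<image of: {prompt}>"


def demo() -> List[Frame]:
    audio = [
        b"the water cycle begins with evaporation",
        b"clouds then form by condensation",
    ]
    return list(
        audio_to_images(audio, fake_transcribe, fake_summarize,
                        fake_text_to_image, window=1)
    )
```

Calling `demo()` returns one `Frame` per audio chunk. Writing the pipeline as a generator mirrors the "continuous" behavior the patent describes: each new chunk of speech produces a fresh image without waiting for the stream to end.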
“Displaying images related to information conveyed orally makes communications more engaging, memorable, and understandable, and improves communication efficiency,” Microsoft says.
If you're wondering whether this feature will be released soon, the answer is probably no. Filing a patent is only an early step on the long road to a product or feature, and many patents remain ideas that never reach commercialization.
However, if Microsoft does decide to introduce this feature, it would likely be built into its video conferencing platform, Microsoft Teams, and possibly be made accessible through its Copilot AI add-ons, such as Copilot Pro and Microsoft 365 Copilot.