Meet AUDIT: an instruction-guided audio editing model based on latent diffusion models

Diffusion models are advancing rapidly. From natural language processing and natural language understanding to computer vision, they have shown promise across a wide range of fields. These models are a recent development in generative AI: a type of deep generative model that can generate realistic samples from complex distributions.

Researchers recently presented a new diffusion model for editing audio clips. Called AUDIT, it is an instruction-guided audio editing model built on latent diffusion. Audio editing means modifying an input audio signal to produce an edited output. This includes tasks such as adding background sound effects, replacing background music, repairing damaged audio, or enhancing low-quality audio. AUDIT conditions on both the input audio and a human-written instruction and generates the edited audio as output.

The researchers trained the audio editing diffusion model in a supervised manner on triplet data. Each triplet consists of an instruction, an input audio clip, and an output audio clip. The input audio is used directly as a conditional input so that segments which do not need editing stay consistent. The editing instruction is used directly as text guidance, which makes the model more flexible and better suited to real-world scenarios.
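
To make the triplet idea concrete, here is a minimal Python sketch of how (instruction, input audio, output audio) examples for the add and drop tasks might be assembled by mixing a sound event into a background clip. The names `EditTriplet`, `mix`, `make_add_triplet`, and `make_drop_triplet`, as well as the instruction wording, are hypothetical and chosen for illustration; this is not the authors' data pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class EditTriplet:
    """One supervised example: instruction, input audio, output audio."""
    instruction: str
    input_audio: List[float]
    output_audio: List[float]


def mix(base: List[float], event: List[float], offset: int) -> List[float]:
    """Overlay `event` onto a copy of `base`, starting at sample `offset`.

    The result has the same length as `base`; event samples that would
    fall past the end are discarded.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    out = list(base)
    for i, sample in enumerate(event):
        j = offset + i
        if j >= len(out):
            break
        out[j] += sample
    return out


def make_add_triplet(background, event, label, offset=0) -> EditTriplet:
    """Input is the background alone; output has the event mixed in."""
    return EditTriplet(
        instruction=f"Add the sound of {label}",  # assumed wording
        input_audio=list(background),
        output_audio=mix(background, event, offset),
    )


def make_drop_triplet(background, event, label, offset=0) -> EditTriplet:
    """The reverse of the add case: input has the event, output does not."""
    return EditTriplet(
        instruction=f"Drop the sound of {label}",  # assumed wording
        input_audio=mix(background, event, offset),
        output_audio=list(background),
    )
```

In this sketch the samples outside the mixed region are identical in the input and the output, which is the property that lets a supervised model learn to leave unedited segments alone.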

The team of researchers behind AUDIT summarized their contributions as follows:

  1. AUDIT is the first diffusion model trained for audio editing that takes human text instructions as a condition.
  2. A data construction framework is designed to train AUDIT in a supervised manner.
  3. AUDIT preserves, as far as possible, the audio segments that do not require editing.
  4. AUDIT works well with simple instructions as text guidance, without needing a detailed description of the editing target.
  5. AUDIT achieves noteworthy results on both objective and subjective metrics across several audio editing tasks.

The team shared examples in which AUDIT edited audio clips precisely. These include adding the sound of a car horn to a clip, replacing the sound of laughter with the sound of a horn, and removing the sound of a woman speaking from a clip of someone whistling. AUDIT performed well on both objective and subjective measures across the following tasks:

  • Adding a sound to an audio clip.
  • Dropping (deleting) a sound from an audio clip.
  • Replacing a sound event in the input audio with another sound.
  • Audio inpainting: completing a masked segment of audio based on context or a text prompt.
  • Super-resolution: converting low-sample-rate input audio into high-sample-rate output audio.
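
For the last two tasks in the list above (inpainting and the low-to-high sample-rate task), a degraded input can be derived from a clean clip, with the clean clip serving as the target. The sketch below illustrates that idea in plain Python. The helper names are hypothetical, and the naive decimation skips the low-pass (anti-aliasing) filter that a real resampler would apply; it is an illustration under those assumptions, not the procedure used for AUDIT.

```python
from typing import List, Tuple


def mask_segment(audio: List[float], start: int, end: int) -> List[float]:
    """Return a copy of `audio` with samples in [start, end) set to zero."""
    if not 0 <= start <= end <= len(audio):
        raise ValueError("mask range must lie inside the clip")
    return audio[:start] + [0.0] * (end - start) + audio[end:]


def decimate(audio: List[float], factor: int) -> List[float]:
    """Keep every `factor`-th sample (naive; no anti-aliasing filter)."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return audio[::factor]


def make_inpainting_pair(
    clean: List[float], start: int, end: int
) -> Tuple[List[float], List[float]]:
    """(masked input, clean target) for an inpainting example."""
    return mask_segment(clean, start, end), list(clean)


def make_super_resolution_pair(
    clean: List[float], factor: int
) -> Tuple[List[float], List[float]]:
    """(low-sample-rate input, clean target) for a super-resolution example."""
    return decimate(clean, factor), list(clean)
```

For example, `make_inpainting_pair(clip, 2, 4)` zeroes two samples of `clip` for the input and keeps the full clip as the target; pairing either result with a short instruction would give a triplet in the same shape as the other tasks.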

In conclusion, AUDIT appears to be a promising approach that could make audio editing more flexible and efficient by following human instructions.


Check out the paper and project. All credit for this research goes to the researchers on this project.


Tania Malhotra is a final-year student at the University of Petroleum and Energy Studies, Dehradun, pursuing a BTech in Computer Science Engineering with a specialization in Artificial Intelligence and Machine Learning.
She is passionate about data science and has strong analytical and critical thinking skills, along with a keen interest in acquiring new skills, leading groups, and managing work in an organized manner.

