Machine Learning in Linux: Bark - Text-Prompted Generative Audio
One of the standout machine learning apps is Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. We’ve explored quite a few hugely impressive web frontends such as Easy Diffusion, InvokeAI, and Stable Diffusion web UI.
Extending this theme but from an audio perspective, step forward Bark, a transformer-based text-to-audio model. The software can generate realistic multilingual speech from text, as well as other audio including music, background noise, and simple sound effects. The model also generates nonverbal communication such as laughing, sighing, crying, and hesitations.
Bark follows a GPT-style architecture. It is not a conventional text-to-speech model, but a fully generative text-to-audio model that can deviate in unexpected ways from any given script.
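To give a feel for what text-prompted audio looks like in practice, here is a minimal sketch in Python. It assumes the `bark` package from the project's repository and SciPy are installed, and it follows the usage pattern shown in the project's README at the time of writing. The helper names `build_prompt` and `synthesize` are our own illustration rather than part of Bark, and the bracketed cue (for example `[laughs]`) is a prompt convention described in that README, not something the model is guaranteed to honour.

```python
# Sketch only: assumes the `bark` package and SciPy are installed.
# build_prompt and synthesize are hypothetical helpers, not part of Bark.

def build_prompt(text: str, cue: str = "") -> str:
    """Append an optional nonverbal cue such as "[laughs]" to the script."""
    text = text.strip()
    return f"{text} {cue}".strip() if cue else text


def synthesize(prompt: str, out_path: str = "bark_generation.wav") -> str:
    """Generate audio for `prompt` and write it to a WAV file."""
    # Imported lazily so the prompt helper works without Bark installed.
    # The first call to preload_models() downloads the model weights.
    from bark import SAMPLE_RATE, generate_audio, preload_models
    from scipy.io.wavfile import write as write_wav

    preload_models()
    audio_array = generate_audio(prompt)  # NumPy array of audio samples
    write_wav(out_path, SAMPLE_RATE, audio_array)
    return out_path
```

Calling `synthesize(build_prompt("Hello, this is Bark.", "[laughs]"))` would then write a WAV file to the current directory. Because the model is fully generative, the result may differ between runs and may not follow the script exactly.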