Yissum - Research Development Company of the Hebrew University

Audio Speech Enhancement by Computer Vision Analytics

Posted by Yissum - Research Development Company of the Hebrew University · Responsive · Innovative Products and Technologies · Israel

Summary of the technology

This was filed due to a meeting that Shmuel had with Orcam
Project ID : 10-2018-4602

Description of the technology


  • A speaker's voice is usually surrounded by unrelated sounds (background noise) that make the speech hard to understand. When several people speak at once, the competing voices are especially hard to separate from the target voice.
  • Previous audio-visual approaches attempted to learn the statistics of the noise from periods when the face was static, and then removed this noise from the incoming sound.
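
The earlier approach described in the last bullet can be illustrated with a minimal sketch. It assumes a simple spectral subtraction scheme: the listing does not name the exact technique, so the averaging and clamping steps, the function names, and the `face_static` flags (taken here as given per frame) are all illustrative assumptions.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins, magnitude values


def estimate_noise_profile(spec: Spectrogram, face_static: List[bool]) -> List[float]:
    """Average the magnitude per frequency bin over frames where the face is static."""
    noise_frames = [frame for frame, static in zip(spec, face_static) if static]
    if not noise_frames:
        raise ValueError("need at least one static-face frame to estimate noise")
    n_bins = len(spec[0])
    return [sum(f[k] for f in noise_frames) / len(noise_frames) for k in range(n_bins)]


def subtract_noise(spec: Spectrogram, noise: List[float]) -> Spectrogram:
    """Subtract the noise profile from every frame, clamping magnitudes at zero."""
    return [[max(m - n, 0.0) for m, n in zip(frame, noise)] for frame in spec]


# Toy data (hypothetical): two noise-only frames, then one frame with speech.
noisy = [
    [1.0, 2.0, 1.0],  # face static: noise only
    [1.0, 2.0, 1.0],  # face static: noise only
    [5.0, 2.5, 1.0],  # face moving: speech plus noise
]
static_flags = [True, True, False]

profile = estimate_noise_profile(noisy, static_flags)
cleaned = subtract_noise(noisy, profile)
print(cleaned[2])  # [4.0, 0.5, 0.0]
```

A scheme of this kind only works when the noise is roughly stationary, which is one reason it struggles when the "noise" is another person talking.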

Our Innovation

A novel audio-visual approach that enhances the speaker's voice based on its correlation with the speaker's mouth and face movements.

Figure 1: Illustration of our encoder-decoder model architecture. A sequence of 5 video frames centered on the mouth region is fed into a convolutional neural network creating a video encoding. The corresponding spectrogram of the noisy speech is encoded in a similar fashion into an audio encoding. A single shared embedding is obtained by concatenating the video and audio encodings, and is fed into 3 consecutive fully-connected layers. Finally, a spectrogram of the enhanced speech is decoded using an audio decoder.
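
The data flow in the caption can be traced with a small, untrained sketch. Only the "5 frames", the concatenation, and the "3 fully-connected layers" come from the caption. Everything else is an assumption made for illustration: the toy sizes, the random weights, and the use of plain dense layers as stand-ins for the convolutional video encoder, the audio encoder, and the audio decoder.

```python
import random
from typing import List

Vector = List[float]


def make_layer(n_in: int, n_out: int, rng: random.Random) -> List[Vector]:
    """Random weight matrix (n_out rows of n_in weights); an untrained placeholder."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]


def dense_relu(weights: List[Vector], x: Vector) -> Vector:
    """Fully-connected layer followed by ReLU."""
    return [max(0.0, sum(w * v for w, v in zip(row, x))) for row in weights]


rng = random.Random(0)

# Hypothetical toy sizes, chosen only to keep the example small.
N_FRAMES, MOUTH_PIXELS = 5, 16   # 5 mouth crops, each flattened to 16 values
SPEC_BINS = 20                   # flattened slice of the noisy spectrogram
VIDEO_DIM, AUDIO_DIM = 8, 8      # sizes of the two encodings

video_encoder = make_layer(N_FRAMES * MOUTH_PIXELS, VIDEO_DIM, rng)  # stand-in for the CNN
audio_encoder = make_layer(SPEC_BINS, AUDIO_DIM, rng)                # stand-in for the audio encoder
shared_dim = VIDEO_DIM + AUDIO_DIM
fc_layers = [make_layer(shared_dim, shared_dim, rng) for _ in range(3)]
audio_decoder = make_layer(shared_dim, SPEC_BINS, rng)               # stand-in for the decoder


def enhance(frames: List[Vector], noisy_spec: Vector) -> Vector:
    video_in = [p for frame in frames for p in frame]  # stack the 5 frames
    video_enc = dense_relu(video_encoder, video_in)
    audio_enc = dense_relu(audio_encoder, noisy_spec)
    shared = video_enc + audio_enc                     # concatenate both encodings
    for layer in fc_layers:                            # 3 consecutive FC layers
        shared = dense_relu(layer, shared)
    return dense_relu(audio_decoder, shared)           # decoded (non-negative) spectrogram


frames = [[rng.random() for _ in range(MOUTH_PIXELS)] for _ in range(N_FRAMES)]
noisy_spec = [rng.random() for _ in range(SPEC_BINS)]
enhanced = enhance(frames, noisy_spec)
print(len(enhanced))  # 20, the same number of bins as the noisy input
```

The point of the sketch is the shape of the pipeline: two modality-specific encoders, one shared embedding, and a decoder whose output has the same layout as the noisy spectrogram it replaces.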


  • Enhances the audio signal of a talking person by removing the background noise
  • Separates the voices of two or more speakers based on a synchronized video recording of their faces while talking.
  • Could be combined with existing audio-visual approaches


  • Predicts the person's speech from lip and face movements, then uses the prediction to filter the incoming sound and remove unrelated frequency components.
  • Enhances the speaker's voice by analyzing the facial movements observed in the video and producing a voice signal that, during speaking periods, corresponds better to the speaker's articulatory motion as seen in the video.
  • Several sound encoding methods are being used as well to optimize performance.
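
The filtering idea in the first bullet can be sketched as a time-frequency mask. This is a hedged illustration rather than the project's method: it assumes the video-based prediction is already available as a magnitude spectrogram, and the binary mask, the `threshold` value, and the function names are hypothetical choices.

```python
from typing import List

Spectrogram = List[List[float]]  # frames x frequency bins, magnitude values


def speech_mask(predicted: Spectrogram, threshold: float) -> List[List[int]]:
    """1 where the video-predicted speech has energy at or above the threshold, else 0."""
    return [[1 if m >= threshold else 0 for m in frame] for frame in predicted]


def apply_mask(noisy: Spectrogram, mask: List[List[int]]) -> Spectrogram:
    """Keep noisy-spectrogram bins selected by the mask and zero out the rest."""
    return [[m * k for m, k in zip(frame, row)] for frame, row in zip(noisy, mask)]


# Toy data (hypothetical): speech predicted from the video vs. the recorded mixture.
predicted = [[0.9, 0.0, 0.1], [0.0, 0.8, 0.0]]
noisy = [[1.2, 0.7, 0.3], [0.4, 1.0, 0.6]]

mask = speech_mask(predicted, threshold=0.5)
filtered = apply_mask(noisy, mask)
print(filtered)  # [[1.2, 0.0, 0.0], [0.0, 1.0, 0.0]]
```

Bins where the predicted speech carries no energy are treated as unrelated and removed, while the remaining bins keep the values from the original recording.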


  • Unique application for smartphones and video cameras
  • Video conference improvement
  • Video chats improvement
  • Hearing aid improvement

Related publications

https://arxiv.org/pdf/1711.08789.pdf

Project manager

Anna Pellivert

Project researchers

Shmuel Peleg
HUJI, School of Computer Science and Engineering
Computer Science

Related keywords

  • Information Processing, Information System, Workflow Management
  • IT and Telematics Applications
  • Multimedia
  • Computers
  • Computer Graphics Related
  • Specialised Turnkey Systems
  • Scanning Related
  • Peripherals
  • Computer Services
  • Computer Software Market
  • Other Computer Related
  • Computer Science & Engineering
  • Image Enhancement

About Yissum - Research Development Company of the Hebrew University

Technology Transfer Office from Israel

Yissum Research Development Company of the Hebrew University of Jerusalem Ltd. was founded in 1964 to protect and commercialize the Hebrew University’s intellectual property. Ranked among the top technology transfer companies, Yissum has registered over 8,900 patents covering 2,500 inventions, has licensed out 800 technologies, and has spun off 90 companies. Products that are based on Hebrew University technologies and were commercialized by Yissum today generate over $2 billion in annual sales.

