With COVID-19 taking over our world, most of us have been confined to our homes. And amid all this, video conferencing apps have become an integral part of our lives -- whether it's catching up with friends, studying from home, or even attending remote conferences.
But one thing we know about video conferencing apps is that they consume crazy amounts of bandwidth. And for people on metered internet connections, that means their data caps are bound to run out fast.
To circumvent this, Nvidia Research is enlisting AI to cut bandwidth consumption to a minimum by replacing the conventional H.264 video codec with a neural network. The idea is remarkably simple: instead of sending the entire hefty video feed, the sender extracts the facial data from each frame and converts it into reference points for the eyes, nose, mouth, and face outline, then transmits only those points, which are far smaller in size.
The Generative Adversarial Network (GAN) on the receiver's side decodes these reference points and reconstructs an image that looks just like you. The end result is a sharp, crystal-clear picture -- not just when the internet speed is great, but also when the connection is barely hanging in there, without compromising on image quality.
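To get a feel for why sending reference points is so much cheaper than sending pixels, here is a minimal toy sketch in Python. The frame size, landmark count, and binary layout are all illustrative assumptions (not Nvidia's actual format); the real system's keypoint extraction and GAN reconstruction are, of course, neural networks.

```python
# Toy illustration of keypoint-based transmission vs. sending raw frames.
# All sizes and the 68-landmark count are hypothetical, for scale only.
import struct

FRAME_BYTES = 1280 * 720 * 3  # one uncompressed 720p RGB frame

def encode_keypoints(points):
    """Pack (x, y) facial keypoints into a compact binary payload."""
    payload = struct.pack("<H", len(points))  # 2-byte point count
    for x, y in points:
        payload += struct.pack("<ff", x, y)   # two 32-bit floats per point
    return payload

def decode_keypoints(payload):
    """Unpack the payload back into (x, y) pairs on the receiver's side,
    where (in the real system) a GAN would reconstruct the face from them."""
    (n,) = struct.unpack_from("<H", payload, 0)
    return [struct.unpack_from("<ff", payload, 2 + 8 * i) for i in range(n)]

# e.g. ~68 landmarks covering eyes, nose, mouth, and face outline
landmarks = [(float(i), float(i) * 0.5) for i in range(68)]
payload = encode_keypoints(landmarks)
print(len(payload), "bytes of keypoints vs", FRAME_BYTES, "bytes of raw pixels")
```

Even this naive encoding is a few hundred bytes per frame, versus megabytes of raw pixels, which is why the receiver-side reconstruction is where all the heavy lifting happens.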
Nvidia's demo reveals that with this novel AI network, bandwidth consumption drops from a hefty 97.28 kilobytes per frame to just 0.1165 kilobytes per frame. Better yet, the application works even when the caller is wearing glasses, a hat, or a face mask. How cool is that?
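Putting those two per-frame figures side by side makes the scale of the savings concrete:

```python
# Per-frame bandwidth figures quoted in Nvidia's demo
h264_kb_per_frame = 97.28    # conventional H.264 codec
ai_kb_per_frame = 0.1165     # keypoint-based AI codec

ratio = h264_kb_per_frame / ai_kb_per_frame
print(f"~{ratio:.0f}x less data per frame")  # roughly 835x
```

In other words, the neural approach sends on the order of a thousandth of the data per frame.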
On top of that, these same reference points enable a feature called 'Free View', which detects the angle of your face. If it senses that your face isn't aligned with the camera, it uses the reference points to render an image as if you were looking straight at the camera, maintaining eye contact.
You can see how the feature works in real-time in the video below: