The VLC and VLR Codecs and Neural Networks

Abstract

The VLR codec has been developed not only to compress audio data very strongly, but also to minimize the transmission frequency of the audio frames.
It can be used to reduce latency in bidirectional voice communications.
Finally, it can be considered for use with neural networks, especially with deep reinforcement learning. Alert systems can be set up to monitor vital signs, and data from the Internet of Things (IoT) can be used.
Storage requirements are reduced, calculations are simplified, and predictions and decision support are facilitated.

The VLR Codec

The VLR codec was developed not only to compress audio data very strongly, but also to minimize the transmission frequency of the audio frames:
- Consecutive similar audio frames are not transmitted; the receiver is authorized to repeat them until a credit of repetitions is exhausted.
- A backward search can be performed to find a similar earlier frame. In this case, it is enough to send a header containing a number pointing to that similar frame.
- The header of each frame (complete or incomplete) indicates the number of frames that were not transmitted.
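
As an illustration, here is a minimal sender-side sketch of these three mechanisms in Python. The similarity test, the credit value, the search depth and the message layout are all hypothetical choices of ours, not the actual VLR format:

    import numpy as np

    SIMILARITY_THRESHOLD = 0.05   # hypothetical tolerance for "similar" frames
    REPEAT_CREDIT = 4             # hypothetical credit of repetitions
    BACK_WINDOW = 32              # hypothetical depth of the backward search

    def similar(a, b):
        # Hypothetical similarity test: small relative spectral distance.
        return np.linalg.norm(a - b) <= SIMILARITY_THRESHOLD * np.linalg.norm(b)

    def encode_frame(frame, history, credit):
        """Return (message, new_credit). history holds the frames already decoded."""
        # Algorithm 1: a frame similar to the previous one is not transmitted;
        # the receiver repeats it until the credit of repetitions is exhausted.
        if history and credit > 0 and similar(frame, history[-1]):
            return None, credit - 1                      # nothing is sent
        # Algorithm 2: backward search; only a header pointing k frames back is sent.
        for k in range(2, min(BACK_WINDOW, len(history)) + 1):
            if similar(frame, history[-k]):
                return {"repeat_at": k}, REPEAT_CREDIT   # header only
        # Otherwise a complete frame is sent; its header can also carry
        # the number of frames that were not transmitted (algorithm 3).
        return {"frame": frame}, REPEAT_CREDIT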

For more information, see the following pages:
Algorithms
VLB

An implementation exists in the vlrPhone program, using only the first algorithm (non-transmission of consecutive similar audio frames).
In the current version of the program, priority has been given to the compression ratio at the expense of quality and latency.
By changing the parameters (precision of the magnitudes and phases, more points in the foreground, ...), the quality can be increased and the latency decreased for certain applications (such as games) without decreasing the mean compression ratio.
By decreasing the size of the FFT buffers, we reduce the latencies, but we also decrease the instantaneous compression ratio.
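
As a quick illustration of this last trade-off, the buffering latency contributed by one frame is simply the FFT size divided by the sampling rate (the figures below are illustrative, not measurements of vlrPhone):

    SAMPLE_RATE = 8000            # Hz, illustrative narrowband rate

    for fft_size in (256, 512, 1024):
        latency_ms = 1000.0 * fft_size / SAMPLE_RATE
        print(f"FFT size {fft_size:4d}: {latency_ms:6.1f} ms of buffering latency")

Halving the FFT size halves this component of the latency, but shorter frames repeat less often, which lowers the instantaneous compression ratio.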

For more information, see the following page:
Listening Page

The purpose of this document is to point out (without demonstration or mathematical formulas for now) that the algorithms of this codec can be used satisfactorily in artificial intelligence, especially in deep learning with neural networks, for all one-dimensional signals (connected objects, industrial maintenance, vital signs, financial information, ...).
We will focus in particular on prediction problems rather than on problems of choosing actions according to a state, the actions being limited to predicting the type of the incoming frame.




General Information on Neural Networks

An artificial neural network is a set of algorithms whose design was originally, very schematically, inspired by the functioning of biological neurons, and which subsequently moved closer to statistical methods.

For more information:
Artificial Neural Networks

A recurrent neural network is an artificial neural network with recurrent connections.
Recurrent neural networks are suitable for input data of variable size and are particularly well suited to time-series analysis.
They are used in automatic speech recognition, in handwriting recognition (more generally, in pattern recognition) and in machine translation.

For more information:
Recurrent Neural Network

Recurrent neural networks (RNNs) can be used with the VLC codecs in general by working on the spectral envelope: we generate cubic splines connecting the points or the local peaks, then choose regularly spaced points on these splines, as if we had time series.
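
A possible realization of this idea, assuming NumPy/SciPy and an FFT magnitude spectrum as input (the peak selection and the number of points are illustrative choices of ours):

    import numpy as np
    from scipy.signal import find_peaks
    from scipy.interpolate import CubicSpline

    def envelope_series(magnitudes, num_points=32):
        """Resample the spectral envelope at regularly spaced points.

        magnitudes: one-sided FFT magnitude spectrum of a frame.
        Returns num_points samples of a cubic spline drawn through the
        local peaks, so that the envelope can be fed to an RNN like a
        time series."""
        peaks, _ = find_peaks(magnitudes)
        if len(peaks) < 2:                      # degenerate frame: no usable peaks
            return np.full(num_points, magnitudes.mean())
        # Anchor the spline at both ends of the spectrum as well.
        xs = np.concatenate(([0], peaks, [len(magnitudes) - 1]))
        ys = magnitudes[xs]
        spline = CubicSpline(xs, ys)
        grid = np.linspace(0, len(magnitudes) - 1, num_points)
        return spline(grid)

Each frame then yields a fixed-length vector, and the sequence of these vectors can be fed to the RNN like any other time series.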

Reinforcement learning refers to a class of machine learning problems whose purpose is to learn, from experiments, what to do in different situations in order to optimize a quantitative reward over time.

Formally, the basis of the reinforcement learning model is:
- A set of states S of the agent in the environment.
- A set of actions A that the agent can perform.
- A set of scalar values R ("rewards") that the agent can obtain.
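
For the frame-prediction problem described later in this document, these three sets could be instantiated as follows (a sketch; the names and the numeric values are ours):

    BACK_WINDOW = 32                      # hypothetical backward-search depth

    # States: the quantized energy of the current frame (low / medium / high).
    STATES = ("low", "medium", "high")

    # Actions: predict the type of the incoming frame.
    # 0 = a new frame, 1 = identical to the previous frame,
    # k >= 2 = identical to the frame located k positions back.
    ACTIONS = [0, 1] + list(range(2, BACK_WINDOW + 1))

    # Rewards: R for a correct prediction (it may vary with k), a penalty otherwise.
    REWARD_CORRECT = 1.0
    PENALTY_WRONG = -1.0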

For more information:
Reinforcement Learning

Temporal difference learning (TD learning) is a machine learning method based on prediction.
It has been used primarily for reinforcement learning problems, and is often described as a combination of Monte Carlo ideas and dynamic programming ideas.
It is well suited to solving the infinite-horizon discounted prediction problem.
It is closely linked to the theory of Markov decision processes.
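
At the core of the method is the TD(0) update, which moves the estimated value of a state toward the observed reward plus the discounted value of the next state. A standard sketch (not specific to the codec):

    def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.9):
        """One temporal-difference step: V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).

        V is a dict mapping states to estimated values, alpha the learning
        rate, gamma the discount factor of the infinite-horizon
        discounted prediction problem."""
        td_error = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)
        V[s] = V.get(s, 0.0) + alpha * td_error
        return td_error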

For more information:
Temporal Difference Learning

More recently, very convincing results have been obtained with the AlphaGo Zero project compared to the initial AlphaGo project, using as input only the black and white stone positions instead of handcrafted feature vectors.

For more information:
AlphaGo Zero




The VLR Codec Algorithms and Neural Networks

The unit of time is a complete FFT (Fast Fourier Transform) frame. The goal is to make short-term, medium-term and long-term predictions, with automatic readjustments, as the frames are received.

At any time t, before receiving a frame, the system must say whether the incoming frame is a new frame (value 0), is identical to the previous one (value 1), or is identical to a frame located k positions back (value k). If the prediction is correct, the system receives a reward R; otherwise it receives a penalty P. The reward R may vary depending on k.

The state is represented by the energy of the frame, which can be transmitted via the header of the frame or calculated from a complete received frame.
The energy can be expressed in dB or in levels (for example low, medium or high). With this system, the predictions are made using the temporal difference method.
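
A possible computation of this state from a complete received frame, using Parseval's theorem to obtain the energy directly from the spectrum (the dB reference and the level thresholds are arbitrary choices of ours):

    import numpy as np

    def frame_state(frame, thresholds_db=(-40.0, -20.0)):
        """Quantize the energy of an FFT frame into 'low', 'medium' or 'high'.

        frame: complex FFT coefficients of one complete frame."""
        energy = np.sum(np.abs(frame) ** 2) / len(frame)
        energy_db = 10.0 * np.log10(energy + 1e-12)      # avoid log(0)
        if energy_db < thresholds_db[0]:
            return "low"
        if energy_db < thresholds_db[1]:
            return "medium"
        return "high"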

For the predictions:
- If the current frame is like the previous frame, the system repeats the previous frame.
- If the current frame is located at position k, the system repeats the frame located at position k.
- If a new frame is received, the system inserts a null frame or, better, repeats the last frame within the limit of the credit of repetitions.

On the receipt (or non-receipt) of a frame, the system is able to update its states, its rewards or penalties, as well as its data.

For the data adjustments (a combined sketch follows this list):
- If the current frame is like the previous frame, the system repeats the previous frame.
- If the current frame is located at position k, the system repeats the frame located at position k.
- If a new frame is received, the system inserts the received frame.
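
Putting the two lists together, one receive step of such a system could look as follows. This is a much simplified sketch with a tabular, epsilon-greedy value table and the frame_state function from the previous sketch; the document does not prescribe any of these choices:

    import random

    ACTIONS = [0, 1] + list(range(2, 33))    # 0 = new, 1 = previous, k = back-reference

    def choose_prediction(V, state, epsilon=0.1):
        # Epsilon-greedy prediction of the incoming frame type.
        if random.random() < epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: V.get((state, a), 0.0))

    def on_frame_event(V, history, state, event, payload=None,
                       alpha=0.1, gamma=0.9, reward=1.0, penalty=-1.0):
        """Process one (non-)reception and return the next state.

        event: 0 = a new frame arrived (payload holds it),
               1 = repeat the previous frame,
               k >= 2 = repeat the frame located k positions back."""
        prediction = choose_prediction(V, state)
        r = reward if prediction == event else penalty   # R may also vary with k
        # Data adjustments, as in the list above.
        if event == 0:
            history.append(payload)                      # insert the received frame
        elif event == 1:
            history.append(history[-1])                  # repeat the previous frame
        else:
            history.append(history[-event])              # repeat the frame located k back
        next_state = frame_state(history[-1])            # energy state, see the sketch above
        # Simplified temporal-difference update of the (state, prediction) value.
        key = (state, prediction)
        best_next = max(V.get((next_state, a), 0.0) for a in ACTIONS)
        V[key] = V.get(key, 0.0) + alpha * (r + gamma * best_next - V.get(key, 0.0))
        return next_state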


The Internet of Things

With the Internet of Things, there are sensors and actuators.
A robot has several sensors, for example: odometers, cameras, proximity sensors, pressure sensors, laser scanners, etc.
Similarly, a house also has sensors such as thermometers, electricity meters, security cameras, smoke detectors, microphones, etc.
In a house, the actuators can be: light switches, blinds, camera movements, fire alarms, locks, etc.

Possible applications include energy consumption, personalized information services, adaptive control applications for the user, etc.

To take these complex cases into account, it is necessary to switch to heterogeneous multichannel processing. Each channel contains the data of one sensor or one actuator.
The rewards and penalties may relate to a single target (for example the temperature or the power consumption).
It is necessary to work with states taking into account the two or more channels that are important for the problem to be treated (for example a target and one or more actuators).
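
A minimal way to build such a state, reusing the frame_state function above (the channel names and the structure are hypothetical):

    def multichannel_state(channels, target, actuators):
        """Composite state over a heterogeneous set of channels.

        channels: dict mapping a channel name to its latest frame;
        target: the name of the target channel (e.g. the temperature);
        actuators: the actuator channels considered relevant."""
        parts = [frame_state(channels[target])]
        parts += [frame_state(channels[name]) for name in actuators]
        return tuple(parts)

    # Example: a temperature target driven by a heater actuator.
    # state = multichannel_state(frames, "temperature", ["heater"])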


Notes

- A frame overlap of 50% or less is required to avoid edge effects.

- In the frequency domain, quasi-stationary signals are characterized by a high rate of successive repetitions and a good rate of non-successive repetitions. The other signals are characterized by a low rate of successive repetitions and a medium or low rate of non-successive repetitions.

- One can start by training the system before going live, so that there are as few new frames as possible.

- This system can be used as an anomaly detector if it behaves well in the normal cases. In that case, the system is trained beforehand and no data adjustments are made.

- The state is represented by the energy of the frame (in dB or in levels). We can modify this energy to have a different state for each number of past repetitions, up to a limit.

- We can also work with states taking into account the energy of the first derivative of the frame and/or of the second derivative of the frame. In the frequency domain, the derivation is easy (see the sketch at the end of these notes).

- The actuators may be arbitrary signals making it possible to simulate, in particular, discrete or continuous actions.

- By using the VLC and VLR codecs (including the versions with Codebook), we can not only characterize the emotion at a given moment, but also predict its evolution as a function of the various interactions applied. These features are very useful, especially for making chatbots sensitive to emotion.

- The use of the VLC and VLR codecs to characterize and predict the evolution of emotion will be the subject of a separate, complete article.

- The examination of the patent applications concerning the VLR algorithms and the versions with Codebook has been completed at INPI (France). Both patents should be issued by February or March 2018 at the latest.
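
As announced in the note on derivatives above, differentiating a frame in the frequency domain amounts to multiplying each FFT coefficient by i*2*pi*f (applying it twice gives the second derivative). A sketch:

    import numpy as np

    def spectrum_of_derivative(spectrum, sample_rate):
        """Spectrum of x'(t) from the full (two-sided) FFT of x(t).

        Multiplying bin n by i*2*pi*f_n differentiates the frame; the
        energy of the derivative can then be read off this spectrum."""
        n = len(spectrum)
        freqs = np.fft.fftfreq(n, d=1.0 / sample_rate)   # frequency of each bin, in Hz
        return spectrum * (2j * np.pi * freqs)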