How is the audio signal processed in a digital video broadcasting system, and how is it recovered at the receiving end in sync with the video signal?

Answer:
Audio in a broadcast production environment is handled in one of two ways. Once captured, the audio signal is normally converted to a digital signal. It can then be stored and passed through each stage of the signal chain as a separate signal, or it can be embedded into the video signal. There are advantages to either method.
Separate audio signals can be processed independently of the video signal, so level changes, surround processing, and equalization can all be carried out with ease. Separate signals, of course, are not tied to the video signal, so one possible consequence is that the audio timing shifts relative to the video. To prevent this, the signal chain needs video and audio delay systems that can correct for processing delays. In some cases, the timing is handled by automated systems that measure or predict processing delays. In other cases, the delay is set manually to retain synchronization through a specific process.
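
As a rough illustration of the delay systems mentioned above, here is a minimal sketch of a fixed audio delay line in Python. The class name AudioDelay and its process method are made up for this example; a real broadcast delay unit works on multi-channel sample blocks in hardware or firmware.

```python
from collections import deque


class AudioDelay:
    """Fixed delay line: each sample comes out `delay_samples` pushes later."""

    def __init__(self, delay_samples):
        if delay_samples < 0:
            raise ValueError("delay must be non-negative")
        # Pre-fill with silence so the first outputs are zeros.
        self._buffer = deque([0] * delay_samples)

    def process(self, sample):
        """Push one input sample and return the delayed output sample."""
        self._buffer.append(sample)
        return self._buffer.popleft()


delay = AudioDelay(3)
print([delay.process(s) for s in [1, 2, 3, 4, 5]])  # [0, 0, 0, 1, 2]
```

The audio comes out unchanged but three samples late, which is how the audio can be held back to line up with video that has been delayed by processing.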

Because video signals can pass through numerous processes, the timing can become complex. Each video process, such as a level adjustment or the insertion of closed caption data, adds one or two frames of delay to the video signal. It can therefore be beneficial to combine the two signals as early as possible.
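
To make the arithmetic concrete, the sketch below adds up per-stage frame delays and converts the total into an equivalent audio delay. The stage list, the 25 frames per second rate, and the 48 kHz sample rate are assumptions chosen for the example, and audio_delay_for_video is a hypothetical helper name.

```python
from fractions import Fraction

SAMPLE_RATE_HZ = 48_000  # assumed audio sample rate for this example


def audio_delay_for_video(frame_delays, frame_rate):
    """Return (milliseconds, samples) of audio delay matching the video chain."""
    total_frames = sum(frame_delays)
    # Exact rational arithmetic avoids rounding drift at rates like 30000/1001.
    seconds = Fraction(total_frames) / Fraction(frame_rate)
    samples = round(seconds * SAMPLE_RATE_HZ)
    return float(seconds * 1000), samples


# Hypothetical chain: level adjustment (1 frame), caption insertion (2), one more process (1).
ms, samples = audio_delay_for_video([1, 2, 1], 25)
print(ms, samples)  # 160.0 7680
```

Four frames of video delay at 25 frames per second is already 160 ms, which is easily noticeable as a lip-sync error if the audio is not delayed to match.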

A device called an embedder combines the separate audio signal with the video signal. The audio is fitted into the gaps in the video signal. In the days of analog video, the image was delivered in bursts of one line, each followed by a pause to allow the screen trace to return to the left-hand side of the screen. Similarly, at the bottom of the screen there was a longer pause to give the trace time to return to the top. Digital video signals still use the same timing structure, even though it matters less than it did in the analog days. The gaps in the digital signal are filled with small chunks of audio. There are sufficient gaps to allow up to 8 audio channels to be squeezed into an HD signal, as well as other data such as closed caption information.
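
To give a feel for how small those chunks are, the sketch below works out how many audio samples per channel fall within each video line. It assumes 48 kHz audio and an HD format with 1125 total lines per frame at 25 frames per second; samples_per_line is a hypothetical helper. It only does the counting and ignores the packet structure and line restrictions that real embedding standards define.

```python
from fractions import Fraction


def samples_per_line(sample_rate, frame_rate, lines_per_frame, num_lines):
    """Return the number of audio samples (per channel) carried by each line."""
    per_line = Fraction(sample_rate) / (Fraction(frame_rate) * lines_per_frame)
    counts = []
    sent = 0
    for line in range(1, num_lines + 1):
        due = int(per_line * line)  # whole samples available by the end of this line
        counts.append(due - sent)   # send whatever has not been sent yet
        sent = due
    return counts


print(samples_per_line(48_000, 25, 1125, 8))  # [1, 2, 2, 1, 2, 2, 1, 2]
```

Under these assumptions each line's gap only has to carry one or two samples per channel, and the counts add up to 1920 samples per channel over a full frame.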

Once the audio is part of the video signal, there is no need to worry about timing, nor to store and retrieve audio data in separate files. Because there are as many as 8 audio channels, there is also scope for separate soundtracks, such as alternative languages, to be stored. When the program is broadcast, the correct soundtrack needs to be selected. The final step before broadcast is to re-compress all the data into a data stream ready for transmission to the viewer.
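
Selecting a soundtrack amounts to picking the right channels out of the embedded set. The sketch below assumes a made-up channel plan (TRACK_LAYOUT) in which each language occupies one stereo pair; actual channel assignments vary from facility to facility, and select_soundtrack is a hypothetical helper.

```python
# Hypothetical channel plan: which embedded channel pair carries each language.
TRACK_LAYOUT = {"english": (0, 1), "spanish": (2, 3)}


def select_soundtrack(frames, language):
    """Pick the stereo pair for `language` from frames of 8 channel samples."""
    left, right = TRACK_LAYOUT[language]
    return [(frame[left], frame[right]) for frame in frames]


# Two sample periods of 8-channel audio; the values are placeholders.
embedded = [
    [10, 11, 20, 21, 0, 0, 0, 0],
    [12, 13, 22, 23, 0, 0, 0, 0],
]
print(select_soundtrack(embedded, "spanish"))  # [(20, 21), (22, 23)]
```

Because every soundtrack travels in the same embedded signal, whichever pair is selected is already aligned with the picture.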
First answer by GreenlightAV. Last edit by GreenlightAV.