VideoBench — How to measure your video quality easily
Introduction
Whether you’re a professional in the video industry or an amateur, when manipulating video files you’re often faced with encoding tuning choices that are rarely easy to make.
There are so many possible configurations that it’s often difficult to understand them all, especially when codec development is not your job.
Determining the encoding parameters usually consists of finding the best balance between 3 settings:
- the encoding time,
- the bitrate of the encoded file,
- the encoding quality.
That’s why I’ve developed a tool called “Video Bench” which helped me tremendously in my work. Its goal is to help you choose the best encoding settings and the best balance for your needs.
https://github.com/JNoDuq/videobench
Why measure video quality?
While encoding time and video bitrate are easy to measure, quality remains difficult to quantify. As soon as you start playing with encoding settings, questions arise.
If I use encoding settings or codecs that take twice as long, will the quality of my encoding really be twice as good? If I increase my bitrate, does the quality increase proportionally? Etc., etc.
You’ll find tons of articles containing quality measurements for different codecs and settings across different types of content. They provide interesting insights, but needs and situations vary so much that it is worth being able to carry out your own measurements.
In my case, I work on an OTT broadcasting infrastructure, so we need to encode sources with very different types of content at very different quality levels.
We had to refine our settings in order to have a consistent profile scale, so it was necessary to find a way to make encoding choices based on tests rather than on feeling. These measurements also help us compare encoders and keep up with new technology by testing new codecs.
What is a quality measurement?
The first idea that comes to mind when comparing two encodings is to watch both streams simultaneously on two screens and check them visually.
This technique is not to be neglected but remains rather subjective and difficult to implement when comparing many encoding variations.
For this reason, being able to make measurements that return numerical values quickly becomes valuable.
The important thing to understand about a quality measurement is that it doesn’t matter whether the content is beautiful. What matters is how perceptible the degradation introduced by the transcoding is. A degraded source file will produce a degraded transcoded file, even if the encoding itself gives the best possible quality.
How to measure video quality?
There are many kinds of measurement to evaluate the degradation of a video file but we’ll focus on PSNR and VMAF today.
PSNR:
PSNR (https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio) is a measure of image distortion. This measurement is very widespread in the video industry and knowing its results makes it easy to compare them with those of the specialized press.
This is a frame-by-frame comparison whose result is expressed in decibels (dB). The lower the PSNR value, the lower the quality, and vice versa.
One of the problems with this measurement is that it doesn’t provide a perception value but a distortion value, which means that its interpretation depends on the content. For example, a value of 40 can be considered very good on a complex video (e.g. football), average on a movie and low on a cartoon: while one is willing to accept some distortion on complex content, one does not want the same level of degradation on content that is easy to encode.
A file strictly identical to the source will have an infinite value.
This measure strictly grades the difference between the encoded file and the source. In this context, filters such as “blur” or “deblocking” will lower the score even when they improve the perceived quality.
In the example above, three types of content (football, movie and cartoon) are encoded at 5Mbps: PSNR ranges from 40 to 50.
In the example above, a movie sample is encoded several times with the same resolution but with bitrates ranging from 1Mbps to 5Mbps: PSNR ranges from 35 to 46.
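To make the PSNR scale more concrete, here is a minimal sketch of the textbook formula, PSNR = 10 * log10(MAX^2 / MSE), applied to two tiny frames. It only illustrates the principle: the function name and the sample pixel values are made up for this example, and real tools typically compute the value per plane on full frames.

```python
import math

def psnr(reference, encoded, max_value=255):
    """Return the PSNR in dB between two equally sized frames.

    Frames are flat sequences of pixel values (here 8-bit samples).
    """
    if len(reference) != len(encoded):
        raise ValueError("frames must have the same number of pixels")
    if not reference:
        raise ValueError("frames must not be empty")
    # Mean squared error between the two frames.
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    if mse == 0:
        # Identical frames: no distortion, so the PSNR is infinite.
        return math.inf
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical sample data, chosen only for illustration.
source = [16, 64, 128, 235]
slightly_off = [17, 63, 129, 234]   # every pixel differs by 1
badly_off = [26, 54, 138, 225]      # every pixel differs by 10

print(psnr(source, source))                  # inf
print(round(psnr(source, slightly_off), 2))  # 48.13
print(round(psnr(source, badly_off), 2))     # 28.13
```

A tenfold increase in the per-pixel error costs 20 dB, which is why a few dB of difference between two encodings is already significant.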
VMAF:
A newer type of measurement, developed by Netflix, appeared in 2016: VMAF (https://en.wikipedia.org/wiki/Video_Multimethod_Assessment_Fusion). It provides a measurement closer to human perception and uses a scale from 0 to 100 that is easier to interpret. On the other hand, it is much more expensive in terms of computation time.
Example of VMAF results on an OTT profile scale ranging from 480x360@500Kbps to 3840x2160@16Mbps
It’s useful to have two quality measurements in order to check that there are no inconsistencies. PSNR and VMAF take two different approaches: PSNR evaluates the numerical difference between the source and the encoded file, whereas VMAF estimates how a human viewer would perceive the encoded file compared with the source.
How to use Video Bench?
Thanks to the great FFmpeg software and several open source libraries, it is possible to obtain both PSNR and VMAF quality scores separately. But for someone not used to these tools, they can be difficult to use, and interpreting and analyzing both scores simultaneously can be quite frustrating.
That’s why I’ve created an easy-to-use interface that provides both VMAF and PSNR quality measurements as well as accurate video bitrates. The concept is quite simple: you select a reference file, then the different encoded files and you observe how the quality evolves.
I recommend running measurements on short files (30 seconds at most), because the computation times can be quite long.
Installation:
For the installation procedure, see
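For readers curious about what obtaining the two scores "separately" with FFmpeg looks like, the sketch below builds the two command lines using FFmpeg's psnr and libvmaf filters. It is not necessarily how Video Bench invokes FFmpeg: the helper names, file names and log path are placeholders, and the VMAF command assumes an FFmpeg build that includes libvmaf.

```python
import shlex

def psnr_command(encoded, reference):
    """Build an FFmpeg command that reports the PSNR in its log output."""
    # First input is the encoded file, second input is the reference.
    return ["ffmpeg", "-i", encoded, "-i", reference,
            "-lavfi", "psnr", "-f", "null", "-"]

def vmaf_command(encoded, reference, log_path="vmaf.json"):
    """Build an FFmpeg command that writes VMAF scores to a JSON log."""
    vmaf_filter = f"libvmaf=log_fmt=json:log_path={log_path}"
    return ["ffmpeg", "-i", encoded, "-i", reference,
            "-lavfi", vmaf_filter, "-f", "null", "-"]

# Placeholder file names; pass the lists to subprocess.run() to execute them.
print(shlex.join(psnr_command("encoded.mp4", "reference.mp4")))
print(shlex.join(vmaf_command("encoded.mp4", "reference.mp4")))
```

Both commands decode the two files and discard the output (`-f null -`), so only the measurement is produced. They also assume that the two files already share the same resolution, frame rate and timing, which is exactly what the settings described further down take care of.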
https://github.com/JNoDuq/videobench
Interface presentation:
Import Video Reference File:
The button (1) is used to import a file that will serve as the reference for the measurement of quality. Once the file is selected it will appear in the area (2).
The button (3) is used to import the encoded files that will serve for the measurement of quality. Once the files are selected they will appear in the area (4).
Import Json Files:
When a measurement is made, a JSON file containing all the information is saved to the video file location with the same name.
It can then be imported directly via button (5). This makes it possible to quickly visualize the results already calculated, but also to import results coming from files not having the same origin in order to compare the results.
Synchronize reference and test files:
One of the first difficulties encountered when measuring quality comes from the fact that, to compare two video files, they must be perfectly synchronized.
A desynchronization, even of a single frame, makes the quality scores drop tremendously. Knowing that a frame lasts 0.02s in a file at 50fps, you can imagine how difficult it is to find the right time offset to align the frames of the two files.
If you know the exact time of the desynchronization, it is possible to provide it directly.
But if you don’t know it precisely, it is possible to find it “easily”: my method consists of providing a synchronization time and a time window (the sync window) to the analyzer, so that it computes the PSNR of the first three seconds of the tested file for every offset contained in this sync window.
The best PSNR value will be considered as the best sync time since it is the time when the tested file has the least difference with the source. If you see that the PSNR value tends to go up or down over the entire test range, your sync window may be incorrectly placed and you may get wrong results.
Another frequent case is a slight frame desynchronization introduced during transcoding. In the example above, we went from a source file in 50i to a transcoded file in 25p. The file is tested from -0.1s to +0.1s and we see that there is an offset of -0.04s, which corresponds to one frame at 25fps.
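The sync search described in this section can be sketched in a few lines. This is a simplified illustration on synthetic data, not the tool's actual code: frames are tiny lists of pixel values, offsets are tried in whole-frame steps, the sign convention is arbitrary, and "best score on the edge of the window" is used as a simple stand-in for the "PSNR keeps going up or down" warning. All function and variable names are hypothetical.

```python
import math

def frame_psnr(reference, encoded, max_value=255):
    """PSNR in dB between two frames given as flat lists of pixel values."""
    mse = sum((r - e) ** 2 for r, e in zip(reference, encoded)) / len(reference)
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)

def find_sync_offset(ref_frames, test_frames, fps, sync_time, sync_window,
                     check_seconds=3):
    """Return (best_offset_seconds, scores_by_frame_shift, best_at_edge).

    Every whole-frame shift in [sync_time - sync_window, sync_time + sync_window]
    is tried. A positive shift means the test file starts later in the
    reference (an arbitrary convention chosen for this sketch).
    """
    first = round((sync_time - sync_window) * fps)
    last = round((sync_time + sync_window) * fps)
    frames_to_check = min(round(check_seconds * fps), len(test_frames))
    scores = {}
    for shift in range(first, last + 1):
        # Average PSNR over the first seconds of the test file for this shift.
        values = [
            frame_psnr(ref_frames[i + shift], test_frames[i])
            for i in range(frames_to_check)
            if 0 <= i + shift < len(ref_frames)
        ]
        if values:
            scores[shift] = sum(values) / len(values)
    best_shift = max(scores, key=scores.get)
    # A best score on the edge suggests the window is badly placed.
    best_at_edge = best_shift in (min(scores), max(scores))
    return best_shift / fps, scores, best_at_edge

# Synthetic data: 16-pixel "frames" at 50 fps.
fps = 50
ref_frames = [[(7 * i + 13 * p) % 256 for p in range(16)] for i in range(300)]
# The test file starts 2 frames into the reference and is slightly distorted.
test_frames = [[min(255, v + 1) for v in frame] for frame in ref_frames[2:]]

offset, scores, at_edge = find_sync_offset(ref_frames, test_frames, fps,
                                           sync_time=0.0, sync_window=0.1)
print(f"best offset: {offset:+.2f}s, window may be misplaced: {at_edge}")
```

On this synthetic input the best score is found at a shift of two frames, i.e. 0.04s at 50fps, well inside the window. On real files the same idea applies, with the PSNR computed by FFmpeg rather than in Python.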
Start and Reset Buttons
Once the reference and the test files have been imported and the search for sync is adjusted (if necessary), the analysis processes are started by clicking on the start button (9).
Once the analysis is done, it is possible to return to the starting point by clicking on the reset button (8).
When the analyses are done, the result is displayed as graphs showing the quality and the bitrate values per second or per frame. A bar graph also shows the averages over the entire file.
You can select the items to display.
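As an illustration of how per-frame values relate to the per-second curves and to the averages shown in the bar graph, here is a small sketch. The function name and the sample scores are invented for the example, and the tool may aggregate its data differently.

```python
def per_second_averages(values, fps):
    """Average per-frame values over consecutive one-second groups."""
    frames_per_second = round(fps)
    groups = [values[i:i + frames_per_second]
              for i in range(0, len(values), frames_per_second)]
    return [sum(group) / len(group) for group in groups]

# Hypothetical per-frame VMAF scores for a 3-second clip at 2 fps.
vmaf_per_frame = [90, 92, 94, 96, 80, 82]

print(per_second_averages(vmaf_per_frame, fps=2))  # [91.0, 95.0, 81.0]
print(sum(vmaf_per_frame) / len(vmaf_per_frame))   # 89.0
```

The per-second view smooths out frame-level noise, while the per-frame view helps locate a specific drop in quality.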
Settings:
To keep the app easy to use, all settings can be left in “auto” mode. They can also be adjusted manually to suit specific needs.
VMAF Model:
There are several VMAF models. In automatic mode, the tool applies the following rules:
- If the source has a resolution equal to or less than 1920x1080, it uses the model vmaf_v0.6.1.pkl. This is the standard VMAF model, which corresponds to watching a video on an HDTV at a distance equal to 3 times the height of the screen.
- If the source has a resolution greater than 1920x1080, it uses the model vmaf_4k_v0.6.1.pkl. This is the 4K VMAF model, which corresponds to watching a video on a UHD TV at a distance equal to 1.5 times the height of the screen.
It is also possible to manually select the VMAF model in the settings, as well as the “phone model” mode, which corresponds to watching a video on a smartphone at a “comfortable” distance.
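The automatic rule above boils down to a single comparison. The sketch below is one possible reading of it: the function name is hypothetical, and treating "equal to or less than 1920x1080" as a test on both width and height is an assumption.

```python
def auto_vmaf_model(width, height):
    """Pick a VMAF model file name from the source resolution."""
    # Assumption: a source counts as "HD or below" when both dimensions fit.
    if width <= 1920 and height <= 1080:
        return "vmaf_v0.6.1.pkl"
    return "vmaf_4k_v0.6.1.pkl"

print(auto_vmaf_model(1280, 720))   # vmaf_v0.6.1.pkl
print(auto_vmaf_model(1920, 1080))  # vmaf_v0.6.1.pkl
print(auto_vmaf_model(3840, 2160))  # vmaf_4k_v0.6.1.pkl
```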
Scale filter:
To perform a VMAF or a PSNR measurement, the reference file and the measured file must have the same resolution. However, we may also want to compare files of different resolutions.
Depending on the VMAF model used, the files will be upscaled to 1920x1080 or 3840x2160 if needed.
FFmpeg provides multiple scaling algorithms; the tool uses the “neighbor” one by default. This filter limits interpolation, which would otherwise distort the quality measurement.
It is possible to change the interpolation filter in the settings.
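In FFmpeg terms, this kind of upscaling can be expressed with the scale filter and its flags option. The helper below is a hypothetical sketch that only builds the filter string; the exact filter chain used by the tool may differ.

```python
def build_scale_filter(width, height, target=(1920, 1080), flags="neighbor"):
    """Return an FFmpeg scale filter string, or None if no scaling is needed."""
    if (width, height) == target:
        return None
    return f"scale={target[0]}:{target[1]}:flags={flags}"

print(build_scale_filter(1280, 720))
# scale=1920:1080:flags=neighbor
print(build_scale_filter(1920, 1080, target=(3840, 2160), flags="bicubic"))
# scale=3840:2160:flags=bicubic
print(build_scale_filter(1920, 1080))
# None
```

Other values accepted by the flags option, such as bilinear, bicubic or lanczos, interpolate between pixels, so they will generally give different scores for the same pair of files.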
Reference Deinterlace Filter:
Sometimes the sources can be interlaced and the encoded files progressive. The tool compares the properties of the source and the encoded files to choose the correct deinterlacing filter to apply to the source.
- If ref = 50i and test = 25p -> the deinterlacing filter outputs one frame for each frame.
- If ref = 50i and test = 50p -> the deinterlacing filter outputs one frame for each field.
- If ref = test -> No deinterlacing process
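The three rules above can be sketched as a small decision function. The article does not say which deinterlacer is used, so FFmpeg's yadif filter is taken here purely as an example: its send_frame mode outputs one frame per frame and its send_field mode outputs one frame per field. The function names and the "50i"/"25p" label format are assumptions.

```python
def parse_format(label):
    """Split a label such as "50i" or "25p" into (rate, scan_type)."""
    return float(label[:-1]), label[-1]

def choose_deinterlace(ref, test):
    """Return a yadif filter string for the reference, or None if not needed."""
    if ref == test:
        return None  # same format: no deinterlacing
    ref_rate, ref_scan = parse_format(ref)
    test_rate, test_scan = parse_format(test)
    if ref_scan == "i" and test_scan == "p":
        if test_rate == ref_rate:
            # e.g. 50i -> 50p: one output frame for each field
            return "yadif=mode=send_field"
        if test_rate * 2 == ref_rate:
            # e.g. 50i -> 25p: one output frame for each frame
            return "yadif=mode=send_frame"
    raise ValueError(f"combination not covered by this sketch: {ref} / {test}")

print(choose_deinterlace("50i", "25p"))  # yadif=mode=send_frame
print(choose_deinterlace("50i", "50p"))  # yadif=mode=send_field
print(choose_deinterlace("25p", "25p"))  # None
```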
Quality Subsampling:
The frame rate may also vary between the source file and the encoded files.
If ref = 50p and test = 25p -> the tool uses a subsampling of 2 during the quality measurement, which means that only one frame out of two of the source file is used during the measurement.
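A subsampling of 2 simply means keeping every second reference frame so that both files have the same number of frames to compare. Here is a minimal sketch, assuming the reference frame rate is a whole multiple of the test frame rate; the function name is hypothetical and frames are represented by their index.

```python
def subsample_reference(ref_frames, ref_fps, test_fps):
    """Keep one reference frame out of every (ref_fps / test_fps)."""
    ratio = ref_fps / test_fps
    if ratio < 1 or ratio != int(ratio):
        raise ValueError("reference rate must be a whole multiple of test rate")
    return ref_frames[::int(ratio)]

ref_frames = list(range(10))  # frame indices of a 50p reference
print(subsample_reference(ref_frames, ref_fps=50, test_fps=25))
# [0, 2, 4, 6, 8]
```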
Final thoughts:
Developing “Video Bench” allowed me to think about how to measure video quality and make encoding choices based on metrics.
I hope this tool will make the evaluation and comparison of video encodings more accessible, and that it will help the people who use it to improve their understanding of encoding parameters and to form their own opinion on the results obtained.
Because as the French expression says, “on n’est jamais mieux servi que par soi-même” -> “If you want something done right, you have to do it yourself!”