Recommended Bitrates for 360° Video Using Spin Enc
By Sergio Sanz-Rodriguez, PhD
Virtual Reality (VR)/360° video is one of the hottest topics in the multimedia industry and film production. The real world is captured by an omnidirectional camera consisting of a set of cameras covering different regions of the space. The acquired pictures are then stitched together to create the 360° or panoramic picture. Once the master file has been created, the 360° video is encoded for final delivery to Head-Mounted Displays (HMDs).
The video encoder plays a paramount role in preserving the maximum possible quality and, therefore, the filmmaker's original intent. Spin Digital HEVC/H.265 Encoder (Spin Enc) has been optimized for that purpose. A formal subjective quality assessment was conducted to find the minimum encoding bitrate that ensures high-quality video in HMDs.
This blog post describes the design and implementation of the subjective test and summarizes the results obtained. Carried out within the EU-funded DDD60 project (ddd60.eu), this work is the result of a collaboration between Spin Digital, reelport, Marché du Film, Tampere Film Festival, Sheffield Doc/Fest, and Sunny Side of the Doc.
The subjective experiment was carefully implemented taking into account all the aspects and details specified in the ITU-T P.913 Recommendation on “Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment” [P.913, 2016].
Test Video Sequences
The following six 360° monoscopic 4Kx2Kp30, 4:2:0, 10-bit video sequences, each 30 seconds long, were used for the subjective test: “ChairliftRide”, “KiteFlite”, “SkateBoardTrick”, “MamiesDream1”, “MamiesDream2”, “StepToTheLine2”. The first three clips belong to the Joint Video Exploration Team (JVET)’s test sequences for video coding experiments [Alshina, 2017]. The other three clips are excerpts from the films Step to the Line and Mamie’s Dream.
The video clips were encoded using Variable Bitrate (VBR) control at 2.5, 5.0, 10.0, 15.0, 20.0, and 25.0 Mbits/s. This range of bitrates is quite similar to that suggested by the 3rd Generation Partnership Project (3GPP) [3GPP, 2017] for 360° video, which goes from 5.0 to 20.0 Mbits/s.
In the end, a total of 36 video clips (six sequences at six bitrates each) were prepared for the subjective experiments. The order of presentation of those clips was randomized to avoid any kind of memory bias during the voting process.
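The clip set and its randomized presentation order can be sketched in a few lines of Python. This is only an illustration of the idea, not the tooling used in the study: the function name `build_playlist` and the `seed` parameter are hypothetical, and the seed is there purely so that a given order can be reproduced.

```python
import itertools
import random

SEQUENCES = ["ChairliftRide", "KiteFlite", "SkateBoardTrick",
             "MamiesDream1", "MamiesDream2", "StepToTheLine2"]
BITRATES_MBPS = [2.5, 5.0, 10.0, 15.0, 20.0, 25.0]


def build_playlist(seed=None):
    """Return every (sequence, bitrate) pair once, in shuffled order."""
    # Cartesian product: 6 sequences x 6 bitrates = 36 clips.
    clips = list(itertools.product(SEQUENCES, BITRATES_MBPS))
    # A dedicated generator keeps the shuffle independent of global state.
    rng = random.Random(seed)
    rng.shuffle(clips)
    return clips


playlist = build_playlist(seed=2017)  # hypothetical seed value
for number, (sequence, bitrate) in enumerate(playlist, start=1):
    print(f"{number:2d}: {sequence} @ {bitrate} Mbits/s")
```

Shuffling the full list, rather than shuffling within each sequence, keeps clips of the same content from appearing in a predictable pattern.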
Participation Duration, Sessions, Breaks, Feedback
A total of 45 minutes per user was estimated for the whole subjective test. The test was divided into two viewing sessions of approximately 15 minutes each, with a mandatory 5-minute break in between and some extra mini breaks during the sessions. In each session, the participant had to assess the subjective quality of 18 clips, each 30 seconds long.
Before the first session, a 3-minute training session was conducted so that the subjects could become familiar with the test methodology. A pre-screening test before the training session was required only for users with very poor vision.
Once the subjective test was finished, the user was allowed to give feedback to the experimenter, ask any questions about the test and its objective, and report how he or she experienced the subjective test, especially if the user felt dizziness or nausea.
Based on the guidelines given in P.913, the data collection procedure was as follows: the experimenter opens the video file with number 1; the participant watches the clip and states one of the following scores: 1 “bad”, 2 “poor”, 3 “fair”, 4 “good”, 5 “excellent”; the experimenter writes down the value in the corresponding cell; and so on with the next video files.
The experimental setup we used in this work is the following:
- Video Encoder: Spin Digital HEVC Encoder version 1.6.1 (September 2017)
- Media Player: GoPro VR version 3.0.1
- HMD: HTC Vive
- PC: Core i7-6700HQ, GeForce GTX 1060, 16 GB DDR4 2400, Windows 10 64-bit
The subjective quality assessment was divided into three stages: pre-pilot test, pilot test, and official test.
The pre-pilot test consisted of a preliminary execution to check the experimental equipment and software use. With this pre-pilot, the experimenter could also detect issues with the test methodology. Those issues were taken into account in the next stages.
An internal pilot test was conducted with a reduced number of users (mainly project partners) in order to detect any issues with the test methodology, total duration, PC and HMD usability, and selection of the clips.
The feedback and recommendations given by the subjects helped the experimenter to prepare clearer instructions for the users of the final test. Some of those recommendations include:
- Although the purpose of the test was to evaluate only the quality of the videos at different bitrates, many participants also evaluated the quality of the HMD, albeit unconsciously. They did not notice that the limited resolution of the HTC Vive caused the lack of definition in some videos. As a result, the users scored such videos very low in the hope of seeing a better version later on. The users should have been made aware of this limitation of the HMD before starting the test.
- Regardless of their quality, some clips received very low scores in general, because they were not very interesting or caused rejection. The users should have been informed more clearly about the real purpose of the subjective test.
The final test was performed by all the project partners at their offices over a period of one month.
A total of 50 people participated in the official subjective quality test. Classified by gender, 27 of them were women and 23 were men; classified by use of glasses, 21 wore glasses and 29 did not. However, the results given by some users were highly inconsistent and illogical. For example, one user suffered technical issues during the test and four subjects could not see any difference in quality because of the low resolution of the HMD.
Those outliers were removed based on the rejection criterion specified in P.913. After applying that method, eight users were discarded from the final results, leaving a total of 42 participants. The gender distribution was 23 women and 19 men, and the use-of-glasses distribution was 18 with glasses and 24 without.
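To give an idea of how such a screening step can work, the sketch below correlates each subject's scores with the per-clip mean scores and rejects subjects whose Pearson correlation falls below a threshold. This is an assumption about the general shape of the procedure, not a reproduction of the exact criterion applied in this study: the 0.75 threshold, the function names, and the example votes are all illustrative.

```python
from statistics import mean


def pearson(x, y):
    """Pearson linear correlation between two equally long score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant scores: treat as uncorrelated
    return cov / (var_x * var_y) ** 0.5


def screen_subjects(scores, threshold=0.75):
    """scores maps subject id -> list of votes, one per clip, same clip order.

    Returns (kept, rejected) lists of subject ids.
    The default threshold is an assumption for this sketch.
    """
    subjects = list(scores)
    n_clips = len(scores[subjects[0]])
    clip_means = [mean(scores[s][i] for s in subjects) for i in range(n_clips)]
    kept, rejected = [], []
    for s in subjects:
        r = pearson(scores[s], clip_means)
        (kept if r >= threshold else rejected).append(s)
    return kept, rejected


# Made-up votes for six clips ordered from lowest to highest bitrate.
example = {
    "A": [1, 2, 3, 4, 4, 5],
    "B": [2, 2, 3, 4, 5, 5],
    "C": [2, 2, 3, 4, 4, 5],
    "D": [3, 3, 3, 3, 3, 3],  # saw no quality difference at all
    "E": [5, 1, 4, 2, 3, 1],  # erratic voting
}
kept, rejected = screen_subjects(example)
print("kept:", kept)
print("rejected:", rejected)
```

Subjects "D" and "E" in the made-up data mimic the two failure modes described above: a participant who cannot perceive any difference, and one whose votes follow no consistent pattern.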
The results concerning the bitrate specification for Spin Enc are summarized in the next figure. The figure shows the Mean Opinion Score (MOS)-vs-bitrate curves associated with every video sequence and the resulting average curve.
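For readers who want to build this kind of curve from their own votes, the following sketch computes the MOS per clip (the arithmetic mean of the 1 to 5 scores) and then averages across sequences at each bitrate. The function name `mos_curves`, the clip names, and the votes are hypothetical and are not data from this study.

```python
from collections import defaultdict
from statistics import mean


def mos_curves(votes):
    """votes: iterable of (sequence, bitrate_mbps, score) tuples.

    Returns two dicts:
      curves[sequence][bitrate] -> MOS of that clip
      average[bitrate]          -> mean of the per-sequence MOS values
    """
    per_clip = defaultdict(list)
    for sequence, bitrate, score in votes:
        per_clip[(sequence, bitrate)].append(score)

    curves = defaultdict(dict)
    for (sequence, bitrate), scores in per_clip.items():
        curves[sequence][bitrate] = mean(scores)

    by_bitrate = defaultdict(list)
    for curve in curves.values():
        for bitrate, mos in curve.items():
            by_bitrate[bitrate].append(mos)
    average = {b: mean(v) for b, v in sorted(by_bitrate.items())}
    return dict(curves), average


# Made-up votes: two clips, two bitrates, two subjects each.
votes = [
    ("ClipA", 5.0, 2), ("ClipA", 5.0, 3),
    ("ClipA", 15.0, 4), ("ClipA", 15.0, 4),
    ("ClipB", 5.0, 3), ("ClipB", 5.0, 3),
    ("ClipB", 15.0, 3), ("ClipB", 15.0, 4),
]
curves, average = mos_curves(votes)
print(curves)
print(average)
```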
According to these results and the feedback given by the participants, the following conclusions can be drawn:
- At bitrates greater than or equal to 15 Mbits/s, the obtained MOS values do not vary significantly. In other words, a saturation point in perceived quality is reached at 15 Mbits/s.
- For the particular case of 15 Mbits/s, the following results were achieved:
  - Average MOS: 3.7 (≈good)
  - Highest MOS: 4.1 (good), for the “KiteFlite” sequence
  - Lowest MOS: 3.4 (≈fair), for the “SkateboardTrick” sequence
- No MOS value reached the maximum score (5 “excellent”) even at high bitrates. The next generation of HMDs and videos in 8K resolution might help to increase the subjective quality.
- Two participants felt dizzy for 3 hours after the test.
- A good gender balance was achieved: 23 women vs 19 men.
- A well-balanced age distribution was achieved from 18 to 45 years.
- Women were in general a bit more demanding than men when scoring.
- Participants with glasses were in general a bit more demanding than those without glasses when scoring.
The results of the subjective quality assessment show that 15 Mbits/s in VBR mode is the minimum bitrate required to ensure high-quality 360° 4Kx2Kp30 video using Spin Enc. Bitrate values for other video formats can be derived by scaling with the frame resolution and by taking into account that doubling the video frame rate results in an increase of about 30% in compressed data.
|Video format||25/30 frames/s||50/60 frames/s|
|4Kx2K||15 Mbits/s||20 Mbits/s|
|4Kx4K (3D top-bottom)||30 Mbits/s||40 Mbits/s|
|8Kx4K||60 Mbits/s||80 Mbits/s|
|8Kx8K (3D top-bottom)||120 Mbits/s||160 Mbits/s|
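The table values follow a simple rule that can be written down as a short function: scale the 15 Mbits/s baseline by the ratio of pixel counts, then apply a frame-rate factor for 50/60 frames/s. In the table that factor works out to 20/15 = 4/3, which is in line with the "about 30%" figure. The sketch below is only a reading of the table, not an official formula; the pixel dimensions assigned to each format name (for example 4096x2048 for 4Kx2K) and the function name are assumptions.

```python
BASE_PIXELS = 4096 * 2048      # assumed pixel dimensions for "4Kx2K"
BASE_BITRATE_MBPS = 15.0       # minimum bitrate found for 4Kx2K at 25/30 frames/s
HIGH_FPS_FACTOR = 4 / 3        # ratio implied by the table (20 / 15)

# Assumed pixel dimensions for the format names used in the table.
FORMATS = {
    "4Kx2K": (4096, 2048),
    "4Kx4K (3D top-bottom)": (4096, 4096),
    "8Kx4K": (8192, 4096),
    "8Kx8K (3D top-bottom)": (8192, 8192),
}


def recommended_bitrate(width, height, fps):
    """Scale the baseline bitrate by pixel count and frame-rate class."""
    scale = (width * height) / BASE_PIXELS
    factor = HIGH_FPS_FACTOR if fps > 30 else 1.0
    return round(BASE_BITRATE_MBPS * scale * factor, 1)


for name, (width, height) in FORMATS.items():
    low = recommended_bitrate(width, height, 30)
    high = recommended_bitrate(width, height, 60)
    print(f"{name}: {low} Mbits/s at 25/30 fps, {high} Mbits/s at 50/60 fps")
```

Because only the ratio of pixel counts matters, using 3840x1920 as the baseline together with proportionally scaled dimensions for the other formats would give the same results.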
- [P.913, 2016] Recommendation ITU-T P.913, “Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment”, ITU-T, Geneva, Switzerland, 2016.
- [Alshina, 2017] E. Alshina, J. Boyce, A. Abbas, Y. Ye, “JVET common test conditions and evaluation procedures for 360° video”, JVET-H1030, 8th Meeting, Oct. 2017, Macao, China.
- [3GPP, 2017] 3rd Generation Partnership Project “3GPP TR 26.918: Virtual Reality (VR) media services over 3GPP”, Release 15, 2017: http://www.3gpp.org/ftp/tsg_sa/WG4_CODEC/TSGS4_94/Docs/S4-170752.zip
This activity has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 732717 (www.ddd60.eu).