
A story about 4K XAVC-S, Premiere and transcoding


Don Kotlos
  • Administrators

Thanks Don nice post.

Meanwhile on a Mac I recommend to simply use EditReady, transcode to ProRes 422 LT and you will be able to playback full quality full res 4K on a laptop. There's no quality loss from XAVC-S.

The only way you can lose quality from XAVC-S is to go backwards significantly from H.264 100Mbit/s and that is quite hard to do :)


Thanks Don nice post.

Meanwhile on a Mac I recommend to simply use EditReady, transcode to ProRes 422 LT and you will be able to playback full quality full res 4K on a laptop. There's no quality loss from XAVC-S.

The only way you can lose quality from XAVC-S is to go backwards significantly from H.264 100Mbit/s and that is quite hard to do :)

in my tests transcoding 1DC 4K MJPEG to prores LT creates a significant loss in quality, with compression artifacts and colour blocking. I used mpeg streamclip for the transcode on my Mac book. Viewing at 100% on a complex frame showed that very clearly.


 

A lot to interpret from these data but 2 are the main points:

1. For editing optimized codecs, you don't need many cores. A fast quad core should be enough.

2. Lowering the playback resolution makes your GPU happy. Removing the effects does that too, but usually you need to preview them more than you need the extra resolution.   
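To put rough numbers on point 2, here is a minimal sketch of how many pixels per frame have to be processed at each playback resolution. It assumes UHD 3840x2160 footage and that a fractional playback setting scales both dimensions; the function name is made up for illustration.

```python
# Pixels per frame that must be processed at each playback resolution.
# Assumption: UHD "4K" (3840x2160); a 1/N setting divides BOTH dimensions by N.
UHD_WIDTH, UHD_HEIGHT = 3840, 2160


def pixels_per_frame(divisor: int, width: int = UHD_WIDTH, height: int = UHD_HEIGHT) -> int:
    """Pixel count when each dimension is divided by `divisor` (1 = Full, 2 = 1/2, ...)."""
    return (width // divisor) * (height // divisor)


if __name__ == "__main__":
    full = pixels_per_frame(1)
    for divisor in (1, 2, 4):
        px = pixels_per_frame(divisor)
        print(f"1/{divisor}: {px:>9,} pixels ({px / full:.1%} of full)")
```

Halving the playback resolution cuts the per-frame pixel count to a quarter, which is consistent with the GPU load dropping so sharply.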

As you can see, very few of the transcoded files come close to the quality/colors/exposure of the original XAVC-S! I did not expect the banding issues with ProRes.

Conclusions for 4K XAVC-S, Premiere CC and Windows users: 

 

1.       Don’t try to optimize your computer to edit 4K XAVC-S natively. Use proxies and low playback resolution if you want a smooth experience.

 

2.       For maximum output quality avoid using the transcoded files for the final render. If needed, use CineformYUV10bit. 
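For readers who want to script the proxy step, below is a rough sketch that only builds an ffmpeg command line for a 1080p-wide ProRes 422 LT proxy. Using ffmpeg at all is an assumption here (the tests in this thread used Adobe tools, EditReady and MPEG Streamclip), and the file name `C0001.MP4`, the `proxies` folder and the helper name are hypothetical.

```python
import shlex
from pathlib import Path
from typing import List


def proxy_command(source: str, proxy_dir: str = "proxies", width: int = 1920) -> List[str]:
    """Build (but do not run) an ffmpeg command that writes a ProRes 422 LT proxy."""
    src = Path(source)
    out = Path(proxy_dir) / f"{src.stem}_proxy.mov"
    return [
        "ffmpeg", "-i", str(src),
        "-vf", f"scale={width}:-2",                # downscale, keep aspect ratio, even height
        "-c:v", "prores_ks", "-profile:v", "1",    # prores_ks profile 1 = ProRes 422 LT
        "-c:a", "pcm_s16le",                       # uncompressed audio, cheap to decode
        str(out),
    ]


if __name__ == "__main__":
    # Hypothetical clip name; print the command so it can be inspected before running.
    print(shlex.join(proxy_command("C0001.MP4")))
```

The proxies are for editing only; as conclusion 2 says, the final render should still come from the original files.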

 

I hope this is helpful for some of you. 

 

Very interesting!  Thanks!  Some further explanation may help some readers.  Essentially, the CPU does the rendering and your GPU (graphics card) does the playback.  If one looks at the data above, one can see that they have little effect on each other.  Because there's only so much parallel processing that can be done on a render, a faster CPU clock speed / data bus will shorten render times?  Yes?  So better to have a 4-core at 4.5GHz than an 8-core at 3.2GHz.
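The cores-versus-clock question can be sketched with Amdahl's law. This is only a toy model, not a measurement: it assumes identical per-clock performance on both chips, perfect scaling of the parallel part, and a made-up `parallel_fraction` for how much of a render can actually be spread over cores.

```python
def relative_time(cores: int, clock_ghz: float, parallel_fraction: float) -> float:
    """Amdahl-style run time in arbitrary units (lower is faster).

    The serial part runs on one core, the parallel part is split evenly across
    all cores, and everything scales inversely with clock speed (an assumption).
    """
    serial = 1.0 - parallel_fraction
    return (serial + parallel_fraction / cores) / clock_ghz


if __name__ == "__main__":
    for p in (0.50, 0.80, 0.95):  # hypothetical parallel fractions
        quad = relative_time(4, 4.5, p)
        octa = relative_time(8, 3.2, p)
        winner = "4-core @ 4.5GHz" if quad < octa else "8-core @ 3.2GHz"
        print(f"parallel={p:.0%}: quad={quad:.4f} octa={octa:.4f} -> {winner}")
```

Under these assumptions the faster quad core wins until roughly 85% of the work is parallel; beyond that the 8-core pulls ahead. Where a real Premiere render sits on that scale is exactly what benchmarks like the ones above have to answer.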

On optimization, depends on what you want to optimize, right?  If you're more interested in viewing 4K playback with effects, the more powerful GPU you have, the better.  And it doesn't really matter if the GPU is above 50% because that's all it's doing, it shouldn't affect CPU business, yes?  So it's less important if you have a slow CPU.

If you want to optimize your render times, then a liquid cooled, RAID 0, over-clocked CPU is probably your best bet (though I have no direct experience).  The latest i7 chipsets give maybe 5% improvement, from what I've read, so one shouldn't pay a premium for the absolute latest.

Also, I think you may mean for maximum output quality as an intermediate clip, use Cineform YUV10bit.  For final render, that should be up to the destination requirement?

Finally, another point of your analysis may be that laptops simply do not have strong enough GPUs for heavy 4K editing yet.  People may not realize even if they have the same number card, they're "M" versions, with less memory and power.  So you may want to use a desktop.

Again, very interesting!  THANKS!

 


The only way you can lose quality from XAVC-S is to go backwards significantly from H.264 100Mbit/s and that is quite hard to do :)

Theoretically you are right, and that was what I expected, but in reality I had a hard time not losing quality when transcoding. Only the 10bit uncompressed file is close, but that is 70x the size!

in my tests transcoding 1DC 4K MJPEG to prores LT creates a significant loss in quality, with compression artifacts and colour blocking. I used mpeg streamclip for the transcode on my Mac book. Viewing at 100% on a complex frame showed that very clearly.

In most parts of the image it would be hard to see any difference. But where the H264 is failing (banding in the sky, macroblocking) the transcoding always made things worse. 

On optimization, depends on what you want to optimize, right?  If you're more interested in viewing 4K playback with effects, the more powerful GPU you have, the better.  And it doesn't really matter if the GPU is above 50% because that's all it's doing, it shouldn't affect CPU business, yes?  So it's less important if you have a slow CPU.

It seems so. Differences between GPUs are smaller in the final rendering, but during playback a strong GPU can make a real difference. The TITAN X might be overkill, but a GTX 970 might be necessary for full resolution playback with effects.

 

Very interesting!  Thanks!  Some further explanation may help some readers.  Essentially, the CPU does the rendering and your GPU (graphics card) does the playback.  If one looks at the data above, one can see that they have little effect on each other.  Because there's only so much parallel processing that can be done on a render, a faster CPU clock speed / data bus will shorten render times?  Yes?  So better to have a 4-core at 4.5GHz than an 8-core at 3.2GHz.

If you want to optimize your render times, then a liquid cooled, RAID 0, over-clocked CPU is probably your best bet (though I have no direct experience).  The latest i7 chipsets give maybe 5% improvement, from what I've read, so one shouldn't pay a premium for the absolute latest.

Definitely. I went with the maximum number of cores (8) that can still be overclocked to 4.5GHz. For a new build I would suggest an overclocked 6-core. The article from Puget Systems that I mentioned is an excellent source for a new build.

Also, I think you may mean for maximum output quality as an intermediate clip, use Cineform YUV10bit.  For final render, that should be up to the destination requirement?

Yes, that's what I meant. Since any transcoding only reduces the quality of the footage, the original XAVC-S files should be used as the source for the final render, whatever the final codec is.

Finally, another point of your analysis may be that laptops simply do not have strong enough GPUs for heavy 4K editing yet.  People may not realize even if they have the same number card, they're "M" versions, with less memory and power.  So you may want to use a desktop.

Yes, most laptops do not. High-end gaming GPUs (GTX980m/GTX980) or the high-end Quadros can be enough though. It's not that you cannot do it, but as you say, for heavy 4K editing a desktop is the better choice.

 


  • 3 weeks later...

Regarding CPU power, newer Intel models have "Quick Sync Video", which does indeed help rendering go faster, but Premiere does not support it: http://www.intel.com/content/www/us/en/architecture-and-technology/quick-sync-video/quick-sync-video-general.html

It could be helpful for systems without a dedicated GPU, but when you do have a dedicated GPU you are better off doing the computations there, since GPUs are usually faster and can accelerate many more computations during editing than Quick Sync.


Do you have definite info about that? From what I know, Premiere uses the GPU for the Mercury engine and preview, but not for final rendering.

Yes, it does use the GPU even for the final rendering and can save you A LOT of time.

Here are a few tests from Puget Systems:

https://www.pugetsystems.com/labs/articles/Adobe-Premiere-Pro-CS6-GPU-Acceleration-162/

https://www.pugetsystems.com/labs/articles/Adobe-Premiere-Pro-CC-Professional-GPU-Acceleration-502/


Thank you for the link.

From what I see (regarding the test with Premiere CC, which is more relevant today), in 1080p with a lot of effects, the GPU helped only up to a certain point and the CPU / chipset made a big difference.

For 4K stuff, the GPU was much more important.


Definitely the CPU power is important.

The reason I included the CS6 link is that only there do they compare rendering with and without the GPU, and this holds true for CC too. With even a relatively low-end GPU the encoding takes 1/5 of the time, which is similar to going from a 1-core system to a 6-core one (that is from the link posted in the first post). And that is for 1080p; for 4K the GPU helps even more.

DaVinci Resolve makes even better use of GPUs, so nowadays a GPU is a must for any editing system.

 

 

 


Yes, it does use the GPU even for the final rendering and can save you A LOT of time.

I believe liork is correct -- significant GPU acceleration of long-GOP encode/decode/transcode (e.g., H264) is almost impossible. The terminology is confusing since we often refer loosely to exporting the file as "rendering", or we say "render it out". However, technically you render an *effect*, but encode or export to a video stream or file. If the timeline is not rendered, then exporting may require rendering as one phase, but they are distinct actions.

Parallelizing transcoding for an interframe codec like H.264 is extremely difficult. In a recent interview with editor Scott Simmons and Andrew Page (nVidia Product Manager for Professional Video Technologies), they explained: 

"there are a lot of tasks that can't be broken down to be parallel: encoding is one, decoding some of the camera compressed formats is another one...what happens in frame 2 depends on frame 1, we do something to frame 1 then feed the results into frame 2....that's pretty much an encoding problem...everything is interrelated, so we can't break it up into lots of different simultaneous things." (That Studio Show podcast, 5/20/14: https://itunes.apple.com/us/podcast/that-studio-show/id293692362?mt=2)

There are dozens of academic papers where the smartest researchers have tried to apply GPU acceleration to H264 encoding. In general they have not been successful. Jason Garrett-Glaser (lead developer of x264) described this:

"...Incredibly difficult...countless people have failed over the years. nVidia engineers, Intel engineers...have all failed....Existing GPU encoders typically have worse compression...and despite the power of the GPU, still slower....The main problem is [H264] video encoding is an inherently linear process. The only way to make it not linear is using different algorithms [which entail] significant sacrifices."
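The frame-to-frame dependency described in these quotes can be illustrated with a toy model. It assumes a deliberately simplified long-GOP stream (one I-frame followed only by P-frames, closed GOPs, no B-frames) and a hypothetical GOP length of 24; real H264 streams are more complicated, so treat this as a sketch of the idea rather than a description of any actual decoder.

```python
def frames_to_decode(target: int, gop_size: int) -> int:
    """Frames that must be decoded, in order, before frame `target` can be shown.

    Simplified I-P-P-P model: each P-frame needs the frame before it, so decoding
    has to start at the nearest preceding I-frame (the start of the GOP).
    """
    keyframe = (target // gop_size) * gop_size
    return target - keyframe + 1


if __name__ == "__main__":
    long_gop, intra_only = 24, 1  # hypothetical GOP lengths
    for frame in (0, 1, 23, 24, 47):
        print(f"frame {frame:>2}: long-GOP decodes {frames_to_decode(frame, long_gop):>2}, "
              f"intra-only decodes {frames_to_decode(frame, intra_only)}")
```

In this model, landing on the last frame of a GOP means decoding 24 frames one after another, while an intra-only intermediate codec always decodes exactly one. The chain inside a GOP is what resists being split into simultaneous pieces.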

By contrast Quick Sync does not target rendering but encode/decode. Those are two different things. Quick Sync can accelerate encoding by 4x or 5x -- it is a huge improvement. The downside is software must use that. To my knowledge FCP X and Handbrake do; Premiere does not. This will become much more critical as H265 is rolled out since it is far more compute-intensive than H264. Intel CPUs starting with Skylake have enhanced Quick Sync that supports H265.

Because of the inability of GPUs to handle this, both nVidia and AMD have recently added separate, fixed-function hardware encoders to their GPUs. Architecturally it's a bag hung on the side. nVidia's is called NVENC and AMD's is called VCE. They are both essentially a separate ASIC run by a proprietary microcontroller, but integrated into the GPU die or on the card. The downside is they both require separate and proprietary APIs to access, as opposed to Intel's Quick Sync that is on nearly every Intel CPU since Sandy Bridge (except unfortunately Xeon).

The Puget Systems tests were showing GPU accelerated *effects*, not encode/decode/transcode. Any purported benefit to export was during timeline rendering, not encoding. 


Thanks for the info. 

You are correct that there are two distinct steps happening at the final rendering, one being the rendering of the effects and the second the encoding process. You could render the effects beforehand, but then you have the same problem.

While Quick Sync can do both, it supports very few filters compared to the CUDA platform. If you wanted to just transcode video to a low-res file with an H264 codec then Quick Sync is great. But for an editing system where you use multiple filters you are better off with a GPU that can offload almost all of the effects from the CPU. Sure, it would be great if Quick Sync could be used just for the transcoding part in Premiere when exporting in H264, but that is of limited use, and since most workstations use Xeons anyway, I see why Adobe hasn't bothered.

 

The Puget Systems tests were showing GPU accelerated *effects*, not encode/decode/transcode. Any purported benefit to export was during timeline rendering, not encoding. 

From what I understand it includes both the rendering of the effects and the encoding time. 


....While Quick Sync can do both, it supports very few filters compared to the CUDA platform. If you wanted to just transcode video to a low-res file with an H264 codec then Quick Sync is great. But for an editing system where you use multiple filters you are better off with a GPU that can offload almost all of the effects from the CPU. Sure, it would be great if Quick Sync could be used just for the transcoding part in Premiere when exporting in H264, but that is of limited use, and since most workstations use Xeons anyway, I see why Adobe hasn't bothered.

From what I understand it includes both the rendering of the effects and the encoding time. 

Don thanks for all your research and testing on this. It was beneficial to my documentary group as we investigate more effective workflows in the 4k era.

Re Quick Sync and filters, I just did a test on my 2015 iMac, where I have both FCP X 10.2.2 and Premiere CC 2015.0.2 installed. I exported a 5 min H264 4k clip from my Sony A7RII. On both FCP X and Premiere I applied color correction and sharpening. I did not render the timeline in either one which forced this to happen during export. Below are my numbers (note this is on the same physical machine):

Premiere CC 4k H264 export, 1-pass VBR: 12 min 40 sec

FCP X 4k H264 export, 1-pass: 3 min 45 sec

Premiere CC 1080p H264 export, 1-pass VBR: 7 min 38 sec

FCP X 1080p H264 export, 1-pass: 2 min 15 sec.
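For convenience, here is a small sketch that turns the four timings above into speed-up ratios. The numbers are the ones listed in this post; only the helper names are made up.

```python
def seconds(minutes: int, secs: int) -> int:
    """Convert a 'min sec' timing to seconds."""
    return minutes * 60 + secs


# Export times from the test above (same machine, same clip, same effects).
EXPORTS = {
    "4K": {"Premiere CC": seconds(12, 40), "FCP X": seconds(3, 45)},
    "1080p": {"Premiere CC": seconds(7, 38), "FCP X": seconds(2, 15)},
}


def speedup(resolution: str) -> float:
    """How many times faster the FCP X export finished than the Premiere CC export."""
    times = EXPORTS[resolution]
    return times["Premiere CC"] / times["FCP X"]


if __name__ == "__main__":
    for resolution in EXPORTS:
        print(f"{resolution}: FCP X finished {speedup(resolution):.2f}x faster")
```

Both cases work out to roughly 3.4x, and that figure covers the whole export (effects plus encode), not the encode step alone.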

This is almost certainly due to Quick Sync although I don't have a Xeon machine for comparative testing. Others have tested similar scenarios on Xeon-powered Mac Pros running FCP X and they are a lot slower at exporting to H264 than i7-powered machines.

Quick Sync may seem like a "one trick pony" because (before Skylake) it only did MPEG-2 and MPEG-4/H264, and only worked in single-pass mode. However I cannot see any visible difference in my tests between single and multi-pass modes. My group's workflow like many others involves final H264 products for web upload. So while a "one trick pony", it is a very effective one.


  • 5 months later...

WOW! You're the man, OP. I have been doing a lot of research on this topic but no info came close, except your findings!

 

Some questions here:

1. Surprised that scrubbing is choppy on native XAVCS files, even with your specifications. Will having a GTX980 (or 1070, upcoming) remove the choppiness while scrubbing? Otherwise, is there any foreseeable new tech in the future that can natively handle such native files? Don't really care about encoding times, but scrubbing is VERY important.

2. Does the hard drive play any part in determining the scrubbability of footage in Premiere Pro CC? Asking for both NATIVE and Cineform footage. 

2a. In terms of priority in component upgrading for scrubbability, I suppose CPU is tops. What are the next few items in the sequence? CPU >> RAM >> SSD >> GPU?

2b. Is there any performance difference between an internal and external Seagate 4TB drive? I think I'd need to have that if I were to use Cineform. The size is insane. 

3. For Cineform 10-bit YUV, does it suffice to use Quality 4 in AME?

4. Is there a GPU-accelerated transcoder from XAVCS 4k 100mbps to Cineform 10 bit? I checked my GPU load (660Ti, 2Gb) and it was only at 1%, while my CPU is on full load all the way when transcoding 1h+ of 20 clips.

5. What is the current workflow for swopping out proxy files with the originals? 


8 hours ago, s0ny said:

WOW! You're the man, OP. I have been doing a lot of research on this topic but no info came close, except your findings!

Thanks a lot, I am glad that some people find this useful.

8 hours ago, s0ny said:

1. Surprised that scrubbing is choppy on native XAVCS files, even with your specifications. Will having a GTX980 (or 1070, upcoming) remove the choppiness while scrubbing? Otherwise, is there any foreseeable new tech in the future that can natively handle such native files? Don't really care about encoding times, but scrubbing is VERY important.

With Premiere the GPU is used mostly for effects preview and effects rendering. So if you have plenty of effects (that are of course GPU-accelerated), a GPU can help with that when scrubbing. But the bottleneck seems to be the CPU. Remember that I did these tests with a Titan X; it is hard to find something faster.

8 hours ago, s0ny said:

2. Does the hard drive play any part in determining the scrubbability of footage in Premiere Pro CC? Asking for both NATIVE and Cineform footage. 

The hard drive does not affect scrubbability that much, since most SSDs should be able to handle the data rates of most compressed codecs. We are talking about <100MB/sec here. Even many fast spinning drives will work fine, but a RAID would be preferable with those.
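As a quick sanity check on those data rates, here is a minimal sketch that converts a codec bitrate into sustained drive throughput and storage per hour. The 100 Mbit/s figure is the XAVC-S bitrate discussed in this thread; the 600 Mbit/s intermediate is a purely hypothetical placeholder, not a measured Cineform bitrate.

```python
def mbyte_per_s(mbit_per_s: float) -> float:
    """Sustained read rate in MB/s needed to play one stream in real time."""
    return mbit_per_s / 8


def gb_per_hour(mbit_per_s: float) -> float:
    """Storage used by one hour of footage, in decimal gigabytes."""
    return mbyte_per_s(mbit_per_s) * 3600 / 1000


if __name__ == "__main__":
    streams = {
        "XAVC-S 4K (100 Mbit/s)": 100,
        "intermediate codec (hypothetical 600 Mbit/s)": 600,
    }
    for name, rate in streams.items():
        print(f"{name}: {mbyte_per_s(rate):.1f} MB/s, {gb_per_hour(rate):.0f} GB/hour")
```

A single 100 Mbit/s stream needs only 12.5 MB/s, so the drive is rarely the bottleneck for native footage; the much larger intermediates are where drive speed and capacity start to matter.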

8 hours ago, s0ny said:

2a. In terms of priority in component upgrading for scrubbability, I suppose CPU is tops. What are the next few items in the sequence? CPU >> RAM >> SSD >> GPU?

For highly compressed codecs CPU>>GPU>>SSD>RAM(DDR3/DDR4). 

For editing friendly codecs GPU>CPU>>SSD>RAM (DDR3/DDR4)

8 hours ago, s0ny said:

2b. Is there any performance difference between an internal and external Seagate 4TB drive? I think I'd need to have that if I were to use Cineform. The size is insane. 

Depends on the connection. USB2 will kill you. eSATA/Thunderbolt/USB3 are much better and can perform similarly to an internal drive. USB3 can have a small impact on your CPU, so that is why it is last.

8 hours ago, s0ny said:

3. For Cineform 10-bit YUV, does it suffice to use Quality 4 in AME?

I haven't done extensive testing but I believe 4 is very good and hardly distinguishable from 5. 

8 hours ago, s0ny said:

4. Is there a GPU-accelerated transcoder from XAVCS 4k 100mbps to Cineform 10 bit? I checked my GPU load (660Ti, 2Gb) and it was only at 1%, while my CPU is on full load all the way when transcoding 1h+ of 20 clips.

Even though chips in CPUs and GPUs allow hardware acceleration of H264/H265 codecs, unfortunately Premiere does not take advantage of that for either decoding or encoding. So no.

8 hours ago, s0ny said:

5. What is the current workflow for swopping out proxy files with the originals? 

See here:

But keep in mind that with the new version of Premiere the proxy files can be generated automatically, and it should be much easier to switch between the two.


Hi there,

Thanks for the prompt reply. 

Some questions:

1. I checked the benchmarks and saw that your E5-1680v3 is faster than Skylake 6700k for single-core performance. Does that mean even for 6700k or 5820, there is no way to achieve smooth scrubbing on native XAVCS footage?

2. Why is it that for editing-friendly codecs (e.g. Cineform 10bit), the GPU is the most important for scrubbing? I thought the GPU could only accelerate a limited set of effects.

3. Regarding the GPU-accelerated transcoder from XAVCS 4k 100mbps to Cineform 10 bit, I was referring to a 3rd party app other than Media Encoder or Premiere Pro CC. Any that you can recommend?
