Topic: SBSRender processes hanging when rendering 4k maps.

I'm running into multiple renders hanging when attempting to launch them in parallel threads with 4k exports (d3d10pc).
If I try to export 2k textures, it works just fine.

System:
SAT: 2018.3.0
CPU: Xeon E5-2650 v3 x2
GPU: Quadro M4000
Memory: 64 GB
OS: Windows 7 64-bit

It begins normally and starts exporting the maps to disk, but eventually it just stops (about 20 of 189 maps in).
I cannot capture the stdout as the sbsrender process hangs and won't write out to file unless it finishes.

Debugging
- Tried setting memory-budget and cpu-count limits, to no avail.
- Running the sbsrender processes one by one works but takes a long time.
- Once these processes hang, I can restart them using Windows Process Explorer and they continue rendering just fine until something causes them to hang again, at which point restarting them once more gets them going again.
- I've locked each task_pool so that the next pool cannot start until the previous one has completely finished; this gets me a few more exports, but it still ends up hanging.
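
For reference, the manual "notice the hang, restart the process" step can be automated from Python. The following is only a rough sketch, not anything from the toolkit itself: `run_job`, `JobResult` and the timeout values are made-up names and numbers, and `cmd` stands for whatever sbsrender command line is already in use. It reads the child's output through a pipe as it arrives, so whatever was printed before a hang is kept, and it kills and relaunches a job that runs past a wall-clock limit.

```python
import subprocess
import threading
from collections import namedtuple

# Hypothetical result type for this sketch (not part of any toolkit).
JobResult = namedtuple("JobResult", "returncode lines attempts hung")


def _pump(stream, lines):
    # Copy the child's output line by line as soon as it arrives.
    for line in stream:
        lines.append(line.rstrip("\r\n"))


def run_job(cmd, timeout=600.0, retries=2):
    """Run cmd, keep its output, and kill/relaunch it if it exceeds timeout."""
    lines = []
    attempt = 0
    for attempt in range(1, retries + 2):
        lines = []
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        reader = threading.Thread(target=_pump, args=(proc.stdout, lines),
                                  daemon=True)
        reader.start()
        try:
            proc.wait(timeout=timeout)
            hung = False
        except subprocess.TimeoutExpired:
            proc.kill()  # never finished in time: treat it as hung
            proc.wait()
            hung = True
        reader.join(timeout=5)
        if not hung:
            return JobResult(proc.returncode, lines, attempt, False)
    # Every attempt timed out; return the output of the last one.
    return JobResult(None, lines, attempt, True)
```

The timeout has to be longer than the slowest healthy render, otherwise good jobs get killed too. If the child buffers its output internally, the captured lines may still lag behind what the process has actually done.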



Any idea what could be causing the sbsrender processes to hang indefinitely?
Could it be a GPU memory issue?


I've included an attachment with a simplified diagram of how the threading is set up and also of Process Explorer when the sbsrender jobs hang.

Thanks

So we swapped the Quadro M4000 for a GTX 1080 and the 4k renders no longer seem to be hanging.
This isn't an ideal solution of course, but I would appreciate any feedback to help debug this further.

Both have 8 GB of GPU memory, although the 1080 has higher memory bandwidth, so I'm not sure whether this is a driver issue or a specific GPU setting that causes these freezes.


Another update.

I was able to track down where some of these subprocesses hang:
Code:
[INFO][SBSRender]Will load engine located at "C:/Users/user.name/packages/substance_automation_toolkit/2018.3.0/platform-windows/plugins/engines/substance_d3d10pc_blend.dll".
It doesn't appear to recover past this stage.

Debugging
There are two things I was able to do here to prevent it from hanging:
  • Add a 10 second delay between render threads, giving the engine a chance to complete initialization.
  • Switch engine to sse2.
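
The staggered start can be sketched roughly as below. This is an illustration under assumptions, not the actual pipeline code: `StaggeredLauncher` and `run_all` are hypothetical names, and each entry in `cmds` is the full command line for one render (the engine choice is simply part of that command line). Worker threads still run in parallel, but no two processes are launched less than `delay` seconds apart.

```python
import subprocess
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class StaggeredLauncher:
    """Start subprocesses from worker threads, at least `delay` seconds apart."""

    def __init__(self, delay=10.0):
        self.delay = delay
        self._lock = threading.Lock()
        self._next_start = 0.0
        self.start_times = []

    def run(self, cmd):
        with self._lock:
            # Hold the lock while waiting so launches are serialized.
            wait = self._next_start - time.monotonic()
            if wait > 0:
                time.sleep(wait)
            started = time.monotonic()
            self._next_start = started + self.delay
            self.start_times.append(started)
            proc = subprocess.Popen(cmd)
        # The lock is released here, so the renders themselves overlap.
        return proc.wait()


def run_all(cmds, workers=4, delay=10.0):
    """Run every command in a thread pool with staggered start times."""
    launcher = StaggeredLauncher(delay)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(launcher.run, cmds))
    return codes, launcher.start_times
```

Only the launch is serialized, so the cost is at most `delay` seconds per job of added start-up latency. The 10 second default just mirrors the delay mentioned above and is a guess at how long engine initialization takes, not a measured value.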

I'm not sure what specifically between the Quadro, its drivers, and the d3d10 engine is causing the conflict, but this is as far as I've been able to track it down.

Hey Nev!

Thanks for all these details!

We used to run sbsrender with a multi-process pool in pretty much the same context as you, apart from the Quadro card!
We will run some tests with Quadro cards and open a discussion with the engine team in order to resolve the issue.

I'll keep you informed.

Cheers.


Hi Colin,

Just to update you:
I was wrong when I mentioned that adding a delay between render threads caused it to work.
It did work, coincidentally, for a limited time but the issue started up again today.
It would complete render processes for 26 of 27 UDIMs and freeze one of the threads in between.
Same issue as before where that render process wouldn't progress past engine initialization.

Initially I couldn't even get past the 3rd UDIM before all the other render threads froze, so this was definitely a huge improvement, but unfortunately not good enough.

___

After debugging it a bit further today, I updated the drivers for the Quadro and I seem to be getting consistent results (so far) with the renders. It also seems to be using my physical memory a lot more effectively, as I'm creating so many jobs that it maxes out the physical memory in d3d10, whereas it would only consume 50-60% with the old drivers.

I'm currently trying to limit threads based on memory utilization and will do some more testing with other assets to confirm if the driver change did in fact fix the issue.

I'll follow up as well when I have some new info for you.
Thanks

It appears the render freezing problem was somehow related to the older drivers.

I ran a performance test yesterday with a 1.6m poly asset, 57 UDIMs, 7 channels; 399 textures total. It completed successfully, no frozen render processes at all.

Driver versions:
Older drivers: 391.33
New drivers: 431.02

I've attached a dxdiag report if it helps.