Author Topic: [SOLVED] AMD Threadripper  (Read 601 times)

I've been trying to use the SSE2 engine in Substance Designer lately, and it seems to be working in an odd manner.
It uses only about 50% of the CPU's power and always allocates tasks to the same threads (see image below).

FYI, in other DCC applications the CPU works just fine: 100%, all threads, whenever needed.

Is that correct? Am I missing some registry tweaks? Happy to find out more.
Last Edit: August 21, 2020, 02:20:21 pm

Hello Mateusz,

The Substance Engine will very rarely stress the CPU enough to pin all of its cores to a 100% sustained load. System memory will usually be a bottleneck well before a workstation-grade CPU will be, especially when working with high resolution graphs – i.e. 4k and above.
That being said, a high grade workstation CPU is definitely a great asset to have for improved productivity in Substance Designer.

You can find a demonstration of Substance Designer running on an AMD Threadripper 3970X CPU here.
In the demonstration, the 100% usage on all cores occurs when the bitmap outputs are refreshed and trigger a new render in Blender, but not while rendering the graph in Substance Designer.

Best regards,
Last Edit: August 10, 2020, 04:00:04 pm
QA Analyst
Substance Designer Team

Thanks Luca, I appreciate your response!

Would you mind shedding even more light on how SD uses multi-threading and/or parallel processing? There is not much insight into it, apart from the status bars and Task Manager.

Just like you said, it seems to cap out at some point, meaning it will never use the full potential of a "high-core-count" CPU. How exactly is memory bottlenecking it? Is this something you may be looking at in the future, in terms of improvement and better usage of hardware? Maybe something similar to PDG in Houdini? I understand it's a very specific tool (most likely completely different), but it does give you a whole lot of control over how you use the computational resources available.

Last Edit: August 14, 2020, 02:06:14 am

Hello Mateusz,

Regarding the memory bottleneck:

Each node in a graph with a rendered thumbnail stores a full resolution, uncompressed image in memory – the image displayed in the 2D View. Thus, each image can have a significant memory footprint. For instance, a single 8k RGBA image at 32-bit floating point precision has a memory footprint of 1 GB (8192 × 8192 pixels × 4 channels × 4 bytes per channel).

Depending on resolution and bit depth, this memory footprint grows very quickly – each doubling of resolution quadruples it – when rendering large graphs at high resolutions and precisions.
On common system configurations, this can fill the entire memory pool quite fast. At that point, the system must write the overflowing data to disk (specifically, to the pagefile), which is considerably slower than memory. This results in far longer graph rendering times.
Workstation systems are less exposed to this issue, since they are more commonly endowed with a larger memory pool (32 GB and above).
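As a rough back-of-envelope check (this helper is purely illustrative, not Substance code), the footprint of an uncompressed image is simply width × height × channels × bytes per channel:

```python
def image_bytes(width, height, channels=4, bytes_per_channel=4):
    """Uncompressed footprint: width x height x channels x bytes per channel."""
    return width * height * channels * bytes_per_channel

# 8k RGBA at 32-bit float per channel: 8192 * 8192 * 4 * 4 bytes = 1 GiB
print(image_bytes(8192, 8192) / 2**20, "MiB")                       # 1024.0 MiB
# The same 8k RGBA image at 8 bits per channel: a quarter of that
print(image_bytes(8192, 8192, bytes_per_channel=1) / 2**20, "MiB")  # 256.0 MiB
```

At these sizes, a graph with a few dozen rendered thumbnails can exhaust even a 32 GB pool.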

I hope this is clear and informative!

Regarding your other queries, I have reached out to the Substance Engine team for details, and will let you know if I can shed more light on how it leverages the CPU.

Best regards,
QA Analyst
Substance Designer Team

Hello Mateusz

The Substance Engine uses multithreaded code in two situations:
 - the CPU engine performs image computations using MT code. When performing the computations for a node, the output image is split into tiles, and each CPU thread computes a tile of the output. When possible (a succession of 'local' nodes, i.e. each filter only reads a single pixel, at the same location as the pixel it writes), we 'chain' the nodes and each thread processes a full chain; this is more cache-friendly and saves bandwidth.
 - in some conditions, MT code is used when traversing an FxMap graph so that patterns in different branches can be generated in parallel. Depending on inheritance, whether Random is used, and a few other factors, it's not always possible to do so, but we try to use MT code to speed this up when possible.
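Here is a minimal sketch of the tiling and chaining scheme described above (illustrative Python, not engine code; the tile size, the kernel signature, and the `chain` helper are all assumptions for the example):

```python
from concurrent.futures import ThreadPoolExecutor

TILE = 64  # tile edge in pixels; illustrative, not the engine's actual size

def chain(source, *filters):
    """Fuse a run of 'local' filters: each one reads only the pixel it
    writes, so the whole chain can run per pixel, keeping intermediate
    values in registers instead of full-size intermediate images."""
    def kernel(x, y):
        v = source(x, y)
        for f in filters:
            v = f(v)
        return v
    return kernel

def render(width, height, kernel, workers=4):
    """Split the output into tiles; each pool task computes one tile."""
    out = [0.0] * (width * height)

    def do_tile(tx, ty):
        for y in range(ty, min(ty + TILE, height)):
            for x in range(tx, min(tx + TILE, width)):
                out[y * width + x] = kernel(x, y)

    tiles = [(tx, ty) for ty in range(0, height, TILE)
                      for tx in range(0, width, TILE)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda t: do_tile(*t), tiles))
    return out

# A gradient source followed by two 'local' filters, fused into one chain:
node = chain(lambda x, y: x / 255.0,       # source
             lambda v: v * 0.5,            # levels-like local filter
             lambda v: min(1.0, v + 0.1))  # offset-like local filter
img = render(256, 128, node)
```

Because the chained kernel touches each output pixel exactly once, no intermediate buffer is written between nodes, which is where the cache and bandwidth savings come from.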

As to why the CPU usage rarely goes over 50% on a hyperthreaded CPU: we deliberately launch as many threads as there are *physical* cores in the machine. The MT code in the CPU engine is maybe 95% SIMD code (pixel processing is very often parallel in nature), and while hyperthreading makes it look like a core can run 2 threads at once, the physical reality is that the hardware units that execute SIMD operations are not duplicated, so spawning more threads would most likely slow things down slightly. This is in fact what we observed when we added multithreading to the CPU engine several years ago. The impact might be smaller on newer CPUs than it was the last time we looked at this, but I promise you we're not voluntarily leaving 50% of the performance on the table :)
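That sizing policy can be sketched as follows (illustrative only; the halving assumes a 2-way SMT CPU like a Threadripper, and real code would query the actual topology, e.g. via `psutil.cpu_count(logical=False)`):

```python
import os
from concurrent.futures import ThreadPoolExecutor

def engine_worker_count():
    """One worker per *physical* core. The SIMD execution units are shared
    between the two hardware threads of an SMT core, so extra workers would
    only contend for them. Assumes 2-way SMT; a real implementation would
    query the CPU topology instead of halving the logical count."""
    logical = os.cpu_count() or 1
    return max(1, logical // 2)

# On a 64-thread Threadripper 3970X this yields 32 workers, which is why
# Task Manager tops out around 50% overall CPU usage while all physical
# cores are in fact fully busy.
pool = ThreadPoolExecutor(max_workers=engine_worker_count())
pool.shutdown()
```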

I hope that makes things a bit clearer!

Best Regards,
Last Edit: August 24, 2020, 11:39:32 am

Luca and Eric - thank you for such a thorough explanation. It certainly clears things up :)