GPU driver crashes after becoming unresponsive (Windows)

Last update: A. Peternier, 16/11/2016

1. GPU used both for graphics and compute

In most cases, computers have a single GPU that is connected to a display and used by the operating system for graphics rendering. When such a GPU is OpenCL-enabled and meets SARscape’s minimum requirements, it can also be used to speed up processing.

2. Timeout Detection and Recovery (TDR) problem

When a single GPU is connected to a display and is, at the same time, used as the OpenCL device for intensive SARscape processing, the GPU might become so heavily loaded that graphical operations stutter or freeze. If the GPU stays unresponsive beyond a specific threshold (2 seconds by default), the operating system resets the GPU driver with an error (e.g., “Display driver stopped responding and has recovered”).
This problem occurs under Windows when a GPU is set to use the Windows Display Driver Model (WDDM) mode. With more recent versions of Windows, the same problem also affects GPUs that are in WDDM mode but not connected to any display.
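
To see which GPUs are affected, it helps to know the driver model each one is running. The following is a minimal sketch, assuming Nvidia hardware and that the nvidia-smi tool is on the PATH; it relies on the driver_model.current query field, and the function names (parse_driver_models, wddm_gpus, query_driver_models) are hypothetical helpers, not part of SARscape.

```python
import csv
import io
import subprocess

# Assumption: nvidia-smi is installed and reports the driver model (Windows only).
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,name,driver_model.current",
    "--format=csv,noheader",
]


def parse_driver_models(csv_text):
    """Parse 'index, name, driver model' rows into a list of dicts."""
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    gpus = []
    for row in rows:
        if len(row) != 3:
            continue  # skip blank or unexpected lines
        index, name, model = (field.strip() for field in row)
        gpus.append({"index": int(index), "name": name, "driver_model": model})
    return gpus


def wddm_gpus(gpus):
    """Return only the GPUs running in WDDM mode (subject to TDR resets)."""
    return [g for g in gpus if g["driver_model"].upper() == "WDDM"]


def query_driver_models():
    """Run nvidia-smi and parse its output; raises if the tool is missing or fails."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_driver_models(result.stdout)
```

On a Windows machine, wddm_gpus(query_driver_models()) would list the GPUs that the TDR mechanism can reset. The parsing step is kept separate so that it can be checked against saved output without a GPU present.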

3. Solutions

To eliminate this problem or limit its impact, please follow one of the proposed workarounds:

  • If possible, use two different GPUs: one for rendering, one for processing. If you have Nvidia hardware, see whether the compute-only GPU can be switched to Tesla Compute Cluster (TCC) mode.
  • Increase the TDR timeout. This is done by editing a key in the Windows registry as suggested here.
  • Use SARscape under Linux.
  • Use a CPU-only OpenCL runtime for the processing of large datasets that trigger the TDR problem.
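
As an illustration of the registry workaround, the sketch below generates the text of a .reg file that raises the TDR timeout. It assumes the relevant value is TdrDelay (a REG_DWORD expressed in seconds) under the GraphicsDrivers key, as described in Microsoft's TDR documentation; the helper name tdr_reg_file and the 10-second value are hypothetical choices for the example, not a SARscape recommendation.

```python
# Registry key that holds the TDR settings (assumption: standard Windows location).
GRAPHICS_DRIVERS_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\GraphicsDrivers"
)


def tdr_reg_file(delay_seconds):
    """Return the text of a .reg file that sets TdrDelay to delay_seconds."""
    if not isinstance(delay_seconds, int) or delay_seconds <= 0:
        raise ValueError("delay_seconds must be a positive integer")
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{GRAPHICS_DRIVERS_KEY}]",
        # .reg files store DWORD values as 8 hexadecimal digits.
        f'"TdrDelay"=dword:{delay_seconds:08x}',
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Example: raise the timeout from the default 2 seconds to 10 seconds.
    print(tdr_reg_file(10))
```

The printed text can be saved to a file with a .reg extension and imported with administrator rights; a reboot is needed before the new timeout takes effect. Generating the file rather than writing to the registry directly leaves the change easy to review before it is applied.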