Operating System: Windows 11
Filename: cuda_12.2.0_536.25_windows.exe
Sometimes the latest version of a program can cause issues when installed on older devices or on devices running an older version of the operating system.
Software makers usually fix these issues, but it can take them some time. In the meantime, you can download and install an older version of NVIDIA CUDA Toolkit 12.2.0 (for Windows 11).
For those interested in downloading the most recent release of NVIDIA CUDA Toolkit or reading our review, simply click here.
All old versions distributed on our website are completely virus-free and available for download at no cost.
We would love to hear from you
If you have any questions or ideas that you want to share with us, head over to our Contact page and let us know. We value your feedback!
What's new in this version:
New Features:
- This release introduces Heterogeneous Memory Management (HMM), allowing seamless sharing of data between host memory and accelerator devices. HMM is supported on Linux only and requires a recent kernel (6.1.24+ or 6.2.11+). A minimal usage sketch follows this feature list.
- HMM requires the use of NVIDIA's GPU Open Kernel Modules driver.
As this is the first release of HMM, some limitations exist:
- GPU atomic operations on file-backed memory are not yet supported
- Arm CPUs are not yet supported
- HugeTLBfs pages are not yet supported on HMM (this is an uncommon scenario)
- The fork() system call is not fully supported yet when attempting to share GPU-accessible memory between parent and child processes
- HMM is not yet fully optimized, and may perform slower than programs using cudaMalloc(), cudaMallocManaged(), or other existing CUDA memory management APIs. The performance of programs not using HMM will not be affected.
- The Lazy Loading feature (introduced in CUDA 11.7) is now enabled by default on Linux with the 535 driver. To disable it on Linux, set the environment variable CUDA_MODULE_LOADING=EAGER before launch. Default enablement on Windows will come in a future CUDA driver release; to enable the feature on Windows, set CUDA_MODULE_LOADING=LAZY before launch. (A query sketch follows this feature list.)
- Host NUMA memory allocation: Allocate CPU memory targeting a specific NUMA node using either the CUDA virtual memory management APIs or the CUDA stream-ordered memory allocator. Applications must ensure that device accesses to pointers backed by host allocations from these APIs happen only after accessibility for the memory has been explicitly requested on the accessing device. It is undefined behavior to access these host allocations from a device without accessibility for the address range, regardless of whether the device supports pageable memory access. (A usage sketch follows this feature list.)
- Added per-client priority mapping at runtime for the CUDA Multi-Process Service (MPS). This allows processes running under MPS to arbitrate priority at a coarse-grained level without changing application code.
- The new environment variable CUDA_MPS_CLIENT_PRIORITY accepts two values: 0 (NORMAL priority) and 1 (BELOW_NORMAL priority).
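To make the HMM item above concrete, here is a minimal sketch (not part of the release notes): a kernel dereferencing memory that came from an ordinary malloc(), with no cudaMalloc, cudaMallocManaged, or explicit copies. It assumes a system that meets the HMM requirements listed above (Linux kernel 6.1.24+/6.2.11+, the open GPU kernel modules, a CUDA 12.2 driver); the file name is illustrative.

```cpp
// hmm_sketch.cu -- illustrative only (not from the release notes).
// Assumes an HMM-capable system; build with: nvcc hmm_sketch.cu
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;   // GPU dereferences plain malloc'd host memory
}

int main() {
    const int n = 1 << 20;
    // Ordinary system allocation: no cudaMalloc, cudaMallocManaged, or cudaMemcpy.
    float *data = static_cast<float *>(std::malloc(n * sizeof(float)));
    for (int i = 0; i < n; ++i) data[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(data, n, 2.0f);
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("data[0] = %f (expected 2.0)\n", data[0]);
    std::free(data);
    return 0;
}
```

On systems without HMM support, the same pattern requires cudaMallocManaged() or explicit copies, which is why the release notes call out the kernel and driver prerequisites.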
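For the Lazy Loading item above, the short driver-API sketch below (illustrative, not from the release notes) queries which loading mode the driver actually picked for the process; cuModuleGetLoadingMode has been available since lazy loading was introduced in CUDA 11.7.

```cpp
// loading_mode_sketch.cpp -- illustrative only; link against the CUDA driver (-lcuda).
#include <cuda.h>
#include <cstdio>

int main() {
    if (cuInit(0) != CUDA_SUCCESS) { std::fprintf(stderr, "cuInit failed\n"); return 1; }

    // Reports whether modules are loaded lazily or eagerly for this process.
    // The mode is controlled by CUDA_MODULE_LOADING, set before launch.
    CUmoduleLoadingMode mode;
    if (cuModuleGetLoadingMode(&mode) != CUDA_SUCCESS) {
        std::fprintf(stderr, "cuModuleGetLoadingMode failed\n");
        return 1;
    }
    std::printf("module loading mode: %s\n",
                mode == CU_MODULE_LAZY_LOADING ? "LAZY" : "EAGER");
    return 0;
}
```

Launching the same binary as CUDA_MODULE_LOADING=EAGER ./a.out versus leaving the variable unset should show the two modes on a Linux 535 driver.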
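The host NUMA item above maps onto the driver-side virtual memory management calls. The sketch below is illustrative only and assumes the CUDA 12.2 driver API spellings (CU_MEM_LOCATION_TYPE_HOST_NUMA in particular); NUMA node 0 and device 0 are placeholder choices. It highlights the explicit cuMemSetAccess step that the note says must precede any device access.

```cpp
// host_numa_sketch.cpp -- illustrative only; link against the CUDA driver (-lcuda).
#include <cuda.h>
#include <cstdio>

#define CHECK(call) do { CUresult r = (call); if (r != CUDA_SUCCESS) { \
    const char *s; cuGetErrorString(r, &s); \
    std::fprintf(stderr, "%s failed: %s\n", #call, s); return 1; } } while (0)

int main() {
    CHECK(cuInit(0));
    CUdevice dev; CHECK(cuDeviceGet(&dev, 0));
    CUcontext ctx; CHECK(cuCtxCreate(&ctx, 0, dev));

    // Describe a physical allocation that lives on host NUMA node 0.
    CUmemAllocationProp prop = {};
    prop.type          = CU_MEM_ALLOCATION_TYPE_PINNED;
    prop.location.type = CU_MEM_LOCATION_TYPE_HOST_NUMA;  // new in CUDA 12.2
    prop.location.id   = 0;                               // target NUMA node

    size_t gran = 0;
    CHECK(cuMemGetAllocationGranularity(&gran, &prop,
                                        CU_MEM_ALLOC_GRANULARITY_MINIMUM));
    size_t size = gran;  // one granule is enough for the sketch

    CUmemGenericAllocationHandle handle;
    CHECK(cuMemCreate(&handle, size, &prop, 0));

    // Reserve a VA range and map the physical allocation into it.
    CUdeviceptr ptr;
    CHECK(cuMemAddressReserve(&ptr, size, gran, 0, 0));
    CHECK(cuMemMap(ptr, size, 0, handle, 0));

    // Required step: explicitly grant the GPU access before any device
    // dereference; touching the range from the device without this is UB.
    CUmemAccessDesc access = {};
    access.location.type = CU_MEM_LOCATION_TYPE_DEVICE;
    access.location.id   = dev;
    access.flags         = CU_MEM_ACCESS_FLAGS_PROT_READWRITE;
    CHECK(cuMemSetAccess(ptr, size, &access, 1));

    // ... launch kernels that read/write ptr here ...

    CHECK(cuMemUnmap(ptr, size));
    CHECK(cuMemAddressFree(ptr, size));
    CHECK(cuMemRelease(handle));
    CHECK(cuCtxDestroy(ctx));
    return 0;
}
```

The stream-ordered path is roughly analogous: a memory pool created with a host-NUMA location in its CUmemPoolProps, plus cuMemPoolSetAccess for each device that will touch the allocations.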
CUDA Compilers:
- LibNVVM samples have been moved out of the toolkit and made publicly available on GitHub as part of the NVIDIA/cuda-samples project. Similarly, the nvvmir-samples have been moved from the nvidia-compiler-sdk project on GitHub to the new location of the libNVVM samples in the NVIDIA/cuda-samples project.
Fixed:
- Resolved potential soft lock-ups around rm_run_nano_timer_callback(). A Linux kernel device driver API used for timer management in the Linux kernel interface of the NVIDIA GPU driver was susceptible to a race condition under multi-GPU configurations.
- Fixed potential GSP-RM hang in kernel_resolve_address().
- Removed potential GPUDirect RDMA driver crash in nvidia_p2p_put_pages(). The legacy non-persistent memory APIs allowed a third-party driver to invoke nvidia_p2p_put_pages() with a stale page_table pointer that had already been freed by the RM callback as part of the process shutdown sequence. This behavior was broken when persistent memory support was added to the legacy nvidia_p2p APIs. The issue is resolved by providing new APIs, nvidia_p2p_get/put_pages_persistent, for persistent memory, which restores the original behavior of the legacy APIs for non-persistent memory. This is essentially an API change, so although nvidia-peermem is updated accordingly, external consumers of persistent memory mappings will need to switch to the new dedicated APIs.
- Resolved an issue in watchcat syscall.
- Fixed potential incorrect results in optimized code under high register pressure. NVIDIA has found that under certain rare conditions, a register spilling optimization in PTXAS could result in incorrect compilation results. This issue is fixed for offline compilation (non-JIT) in the CUDA 12.2 release and will be fixed for JIT compilation in the next enterprise driver update.
- NVIDIA believes this issue to be extremely rare, and applications relying on JIT that are working successfully should not be affected.