Powerful and reliable programming model and computing toolkit

NVIDIA CUDA Toolkit

NVIDIA CUDA Toolkit 12.0.0 (for Windows 10)

  -  3.4 GB  -  Freeware

Sometimes the latest version of a piece of software can cause issues when installed on older devices or on devices running an older version of the operating system.

Software makers usually fix these issues, but it can take them some time. In the meantime, you can download and install an older release such as NVIDIA CUDA Toolkit 12.0.0 (for Windows 10).




All old versions distributed on our website are completely virus-free and available for download at no cost.



What's new in this version:

General CUDA:
- CUDA 12.0 exposes programmable functionality for many features of the Hopper and Ada Lovelace architectures:

Many tensor operations now available via public PTX:
- TMA operations
- TMA bulk operations
- 32x Ultra xMMA (including FP8/FP16)
- Membar domains in Hopper, controlled via launch parameters
- Smem sync unit PTX and C++ API support
- Introduced C intrinsics for Cooperative Grid Array (CGA) relaxed barrier support
- Programmatic L2 Cache to SM multicast (Hopper-only)
- Public PTX for SIMT collectives - elect_one
- Genomics/DPX instructions now available for Hopper GPUs to provide faster combined-math arithmetic operations (three-way max, fused add+max, etc.)
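
As an illustration of the kind of combined-math operation DPX accelerates, here is a minimal device-side sketch. The helper name max3 is hypothetical, and the __vimax3_s32 intrinsic mentioned in the comment is our assumption of one of the new DPX intrinsic names; on earlier GPUs the same expression simply compiles to ordinary instructions.

    #include <cuda_runtime.h>

    // Hypothetical sketch: a three-way integer max, the pattern DPX accelerates.
    // On Hopper, CUDA 12.0 can map this to a single DPX instruction; dedicated
    // intrinsics (e.g. __vimax3_s32, name assumed here) are also exposed.
    __device__ int max3(int a, int b, int c)
    {
        return max(max(a, b), c);
    }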

Enhancements to the CUDA graphs API:
- You can now schedule graph launches from GPU device-side kernels by calling built-in functions, so user code running in a kernel can dynamically launch graphs, greatly increasing the flexibility of CUDA graphs (a sketch follows this list)
- The cudaGraphInstantiate() API has been refactored to remove unused parameters
- Added the ability to use virtual memory management (VMM) APIs such as cuMemCreate() with GPUs masked by CUDA_VISIBLE_DEVICES
- Application and library developers can now programmatically update the priority of CUDA streams
- CUDA 12.0 adds support for revamped CUDA Dynamic Parallelism APIs, offering substantial performance improvements vs. the legacy CUDA Dynamic Parallelism APIs
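
As a rough sketch of the first two items above (the simplified cudaGraphInstantiate() signature and device-side graph launch), assuming a graph and stream already exist: the flag and stream-handle names follow our reading of the CUDA 12.0 runtime API, the launching kernel will typically need to be built with relocatable device code, and additional constraints on where device launches may occur are described in the CUDA programming guide, so treat this as illustrative rather than definitive.

    #include <cuda_runtime.h>

    // Device side: a kernel that schedules an already-uploaded device graph
    // as a tail launch (subject to the constraints in the programming guide).
    __global__ void scheduler(cudaGraphExec_t exec, int launchIt)
    {
        if (launchIt)
            cudaGraphLaunch(exec, cudaStreamGraphTailLaunch);
    }

    // Host side: the refactored 3-parameter cudaGraphInstantiate(), with the
    // flag that marks the graph as launchable from the device, followed by an
    // upload so the executable graph is resident on the GPU.
    void prepare(cudaGraph_t graph, cudaStream_t stream, cudaGraphExec_t *exec)
    {
        cudaGraphInstantiate(exec, graph, cudaGraphInstantiateFlagDeviceLaunch);
        cudaGraphUpload(*exec, stream);
    }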

Added new APIs to obtain unique stream and context IDs from user-provided objects (see the sketch after this list):
- cuStreamGetId(CUstream hStream, unsigned long long *streamId)
- cuCtxGetId(CUcontext ctx, unsigned long long *ctxId)
- Added support for read-only cuMemSetAccess() flag CU_MEM_ACCESS_FLAGS_PROT_READ
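
A minimal driver-API sketch of the two new ID queries (error checking omitted for brevity):

    #include <cuda.h>
    #include <stdio.h>

    int main(void)
    {
        CUdevice dev;
        CUcontext ctx;
        CUstream stream;
        unsigned long long ctxId, streamId;

        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuCtxCreate(&ctx, 0, dev);
        cuStreamCreate(&stream, CU_STREAM_DEFAULT);

        cuCtxGetId(ctx, &ctxId);           // new in CUDA 12.0
        cuStreamGetId(stream, &streamId);  // new in CUDA 12.0
        printf("context ID: %llu, stream ID: %llu\n", ctxId, streamId);

        cuStreamDestroy(stream);
        cuCtxDestroy(ctx);
        return 0;
    }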

CUDA Compilers:
- JIT LTO support is now officially part of the CUDA Toolkit through a separate nvJitLink library; a technical deep-dive blog will go into more detail. Note that the earlier implementation of this feature has been deprecated. Refer to the Deprecation/Dropped Features section of the release notes for details.
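
As a rough, hedged sketch of what runtime LTO linking through nvJitLink looks like: the function and enum names below follow the nvJitLink documentation as we understand it and should be verified there, and ltoIR/ltoIRSize stand in for LTO-IR produced elsewhere (for example with nvcc -dlto). Error checking is omitted.

    #include <nvJitLink.h>
    #include <stdlib.h>

    // Link previously generated LTO-IR into a cubin at run time (sketch).
    void linkAtRuntime(const void *ltoIR, size_t ltoIRSize)
    {
        nvJitLinkHandle handle;
        const char *opts[] = { "-arch=sm_90" };
        nvJitLinkCreate(&handle, 1, opts);
        nvJitLinkAddData(handle, NVJITLINK_INPUT_LTOIR, ltoIR, ltoIRSize, "module_a");
        nvJitLinkComplete(handle);

        size_t cubinSize;
        nvJitLinkGetLinkedCubinSize(handle, &cubinSize);
        void *cubin = malloc(cubinSize);
        nvJitLinkGetLinkedCubin(handle, cubin);   // load with cuModuleLoadData()
        nvJitLinkDestroy(&handle);
        free(cubin);
    }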

New host compiler support:
- GCC 12.1 (Official) and 12.2.1 (Experimental)
- VS 2022 17.4 Preview 3 fixes compiler errors mentioning an internal function std::_Bit_cast by using CUDA’s support for __builtin_bit_cast
- NVCC and NVRTC now support the C++20 dialect. Most language features are available in both host and device code; some, such as coroutines, are not supported in device code, and modules are not supported in either host or device code. Host compiler minimum versions: GCC 10, Clang 11, VS 2022, Arm C/C++ 22.x. Refer to the individual host compiler documentation for other feature limitations. Note that a compilation issue in C++20 mode with the <complex> header mentioning an internal function std::_Bit_cast is resolved in VS 2022 17.4.
- The NVRTC default C++ dialect changed from C++14 to C++17. Refer to the ISO C++ standard for the feature set and for compatibility between the dialects (an NVRTC sketch follows this list).
- NVVM IR Update: with CUDA 12.0 we are releasing NVVM IR 2.0, which is incompatible with the NVVM IR 1.x accepted by the libNVVM compiler in prior CUDA Toolkit releases. Users of the libNVVM compiler in the CUDA 12.0 toolkit must generate NVVM IR 2.0.
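
To illustrate the dialect changes, here is a minimal NVRTC sketch (error checking omitted). It requests C++20 explicitly; with no --std option, NVRTC in CUDA 12.0 now compiles as C++17 rather than C++14.

    #include <nvrtc.h>

    void compileKernel(void)
    {
        const char *src = "extern \"C\" __global__ void k(float *p) { *p += 1.0f; }";

        nvrtcProgram prog;
        nvrtcCreateProgram(&prog, src, "k.cu", 0, NULL, NULL);

        // Request C++20 explicitly; omit the option to get the new C++17
        // default, or pass "--std=c++14" to keep the pre-12.0 behavior.
        const char *opts[] = { "--std=c++20" };
        nvrtcCompileProgram(prog, 1, opts);
        nvrtcDestroyProgram(&prog);
    }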