The NVIDIA L40S is a high-performance, data-centre-class GPU built on the Ada Lovelace architecture. It is designed for universal workloads, from generative AI (inference and training) and large language models (LLMs) to 3D graphics, real-time rendering, and video applications. With 48 GB of GDDR6 memory and a broad precision range (FP32 down to FP8 and INT8/INT4), it offers an excellent balance of AI compute, graphics acceleration and media/streaming capabilities.
It is ideal for enterprises and service providers looking for a single-GPU solution to support multimodal AI workloads, visualization, virtual workstations, cloud graphics and inference deployments.
Here is a comprehensive summary of the main technical specifications:
| Specification | Value |
| --- | --- |
| GPU Architecture | NVIDIA Ada Lovelace |
| Memory | 48 GB GDDR6 with ECC |
| Memory Bandwidth | 864 GB/s |
| CUDA® Cores | 18,176 |
| RT Cores (3rd Gen) | 142 |
| Tensor Cores (4th Gen) | 568 |
| Peak FP32 Performance | ~91.6 TFLOPS |
| Tensor Performance (TF32 / FP16 / FP8) | Up to ~366 TFLOPS (TF32) / ~733 TFLOPS (FP16) / ~1,466 TFLOPS (FP8), with sparsity |
| RT Core Performance | ~209–212 TFLOPS |
| Interface | PCI Express Gen4 x16 |
| Form Factor | Dual-slot, full-height, full-length (FHFL); ≈4.4″ H × 10.5″ L |
| Max Power Consumption | 350 W |
| Cooling Solution | Passive (relies on server chassis airflow) |
| Display Outputs | 4 × DisplayPort 1.4a (typically disabled by default in server mode) |
| Virtual GPU (vGPU) Support | Yes, for virtual workstations and virtualised environments |
| Multi-Instance GPU (MIG) | Not supported |
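As a sanity check, the headline memory-bandwidth and FP32 figures in the table follow from first principles. The sketch below recomputes them; note that the 384-bit bus width, 18 Gbps GDDR6 data rate, and ~2,520 MHz boost clock are assumptions commonly quoted for this card, not values stated in the table above.

```python
# Back-of-envelope check of the L40S headline numbers.
# Assumed (not from the table): 384-bit bus, 18 Gbps GDDR6, 2,520 MHz boost.
BUS_WIDTH_BITS = 384        # assumed memory bus width
GDDR6_GBPS_PER_PIN = 18     # assumed effective data rate per pin
CUDA_CORES = 18_176         # from the table
BOOST_CLOCK_GHZ = 2.52      # assumed boost clock

# Bandwidth = (bus width in bytes) x (per-pin data rate)
bandwidth_gbs = (BUS_WIDTH_BITS / 8) * GDDR6_GBPS_PER_PIN
print(f"Memory bandwidth: {bandwidth_gbs:.0f} GB/s")   # 864 GB/s

# FP32 throughput = cores x 2 ops/cycle (fused multiply-add) x clock
fp32_tflops = CUDA_CORES * 2 * BOOST_CLOCK_GHZ / 1_000
print(f"Peak FP32: {fp32_tflops:.1f} TFLOPS")          # ~91.6 TFLOPS
```

Both results match the table, which is a useful cross-check when comparing spec sheets from different vendors.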