June 17, 2025 - NVSHMEM and GDRCopy support

NVSHMEM and GDRCopy support now available for enhanced GPU communication and performance

Overview

SUNK now supports NVSHMEM and GDRCopy. NVSHMEM provides high-performance, GPU-initiated communication between GPUs, and GDRCopy provides low-latency CPU access to GPU memory. Together they can reduce communication latency and CPU overhead for distributed GPU workloads and multi-GPU applications.

NVSHMEM support

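NVSHMEM (NVIDIA SHMEM) is a parallel programming interface based on OpenSHMEM. It creates a partitioned global address space (PGAS) that spans the memory of multiple GPUs, so CUDA kernels can read and write data on other GPUs, within a node or across nodes, using fine-grained GPU-initiated operations instead of handing control back to the CPU.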
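The symmetric-memory model behind NVSHMEM can be illustrated without a GPU. The sketch below is a pure-Python simulation, not NVSHMEM itself: each processing element (PE) holds its own copy of a symmetric buffer, and each PE performs a one-sided put of its ID into the buffer of the next PE in a ring. In NVSHMEM this roughly corresponds to a kernel calling nvshmem_int_p(dest, mype, peer) on a buffer allocated with nvshmem_malloc, followed by a barrier. The SymmetricHeap class and its methods are hypothetical names used only for illustration.

```python
"""Pure-Python simulation of NVSHMEM-style symmetric memory (not real NVSHMEM).

Each processing element (PE) owns one copy of every symmetric allocation, so
the same symbolic address refers to corresponding memory on every PE.
"""
from typing import Dict, List


class SymmetricHeap:
    """Hypothetical stand-in for NVSHMEM's symmetric heap across n_pes PEs."""

    def __init__(self, n_pes: int) -> None:
        self.n_pes = n_pes
        # name -> one value per PE (list index = PE id)
        self._buffers: Dict[str, List[int]] = {}

    def malloc(self, name: str) -> str:
        # Collective allocation: every PE gets a buffer under the same name,
        # analogous to nvshmem_malloc returning a symmetric address.
        self._buffers[name] = [0] * self.n_pes
        return name

    def put(self, name: str, value: int, pe: int) -> None:
        # One-sided write into the target PE's copy; the target PE does not
        # take part, analogous to nvshmem_int_p(dest, value, pe).
        self._buffers[name][pe] = value

    def local(self, name: str, pe: int) -> int:
        # Read the given PE's own copy of the symmetric buffer.
        return self._buffers[name][pe]


def ring_shift(n_pes: int) -> List[int]:
    """Each PE writes its ID to the next PE; returns what each PE received."""
    heap = SymmetricHeap(n_pes)
    dest = heap.malloc("destination")
    # In NVSHMEM every PE runs this concurrently inside a kernel; here we loop.
    for mype in range(n_pes):
        peer = (mype + 1) % n_pes
        heap.put(dest, mype, peer)
    # Real code needs a barrier (e.g. nvshmem_barrier_all) here so all puts
    # complete before any PE reads; the sequential loop already ensures that.
    return [heap.local(dest, pe) for pe in range(n_pes)]


if __name__ == "__main__":
    print(ring_shift(4))  # [3, 0, 1, 2]
```

Each PE ends up holding the ID of its left neighbor, which is the pattern the well-known NVSHMEM "simple shift" example demonstrates on real GPUs.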

Key features

Feature	Description
High-performance communication	GPU-initiated data transfers between GPUs, both within a node and across nodes
Memory ordering	Explicit ordering and completion operations, such as nvshmem_fence and nvshmem_quiet
Scalability	Efficient communication patterns for large-scale deployments
Ease of use	Programming model based on the OpenSHMEM standard

Use cases

NVSHMEM is ideal for:

Distributed training: Large-scale machine learning model training
Scientific computing: High-performance computing applications
Data analytics: GPU-accelerated data processing workflows
Multi-node applications: Applications requiring GPU communication across nodes

GDRCopy support

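GDRCopy (GPUDirect RDMA Copy) is a low-latency GPU memory copy library based on NVIDIA GPUDirect RDMA technology. It uses the GPUDirect RDMA APIs to map GPU memory into the CPU's address space, so the CPU can read and write device memory directly. Because this avoids the setup overhead of a cudaMemcpy call, it is well suited to small, latency-sensitive transfers, and communication libraries such as NVSHMEM and UCX can use it to accelerate their own operations.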

Key features

Feature	Description
Direct memory access	Direct CPU reads and writes to GPU memory through a memory mapping
Reduced latency	Avoids the per-call overhead of cudaMemcpy for small transfers
Efficient small transfers	Most effective for small messages; cudaMemcpy typically remains faster for large copies
Lower software overhead	Reduces driver involvement in each transfer
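The trade-off between fixed overhead and bandwidth can be made concrete with the standard linear cost model t(n) = α + n/β, where α is the fixed per-transfer overhead and β is the sustained bandwidth. A path with a lower α, such as a CPU-mapped copy, is faster below the crossover size n* = (α₂ - α₁) / (1/β₁ - 1/β₂), and a path with a higher β is faster above it. The sketch below computes that crossover. The CopyPath name and the parameter values in the usage are hypothetical placeholders, not measurements of GDRCopy or cudaMemcpy; substitute numbers measured on your own nodes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CopyPath:
    """One transfer path under the linear cost model t(n) = alpha + n / beta."""
    name: str
    alpha_s: float           # fixed per-transfer overhead, in seconds
    beta_bytes_per_s: float  # sustained bandwidth, in bytes per second

    def time_s(self, n_bytes: float) -> float:
        return self.alpha_s + n_bytes / self.beta_bytes_per_s


def crossover_bytes(a: CopyPath, b: CopyPath) -> Optional[float]:
    """Size where a and b take equal time, or None if they never cross at n > 0."""
    slope_diff = 1.0 / a.beta_bytes_per_s - 1.0 / b.beta_bytes_per_s
    if slope_diff == 0.0:
        return None  # parallel lines: one path is always faster (or they tie)
    n = (b.alpha_s - a.alpha_s) / slope_diff
    return n if n > 0 else None


def faster_path(a: CopyPath, b: CopyPath, n_bytes: float) -> CopyPath:
    """Return whichever path the model predicts is faster for n_bytes."""
    return a if a.time_s(n_bytes) <= b.time_s(n_bytes) else b


if __name__ == "__main__":
    # Hypothetical parameters for illustration only, not measured values.
    mapped = CopyPath("low-overhead mapped copy", alpha_s=1e-6, beta_bytes_per_s=2e9)
    dma = CopyPath("high-bandwidth DMA copy", alpha_s=1e-5, beta_bytes_per_s=2e10)
    print(f"crossover at about {crossover_bytes(mapped, dma):.0f} bytes")
    for size in (64, 4096, 1 << 20):
        print(f"{size:>8} bytes -> {faster_path(mapped, dma, size).name}")
```

The model ignores effects such as PCIe topology and caching, so treat the crossover as a starting point for measurement rather than a fixed threshold.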

Use cases

GDRCopy is beneficial for:

High-frequency trading: Low-latency data processing
Real-time analytics: Fast data ingestion and processing
Streaming applications: Continuous data processing workflows
Network-intensive workloads: Applications with high network I/O requirements

Performance benefits

Communication performance

Reduced latency: Direct GPU-to-GPU communication
Higher bandwidth: Optimized data transfer paths
Better scalability: Efficient communication patterns for large clusters
CPU offloading: Reduced CPU overhead in communication operations

Application performance

Faster training: Accelerated distributed machine learning
Improved throughput: Higher data processing rates
Better resource utilization: More efficient use of GPU resources
Enhanced scalability: Better performance at scale

Configuration

Enabling NVSHMEM and GDRCopy

To enable these features in your SUNK deployment:

Update your SUNK Helm values to include NVSHMEM and GDRCopy support
Configure the necessary network settings
Deploy the updated SUNK configuration
Verify the features are working correctly
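For the verification step, a quick node-level check can catch missing prerequisites before a job fails at startup. The script below is a minimal sketch under assumptions you should confirm for your environment: the GDRCopy kernel driver is loaded as the gdrdrv module and exposes /dev/gdrdrv; GPUDirect RDMA is provided by the nvidia_peermem module (newer stacks may use DMA-BUF instead, in which case that check can fail harmlessly); and RDMA devices appear under /sys/class/infiniband. In a SUNK deployment, run it on the host or inside a Slurm job whose container has these paths mounted, since a device present on the host is not necessarily visible inside the pod. The preflight function name is hypothetical.

```python
"""Hypothetical node preflight check for NVSHMEM/GDRCopy prerequisites.

Assumptions (verify on your own nodes): GDRCopy's kernel driver is the gdrdrv
module and exposes /dev/gdrdrv; GPUDirect RDMA uses the nvidia_peermem module;
RDMA devices are listed under /sys/class/infiniband.
"""
import os
from typing import Dict


def preflight(root: str = "/") -> Dict[str, bool]:
    """Return a pass/fail map of prerequisite checks rooted at `root`."""

    def path(*parts: str) -> str:
        return os.path.join(root, *parts)

    ib_dir = path("sys", "class", "infiniband")
    return {
        # Loaded kernel modules appear as directories under /sys/module.
        "gdrdrv_module_loaded": os.path.isdir(path("sys", "module", "gdrdrv")),
        # Device node that GDRCopy's user-space library opens.
        "gdrdrv_device_present": os.path.exists(path("dev", "gdrdrv")),
        # Peer-memory module used by GPUDirect RDMA on many stacks.
        "nvidia_peermem_loaded": os.path.isdir(path("sys", "module", "nvidia_peermem")),
        # At least one RDMA-capable device, e.g. an InfiniBand HCA.
        "rdma_devices_present": os.path.isdir(ib_dir) and len(os.listdir(ib_dir)) > 0,
    }


if __name__ == "__main__":
    for check, ok in preflight().items():
        print(f"{'PASS' if ok else 'FAIL'}  {check}")
```

Taking the filesystem root as a parameter keeps the checks testable against a fake directory tree and lets you point the script at a host mount such as /host inside a container.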

Requirements

Compatible NVIDIA GPUs with NVSHMEM and GDRCopy support
Appropriate network infrastructure (InfiniBand recommended)
Updated SUNK version with support for these features

Documentation

For detailed setup and configuration instructions, see:

Migration notes

Existing deployments

Existing SUNK deployments will continue to work without changes
NVSHMEM and GDRCopy are optional features that can be enabled as needed
No migration is required for existing workloads

Planning considerations

When planning to use NVSHMEM and GDRCopy:

Hardware compatibility: Ensure your GPUs support these features
Network requirements: Verify network infrastructure compatibility
Application compatibility: Check if your applications can benefit from these features
Performance testing: Test performance improvements in your specific use case
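Performance testing is most informative as a controlled A/B comparison: the same workload, warm-up, and statistics, run once as a baseline and once with NVSHMEM or GDRCopy in use. The harness below is a minimal, framework-agnostic sketch. The lambdas in the usage are hypothetical stand-ins for one iteration of your own workload, and because CUDA work is asynchronous, each timed callable should synchronize the GPU before returning or the timings will understate the real cost.

```python
"""Minimal A/B timing harness for comparing a baseline run with a candidate run."""
import math
import statistics
import time
from typing import Callable, Dict, List


def time_runs(fn: Callable[[], object], iters: int = 100, warmup: int = 10) -> List[float]:
    """Call fn warmup times untimed, then return iters wall-clock samples in seconds."""
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()  # must block until all (GPU) work for this iteration is done
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples: List[float]) -> Dict[str, float]:
    """Median and nearest-rank 99th percentile of the samples."""
    ordered = sorted(samples)
    p99_index = max(0, math.ceil(0.99 * len(ordered)) - 1)
    return {"median_s": statistics.median(ordered), "p99_s": ordered[p99_index]}


def speedup(baseline: List[float], candidate: List[float]) -> float:
    """Ratio of median times; values above 1.0 mean the candidate is faster."""
    return statistics.median(baseline) / statistics.median(candidate)


if __name__ == "__main__":
    # Hypothetical stand-ins: replace with your workload without and with the
    # features enabled.
    baseline = time_runs(lambda: sum(range(20_000)))
    candidate = time_runs(lambda: sum(range(10_000)))
    print("baseline: ", summarize(baseline))
    print("candidate:", summarize(candidate))
    print(f"speedup: {speedup(baseline, candidate):.2f}x")
```

Reporting the median and a tail percentile instead of the mean keeps occasional slow iterations from dominating the comparison, while the warm-up iterations exclude one-time setup such as connection establishment and memory registration.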