Featured Publication

MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems

Miryeong Kwon, Donghyun Gouk, Hyein Woo, Junhee Kim, Jinwoo Baek, Kyungkuk Nam, Sangyoon Ji, Jiseon Kim, Hanyeoreum Bae, Junhyeok Jang, Hyunwoo You, Junseok Moon, Myoungsoo Jung

SPICE

2025

Research Areas
Coherent Interconnect
Operating Systems
Architecture

Abstract

MPI implementations commonly rely on explicit memory-copy operations, incurring overhead from redundant data movement and buffer management. This overhead notably impacts HPC workloads involving intensive inter-processor communication. In response, we introduce MPI-over-CXL, a novel MPI communication paradigm leveraging CXL, which provides cache-coherent shared memory across multiple hosts. MPI-over-CXL replaces traditional data-copy methods with direct shared memory access, significantly reducing communication latency and memory bandwidth usage. By mapping shared memory regions directly into the virtual address spaces of MPI processes, our design enables efficient pointer-based communication, eliminating redundant copying operations. To validate this approach, we implement a comprehensive hardware and software environment, including a custom CXL 3.2 controller, FPGA-based multi-host emulation, and a dedicated software stack. Our evaluations using representative benchmarks demonstrate substantial performance improvements over conventional MPI systems, underscoring MPI-over-CXL's potential to enhance efficiency and scalability in large-scale HPC environments.
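
The pointer-based idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation and uses no CXL hardware or MPI library: it assumes ordinary single-host shared memory (Python's `multiprocessing.shared_memory`) as a stand-in for a CXL shared region, and the `(offset, length)` descriptor layout and the function name `exchange_via_shared_region` are hypothetical. The "sender" produces data in place inside the shared region and publishes only a small descriptor; the "receiver" attaches to the same region and reads the payload through a view rather than receiving a copied message.

```python
import struct
from multiprocessing import shared_memory

# Hypothetical message descriptor: (offset, length) as two unsigned 64-bit values.
DESCRIPTOR = struct.Struct("<QQ")


def exchange_via_shared_region(payload: bytes) -> bytes:
    """Pass a payload between two attachments of one shared region.

    Ordinary host shared memory stands in for a CXL shared region here.
    """
    # "Sender": create the region and write the payload in place.
    sender = shared_memory.SharedMemory(
        create=True, size=DESCRIPTOR.size + len(payload)
    )
    try:
        offset = DESCRIPTOR.size
        sender.buf[offset:offset + len(payload)] = payload
        # Publish only the descriptor; no message body is sent anywhere.
        DESCRIPTOR.pack_into(sender.buf, 0, offset, len(payload))

        # "Receiver": attach to the same region by name (a second mapping).
        receiver = shared_memory.SharedMemory(name=sender.name)
        try:
            off, length = DESCRIPTOR.unpack_from(receiver.buf, 0)
            # The slice is a view onto shared memory, not a copy.
            with receiver.buf[off:off + length] as view:
                # Materialized as bytes only so the result outlives the region.
                return bytes(view)
        finally:
            receiver.close()
    finally:
        sender.close()
        sender.unlink()


if __name__ == "__main__":
    print(exchange_via_shared_region(b"halo cells"))
```

In a real multi-host setting the two attachments would live in different MPI processes, and synchronization and cache-coherence behavior would depend on the CXL hardware and software stack; the sketch leaves both out.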


Related Publications

ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains
DIMES • 2025
Research Areas: Coherent Interconnect, Architecture

Bridging Software-Hardware for CXL Memory Disaggregation in Billion-Scale Nearest Neighbor Search
ACM Transactions on Storage • 2024
Research Areas: Operating Systems, Architecture

CXL-ANNS: Software-Hardware Collaborative Memory Disaggregation and Computation for Billion-Scale Approximate Nearest Neighbor Search
The USENIX Annual Technical Conference (ATC) • 2023
Research Areas: Operating Systems, Architecture
© 2025 Panmnesia, Inc. All rights reserved.