Featured Publication

ScalePool: Hybrid XLink-CXL Fabric for Composable Resource Disaggregation in Unified Scale-up Domains

Hyein Woo, Miryeong Kwon, Jiseon Kim, Eunjee Na, Hanjin Choi, Seonghyeon Jang, Myoungsoo Jung

DIMES

2025

Research Areas
Coherent Interconnect
Architecture
Abstract

This paper proposes ScalePool, a novel cluster architecture designed to interconnect numerous accelerators using unified hardware interconnects rather than traditional long-distance networking. ScalePool integrates Accelerator-Centric Links (XLink) and Compute Express Link (CXL) into a unified XLink-CXL hybrid fabric. Specifically, ScalePool employs XLink for low-latency accelerator communication within a cluster, while using hierarchical CXL-based switching fabrics for scalable and coherent memory sharing across clusters. By abstracting interfaces through CXL, ScalePool structurally resolves interoperability constraints, enabling heterogeneous cluster operation and composable resource disaggregation. In addition, ScalePool introduces explicit memory tiering: the latency-critical tier-1 combines accelerator-local memory with coherence-centric CXL and XLink, whereas the high-capacity tier-2 employs dedicated memory nodes interconnected by a CXL-based fabric, achieving scalable and efficient memory pooling. Evaluation results show that ScalePool accelerates LLM training by 1.22x on average and by up to 1.84x compared to conventional RDMA-based environments. Furthermore, the proposed tier-2 memory disaggregation strategy reduces latency by up to 4.5x for memory-intensive workloads.
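The abstract describes two mechanisms: choosing the link by locality (XLink inside a cluster, the CXL switching fabric between clusters) and placing data explicitly into a latency-critical tier-1 or a high-capacity tier-2. The Python sketch below models both under simplifying assumptions. The class and function names (`Accelerator`, `Buffer`, `pick_link`, `place_buffers`), the greedy smallest-first placement policy, and the GiB figures are hypothetical illustrations, not the paper's implementation.

```python
"""Hypothetical toy model of two ideas from the ScalePool abstract.

All names and the greedy placement policy are illustrative assumptions,
not the authors' design.
"""
from dataclasses import dataclass
from enum import Enum


class Link(Enum):
    XLINK = "XLink"  # low-latency accelerator link used within a cluster
    CXL = "CXL"      # hierarchical CXL switching fabric used between clusters


class Tier(Enum):
    TIER1 = 1  # accelerator-local memory, shared coherently over XLink/CXL
    TIER2 = 2  # dedicated high-capacity memory nodes on the CXL fabric


@dataclass(frozen=True)
class Accelerator:
    cluster: int  # which cluster the accelerator belongs to
    index: int    # position within that cluster


@dataclass(frozen=True)
class Buffer:
    name: str
    size_gib: float
    latency_critical: bool


def pick_link(src: Accelerator, dst: Accelerator) -> Link:
    """Same cluster -> XLink; different clusters -> CXL fabric."""
    return Link.XLINK if src.cluster == dst.cluster else Link.CXL


def place_buffers(buffers: list[Buffer], tier1_capacity_gib: float) -> dict[str, Tier]:
    """Greedy (assumed) policy: latency-critical buffers go to tier-1 while
    capacity remains; everything else is placed in tier-2."""
    placement: dict[str, Tier] = {}
    remaining = tier1_capacity_gib
    # Visit latency-critical buffers first, smallest first, to fit as many as possible.
    for buf in sorted(buffers, key=lambda b: (not b.latency_critical, b.size_gib)):
        if buf.latency_critical and buf.size_gib <= remaining:
            placement[buf.name] = Tier.TIER1
            remaining -= buf.size_gib
        else:
            placement[buf.name] = Tier.TIER2
    return placement


if __name__ == "__main__":
    a0, a1, b0 = Accelerator(0, 0), Accelerator(0, 1), Accelerator(1, 0)
    print(pick_link(a0, a1).value)  # XLink
    print(pick_link(a0, b0).value)  # CXL
    plan = place_buffers(
        [
            Buffer("activations", 8, True),
            Buffer("weights", 40, True),
            Buffer("optimizer_state", 120, False),
        ],
        tier1_capacity_gib=64,
    )
    for name, tier in plan.items():
        print(name, tier.name)
```

With a 64 GiB tier-1 budget, both latency-critical buffers fit in tier-1 (48 GiB used) and the 120 GiB optimizer state is placed in tier-2. The sketch deliberately ignores factors such as access frequency, coherence traffic, and migration cost, which a practical placement policy would likely need to weigh.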


Related Publications
MPI-over-CXL: Enhancing Communication Efficiency in Distributed HPC Systems
SPICE • 2025
Coherent Interconnect, Operating Systems

Compute Can't Handle the Truth: Why Communication Tax Prioritizes Memory and Interconnects in Modern AI Infrastructure
arXiv (Technical Report) • 2025
Coherent Interconnect

CXL Topology-Aware and Expander-Driven Prefetching: Unlocking SSD Performance
IEEE Micro • 2025
Coherent Interconnect, Machine Learning

Building the future of AI infrastructure with innovative semiconductor solutions.

© 2025 Panmnesia, Inc. All rights reserved.