ExCL May Meeting 2025
May 2025 ExCL meeting slides.
Below is a LLM-generated summary of the meeting based on the presentation slides.
ExCL Monthly Meeting Highlights — May 2025
The Experimental Computing Laboratory (ExCL) at Oak Ridge National Laboratory held its monthly meeting on May 15, 2025, offering a variety of updates on system infrastructure, software improvements, and the official deployment of the powerful Mi300a system. Here are the key takeaways from this month’s meeting.
🖥️ Infrastructure & System Status
-
news
Shell Command: A new command-line utilitynews
is now available to keep users informed of system status changes. The login banner will notify users of unread items. Eventually, this will integrate with thenews@excl.groups.io
mailing list. -
Mi300a (“Faraday”): Officially deployed and ready for use.
-
U280 FPGA: Deployed to
milan3
, with updated SLURM helper functions and environment setup. -
ThinLinc: Updated to version 4.18.
-
Account Automation: Improved reporting and automation reduce the burden of manual account management.
-
Security: Gateway machines will now auto-update and reboot weekly to improve vulnerability management.
-
TinyPilot Update: New run script menu enables remote system resets via REISUB.
-
Siemens Remix and Licensing: Deployed with no changes to user selections.
⚡ Spotlight: Mi300a System (“Faraday”)
What Is It?
- Supermicro AS-4145GH-TNMR 4U air-cooled server.
- Contains 4 APU nodes, each with:
- 912 CDNA3 GPU units
- 96 Zen 4 CPU cores
- 128 GB HBM3 (for a total of 512 GB)
- Ubuntu 24.04 LTS + ROCm 6.4.0
- Specialized riser/backplane needed for hardware expansion
Why It’s Interesting
- High-capability AMD GPUs – ideal for AI and HPC workloads
- Infinity Cache & Interconnect
- Unified Memory Access – significant for memory hierarchy research
- Fast performance – e.g., LLVM built in 5 minutes
How to Access
- Log in via
login.excl.ornl.gov
- Hostname:
faraday
- Load ROCm module:
module load rocmmod
- Test setup with
amd-toy
- Contact
help@excl.ornl.gov
if render group access is missing
Documentation Links
- ExCL Faraday User Docs
- Supermicro AMD Accelerator Systems
- Faraday Datasheet (PDF)
- Supermicro Hardware Manual (PDF)
🔋 Power Monitoring
- CheckMK Integration for Mi300a power tracking is in progress.
- Documentation for power configuration will follow soon.
📣 Final Notes
- Questions, suggestions, and community discussions are welcome.
- Stay informed through the
news
shell command and the ExCL mailing list.
The ExCL team continues to work on enhancing their infrastructure and documentation to better support users. The upcoming months promise exciting new deployments and important updates. Make sure to stay informed through the ExCL mailing list, monthly meetings, and their documentation updates!