ExCL May Meeting 2025

May 2025 ExCL meeting slides.


Below is an LLM-generated summary of the meeting based on the presentation slides.


ExCL Monthly Meeting Highlights — May 2025

The Experimental Computing Laboratory (ExCL) at Oak Ridge National Laboratory held its monthly meeting on May 15, 2025, with updates on system infrastructure, software improvements, and the official deployment of the AMD MI300A system. Here are the key takeaways.


🖥️ Infrastructure & System Status

  • news Shell Command: A new command-line utility news is now available to keep users informed of system status changes. The login banner will notify users of unread items. Eventually, this will integrate with the news@excl.groups.io mailing list.

  • MI300A (“Faraday”): Officially deployed and ready for use.

  • U280 FPGA: Deployed to milan3, with updated SLURM helper functions and environment setup.

  • ThinLinc: Updated to version 4.18.

  • Account Automation: Improved reporting and automation reduce the burden of manual account management.

  • Security: Gateway machines will now auto-update and reboot weekly to improve vulnerability management.

  • TinyPilot Update: New run script menu enables remote system resets via the REISUB Magic SysRq key sequence.

  • Siemens Remix and Licensing: Deployed with no changes to user selections.


⚡ Spotlight: MI300A System (“Faraday”)

What Is It?

  • Supermicro AS-4145GH-TNMR 4U air-cooled server.
  • Contains 4 APUs, each with:
    • 228 CDNA3 compute units (912 total)
    • 24 Zen 4 CPU cores (96 total)
    • 128 GB HBM3 (512 GB total)
  • Ubuntu 24.04 LTS + ROCm 6.4.0
  • Specialized riser/backplane needed for hardware expansion
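
As a quick sanity check on the figures above, the short sketch below multiplies per-APU numbers by the four APUs in the server. The per-APU values (228 compute units, 24 Zen 4 cores, 128 GB HBM3) are AMD's published MI300A specifications and are an assumption here, not something stated on the slides. The names APUS, PER_APU, and node_totals are made up for illustration.

```python
# Per-APU figures are AMD's published MI300A specifications (an assumption
# here, not taken from the slides); the APU count comes from the list above.
APUS = 4
PER_APU = {
    "CDNA3 compute units": 228,
    "Zen 4 CPU cores": 24,
    "HBM3 (GB)": 128,
}


def node_totals(per_apu, apus):
    """Multiply each per-APU figure by the number of APUs in the server."""
    return {name: value * apus for name, value in per_apu.items()}


totals = node_totals(PER_APU, APUS)
for name, value in totals.items():
    print(f"{name}: {value}")
```

Under these assumptions the whole server comes to 912 compute units, 96 CPU cores, and 512 GB of HBM3.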

Why It’s Interesting

  • High-capability AMD GPUs – ideal for AI and HPC workloads
  • Infinity Cache & Interconnect
  • Unified Memory Access – significant for memory hierarchy research
  • Fast performance – e.g., LLVM built in 5 minutes

How to Access

  • Log in via login.excl.ornl.gov
  • Hostname: faraday
  • Load ROCm module: module load rocmmod
  • Test setup with amd-toy
  • Contact help@excl.ornl.gov if render group access is missing
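
The last step above depends on membership in the render group, which ROCm uses to gate access to the GPU device nodes. The following is a minimal sketch of a check you could run after logging in to faraday. It is not an ExCL-provided tool, and the function names group_names and has_render_access are hypothetical. It relies only on Python's standard os and grp modules, which are available on Unix systems.

```python
import grp
import os


def group_names(gids):
    """Map numeric group IDs to group names, skipping IDs with no entry."""
    names = set()
    for gid in gids:
        try:
            names.add(grp.getgrgid(gid).gr_name)
        except KeyError:
            # The ID has no entry in the group database; ignore it.
            continue
    return names


def has_render_access(names, required="render"):
    """Return True when the required group is among the given group names."""
    return required in names


if __name__ == "__main__":
    # Combine supplementary groups with the effective primary group.
    gids = set(os.getgroups()) | {os.getegid()}
    if has_render_access(group_names(gids)):
        print("render group: ok")
    else:
        print("render group: missing, contact help@excl.ornl.gov")
```

If the script reports the group as missing, it is still worth running amd-toy, since group changes only take effect in a new login session.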

🔋 Power Monitoring

  • CheckMK integration for MI300A power tracking is in progress.
  • Documentation for power configuration will follow soon.

📣 Final Notes

  • Questions, suggestions, and community discussions are welcome.
  • Stay informed through the news shell command and the ExCL mailing list.

The ExCL team continues to improve its infrastructure and documentation to better support users, and more deployments and updates are expected in the coming months. The ExCL mailing list, the monthly meetings, and the documentation are the best places to follow them.