ExCL March Meeting 2026


March 2026 ExCL meeting slides.



Below is an LLM-generated summary of the meeting slides.


🚀 ExCL Monthly Update — March 2026

This month’s updates focus on storage usability, CI improvements, shared infrastructure, and data management best practices.


📂 Improved Project Storage Navigation

We’ve made a small but impactful usability improvement:

  • You can now read project folder names under /auto/projects
  • Permissions remain unchanged, but directory discovery is much easier
  • Tools like vim, yazi, and tab completion now work more effectively

💡 Note: The first access to a project directory still triggers an automount. After that, navigation is seamless.
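As a minimal sketch of how the new visibility could be used, the helper below checks whether a project name listed under the projects root can actually be entered by the current user. The function name `project_status` is hypothetical; only the `/auto/projects` path comes from the update above. Entering the directory is what triggers the automount, so the first call for a project may be slower than later ones.

```shell
# Hypothetical helper: report whether a named project directory exists
# under the projects root and whether the current user can enter it.
# Usage: project_status NAME [ROOT]   (ROOT defaults to /auto/projects)
project_status() {
    root=${2:-/auto/projects}
    dir=$root/$1
    if [ ! -d "$dir" ]; then
        echo "$1: not found"
    # cd inside a subshell tests real access (and triggers the automount)
    # without changing the caller's working directory.
    elif (cd "$dir") 2>/dev/null; then
        echo "$1: accessible"
    else
        echo "$1: listed but not accessible"
    fi
}
```

A typical session would be `ls /auto/projects` to discover names, followed by `project_status some-project` for the ones of interest, where `some-project` is a placeholder.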


📦 New Shared Storage for Apptainer Images

To better support large container workflows:

  • New shared location:

    /auto/projects/apptainer
    
  • Designed for large, reusable Apptainer images
  • Current use cases include:
    • Unreal Engine containers
    • AMD Vitis environments
  • Storage policy:
    • Default cap: 1 TB
    • Adjustable as needed
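To keep an eye on the cap, usage of the shared location can be compared against it. The following is a rough sketch, not an official ExCL tool: the function names are hypothetical, and it assumes "1 TB" means 1024^3 KiB (adjust `CAP_KB` if the cap is decimal). Running `du` over a large shared directory can take a while.

```shell
# Assumption: the 1 TB default cap is interpreted as 1024^3 KiB.
CAP_KB=1073741824

# Print the disk usage of a directory in KiB (du -sk is POSIX).
store_usage_kb() {
    du -sk "$1" | awk '{print $1}'
}

# Succeed when usage (first argument) does not exceed the cap (second argument).
within_cap() {
    [ "$1" -le "$2" ]
}

# Report usage of the image store against the cap.
# Usage: report_store_usage [DIR] [CAP_KB]
report_store_usage() {
    dir=${1:-/auto/projects/apptainer}
    cap=${2:-$CAP_KB}
    used=$(store_usage_kb "$dir")
    # An empty result means du could not read the directory.
    [ -n "$used" ] || { echo "cannot read $dir" >&2; return 2; }
    if within_cap "$used" "$cap"; then
        echo "OK: $used KiB used of $cap KiB"
    else
        echo "OVER: $used KiB used of $cap KiB"
        return 1
    fi
}
```

Calling `report_store_usage` with no arguments checks `/auto/projects/apptainer` against the default cap.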

⚙️ CI Through Slurm

Running CI through Slurm is now the standard and recommended approach for jobs that run across multiple ExCL systems.

Key Improvements

  • CI template now checks sbatch return codes
  • ⚠️ Standard error output is treated as a warning, not an immediate failure
  • 🧪 New system-device-smoke-test:
    • Validates CUDA, HIP, OpenMP, OpenCL
    • Runs across major ExCL systems
    • Uses shared IRIS setup scripts:

      source /auto/software/iris/setup-system.source
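The return-code and standard-error behavior described above can be sketched roughly as follows. This is not the actual ExCL CI template; the function name `run_slurm_job` is hypothetical, and the sketch assumes the job is submitted with `sbatch --wait`, which blocks until the job finishes and exits with the job's exit code.

```shell
# Hypothetical CI helper: submit a batch script through Slurm and
# propagate the job's result to the CI runner.
run_slurm_job() {
    job_script=$1
    err_file=$(mktemp) || return 1
    rc=0
    # --wait makes sbatch block until the job ends and exit with the
    # job's exit code, so the return code reflects the job itself.
    sbatch --wait "$job_script" 2>"$err_file" || rc=$?
    if [ -s "$err_file" ]; then
        # Output on stderr alone is reported but does not fail the step.
        echo "WARNING: sbatch wrote to standard error:" >&2
        cat "$err_file" >&2
    fi
    rm -f "$err_file"
    if [ "$rc" -ne 0 ]; then
        echo "ERROR: sbatch exited with code $rc" >&2
    fi
    return "$rc"
}
```

In a CI step this would be called as `run_slurm_job job.sh`, where `job.sh` is a placeholder for the project's batch script. The step fails only when `sbatch` returns a nonzero code.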

📌 This significantly improves reliability and portability of CI pipelines across heterogeneous nodes.
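For a quick manual check on a single node, a simple availability probe can complement the CI smoke test. The sketch below is not the system-device-smoke-test itself; the function names are hypothetical, and the tool names in the usage note are assumptions about what a given node provides.

```shell
# Hypothetical probe: report whether a command is available on PATH.
probe_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
        return 1
    fi
}

# Probe each named tool and return the number that are missing.
probe_all() {
    missing=0
    for tool in "$@"; do
        probe_tool "$tool" || missing=$((missing + 1))
    done
    return "$missing"
}
```

After sourcing the shared setup script, a call such as `probe_all nvidia-smi rocminfo clinfo` would show which of those vendor tools the current node exposes. On heterogeneous nodes some are expected to be missing.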


📊 DVC for Data Management on ExCL

We now recommend DVC (Data Version Control) for managing datasets and pipelines.

Why DVC Works Well on ExCL

  • Shared cache model:
    • Store data in a project-level shared cache
    • Avoid duplication across repos
  • Hybrid workflow:
    • Same cache can act as:
      • Local shared storage (on ExCL)
      • Remote storage (outside ExCL)
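A shared cache setup might look roughly like the sketch below. The DVC Quick Start Guide in ExCL Docs remains the authoritative reference. The function name `configure_shared_cache` and any paths passed to it are hypothetical; the `dvc cache dir` and `dvc config` subcommands are standard DVC commands, and the `symlink` link type is one possible choice.

```shell
# Hypothetical helper: point a DVC repository at a project-level shared cache.
# Usage: configure_shared_cache REPO_DIR CACHE_DIR
configure_shared_cache() {
    repo_dir=$1
    cache_dir=$2
    (
        cd "$repo_dir" || exit 1
        # Store cached data in the shared directory instead of .dvc/cache.
        dvc cache dir "$cache_dir" &&
        # Make new cache files group-writable so teammates can share them.
        dvc config cache.shared group &&
        # Link files from the cache into the workspace instead of copying.
        dvc config cache.type symlink
    )
}
```

Each repository in a project would call this once with the same cache directory, for example a `dvc-cache` folder inside the project's directory under `/auto/projects` (an assumed layout), so that large files are stored only once.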

👉 See the DVC Quick Start Guide in ExCL Docs for setup instructions.


📚 Documentation Best Practices Highlight

We highlighted an excellent resource:

“Minimum Viable Documentation Product (MVDP)” from the Consortium for the Advancement of Scientific Software

Key Takeaway

Follow a documentation checklist.

🔗 Resource: https://onegoodtutorial.org


🖥️ System & Infrastructure Updates

  • ⏳ Migration of MI100 / V100 system delayed until after SC26 submissions
  • 🔄 Work ongoing to restore Groq inference hardware
  • 🆕 Tenstorrent Blackhole acquisition in progress
  • 🔧 Scheduled downtime:
    • Tuesday, March 24, 9–11 AM
      • OS updates + system reboots
      • No NVIDIA driver updates (unless requested)

💬 Open Topics & Discussion Areas

We’re actively gathering feedback on:

  • Legacy system usage (e.g., Leconte PowerPC + V100)
  • Hudson access policies, including a request to limit the system to Slurm-only access
  • Cloud GPU access options

📌 Summary

This month’s updates continue to push ExCL toward:

  • More usable shared infrastructure
  • Scalable data workflows (DVC)
  • Robust CI across heterogeneous systems
  • Better documentation practices

If you have feedback or want help adopting any of these tools, reach out!