High Performance Computing (HPC)
What is HPC?
The volume of data generated by healthcare systems, research, and clinical trials is growing at an exponential rate, often outpacing the capacity of conventional computing systems to process it. High-performance computing (HPC) addresses this gap by enabling the analysis of massive datasets and the execution of complex calculations at very high speed.
To answer this question, we can borrow from the old saying: "Many hands make light work." With HPC, hundreds or thousands of nodes (individual computers) working together in clusters (teams) can solve problems much faster than a single large, powerful computer could on its own. Each node in the cluster works simultaneously on a small piece of the problem, so even very large problems can be solved – no matter how complex.
HPC Advantages
High-performance computing (HPC) brings a host of benefits, especially when running in a cloud environment. Here are some key advantages:
Auto-Scaling for Cost Efficiency: One of the most powerful features of HPC is its ability to auto-scale. This means that computing resources are dynamically adjusted to meet the demands of your workload. As your computational needs grow, more nodes (individual computers) are added automatically, and when the demand decreases, the system scales down by removing excess nodes. This ensures that you're only using and paying for the resources you need at any given time—avoiding the cost of maintaining expensive, underutilized servers.
Access to Cutting-Edge Technology: Running HPC in the cloud means that you’re always working with the latest, state-of-the-art computing hardware. You never need to worry about costly hardware upgrades or downtime due to aging infrastructure.
UCLA Health's HPC Capabilities
- Customized Clusters: UCLA Health offers a flexible HPC cluster that supports compute-optimized, memory-optimized, AI/ML, and other specialized workloads.
- Integration with Azure Services: The cluster integrates with Azure Data Lake Storage for scalable and efficient data management, both for frequently accessed (hot tier) and archived (cool tier) data.
- Specialized Software: The cluster supports customized software like NVIDIA CUDA drivers and Illumina DRAGEN for secondary genomic analysis, with over 7,000 software packages available through Spack.
- Specialized Hardware: Includes support for GPUs (NVIDIA T4, A100) and FPGAs (Xilinx U250), ideal for deep learning, AI training, large-scale simulations, and other high-demand computational tasks.
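As a sketch of how the Spack-provided software might be used in practice – the package name below is illustrative, and availability of any specific package on the cluster is an assumption:

```shell
# Search Spack's package index for a tool (package name is illustrative)
spack list samtools

# Install a specific version of the package
spack install samtools@1.19

# Make the installed package available in the current shell environment
spack load samtools
```

These are the standard Spack commands for discovering, building, and activating software; the exact packages and versions offered will depend on the cluster's Spack configuration.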
Technical Specifications for Current HPC Cluster
Development Work Partition (F2): Ideal for small-scale testing, development, and job submission, the F2 partition provides a lightweight, low-cost environment for early-stage software development, debugging, and workflow design.
- Resources: 2 CPU cores, 4 GB of memory, and 32 GB of scratch space per node.
- Key Features: The F2 partition allows users to test code and develop applications with minimal resources while keeping costs low. The 32 GB scratch space provides ample temporary storage for intermediate computations.
- Enhanced Development Tools: Users can persist their work using tmux, and access their nodes remotely via SSH from platforms like VSCode and Jupyter. This offers greater control and flexibility compared to command-line access alone.
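A typical development session on an F2 node might look like the sketch below. This assumes the cluster uses a Slurm-style scheduler and that "F2" is exposed as a partition name – both are assumptions for illustration, not confirmed details:

```shell
# Request an interactive shell on an F2 development node
# (partition name and scheduler flags are assumptions for illustration)
srun --partition=F2 --cpus-per-task=2 --mem=4G --pty bash

# Inside the session, start tmux so work persists if the connection drops
tmux new -s devwork

# Later, from a fresh SSH login to the same node, reattach to the session
tmux attach -t devwork
```

Running tmux inside the allocation is what lets long-lived editors or notebooks survive a dropped SSH connection, which is the "persist their work" workflow described above.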
Compute-Optimized Partitions (F16, F32, F64, F72): For large-scale simulations and compute-intensive tasks with varying CPU cores and memory.
- F16: 16 CPU cores, 32 GB of memory, and 512 GB of scratch space.
- F32: 32 CPU cores, 64 GB of memory, and 2000 GB of scratch space.
- F64: 64 CPU cores, 128 GB of memory, and 2000 GB of scratch space. These nodes are ideal for large-scale simulations, data analysis, and other compute-intensive applications.
- F72: 72 CPU cores, 144 GB of memory, and 2000 GB of scratch space. A single F72 node can run a complete pipeline over genomics sequencing data, with many such nodes running in parallel.
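A batch job targeting one of these partitions could be sketched as follows. The partition name, scheduler, paths, and the `run_pipeline` command are all hypothetical placeholders, assuming a Slurm-style scheduler:

```shell
#!/bin/bash
# Hypothetical Slurm batch script for a single F72 node.
# Partition name, paths, and the pipeline command are illustrative.
#SBATCH --partition=F72
#SBATCH --nodes=1
#SBATCH --cpus-per-task=72
#SBATCH --mem=144G
#SBATCH --time=24:00:00
#SBATCH --job-name=genomics-pipeline

# Stage input onto fast local scratch, run the pipeline, copy results back
cp /shared/input/sample.fastq.gz "$TMPDIR"/
run_pipeline --threads 72 \
    --input "$TMPDIR"/sample.fastq.gz \
    --out "$TMPDIR"/results
cp -r "$TMPDIR"/results /shared/output/
```

Staging data through the node's local scratch space, as shown, is the usual way to take advantage of the 2000 GB of fast temporary storage rather than reading and writing directly against shared storage.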
GPU-Enabled Compute Partition: For AI and machine learning workloads, as well as high-end deep learning training, the GPU-enabled partitions bring immense computational power to handle these specialized tasks.
- NVIDIA T4 GPU Options
- NVIDIA A100 GPU Option
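Requesting a GPU node might look like the following sketch – again assuming a Slurm-style scheduler; the partition name, GRES syntax, module name, and training script are all illustrative assumptions:

```shell
#!/bin/bash
# Hypothetical GPU job script; partition and resource names are assumptions.
#SBATCH --partition=gpu-a100
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8
#SBATCH --mem=64G
#SBATCH --time=12:00:00

module load cuda        # assumes an environment-modules / Lmod setup
python train_model.py   # placeholder for your own training script
```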
FPGA-Enabled Compute Partition: The cluster also supports FPGA-enabled nodes from the NP series. These nodes are equipped with Xilinx U250 FPGAs, providing high-throughput, low-latency, and customizable processing capabilities for specialized workloads that benefit from parallel processing and reconfigurable logic.
Azure Data Lake Storage: The integration of Azure Data Lake Storage in both hot and cool tiers provides robust and scalable shared storage solutions. The hot tier is suitable for frequently accessed data, offering the low latency and high throughput essential for real-time data processing tasks. The cool tier is a cost-effective option for infrequently accessed data. An archive tier is also available for ultra-low-cost, long-term offline storage.
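To illustrate tiering in practice, a completed result set can be moved to a cheaper tier with the Azure CLI. The account, container, and blob names below are placeholders:

```shell
# Move a completed result set from the hot tier to the cool tier
# (account, container, and blob names are placeholders)
az storage blob set-tier \
    --account-name myhpcstorage \
    --container-name results \
    --name run-output.tar.gz \
    --tier Cool

# The archive tier suits long-term offline retention; note that archived
# blobs must be rehydrated back to hot or cool before they can be read
az storage blob set-tier \
    --account-name myhpcstorage \
    --container-name results \
    --name old-run-output.tar.gz \
    --tier Archive
```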
Lustre Parallel File System (Coming Soon): Azure Managed Lustre (AMLFS) provides a high-performance parallel file system for workloads requiring high throughput and low latency. It is seamlessly integrated with the HPC cluster, making it ideal for large-scale simulations and data processing tasks.
As the largest UC consumer of Azure services, UCLA Health IT has a strong partnership with Microsoft and AWS that allows us to directly negotiate discounts across the entire Health IT organization. As an academic medical institution, our negotiated discounts for higher education are among the most competitive for cloud customers and reflect UC-wide discounts on all services in addition to UCLAH-specific deep discounts on our most utilized services, primarily around high performance computing workloads and data storage. Customers can also reserve capacity in the data center for three-year terms and lock in the deepest discounts offered.
Support
UCLA Health’s Office of Health Informatics and Analytics (OHIA) can help set up and manage a high performance computing (HPC) environment for you.
We’d love to hear about your data and computing challenges and how we can help you overcome them. Please feel free to contact OHIA’s High Performance Computing Team to set up a consultation today!
Support Email: OHIAHPCSupport@mednet.ucla.edu