SC21 Paper Submissions - Supercomputing
Overview
The SC Papers program is the leading venue for presenting high-quality original research, groundbreaking ideas, and compelling insights on future trends in high performance computing, networking, storage, and analysis. Technical papers are peer-reviewed, and an Artifact Description is mandatory for all papers submitted to SC.
Areas/Tracks
Submissions will be considered on any topic related to high performance computing within the areas below. Authors must indicate a primary area from the choices on the submissions form and are strongly encouraged to indicate a secondary area.
Small-scale studies – including single-node studies – are welcome as long as the paper clearly conveys the work’s contribution to high performance computing.
Algorithms
The development, evaluation, and optimization of scalable, general-purpose, high performance algorithms.
Topics include:
- Algorithms for discrete and combinatorial optimization
- Algorithms for hybrid and heterogeneous systems with accelerators
- Algorithms for numerical methods and algebraic systems
- Data-intensive parallel algorithms
- Energy- and power-efficient algorithms
- Fault-tolerant algorithms
- Graph and network algorithms
- Load balancing and scheduling algorithms
- Uncertainty quantification methods
- Other high performance computing algorithms
Applications
The development and enhancement of algorithms, parallel implementations, models, software and problem solving environments for specific applications that require high performance resources.
Topics include:
- Bioinformatics and computational biology
- Computational earth and atmospheric sciences
- Computational materials science and engineering
- Computational astrophysics/astronomy, chemistry, and physics
- Computational fluid dynamics and mechanics
- Computation- and data-enabled social science
- Computational design optimization for aerospace, energy, manufacturing, and industrial applications
- Computational medicine and bioengineering
- Improved models, algorithms, performance or scalability of specific applications and respective software
- Use of uncertainty quantification, statistical, and machine-learning techniques to improve a specific HPC application
- Other high performance applications
Architecture and Networks
All aspects of high performance hardware including the optimization and evaluation of processors and networks.
Topics include:
- Architectures to support extremely heterogeneous composable systems (e.g., chiplets)
- Design-space exploration / Performance projection for future systems
- Evaluation and measurement on testbed or production hardware systems
- Hardware acceleration of containerization and virtualization mechanisms for HPC
- Interconnect technologies, topology, switch architecture, optical networks, software-defined networks
- I/O architecture/hardware and emerging storage technologies
- Memory systems: caches, memory technology, non-volatile memory, memory system architecture (to include address translation for cores and accelerators)
- Multi-processor architecture and micro-architecture (e.g., reconfigurable, vector, stream, dataflow, GPUs, and custom/novel architecture)
- Network protocols, quality of service, congestion control, collective communication
- Power-efficient design and power-management strategies
- Resilience, error correction, high availability architectures
- Scalable and composable coherence (for cores and accelerators)
- Secure architectures, side-channel attacks, and mitigation
- Software/hardware co-design, domain specific language support
Clouds and Distributed Computing
Cloud and system software architecture, configuration, optimization and evaluation, support for parallel programming on large-scale systems or building blocks for next-generation HPC architectures.
Topics include:
- HPC, cloud, and edge computing convergence at infrastructure and software level, including service-oriented architectures and tools
- Job/workflow scheduling, load balancing, resource provisioning, energy efficiency, fault tolerance, and reliability
- Methods, systems, and architectures for big data and data stream processing in HPC and cloud systems
- OS/runtime and system-software enhancements for many-core systems, accelerators, complex memory space/hierarchies, I/O, and network structures
- Parallel programming models and tools at the intersection of cloud, edge, and HPC
- Self-configuration, management, information services, monitoring, and introspective system software
- Security and identity management in HPC and cloud systems
- Scalable HPC and machine learning case studies on distributed and/or cloud systems
- Virtualization and containerization to support HPC and emerging uses such as machine learning
Data Analytics, Visualization, and Storage
All aspects of data analytics, visualization, storage, and storage I/O related to HPC systems. Submissions on work done at scale are highly favored.
Topics include:
- Cloud-based analytics at scale
- Databases and scalable structured storage for HPC
- Data mining, analysis, and visualization for modeling and simulation
- Data analytics and frameworks supporting data analytics
- Ensemble analysis and visualization
- I/O performance tuning, benchmarking, and middleware
- Next-generation storage systems and media
- Parallel file, object, key-value, campaign, and archival systems
- Provenance, metadata, and data management
- Reliability and fault tolerance in HPC storage
- Scalable storage, metadata, namespaces, and data management
- Storage tiering, including entirely on-premises internal tiering as well as tiering between on-premises and cloud
- Storage innovations using machine learning, such as predictive tiering and failure prediction
- Storage networks
- Scalable cloud, multi-cloud, and hybrid storage
- Storage systems for data-intensive computing
Machine Learning and HPC
The development and enhancement of algorithms, systems, and software for scalable machine learning utilizing high-performance and cloud computing platforms.
Topics include:
- ML for HPC / HPC for ML
- Data parallelism and model parallelism
- Efficient hardware for machine learning
- Hardware-efficient training and inference
- Performance modeling of machine learning applications
- Scalable optimization methods for machine learning
- Scalable hyper-parameter optimization
- Scalable neural architecture search
- Scalable I/O for machine learning
- Systems, compilers, and languages for machine learning at scale
- Testing, debugging, and profiling machine learning applications
- Visualization for machine learning at scale
Performance Measurement, Modeling, and Tools
Novel methods and tools for measuring, evaluating, and/or analyzing performance for large scale systems.
Topics include:
- Analysis, modeling, or simulation methods for performance
- Methodologies, metrics, and formalisms for performance analysis and tools
- Novel and broadly applicable performance optimization techniques
- Performance studies of HPC hardware and software subsystems such as processor, network, memory, accelerators, and storage
- Scalable tools and instrumentation infrastructure for measurement, monitoring, and/or visualization of performance
- System-design tradeoffs between performance and other metrics (e.g., performance and resilience, performance and security)
- Workload characterization and benchmarking techniques
Programming Systems
Technologies that support parallel programming for large-scale systems as well as smaller-scale components that will plausibly serve as building blocks for next-generation HPC architectures.
Topics include:
- Compiler analysis and optimization; program transformation
- Parallel programming languages, libraries, models, and notations
- Parallel application frameworks
- Programming language and compilation techniques for reducing energy and data movement (e.g., precision allocation, use of approximations, tiling)
- Program analysis, synthesis, and verification to enhance cross-platform portability, maintainability, result reproducibility, resilience (e.g., combined static and dynamic analysis methods, testing, formal methods)
- Runtime systems as they interact with programming systems
- Solutions for parallel-programming challenges (e.g., interoperability, memory consistency, determinism, race detection, work stealing, or load balancing)
- Tools for parallel program development (e.g., debuggers and integrated development environments)
State of the Practice
All R&D aspects of the pragmatic practices of HPC, including operational IT infrastructure, services, facilities, large-scale application executions and benchmarks.
Topics include:
- Bridging of cloud data centers and supercomputing centers
- Comparative system benchmarking over a wide spectrum of workloads
- Containers at scale: performance and overhead
- Deployment experiences of large-scale infrastructures and facilities
- Facilitation of “big data” associated with supercomputing
- Infrastructural policy issues, especially international experiences
- Long-term infrastructural management experiences
- Pragmatic resource management strategies and experiences
- Procurement, technology investment and acquisition best practices
- Quantitative results of education, training and dissemination activities
- Software engineering best practices for HPC
- User support experiences with large-scale and novel machines
- Reproducibility of data
System Software
Operating system (OS), runtime system, and other low-level software research and development that enables allocation and management of hardware resources for HPC applications and services.
Topics include:
- Alternative and specialized parallel operating systems and runtime systems
- Approaches for enabling adaptive and introspective system software
- Communication optimization
- Software distributed shared memory systems
- System-software support for global address spaces
- OS and runtime system enhancements for attached and integrated accelerators
- Interactions among the OS, runtime, compiler, middleware, and tools
- Parallel/networked file system integration with the OS and runtime
- Resource management
- Runtime and OS management of complex memory hierarchies
- System software strategies for controlling energy and temperature
- Support for fault tolerance and resilience
- Virtualization and virtual machines