We are seeking a highly skilled Senior Engineer to contribute to the design, development, and optimization of high-performance computing (HPC) systems that support advanced research and analytics. This individual will work across multiple infrastructure domains including compute, storage, and Kubernetes orchestration, both on-premises and in the cloud.
Responsibilities
- Build, scale, and optimize the firm’s HPC environment supporting research workloads.
- Develop and maintain a global storage platform spanning multiple datacenters and environments.
- Contribute to the administration and optimization of Linux servers.
- Develop and manage Kubernetes clusters across AWS and on-premises environments.
- Enhance and support proprietary task scheduling systems for high-throughput computing.
Qualifications
- Naturally curious and passionate about emerging technologies and continuous learning.
- Strong collaborator with excellent communication skills.
- Creative problem-solver who designs innovative solutions rather than reassembling existing ones.
- Committed to automation and infrastructure-as-code best practices.
- Deep expertise in Linux systems (kernel operations, performance tuning, troubleshooting).
- Extensive experience in software and systems design.
- Strong programming skills in at least one compiled language (C++ or Go preferred) and one interpreted language (Python, Ruby, or Perl).
- Proven experience with Kubernetes architecture, customization, and advanced deployment patterns.
- Skilled at diagnosing and resolving complex, low-level system issues.
- Understanding of AI/ML workloads and their unique infrastructure requirements.
- Background in performance engineering and modern HPC environments.
- Experience with enterprise storage systems and parallel filesystems (Vast Data, Weka, NetApp, Pure, Lustre, GPFS).
- Experience with infrastructure-as-code tools such as Terraform, Ansible, or Puppet.
- Experience with observability and monitoring stacks (Prometheus, Grafana, etc.).
- Experience with networking protocols and troubleshooting within distributed systems.
- Experience with advanced I/O hardware configuration and optimization.