Senior Engineer, Advanced Computing - HPC/Kubernetes

Job Type:

Direct-Hire

Location:

New York

Industry:

Financial Markets

Category:

Cloud Engineer

Compensation Range:

$195,000 - $200,000 Per Year

Job id:

24902

Additional Compensation Info:

For information on InfraTech's benefits, please visit https://www.infratechsolutions.com/consultant-info

Overview
We are seeking a highly skilled Senior Engineer to contribute to the design, development, and optimization of high-performance computing (HPC) systems that support advanced research and analytics. This individual will work across multiple infrastructure domains, including compute, storage, and Kubernetes orchestration, both on-premises and in the cloud.

Responsibilities

  • Build, scale, and optimize the firm’s HPC environment supporting research workloads.

  • Develop and maintain a global storage platform spanning multiple datacenters and environments.

  • Contribute to the administration and optimization of Linux servers.

  • Develop and manage Kubernetes clusters across AWS and on-premises environments.

  • Enhance and support proprietary task scheduling systems for high-throughput computing.

Cultural Fit
  • Naturally curious and passionate about emerging technologies and continuous learning.

  • Strong collaborator with excellent communication skills.

  • Creative problem-solver who designs innovative solutions rather than reassembling existing ones.

  • Committed to automation and infrastructure-as-code best practices.

Technical Qualifications
  • Deep expertise in Linux systems (kernel operations, performance tuning, troubleshooting).

  • Extensive experience in software and systems design.

  • Strong programming skills in at least one compiled language (C++ or Go preferred) and one interpreted language (Python, Ruby, or Perl).

  • Proven experience with Kubernetes architecture, customization, and advanced deployment patterns.

  • Skilled at diagnosing and resolving complex, low-level system issues.

  • Understanding of AI/ML workloads and their unique infrastructure requirements.

  • Background in performance engineering and modern HPC environments.

Preferred Experience (Bonus Points)
  • Enterprise storage systems and parallel filesystems (VAST Data, WEKA, NetApp, Pure Storage, Lustre, GPFS).

  • Infrastructure-as-code tools such as Terraform, Ansible, or Puppet.

  • Observability and monitoring stacks (Prometheus, Grafana, etc.).

  • Networking protocols and troubleshooting within distributed systems.

  • Advanced I/O hardware configuration and optimization.
