HPC Production Engineer

Jump Crypto

New York, NY, USA
Posted on Friday, March 10, 2023

At Jump Trading, you will find a group of driven, dynamic people focused on researching, interpreting, and capitalizing on the global trading markets. Headquartered in Chicago since 1999, Jump has grown over the years into a global operation with offices in Chicago, New York, Austin, London, Singapore, Shanghai, Amsterdam, Bristol, Sydney, Hong Kong, Paris, and Gurugram.

Jump's global HPC Team is looking to add a Production Engineer in New York City. The ideal candidate is a hands-on individual who is highly skilled in the details and nuances of managing Linux environments and has the strong software development background needed to support uniquely customized systems at scale.

What You'll Do:

  • Design, implement, maintain, and support high performance compute and storage systems
  • Implement and support performance monitoring and fault monitoring systems
  • Monitor system and storage performance, up to and including network components
  • Build tooling to compile, package, install, and upgrade software and operating system components at scale
  • Write code to automate frequently performed tasks
  • Collaborate directly with researchers to optimize their use of HPC infrastructure
  • Develop and improve systems and user documentation
  • Develop and monitor the tools used to maintain a production computing environment
  • Provide operational support on a rotating basis and as needed
  • Other duties as assigned or needed

Skills You'll Need:

  • Experience in high performance computing (HPC), including parallel filesystems (e.g., Lustre, GPFS), batch systems (e.g., Slurm, Grid Engine), and high-performance network interconnects, is a plus but not required
  • Extensive experience with Linux systems administration
  • High proficiency with at least one programming/scripting language (e.g., Go, Python, C)
  • Extensive experience designing, building, and maintaining complicated, interdependent, and distributed systems
  • Extensive experience profiling and debugging application stacks (debuggers and profilers)
  • Experience with system configuration management tools (SaltStack, Ansible, Puppet, etc.)
  • A compulsion to perform root cause analysis
  • Reliable and predictable availability

Benefits

- Discretionary bonus eligibility
- Medical, dental, and vision insurance
- HSA, FSA, and Dependent Care options
- Employer Paid Group Term Life and AD&D Insurance
- Voluntary Life & AD&D insurance
- Paid vacation plus paid holidays
- Retirement plan with employer match
- Paid parental leave
- Wellness Programs

Annual Base Salary Range
$150,000 - $200,000 USD