HPC Production Engineer
Jump Crypto
At Jump Trading, you will find a group of driven, dynamic people focused on researching, interpreting, and capitalizing on the global trading markets. Headquartered in Chicago since 1999, Jump has grown over the years into a global operation with offices in Chicago, New York, Austin, London, Singapore, Shanghai, Amsterdam, Bristol, Sydney, Hong Kong, Paris, and Gurugram.
Jump's global HPC Team is looking to add a Production Engineer in New York City. The ideal candidate is a hands-on individual who is highly skilled in the details and nuances of managing Linux environments and has the strong software development background needed to support uniquely customized systems at scale.
What You'll Do:
- Design, implement, maintain, and support high-performance compute and storage systems
- Implement and support performance monitoring and fault monitoring systems
- Monitor system and storage performance, up to and including network components
- Build tooling to compile, package, install, and upgrade software and operating system components at scale
- Write code to automate frequently performed tasks
- Collaborate directly with researchers to optimize their use of HPC infrastructure
- Develop and improve systems and user documentation
- Develop and monitor the tools used to maintain a production computing environment
- Provide operational support on a rotating basis and as needed
- Other duties as assigned or needed
Skills You'll Need:
- Experience in high-performance computing (HPC), including parallel filesystems (e.g., Lustre, GPFS), batch systems (e.g., Slurm, Grid Engine), and high-performance network interconnects, is a plus but not required
- Extensive experience with Linux systems administration
- High proficiency with at least one programming/scripting language (e.g., Go, Python, C)
- Extensive experience designing, building, and maintaining complicated, interdependent, and distributed systems
- Extensive experience profiling and debugging application stacks using debuggers and profilers
- Experience with system configuration management tools (SaltStack, Ansible, Puppet, etc.)
- A compulsion to perform root cause analysis
- Reliable and predictable availability
Benefits:
- Discretionary bonus eligibility
- Medical, dental, and vision insurance
- HSA, FSA, and Dependent Care options
- Employer Paid Group Term Life and AD&D Insurance
- Voluntary Life & AD&D insurance
- Paid vacation plus paid holidays
- Retirement plan with employer match
- Paid parental leave
- Wellness programs