HPC Engineer at Symbolic Mind

We’re hiring! No official posting, but this is us:

I can’t go into too much detail on our approach, but at a high level we use discrete encoding instead of high-dimensional vectors, and we’re much more CPU focused than typical AI.

We need people to help us scale to new machines with hardware support for hundreds of threads. We’re building a team and value complementary expertise. Strong candidates in any subset of these technical areas are encouraged to apply.

DM me your CV if you think you’d be a good fit!
UPDATE: Looks like Discourse doesn’t like PDF attachments. You can send your CV to chad ampersand symbolicmind period ai (trying to confuse the scrapers here).

Position Overview

  • Design and optimize massively parallel sequence processing algorithms for NUMA architectures (50+ threads)
  • Develop high-performance Julia implementations of string/sequence algorithms at terabyte scale
  • Solve memory hierarchy challenges in shared-memory systems with irregular data access patterns
  • Build NUMA-aware pipelines for multi-stage sequence processing workflows

Core Technical Requirements

NUMA & Parallel Systems

  • 5+ years scaling algorithms to 50+ core shared-memory systems
  • Expert-level understanding of NUMA topology optimization and memory controller bottlenecks
  • Hands-on experience with thread affinity management and CPU pinning strategies
  • Proven track record profiling and resolving memory bandwidth saturation issues
  • Experience with dynamic load balancing across NUMA nodes for variable-length workloads

Algorithm Optimization

  • Deep expertise in large-scale string/sequence processing algorithms (suffix arrays, dynamic programming, graph traversal)
  • Experience adapting algorithms for memory hierarchy constraints (L3 cache sharing, TLB optimization)
  • Background in lock-free data structures and work-stealing algorithms for high-throughput systems
  • Knowledge of pipeline parallelization with stage-wise NUMA affinity management

Julia Programming

  • 3+ years of advanced Julia development with a focus on high-performance computing
  • Expert knowledge of Julia’s threading model (@threads, @spawn, task scheduling)
  • Experience optimizing Julia code for minimal GC pressure under heavy multithreading
  • Understanding of SIMD utilization and loop optimization in Julia
  • Familiarity with the Julia HPC ecosystem (BenchmarkTools.jl, ProfileView.jl, etc.)

Desired Experience

Domain Background (Any of the following)

  • Bioinformatics: Sequence alignment, genome assembly, phylogenetic algorithms
  • Computational Linguistics: Large-scale text processing, tokenization, parsing pipelines
  • High-Performance Computing: Computational physics, scientific computing, or similar scale-intensive domains

Performance Engineering

  • Experience with NUMA-specific profiling tools (Intel VTune, perf, RAR/MCLPKI analysis)
  • Track record of identifying scaling bottlenecks beyond 50 cores
  • Knowledge of memory interleaving vs. locality trade-offs in practice
  • Understanding of cache performance optimization for irregular memory access patterns

Technical Leadership

  • Experience architecting systems that process TB+ datasets efficiently
  • Background in algorithm adaptation for specific hardware constraints
  • Contributions to open-source HPC projects or domain-specific libraries
  • Experience mentoring teams on performance optimization techniques

Key Responsibilities

  • Architect and implement NUMA-optimized sequence processing pipelines
  • Profile and resolve memory bandwidth bottlenecks on systems with hundreds of threads
  • Design work distribution strategies that minimize cross-NUMA synchronization
  • Collaborate with domain experts to translate research algorithms into production systems
  • Lead performance optimization initiatives for Julia-based computational workflows
  • Develop reusable libraries for high-performance sequence processing

Technical Assessment Areas

  • NUMA optimization scenarios: Memory placement strategies for variable-length sequence data
  • Julia performance debugging: Identifying and resolving GC pressure and threading bottlenecks
  • Algorithm adaptation: Modifying sequence algorithms for 50+ thread execution
  • Systems architecture: Designing pipelines that balance computation and memory bandwidth