HPC Engineer at Symbolic Mind

cscherrer · July 29, 2025, 1:08pm

We’re hiring! No official posting, but this is us:

I can’t go into too much detail on our approach, but at a high level we use discrete encodings instead of high-dimensional vectors, and we’re much more CPU-focused than typical AI work.

We need people to help us scale to new machines with hardware support for hundreds of threads. We’re building a team and value complementary expertise. Strong candidates in any subset of these technical areas are encouraged to apply.

DM me your CV if you think you’d be a good fit!
UPDATE: Looks like Discourse doesn’t like PDF attachments. You can send to chad ampersand symbolicmind period ai (trying to confuse the scrapers here)

Position Overview

Design and optimize massively parallel sequence processing algorithms for NUMA architectures (50+ threads)
Develop high-performance Julia implementations of string/sequence algorithms at terabyte scale
Solve memory hierarchy challenges in shared-memory systems with irregular data access patterns
Build NUMA-aware pipelines for multi-stage sequence processing workflows

Core Technical Requirements

NUMA & Parallel Systems

5+ years scaling algorithms to 50+ core shared-memory systems
Expert-level understanding of NUMA topology optimization and memory controller bottlenecks
Hands-on experience with thread affinity management and CPU pinning strategies
Proven track record profiling and resolving memory bandwidth saturation issues
Experience with dynamic load balancing across NUMA nodes for variable-length workloads

Algorithm Optimization

Deep expertise in large-scale string/sequence processing algorithms (suffix arrays, dynamic programming, graph traversal)
Experience adapting algorithms for memory hierarchy constraints (L3 cache sharing, TLB optimization)
Background in lock-free data structures and work-stealing algorithms for high-throughput systems
Knowledge of pipeline parallelization with stage-wise NUMA affinity management

Julia Programming

3+ years advanced Julia development with focus on high-performance computing
Expert knowledge of Julia’s threading model (@threads, @spawn, task scheduling)
Experience optimizing Julia code for minimal GC pressure under heavy multithreading
Understanding of SIMD utilization and loop optimization in Julia
Familiarity with Julia HPC ecosystem (BenchmarkTools.jl, ProfileView.jl, etc.)

Desired Experience

Domain Background (Any of the following)

Bioinformatics: Sequence alignment, genome assembly, phylogenetic algorithms
Computational Linguistics: Large-scale text processing, tokenization, parsing pipelines
High-Performance Computing: Computational physics, scientific computing, or similar scale-intensive domains

Performance Engineering

Experience with NUMA-specific profiling tools (Intel VTune, perf, RAR/MCLPKI analysis)
Track record of identifying scaling bottlenecks beyond 50 cores
Knowledge of memory interleaving vs. locality trade-offs in practice
Understanding of cache performance optimization for irregular memory access patterns

Technical Leadership

Experience architecting systems that process TB+ datasets efficiently
Background in algorithm adaptation for specific hardware constraints
Contributions to open-source HPC projects or domain-specific libraries
Experience mentoring teams on performance optimization techniques

Key Responsibilities

Architect and implement NUMA-optimized sequence processing pipelines
Profile and resolve memory bandwidth bottlenecks on systems running hundreds of threads
Design work distribution strategies that minimize cross-NUMA synchronization
Collaborate with domain experts to translate research algorithms into production systems
Lead performance optimization initiatives for Julia-based computational workflows
Develop reusable libraries for high-performance sequence processing

Technical Assessment Areas

NUMA optimization scenarios: Memory placement strategies for variable-length sequence data
Julia performance debugging: Identifying and resolving GC pressure and threading bottlenecks
Algorithm adaptation: Modifying sequence algorithms for 50+ thread execution
Systems architecture: Designing pipelines that balance computation and memory bandwidth
