We’re hiring! No official posting, but this is us:
I can’t go into too much detail on our approach, but at a high level we use discrete encoding instead of high-dimensional vectors, and we’re much more CPU focused than typical AI.
We need people to help us scale to new machines with hardware support for hundreds of threads. We’re building a team and value complementary expertise. Strong candidates in any subset of these technical areas are encouraged to apply.
DM me your CV if you think you’d be a good fit!
UPDATE: Looks like Discourse doesn’t like PDF attachments. You can send your CV to chad ampersand symbolicmind period ai (trying to confuse the scrapers here).
Position Overview
- Design and optimize massively parallel sequence processing algorithms for NUMA architectures (50+ threads)
- Develop high-performance Julia implementations of string/sequence algorithms at terabyte scale
- Solve memory hierarchy challenges in shared-memory systems with irregular data access patterns
- Build NUMA-aware pipelines for multi-stage sequence processing workflows
Core Technical Requirements
NUMA & Parallel Systems
- 5+ years scaling algorithms to 50+ core shared-memory systems
- Expert-level understanding of NUMA topology optimization and memory controller bottlenecks
- Hands-on experience with thread affinity management and CPU pinning strategies
- Proven track record profiling and resolving memory bandwidth saturation issues
- Experience with dynamic load balancing across NUMA nodes for variable-length workloads
Algorithm Optimization
- Deep expertise in large-scale string/sequence processing algorithms (suffix arrays, dynamic programming, graph traversal)
- Experience adapting algorithms for memory hierarchy constraints (L3 cache sharing, TLB optimization)
- Background in lock-free data structures and work-stealing algorithms for high-throughput systems
- Knowledge of pipeline parallelization with stage-wise NUMA affinity management
Julia Programming
- 3+ years advanced Julia development with focus on high-performance computing
- Expert knowledge of Julia’s threading model (@threads, @spawn, task scheduling)
- Experience optimizing Julia code for minimal GC pressure under heavy multithreading
- Understanding of SIMD utilization and loop optimization in Julia
- Familiarity with Julia HPC ecosystem (BenchmarkTools.jl, ProfileView.jl, etc.)
Desired Experience
Domain Background (Any of the following)
- Bioinformatics: Sequence alignment, genome assembly, phylogenetic algorithms
- Computational Linguistics: Large-scale text processing, tokenization, parsing pipelines
- High-Performance Computing: Computational physics, scientific computing, or similar scale-intensive domains
Performance Engineering
- Experience with NUMA-specific profiling tools (Intel VTune, perf, RAR/MCLPKI analysis)
- Track record of identifying scaling bottlenecks beyond 50 cores
- Knowledge of memory interleaving vs. locality trade-offs in practice
- Understanding of cache performance optimization for irregular memory access patterns
Technical Leadership
- Experience architecting systems that process TB+ datasets efficiently
- Background in algorithm adaptation for specific hardware constraints
- Contributions to open-source HPC projects or domain-specific libraries
- Experience mentoring teams on performance optimization techniques
Key Responsibilities
- Architect and implement NUMA-optimized sequence processing pipelines
- Profile and resolve memory bandwidth bottlenecks in multi-hundred thread systems
- Design work distribution strategies that minimize cross-NUMA synchronization
- Collaborate with domain experts to translate research algorithms into production systems
- Lead performance optimization initiatives for Julia-based computational workflows
- Develop reusable libraries for high-performance sequence processing
Technical Assessment Areas
- NUMA optimization scenarios: Memory placement strategies for variable-length sequence data
- Julia performance debugging: Identifying and resolving GC pressure and threading bottlenecks
- Algorithm adaptation: Modifying sequence algorithms for 50+ thread execution
- Systems architecture: Designing pipelines that balance computation and memory bandwidth