Jane Street on GPUs, Trading, and Hiring: Summary of Dwarkesh Podcast
This post summarizes the podcast interview between Ron Minsky (co-head of the Technology Group at Jane Street), Dan Pontecorvo (head of the Physical Engineering team at Jane Street), and podcast host Dwarkesh Patel, recorded at Jane Street’s Texas data center.
The summary is organized in a hierarchical, structured format (major/minor categories) to serve as a comprehensive reference for self-directed learning. It covers how Jane Street uses GPUs, their approach to trading at various time horizons, data center infrastructure constraints, and their current hiring landscape.
1. Trading Time Horizons & Hardware Spectrum
An overview of the trade-offs between computational latency and decision complexity, and the optimized hardware stack for each trading horizon.
- Ultra-Low Latency Regime (Under 100 Nanoseconds)
- Hardware Stack: Instead of CPUs, the system relies on FPGAs (Field Programmable Gate Arrays) that are directly attached to the network fiber.
- Latency Level: Decisions are made so quickly that an oscilloscope attached to the incoming and outgoing wires would show the outgoing packet starting to leave before the incoming packet has been fully received. In this regime, the choice of programming language (OCaml vs. Rust vs. C++) does not matter, because execution happens entirely in hardware.
- Decision Complexity: Due to severe time constraints, the models must remain extremely simple.
- The Ensemble Approach
- Microseconds to Milliseconds Regime: As latency constraints loosen, systems transition to hybrid architectures (CPU, FPGA, GPU) to run progressively smarter and more complex models.
- Hours to Days Regime: Decisions that can be completed within hours or by the end of the day utilize large, complex models to maximize prediction quality.
- Strategy Design: An optimal trading strategy is an ensemble of these diverse horizons, balancing ultra-fast, simple decisions with slower, highly sophisticated predictions.
- Prediction Targets
- Fair Value Prediction: One of the most fundamental targets is predicting the true economic value of a financial asset. This serves as a highly composable building block across many trading workflows.
- Physical Allocation: Depending on latency and compute demands, model inference is assigned to CPU, FPGA, or GPU. For ultra-low latency, machines are placed in colocation facilities right next to exchanges, with fiber cable runs measured down to the millimeter. Slower, larger models can be physically located in remote research data centers where space, power, and cooling constraints are more flexible.
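The mapping from latency budget to hardware described above can be sketched as a simple lookup. This is only an illustration: the 100-nanosecond cutoff comes from the summary, while the one-second cutoff between the hybrid regime and the large-model regime, the function name `pick_hardware`, and the returned labels are assumptions made for the example.

```python
# Cutoffs in nanoseconds. The first is stated in the summary; the second
# is an assumed boundary, since the summary only says "microseconds to
# milliseconds" and "hours to days".
FPGA_CUTOFF_NS = 100
HYBRID_CUTOFF_NS = 1_000_000_000  # 1 second (assumption)


def pick_hardware(latency_budget_ns: float) -> str:
    """Return a hardware tier label for a given decision latency budget."""
    if latency_budget_ns <= 0:
        raise ValueError("latency budget must be positive")
    if latency_budget_ns < FPGA_CUTOFF_NS:
        # Ultra-low latency: logic runs in hardware attached to the fiber.
        return "FPGA"
    if latency_budget_ns < HYBRID_CUTOFF_NS:
        # Looser budgets allow progressively smarter models.
        return "CPU/FPGA/GPU hybrid"
    # Hours-to-days decisions can use large models in a research data center.
    return "large model on GPUs"


if __name__ == "__main__":
    for budget in (50, 5_000, 3600 * 1e9):
        print(f"{budget:>16.0f} ns -> {pick_hardware(budget)}")
```

A real allocation would also weigh compute demand and physical location, which this sketch leaves out.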
2. The $6B CoreWeave Deal & AI Strategy
Why Jane Street signed a massive $6 billion compute deal and how their AI development model differs from traditional foundation AI laboratories.
- Purpose of the $6B Compute Deal
- Goal: To provide researchers with rapid iteration times and enable a high level of experimentation and architectural diversity.
- Value Creation: Empowering researchers to try many different model designs quickly is essential for discovering novel ideas and driving financial innovation.
- Differentiation from Foundation AI Labs
- Foundation Labs: Focus on training a single, highly generalized model that performs all tasks.
- Jane Street: Prioritizes specialized, bespoke models tailored to consume specific financial data feeds.
- Characteristics of Financial Data
- High Noise: Financial data is extremely noisy compared to natural language or vision.
- High Byte-to-Flop Ratio: The volume of raw data (bytes) ingested is massive relative to the compute spent on it, and the information density per byte is relatively low.
- Small Models, Large Scale: Models tend to be physically smaller but are trained on vast, noisy datasets, requiring unique data loading, storage, and throughput optimizations.
3. Inference Workload Engineering
The characteristics of real-time production inference workloads at Jane Street compared to consumer LLM services.
- Extreme Latency Sensitivity
- Unlike conversational AI chatbots, inference response times directly impact trade execution quality and profitability, making latency minimization critical.
- Symbol Disaggregation & Batching
- Models or segments of models are often disaggregated based on specific trading symbols.
- Pulling real-time feed data from multiple sources and batching them efficiently is a major engineering challenge.
- High Sequential Data Rates
- Conversational LLMs: High aggregate volume from millions of users, but individual user streams are slow and conversational.
- Trading Systems: Require consuming highly sequential, causally connected data feeds (e.g., NASDAQ data streams) in a single domain at extremely high speeds.
- Engineering Focus: The fundamental engineering questions are similar to hyperscaler architectures, but because the constants are different, systems are designed with a heavy emphasis on data-loading performance.
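The per-symbol batching problem mentioned above can be illustrated with a small sketch. It is not Jane Street's design: the `Update` record, the `SymbolBatcher` class, and the two flush rules (batch size and maximum wait) are all hypothetical choices made to show the general shape of the problem, namely grouping interleaved feed updates by symbol while preserving their arrival order.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Optional


class Update(NamedTuple):
    ts_ns: int    # feed timestamp in nanoseconds
    symbol: str   # trading symbol, e.g. "AAPL"
    price: float


class SymbolBatcher:
    """Group updates by symbol, keeping arrival order within each symbol."""

    def __init__(self, max_batch: int, max_wait_ns: int) -> None:
        self.max_batch = max_batch
        self.max_wait_ns = max_wait_ns
        self._pending: Dict[str, List[Update]] = defaultdict(list)

    def add(self, update: Update) -> Optional[List[Update]]:
        """Queue an update; return that symbol's batch if it should flush.

        A batch flushes when it is full or when its oldest update has
        waited at least max_wait_ns. The wait is only checked when a new
        update for the same symbol arrives; a production system would
        also need a timer.
        """
        batch = self._pending[update.symbol]
        batch.append(update)
        full = len(batch) >= self.max_batch
        stale = update.ts_ns - batch[0].ts_ns >= self.max_wait_ns
        if full or stale:
            return self._pending.pop(update.symbol)
        return None


if __name__ == "__main__":
    batcher = SymbolBatcher(max_batch=2, max_wait_ns=1_000)
    feed = [
        Update(0, "AAPL", 1.0),
        Update(10, "MSFT", 2.0),
        Update(20, "AAPL", 1.1),
    ]
    for u in feed:
        ready = batcher.add(u)
        if ready is not None:
            print(u.symbol, [x.price for x in ready])
```

The trade-off between the two flush rules mirrors the tension in the text: larger batches use the accelerator more efficiently, while shorter waits keep latency down.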
4. Infrastructure, Distributed Storage & Compute
How Jane Street manages distributed computing resources and storage across multiple data centers.
- Internal Object Store
- To support massive research workloads, Jane Street is building out its own custom large-scale internal object storage system rather than relying solely on commercial off-the-shelf vendor products.
- Data Center Disaggregation
- Physical Limits: It is impossible to pull enough power into a single facility to run all required GPUs.
- Architectural Impact: Compute and storage scheduling are tightly intertwined. Because moving large volumes of data between centers is expensive, scheduling algorithms must optimize for data locality across geographically distributed sites.
- Moving Beyond x86-64
- The Past: Jane Street simplified its stack for years by assuming x86-64 was the only hardware architecture.
- The Present: NVIDIA’s latest server platforms require supporting ARM, prompting Jane Street to transition to a heterogeneous, multi-architecture environment.
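The data-locality point under Data Center Disaggregation can be made concrete with a minimal placement sketch. The `Site` and `Job` types, the site names, and the rule of minimizing bytes moved are assumptions for illustration; the interview does not describe the actual scheduling algorithm.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class Site:
    name: str
    free_gpus: int
    datasets: Set[str] = field(default_factory=set)  # datasets stored locally


@dataclass
class Job:
    gpus: int
    inputs: Dict[str, int]  # dataset name -> size in bytes


def bytes_to_move(job: Job, site: Site) -> int:
    """Total size of the job's inputs that are not already at the site."""
    return sum(size for name, size in job.inputs.items()
               if name not in site.datasets)


def place(job: Job, sites: List[Site]) -> Optional[Site]:
    """Pick the site with enough free GPUs that needs the least data moved.

    Ties are broken in favor of the site with more free GPUs.
    Returns None when no site can fit the job.
    """
    candidates = [s for s in sites if s.free_gpus >= job.gpus]
    if not candidates:
        return None
    return min(candidates, key=lambda s: (bytes_to_move(job, s), -s.free_gpus))


if __name__ == "__main__":
    sites = [
        Site("site-a", free_gpus=8, datasets={"ticks"}),
        Site("site-b", free_gpus=64),
        Site("site-c", free_gpus=2, datasets={"ticks", "books"}),
    ]
    job = Job(gpus=4, inputs={"ticks": 100, "books": 50})
    chosen = place(job, sites)
    print(chosen.name if chosen else "no site fits")
```

Here the compute constraint (free GPUs) filters the candidates and the storage constraint (where the data already lives) ranks them, which is one simple way the two concerns become intertwined.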
5. AGI & Financial Trading
The viability of financial trading in an AGI era and the enduring value of human cognitive ability.
- Trading as an AGI-Complete/NP-Complete Problem
- Trading requires predicting the future value of assets, which depends on a vast array of real-world variables, making the problem fundamentally AGI-complete.
- Human Judgment in Phase Transitions
- AI models perform well under normal market conditions, but human traders excel at managing “phase transitions”—periods of extreme market volatility and black swan events.
- Since highly volatile days represent the most profitable opportunities for liquidity providers, human meta-judgment (overseeing and adjusting automated models) remains indispensable.
- Non-Electronic & Alternative Asset Trading
- A significant portion of trading still occurs via human chat and voice communications.
- Assessing risks like “adverse selection” from counterparties requires qualitative human judgment.
- Certain asset classes, such as bonds, have been slower to automate compared to equities, keeping human intermediation crucial.
6. Data Center Infrastructure & Supply Chain Bottlenecks
How data center design has evolved and how Jane Street manages infrastructure constraints to deploy chips faster.
- Elevation of Data Center Engineering
- Cooling, power, and physical engineering have shifted from low-profile back-end operations to high-priority business decisions.
- Managing Supply Chain Lead Times
- Bottleneck Components: Diesel generators, electrical transformers, and specialized liquid cooling components.
- Business Trade-offs: Instead of waiting for long-lead-time generators to back up an entire facility, Jane Street may choose to bypass generator backup for non-critical GPU clusters. Accepting lower resiliency for training clusters can speed up GPU deployment by six months.
- Procurement & Modular Infrastructure
- Fungible infrastructure components are purchased in advance and warehoused.
- To reduce onsite construction time, Jane Street adopts modular data center infrastructure, building long-lead components offsite and shipping them for plug-and-play assembly.
- Addressing Power Density
- Density Trends: Designing systems to support up to 1-megawatt racks.
- Technical Adaptations: Utilizing larger liquid cooling pipes and preparing for 800V DC power delivery.
- Hardware Integration: Infrastructure decisions must be made more than a year before chips are ordered. For example, TPUs operate with lower-temperature water and at half the density of NVIDIA GB200 systems, so they require separate, dedicated facility plans.
7. Value of Excess/Reserve Compute
How Jane Street extracts business value from reserve or standby compute capacity.
- Constant Compute Constraints
- Jane Street is rarely concerned with over-provisioning compute. The pipeline of research ideas and experiments is so vast that the firm operates under constant compute limitations.
- Secondary Workloads (Fallback Tasks)
- Model Retraining: Financial models decay over time due to shifting market dynamics. Continually retraining models is a high-value task that consumes spare compute capacity.
- Bulk Inference: Non-real-time tasks, such as massive backtesting and simulations, are scheduled to run whenever real-time demands drop.
- Bifurcating Power and Chips
- To manage risk, Jane Street separates long-term commitments for power and data center capacity from the actual purchase of expensive, fast-evolving silicon.
- If compute demands drop, it is much easier to sublease data center space and power to third parties than to offload excess chips.
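The idea of filling spare capacity with fallback work can be sketched as a priority queue. The class name `FallbackScheduler`, the job names, and the exact ordering (real-time first, then retraining, then bulk inference) are assumptions for the example; the summary only says that retraining and bulk inference absorb capacity when real-time demand drops.

```python
import heapq
from itertools import count
from typing import List, Tuple

# Lower number means higher priority. The relative order of the two
# fallback kinds is an assumption made for this sketch.
PRIORITY = {"realtime": 0, "retrain": 1, "bulk_inference": 2}


class FallbackScheduler:
    """Start jobs in priority order until the GPU pool is exhausted."""

    def __init__(self, total_gpus: int) -> None:
        self.total_gpus = total_gpus
        self._queue: List[Tuple[int, int, str, int]] = []
        self._seq = count()  # preserves submission order within a priority

    def submit(self, kind: str, name: str, gpus: int) -> None:
        heapq.heappush(self._queue, (PRIORITY[kind], next(self._seq), name, gpus))

    def schedule(self) -> List[str]:
        """Return names of jobs started this round; the rest stay queued."""
        free = self.total_gpus
        started: List[str] = []
        deferred = []
        while self._queue:
            item = heapq.heappop(self._queue)
            _, _, name, gpus = item
            if gpus <= free:
                free -= gpus
                started.append(name)
            else:
                deferred.append(item)
        for item in deferred:
            heapq.heappush(self._queue, item)
        return started


if __name__ == "__main__":
    sched = FallbackScheduler(total_gpus=10)
    sched.submit("bulk_inference", "backtest", 6)
    sched.submit("retrain", "retrain-model", 4)
    sched.submit("realtime", "live", 5)
    print(sched.schedule())
```

With 10 GPUs, the real-time job and the retraining job start and the backtest waits for the next round, which is the behavior the Secondary Workloads bullets describe in outline.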
8. Hiring Areas & Cultural Aspects
Jane Street’s organizational bottleneck and the specific roles they are looking to fill.
- Mentorship and Culture as the Core Bottleneck
- Scaling GPU clusters from tens of thousands to hundreds of thousands is purely a capital and logistics problem.
- The actual bottleneck for growth is recruiting exceptional talent, integrating them into the firm’s culture, and maintaining adequate mentorship capacity.
- Key Hiring Areas
- Physical Engineering: Mechanical, electrical, structural, and project management professionals to design, construct, and operate data centers.
- Machine Learning & Trading: Researchers experienced in custom neural network architectures and LLM pre-training, along with quantitative traders from math, physics, and computer science backgrounds.
- Software Engineering:
- Generalist SWEs: Strong CS fundamentals to build and optimize general business systems.
- Fleet-wide Optimization: Engineers with experience optimizing massive compute clusters at hyperscaler scale.
- Hardware Engineering: ASIC designers and hardware engineers.
- Formal Methods: A newly formed, speculative team using mathematical proofs to verify code correctness and enhance software reliability in the era of AI-generated code.
- Front-end Engineering: Building modern, web-based tools and visual interfaces to help traders and researchers navigate complex datasets, moving away from legacy CLI/terminal-only tools.
- Culture of Puzzles and Human Tooling
- Puzzles: Puzzles are deeply embedded in the company culture, serving as a primary tool for public outreach and recruiting (e.g., janestreet.com/dwarkesh).
- Human-Centric Approach: Jane Street designs its internal AI tooling to enhance, rather than replace, human understanding, agency, and efficiency.