
OpenAI and Broadcom Unveil Jalapeño: First Custom Chip Targets LLM Inference
Nine months from blank page to tape-out - OpenAI's first chip is built for inference only, and was partly designed by OpenAI's own AI models.
OpenAI has built its first custom AI chip. Developed with Broadcom, the Jalapeño chip is an inference accelerator built from the ground up for LLM workloads, not adapted from earlier general-purpose silicon. Nine months separated initial design from manufacturing tape-out, a pace OpenAI describes as the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors.
OpenAI's Own Models Helped Design the Chip That Runs Them
OpenAI used its own AI models to accelerate parts of the Jalapeño design and optimization process, compressing what would typically be a multi-year ASIC timeline down to nine months. Engineering samples already run live ML workloads in the lab - including GPT-5.3-Codex-Spark - at production target frequency and power. That speed reflects a feedback loop: the same models served to ChatGPT users are now helping design the infrastructure that will run future models.
OpenAI President Greg Brockman described the Jalapeño chip as part of a full-stack infrastructure play. "By designing more of the stack ourselves, we can serve more intelligence with greater efficiency," Brockman said. Richard Ho, who leads OpenAI's hardware program, said the team optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models.
Jalapeño Handles Inference, Not Training
Jalapeño handles inference - running already-trained models in response to user requests - not pre-training, where Nvidia GPUs remain OpenAI's default hardware. OpenAI's architecture reduces data movement and balances compute, memory, and networking to bring realized utilization closer to theoretical peak performance. The gap between realized and peak throughput is where most AI accelerators leave significant efficiency gains on the table.
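To make the utilization point concrete, here is a minimal sketch of the standard roofline model, which bounds achievable throughput by either peak compute or memory bandwidth. It is a general illustration, not a description of Jalapeño: the peak and bandwidth figures and the function names are hypothetical, and OpenAI has published no such numbers.

```python
def attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Roofline bound: the lower of peak compute and what memory can feed.

    arithmetic_intensity is FLOPs performed per byte moved from memory.
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def utilization(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Fraction of theoretical peak that the roofline bound allows."""
    return attainable_flops(peak_flops, mem_bandwidth, arithmetic_intensity) / peak_flops


# Hypothetical accelerator (illustrative numbers only, not Jalapeño specs).
PEAK_FLOPS = 1000e12      # 1000 TFLOP/s peak compute
MEM_BANDWIDTH = 4e12      # 4 TB/s memory bandwidth

# A kernel that moves a lot of data per FLOP is memory-bound.
low = utilization(PEAK_FLOPS, MEM_BANDWIDTH, arithmetic_intensity=50)

# Cutting data movement raises FLOPs per byte until compute is the limit.
high = utilization(PEAK_FLOPS, MEM_BANDWIDTH, arithmetic_intensity=500)

print(f"memory-bound utilization:  {low:.0%}")
print(f"compute-bound utilization: {high:.0%}")
```

In this toy setup, the same hardware moves from a fraction of peak to full peak purely by reducing bytes moved per operation, which is the kind of gain the article attributes to reducing data movement.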
Broadcom contributes Tomahawk networking silicon and chip implementation expertise. A third partner, Celestica, handles board, rack, and system integration. Broadcom CEO Hock Tan confirmed plans to deploy Jalapeño at gigawatt scale with Microsoft and other data center partners, starting by the end of 2026.
OpenAI Joins Google and Amazon in Building Custom Silicon
Google has shipped TPUs for inference since 2016. Amazon followed with Trainium for training and Inferentia for inference - both designed to give AWS GPU-level performance without paying Nvidia rates. OpenAI now joins two of Nvidia's largest customers, both of which have spent years reducing their GPU dependency. For Nvidia, losing inference workloads from its biggest AI customer is worth watching even if pre-training stays on GPUs.
The Jalapeño chip targets all LLMs, not just OpenAI's own models - meaning Broadcom could eventually offer the platform to other AI companies looking to reduce their Nvidia spend. Developers on the API may see lower per-token pricing over time, though OpenAI has announced no pricing changes yet. A detailed technical report will follow in the coming months.
Jalapeño deployment starts by the end of 2026, expanding across gigawatt-scale data centers run by Microsoft and other partners. Nvidia has not commented publicly.