-
AI SDK & Framework Engineer (2026 Campus Recruitment, December 25, 2025)
The Opportunity
You will make our high-performance AI silicon accessible to the world. While the Runtime team builds the engine, you build the steering wheel. You will develop the Python SDK, integrating our C++ runtime into ecosystems like PyTorch, ONNX, or IREE. You define how data scientists interact with our chip—from "import aisoc" to running real-world LLMs and vision models end-to-end on our device.
Key Responsibilities
· Python/C++ Bridging: Build efficient bindings (using pybind11) that allow Python users to drive our low-level C++ Runtime and memory allocator with minimal overhead and zero-copy where possible.
· Model Ingestion: Build practical model import tools: weight packing/layout transforms, graph partitioning to supported ops, and integration with existing quantization/calibration workflows.
· Developer Experience (DX): Ensure that when a user makes a mistake, they get a helpful Python exception, not a silent segmentation fault.
· Golden Reference Examples: Build and maintain the "Hello World" and "Chatbot" demos that verify the entire hardware/software stack is functioning correctly.
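To make the Developer Experience point concrete, here is a small sketch of the pattern: a thin Python wrapper inspects the status code a C API would return and raises a descriptive exception instead of pressing on with an invalid handle. Everything here (`fake_runtime_alloc`, `AisocError`, the status values) is invented for illustration and is not our actual API:

```python
class AisocError(RuntimeError):
    """Hypothetical SDK exception type for illustration."""

_STATUS_OK = 0
_STATUS_OUT_OF_MEMORY = 2

def fake_runtime_alloc(nbytes):
    # Stand-in for a C binding: pretend on-chip TCM holds 64 KiB.
    return _STATUS_OK if nbytes <= 64 * 1024 else _STATUS_OUT_OF_MEMORY

def alloc(nbytes):
    # The wrapper checks the status code so the user sees a Python
    # exception with context, not a silent crash later on.
    status = fake_runtime_alloc(nbytes)
    if status != _STATUS_OK:
        raise AisocError(
            f"allocation of {nbytes} bytes failed (status={status}): "
            "buffer exceeds on-chip TCM capacity")
    return nbytes

assert alloc(1024) == 1024
```

Calling `alloc(1 << 20)` raises `AisocError` with the failing size and status code in the message, which is the behavior the role is responsible for guaranteeing across the SDK surface.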
What We Will Teach You
· The internals of modern ML frameworks (how PyTorch dispatch works, how ONNX graphs are structured).
· How to build and ship Python wheels and native extensions for our target runtime environment (Embedded Linux).
· Techniques for zero-copy memory sharing between Python (numpy) and hardware accelerators.
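The zero-copy idea can be previewed with plain NumPy: a view built over an existing buffer aliases the same bytes, so writes through one name are visible through the other and no copy ever happens. In this sketch a `bytearray` stands in for a device-mapped buffer; no real accelerator is involved:

```python
import numpy as np

# A bytearray stands in for memory mapped from a device driver.
raw = bytearray(4 * 4)  # room for four float32 values

# np.frombuffer wraps the existing buffer without copying it.
view = np.frombuffer(raw, dtype=np.float32)
view[:] = [1.0, 2.0, 3.0, 4.0]

# The write went straight into `raw`: same bytes, no copy.
assert bytes(raw) == view.tobytes()

# In-place ops through the view also mutate the underlying buffer.
view *= 2
assert np.frombuffer(raw, dtype=np.float32)[3] == 8.0
```

The same buffer-protocol mechanics are what make it possible to hand accelerator memory to `numpy` without staging copies.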
Must-Have Qualifications
· Strong proficiency in Python (you understand decorators, context managers, and the Global Interpreter Lock).
· Working knowledge of C++ (you can read a header file and understand what needs to be exposed to Python).
· Familiarity with ML Data Structures: You know that a "Tensor" is just a pointer to memory with shape and stride metadata.
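The "pointer plus shape and stride metadata" view of a tensor is easy to see in NumPy, where a transpose swaps the metadata without touching the underlying memory:

```python
import numpy as np

# A 3x4 float32 tensor: 12 contiguous floats plus (shape, strides) metadata.
a = np.arange(12, dtype=np.float32).reshape(3, 4).copy()

# Strides are in bytes: one row ahead = 4 floats = 16 bytes,
# one column ahead = 1 float = 4 bytes.
assert a.strides == (16, 4)

# Transposing swaps shape and strides; the data pointer is unchanged.
t = a.T
assert t.shape == (4, 3)
assert t.strides == (4, 16)
assert np.shares_memory(a, t)

# Writes through one view are visible through the other.
a[1, 2] = 42.0
assert t[2, 1] == 42.0
```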
Nice-to-Have (We Value Projects!)
· Experience with ONNX Runtime, TVM, or MLIR.
· Experience building Python wheels or C-extensions (pybind11, Cython).
campus@espressif.com
-
AI Kernel & Performance Engineer (2026 Campus Recruitment, December 25, 2025)
The Opportunity
You will be the reason our chip is fast. You will write the hand-tuned kernels that power Large Language Models (LLMs) on our custom RISC-V hardware. You will work directly with hardware architects to exploit our proprietary Matrix (RVM) and Vector (RVV) extensions, squeezing every last FLOP out of the silicon.
Key Responsibilities
· Kernel Implementation: Write kernels for GEMM and common epilogues (bias/activation/quant); implement Softmax/RMSNorm; evolve toward attention kernels as the project matures.
· Micro-Optimization: Analyze assembly output. Did the compiler unroll the loop? Did we stall on a memory load? You fix it.
· Tiling & Layout: Calculate the optimal way to chop a large tensor into "tiles" that fit in our L1 cache/TCM.
· Benchmarking: Build the "speedometer" for the chip. Prove your kernel is faster than the baseline.
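The tiling responsibility can be sketched in a few lines: a blocked GEMM walks the matrices in small sub-blocks so that each working set could fit in a fast local memory. This is plain NumPy for illustration (the tile size is arbitrary, not a real L1/TCM capacity), but the loop structure is the same one a hand-tuned kernel would use:

```python
import numpy as np

def tiled_matmul(A, B, tile=4):
    """Blocked GEMM: accumulate tile x tile sub-blocks of C so each
    working set (one tile of A, B, and C) stays small."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    C = np.zeros((M, N), dtype=A.dtype)
    for i0 in range(0, M, tile):
        for j0 in range(0, N, tile):
            for k0 in range(0, K, tile):
                # Accumulate one K-slice's contribution into the C tile.
                C[i0:i0+tile, j0:j0+tile] += (
                    A[i0:i0+tile, k0:k0+tile] @ B[k0:k0+tile, j0:j0+tile]
                )
    return C

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)).astype(np.float32)
B = rng.standard_normal((8, 8)).astype(np.float32)
assert np.allclose(tiled_matmul(A, B), A @ B, atol=1e-4)
```

Choosing the tile size so that the three active tiles fit in L1/TCM, and ordering the loops to maximize reuse, is exactly the kind of calculation this role makes daily.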
What We Will Teach You
· Our proprietary RVM (Matrix) and RVV (Vector) intrinsic APIs.
· How to use our cycle-accurate profilers and hardware counters.
· The specific memory hierarchy constraints of our AI SoC.
Must-Have Qualifications
· Strong C/C++ skills, specifically with a math/logic focus.
· Understanding of Computer Architecture basics: Registers, Cache Hierarchy (L1/L2), SIMD (Single Instruction Multiple Data).
· Comfortable reading/writing technical documentation (Instruction Set Architecture specs).
Nice-to-Have
· Experience with CUDA, OpenMP, or AVX/Neon intrinsics.
· Coursework in Linear Algebra or Numerical Methods.
campus@espressif.com
-
AI System Software Engineer (Runtime & HAL) (2026 Campus Recruitment, December 25, 2025)
The Opportunity
You will build the heartbeat of our AI accelerator. While our compilers generate the "what" (the neural network graph), the Runtime determines the "how" (execution). You will write the low-level C/C++ code that manages DMA engines, synchronizes parallel cores, and drives high utilization on our 4-PE AISoC while maintaining correctness and stability.
Key Responsibilities
· Pipeline Orchestration: Implement the on-device scheduler that coordinates data transfers (DMA) and compute tasks. You will solve classic "producer-consumer" problems in silicon.
· Memory Management: Build the allocator that manages tight on-chip SRAM (TCM). You decide where every tensor lives and when it dies.
· Hardware Abstraction (HAL): Implement the low-level HAL and intrinsic wrappers used by our runtime and kernel library (no kernel-mode driver experience required).
· Consistency & Visibility: Define and enforce explicit memory visibility protocols between cores/TCM/DMA (clean/flush + fence/events) to prevent stale data reads on our non-coherent system.
· Debug & Profiling: Create the tools that tell us why the chip is stalling (trace markers, cycle counters).
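The producer-consumer problem named above looks like this in miniature: a "DMA" thread fills buffers while a "compute" thread drains them, with a small free list providing double buffering so the two overlap. This is a Python toy (threads and queues stand in for DMA engines and hardware events), shown only to illustrate the scheduling pattern:

```python
import queue
import threading

NUM_TILES = 8
free_bufs = queue.Queue()    # buffers the "DMA" may fill
ready_bufs = queue.Queue()   # filled buffers awaiting "compute"
for _ in range(2):           # two buffers = double buffering
    free_bufs.put(bytearray(16))

results = []

def dma_producer():
    for i in range(NUM_TILES):
        buf = free_bufs.get()       # block until a buffer is free
        buf[0] = i                  # pretend the DMA filled it
        ready_bufs.put(buf)
    ready_bufs.put(None)            # sentinel: no more work

def compute_consumer():
    while (buf := ready_bufs.get()) is not None:
        results.append(buf[0] * 2)  # pretend compute on the tile
        free_bufs.put(buf)          # recycle the buffer to the DMA

t1 = threading.Thread(target=dma_producer)
t2 = threading.Thread(target=compute_consumer)
t1.start(); t2.start()
t1.join(); t2.join()

assert results == [i * 2 for i in range(NUM_TILES)]
```

On the real chip the queues become hardware event rings and the blocking `get` calls become waits on DMA-complete interrupts, but the invariants (never read a buffer before it is filled, never refill one still in use) are identical.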
What We Will Teach You
· Our specific DMA descriptor model and event synchronization hardware.
· How to manage memory visibility (cache maintenance) on a non-coherent architecture.
· The internal workings of our on-device scheduling framework.
Must-Have Qualifications
· Strong proficiency in C/C++ (you understand pointers, memory layout, and the stack vs. heap).
· Academic or project experience with Operating Systems concepts (mutexes, race conditions, context switching).
· Fearlessness in debugging: You don't just stare at a segfault; you attach a debugger and find the root cause.
Nice-to-Have
· Experience with embedded systems (Raspberry Pi, STM32, or bare-metal RISC-V).
· Knowledge of Python (for building test scripts to drive your C++ runtime).
campus@espressif.com

