- Develop and evolve a state-of-the-art inference framework in modern C++ that extends TensorRT with autoregressive model serving capabilities, including speculative decoding, LoRA, MoE, and KV cache management.
- Design and implement compiler and runtime optimizations tailored for transformer-based models running on constrained, real-time platforms.
- Collaborate with teams across CUDA, kernel libraries, compilers, and robotics to deliver high-performance, production-ready solutions.
- Contribute to CUDA kernel and operator development for critical transformer components such as attention, GEMM, and MoE.
- Benchmark, profile, and optimize inference performance across diverse embedded and automotive environments.
- Stay ahead of the rapidly evolving LLM/VLM ecosystem and bring emerging techniques into product-grade software.

What we need to see:
- BS, MS, PhD, or equivalent experience in Computer Science, Electrical/Computer Engineering, or a closely related field.
- years of relevant software...