We're working to improve shopping on Amazon using the conversational capabilities of large language models (LLMs). We're searching for pioneers who are passionate about technology, innovation, and customer experience, and who are ready to make a lasting impact on the industry. In this role, you will design and develop benchmarks and evaluation frameworks to assess LLM capabilities, with a specific focus on e-commerce and shopping applications. You will use these frameworks to curate high-quality training data that improves model performance on shopping-related tasks. Experience with LLM benchmarking, evaluation methods, and applied reinforcement learning is preferred. You'll work with talented scientists, engineers, and technical program managers (TPMs) to innovate on behalf of our customers. If you're fired up about being part of a dynamic, driven team, this is your moment to join us on this exciting journey!