Remote, United States of America

Job Term


Company Website

At Domino Data Lab, we have an ambitious vision for data science. Our platform helps data science teams accelerate research, increase collaboration, and rapidly deploy predictive models.  Our customers are the most sophisticated analytical organizations in the world, including companies like Bristol Myers Squibb, Allstate, Bayer, and Red Hat. Backed by Sequoia Capital, Coatue Management, Bloomberg Beta, and Zetta Venture Partners, we are at the epicenter of the data science revolution, helping companies develop the next breakthrough in medicine, build better cars, or recommend the best song play next.

What we are building
The Quality team’s mission at Domino Data Lab is to deliver quality releases of Domino MLOps platform, which provides an end-to-end platform for data scientists to develop, train, deploy and monitor ML models, at enterprise scale.

What your impact will be

  • Collaborate with MLOps experts to devise customer-like workflows, and build automation to run them at scale to simulate thousands of users on the Domino platform.
  • Measure the capacity, scalability, and reliability of the Domino platform to verify service level objectives. Gain experience using observability platforms working alongside with reliability engineers.
  • Build turnkey CI-based automated solutions both to find bugs before they hit production and to allow developers to get insights about the performance impact of their branches.
  • Understand Kubernetes reliability and resource management concepts to troubleshoot and improve our service and user workload sizing requirements.
  • Collaborate with others to produce papers, guidelines, or best-practices documentation for scaling

What we look for in this role

  • Excellent written and verbal communication skills.
  • Strong Python programming background or demonstrated ability to adapt to new programming languages. We have a tests-as-code mentality, and the team uses Python. 
  • Systems troubleshooting skills and willingness to use the command line. Exposure to Kubernetes or other cloud or container technologies is a plus but not required.
  • Experience building automation to test a distributed system. Ideal candidates have developed tooling for and participated in load, performance, scale, or stress testing. We also consider candidates with other backend testing experience.
  • Experience with test result analysis. Ideal candidates have developed and used observability dashboards in New Relic, Grafana, Splunk, Data Dog, or equivalent systems.
  • 6+ years of experience

What we value

  • We value a growth mindset. High-performing creative individuals who dig into problems and see the opportunities for success
  • We believe in individuals who seek truth and speak the truth and can be their whole selves at work. 
  • We value all of you that believe improving is always possible. At Domino, everything is a work in progress – we can do better at everything. 
  • We emphasize an environment of teaching and learning to equip employees with the tools needed to be successful in their function and the company.
  • We strongly believe in the value of growing a diverse team and encourage people of all backgrounds, genders, ethnicities, abilities, and sexual orientations to apply
Apply now