SailPoint is the leader in identity security for the cloud enterprise. Our identity security solutions secure and enable thousands of companies worldwide, giving our customers unmatched visibility into the entirety of their digital workforce, ensuring workers have the right access to do their job – no more, no less.
We are seeking a highly motivated and experienced Senior Site Reliability Engineer (SRE) to join an Identity Security Cloud software development team. This is an embedded role, meaning you will be a full member of the development team, working closely with software engineers, infrastructure platform services, engineering managers, and other stakeholders to ensure the reliability, scalability, and performance of the team's services. You will be responsible for leveraging the infrastructure, tooling, and processes that support our applications in development and production. This role offers a unique opportunity to directly influence the design and architecture of our systems from a reliability and performance perspective.
Work with developers and service owners at the intersection of development and operations to solve performance issues and ensure system scalability.
Reliability Engineering: Design, develop, and implement solutions to improve the reliability, availability, performance, and scalability of our systems. Work with technical leaders and infrastructure platform services to develop alerts and dashboards.
Operational Excellence: Own and improve key operational metrics (SLIs, SLOs, Error Budgets, monitoring and alerting) for team-related services and drive continuous improvement through post-incident reviews and blameless postmortems of non-functional issues. Develop and maintain comprehensive monitoring and alerting to proactively identify and resolve issues. Create and maintain dashboards, conducting ongoing reviews to find and address gaps. Improve operational processes and team practices by working with technical leaders and NOC teams.
Documentation: Review and contribute to clear and concise documentation for systems, processes, runbooks, and procedures.
Experience with monitoring and logging tools (e.g., Prometheus, Grafana, Honeycomb, OpenSearch).
Coding experience beyond simple scripts in at least one programming language such as Go, Java, or Python, both to support reliability engineering work and to evaluate and identify where service code can be optimized for reliability.
Preferred Qualifications:
In the first 30 days you will:
SailPoint is an equal opportunity employer and we welcome all qualified candidates to apply to join our team. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, protected veteran status, or any other category protected by applicable law.
Alternative methods of applying for employment are available to individuals unable to submit an application through this site because of a disability. Contact hr@sailpoint.com or mail to 11120 Four Points Dr, Suite 100, Austin, TX 78726, to discuss reasonable accommodations.