Site Reliability Engineer

The Diamond Classic • Full-time • Fort Worth, Texas, United States • 1w ago

The Diamond Classic is seeking a talented Site Reliability Engineer to join our dynamic team in the sports industry. At The Diamond Classic, we pride ourselves on offering a unique incentive program that appeals to everyone in the equestrian world. Our concept was born from the challenge of balancing the desire to breed to favored stallions against the need for effective incentive solutions. As a Site Reliability Engineer, you will play a crucial role in ensuring the reliability and performance of our systems, enabling us to sustain our innovative programs and provide seamless experiences for our users. Your expertise will drive improvements to our infrastructure and help us meet the particular challenges of the sports industry, and you will collaborate with cross-functional teams to implement best practices in system design and monitoring. If you are passionate about using technology to improve reliability and support a vibrant community, we invite you to apply and be part of our pioneering endeavors at The Diamond Classic.

Responsibilities

Design, build, and maintain scalable and reliable systems to support The Diamond Classic's unique incentive platform.
Monitor system performance and availability, ensuring optimal reliability for all users and stakeholders.
Implement automation tools and frameworks for infrastructure provisioning and deployment processes.
Collaborate with software engineers to diagnose and resolve application performance issues.
Develop and maintain reliable monitoring, alerting, and troubleshooting tools to ensure proactive system management.
Conduct root cause analysis for service interruptions and implement corrective actions to prevent future issues.
Participate in on-call rotations to provide support and resolution for critical incidents.

Requirements

Bachelor's degree in Computer Science, Engineering, or a related field.
Proven experience in site reliability engineering or related roles within the sports or tech industry.
Strong proficiency in cloud services such as AWS, Azure, or Google Cloud Platform.
Experience with infrastructure as code tools such as Terraform, CloudFormation, or Ansible.
Solid understanding of containerization technologies like Docker and orchestration tools like Kubernetes.
Familiarity with monitoring and logging tools such as Prometheus, Grafana, or ELK Stack.
Excellent problem-solving skills and the ability to work collaboratively in a fast-paced environment.