Responsibilities
H-E-B digital solutions are growing in popularity and adoption, and our Digital Tech teams work in a rapidly changing environment, learning new skills and applying them to solve large, impactful business problems.
As a Site Reliability Engineer, you’ll use your engineering skills to maximize the reliability, availability, and efficiency of our systems, and to improve workflow, automation, and scalability.
Once you’re eligible, you’ll become an Owner in the company, so we’re looking for commitment, hard work, and focus on quality and Customer service. “Partner-owned” means our most important resources, our People, drive the innovation, growth, and success that make H-E-B The Greatest Omnichannel Retailing Company.
Do you have a:
HEART FOR PEOPLE… interpersonal skills?
HEAD FOR BUSINESS… a systematic problem-solving approach?
PASSION FOR RESULTS… drive for automation and continuous improvement?
We are looking for:
- 5+ years of related experience
- Deep understanding of software engineering principles with emphasis on reliability, scalability, and performance optimization
- Experience in one or more programming languages suited for SRE work (e.g., Python, Go, Java, Rust)
What is the work?
Design & Development:
- Develop and implement comprehensive monitoring, SLO tracking, and capacity-planning strategies that are well-aligned with business goals.
- Conduct deep-dive analysis of performance bottlenecks to optimize system efficiency.
- Establish architectural and best-practice guidelines that improve the resiliency of distributed system designs by applying software engineering principles.
- Contribute to long-term reliability roadmaps.
- Drive significant process improvements influencing wider team practices.
- Engage in the whole lifecycle of services, from inception and design through deployment, operation, and refinement, applying system architecture knowledge.
What is your background?
- M.S. or B.S. in Computer Science or related field (or equivalent experience in large-scale distributed systems).
- Deep understanding of software engineering principles with emphasis on reliability, scalability, and performance optimization.
- Command of one or more programming languages suited for SRE work (e.g., Python, Go, Java, Rust).
Do you have what it takes to be a fit as an H-E-B SRE?
- Extensive experience in the design and implementation of resilient, performant, and scalable software solutions, grounded in a deep understanding of systems and networks.
- Exceptional communication and collaboration skills: Ability to lead cross-functional teams, advocate for reliability best practices, and drive strategic initiatives from inception to completion.
- Proven analytical and problem-solving skills, paired with a focus on preventative solutions: Capacity to proactively identify systemic risks and inefficiencies, architecting comprehensive solutions that go beyond addressing surface-level symptoms.
- Proven ability to thrive in a high-growth, fast-paced technical environment: Independently handle complex and ambiguous challenges, prioritize effectively, and make sound judgment calls under pressure.
- Ability to strategically align reliability goals with business objectives: Demonstrate an understanding of how SRE practices impact both technical KPIs and broader company goals.
- Passion for mentorship and knowledge sharing within a strong software engineering culture: Lead by example, foster the growth of SRE team members, and shape the direction of reliability engineering at H-E-B.
Can you…
- Function in a fast-paced, retail, office environment
- Travel by car or plane with overnight stays
- Work extended hours; sit for extended periods; work rotating and on-call schedules