Isazi Consulting is an Artificial Intelligence company based in Johannesburg. We are a young, fun and innovative tech company that solves difficult, real-world challenges across various industries by building machine learning and advanced analytics software solutions. We believe our success is best felt through enabling our clients to make decisions informed by data.
At Isazi Consulting, we are a dynamic team of data-driven, colourful geniuses who love to solve real problems in different ways. We are built on passion, trust and thinking outside of the box. We value unique, free-thinking individuals who chase the thrill of building from scratch. We are looking for new colleagues who value trust, creativity, autonomy, and mastery.
Purpose of role
The primary function of a Site Reliability Engineer (SRE) is to track availability, latency and performance, and to own monitoring and emergency response. The SRE will work closely with the DevOps and development teams to build out monitoring, tracing and alerting capabilities so that our Service Level Indicators (SLIs) stay within our Service Level Objectives (SLOs). This is a highly technical role with a support function as well. The SRE is key to ensuring that the system is resilient to failures and that our clients have an optimal user experience.
Responsibilities:
- Work with the DevOps and Development teams to define Service Level Objectives across the various client environments, ensuring that each designed solution meets non-functional requirements such as availability, performance, security and maintainability.
- Track these objectives and report on any SLO breaches.
- Develop strategies and metrics for monitoring, tracing and alerting.
- Continuously improve log analytics metrics.
- Work with DevOps to ensure that delivery pipelines are as efficient as possible.
- Perform application support including assisting users.
- Configure and maintain products in the systems on behalf of customers.
- Advise the customer on how to improve their outcomes with the products.
- Assist with product defect detection and remediation (including log review).
- Perform ad-hoc data queries.
Qualifications and Criteria:
The ideal candidate will possess the following qualifications and criteria:
- Bachelor's degree or equivalent qualification in Computer Science, Information Systems, Information or Electrical Engineering.
- At least 1 year of experience working as a DevOps Engineer, Support Engineer or SRE.
- Knowledge of Monitoring, Alerting and Tracing technologies: Prometheus, Grafana, Stackdriver, OpenTracing, CloudWatch.
- Some knowledge of cloud technologies: GCP and AWS.
- Some knowledge of supporting Kubernetes platforms and SQL (Postgres).
- Scripting in Bash, Go or Python is a must.
- Analytical thinking.
- Strong communication and interpersonal skills.
- Ability to work on multiple tasks simultaneously.
- Ability to work in high-pressure environments and meet deadlines.
- Strong level of accountability and ownership.
- Proven track record of working in a DevOps, Support or SRE team.
- Ability to identify and trace failure points in a system.