Position

Senior Site Reliability Engineer (f/m/x)

Berlin

Remerge is a fast-growing mobile advertising scale-up that has become the No. 1 app retargeting company over the last 8 years, with offices in Berlin, San Francisco, New York, Singapore, Beijing, Seoul and Tokyo. Remerge enables app developers to re-engage up to 3.3 million users per second across 1 million apps globally in order to increase retention and boost user lifetime value. We love data, designing for the user and anything that helps drive intelligent decisions.

Job mission

At Remerge, we’re passionate about building software that solves problems. We count on our site reliability engineers (SREs) to ensure that production systems run smoothly, with high availability and stellar performance, so that we can pursue our mission.

As an SRE, you are a blend of pragmatic operator and software developer who applies sound engineering principles, operational discipline, and mature automation to our operating environments and codebase. You share the academic and intellectual background of the rest of the development organization while doing work that has historically been done by system operators. You are quickly bored by performing tasks by hand, and you have the skill set to write software that replaces manual work, even when the solution is complicated.

About your role

  • Work closely with our development teams to design and shape our engineering culture and provide solutions for automation, deployment, and monitoring of services and tools.

  • Implement and enforce core tenets of site reliability engineering (focus on engineering, SLOs and error budgets, monitoring and on-call, capacity planning and budgeting).

  • Improve operational processes such as deployments and upgrades and provide standardized development environments and workflows for other engineering teams.

  • Research, analyze, and implement new technologies to improve and evolve our infrastructure.

  • Communicate and interact with vendors and partners providing technical services.

  • Run our bare-metal and cloud infrastructure with Ansible, Terraform and GitHub CI/CD workflows.

  • Troubleshoot networking, hardware, and general performance issues.

  • Debug production issues across services and levels of the stack.

  • Be on an on-call rotation to respond to incidents that impact availability.

  • Use your on-call shift to prevent incidents from ever happening.

  • Build monitoring that alerts on symptoms rather than on outages and ensure our monitoring systems are 100% reliable.

  • Document every action so your findings turn into repeatable actions and then into automation.

Job requirements 

  • Experience in building and maintaining distributed service infrastructures, both as a developer and operator.

  • A good understanding of underlying software development and computer science concepts.

  • Hands-on experience developing with Go, Python, and/or Bash.

  • Experience with Unix/Linux operating systems internals and administration or networking.

  • Experience working with IaC and configuration management tooling.

  • Proficiency in monitoring highly available systems, ideally with Prometheus.

  • Experience managing bare-metal servers and resources located on Google Cloud.

  • Willingness to manage and handle on-call duty and responsibilities.

  • Experience working in a team with a well-defined process.

  • Excellent communication and documentation skills.

  • Ability to define SLIs and SLOs.

  • We use Aerospike, Kafka, Hadoop, Spark, Elasticsearch, Logstash, Prometheus, Grafana Stack, Druid, Ansible, Terraform, Nomad, Vault, Consul, GitHub Actions, Docker, Kubernetes, Ubuntu, Redis, Postgres, TensorFlow. Experience with any of these is a plus!

 

Our promise

  • Unlimited vacation days - for real. We give you the freedom to figure out the most productive work-life balance for you.
  • Personal learning budget and conference attendance scholarship 
  • A truly modern place to work: work from home, from our brand new office in Berlin, or remotely - your work environment is yours to design. 
  • Competitive remuneration package including virtual shares
  • End of the year team bonus determined by company performance
  • STA program: travel to our offices around the globe for a short-term assignment of up to a month each year
  • Comfortable work environment - laptop, phone, screen(s), standing desk etc., plus a budget to upgrade your home office setup the way you need it
  • Wellness benefits such as sports memberships and internet reimbursements
  • Welcome lunch, company-wide offsites and parties

 

Remerge is an Equal Opportunity Employer that is committed to diversity and inclusion in the workplace: all applicants are considered for positions without regard to race, ethnic origin, gender, age, religion or belief, marital status, gender identification, sexual orientation, veteran status or disability. We're looking forward to your application!