blog.bejarano.io

Replacing hchk.io with Prometheus and Pushgateway

The other day, reading this Lobste.rs thread, I came across healthchecks.io (or hchk.io), a service for monitoring cron jobs.

The service is pretty simple: you specify the details of your cron job (schedule, grace time, etc.) and you are given a unique URL, which your job then pings once it’s finished, letting healthchecks.io know the job ran.

You can also ping URL/start first and URL at the end to track how long your job took, as well as ping URL/fail to signal a failure.
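For reference, this is roughly all a job has to do on the healthchecks.io side. Below is a minimal Python sketch using only the standard library. The check URL is a placeholder, and the ping function can be swapped out so the wrapper can be exercised without network access:

```python
import urllib.request
from typing import Callable

# Placeholder check URL: healthchecks.io issues one unique URL per job.
PING_URL = "https://hc-ping.com/your-check-uuid"


def http_ping(url: str) -> None:
    """Send one ping as a plain GET request."""
    with urllib.request.urlopen(url, timeout=10):
        pass


def run_with_pings(job: Callable[[], None], base_url: str = PING_URL,
                   ping: Callable[[str], None] = http_ping) -> None:
    """Ping URL/start, run the job, then ping URL on success or URL/fail on error."""
    ping(base_url + "/start")
    try:
        job()
    except Exception:
        ping(base_url + "/fail")
        raise  # still let the caller (and cron) see the failure
    ping(base_url)
```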

This is all pretty easy logic. There’s nothing special about the service, and as I monitor my services with Prometheus, I thought I could hack together a similar solution with Pushgateway.

What is Pushgateway?

First, a little background on how Prometheus gathers metrics.

Prometheus pulls metrics: it actively reaches out to (scrapes) endpoints that expose metrics in the Prometheus format.

Therefore, every endpoint must be up and running whenever Prometheus decides to scrape it.
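To make the pull model concrete, here is what a scrape target boils down to: an HTTP endpoint, conventionally /metrics, that returns plain text in the Prometheus exposition format whenever Prometheus asks. A minimal standard-library sketch; the metric name myapp_up and port 8000 are hypothetical:

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def render_metrics(metrics: dict) -> str:
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for name, value in metrics.items():
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"  # every line, including the last, ends in "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    """Answers Prometheus' scrape requests on /metrics."""

    def do_GET(self) -> None:
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics({"myapp_up": 1}).encode()  # hypothetical metric
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8000) -> None:
    # The endpoint only exists while this call is running.
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Prometheus would then be pointed at localhost:8000. As soon as the process exits, the next scrape simply fails, which is exactly the situation a short-lived batch job ends up in.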

But how do I get metrics from my batch jobs into Prometheus if they only run for a few minutes?

That’s where Pushgateway comes in. Pushgateway is a service that accepts metrics pushed to it, keeps them until they are replaced or deleted, and exposes them for Prometheus to scrape: something like voicemail for metrics.
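The voicemail analogy can be made a bit more concrete with a toy in-memory model. This is not Pushgateway's implementation, only a sketch of its documented behaviour: metrics are stored per group (the job name plus any extra labels in the push URL), a PUT replaces the whole group, a POST replaces only metrics with the same names, and nothing expires until the group is deleted. The class and method names are made up for illustration:

```python
GroupKey = tuple[tuple[str, str], ...]


class ToyPushgateway:
    """Hypothetical in-memory model of Pushgateway's storage; illustration only."""

    def __init__(self) -> None:
        self.groups: dict[GroupKey, dict[str, float]] = {}

    @staticmethod
    def _key(grouping: dict[str, str]) -> GroupKey:
        return tuple(sorted(grouping.items()))

    def put(self, grouping: dict[str, str], metrics: dict[str, float]) -> None:
        # PUT replaces every metric previously pushed to the same group.
        self.groups[self._key(grouping)] = dict(metrics)

    def post(self, grouping: dict[str, str], metrics: dict[str, float]) -> None:
        # POST only replaces metrics with the same names, keeping the rest.
        self.groups.setdefault(self._key(grouping), {}).update(metrics)

    def delete(self, grouping: dict[str, str]) -> None:
        # Nothing expires on its own: a group disappears only when deleted.
        self.groups.pop(self._key(grouping), None)

    def scrape(self) -> str:
        # Prometheus sees the last pushed value of every group on every scrape.
        lines = []
        for key, metrics in sorted(self.groups.items()):
            labels = ",".join(f'{k}="{v}"' for k, v in key)
            for name, value in sorted(metrics.items()):
                lines.append(f"{name}{{{labels}}} {value}")
        return "\n".join(lines) + ("\n" if lines else "")
```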

Why replace healthchecks.io?

Lower cost

healthchecks.io has three pricing tiers (as of August 2019).

If you stay within the Hobbyist tier, you’ll be fine, but at $16/mo for 100 jobs, I’d rather write my own alternative.

With Prometheus, you get unlimited jobs, unlimited team members and unlimited alerts through whichever channels you choose, at the cost of running a simple stack of three services (Prometheus, Pushgateway and Alertmanager) and a less “turn-key” feel.

More features

Note: the only downside to this setup is that Alertmanager supports a limited set of alert channels, but it’s not too hard to write your own client.
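Writing your own client mostly means accepting Alertmanager's webhook notifications: a receiver with a webhook_configs entry makes Alertmanager POST a JSON document (a status plus a list of alerts with their labels and annotations) to a URL of your choosing. A minimal sketch of such an endpoint; the port 5001, the summary annotation and the message format are my own assumptions, and print() stands in for whatever channel you actually want to notify:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def format_alerts(payload: bytes) -> list[str]:
    """Turn an Alertmanager webhook body into one readable line per alert."""
    data = json.loads(payload)
    messages = []
    for alert in data.get("alerts", []):
        name = alert.get("labels", {}).get("alertname", "unknown")
        summary = alert.get("annotations", {}).get("summary")
        line = f"[{alert.get('status', 'unknown').upper()}] {name}"
        messages.append(f"{line}: {summary}" if summary else line)
    return messages


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        for message in format_alerts(self.rfile.read(length)):
            print(message)  # hypothetical hook: deliver to your own channel here
        self.send_response(200)
        self.end_headers()


def serve(port: int = 5001) -> None:
    HTTPServer(("", port), WebhookHandler).serve_forever()
```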

How to replace healthchecks.io?

Install the Prometheus stack

  1. Install and run Prometheus, Pushgateway and Alertmanager

  2. Connect Prometheus to Alertmanager by editing prometheus.yml.

  3. Make Prometheus scrape Pushgateway by editing prometheus.yml.
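Steps 2 and 3 each come down to a small stanza in prometheus.yml: an alerting block pointing at Alertmanager, and a scrape job for Pushgateway with honor_labels: true, so the job and instance labels pushed by your jobs are kept instead of being renamed to exported_job and exported_instance. The sketch below renders both from Python, assuming everything runs on localhost with the default ports (9093 for Alertmanager, 9091 for Pushgateway); merge the output into your existing prometheus.yml rather than overwriting it:

```python
from string import Template

# Fragment to merge into prometheus.yml; hosts and ports are assumptions.
PROMETHEUS_FRAGMENT = Template("""\
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['$alertmanager']

scrape_configs:
  - job_name: 'pushgateway'
    honor_labels: true  # keep the job/instance labels set by the pushing jobs
    static_configs:
      - targets: ['$pushgateway']
""")


def render_prometheus_fragment(alertmanager: str = "localhost:9093",
                               pushgateway: str = "localhost:9091") -> str:
    return PROMETHEUS_FRAGMENT.substitute(alertmanager=alertmanager,
                                          pushgateway=pushgateway)


if __name__ == "__main__":
    print(render_prometheus_fragment())
```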

Make jobs push metrics to Pushgateway

Once everything is connected, your jobs can push metrics to Pushgateway, where Prometheus will scrape them.
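To mirror healthchecks.io's start, success and fail pings, a job can push a handful of gauges when it finishes. Here is a minimal sketch, not necessarily how my setup does it: it runs a command with subprocess, times it and PUTs the result to Pushgateway's /metrics/job/<job> endpoint. The metric names (job_last_run_success, job_last_run_timestamp_seconds, job_last_run_duration_seconds) and the gateway address are hypothetical:

```python
import subprocess
import time
import urllib.request
from urllib.parse import quote


def render_run_metrics(success: bool, finished_at: float, duration: float) -> str:
    """Exposition-format body describing the last run (metric names are hypothetical)."""
    return (
        "# TYPE job_last_run_success gauge\n"
        f"job_last_run_success {int(success)}\n"
        "# TYPE job_last_run_timestamp_seconds gauge\n"
        f"job_last_run_timestamp_seconds {finished_at}\n"
        "# TYPE job_last_run_duration_seconds gauge\n"
        f"job_last_run_duration_seconds {duration}\n"
    )


def run_and_report(job: str, command: list[str],
                   gateway: str = "http://localhost:9091") -> int:
    """Run `command`, then replace the metrics of group job=<job> on Pushgateway."""
    start = time.monotonic()
    returncode = subprocess.run(command).returncode
    body = render_run_metrics(returncode == 0, time.time(), time.monotonic() - start)
    url = f"{gateway.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    # PUT replaces everything previously pushed for this job, so stale values vanish.
    request = urllib.request.Request(url, data=body.encode(), method="PUT")
    with urllib.request.urlopen(request, timeout=10):  # raises HTTPError if rejected
        pass
    return returncode
```

A cron entry could then call a small script doing sys.exit(run_and_report("backup", ["/usr/local/bin/backup.sh"])) (both the job name and the path are made up), so cron still sees the job's own exit status.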

See Pushgateway’s README.md for more information on how to push metrics.

Configure alerting upon job failure

  1. Add your alerting rules to rules.yml and reference that file from prometheus.yml under rule_files.
  2. Finally, configure an Alertmanager receiver and start getting notified whenever your jobs fail.
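The prometheus.yml side of this is a rule_files entry; rules.yml holds the alerting rules themselves. A sketch of both, kept as Python strings so they can be written out and checked. It assumes jobs push the hypothetical gauges job_last_run_success (1 or 0) and job_last_run_timestamp_seconds, and that they run daily, hence the 25-hour threshold; the alert names are made up as well:

```python
from pathlib import Path

# Fragment to add to prometheus.yml.
PROMETHEUS_RULE_FILES = """\
rule_files:
  - 'rules.yml'
"""

# Contents of rules.yml; metric names, thresholds and alert names are assumptions.
RULES_YML = """\
groups:
  - name: batch-jobs
    rules:
      # Equivalent of pinging URL/fail: the last run reported a failure.
      - alert: JobFailed
        expr: job_last_run_success == 0
        annotations:
          summary: 'Job {{ $labels.job }} failed on its last run'
      # Equivalent of a missed ping: the job has not reported at all for 25 hours.
      - alert: JobMissing
        expr: time() - job_last_run_timestamp_seconds > 25 * 3600
        annotations:
          summary: 'Job {{ $labels.job }} has not reported for over 25 hours'
"""


def write_rules(directory: str = ".") -> Path:
    """Write rules.yml into `directory` and return its path."""
    path = Path(directory) / "rules.yml"
    path.write_text(RULES_YML)
    return path
```

After writing the file, promtool check rules rules.yml validates it before you reload Prometheus.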

Note: check out my previous post to integrate Alertmanager with Amazon SES for email alerts without worrying about setting up SMTP servers.