43. Recovery notification bug

Transcript

00:00
You may have spotted this bug if you've been following along, but if we add an endpoint and it's initially successful, our recovery email is actually going to get triggered. So I've got everything up and running here in the terminal, I've got my email client open, and we're going to go ahead and add an endpoint here to see what happens.
00:18
So let's actually delete some of these just so we can choose one that we've already got in this list. So I'm going to go ahead and add in the pro endpoint, and I'm just going to wait for the 10 seconds to kick in and actually show that this was successful, and what we should get through is an email saying that it has recovered, and there it is. So pro has recovered, not good,
00:42
that's not what we want to see when we first add a successful endpoint. So if we go over to our check observer, let's just check what we've done here. Now we run this, or more appropriately we dispatch this event, if the check was successful, which it was in that case because it's a valid endpoint and it's fine, and the previous check was
01:03
not successful. Now we don't have a previous check, but this will return null to us because we are assuming that there may not be a previous check, and when we access this "is successful" method on here it's going to return us a null value. But what we're doing here is saying that the endpoint checks count needs to be greater than one. What we can actually say here is that the endpoint
01:26
checks count doesn't equal one, because that is when we have freshly added an endpoint. So let's get rid of that and let's try this out. So once again let's just close this off for a little bit of clarity. We're going to stop our queue and restart it, and let's go over and just give this a refresh. So I'm going to delete this endpoint, I'm going to re-add it in here and I'm just going to wait
01:48
for 10 seconds and we should now not see an email triggered. Okay, so we're now onto a 200 status that hasn't triggered a recovered email. Now this was the one from before, so let's get rid of that and let's just go through the whole process again of setting this up so it fails and then recovers. So you can see here ABC went down because I just changed it just as the check happened
02:10
and now we're going to recover it and actually make sure that that email comes in. So let's hit done, give that a few seconds to do its thing, and sure enough pro has recovered. So there we go, a tiny mistake there in the way that we set up the check on the count. We don't want to trigger this if the only check that we have created was successful.

Episode summary

In this episode, we tackle a sneaky bug related to our recovery notifications. If you've been following along, you might've noticed that when you add a fresh endpoint (one that's working), you get a recovery email even though nothing was actually down to begin with. To demonstrate, we walk through adding a new endpoint while watching the terminal and the email client. Sure enough, that "recovered" email arrives as soon as the first check runs, which isn't what we want.

Digging into the code, we check the logic around when to send out recovery notifications, specifically in our check observer. The event is dispatched when the current check is successful and the previous check was not. A brand new endpoint has no previous check, so that lookup returns null, which the condition treats as "not successful", and the existing check on the count of checks does not rule this case out. The first healthy check is therefore mistaken for a recovery.
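To make the null case concrete, here is a minimal sketch of the "successful now, not successful before" test on its own. It is written in Python purely as a language-neutral illustration and is not the course's observer code; the function name `looks_like_recovery` and its parameters are hypothetical, `None` stands in for the missing previous check, and the count guard is deliberately left out.

```python
from typing import Optional


def looks_like_recovery(current_successful: bool,
                        previous_successful: Optional[bool]) -> bool:
    # Hypothetical stand-in for the observer's condition.
    # previous_successful is None when no previous check exists;
    # `not None` evaluates to True, so a missing previous check
    # is indistinguishable from a failed one.
    return current_successful and not previous_successful


print(looks_like_recovery(True, None))   # True: first check on a healthy endpoint
print(looks_like_recovery(True, False))  # True: genuine recovery
print(looks_like_recovery(True, True))   # False: endpoint was already up
```

The first and second calls give the same answer, which is why the condition needs an extra guard for endpoints that have only one check.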

To fix this, we change the condition from checking if the count of checks is greater than one, to verifying that the checks count does not equal one. That way, we steer clear of sending recovery emails right when someone adds a fresh and healthy endpoint for the first time.
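As a rough sketch of the corrected condition, the same test can be combined with the "checks count does not equal one" guard. This is again Python used for illustration only, not the course's actual code; `should_send_recovery` and its parameter names are assumptions.

```python
from typing import Optional


def should_send_recovery(current_successful: bool,
                         previous_successful: Optional[bool],
                         checks_count: int) -> bool:
    # Hypothetical helper mirroring the fixed condition:
    # - the current check succeeded,
    # - the previous check did not (None means there was no previous check),
    # - and this is not the endpoint's only check.
    return (current_successful
            and not previous_successful
            and checks_count != 1)


# Freshly added, healthy endpoint: one check, no previous check.
print(should_send_recovery(True, None, 1))   # False

# Endpoint failed on its previous check and is now up again.
print(should_send_recovery(True, False, 2))  # True
```

The guard only suppresses the very first check, so genuine recoveries on later checks are unaffected.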

After a quick code tweak and restarting the queue, we test the workflow: add an endpoint, confirm no "recovered" email is sent at first, then simulate a failure and recovery to ensure the notifications still work correctly. Everything works as expected. In the end, a simple logic fix saves us from sending users unnecessary recovery notifications.
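The manual test above can also be approximated by replaying a history of check results and listing which checks would trigger a recovery email. This is a small simulation under stated assumptions, not the project's test suite: `recovery_emails` is a hypothetical name, and each list entry is simply `True` for a successful check and `False` for a failed one, in the order the checks ran.

```python
from typing import List


def recovery_emails(results: List[bool]) -> List[int]:
    """Return the 1-based positions of checks that would trigger a recovery email."""
    triggered = []
    for i, ok in enumerate(results):
        # None models "no previous check" for the endpoint's first check.
        previous = results[i - 1] if i > 0 else None
        count = i + 1  # checks recorded so far, including this one
        if ok and not previous and count != 1:
            triggered.append(count)
    return triggered


print(recovery_emails([True]))               # []  : fresh healthy endpoint, no email
print(recovery_emails([True, False, True]))  # [3] : went down, then recovered
```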