The Problem with Error Log Notifications
While there are several actions that can be used in Instalink to directly alert an administrator to an issue with a data flow, such as "Respond with Error", "Send to URL", or "Send Message", there are situations where receiving a direct notification on service errors may not be ideal. For example, imagine a data flow where the administrator receives an email every time a call fails to send to an external API. This may initially seem to be an intuitive way to set up error notifications. However, the problem quickly becomes evident when the external API goes down and returns an error for each request. Indeed, a user may receive thousands of emails during a service outage if a data flow is set up to send an email to an administrator on every occurrence of a logged error.
Send a notification on each logged error only when the error is expected to occur infrequently.
Aggregating error records is the preferred design pattern. Instalink uses Observers to watch error logs and notify administrators when specific criteria for an alert status are met.
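The difference in notification volume can be illustrated with a small simulation. This is only a sketch and not Instalink code: the function names, the window size, and the 5,000-call outage are assumptions chosen for illustration.

```python
def per_error_notifications(outcomes):
    """Count notifications when every failed call sends its own message."""
    return sum(1 for ok in outcomes if not ok)


def aggregated_notifications(outcomes, window_size, error_rate_threshold):
    """Count notifications when each window of calls is checked as a whole."""
    notifications = 0
    for start in range(0, len(outcomes), window_size):
        window = outcomes[start:start + window_size]
        error_rate = window.count(False) / len(window)
        if error_rate > error_rate_threshold:
            notifications += 1
    return notifications


# Hypothetical outage: 5,000 consecutive failed calls (False = failed call).
outage = [False] * 5000
print(per_error_notifications(outage))               # 5000
print(aggregated_notifications(outage, 1000, 0.8))   # 5
```

Even this naive aggregation reduces thousands of messages to a handful. Sending a single message per alert state reduces the volume further.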
How to Set Up Observers
An Observer requires two components:
- A monitoring rubric that defines the criteria for when to go to an alert state.
- A Topic record that will receive the alert notifications.
Topics contain a list of email addresses and phone numbers that will receive a notification. In Instalink, go to "Observers" and then find the "Topics" link in the submenu. Click "+ New" on the Topics page to create a new Topic. Select the account that the Topic is assigned to and then add the desired email addresses and phone numbers to the Subscriptions list. Click "Save Changes" to commit the record.
Now that you have a Topic record, you may add an Observer to any data flow. Navigate to the data flow that you'd like to observe. Click on the top level "IF" action (usually a LISTEN or a CRON).
Navigate through the corresponding action form and expand the "Observers" sub-heading. Give the observer a title that is unique and will easily identify the process when listed amongst other observers. Adjust the settings of the observer to the desired criteria. A description field will update as the settings are modified to explain the meaning of the observer criteria.
- Iteration: How often (in minutes) the observer checks the logs.
- Observe Data: Select what will be observed. You can observe either a percentage of log occurrences or a specific count of log occurrences.
- Condition: Select whether the observer triggers when the observed value is above or below the defined value.
- Value: Set the percentage, or the number, of logs that the observed value is compared against.
- Consecutive Occurrences: Set the number of times in a row that an observed value must meet the condition.
- Duration: The number of previously elapsed minutes to consider when evaluating the iterations.
- Then: You can set the alert to trigger when the criteria are met. You can also invert this and trigger an all clear status when the criteria are met.
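One possible reading of how these settings combine is sketched below. The ObserverCriteria class and the criteria_met function are hypothetical, and the way Duration limits which iterations are considered is an assumption, not Instalink's documented behavior. The observed value for each iteration (a percentage or a count) is assumed to be computed elsewhere.

```python
from dataclasses import dataclass


@dataclass
class ObserverCriteria:
    # Field names mirror the settings above; the class itself is hypothetical.
    iteration_minutes: int
    condition: str  # "above" or "below"
    value: float
    consecutive_occurrences: int
    duration_minutes: int


def meets_condition(observed, criteria):
    """Compare one observed value against the configured value."""
    if criteria.condition == "above":
        return observed > criteria.value
    return observed < criteria.value


def criteria_met(observed_values, criteria):
    """observed_values holds one observed value per iteration, oldest first."""
    # Assumption: only iterations that fall inside the duration window count.
    in_window = max(1, criteria.duration_minutes // criteria.iteration_minutes)
    recent = observed_values[-in_window:]
    needed = criteria.consecutive_occurrences
    if len(recent) < needed:
        return False
    # Assumption: the most recent `needed` iterations must all meet the condition.
    return all(meets_condition(v, criteria) for v in recent[-needed:])


# Check every 5 minutes; trigger when the error percentage is above 80
# for 2 iterations in a row within the last 10 minutes.
criteria = ObserverCriteria(5, "above", 80.0, 2, 10)
print(criteria_met([10.0, 85.0, 90.0], criteria))  # True
print(criteria_met([85.0, 90.0, 50.0], criteria))  # False
```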
Select the "Publish message on alert" checkbox to send a notification when the Observer enters the alert status. Select "Publish message on clear" to send a notification when the Observer is no longer in an alert status. Both alert and clear notifications may be enabled on a single Observer.
Select the Topic that you wish to send the Observer notifications to for both the Alert and the Clear states. Set a message title and message content. This is the actual text that will appear on either the text message or email notification. Because observers are monitoring an aggregation of service logs, it is not possible to include data from specific logs or request records in the alert message.
You may toggle the Observer on and off by selecting the "Active" checkbox. Also, multiple Observers may be added to a single data flow.
Note that the notification will only be triggered once. This means that an Observer that is continuously in an Alert state over the course of multiple iterations will only send one message. A new message can only be sent once the Observer has returned to the "All Clear" state. This prevents redundant messages from being sent to administrators.
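This behavior amounts to sending messages on state transitions only. The following is a minimal sketch of that idea, not Instalink's implementation: the function name and parameters are hypothetical, and the input is assumed to be one boolean per iteration indicating whether the alert criteria were met.

```python
def notifications_for(criteria_met_per_iteration,
                      publish_on_alert=True,
                      publish_on_clear=True):
    """Return (iteration index, message type) pairs for each message sent."""
    messages = []
    in_alert = False
    for index, met in enumerate(criteria_met_per_iteration):
        if met and not in_alert:
            # Transition into the alert state: send at most one alert message.
            in_alert = True
            if publish_on_alert:
                messages.append((index, "alert"))
        elif not met and in_alert:
            # Transition back to all clear: a later alert may send again.
            in_alert = False
            if publish_on_clear:
                messages.append((index, "clear"))
    return messages


# Three consecutive alert iterations produce a single alert message.
print(notifications_for([False, True, True, True, False, True]))
# [(1, 'alert'), (4, 'clear'), (5, 'alert')]
```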
IMPORTANT: The Observer only reads records that are logged. Error or success actions that do not create log records are excluded from the monitor's data. Make sure that the actions that you'd like to be included in the criteria are set up to create log records.
Emails or text messages are sent when an observer enters a state that triggers a notification to be sent to a Topic. Links may be added to an observer message to allow a user easy access to the data flow or other relevant information.
Observers can also be monitored within the Instalink interface. Go to the Observers -> Current Status page to see a list of all active Observers and their current state.
Clicking on any Observer in the list will present the administrator with the history of alert statuses. The timestamps of when alert states are entered and exited are clearly presented. Direct links to the project and the process are also included.
Due to the flexibility of the Observer system, Observers can be utilized in various ways. Here are some possibilities:
- Send an alert whenever a data flow has an error rate greater than 80% during a ten-minute period. This would likely be indicative of an external API or database service being down.
- Send a notification when there are more than 1000 orders successfully submitted in a one-hour period. Something like this could be used to notify an administrator that they need to allocate additional resources to handle an influx of shipments.
- Send an alert when an endpoint has no success logs over a 24-hour period. This could indicate that an external service has stopped sending inbound requests to an Instalink endpoint.
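To show how these possibilities might map onto the Observer settings, the sketch below writes each one as a plain record. This is illustrative only: Observers are configured through the Instalink interface, and the dictionary keys, titles, and the describe helper are hypothetical names, not an Instalink API.

```python
# Hypothetical records; the keys loosely mirror the Observer form fields.
EXAMPLE_OBSERVERS = [
    {
        "title": "High error rate",
        "observe": "percentage of error logs",
        "condition": "above",
        "value": 80,
        "duration_minutes": 10,
    },
    {
        "title": "Order surge",
        "observe": "count of success logs",
        "condition": "above",
        "value": 1000,
        "duration_minutes": 60,
    },
    {
        "title": "No inbound traffic",
        "observe": "count of success logs",
        "condition": "below",
        "value": 1,
        "duration_minutes": 24 * 60,
    },
]


def describe(observer):
    """Build a one-line, human-readable summary of a settings record."""
    return (
        f"{observer['title']}: alert when the {observer['observe']} is "
        f"{observer['condition']} {observer['value']} over "
        f"{observer['duration_minutes']} minutes"
    )


for observer in EXAMPLE_OBSERVERS:
    print(describe(observer))
```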