This incident has been resolved.
We will continue to monitor the environment over the weekend and will keep this incident open until Monday morning, closing it then if no further issues are detected.
Health checks have been successful, and technical teams have not observed any errors or issues in the logs since the change was deployed to production yesterday. We will continue to monitor the environment until end of day tomorrow.
The fix has been deployed to Production. We are currently monitoring the change.
The fix is ready to be deployed to the non-production environment today. We will monitor the change over the next few days before deploying it to production.
Teams are actively working on a code change, which will later be deployed to Staging for further testing. Due to the intermittent nature of the issue, the plan is to let it run for a few days in Staging to confirm that the same errors do not recur before production deployment, and to perform additional testing as needed.
Meanwhile, as a temporary workaround, we will continue to manually re-run the failed tasks to remediate any customer impact.
We are continuing to actively work on a fix for this issue.
The investigation has revealed that there have been only a few occurrences since Saturday, October 22nd, impacting a very small customer base. We have also identified the cause of the failures and are currently working on a fix. We are continuing to monitor the environment and rectify any problems requiring manual intervention.
Around 1:37 AM PDT today, integration tasks failed again with the error 'signing failed "500 Internal Server Error"'. The outage may have lasted for approximately 10 minutes. Services have been stable since, and teams are investigating the error logs.
We have not observed any issues and tasks have been running successfully since the additional resources were deployed. Technical teams will continue to monitor the tasks for another day to ensure a successful run.
We are continuing to monitor for any further issues.
We have added additional resources in the environment to avoid any memory contention issues. Technical teams are monitoring further.
There have been no recurrences of this incident since Sunday, October 23rd. Integration tasks have been running successfully without errors. However, the root cause has not yet been identified, and we are continuing to investigate it to prevent a recurrence.
Incident Description: Customers using SaaS Manager may receive intermittent errors while trying to retrieve daily Integration task data via Managed Applications. These tasks run only once a day. If a task is stopped by the error one day, it will run again at its regular cadence the following day. As a result, the availability of new information could be delayed by a day.
Technical teams have been engaged and are currently investigating.