
Outage in Fundraise Up

Database Issue

Resolved Major
April 25, 2024 - Lasted about 13 hours


Latest Updates (most recent first)
RESOLVED at 04/25/2024 08:53PM

Donations made in the hour and a half before the incident are now showing up in the Dashboard. There are no longer any delays in displaying or searching donations, or in CRM synchronization. We are closing this incident.

However, our work does not stop here. We still have to conduct a detailed analysis of what happened and develop a comprehensive set of measures to prevent such incidents in the future.

Thank you for following our updates. If you have any questions, our support team is always here to help at support@fundraiseup.com.

MONITORING at 04/25/2024 02:09PM

While our engineers work on data recovery, here is a brief explanation of what happened.

We store our clients' data across multiple database clusters. These clusters consist of dozens of physical servers networked together, located in various data centers and countries. As part of standard maintenance, we routinely replace servers and update systems and configurations. We never update all clusters simultaneously. Likewise, before any significant change, we test everything we plan to do on test stands equivalent to the production environment.
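The rule of never updating all clusters at once can be illustrated with a staged rollout that changes one cluster at a time and stops at the first failed health check. This is a minimal sketch under stated assumptions, not Fundraise Up's tooling: `staged_rollout`, `apply_change`, and `is_healthy` are hypothetical names standing in for real deployment and monitoring hooks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RolloutResult:
    """Outcome of a staged rollout: which clusters were changed and where it stopped."""
    updated: List[str] = field(default_factory=list)
    halted_at: Optional[str] = None


def staged_rollout(
    clusters: List[str],
    apply_change: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> RolloutResult:
    """Apply a change to one cluster at a time, stopping at the first unhealthy one.

    apply_change and is_healthy are hypothetical hooks standing in for real
    deployment and health-check tooling.
    """
    result = RolloutResult()
    for cluster in clusters:
        apply_change(cluster)
        if not is_healthy(cluster):
            # Stop before touching any further cluster so the damage stays contained.
            result.halted_at = cluster
            return result
        result.updated.append(cluster)
    return result
```

For example, if the second of three clusters fails its health check, the rollout stops there and the third cluster is never touched. A health check only catches problems that appear after a change is applied, which is why testing on a production-like stand beforehand still matters.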

Last night, during the routine reconfiguration of two clusters, we made a critical error in the configuration file. For several reasons, this error went undetected on the test stand. As a result, the clusters "collapsed," and data began to be deleted rapidly. Within 5 minutes, our incident response team was on a Zoom call, discussing the emergency measures we needed to take.

We have several levels of backups. We make full backups of all databases daily, as well as incremental backups every hour. Because the data volumes are measured in terabytes, most of the recovery time was spent simply transferring data across the network and rebuilding the clusters.
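How daily full backups and hourly incrementals combine during a restore can be sketched as follows: to rebuild the state as of a given moment, start from the latest full backup taken at or before it, then replay every incremental taken after that full backup, in order. The function name and the midnight/on-the-hour schedule in the example are assumptions for illustration; the report does not describe Fundraise Up's actual backup tooling.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def restore_chain(
    full_backups: List[datetime],
    incrementals: List[datetime],
    target: datetime,
) -> Tuple[datetime, List[datetime]]:
    """Pick the backups needed to restore the state as of `target` (hypothetical helper).

    Returns the latest full backup taken at or before `target`, plus every
    incremental backup taken after that full backup and at or before `target`,
    sorted in the order they must be replayed.
    """
    candidates = [b for b in full_backups if b <= target]
    if not candidates:
        raise ValueError("no full backup at or before the target time")
    base = max(candidates)
    # Incrementals older than the base are already contained in it; later ones are replayed.
    chain = sorted(i for i in incrementals if base < i <= target)
    return base, chain


# Assumed schedule: full backups at midnight, incrementals on the hour.
day = datetime(2024, 4, 25)
fulls = [day - timedelta(days=1), day]
hourly = [day + timedelta(hours=h) for h in range(1, 24)]
base, chain = restore_chain(fulls, hourly, datetime(2024, 4, 25, 6, 30))
```

Restoring to 06:30 under this schedule means the midnight full backup plus the six incrementals from 01:00 to 06:00. In this sketch, anything written after the most recent incremental is not covered by the chain.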

During the recovery process, we had to disable logins to the Dashboard and Donor Portal for all organizations so that users would not make changes that we could not later reconcile with the data restored from backups. In addition, many organizations whose data was stored in the damaged clusters could not accept donations during our recovery efforts.
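Disabling logins during recovery amounts to putting the system into a read-only mode until the restored data is in place. A minimal sketch, assuming a single in-process flag (a real deployment would need a flag shared across all servers); `MaintenanceGate` and its methods are hypothetical names, not part of Fundraise Up's system.

```python
import threading


class MaintenanceGate:
    """Hypothetical write freeze: while frozen, logins and data changes are rejected."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._frozen = False

    def freeze(self) -> None:
        """Enter read-only mode before restoring data from backups."""
        with self._lock:
            self._frozen = True

    def unfreeze(self) -> None:
        """Leave read-only mode once restored data has been reconciled."""
        with self._lock:
            self._frozen = False

    def check_write(self) -> None:
        """Raise if a login or data change is attempted while frozen."""
        with self._lock:
            if self._frozen:
                raise PermissionError("read-only: recovery in progress")
```

Every login or write path would call `check_write()` first, so no user changes slip in while data is being restored from backups.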

The system is now fully operational, but restoring the data changed during the hour and a half before the incident began will take more time. We expect a full recovery with no data loss.

Incidents of this nature are extremely rare for us; we have never faced a problem this significant in our history. Nonetheless, we thoroughly investigate every incident, identify its cause, and develop a comprehensive set of measures to prevent similar incidents in the future.

MONITORING at 04/25/2024 12:01PM

The root cause of the incident has been addressed, and all major systems are back online. All organizations now have access to the Dashboard, and both Checkout and Checkout Pages are fully operational. Donors can access the Donor Portal.

However, some recovery work remains. For some organizations, donations made during the hour and a half before the incident may temporarily not appear in the Dashboard. Some organizations may also see delays before donations appear in search, insights, and CRM synchronization.

We are working diligently to resolve these issues. Once things have settled down, we will also provide a detailed account of what happened and the measures we will take to prevent such incidents in the future.

IDENTIFIED at 04/25/2024 10:16AM

We're still working on resolving the issue.

IDENTIFIED at 04/25/2024 09:32AM

We're still working hard to resolve the issue. Unfortunately, it's going to take us a bit longer to fix.

IDENTIFIED at 04/25/2024 08:46AM

Our incident response team is in sync and working on resolving the issue. Currently, we estimate that the incident will be resolved in about 30 minutes.

IDENTIFIED at 04/25/2024 08:15AM

We have identified that two of our database clusters were damaged by a configuration error. As a result, the system is unavailable for some organizations, and they are unable to accept donations. We are working on resolving the issue.

INVESTIGATING at 04/25/2024 07:54AM

Our monitoring system has reported a database malfunction that has made the Dashboard and Donor Portal unavailable. We are currently investigating the incident and will keep you updated.
