Outage in Malomo

Platform outage

Resolved Minor
November 30, 2023 - Started 5 months ago - Lasted 12 days

Outage Details

We are currently investigating an issue. You may experience issues accessing our dashboard, viewing tracking pages and receiving events to our integrations during this time.
Components affected
Malomo Dashboard, Malomo API
Latest Updates (most recent first)
RESOLVED 5 months ago - at 12/12/2023 11:03PM

We are now considering this incident “Resolved” following Friday’s restoration of orders placed beginning Nov. 1 leading up to the outage. We are continuing to restore all historical data for orders placed prior to Nov. 1, and will share an update once this is completed.

Our dashboard and reporting features are operational, but are limited to the order data currently available in our system. Please see our previous status update for more details regarding expected behavior for orders placed beginning Nov. 1.

You may notice some older orders appear in our system before the full restoration is complete. This will happen if we receive an order update from Shopify for orders placed prior to Nov. 1. Our system will create a new order and trigger a ShipmentCreated event to be sent to integrated apps, regardless of when the order was fulfilled. If you are using the Malomo: ShipmentCreated metric to send Shipment Confirmation emails in Klaviyo, we recommend adding a new trigger split to your flows to check for the latest carrier status. If the status is “in-transit”, “out for delivery” or “delivered”, do not send the email.
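
The trigger-split rule above amounts to a simple predicate. The sketch below is illustrative only: Klaviyo trigger splits are configured in the flow builder rather than in code, and the function name and exact status strings are assumptions taken from the wording of this update.

```python
# Hypothetical helper. The status strings mirror the wording of the update
# above and may not match the exact values Malomo sends to Klaviyo.
SUPPRESS_STATUSES = {"in-transit", "out for delivery", "delivered"}


def should_send_shipment_confirmation(latest_carrier_status):
    """Return False when the shipment has already moved past 'created'."""
    if latest_carrier_status is None:
        # No carrier status yet: treat it as a newly created shipment.
        return True
    normalized = latest_carrier_status.strip().lower()
    return normalized not in SUPPRESS_STATUSES
```

With this rule, an older order that is re-created by a late Shopify update and is already "delivered" would be filtered out of the Shipment Confirmation flow, while a genuinely new shipment would still receive the email.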

We are working on an incident report and root cause analysis in partnership with a third-party consultant. This report will detail the corrective and preventive measures we have either already implemented or plan to implement. We will share this report when it is completed.

MONITORING 5 months ago - at 12/08/2023 02:11PM

We have finished importing data for all orders placed beginning Nov. 1 leading up to the outage.

During import, we identified a scenario affecting a small number of orders where a shipment was not attached to the order. At the moment, we do not plan to re-register shipments for these orders because doing so might trigger unwanted events and messages in connected apps which may confuse those affected customers. We will investigate potential workarounds.

For the majority of orders, excluding those missing shipments, tracking pages will update as new shipment events are received by our system. New order and shipment events received by our system after the time of import will be processed and sent to integrated apps as expected. Any order or shipment events received by our system prior to the time of import will not be sent to integrated apps.
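
The forwarding rule described above can be read as a cutoff on the time an event was received. A minimal sketch, assuming each event carries a `received_at` timestamp; the field name, function name, and cutoff value are hypothetical and not part of Malomo's API. The update does not say how an event received exactly at the time of import is treated, so the sketch excludes it.

```python
from datetime import datetime, timezone


def events_to_forward(events, import_completed_at):
    """Keep only events received strictly after the import completed."""
    return [e for e in events if e["received_at"] > import_completed_at]


# Hypothetical cutoff and events, for illustration only.
cutoff = datetime(2023, 12, 8, 12, 0, tzinfo=timezone.utc)
sample = [
    {"id": "before", "received_at": datetime(2023, 12, 8, 11, 0, tzinfo=timezone.utc)},
    {"id": "after", "received_at": datetime(2023, 12, 8, 13, 0, tzinfo=timezone.utc)},
]
forwarded_ids = [e["id"] for e in events_to_forward(sample, cutoff)]
```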

Orders placed during the outage up until last Friday’s partial database restoration have been fully restored and will continue to receive updates from Shopify and send events to integrated apps. Orders placed after the partial database restoration on Friday, 12/1 will continue to work as expected.

As you review the Malomo dashboard, you may notice that some November orders are duplicated on the Orders Beta page. One record is the newly imported order, which includes all data as expected on the Order Details page. The other record is leftover from the outage and will display an “Order Not Found” error when clicked. This duplicate record has no impact on your Malomo data or integrated apps. Our team is currently working to remove all duplicate records from the Orders Beta page. There will be no downtime from this process.

Now that priority data has been restored to accounts, our team will begin importing historical data prior to November 1st.

For more information on a smooth transition back to Malomo-powered notifications, please view our Merchant Action Plan here: https://drive.google.com/file/d/1djmD0ztpP9rDlJOSmkVJPO-pSkU_WGKD/view

MONITORING 5 months ago - at 12/07/2023 03:28PM

We are continuing to import data into our platform for all orders placed in November leading up to the outage. At this point, 78% of November orders have been imported.

Once this first import is completed, tracking pages will update with the most recent shipment status provided by the carrier. New events received after the time of import will be processed and sent to Klaviyo as expected. Shipping update events received prior to the time of the import will not be sent to integrated apps.

As a reminder, you can access our Merchant Action Plan here: https://drive.google.com/file/d/1djmD0ztpP9rDlJOSmkVJPO-pSkU_WGKD/view

MONITORING 5 months ago - at 12/07/2023 01:38AM

We have released an update to resolve an issue with our Postscript integration that resulted in duplicate events being sent. The Postscript integration has been re-enabled, and the majority of duplicate events were removed from our system. Events received by our system while the integration was paused have been processed and sent, although a small number of customers may notice duplicate events sent from this period. New events received by our system after the integration was re-enabled are now being processed and sent as expected. The Postscript integration was paused between approx. 4:10 pm - 8:11 pm EST.
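
The update does not describe how duplicate events were removed. One common approach is to keep only the first event for each identifying key, sketched below; the field names `shipment_id`, `event_type`, and `occurred_at` are assumptions chosen for illustration.

```python
def drop_duplicates(events):
    """Keep the first occurrence of each (shipment, type, time) combination."""
    seen = set()
    unique = []
    for event in events:
        key = (event["shipment_id"], event["event_type"], event["occurred_at"])
        if key in seen:
            continue  # an identical event was already queued for sending
        seen.add(key)
        unique.append(event)
    return unique
```

Deduplicating on a stable key before sending means a retried or replayed event does not reach the integration twice, at the cost of tracking which keys have been seen.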

MONITORING 5 months ago - at 12/07/2023 12:41AM

We have released an update to resolve an issue with our Attentive integration that resulted in duplicate events being sent. The Attentive integration has been re-enabled, and duplicate events were removed from our system. Events received by our system while the integration was paused have been processed and sent. New events received by our system after the integration was re-enabled are now being processed and sent as expected. The Attentive integration was paused between approx. 4:10 pm - 7:01 pm EST.

We are continuing to work on resolving the same issue for the Postscript integration.

MONITORING 5 months ago - at 12/06/2023 10:10PM

We are now importing data into our platform for all orders placed in November leading up to the outage. Once this first import is completed, tracking pages will update with the most recent shipment status provided by the carrier. New events received after the time of import will be processed and sent to Klaviyo as expected. Shipping update events received prior to the time of the import will not be sent to integrated apps.

Note for Klaviyo customers:
If you are using our event metric “Malomo: ShipmentCreated” to trigger shipping confirmation emails, we recommend adding a flow filter to your Klaviyo flows to check whether an order has been delivered, and if so, filter customers out from the flow. As we import orders, this will prevent a unique situation we’ve identified that might trigger an email to customers if the order was placed prior to the outage yet fulfilled in the past week.

Note for Attentive and Postscript customers:
We are also working to resolve an issue with our Attentive and Postscript integrations resulting in duplicate events being sent. We have temporarily paused all outgoing events to these integrations at approximately 4:10 pm EST in order to troubleshoot the issue.

We are working on an action plan that you can use to transition your post-purchase experience back to Malomo. We will share this plan shortly.

MONITORING 5 months ago - at 12/06/2023 03:24PM

Our database has been fully restored from backup. We are now preparing to import orders from the backup, prioritizing those placed beginning Nov. 1 up until the outage occurred.

MONITORING 5 months ago - at 12/06/2023 02:46AM

Our team is continuing to monitor the restoration process. We do not have any new information to share at this time.

Please see our previous updates for more details on our recovery strategy.

MONITORING 5 months ago - at 12/05/2023 10:30PM

Our team is finishing a full database restoration and once complete, will begin importing orders, prioritizing the last 2 weeks of data prior to the outage. Once those orders have been imported, tracking pages will update with the most recent shipment status provided by the carrier. New events received after the time of import will be processed and sent to integrated apps as expected. Any events received prior to the time of import will not be sent to integrated apps. Once we complete the initial import of the most recent orders, we will begin to import all prior historical data.

MONITORING 5 months ago - at 12/05/2023 06:56PM

Our team is finishing a full database restoration and once complete, will begin importing orders, prioritizing the last 2 weeks of data prior to the outage. Once those orders have been imported, tracking pages will update with the most recent shipment status provided by the carrier. New events received after the time of import will be processed and sent to integrated apps as expected. Any events received prior to the time of import will not be sent to integrated apps. Once we complete the initial import of the most recent orders, we will begin to import all prior historical data.

MONITORING 5 months ago - at 12/05/2023 03:45PM

Our team is finishing a full database restoration and once complete, will begin importing orders, prioritizing the last 2 weeks of data prior to the outage. Once those orders have been imported into our platform, tracking pages will be restored for those orders and we should start to receive events for new carrier updates.

MONITORING 5 months ago - at 12/04/2023 05:12PM

Our engineering team is continuing to restore missing data for orders placed prior to the outage beginning on Nov. 30 at approximately 3:00 am EST. At this time, our platform does not yet have order and shipment data for orders placed prior to the outage. Tracking pages for fulfilled orders prior to the outage have not yet been restored and we are not sending carrier events for those shipments.

Our team is finishing a full database restoration and once complete, will prioritize importing the last 2 weeks of data prior to the outage. Our team is targeting the import to begin today, Monday 12/4. Once those events have been imported into our platform, tracking pages will be restored for those orders and we should start to receive events for new carrier updates.

Orders placed during the outage up until Friday’s partial database restoration have been fully restored and will continue to receive updates from Shopify and send events to integrated apps. Orders placed after the partial database restoration on Friday, 12/1 will continue to work as expected. Please see our previous status updates for more details.

MONITORING 5 months ago - at 12/02/2023 02:44AM

Incident status has been updated to Monitoring while we continue to restore our database.

IDENTIFIED 5 months ago - at 12/02/2023 02:35AM

Incident severity has been downgraded to a Partial Outage.

IDENTIFIED 5 months ago - at 12/02/2023 02:21AM

Our engineering team is actively working to restore missing data for orders placed prior to the outage beginning on Nov. 30 at approximately 3:00 am EST.

Orders placed during the outage up until today’s partial database restoration have been imported into the database, and should continue to receive updates from Shopify and send events to integrated apps. Orders placed after today’s partial database restoration should continue to work as expected. Please see our previous status update for more details.

We do not yet have a clear timeline for the full restoration of our database but we have increasing confidence that our restoration process is working. We will continue to post updates here throughout the weekend and until all issues are fully resolved.

IDENTIFIED 5 months ago - at 12/01/2023 10:51PM

Our database has been partially restored from backup. We are actively working to restore missing data for any orders placed prior to the outage beginning on Nov. 30 at approximately 3:00 am EST.

The Malomo dashboard is accessible with limited data until the full database restore is complete. Orders placed during the outage have been imported and can be viewed in the dashboard via the Orders page, but not the Orders Beta page. Please note that the “Order Placed” timestamps displayed in our dashboard correspond to the time of import rather than the time the order was placed. Corresponding events sent to Klaviyo include the correct timestamp.

Tracking pages for orders placed during the outage are beginning to show shipment updates as expected. Tracking pages for orders placed prior to the outage will continue to experience issues until the full database restore is complete.

Our Events Processor is now operational, and you will begin to see events flowing back into your integrated apps. Events will be processed and sent in the order in which they were received, so new events triggered after restoration will continue to experience some delays until the system is fully caught up.
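
The ordering behavior described above is ordinary first-in, first-out processing: newer events wait behind the backlog. A minimal sketch with hypothetical names; it does not reflect how the Events Processor is actually implemented.

```python
from collections import deque


def drain_in_arrival_order(backlog, new_events):
    """Return events in the order they would be sent under FIFO processing."""
    queue = deque(backlog)      # events that accumulated during the outage
    queue.extend(new_events)    # events triggered after restoration join the back
    sent = []
    while queue:
        sent.append(queue.popleft())  # the oldest event is always sent first
    return sent
```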

If you placed your Klaviyo flows in manual mode, you can begin working through your queue in the Needs Review tab to manually send messages. If you did not make any changes to your Klaviyo flows, they will begin triggering as events are received by Klaviyo.

For more information on manually sending flow messages in Klaviyo, as well as turning messages from manual to live, please visit the Klaviyo Help Center (https://help.klaviyo.com/hc/en-us/articles/115002779331).

We will continue to post updates here throughout the day and until all issues are fully resolved.

We appreciate everyone’s patience, understanding, and kind words as we continue to push to full resolution.

IDENTIFIED 5 months ago - at 12/01/2023 08:15PM

Our engineering team continues to work on our approach to restoring our systems, and we feel very confident that our current approach is working. At this time, however, we do not yet have a clear ETA for resolution. We will continue to post updates here throughout the day and until all issues are fully resolved.

IDENTIFIED 5 months ago - at 12/01/2023 05:26PM

Our engineering team continues to work on our approach to restoring our systems, and we feel very confident that our current approach is working. At this time, however, we do not yet have a clear ETA for resolution. We will continue to post updates here throughout the day and until all issues are fully resolved.

IDENTIFIED 5 months ago - at 12/01/2023 02:47PM

Our engineering team continues to work on our approach to restoring our systems, and we feel very confident that our current approach is working. At this time, however, we do not yet have a clear ETA for resolution. We will continue to post updates here throughout the day and until all issues are fully resolved.

IDENTIFIED 5 months ago - at 12/01/2023 10:04AM

Our engineering team continues to work around the clock on our approach to restoring our systems, and we feel very confident that our current approach is working. At this time, however, we do not yet have a clear ETA for resolution. We will continue to post updates here throughout the day and until all issues are fully resolved.

IDENTIFIED 5 months ago - at 12/01/2023 02:23AM

Our engineering team continues to work diligently on restoring the database. Until it is restored, you will experience issues accessing our dashboard, viewing tracking pages and receiving events to our integrations. The size of our database makes this a time-intensive process since we are transferring huge files and loading them into a fresh database. Once the database is restored, we’ll begin bringing the Malomo application back up. While the engineering team has made material progress on the issue, we do not yet have an ETA.

We will share an action plan for merchants to follow once our application is online.

IDENTIFIED 5 months ago - at 11/30/2023 11:39PM

We are continuing to work on restoring the database, but do not yet have an ETA.

Important note for Klaviyo customers who have switched email notifications back to Shopify:

We recommend keeping your Klaviyo flows in MANUAL mode. This will prevent emails from automatically sending once our platform is restored. This will also allow you to bulk send any missed notification emails as our system catches up and sends events from the past 15 hours to Klaviyo. Important: Before turning your flows to Manual mode, please make sure to delete any old notifications that collected in the “Needs Review” section prior to today.

For more information on manually sending flow messages in Klaviyo, please visit the Klaviyo Help Center (https://help.klaviyo.com/hc/en-us/articles/115002779331).

IDENTIFIED 5 months ago - at 11/30/2023 09:23PM

We are actively working on restoring the database, but do not have an ETA at this time. We are working on multiple approaches to speed up the restore.

In the meantime, we recommend that merchants temporarily switch back to Shopify emails for their order and shipping notifications. Please see our Knowledge Base article (https://help.gomalomo.com/csc/how-to-restore-shopify-email-notifications) for full instructions.

If you need help with this, our support team is here to assist you in implementing this workaround at help@gomalomo.com.

IDENTIFIED 5 months ago - at 11/30/2023 08:05PM

We truly apologize for the inconvenience and frustration this issue has caused to you and your customers. We want to provide you with the most recent information we have, what we’re doing to address the outage, and what you can do in the meantime.

This update will be fairly technical in nature. In the spirit of transparency, we plan to over-communicate rather than under-communicate at this time. But first, here’s what you can do right now:

Workaround
----------------
For immediate relief, we recommend that merchants temporarily switch back to Shopify emails for their order and shipping notifications. Please see our Knowledge Base article (https://help.gomalomo.com/csc/how-to-restore-shopify-email-notifications) for full instructions.

If you need help with this, our support team is here to assist you in implementing this workaround at help@gomalomo.com.

What To Expect
-----------------
We will continue to update you on this page every 1-3 hours until all issues are resolved.

Full Incident Details
---------------------
There are a few issues occurring, and we are actively and urgently working on addressing them.

- The first issue, related to notifications not being delivered, has been occurring intermittently since Nov. 28, and our team has been working on resolution around the clock.
  - Impact:
    - All types of outgoing events from Malomo to integrated apps are affected.
    - Some events are delayed, but are still being sent to integrated apps.
    - Some events may have been lost, and are not being sent to integrated apps.
    - We previously thought this only affected order confirmation events, but have confirmed that all events are affected.
  - We have been investigating the issue since Nov. 28 and have implemented some manual fixes.
  - While investigating the issue, we discovered that the application that processes all events in Malomo (the “Events Processor”) experienced a memory leak, which caused the Events Processor to crash.
    - Impact:
      - When the Events Processor crashes, we lose events in the processing queue.
      - The Events Processor has experienced intermittent crashes since Nov. 28.
      - The majority of events were processed, but some events were lost when crashes occurred.
  - We profiled the system and isolated the memory leak to the process that handles outgoing webhooks to our integrations.
  - In order to address the webhook issue, we increased processing capacity, split processing loads and optimized memory utilization of the underlying code.
  - At that point things appeared to stabilize.
  - Simultaneously, we started working on a long-term fix that will pull all event queues out of memory and into a more resilient storage system that can survive system crashes.

- The second issue is caused by a database crash occurring at approximately 3:00 am EST on Thursday, Nov. 30.
  - Impacts:
    - Malomo platform outage
    - No events/notifications are being sent to integrated apps
    - Tracking pages are not working
  - We are still investigating the cause of the crash, but are currently restoring the database from backup and awaiting that restore to complete. This is a time-consuming process, and unfortunately the restore script does not provide any ETA for the restore.
  - Once the restore is complete, we anticipate being able to restore full functionality to the platform while we continue investigating the fix for the first issue.
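
The long-term fix mentioned under the first issue, moving event queues out of memory and into storage that survives a crash, can be illustrated with a small disk-backed queue. This is a sketch under stated assumptions: the update does not name the storage system Malomo plans to use, SQLite is chosen here only to keep the example self-contained, and the class and method names are hypothetical.

```python
import json
import sqlite3


class DurableQueue:
    """FIFO queue backed by SQLite; an event stays on disk until acknowledged."""

    def __init__(self, path):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS events ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT NOT NULL)"
        )
        self._db.commit()

    def put(self, event):
        # The connection context manager commits on success, rolls back on error.
        with self._db:
            self._db.execute(
                "INSERT INTO events (payload) VALUES (?)", (json.dumps(event),)
            )

    def peek(self):
        """Return (id, event) for the oldest unacknowledged event, or None."""
        row = self._db.execute(
            "SELECT id, payload FROM events ORDER BY id LIMIT 1"
        ).fetchone()
        return None if row is None else (row[0], json.loads(row[1]))

    def ack(self, event_id):
        # Delete only after the event has been delivered successfully.
        with self._db:
            self._db.execute("DELETE FROM events WHERE id = ?", (event_id,))
```

Because an event is deleted only after `ack`, a process that crashes between `peek` and `ack` finds the same event at the head of the queue when it restarts. The trade-off is that an event may be delivered more than once, so the receiving side has to tolerate duplicates.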

Please rest assured that we are taking this, and every outage, very seriously and are investing all of our resources to get to a full and speedy resolution.

IDENTIFIED 5 months ago - at 11/30/2023 04:07PM

The issue has been identified and we are currently in the process of restoring the platform.

INVESTIGATING 5 months ago - at 11/30/2023 03:03PM

We are currently investigating an issue. You may experience issues accessing our dashboard, viewing tracking pages and receiving events to our integrations during this time.
