Issue Summary
Some customers experienced major delays in the processing of CSV files for Available Load integrations.
This only affected customers on a CSV load integration, and not all customers using the integration were affected. Customers sending over larger files were more likely to be affected.
Timeline
We first detected a slowdown in CSV file processing with one of our customers on 1/13/2023. Over the weekend the issue got worse, and the majority of Available Load CSVs were not processing on Monday, 1/16/2023.
We resolved this issue on the night of 1/19/2023 with a hotfix deployment.
Root Cause
We discovered that the root cause of the issue was a bug fix deployed on the night of 1/12/2023. This bug fix improved the consistency and timing of loadboard postings after a load was made re-available over our CSV load integration.
However, we failed to recognize that the code change increased memory usage. The increase caused our application to exceed the memory allocated to our provisioned computing resources. Out-of-memory errors were more common for customers with larger files. As a result, files were partially processed before being interrupted by memory constraints, and customers saw a delay in load updates coming into Parade.
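To illustrate why larger files are more exposed to this kind of failure, here is a minimal sketch of processing a CSV in fixed-size batches so that memory use depends on the batch size and not on the file size. It is not Parade's actual fix, which is not described here; the function names, the `load_id` and `status` columns, and the batch size are all hypothetical.

```python
import csv
import io
from itertools import islice
from typing import Dict, Iterator, List, TextIO


def iter_batches(handle: TextIO, batch_size: int = 1000) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `batch_size` rows without reading the whole file."""
    reader = csv.DictReader(handle)
    while True:
        # islice pulls only the next `batch_size` rows from the reader.
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def process_csv(handle: TextIO, batch_size: int = 1000) -> int:
    """Process a CSV batch by batch and return the number of rows handled."""
    processed = 0
    for batch in iter_batches(handle, batch_size):
        # Hypothetical step: apply the load updates for this batch here.
        processed += len(batch)
    return processed


if __name__ == "__main__":
    # Illustrative data only; the column names are assumptions.
    demo = io.StringIO("load_id,status\n1,available\n2,covered\n3,available\n")
    print(process_csv(demo, batch_size=2))
```

With this shape, only one batch of rows is held in memory at a time, so a file ten times larger takes longer to process but should not need ten times the memory.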
Resolution and Recovery
On 1/13/2023, only one customer was affected and a support ticket was raised to our team. When more customers were affected on the morning of 1/16/2023, the ticket was immediately re-prioritized to P0. Some optimizations were deployed on the night of 1/16/2023, but they did not consistently solve the problem.
From 1/17/2023 to 1/18/2023, our team continued to monitor processing times and noticed that larger CSV files were still seeing major delays in processing. Some small optimizations were implemented that benefited a few customers, but not all.
The root cause was identified on 1/19/2023, and a hotfix was tested that day and deployed that night. This resulted in all customer data being updated successfully. Since CSV files are snapshots of customer load data, no load data was lost.
Corrective and Preventative Measures
We are working on better preventative measures and monitoring for resource-constraint issues. This includes re-evaluating the CPU and memory thresholds for our integration pipeline. We have also increased the overall memory allocation for crucial parts of our platform.
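As a rough illustration of what a memory-threshold check could look like, the sketch below runs a task under Python's standard `tracemalloc` module and reports whether its peak allocation stayed within a budget. This is an assumption-laden example, not a description of Parade's monitoring: the function name, the 80% warning ratio, and the budget values are hypothetical, and `tracemalloc` only tracks allocations made through Python's allocator, not the full process footprint.

```python
import tracemalloc
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_memory_budget(
    task: Callable[[], T],
    budget_bytes: int,
    warn_ratio: float = 0.8,
) -> Tuple[T, int, str]:
    """Run `task`, then classify its peak traced memory against a budget.

    Returns (result, peak_bytes, status), where status is "ok",
    "warning", or "over_budget". Thresholds here are hypothetical.
    """
    tracemalloc.start()
    try:
        result = task()
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()

    if peak >= budget_bytes:
        status = "over_budget"
    elif peak >= warn_ratio * budget_bytes:
        status = "warning"
    else:
        status = "ok"
    return result, peak, status


if __name__ == "__main__":
    # Allocate roughly 5 MB against a 1 MB budget to trigger the alert path.
    _, peak, status = run_with_memory_budget(
        lambda: len(bytearray(5_000_000)), budget_bytes=1_000_000
    )
    print(peak, status)
```

Running a check like this against representative large files before a deployment is one way a change that raises memory usage might be caught earlier; in production, container or host-level metrics would be the more reliable signal.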