Monitoring - We have identified the root cause of the intermittent 502 errors affecting POST requests in both the Test and Production environments. The issue was traced to a keepalive timeout mismatch between the Google Cloud Load Balancer and the NGINX controller. Because the keepalive window on the gateway expired before the load balancer's, the gateway sent a TCP FIN (finish) signal to the load balancer on connections that the load balancer still treated as active.

This created a race condition in which a connection could be closed while a request was still being sent over it, leading to 502 errors.

To address this, we have increased the NGINX keepalive time to exceed the configured timeout interval of the load balancer. This adjustment ensures that the keepalive window in NGINX remains open longer than the load balancer's timeout, preventing premature connection terminations.
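The rule described above (the backend keepalive timeout must exceed the load balancer's timeout) can be sketched as a small configuration check. This is only an illustration: the function name, the optional safety margin, and the 600 and 620 second values in the usage note are hypothetical and are not the settings used on the platform.

```python
def keepalive_is_safe(
    backend_keepalive_s: float,
    lb_timeout_s: float,
    margin_s: float = 0.0,
) -> bool:
    """Return True when the backend keeps idle connections open strictly
    longer than the load balancer does (plus an optional safety margin).

    If the backend's window is shorter, it may close an idle connection
    just as the load balancer reuses it, which is the race described above.
    """
    return backend_keepalive_s > lb_timeout_s + margin_s
```

For example, with illustrative values, `keepalive_is_safe(620, 600)` is true, while `keepalive_is_safe(60, 600)` is false because the backend would give up on idle connections long before the load balancer does.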

We continue to monitor the system for stability and will provide further updates as needed. Thank you for your patience and understanding as we work to resolve this issue.

We are pleased to report that in the past hour we have observed a significant reduction in the number of 502 errors compared to our usual baseline. While this is an encouraging sign that the fix is effective, we will continue to monitor the system closely to ensure sustained improvement and stability.

Nov 10, 2024 - 15:38 GMT-03:00
Identified - We have identified an underlying structural issue affecting our platform running on Google infrastructure (SaaS BR and US). This issue causes some POST requests originating from the internet to sporadically return 502 errors in both the Test and Production environments.

While the issue affects a very low percentage of total requests, customers with high traffic volumes may experience this error more frequently. It is important to note that POST requests are the ones primarily impacted. Despite this, our platform’s availability remains above 99.95%, which is higher than the contracted SLA.

Current Status: We are actively working on a permanent resolution. In the meantime, we recommend customers experiencing higher impact implement a fast retry mechanism, as subsequent retries will successfully process the request.
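A fast retry of this kind could look roughly like the sketch below. It is a minimal illustration, not an official client: the function name, the attempt count, and the delay are hypothetical, and `send` stands for whatever callable your application uses to issue the POST and return a `(status, body)` pair. It retries only on 502 and assumes the target endpoint tolerates a repeated POST.

```python
import time
from typing import Callable, Tuple

Response = Tuple[int, bytes]  # (HTTP status code, response body)


def post_with_fast_retry(
    send: Callable[[], Response],
    max_attempts: int = 3,          # hypothetical default
    delay_seconds: float = 0.05,    # hypothetical short pause between tries
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() until it returns a non-502 status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    status, body = send()
    for _ in range(max_attempts - 1):
        if status != 502:
            break  # success or an unrelated error: do not retry
        sleep(delay_seconds)
        status, body = send()
    return status, body
```

The last response is returned even if every attempt fails with 502, so the caller can still log or surface the error.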

Please note that not all 502 errors are related to this specific issue.

Nov 08, 2024 - 15:08 GMT-03:00

About This Site

Welcome to the Digibee Platform Status Page

Here you can check the platform's current status and historical data on past incidents. We keep this page updated with real-time information collected from our systems, so you can check regularly or sign up for SMS or email updates.

Uptime over the past 90 days:
SaaS BR: Operational, 99.96% uptime
BR - Portal: Operational, 100.0% uptime
BR - Test Environment: Operational, 99.95% uptime
BR - Prod Environment: Operational, 99.94% uptime
BR - Core APIs: Operational, 99.96% uptime
SaaS US: Operational, 99.99% uptime
US - Portal: Operational, 100.0% uptime
US - Core APIs: Operational, 100.0% uptime
US - Prod Environment: Operational, 99.99% uptime
US - Test Environment: Operational, 99.99% uptime
Past Incidents
Nov 21, 2024

No incidents reported today.

Nov 20, 2024

No incidents reported.

Nov 19, 2024
Completed - The scheduled maintenance has been completed.
Nov 19, 09:00 GMT-03:00
In progress - Scheduled maintenance is currently in progress. We will provide updates as necessary.
Nov 19, 08:00 GMT-03:00
Scheduled - We will rotate the SSL certificates for our Core APIs, Portal, and Test/Prod environments. No disruption is expected during this maintenance.
Nov 11, 23:38 GMT-03:00
Nov 18, 2024

No incidents reported.

Nov 17, 2024

No incidents reported.

Nov 16, 2024

No incidents reported.

Nov 15, 2024

No incidents reported.

Nov 14, 2024

No incidents reported.

Nov 13, 2024

No incidents reported.

Nov 12, 2024

No incidents reported.

Nov 11, 2024

No incidents reported.

Nov 10, 2024

Unresolved incident: Intermittent 502 Errors for POST Requests.

Nov 9, 2024

No incidents reported.

Nov 8, 2024
Resolved - This incident has been resolved.
Nov 8, 15:06 GMT-03:00
Update - A fix has been implemented and the error rate has decreased.
Nov 7, 13:29 GMT-03:00
Monitoring - Mitigation is in place and the 502 error rate is decreasing.
Nov 7, 12:31 GMT-03:00
Update - Requests with large payloads and high-latency pipelines are more affected.
Nov 7, 12:16 GMT-03:00
Investigating - We are currently investigating the issue.

The external HTTP load balancer is showing a 0.5% error rate at peak.

Nov 7, 12:15 GMT-03:00
Nov 7, 2024