On Thursday afternoon, October 28th, users began having trouble connecting to our platform. This immediately became our highest priority. Teams began working around the clock to identify the source of the problem and get things back to normal.
This outage was especially difficult because it involved a combination of several factors. A core system in our infrastructure became overwhelmed, prompted by a subtle bug in our backend service communications while under heavy load. This was not due to any peak in external traffic or any particular experience. Rather, the failure was caused by the growth in the number of servers in our datacenters. The result was that most services at Roblox were unable to effectively communicate and deploy.
Because the actual bug was difficult to diagnose, recovery took longer than any of us would have liked. Once we identified the root cause, we were able to resolve the issue through performance tuning, re-configuration, and scaling back of some load. We were able to fully restore service as of this afternoon.
We will publish a post-mortem with more details once we’ve completed our analysis, along with the actions we’ll be taking to avoid such issues in the future. In addition, we will implement a policy to make our creator community economically whole following this outage. More details on this are to come. In keeping with our “Respect the Community” value, we will continue to be transparent in our post-mortem.
To the best of our knowledge, there has been no loss of player persistence data, and your Roblox experience should now be fully back to normal. You can always contact our support team if you experience any hiccups using Roblox now or in the future.
We are grateful for the patience and support of our players, developers, and partners during this time.