Outage: Julia package server offline [resolved]

Status: Resolved

Description

Around 2023-03-06T08:16:00Z the DNS resolver for the package servers went offline.

Mitigation

As a temporary measure we are resolving the DNS names for the pkgservers directly. This minimizes impact while we are working on resolving the core issue.

Outdated

Impacted users can set the environment variable JULIA_PKG_SERVER="" to bypass the pkgservers.
Please remember to unset the environment variable after the outage has been resolved.
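
For reference, a minimal sketch of that workaround in a POSIX shell, assuming Julia is started from the same session. The placeholder comment marks where Pkg operations would run; it is an illustration, not part of the original report.

```shell
# Bypass the pkgservers for this shell session only.
# An empty value tells Pkg not to use a package server.
export JULIA_PKG_SERVER=""

# (Start Julia and run Pkg operations from this session here.)

# After the outage has been resolved, remove the override so that
# Pkg goes back to its default package server.
unset JULIA_PKG_SERVER
```

If the variable was added to a shell profile instead of being set for one session, that line would need to be removed from the profile as well.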

Resolution

At 2023-03-06T15:40:00Z the load-balancing DNS server resumed functioning and full service was restored.

Next steps

This outage highlighted a single point of failure in our pkgserver infrastructure. We will add fallback servers to avoid future outages.

Events

  • 2023-03-06T08:16:00Z Outage reported on Slack.
  • 2023-03-06T09:51:00Z Investigation started.
  • 2023-03-06T10:34:00Z First report update.
  • 2023-03-06T10:51:00Z First mitigation attempt for us-east.
  • 2023-03-06T10:56:00Z Outage mitigated.
  • 2023-03-06T15:40:00Z Outage resolved.

Conclusion

Thank you to all the community members who quickly reported this issue! Special thanks to @fredrikekre, Pradeep, and @aviks, who collaborated to mitigate it quickly. Thanks to @staticfloat for the final resolution of the issue and his continued efforts in maintaining the JuliaLang infrastructure.

The JuliaLang open-source infrastructure is maintained by volunteers, and it was great to see folks come together over this.
