Hey everyone – the overwhelming recent feedback we’ve gotten is that reliability and stability are the #1 thing Gather needs to work on right now, so in the interest of solving that really well, we’re trying to focus single-mindedly on it for the foreseeable future. This means pausing API office hours and being much less responsive on the forum for a while. We will still make just as much of an effort to avoid breaking changes, and will still respond urgently to any fires or bugs in production, so please do continue to report those (use the usual Partners channel if it’s urgent; we’ll respond faster there than here).
Thanks for the information. Sad to hear about the change.
What specific metrics are being tracked in terms of reliability and stability? The Status page suggests uptime above 99%, so I am interested in what needs improvement, and what types of issues we (the dev community) should be on the lookout for.
Great question – this is about the longer tail of issues now, for example people being unable to connect at all, or loading in extremely slowly (>30s). Also stuff like deploys not being totally smooth, a long tail of non-fatal bugs like people’s avatars sometimes not being set, and extensions not working properly. Plus a lot of going back and simplifying things and cleaning up tech debt along the way.
Some concrete metrics, just to fully answer the question:
p90 and p95 of the time it takes to connect and sync initially
fraction of people who get disconnected for >10s, especially during deploys
number of logs at “error” level, each of which represents a bug or unexpected behavior
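
For anyone who wants to track something similar on their own side, here is a minimal Python sketch of how those three metrics could be computed from raw samples. This is not Gather's actual tooling: the function names, the nearest-rank percentile method, the input shapes, and the sample numbers are all assumptions made up for illustration.

```python
import math
from collections import Counter


def percentile(values, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def fraction_disconnected(longest_disconnect_s, threshold_s=10.0):
    """Fraction of sessions whose longest disconnect exceeded threshold_s."""
    if not longest_disconnect_s:
        return 0.0
    over = sum(1 for d in longest_disconnect_s if d > threshold_s)
    return over / len(longest_disconnect_s)


def count_error_logs(levels):
    """Number of log records at "error" level (case-insensitive)."""
    return Counter(level.lower() for level in levels)["error"]


# Hypothetical sample data, invented for this example.
connect_times_s = [1.2, 0.9, 2.5, 1.1, 35.0, 1.4, 3.0, 1.0, 0.8, 1.3]
longest_disconnect_s = [0.0, 0.0, 12.5, 0.0, 3.0, 0.0, 0.0, 45.0, 0.0, 0.0]
log_levels = ["info", "error", "warn", "info", "ERROR", "debug"]

p90 = percentile(connect_times_s, 90)
p95 = percentile(connect_times_s, 95)
disconnected = fraction_disconnected(longest_disconnect_s)
errors = count_error_logs(log_levels)

print(f"p90 connect: {p90}s, p95 connect: {p95}s")
print(f"disconnected >10s: {disconnected:.0%}, error logs: {errors}")
```

Note how the single 35s connection barely moves the p90 but dominates the p95 in this made-up data, which is why tail percentiles are more useful than an average (or an uptime number) for catching "loads extremely slowly" cases.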