We're Back! Extended Downtime Postmortem and Future Plans

apetresc
Lives with ko
Posts: 256
Joined: Wed Apr 21, 2010 3:42 pm
Rank: AGA 1k
GD Posts: 1190
KGS: apetresc
IGS: apetresc
OGS: apetresc
Universal go server handle: apetresc
Location: Waterloo, Ontario (Canada)
Has thanked: 110 times
Been thanked: 146 times

Post by apetresc »

Hey everyone! As you've all certainly noticed by now, Life in 19x19 was down for about 11 days. The good news is that we're now back and (hopefully) better than ever!


What Happened

Without going into too much technical detail: the database server hosted by GoDaddy hit some sort of quota. Jordus (the admin who owned the actual hosting account) was having trouble getting timely action from GoDaddy, so not much progress was being made. After about a week, Brian Kirby and I offered to move everything to my AWS account, where we would have complete access to both the hardware and software, instead of being at the mercy of GoDaddy's tech support. At first we were scraping together old backups from Kirby's hard drive and what was left of the FTP server (which were over a year old! :( ), but eventually Jordus saved the day and gave us a raw dump of the database from just prior to the outage. After some scripting, I had the full site up on an EC2 instance (for phpBB/Apache) and an RDS instance (for MySQL).

The Future
Obviously the length of this downtime re-emphasized the need to reduce our bus factor. There were a couple of us with admin access to the board, but only two who could directly connect to the database, only one who controlled the domain, and none of us who had root on the physical hardware everything ran on (thanks GoDaddy). Going forward, we're going to:
  • Post the phpBB source, with all our theming and modifications, to a public GitHub organization, so anyone can clone it and help develop plugins and themes.
  • Schedule automated daily backups of the DB. (This is actually already done as of today.)
  • Make a sanitized (i.e., PMs and password hashes removed, etc.) copy of this daily backup publicly available, as Linus Torvalds himself recommends ;)
  • Get a few more technically-savvy admins access to the AWS account this is running on. More on this soon.
  • Upgrade phpBB. We're running on a downright ancient version (3.0.8, from November 2010!!). Once everything's calmed down, I've verified the backup procedures work, and the versioning to GitHub is complete, I'm going to run an upgrade to the latest phpBB 3.2. Security and performance are the main benefits. This should be complete by end of day Monday, February 20th.
Are we missing anything? Are there any other points of failure the community wants to see plugged?
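
For the sanitized public copy mentioned in the list above, here is a minimal sketch of the idea, not the actual procedure. It assumes the scrubbing runs against a scratch copy of the database (never the live one), and it uses an in-memory SQLite database purely as a stand-in for MySQL so the example runs anywhere. The table and column names (phpbb_users, user_password, user_email, phpbb_privmsgs) are assumptions based on phpBB's default schema and should be checked against the real dump; a real scrub list would likely be longer (IP addresses, sessions, and so on).

```python
import sqlite3

# Hypothetical scrub list: columns to blank out and tables to empty.
# These names are assumptions; verify them against the real schema.
COLUMNS_TO_BLANK = [
    ("phpbb_users", "user_password"),
    ("phpbb_users", "user_email"),
]
TABLES_TO_EMPTY = ["phpbb_privmsgs"]


def sanitize(conn: sqlite3.Connection) -> None:
    """Strip private data from a scratch copy of the database, in place."""
    for table, column in COLUMNS_TO_BLANK:
        conn.execute(f"UPDATE {table} SET {column} = ''")
    for table in TABLES_TO_EMPTY:
        conn.execute(f"DELETE FROM {table}")
    conn.commit()


def make_demo_db() -> sqlite3.Connection:
    """Build a tiny fake database so the sketch runs without MySQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE phpbb_users "
        "(username TEXT, user_password TEXT, user_email TEXT)"
    )
    conn.execute("CREATE TABLE phpbb_privmsgs (msg_id INTEGER, message_text TEXT)")
    conn.execute(
        "INSERT INTO phpbb_users VALUES ('demo_user', 'fakehash', 'demo@example.com')"
    )
    conn.execute("INSERT INTO phpbb_privmsgs VALUES (1, 'a private message')")
    conn.commit()
    return conn


if __name__ == "__main__":
    db = make_demo_db()
    sanitize(db)
    print(db.execute("SELECT * FROM phpbb_users").fetchall())
    print(db.execute("SELECT COUNT(*) FROM phpbb_privmsgs").fetchone())
```

The point of keeping the scrub list as data is that anyone reviewing the public dump can see at a glance what is removed and suggest additions.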

Known Issues
Pretty much every feature that existed before the downtime has been restored. The below are the issues that I'm aware of, but decided weren't urgent enough to delay launch any further:
  • The database backup we were working from had some character encoding issues; as such, you may notice that some UTF-8 characters (e.g., accented names like Törmänen, or CJK ones like 古力) in usernames and topics are malformed. If that is the case, please contact an admin/mod and we will take care of it. I've fixed a couple of the glaring ones already, but I'm sure some have escaped me. NOTE: This should not affect actual post text, since that was binary-encoded.
  • The search index is being rebuilt overnight. Until then, search terms won't work for any topics older than today's.
  • For the next ~48 hours, your DNS settings may flip back and forth, leading you back to the old site (i.e., the error page). This will sort itself out within a day or two, as the new address reaches all the corners of the world.
If you notice any other problems not on this list, please report them in this thread!
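
For the curious, the username encoding issue above can be illustrated with a short sketch. It assumes the most common form of this damage, where UTF-8 bytes get read back as Latin-1 somewhere along the way; the actual cause in our dump may differ, and the function name repair_mojibake is made up for this example.

```python
def repair_mojibake(text: str) -> str:
    """Undo one round of 'UTF-8 bytes misread as Latin-1', if that is what happened.

    Text that does not fit that pattern is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them the way they were meant to be.
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text contains characters outside Latin-1 (so it was never
        # misread this way) or the bytes are not valid UTF-8: leave it alone.
        return text


if __name__ == "__main__":
    # Simulate the damage, then repair it.
    broken = "Törmänen".encode("utf-8").decode("latin-1")
    print(repr(broken))
    print(repair_mojibake(broken))
```

A fix like this is only a heuristic: a correctly stored name made entirely of Latin-1 characters that happen to form valid UTF-8 would be altered, so any bulk repair should be reviewed by hand.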


Good luck and have fun, everyone :D
-Adrian
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!
jeromie
Lives in sente
Posts: 902
Joined: Fri Jan 31, 2014 7:12 pm
Rank: AGA 3k
GD Posts: 0
Universal go server handle: jeromie
Location: Fort Collins, CO
Has thanked: 319 times
Been thanked: 287 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by jeromie »

Glad the site is back. There are both content and members here that would be sorely missed if the go community were to lose them.

I'm moderately tech-savvy; let me know if there's anything I can do to help.
Kirby
Honinbo
Posts: 9553
Joined: Wed Feb 24, 2010 6:04 pm
GD Posts: 0
KGS: Kirby
Tygem: 커비라고해
Has thanked: 1583 times
Been thanked: 1707 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by Kirby »

I'd like to extend thanks to Adrian, once again, who offered to host us on his AWS instance, and who also set things up with a pretty quick turnaround.

He received the database backup yesterday, and we are up and running today!

With multiple people having access to the AWS account, along with a daily backup of the database publicly available, we should be able to avoid the problem we had this time around: we won't be bottlenecked in fixing the site if an unexpected area fails.
be immersed
bayu
Lives with ko
Posts: 163
Joined: Wed Jul 20, 2011 11:33 am
GD Posts: 0
Has thanked: 19 times
Been thanked: 32 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by bayu »

Thank you all for resolving this!

I've got one point to add to the list:
Hopefully there won't be a next time, but I'd appreciate it in case of long downtime if there was, after some delay of course, a more informative error message saying "give us a week or two" or pointing to senseis or reddit where there was some status update.
If something sank it might be a treasure. And 2kyu advice is not necessarily Dan repertoire..
ez4u
Oza
Posts: 2414
Joined: Wed Feb 23, 2011 10:15 pm
Rank: Jp 6 dan
GD Posts: 0
KGS: ez4u
Location: Tokyo, Japan
Has thanked: 2351 times
Been thanked: 1332 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by ez4u »

Many, many, many thanks to Adrian and Brian (Apetresc and Kirby)!!!
:clap: :clap: :clap: :clap: :clap: :clap:
Dave Sigaty
"Short-lived are both the praiser and the praised, and rememberer and the remembered..."
- Marcus Aurelius; Meditations, VIII 21
apetresc
Lives with ko
Posts: 256
Joined: Wed Apr 21, 2010 3:42 pm
Rank: AGA 1k
GD Posts: 1190
KGS: apetresc
IGS: apetresc
OGS: apetresc
Universal go server handle: apetresc
Location: Waterloo, Ontario (Canada)
Has thanked: 110 times
Been thanked: 146 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by apetresc »

Marcel Grünauer wrote:When logged in, my surname "Grünauer", shows up as "Grünauer" in the upper left and also in the username for this post.
Yup, that's the sort of encoding problem I was referring to in the "Known Issues" part. Thanks for pointing it out, I've fixed it now :)
bayu wrote:Hopefully there won't be a next time, but I'd appreciate it in case of long downtime if there was, after some delay of course, a more informative error message saying "give us a week or two" or pointing to senseis or reddit where there was some status update.
Yeah, for sure. Sometimes it's not possible to put an error message on the lifein19x19.com URL itself, depending on the nature of the problem, but at the very least we can post status updates on Reddit/SL.
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!
Gomoto
Gosei
Posts: 1733
Joined: Sun Nov 06, 2016 6:56 am
GD Posts: 0
Location: Earth
Has thanked: 621 times
Been thanked: 310 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by Gomoto »

Thanks a lot, keep up your outstanding work!
joellercoaster
Lives with ko
Posts: 230
Joined: Mon Sep 16, 2013 5:50 am
Rank: OGS 2k
GD Posts: 0
OGS: Joellercoaster
Location: London
Has thanked: 288 times
Been thanked: 65 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by joellercoaster »

*dances*
Confucius in the Analects says "even playing go is better than eating chips in front of tv all day." -- kivi
apetresc
Lives with ko
Posts: 256
Joined: Wed Apr 21, 2010 3:42 pm
Rank: AGA 1k
GD Posts: 1190
KGS: apetresc
IGS: apetresc
OGS: apetresc
Universal go server handle: apetresc
Location: Waterloo, Ontario (Canada)
Has thanked: 110 times
Been thanked: 146 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by apetresc »

Progress Report
  • Fixed a serious bug with the
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!
Bonobo
Oza
Posts: 2223
Joined: Fri Dec 23, 2011 6:39 pm
Rank: OGS 9k
GD Posts: 0
OGS: trohde
Universal go server handle: trohde
Location: Germany
Has thanked: 8262 times
Been thanked: 924 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by Bonobo »

Wow, SO happy to be here again.

HUGE THANKS to everybody involved

I had reloaded my “view unread posts” tab almost every hour, only to get “The host does not exist.”, but just a minute ago Schachus wrote on the German DGoB forum that the URL without the WWW, i.e. http://lifein19x19.com/, DOES work, while the URL with WWW doesn't.

So:
http://www.lifein19x19.com/ BAD
http://lifein19x19.com/ GOOD
Would be nice to have this resolved, too.

• Also, could we get https?

• And about the “Donate” button … does it point to the correct PayPal acct already?


Thanks folks, you’re cool!
“The only difference between me and a madman is that I’m not mad.” — Salvador Dali ★ Play a slooooow correspondence game with me on OGS? :)
apetresc
Lives with ko
Posts: 256
Joined: Wed Apr 21, 2010 3:42 pm
Rank: AGA 1k
GD Posts: 1190
KGS: apetresc
IGS: apetresc
OGS: apetresc
Universal go server handle: apetresc
Location: Waterloo, Ontario (Canada)
Has thanked: 110 times
Been thanked: 146 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by apetresc »

Bonobo wrote:I had reloaded my “view unread posts” tab almost every hour, only to get “The host does not exist.”, but just a minute ago Schachus wrote on the German DGoB forum that the URL without the WWW, i.e. http://lifein19x19.com/, DOES work, while the URL with WWW doesn't.

So:
http://www.lifein19x19.com/ BAD
http://lifein19x19.com/ GOOD
Would be nice to have this resolved, too.
Good catch. I've just added the DNS entry and VirtualHost for www.lifein19x19.com too; it should start working in a few hours, again as DNS propagates. Thanks! :)
Bonobo wrote:• Also, could we get https?
Yes! Now that Let's Encrypt is giving out free certificates, that's a possibility. I'll add that to the roadmap.
Bonobo wrote:• And about the “Donate” button … does it point to the correct PayPal acct already?
Nope, haven't sorted that part out at all yet. I guess I could remove the button for now.
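
For anyone who wants to check from their own machine whether the www and non-www names have caught up with each other, here is a rough sketch using only the Python standard library. The helper names (resolve, compare_hosts, all_agree) are made up for this example, and the result only reflects what your own resolver currently sees, which is exactly the thing that varies while DNS propagates.

```python
import socket
from typing import Callable, Dict, Iterable, List, Optional


def resolve(hostname: str) -> Optional[List[str]]:
    """Return the sorted IP addresses a hostname resolves to, or None if lookup fails."""
    try:
        infos = socket.getaddrinfo(hostname, 80, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return None
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def compare_hosts(
    hostnames: Iterable[str],
    resolver: Callable[[str], Optional[List[str]]] = resolve,
) -> Dict[str, Optional[List[str]]]:
    """Look up every hostname and collect the results in one dictionary."""
    return {name: resolver(name) for name in hostnames}


def all_agree(results: Dict[str, Optional[List[str]]]) -> bool:
    """True only if every hostname resolved, and all to the same set of addresses."""
    values = list(results.values())
    return bool(values) and all(v is not None and v == values[0] for v in values)


if __name__ == "__main__":
    results = compare_hosts(["lifein19x19.com", "www.lifein19x19.com"])
    for name, addresses in results.items():
        print(name, "->", addresses if addresses is not None else "does not resolve")
    print("consistent" if all_agree(results) else "not consistent yet")
```

The resolver is passed in as a parameter so the comparison logic can be exercised without touching the network.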
The road to wisdom? Well, it's plain, and simple to express: Err, and err, and err again; but less, and less, and less!
Bonobo
Oza
Posts: 2223
Joined: Fri Dec 23, 2011 6:39 pm
Rank: OGS 9k
GD Posts: 0
OGS: trohde
Universal go server handle: trohde
Location: Germany
Has thanked: 8262 times
Been thanked: 924 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by Bonobo »

apetresc wrote:[..]

I've just added the DNS entry and VirtualHost for http://www. [..]
Awesome :)
• [..] https?
[..] I'll add that to the roadmap.
:-)
• [..] “Donate” button [..]?
[..] I guess I could remove the button for now.
NOOOOOO, totally unacceptable, a new and valid button, please :twisted:
“The only difference between me and a madman is that I’m not mad.” — Salvador Dali ★ Play a slooooow correspondence game with me on OGS? :)
jeromie
Lives in sente
Posts: 902
Joined: Fri Jan 31, 2014 7:12 pm
Rank: AGA 3k
GD Posts: 0
Universal go server handle: jeromie
Location: Fort Collins, CO
Has thanked: 319 times
Been thanked: 287 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by jeromie »

Thanks for getting everything running again!
xiayun
Lives in gote
Posts: 384
Joined: Fri Jul 29, 2016 10:24 pm
Rank: KGS 2d
GD Posts: 0
Has thanked: 22 times
Been thanked: 98 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by xiayun »

Thanks so much for all the efforts to get the site back online!
daal
Oza
Posts: 2508
Joined: Wed Apr 21, 2010 1:30 am
GD Posts: 0
Has thanked: 1304 times
Been thanked: 1128 times

Re: We're Back! Extended Downtime Postmortem and Future Plan

Post by daal »

I just want to add my appreciation that Jordus, Brian and Adrian had the will and technical wherewithal to turn a database dump back into L19 gold. Thanks for your time and work!
Patience, grasshopper.