So it's kinda stressful, and I wouldn't recommend spending your whole life like this, but that adage about learning everything all at once when you're working at a tiny startup is kinda true. If I don't write all this down, I'm going to forget it, so here's what I've been up to in the last few weeks:
On September 15th, Mode Media, where I'd worked since 2011 (or 2009 if you include the time at Ning before we got acquired by Mode), went out of business. I was lucky enough to be rescued from the ashes of the company because an investor, Cyndx, wanted to buy Ning and keep it running. They immediately sent six of us ex-Mode employees contracts for the next week and put in a bid to buy Ning. A week later, a completely unrelated investor, Noosphere Ventures, swept in and bought the company instead, which was exciting to say the least, but it seems to be working out well.
This meant that Ning went from being a low-priority, maintenance-mode job, supported by a ton of people at Mode who worked on other things most of the time, to being extremely important and also super understaffed. We were down to three engineers and no ops, IT, or QA. I had been taking care of most of the backend of the system for a while, but now I was also responsible for ops, DBA, and IT work. So, basically, we went from a 150-person company to a 6-person startup. Luckily our entire engineering team is pretty seasoned (I think each of us has at least 10 to 15 years of industry experience), our CEO/GM has decades of experience negotiating with suppliers to keep everything under control, and we got to keep our two most experienced support folks to keep the lines of communication open with customers.
Notable stuff I've done since then:
- Hunted around to find passwords, SSH keys, and SSL certificates for everything in the system. Edited databases to give myself administrator access where necessary.
- Poked my way through the system until I found the LDAP servers, and brought up an OpenDJ replica on EC2, along with copies of our Confluence and JIRA servers.
- Debugged a ton of MySQL server outages, mainly caused by failing disks or partitions getting full after a replication failure caused binary log purging to stop. Rebuilt several 900GB databases from backups and fixed up replication.
- Moved everything from two AWS accounts onto a third account (for the new company). This meant hacking together a MapReduce-style set of Python scripts to copy 140 terabytes (400 million files) from an S3 bucket in one AWS account to a bucket in another account in a couple of days, learning enough about CloudFormation to bring up a copy of an old stack in a new account, rebuilding a bunch of WordPress blogs from backups and disk snapshots, and copying a ton of random data around in S3.
- Learned that granting someone from another account access to your S3 bucket gives them the ability to create files that you have no access to, unless you set a bucket policy to force them to give you full control. Good thing I'd optimized my copy scripts so I could rerun the whole thing in 12 hours...
- Learned some tricks for setting up an AWS VPC, like making sure to turn on DNS hostnames. Without that, you get a warning about not being able to resolve the hostname every time you run sudo, and some other apps refuse to start up.
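
For the S3 cross-account gotcha in the list above, here's a rough sketch of the kind of bucket policy that forces the other account to hand over full control. It's an illustration, not necessarily the exact policy used here: the bucket name, the account ID, and the `build_cross_account_policy` helper are all made-up placeholders. It only builds the policy document as JSON; applying it to a bucket is left out.

```python
import json


def build_cross_account_policy(bucket, writer_account_id):
    """Build a bucket policy document (hypothetical helper).

    The other account may upload objects, but any upload that does not
    carry the bucket-owner-full-control canned ACL is denied.
    """
    writer_arn = "arn:aws:iam::%s:root" % writer_account_id
    objects_arn = "arn:aws:s3:::%s/*" % bucket
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Let the other account write objects and set their ACLs.
                "Sid": "AllowCrossAccountPut",
                "Effect": "Allow",
                "Principal": {"AWS": writer_arn},
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": objects_arn,
            },
            {
                # Reject uploads that don't grant the bucket owner full control.
                "Sid": "RequireBucketOwnerFullControl",
                "Effect": "Deny",
                "Principal": {"AWS": writer_arn},
                "Action": "s3:PutObject",
                "Resource": objects_arn,
                "Condition": {
                    "StringNotEquals": {
                        "s3:x-amz-acl": "bucket-owner-full-control"
                    }
                },
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder bucket name and account ID, for illustration only.
    policy = build_cross_account_policy(
        "example-destination-bucket", "111122223333"
    )
    print(json.dumps(policy, indent=2))
```

With a policy like this in place, the copying side has to send the `bucket-owner-full-control` canned ACL on every put or copy, or the request gets rejected. The policy doesn't retroactively fix objects that were already written without that ACL, which is why a rerun of the copy ends up being necessary.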
More to come, I'm sure... once this is all taken care of, I imagine there'll be way more cloud work, and tasks to bring as much of our infrastructure as possible into 2016, and enable future development. It's definitely keeping me busy, but it's exciting for sure!