Scaling, Upgrades, Downtime, and Grey Hair
I was watching bug reports, user questions and feedback streaming in while I was trying to figure out how to add a feature to Crate’s increasingly complex codebase. As I was looking at the different parts of the app and dreading deploying it to receive a sea of Node.js errors, I kept thinking there had to be a better way.
While I reflected on what I could do, I remembered how I had split off a process-heavy piece of the puzzle months earlier, and thought of applying the same approach to the rest of the platform. Yes, this was when I decided that Crate would be rebuilt as microservices.
Ok, what now?
So after I made the decision to rebuild Crate as microservices, I had to figure out what that would look like. Thankfully after a bit of research and some prototyping, I came up with a pretty solid solution. Over the next 3 weeks I proceeded to build this new system while fighting fires in a monolithic app that was beginning to buckle under the pressure of popularity.
In the last 3 weeks of 2015 I was able to rebuild the platform as microservices from 2 large apps, reconfigure the clustering situation and improve some of the more fragile data routines.
As it was being tested in a development environment everything looked good!
There was no difference in the look and feel of the app, but one MAJOR difference surfaced: Crate was MUCH FASTER.
I was happy.
I was proud.
I was thankful that all my work had in fact turned out how I had hoped and this update spelled the end of me waking up to dread my inbox.
Then the other shoe dropped.
Soooo… what happened?
I debugged it. I tested it. It was tested by a test group. It looked and worked great!
So we notified our users and I pushed the codebase into production at 8pm on the first Tuesday of the year. It worked great, I thought. Then there were 150,000 data requests going through the system all at once!
So I have to take the entire system down. It’s offline. Dead. Oh, and we had some pretty important press going out that couldn’t be stopped.
So here we are with this amazing attention being shone on us and the app is offline… and it’s my fault. It stayed fully dysfunctional for 48 hours and it was ugly.
I proceeded to spend the next week putting out fires and tracking down what was wrong with my architecture at scale. As it turned out, it wasn’t so much the architecture or any one thing. A logic error caused the ridiculous number of requests, and a misunderstanding of reactivity in the app caused all my grief.
Now that all of that has been ironed out, we have an application that works great and loads fast, and I’m not waking up fearing that first email check of the day. As an added bonus to the speed and stability, I’ve been able to take some of the feedback we’ve gotten from our users and implement improvements to the platform!
As promised, the changes were easy to make and the code was compartmentalized which made it easier to test and much easier and safer to deploy.
What have we learned from this exercise? Let me lay it out for you.
Be flexible and agile (in the true sense of the word). We were forced to pivot our entire platform architecture within a few days due to the usage patterns we saw emerging. Even though we basically had to rewrite the entire platform in a couple of weeks, the ideas were there already and just needed to be split into different applications.
Monitoring your app’s performance and resources is important! I can’t stress this one enough. If it weren’t for Kadira, we would not have had any idea a) why our app was slow, b) which parts of the app were causing problems, or c) what to do to solve our problem. Thanks to the monitoring Kadira does, I was able to see that runaway processes were using up all the RAM in our app, which was causing it to crash and get restarted.
Listen to your users. Thankfully my co-founder Ross had the foresight a year ago to sign up for Intercom which allowed us to easily receive feedback from, and communicate with, our end-users. Every single one of our users was very understanding of our growing pains and they provided us with valuable feedback and debugging that we were able to use to solve many of the issues we were seeing with the app.
Digital Ocean has amazing tech support. Every time I’ve reached out to DO for help they have gone above and beyond what they actually support in order to provide me with some insight into problems I’ve had. I can’t say enough good things about their service and support. If you need VPS hosting, go sign up with Digital Ocean.
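For anyone wondering what the monitoring lesson above looks like in code, here is a minimal sketch of a runaway-memory check. It is not how Kadira works internally and not what Crate runs; the names (findRunaway, limitBytes, streak) and the 512 MB limit are assumptions made up for illustration.

```typescript
// Hypothetical helper: none of these names come from Kadira or Crate.
interface MemoryAlert {
  sampleIndex: number; // position of the reading that triggered the alert
  bytes: number;       // memory reading at that position
}

// Flags a runaway process: memory above `limitBytes` for `streak`
// consecutive samples, so a single short spike does not raise an alert.
function findRunaway(
  samples: number[],
  limitBytes: number,
  streak: number
): MemoryAlert | null {
  let run = 0;
  for (let i = 0; i < samples.length; i++) {
    run = samples[i] > limitBytes ? run + 1 : 0;
    if (run >= streak) {
      return { sampleIndex: i, bytes: samples[i] };
    }
  }
  return null;
}

// Made-up readings in megabytes. In a Node.js process each reading could
// come from process.memoryUsage().rss, sampled on a timer.
const MB = 1024 * 1024;
const readings = [200, 210, 650, 220, 700, 720, 740].map((mb) => mb * MB);

// The lone 650 MB spike is ignored; the sustained climb at the end is not.
const runaway = findRunaway(readings, 512 * MB, 3);
```

Requiring several high readings in a row is a deliberate choice here: it trades a slightly later alert for fewer false alarms during short bursts of work.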
All of these improvements are in preparation for some much bigger improvements coming in the next few months.
Crate is going to be growing and changing as we continue to add features our users are asking for and improve the overall user experience. I want to take the opportunity right now to thank every single one of our users for their patience and understanding as we wade through the waters of building something new and exciting.
Sorry, We’re Not Mobile Friendly Yet (Here’s Our Roadmap)
We just released Crate into the wild and we couldn’t be more excited! Except for our mobile experience on the platform… Our mobile experience sucks, we know! With that in mind, I’d like to lay out the Product Roadmap for Crate over the next 6 months:
1. Design / layout.
When we started building Crate, the idea was to bring a minimalistic approach to finding and viewing great content based on users’ specific interests. While the initial design has evolved along the way from inception to beta, we want to bring a more fulfilling experience to our users.
The plan for the desktop is to evolve into an experience that feels more like checking and organizing your email. We will be working to also bring our users a mobile experience that works. Not only will we bring a mobile experience that works, but we’ll bring a mobile experience that is a joy to use!
For now, we encourage our users to provide as much feedback as possible to us about the product while they are forced to open their laptops to use it. Rest assured, the mobile experience is on our priority list: Number 1 with a BULLET.
2. Multiple social accounts.
We realize that content marketers likely manage more than one Twitter account. In fact, you probably have so many accounts on a dozen different services that it makes your head spin trying to manage them all! We get it. We will be rolling out the ability to manage multiple accounts from multiple services in the coming months.
We have already made a point of allowing you to connect your Buffer account to our platform. This was a major development based on user feedback, as Buffer was one of the top tools being used, and it also gave users with a pro account the ability to share to Facebook and LinkedIn. In August 2015, Buffer announced that they were closing down their suggestions feature. Our current users couldn’t be happier that we have not only been able to replace Buffer’s suggestions, but improve on them by offering content that is tailored to your needs.
You have also identified to us that managing multiple accounts and services is a necessity and we’ve heard you loud and clear! We are working towards allowing you to connect multiple Twitter accounts to your main account and then expanding that reach into Facebook, Google+, LinkedIn, and Pinterest.
We want to be the one-stop-shop for all your content marketing needs.
That’s why the ability to manage multiple accounts is a must when it comes to features on our platform.
3. Up/down voting articles.
As the Crate platform progresses, we want to make the system learn what you like and what you don’t like. Our goal is to provide you with the best and most relevant content possible. In order to achieve our goal, we need to allow our users to provide feedback on the results the platform is generating so that it can learn from those and filter out the noise.
The ability to provide feedback on the quality of the results in your Crates will allow the system to learn from all users’ feedback. The more feedback provided, the better the results will be in your Crates!
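To make the idea concrete, here is a rough sketch of how up/down votes could feed back into what shows up in a Crate. This is an illustration only, not the planned implementation: the Vote and Article shapes, the per-topic scoring, and the cutoff value are all assumptions.

```typescript
// Hypothetical shapes; the real Crate data model is not described here.
interface Vote { topic: string; up: boolean; }
interface Article { title: string; topic: string; }

// Smoothed approval per topic: (ups + 1) / (total + 2), so topics with
// only a few votes stay near 0.5 instead of jumping straight to 0 or 1.
function topicScores(votes: Vote[]): Map<string, number> {
  const tally = new Map<string, { ups: number; total: number }>();
  for (const v of votes) {
    const t = tally.get(v.topic) ?? { ups: 0, total: 0 };
    t.total += 1;
    if (v.up) t.ups += 1;
    tally.set(v.topic, t);
  }
  const scores = new Map<string, number>();
  tally.forEach((t, topic) => {
    scores.set(topic, (t.ups + 1) / (t.total + 2));
  });
  return scores;
}

// Drops articles whose topic scores below `cutoff` and puts the best-liked
// topics first. Topics nobody has voted on get a neutral 0.5.
function filterNoise(
  articles: Article[],
  scores: Map<string, number>,
  cutoff: number
): Article[] {
  const scoreOf = (a: Article): number => scores.get(a.topic) ?? 0.5;
  return articles
    .filter((a) => scoreOf(a) >= cutoff)
    .sort((a, b) => scoreOf(b) - scoreOf(a));
}
```

The smoothing is the important design choice: without it, a single down vote on a brand new topic would bury that topic entirely.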
Tell us what you want…
As always, we welcome and encourage your feedback. We have a vision for Crate, but if you need and/or want a feature, TELL US! We want to make this platform the best it can be for our users.
“We will now be conducting our lockdown drill.”
I was with my 2-year-old son at the local Early Years drop-in when the principal came over the school intercom to inform us that this drill was about to begin. It was a little strange for me, and my son kept asking why we were locking the doors and turning off the lights. I didn’t really know how to explain it to him, so I just told him we were practicing quiet time.
In spite of the obvious discomfort of even having to do a “lockdown drill”, my mind wandered a bit to the security of Crate. How easy would it be for someone to steal users’ information? What kind of data could they get? Does it matter if someone “sniffs” data being transferred?
As my 4-year-old daughter would put it: I thought about it in my brain, my brain said we should probably secure our application and its data with SSL, and I agreed with my brain that it was the right decision. Later that day I came home and started working toward securing our application’s data.
This past weekend, we reconfigured our web servers and our application to run strictly over SSL for all connections. This means that any data passed between your browser and our servers is encrypted with 256-bit SSL, which protects it, sensitive or not, from being read or stolen in transit.
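Running strictly over SSL usually means that any request arriving over plain HTTP gets redirected to its HTTPS equivalent. Here is a minimal sketch of that decision as a plain function. It is an assumption about the general technique, not our actual server configuration, and the IncomingRequest shape and forceHttps name are invented for the example.

```typescript
// Minimal, hypothetical request shape; real servers expose richer objects.
interface IncomingRequest {
  host: string;            // value of the Host header
  url: string;             // path plus query string, e.g. "/crates?id=1"
  encrypted: boolean;      // true when the connection itself is TLS
  forwardedProto?: string; // X-Forwarded-Proto header, if a proxy set one
}

interface Redirect {
  status: 301;
  location: string;
}

// Returns a redirect for plain-HTTP requests, or null when the request
// already arrived securely and can be served normally.
function forceHttps(req: IncomingRequest): Redirect | null {
  // Only trust X-Forwarded-Proto when a proxy you control sets it.
  const proto = (req.forwardedProto ?? "").split(",")[0].trim().toLowerCase();
  if (req.encrypted || proto === "https") {
    return null;
  }
  // Drop any explicit port so ":80" is not carried into the HTTPS URL.
  const hostname = req.host.replace(/:\d+$/, "");
  return { status: 301, location: `https://${hostname}${req.url}` };
}
```

A permanent (301) redirect is used so that browsers remember to go straight to the secure address next time.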
The type of data that is generally moving over the internet between our users and our application is not sensitive; it’s a command to share a story to Twitter, or add some blog post to their Buffer queue. We use Twitter’s OAuth login service and have always used secured URLs to connect to our databases and APIs to ensure that no data is ever in jeopardy.
In reality, there was no specific threat or impending doom that made us decide to run on SSL. It came down to a simple rule that I read a long time ago regarding web-based applications:
“If you ask someone to provide you with any info, make sure it’s passed over SSL.”
In this age of internet security, hackery, and identity theft, we want to take nothing for granted. We have taken every precaution to ensure that your data stays safe so that you can go about your business of sharing great social content.