How I avoided AWS RDS size upgrade with SQL tuning

I work at a fast-growing startup where technology plays a crucial role in handling data and documents and in enforcing processes. Working at a startup means limited resources: there is always a queue of projects to finish and enhancements to make to existing ones.

While working on a project I tend to think through every aspect of it, from the UI to the backend and down to the database structure: the SQL queries that will run, the data processing done in the backend, and finally the display in the UI. After more than a decade in tech, there are two things I have realised:

  • Leave room for enhancement.
  • Performance is not a separate project.

One particular project at the company required gathering some data from the database and pushing it to the CRM (customer relationship management software) every 15 minutes. What I did in this project is a classic example of what not to do while coding. With a growing customer base it was becoming very difficult to manage things, and a CRM promised to be of great help.

The company got the account created, and I got the API credentials and documentation. Other product work was already lined up for development, and this project was important for managing the existing customers. It was a classic time-crunch scenario, and I managed to finish and deploy the CRM update project in half a day.

I did not give any thought to the fact that it was going to run every 15 minutes. Neither did I leave any room in the code for future enhancements. It was one SQL query and one function to format the API payload, and that was it. A few days later another request came in to push some more data, and I added some more joins to the SQL. Over the next few months this kept happening, and I kept adding more and more to the SQL. Eventually it became a giant mess of joins and subqueries that ran every 15 minutes.

I knew full well that what I had done was wrong and that, as we got more users, it was going to come back to bite me. Since this job ran in the background, there was no visible page that would reveal a performance issue. A couple of weeks ago the site started running slowly and critical scheduled jobs were not completing on time. I checked the load on the site, checked for memory leaks and checked disk space; everything was fine. Finally, while checking the database, I found that we had run out of AWS CPU credit balance and AWS was throttling CPU usage for the database.

RDS CPU credit balance before and after SQL tuning
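
To make the credit mechanics concrete, here is a rough sketch of how a CPU credit balance drains and refills on a burstable instance. It relies on one fact from the AWS documentation (one CPU credit equals one vCPU running at 100% for one minute); everything else, including the function name and the numbers in the usage lines (6 credits earned per hour, 1 vCPU, a cap of 144 credits), is a hypothetical illustration and not our actual instance class.

```python
def simulate_credit_balance(start_balance, credits_per_hour, vcpus,
                            utilization_per_minute, max_balance):
    """Return the credit balance after each simulated minute.

    utilization_per_minute is a list of average CPU utilizations (0.0 to 1.0).
    This is a simplified model: real throttling limits the CPU to its baseline
    once the balance is empty, which is approximated here by clamping at zero.
    """
    earn_per_minute = credits_per_hour / 60.0
    balance = float(start_balance)
    history = []
    for utilization in utilization_per_minute:
        # One credit = one vCPU at 100% for one minute.
        spent = utilization * vcpus
        balance = min(max_balance, max(0.0, balance + earn_per_minute - spent))
        history.append(balance)
    return history


# Hypothetical numbers: a heavy job keeps the CPU at 60% for two hours.
draining = simulate_credit_balance(30, 6, 1, [0.60] * 120, 144)

# After tuning, the same two hours at 5% utilization slowly refill the balance.
refilling = simulate_credit_balance(0, 6, 1, [0.05] * 120, 144)
```

With these made-up numbers the instance earns 0.1 credits per minute but spends 0.6, so the balance only shrinks until it is empty; once the load drops below the earn rate, the balance climbs back, which is the shape of the graph above.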

I had already enabled slow query logging, and on checking the log I saw that my mistake with the CRM project had indeed come back to bite me. The SQL that used to complete in half a second a year ago was now taking 150 seconds on average. EXPLAIN showed that a Cartesian product was returning 1.8 million rows.

Not the exact plan, but something along similar lines
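
For readers who have not run into this before, the sketch below shows how a join without a usable join condition multiplies row counts. It is not the query from this project: the `customers` and `orders` tables are hypothetical, and it uses Python's built-in sqlite3 module as a stand-in database so that it runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
""")

# 100 hypothetical customers with 10 orders each (1,000 orders in total).
conn.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(i, f"customer-{i}") for i in range(1, 101)],
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    [(i, (i % 100) + 1, 10.0) for i in range(1, 1001)],
)

# No join condition: every customer is paired with every order.
cartesian_rows = conn.execute(
    "SELECT COUNT(*) FROM customers, orders"
).fetchone()[0]

# With a join condition: each order is paired only with its own customer.
joined_rows = conn.execute(
    "SELECT COUNT(*) FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchone()[0]

print(cartesian_rows, joined_rows)  # 100000 1000
```

Two small tables already produce 100 times more rows than needed. Stack a few more joins on top as the tables grow, and millions of intermediate rows are easy to reach.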

I met with the product team to understand whether the requirements for this project had changed over time. It turned out that there was no longer any need for the SQL mess I had written, and a simple select query on one table would suffice. The CRM update also did not need to run every 15 minutes; once an hour, plus on demand after certain user actions, was enough.

I quickly changed the SQL and deployed it to production. Within hours of the deployment I could see the CPU credit balance filling up, and there was no need to upgrade the database. I took a couple more days to work on other slow-running queries, creating indexes or modifying queries where needed to get better performance. This exercise has been so helpful that we are now observing RDS usage to evaluate whether we can downgrade the instance class.
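
As an illustration of the index work, here is a minimal sketch of comparing a query plan before and after adding an index. The table, column and index names are hypothetical, and it uses SQLite through Python's built-in sqlite3 module as a stand-in; on another database you would run that engine's own EXPLAIN, and the output will look different.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's EXPLAIN QUERY PLAN detail text for a query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, customer_id, total) VALUES (?, ?, ?)",
    [(i, (i % 100) + 1, 10.0) for i in range(1, 1001)],
)

sql = "SELECT id, total FROM orders WHERE customer_id = ?"

# Without an index, the filter on customer_id forces a full table scan.
plan_before = query_plan(conn, sql, (42,))

conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")

# With the index, the planner can look up matching rows directly.
plan_after = query_plan(conn, sql, (42,))

print("before:", plan_before)
print("after: ", plan_after)
```

The exact wording of the plan varies between SQLite versions, but the first plan should report a scan of `orders` and the second a search using `idx_orders_customer_id`.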

Takeaways from this whole exercise:

  • Product requirements do not stay the same; they evolve over time. It is important to revisit them periodically.
  • Solving issues is not always about adding more code; reducing code is just as important.
  • Set up alarms on key metrics.
  • To reiterate what I mentioned earlier in this post: performance is not a separate project.
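
As one way to act on the alarms takeaway, the sketch below builds the parameters for a CloudWatch alarm on the RDS `CPUCreditBalance` metric and passes them to `put_metric_alarm`. The function names, the alarm name, the threshold and the five-minute period are my assumptions for illustration, not what we run in production; the CloudWatch client is passed in so the block itself does not need AWS credentials.

```python
def credit_balance_alarm_params(db_instance_id, threshold_credits, sns_topic_arn):
    """Build put_metric_alarm parameters for a low CPU credit balance alarm."""
    return {
        "AlarmName": f"{db_instance_id}-low-cpu-credit-balance",
        "AlarmDescription": "CPU credit balance is running low",
        "Namespace": "AWS/RDS",
        "MetricName": "CPUCreditBalance",
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": db_instance_id},
        ],
        "Statistic": "Average",
        "Period": 300,            # seconds per datapoint (assumed value)
        "EvaluationPeriods": 3,   # alarm after 3 consecutive low datapoints
        "Threshold": float(threshold_credits),
        "ComparisonOperator": "LessThanThreshold",
        "AlarmActions": [sns_topic_arn],
    }


def create_credit_balance_alarm(cloudwatch_client, db_instance_id,
                                threshold_credits, sns_topic_arn):
    """Create (or update) the alarm using a boto3 CloudWatch client."""
    params = credit_balance_alarm_params(
        db_instance_id, threshold_credits, sns_topic_arn
    )
    cloudwatch_client.put_metric_alarm(**params)
    return params
```

With boto3 installed and credentials configured, a call would look something like `create_credit_balance_alarm(boto3.client("cloudwatch"), "my-db-instance", 50, topic_arn)`, where the instance identifier, threshold and SNS topic ARN are placeholders. An alarm like this would have flagged the shrinking balance long before the site slowed down.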