In my previous post, Databases in the Cloud, I went over some of the benefits Amazon Web Services has to offer. Amazon EC2 is another great service offered by Amazon. Simply put, EC2 is a virtual server that offers a variety of operating systems and levels of computational power. EC2 allows users to build apps, automate scaling according to changing needs and peak periods, deploy computationally intensive models, streamline development processes, and create virtual servers to manage storage, lessening the need to invest in infrastructure.
Amazon RDS is another service provided by AWS (Amazon Web Services). It is designed to set up, manage, and scale a relational database in the cloud, such as MySQL, PostgreSQL, Oracle, SQL Server, and more. Not only that, but Amazon RDS takes over the day-to-day management tasks associated with MySQL, such as backups, failure detection, recovery, and scaling. I decided to make the switch to Amazon RDS using the free tier so I could make my database easily accessible to my Shiny app. In this post, I will show you two ways to make this switch and explain why you should.
If you’re an R programmer, then you’ve probably crashed your R session a few times when trying to read datasets larger than 2 GB. It can get a little frustrating when all you want to do is harness the true power of R by building statistical models on these large datasets, and your session crashes with a window stating ‘R SESSION ABORTED’. Since R holds the data it works on in memory, that is, in the computer’s available RAM, you will encounter failures when reading datasets larger than the available memory. Also, once you have enough data frames stored, your R session can become extremely slow and severely affect your work. One of my classes at Pace University showed me the value of storing larger datasets in a MySQL database, and I decided to learn how to stream these datasets into R so we do not have to hold them in memory all at once.
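To make the idea of streaming concrete, here is a minimal sketch of reading a query result in chunks with the DBI package instead of loading the whole table at once. It rests on a few assumptions that are not from my actual setup: the table name `measurements`, its columns, and the chunk size of 1,000 rows are made up for illustration, and the connection uses an in-memory SQLite database (via RSQLite) as a stand-in so the code runs without a server. With a MySQL database, only the `dbConnect()` call should need to change, for example to a MySQL driver with your own host and credentials.

```r
library(DBI)

# Stand-in connection (assumption): in-memory SQLite so the sketch runs anywhere.
# For MySQL you would swap in a MySQL driver and your own connection details.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical table used only for this illustration.
set.seed(1)
dbWriteTable(con, "measurements",
             data.frame(id = 1:10000, value = rnorm(10000)))

# Send the query without pulling any rows into R yet.
res <- dbSendQuery(con, "SELECT value FROM measurements")

total  <- 0
n_rows <- 0

# Fetch 1,000 rows at a time; only one chunk is held in memory at once.
while (!dbHasCompleted(res)) {
  chunk  <- dbFetch(res, n = 1000)
  total  <- total + sum(chunk$value)
  n_rows <- n_rows + nrow(chunk)
}

# Release the result set and close the connection.
dbClearResult(res)
dbDisconnect(con)

# Mean computed from running totals, never from the full dataset in memory.
mean_value <- total / n_rows
```

The same loop works for any summary you can update chunk by chunk, such as counts, sums, or minimums and maximums. Models that need every row at once call for a different approach, such as sampling or pushing the aggregation into SQL.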