This blog entry stems from a course assignment in INF 202 Introduction to Data and Databases. My topic is “MySQL is not scalable enough for YouTube”, link “http://www.computerworld.com/s/article/9234747/YouTube_scales_MySQL_with_Go_code?taxonomyId=173”.
Google acquired YouTube in 2006, and one of the challenges it has faced since then is keeping the service running as it grows bigger and bigger. Over 800 million unique users visit YouTube each month and watch more than 4 billion hours of video. Statistics show that about 72 hours of video are uploaded to the service every minute. YouTube stores the videos on its own file system and uses MySQL to store the metadata needed to serve each video, along with preferences and other information such as country customizations and advertisements. While MySQL is very reliable and quite scalable, it still causes complications when dealing with data at YouTube's extreme scale. The problem mostly arises once MySQL usage passes a certain point, when managing the hardware and the number of instances becomes very time-consuming. MySQL is a little less effective when scaled that far, and at YouTube's size that is not feasible. Tweaking MySQL itself could also be dangerous, as it could create more problems and complications.
As an alternative that leaves the core of MySQL unchanged, YouTube engineers are developing a set of software called Vitess. It is written in Go, a fairly new programming language suited to large-scale environments. YouTube has already put one component of Vitess to use, and it has proven to be a success. It consolidates thousands of MySQL queries into smaller batches so that they can be executed more efficiently, and it reduces the workload by reusing old query results. Even though Vitess still has room for improvement, it is quite well thought out, says YouTube architect Sugu Sougoumarane.
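To make the idea of consolidating queries and reusing old results a little more concrete, here is a minimal sketch in Go. It is not Vitess code and does not talk to MySQL. The `Consolidator` type, the `runDemo` helper, the fake backend function and the `videos` table in the query are all hypothetical names made up for this illustration. The sketch assumes that identical queries arriving at the same time can share one execution, and that a finished result can be handed to later callers. It keeps results forever, which a real system could not do without some form of expiry or invalidation.

```go
package main

import (
	"fmt"
	"sync"
)

// call tracks one execution of a query that is still running.
type call struct {
	done   chan struct{} // closed once result and err are set
	result string
	err    error
}

// Consolidator (hypothetical) collapses identical queries into a single
// backend execution and remembers results for later callers.
type Consolidator struct {
	mu       sync.Mutex
	inFlight map[string]*call
	cache    map[string]string // simplification: entries never expire
	exec     func(query string) (string, error)
}

func NewConsolidator(exec func(string) (string, error)) *Consolidator {
	return &Consolidator{
		inFlight: make(map[string]*call),
		cache:    make(map[string]string),
		exec:     exec,
	}
}

// Do returns a cached result, waits for an identical query that is
// already running, or runs the query itself if neither exists.
func (c *Consolidator) Do(query string) (string, error) {
	c.mu.Lock()
	if res, ok := c.cache[query]; ok {
		c.mu.Unlock()
		return res, nil
	}
	if running, ok := c.inFlight[query]; ok {
		c.mu.Unlock()
		<-running.done // share the result of the running query
		return running.result, running.err
	}
	cl := &call{done: make(chan struct{})}
	c.inFlight[query] = cl
	c.mu.Unlock()

	cl.result, cl.err = c.exec(query)

	c.mu.Lock()
	delete(c.inFlight, query)
	if cl.err == nil {
		c.cache[query] = cl.result
	}
	c.mu.Unlock()
	close(cl.done)
	return cl.result, cl.err
}

// runDemo sends n identical queries concurrently and reports how many
// times the fake backend actually ran.
func runDemo(n int) (results []string, executions int) {
	var mu sync.Mutex
	backend := func(q string) (string, error) { // stands in for MySQL
		mu.Lock()
		executions++
		mu.Unlock()
		return "rows for: " + q, nil
	}
	c := NewConsolidator(backend)
	results = make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i], _ = c.Do("SELECT title FROM videos WHERE id = 42")
		}(i)
	}
	wg.Wait()
	return results, executions
}

func main() {
	results, executions := runDemo(100)
	fmt.Printf("%d callers, %d backend execution(s)\n", len(results), executions)
}
```

In this sketch a hundred callers asking the same question cost the backend only one execution, which is roughly the kind of saving the article describes, although the real system is certainly more involved.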
This article shows how problematic managing a huge amount of data can be. As its community keeps getting bigger and bigger, YouTube has had to come up with solutions to cope with such drastic change. It also shows how effective a new programming language such as Go can be: the article notes that a 105-line routine that periodically trims log files could not have been written in as few lines in C. Go 1, the first stable release of the language, came out in March this year, and it seems very promising. It might provide the edge that other languages have lacked in large-scale environments.
It seems that not only YouTube but also other data giants such as Facebook and Twitter are facing similar problems. Over the last decade an enormous amount of data has been created, and it is expected to grow at an even higher rate. Storing all this data and managing it sensibly is the biggest challenge. Data that is not well managed loses its meaning and its value as information. Even MySQL, as widely used as it is, is not a complete solution anymore. Just like YouTube, others will have to find a way to overcome this obstacle.
Name: Syed Pallab
Graduation Year: 2014
Degree: BA Computer Science