Sherif's Tech Blog

Just another guy on the Internet with a keyboard…

Why You Need a Database

There are a lot of developers that start off building their applications with the notion that a database is only necessary if they have a lot of data to work with or that the data they have will be easier to manage if they can avoid the complexities of building and maintaining a database or dealing with a DBMS (Database Management System). In the area of web-based development, this is rarely the case. The reason for this is that web-based applications tend to grow very rapidly. This is easy, because there are billions of people with access to the Internet and virtually anyone with access to the Internet usually gains such access from a web-enabled device. Having access to the Internet has become synonymous with having access to the world-wide web. Since the number of potential users is so huge the potential for data is equally huge. Not only that, but beyond the sheer amount of data that maybe collected from users of the application software and stored for use by the system there is the factor of maintainability. Databases make organizing and maintaining long-term data easier. This comes in several forms. Without a database solution you have to worry about concurrency issues for replication. You would also have to consider race conditions, access time, permissions, and scalability among others.

Databases Are Overkill

For those who start off building small web-based applications or even trying to put together a tiny CMS (Content Management Systems) they sometimes fall victim to the illusion that having a very small amount data would mean that building a database for this data would be overkill. This is simply not true anymore. Today databases are easier than ever to build, grow, and manage. With lite-weight solutions like SQLite you actually improve on performance with small amounts of data and make it easier to manage. SQLite is actually a small foot-print library written in C that implements an embedded DBMS. It’s only a few hundred KB in size and implements most of the SQL standard. You can use it to store databases in memory or on disk and still get the full benefits that relational databases offer with a minimalistic foot-print and without compromising on performance for small data sets. It’s adopted by PHP, Python, Perl, Ruby and even Javascript as well as many other languages. So there really is no excuse to avoid using a database when the solution is widely available in so many popular platforms and especially in web development.

Databases Are Slow

This could not be farther from the truth. A relational database can maintain indexing for records across different tables. This means rather than looking through the entirety of the data set and then trying to expose some underlying structure in order to find a particular set of data the relational database takes advantage of composing structures as you build your data sets. These structures make things like fetching a record with a primary key much much faster than you would get by using a flat-file solution.

Lets examine the alternatives. Even if you had a very small amount of data – say just a few hundred lines of text. Even if the data structure was overly simplistic – we’ll assume each line represents what would be a single row in a database table. Even if the data will only ever be maintained by a single developer – you. You are still overlooking so many problems that are not easily solved by using a flat-file to maintain this data. First lets consider the race-condition. You have a script that opens a specific file on the server and appends a new line each time a record is added. The script can also open the file for reading and retrieve the entire contents of the file into memory. The script can then do any necessary sorting and filtering to return the required data sets to the user. The most apparent problem with this approach is the race condition. It is entirely plausible that two requests could be made simultaneously to the same script – one to open the file for writing and append a record and one to open the file for reading and retrieve the data. If the data is stored into memory before the line is appended the result is stale and potentially corrupt. If the new data is appended before the read, no problem. However, what happens when you want to delete a record. Now the problem is three-fold. If three individual requests all come in at the same time – one to read, one to write, and one to delete a record – it is now likely the case that your entire data structure has been corrupted. Remember that HTTP is built on a request-response model and no two requests are treated as if they are tied to any previous requests. So there’s no central point of control over your script’s ability to manage which process can access the data and to what extent.

In a DBMS, on the other hand, the control is transferred away from the script and to the central management system of the database. The DBMS then gets to decide how requests will be served and the order of treating the data. This creates more dependable data that has a far lesser chance of corruptibility. Now, it’s entirely possible that you may not be concerned the integrity of your data for a small application, but then you might as well not waste your time building it.

I’ll Use A File Now And Learn To Use A Database Later

If you’ve said this phrase it’s already too late. It doesn’t take a lot of time to get started with a database in the first place. If you’re using languages like PHP, Python, Perl, or Ruby you probably already have the necessary libraries installed on your system to work with a database. These libraries and drives are usually packaged with these software stacks as standard. It’s actually uncommon to not have some DBMS solution already available in most of these environments. So why would go out of your way to reinvent the wheel when the solution is already at your fingertips? Not only that, but it takes very little time to set these DBMS solutions up and get them to run smoothly on virtually any platform. You will probably spend more time trying to write a script that stores, retrieves, sorts, filters, locks, and validates data using a flat file then you would installing the DBMS and getting a simple schema started.

If you’re using PHP interfacing with a database has become easier than ever. It only requires a couple of lines of code to open a database connection to virtually any database you have a PDO driver installed and loaded for in PHP. So whether you’re using SQLite, MySQL, PgSQL, etc… you shouldn’t need to spend a lot of time learning how to interface with each of these databases if you simply stick with the PDO extension. You use the same functions regardless of the database. This is opposed to having to learn the individual database-specific extensions in PHP to interface with each of those databases. Not to mention PDO supports many of the popular new database features such as prepared statements and is a lot easier to learn and use than extensions like MySQLi.

PHP and Databases

Being a PHP developer, I also take notice of many PHP developers that tend to have the misconception that when they start using a database (usually it’s the case that their first database is MySQL) they should start by learning the old mysql extension in PHP. This is simply not true. Some of the underlying reasons this is widespread, however, is mainly due to the fact that the old mysql extension has been around for quite a long time in PHP and it’s fairly common to see a lot of PHP code demonstrating the use of a database in PHP using this extension. It’s also become familiar to a lot of old PHP developers and is bound to be present in their older applications. However, the use of the old mysql extension is highly discouraged for new development. It’s an old extension that’s no longer well-maintained and has been planned for deprecation for years. There’s no guarantee that if a new bug creeps up that someone will go back and fix it. This leaves your application vulnerable and exposed. If the code base gets large enough this might leave developers scrambling for migration path. Additionally, the extension does not support prepared statements or parameterized queries. This makes things like making sure you properly escape user data to avoid SQL injection, prone to error. The extension lacks in many areas that are not conducive to future development. Learning the old mysql extension first before you learn the new improved mysql extension in PHP or before you learn PDO will gain you nothing. In fact, it will cause you to have to unlearn some of the very poor design of the old extension and its implementation details in order to become more accustomed to the newer extensions.

Some developers also complain that PDO seems too complicated or more difficult to use than the old mysql extension. This might come from the lack of understanding as to what PDO even is or how its used. Since PDO can only be used with the newer OOP features of PHP (you have to use objects and methods instead of procedural-style functions) it makes it seem unapproachable or even scary to developers who aren’t used to OOP in PHP. There is also the idea that PDO has a lot more features due to its vendor-agnosticism and the fact that requires further configurations such as installing and loading the individual drivers needed for interfacing with your specific database (where the drivers aren’t already packaged or loaded). I can understand the intimidation, but most of this has been alleviated with new versions of PHP coming pre-packaged and loaded with most of the popular drivers and the documentation offering up examples that are now easier to follow and get started with. Most of the intimidation is actually coming from having to unlearn old habits that older extensions like the old mysql extension once taught.

Once you get past the initial intimidation phase and actually get started with a PDO and with a database you’ll find that it doesn’t take nearly as much time as you’d think to get up and running. Most of the reservations people have are 90% of what’s holding them back. Not that the investment to get started is actually that significant. Beyond that you’ll find that learning to build on data normalization not only make development easier, but makes your users happier. When you can organize and maintain data that’s more clearly structured and accessible you can serve your users more effectively and efficiently. That will keep users coming back and eventually help you grow your application!

Category: MySQL, PHP

Your email address will not be published. Required fields are marked *

*