Sherif's Tech Blog

Why Is the Global State Bad?

It’s very easy to add to the global state and very difficult to test code that depends on it. This pretty much sums up why global state is a bad idea in almost any language. However, in the spirit of staying focused on PHP, we should make a few things clear that are specific to PHP.

In PHP, functions do not inherit the global scope by default. There is a very good reason for this, and that’s to avoid adding to the global state by accident. Unlike the JavaScript world where, by default, a variable assigned without a declaration is attached to the global state of the window the script runs in, PHP keeps the state of a function separate from the global context. In order to use variables from the global scope within a function in PHP, you have to specifically declare the variable using the global keyword. In JavaScript, in order to keep a variable out of the global state you have to declare it with a keyword such as var, which is the exact opposite of the behavior in PHP. However, I’m not going to try drawing comparisons between different languages. Instead let’s just focus on why this is bad in PHP, and then you can draw your own parallels to why it’s bad in virtually every other language you may use.
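To make the scoping rule concrete, here is a minimal sketch. The variable and function names ($siteName, readWithoutGlobal, readWithGlobal) are made up for illustration, and I’m assuming the script runs at the top level of a file, so $siteName really does live in the global scope.

```php
<?php
// Hypothetical names, chosen only for this illustration.
$siteName = 'example'; // defined in the global scope

function readWithoutGlobal(): string
{
    // The function has its own local scope, so the global $siteName is not
    // visible here. isset() lets us check without triggering a warning.
    return isset($siteName) ? $siteName : 'not visible';
}

function readWithGlobal(): string
{
    global $siteName; // explicitly import the variable from the global scope
    return $siteName;
}

echo readWithoutGlobal(), "\n"; // not visible
echo readWithGlobal(), "\n";    // example
```

The second function only sees the variable because it asked for it by name, which is exactly the opt-in behavior described above.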

In PHP we have something called superglobals, which are available within any scope even though they have not been declared as global in user-land. The superglobals are things like $_POST, $_GET, $_FILES, $_COOKIE, $_REQUEST, $_SERVER, $_SESSION, etc.
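Here is a small sketch of how a superglobal differs from an ordinary global variable. I’m assuming it runs from the command line, where $_GET starts out empty, so I fill it in by hand to stand in for a real request. The function names are hypothetical.

```php
<?php
// Simulate a request parameter by hand (assumption: we are on the CLI,
// where PHP has nothing to put into $_GET on its own).
$_GET['name'] = 'GoogleGuy';
$name = 'GoogleGuy'; // an ordinary global variable with the same value

function nameFromSuperglobal(): string
{
    // No "global" declaration needed: $_GET is visible in every scope.
    return $_GET['name'] ?? 'nobody';
}

function nameFromPlainGlobal(): string
{
    // The ordinary global $name is NOT visible here, so we fall back.
    return $name ?? 'nobody';
}

echo nameFromSuperglobal(), "\n"; // GoogleGuy
echo nameFromPlainGlobal(), "\n"; // nobody
```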

Now, why do these get to override the rule of separating function scope from the global scope, you might ask? The reason is simply that they make using the web-specific features of PHP easier. The variables themselves are not normally populated in user-land, but by PHP itself, using information obtained from the SAPI/webserver in this case. It’s true that you can certainly write to these variables. In the case of sessions it makes sense that I would want function foo() { $_SESSION['var'] = 'bar'; } to take effect in any scope. After all, I don’t expect the garbage collector to clean up my session as I move through different scopes unless I specifically write-close the entire session file. You just don’t ever expect these variables to be different in any scope unless you specifically change them, and so they’ve taken on the superglobal concept that we have in PHP. So then does this mean they are bad? Not necessarily.

Now, keep in mind PHP also used to provide a feature called register_globals, which was always highly discouraged. Please see the manual on register globals for more details. Basically, this directive makes PHP code a nightmare, as it injects all sorts of variables into your script (and you would have no way of knowing what the variable names will be). It’s much safer to rely on superglobals. The directive has been off by default since PHP 4.2.0, was deprecated as of PHP 5.3.0, and was removed entirely in PHP 5.4.0.

Let’s start with the simplest concept…

Hello World Scripts are Boring!
This is true, but since almost any idiot’s guide to programming (at one point) started by demonstrating how to write your first program in the form of print "hello world!", we’re going to stoop down to this level for a moment to understand something. Those of us who are a little more advanced, just bear with me; for those of us who are a little newer to programming or to PHP, this might be easier to relate to.

Let’s say I just installed PHP and will write my very first PHP script ever! I want to write a hello world script. So I open a file called helloworld.php and put the following code in the file.

	<?php
	echo 'hello world'; // Yay we can haz PHP!

I then run the file from my command line to test it out and make sure it works… Guess what? It works! But it’s also very boring. So I want to be able to make it say hello <my name> instead.

Now I modify the same script, and by passing my name in as a command line argument when calling the script with 'php helloworld.php GoogleGuy' I get to see my terminal greeting me… yay!

	$name = $argv[1]; // We're just going to take the first command line argument passed to the script and use it as a name
	echo "hello $name"; // Even more PHP awesomeness!
OK, but now I get a little more advanced and start writing longer scripts. At one point I find I am simply repeating the echo "hello $name" code in a lot of places. After learning about functions I decide to write a function so that I can reuse this code.

	$name = $argv[1];
	function greeting() {
		return "hello $name";
	}
	echo greeting(); // Outputs 'hello ' and an undefined-variable notice. DUH! Uhhh, wait... what happened to $name?

Now we begin to see our first problem. The variable I’m using was defined in the global scope, and since it’s easy to add to the global scope I had no problem repeating my code everywhere. But a function gets its own local scope, so $name simply doesn’t exist inside greeting(). Quickly I realize the fix is to just pass the variable along to the function so that it can reference it.

	function greeting($name) {
		return "hello $name";
	}
	echo greeting($argv[1]); // Ahhh now we're getting somewhere!

Well, why didn’t we just declare $name as global inside the function? The reason is that I’m actually putting this function in a different file called greetings.php and including it in my actual script foo.php, where I will run lots of other code. Now I want to test that it works before I actually include it, simply by making calls directly to the function from the global scope. To do this I have to pass something to the function. If I just test it with echo greeting('GoogleGuy'); it will work. If I use echo greeting($name) and $name is defined as $argv[1] then that works too! Now, what if I decide to use this script in a web SAPI instead of the CLI? All I really have to do is change the variable used in the function call to $_POST['name'], for example. Or I can define $name as $_POST['name'] and call greeting($name) each time.

The point is I don’t have to worry about which variable my function pulls in from the global scope. Had I relied on the global keyword, I would have to go back and look at the function definition any time I made changes to the code using this function. What’s even worse is that in order to test it I would have to make sure that this variable is never used for anything else in the global scope. The order in which I write my code would then have an impact on the behavior of my function, and as I keep adding to the global state it wouldn’t be clear how my function is being used unless I went back and reviewed the function definition.

So there are two things to take away from this:

  • Easy to add to the global state, which can cause unexpected behavior of my functions
  • Difficult to test, because calling the same function with the same arguments doesn’t necessarily produce the same result (when we’re relying on the global scope)
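Both points can be sketched in a few lines. greetingFromGlobal() is a hypothetical variant I made up to show what happens when the function leans on the global scope. greeting() is the version from earlier, with type declarations added.

```php
<?php
// Hypothetical variant that relies on the global scope.
function greetingFromGlobal(): string
{
    global $name; // hidden input: nothing in the call tells you about it
    return "hello $name";
}

// The version that takes its input as an argument.
function greeting(string $name): string
{
    return "hello $name";
}

$name = 'GoogleGuy';
$first = greetingFromGlobal();

$name = 'somebody else'; // unrelated code further down reuses the variable
$second = greetingFromGlobal();

// Same call, same (empty) argument list, different results.
var_dump($first === $second); // bool(false)

// Same call, same argument, same result every time.
var_dump(greeting('GoogleGuy') === greeting('GoogleGuy')); // bool(true)
```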

Now, this is all very apparent and makes one think “well duh, of course I wouldn’t do that in my code,” but in reality I meet programmers all the time who understand the problems of relying on the global state yet still insist on using singletons in their code. This is the not-so-apparent part. Singletons aren’t always bad, just like superglobals aren’t bad in PHP, but what you have to recognize is that they do rely on the global state, and that can lead to testability problems, unreadable code, and lengthy debugging. These things are pretty interconnected, so having one of these problems exacerbates the others.
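To show how a singleton quietly carries global state around, here is a rough sketch. Counter and testStartsAtOne() are hypothetical names I invented for this example, and it assumes PHP 7.4 or later for the typed properties. The “test” is a plain function so the example stays self-contained.

```php
<?php
// A hypothetical singleton, invented for this illustration.
final class Counter
{
    private static ?Counter $instance = null; // lives for the whole script
    private int $count = 0;

    private function __construct() {}

    public static function getInstance(): Counter
    {
        if (self::$instance === null) {
            self::$instance = new Counter();
        }
        return self::$instance;
    }

    public function increment(): int
    {
        return ++$this->count;
    }
}

// A naive "test" that assumes it always starts from a fresh counter.
function testStartsAtOne(): bool
{
    return Counter::getInstance()->increment() === 1;
}

$firstRun = testStartsAtOne();  // true: the counter really was fresh
$secondRun = testStartsAtOne(); // false: state leaked from the first run

var_dump($firstRun, $secondRun);
```

The second run fails even though nothing in the test changed, because the instance (and its count) is effectively a global variable with a nicer name.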

Remember that the garbage collector doesn’t come around to clean up the global scope until the entire script exits. This is when all your resources, variables, objects, streams, etc. are closed and removed from memory by the garbage collector. If you have worked with the old mysql_* functions (removed in PHP 7.0), you have probably seen people simply declaring the mysql connection resource as global within their functions rather than passing the resource to their functions/methods. This isn’t a good habit to get into. When you get into OOP you learn more about dependency injection and how it makes instantiating your objects easier and more testable, because when you unit-test your individual methods you don’t want to spend a lot of time figuring out what your objects rely on from the global scope. In fact, if your objects rely on anything in the global scope, their methods will be much more difficult to unit-test and the objects will be painfully difficult to instantiate. If you’re using singletons to work around this problem and avoid having to figure out scope/path resolution, it’s only going to come back and haunt you during testing.
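Here is a minimal sketch of what passing the dependency in can look like. Everything in it (the UserStore interface, InMemoryUserStore, Greeter) is hypothetical and only stands in for a real database-backed class. It assumes PHP 7.4 or later for the typed properties.

```php
<?php
// Hypothetical interface standing in for "something that talks to a database".
interface UserStore
{
    public function findName(int $id): ?string;
}

// An in-memory stand-in, handy for tests: no connection, no global state.
final class InMemoryUserStore implements UserStore
{
    /** @var array<int, string> */
    private array $names;

    public function __construct(array $names)
    {
        $this->names = $names;
    }

    public function findName(int $id): ?string
    {
        return $this->names[$id] ?? null;
    }
}

final class Greeter
{
    private UserStore $store;

    // The dependency is handed in, so the constructor signature tells you
    // everything this object relies on.
    public function __construct(UserStore $store)
    {
        $this->store = $store;
    }

    public function greet(int $id): string
    {
        $name = $this->store->findName($id) ?? 'stranger';
        return "hello $name";
    }
}

$greeter = new Greeter(new InMemoryUserStore([1 => 'GoogleGuy']));
echo $greeter->greet(1), "\n"; // hello GoogleGuy
echo $greeter->greet(2), "\n"; // hello stranger
```

In production code you would pass in a store backed by a real connection instead. The Greeter class wouldn’t need to change, and nothing has to be declared global.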

One of the blogs I enjoy reading the most on and around the topic of clean code and avoiding mistakes like relying on the global state is Misko Hevery’s. He is a coach at Google and has worked for some pretty big players. He does a lot of talks about clean code, and you can find a lot of his videos on YouTube and on his blog.

Category: PHP
