A Closer Look Into PHP Arrays: What You Don’t See

Inside PHP Arrays

PHP is a language in which the array data type has been generalized to suit a very broad set of use cases. For example, in PHP you can use a single data type, the array, to create both ordered lists and dicts (key/value pairs, or maps). A PHP array isn’t an array in the traditional sense; it’s actually implemented as an ordered hashmap. There are good reasons for this. One is that traditional arrays do not allow you to mix types. They also don’t normally provide a simple means of random access by key, such as mapping a key to its value, at least not in the way we’re used to in PHP. So I’m going to share with you some of the underlying details of how the PHP array data type works, why it works the way it does, how it differs from other languages, and what behaviors the PHP array has that you may not be fully aware of.

To start off with a basic example: you can do the following in PHP…

$array[12] = 1;
$array[1] = 2;
$array[17] = 3;

foreach ($array as $num)
  echo "$num\n";

This outputs the following…

1
2
3

As you can see, despite the numbering of the keys in our array, the elements of the array remain in the same order we defined them.

You can’t do the same thing in a language like Python.

array = []
array[12] = 1

We would get an index error…

IndexError: list assignment index out of range

You also can’t do this in a language like C, because in C arrays are not made up of keys but of offsets, and those offsets are sequential. So you cannot have an array of three elements whose offsets are 12, then 1, then 17. In PHP, however, these are not offsets at all; they are keys. Each key maps to a value, and the keys themselves do not determine order (as opposed to offsets, which do).

So What Are Arrays Exactly?

In order to elaborate on some of the internal workings of a PHP array, we’ll first need a general understanding of what arrays really are and how they look at a very low level. I’ll use C arrays to demonstrate what arrays are and what they look like.

In C an array is quite simple. It’s just a designated block of memory that is divided into equally sized pieces, where each piece holds one value of the array’s element type. This is sometimes referred to as chunk memory. For example, in C an int is a primitive data type that typically occupies 4 bytes on today’s platforms, so to store one integer variable you need that many bytes. If you want to store 4 integers using a single variable, you need an integer array of size 4. This gives us a block of memory that’s 4 * 4 bytes wide (16 bytes total), and the array variable can then be used as a pointer to the first integer in our array. If you don’t know what a pointer is, don’t worry. It’s not incredibly important for the purposes of our discussion, but think of a pointer as something that keeps track of which memory address we need to go to in order to find the data we’re looking for. Memory is divided into pieces that are assigned addresses, just like a neighborhood is divided into blocks and each home is given an address (a street and a number).

Now, we can access each integer in our array using an offset, where the first integer sits at offset 0 and the last sits at offset 3. The offset is multiplied by the size of the element type (4 in our case, because an int here is 4 bytes) and then added to the address of the first element. This lets us reach any integer in the array simply by using the variable with its offset, which dereferences the value we need from the block of memory where all the integers are stored.

Here’s an example of this in C.

#include <stdio.h>

int main() {
  /* This initializes an integer array of size 4 */
  int array[4] = { 1, 2, 3, 4 };
  printf("%d\n",array[0]); /* Prints 1 */
  printf("%d\n",array[3]); /* Prints 4 */
  return 0;
}

Here the variable array is declared with a size of 4 and initialized with 4 different integers. Each integer is stored at its designated offset, in order, from offset 0 through offset 3. So if we picture this array as one contiguous block of memory that starts at address 0xd4e3c8f and runs for 16 bytes (up to, but not including, 0xd4e3c9f), then array points to the address 0xd4e3c8f, which holds the first element. To get array[0] we compute (0xd4e3c8f + (0 * 4)), which is just 0xd4e3c8f. To get the second element, array[1], we compute (0xd4e3c8f + (1 * 4)), which equals 0xd4e3c93, the address of the second integer in our array.

C Array Structure

The above diagram illustrates what the array would look like in memory. The individual blocks (in purple) depict the bytes in memory with their starting addresses, and the offsets (in blue) show where each integer is stored. As you can see, the entire 16-byte block of memory is evenly divided into four 4-byte pieces, one for each of our 4 integers, which makes this array really simple to understand.
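To make the offset arithmetic concrete without a C compiler, here is a small Python sketch. It is only a model: it assumes a 4-byte int and reuses the hypothetical base address 0xd4e3c8f from the example, packing the four integers into one contiguous byte buffer with the standard struct module and reading element i back from byte i * 4.

```python
import struct

BASE = 0xd4e3c8f                   # hypothetical start address from the example above
INT_SIZE = struct.calcsize("<i")   # 4 bytes: struct's standard size for a C int

# The same four ints packed into one contiguous 16-byte block of memory.
block = struct.pack("<4i", 1, 2, 3, 4)


def address_of(offset: int) -> int:
    """Address of array[offset]: base address + offset * element size."""
    return BASE + offset * INT_SIZE


def value_at(offset: int) -> int:
    """Read array[offset] from the block, starting at byte offset * INT_SIZE."""
    (value,) = struct.unpack_from("<i", block, offset * INT_SIZE)
    return value


for i in range(4):
    print(f"array[{i}] at {hex(address_of(i))} = {value_at(i)}")
```

Running it prints the addresses 0xd4e3c8f, 0xd4e3c93, 0xd4e3c97 and 0xd4e3c9b alongside the values 1 through 4.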

How Are PHP Arrays Different?

PHP arrays are very different from the simple concept we examined above. They are far more complex than a contiguous block of memory that stores a single data type. PHP arrays map an integer or string key to a value of any PHP type. They also maintain order. Additionally, they hash their keys in order to provide fast random access to elements by key. This makes them ordered hashmaps. Let’s see exactly what that means, in case you’re not familiar with hashmaps in general.

A PHP array is built around a hashtable. That hashtable is a container of information about the array. More precisely, it is a C struct that records the array size, the first element of the array, the last element of the array, the internal array pointer’s position, and the next free element in the array (along with some other internal metadata we won’t get into). The hashtable also stores the memory address of the array of buckets that belong to the array. A bucket is another container that stores information about each element in the array, including its key, the value that key maps to, and some other internal metadata such as the hashed value of the key and links to any other elements whose keys hash to the same slot. The value that a key points to is a ZVAL, which is yet another container, this one for a PHP variable. It stores the metadata PHP needs to find and interpret the variable’s value. So we’ve already peeled away at least three layers of the PHP array. It’s quite a complex beast and it carries a lot of overhead; PHP arrays trade memory for speed. You can read about how big PHP arrays are on nikic’s blog, where he does a fine job of revealing all the gory details.
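The layering described above can be sketched as a toy ordered hashmap in Python. This is a simplified model, not the Zend implementation: the field names (nTableMask, arBuckets, pListHead, pNext, pListNext and so on) mirror the PHP 5 structs, but resizing, deletion, the backward links and real ZVALs are left out, integer keys are assumed to be their own hash, and Python’s built-in hash() stands in for the string hash. Each bucket sits in two linked lists at once: a collision chain hanging off its slot in the bucket array, and a global list that records insertion order.

```python
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional, Tuple, Union

Key = Union[int, str]


@dataclass
class Bucket:
    h: int                                 # hash of the key
    key: Key                               # the key as written in PHP
    data: Any                              # stands in for the ZVAL
    pNext: Optional["Bucket"] = None       # next bucket in the same collision chain
    pListNext: Optional["Bucket"] = None   # next bucket in insertion order


class OrderedHashTable:
    """Toy model of a PHP 5 style ordered hashmap (no resizing or deletion)."""

    def __init__(self, size: int = 8) -> None:
        self.nTableSize = size
        self.nTableMask = size - 1
        self.nNumOfElements = 0
        self.arBuckets: List[Optional[Bucket]] = [None] * size  # the C bucket array
        self.pListHead: Optional[Bucket] = None
        self.pListTail: Optional[Bucket] = None

    @staticmethod
    def _hash(key: Key) -> int:
        # Integer keys act as their own hash; hash() stands in for the string hash.
        return key if isinstance(key, int) else hash(key)

    def _find(self, key: Key, h: int) -> Optional[Bucket]:
        b = self.arBuckets[h & self.nTableMask]
        while b is not None:               # walk the collision chain
            if b.h == h and b.key == key:
                return b
            b = b.pNext
        return None

    def __setitem__(self, key: Key, value: Any) -> None:
        h = self._hash(key)
        existing = self._find(key, h)
        if existing is not None:
            existing.data = value          # overwrite in place; order is unchanged
            return
        slot = h & self.nTableMask
        b = Bucket(h, key, value, pNext=self.arBuckets[slot])
        self.arBuckets[slot] = b           # link at the head of the chain
        if self.pListTail is None:         # link at the tail of the ordered list
            self.pListHead = b
        else:
            self.pListTail.pListNext = b
        self.pListTail = b
        self.nNumOfElements += 1

    def __getitem__(self, key: Key) -> Any:
        b = self._find(key, self._hash(key))
        if b is None:
            raise KeyError(key)
        return b.data

    def __iter__(self) -> Iterator[Tuple[Key, Any]]:
        b = self.pListHead
        while b is not None:               # iteration follows insertion order, not slots
            yield b.key, b.data
            b = b.pListNext


table = OrderedHashTable()
table[12] = 1
table[1] = 2
table[17] = 3                              # 17 & 7 == 1 & 7: collides with key 1
print([value for _, value in table])       # [1, 2, 3]
```

Iteration walks the insertion-order list, which is why the output is 1, 2, 3 even though keys 1 and 17 share slot 1 and key 12 sits in slot 4.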

Differences From Other Languages

The main difference is that PHP defines arrays in a way that makes them general enough to suit all the major use cases, instead of having multiple types. For example, in Python you have lists and you also have dicts. Not to mention tuples on top of that, which a lot of people combine with a list in order to achieve something similar to what a PHP array represents. PHP, however, only has a single type called array that behaves more like a dict would in Python, but still shares many of the characteristics of arrays in other languages. It’s a hybrid of both, really.

To give you an example, in Python 2 we still don’t quite get the same behavior as in the PHP example earlier, even if we use a dict.

dict = {}
dict[12] = 1
dict[1] = 2
dict[17] = 3

for key in dict:
  print dict[key]

As you can see, the values don’t come out in the order we inserted them…

2
1
3

In Python 2, the order of elements in a dict is not guaranteed: it may come out in insertion order or it may not. (Python 3.7 later made insertion order part of the language.) People often use a list of tuples in Python to maintain order, and while that can produce a similar effect to what a PHP array does, it’s still not quite the same.
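For comparison, on Python 3.7 or later (where dicts keep insertion order), the same example behaves like the PHP one. The variable is renamed to d here so it no longer shadows the built-in dict type.

```python
# Assumes Python 3.7+, where dict iteration follows insertion order.
d = {}
d[12] = 1
d[1] = 2
d[17] = 3

for key in d:
    print(d[key])   # prints 1, then 2, then 3
```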

How Do PHP Arrays Work?

To give you an idea of how PHP arrays work, let’s explore a very simple example in PHP and then break down exactly what’s going on internally to make it possible.

$array = array(
               4     => 1,
               'foo' => 'bar',
               -16   => true,
                        'baz'
              );
echo $array[-16]; // prints 1

Here, we’ve initialized an array of 4 elements: 1 integer, 2 strings, and 1 boolean. Every element in a PHP array is associative; there is no offset-based indexing like the chunk memory of a C array. That means every element has a key whether we assign it one or not. Notice we only assigned keys to 3 of our elements, yet if we look at the output of var_dump($array) we will see that all 4 elements do indeed have a key.

var_dump($array);
array(4) {
  [4]=>
  int(1)
  ["foo"]=>
  string(3) "bar"
  [-16]=>
  bool(true)
  [5]=>
  string(3) "baz"
}

Notice that the last element has a key of 5 even though we never assigned it one in the initialization of the array. Why did PHP choose 5 and not any other number? The answer lies in the hashtable!

/* Lines 66 - 82 of Zend/zend_hash.h */
typedef struct _hashtable {
    uint nTableSize;
    uint nTableMask;
    uint nNumOfElements;
    ulong nNextFreeElement;
    Bucket *pInternalPointer;   /* Used for element traversal */
    Bucket *pListHead;
    Bucket *pListTail;
    Bucket **arBuckets;
    dtor_func_t pDestructor;
    zend_bool persistent;
    unsigned char nApplyCount;
    zend_bool bApplyProtection;
#if ZEND_DEBUG
    int inconsistent;
#endif
} HashTable;

Take notice of line 6 in the above code. The nNextFreeElement member of this struct stores an unsigned long containing the next integer key to use when we append to this array. It starts at 0 and is updated whenever an integer key greater than or equal to its current value is added, whether we supplied that key ourselves or PHP generated it; the negative key -16 in our example therefore leaves it alone. We assigned the first element in our array the integer key 4. At that point nNextFreeElement was set to 4 + 1, giving us a next free element of 5. So the next time we append an element to this array without supplying a key, PHP uses 5 as the key for the new element and increments nNextFreeElement by one again. That way there is always a new unique key ready for any element we append to our array.
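The key-assignment rule can be sketched in a few lines of Python. build_php_array is a hypothetical helper that models only the PHP 5 rule described above for integer and string keys; it ignores PHP’s conversion of numeric strings such as "8" into integer keys, and it relies on Python 3.7+ dicts to keep insertion order.

```python
def build_php_array(entries):
    """Assign keys the way the rule above describes (simplified sketch).

    entries: list of (key, value) pairs, where key is None for elements
    written without an explicit key.
    """
    result = {}        # Python 3.7+ dict keeps insertion order
    next_free = 0      # plays the role of nNextFreeElement
    for key, value in entries:
        if key is None:
            key = next_free                      # append: use the next free key
        if isinstance(key, int) and key >= next_free:
            next_free = key + 1                  # only keys >= next_free move it
        result[key] = value
    return result, next_free


arr, next_free = build_php_array([(4, 1), ("foo", "bar"), (-16, True), (None, "baz")])
print(list(arr))   # [4, 'foo', -16, 5]
print(next_free)   # 6
```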

The PHP Array Structure

Here is a diagram illustrating what this PHP array (from the example above) would look like internally to PHP.

PHP Array Structure

As you can see, this is quite a complex structure, even though our data looks very simple in the PHP array itself. There’s also a lot here that I intentionally left out for simplicity. However, this is also what makes PHP’s arrays so flexible. We just mixed numeric and string keys with string, int, and bool values, all in the same array and with remarkable ease. To do the same in a language like C, on the other hand, you would need quite a bit more effort than the single statement we used to initialize our array here in PHP.

In spite of the remarkable ease with which PHP arrays handle compound data structures, there is an inherent flaw in their design. Not to fret, though: it’s a flaw that comes with a trade-off. If you look at the diagram above, you’ll notice we have 4 elements in our array, but the C Bucket Array, which is the same kind of chunk memory array we described in the first part of this article, has only two occupied slots and the rest are empty. Notice that those two slots hold Bucket1 and Bucket3, and that they do not begin at offset 0 of the array. Hashing decides which slot each key lands in, and the fact that 4 elements share just 2 slots is the result of hash collisions, which are unavoidable whenever many possible keys are mapped into a small table.

A collision means that two or more of the keys in our PHP array ended up in the same slot once hashed. Because of this, two or more buckets are stored at the same place in our C Bucket Array (in orange), and those buckets (in purple) form a doubly linked list. Notice that Bucket1 has a Next member that points to Bucket2, and, inversely, Bucket2 has a Last member that points back to Bucket1. So when a key in our PHP array produces a collision, PHP traverses the linked list of buckets until it finds the one whose key matches what we’re looking for. Each bucket has a Key member that stores the actual key we used in our PHP array, and that is what gets compared. Believe it or not, collisions happen quite frequently, and the smaller the array, the more likely a collision is. They hurt performance because PHP has to walk the linked list to find the bucket it needs each time. In the worst case, when every key collides, a single lookup has to walk up to n buckets, and inserting n keys costs about n(n – 1)/2 key comparisons in total.
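The worst-case cost is easy to check with a small Python sketch. It assumes every key lands in one chain and that each insert walks the whole chain to rule out a duplicate key before linking in the new bucket; the helper name is hypothetical.

```python
def comparisons_for_colliding_inserts(n: int) -> int:
    """Count key comparisons when n distinct keys all land in the same chain."""
    chain = []          # stands in for the single overloaded bucket chain
    comparisons = 0
    for key in range(n):
        for existing in chain:      # look for a duplicate before inserting
            comparisons += 1
            if existing == key:
                break
        chain.append(key)
    return comparisons


for n in (4, 8, 16):
    print(n, comparisons_for_colliding_inserts(n), n * (n - 1) // 2)
```

The count grows as n(n − 1)/2, so doubling the number of colliding keys roughly quadruples the work.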

It is entirely possible to have 100% hash collision in a PHP array, and it’s a lot simpler than you think. PHP sees an array key as one of two things: an int or a string. If it’s an int, producing 100% collision is a rather trivial task. You take the table size (the number of elements rounded up to the nearest power of 2) and produce keys that are multiples of that size until you’ve filled the array. At this stage you have 100% hash collision, which is the worst possible scenario for the cost described above. To give you an idea, filling a ~65K (2^16) element array with 100% collision can take up to ~30 seconds in PHP, which makes it a potential denial-of-service vector.
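Generating such keys takes only a few lines. This Python sketch assumes, as the Zend engine does, that an integer key is used directly as its hash, and that the table size is the element count rounded up to a power of 2 starting at 8; colliding_keys is a hypothetical helper name.

```python
def colliding_keys(n_elements: int):
    """Return (keys, table_size) where every integer key lands in slot 0."""
    size = 8                          # tables start at 8 slots...
    while size < n_elements:          # ...and double until everything fits
        size *= 2
    keys = [i * size for i in range(n_elements)]   # multiples of the table size
    return keys, size


keys, size = colliding_keys(2 ** 16)
mask = size - 1
print(size, keys[:4])                          # 65536 [0, 65536, 131072, 196608]
print(all(key & mask == 0 for key in keys))    # True: one slot, one long chain
```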

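If you’re curious about what hashing function PHP uses to hash string keys, it’s DJBX33A, and it’s not just used for arrays; it’s used throughout the engine. It’s a very simple hash, chosen because it’s fast. It’s not a cryptographically secure hash and it was never meant to be. Integer keys are not run through it at all: the engine uses the integer itself as the hash value, which is exactly why the integer-key collisions described above are so easy to produce. For illustration, the PHP version below also packs integers into bytes so it can hash them, which is the convention the diagram in this article follows. If we were to write an implementation of this hashing function in PHP, it would look similar to the following…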

/* DJBX33A Hash function implemented in PHP */
function DJBX33A($key) {
  $hash = 5381;
  if (is_int($key)) {
    $key = pack("I*", $key);
  }
  for ($i = 0, $c = strlen($key); $i < $c; $i++) {
    $hash = (($hash << 5) + $hash) + ord($key[$i]);
  }
  return $hash;
}
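For experimenting outside PHP, here is the same function as a Python sketch. It assumes a 64-bit unsigned long (so the hash wraps modulo 2^64, as it would in C) and a little-endian machine for the pack("I*") step; article_key_hash is a hypothetical name for the illustrative integer handling, which the engine itself does not do. Engine details can differ between PHP versions, so treat the numbers as illustrative.

```python
import struct

MASK64 = (1 << 64) - 1   # assumption: 64-bit unsigned long, as on most 64-bit builds


def djbx33a(data: bytes) -> int:
    """DJBX33A: start at 5381, then hash = hash * 33 + byte for every byte."""
    h = 5381
    for byte in data:
        h = ((h << 5) + h + byte) & MASK64   # (h << 5) + h == h * 33; wrap like C
    return h


def article_key_hash(key) -> int:
    """Mirror of the PHP function above: ints are packed as 4 little-endian bytes."""
    if isinstance(key, int):
        data = struct.pack("<I", key & 0xFFFFFFFF)   # like pack("I*", $key) on x86
    else:
        data = key.encode()
    return djbx33a(data)


print(djbx33a(b"a"))               # 177670
print(article_key_hash(-16))       # 6390352146
print(article_key_hash(-16) & 7)   # 2
```

With these assumptions it reproduces the article’s numbers: -16 packs to the bytes F0 FF FF FF, hashes to 6390352146, and lands in slot 2 of an 8-slot table.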

So if we used the above function to get the hash for each of the keys in our PHP array earlier, we’d see they match the Hash numbers in the Buckets in the diagram above. So the question is, how do we find these buckets in our C Bucket Array?

The answer is a hash table mask. The mask is simply the size of the hash table minus one. Every PHP array starts with a table of 8 slots, and the table doubles in size every time the number of elements exceeds it, so the size is always a power of 2. In our example the mask is therefore 7. We take the hash produced by the DJBX33A function above and apply a bitwise AND with the mask to get its offset in the C Bucket Array. You could also take the hash modulo the table size, which gives the same result for power-of-2 sizes, but a bitwise AND is cheaper than a modulus, which is why PHP uses it.
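Because the table size is always a power of 2, the mask and the modulus pick the same slot. A short Python sketch of both rules (the function names are hypothetical):

```python
def table_size_for(n_elements: int) -> int:
    """Smallest table size >= n_elements, starting at 8 and doubling (simplified)."""
    size = 8
    while size < n_elements:
        size *= 2
    return size


def slot_for(h: int, size: int) -> int:
    """Slot in the bucket array: h AND (size - 1), valid because size is a power of 2."""
    return h & (size - 1)


size = table_size_for(4)           # the 4-element example array
print(size, size - 1)              # 8 7
for h in (5381, 177670, 6390352146):
    print(h, slot_for(h, size), h % size)   # mask and modulus agree
print(slot_for(6390352146, size))  # 2
```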

So for example, here’s how we got the offset for the key -16 in our PHP array.

echo DJBX33A(-16) & 7;

We get…

2

And there you have it! Now we can get the pointer to the first bucket at offset 2 of our C Bucket Array, which points to Bucket3. Once we get to Bucket3 we simply verify the Key member of the bucket to make sure it’s the element we’re looking for and if it’s not we check the Next member to get the next bucket and keep going until we find the element we need. In our case Bucket3 does indeed have a Key member of -16, which is exactly what we want.

Iterating Arrays

There are quite a few misconceptions when it comes to traversing a PHP array and which methods are the fastest, slowest, or most efficient. I’m going to do my very best to help debunk any myths or misconceptions you may have heard about these processes. The key thing to remember here is that, for the majority of use cases, no micro-optimizations are necessary, since each of the methods described here works just fine for the bulk of PHP code.

First, I’d like to debunk, once and for all, the myth that a foreach loop is faster than a for loop. The tl;dr version is that for loops were faster than foreach loops in every scenario I tested. However, this does not mean that you should choose a for loop over a foreach loop purely for performance. Let’s examine the details a bit more closely to understand why.

Here I ran a benchmark of both a foreach loop and a for loop over the same array, on both the PHP 5.3 and 5.4 release branches. The first row shows a test where we simply iterate over the array with nothing in the body of the loop. The second row shows what happens when we modify the array from within each loop.

Test              PHP 5.3 foreach   PHP 5.3 for   PHP 5.4 foreach   PHP 5.4 for
Iterate only      0.025086          0.012185      0.007306          0.004201
Modify in loop    0.139499          0.027206      0.048462          0.011421
(all times in seconds)

The tests were run on an array of 100,000 elements. After discarding the first and last samples, the times were averaged over the remaining runs. All tests were run on the release versions PHP 5.3.10 and PHP 5.4.5, respectively.

In the first test, foreach doesn’t seem to be too far behind. In the second test we start to notice much bigger losses. Since the foreach loop accesses the array directly by key, just like the for loop, nothing in that access should hurt performance, so you would think they should, at the very least, perform equally. However, the actual performance loss in the foreach scenario has nothing to do with how we access the array for modification. It comes from the fact that a by-value foreach works with a copy of the array, not the original array. So when we write to the array inside the foreach loop, we invoke COW (Copy-On-Write) behavior.

If you aren’t familiar with Copy-On-Write behavior, let me give you a brief demonstration of how it works. Copy-On-Write is an optimization the PHP engine makes on our behalf in order to conserve as much memory as possible. Take the following example…

$str1 = "Hello World";
$str2 = $str1;

Here PHP stores the string “Hello World” only once, even though two different variables use that value: both variables point to the same ZVAL, whose refcount is now 2. This saves us some memory, since the engine is smart enough to do the right thing. Now what happens when we modify $str2?

$str2 .= "!";

Now PHP sees that the ZVAL is shared (its refcount is greater than 1), so before modifying it, it separates $str2 by giving it its own copy of the string, and $str1 keeps the original. PHP only copies the string at the moment we actually write to the variable. Hence Copy-On-Write!
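Copy-On-Write can be modeled with a toy reference count. In this Python sketch, Zval and SymbolTable are hypothetical stand-ins, not the engine’s structures: assigning one variable to another only bumps the refcount, and the copy happens at the first write to a shared value.

```python
class Zval:
    """Toy ZVAL: a value plus the number of variables sharing it."""

    def __init__(self, value):
        self.value = value
        self.refcount = 1


class SymbolTable:
    """Hypothetical model of variables that share ZVALs until one is written to."""

    def __init__(self):
        self.vars = {}

    def _release(self, name):
        old = self.vars.get(name)
        if old is not None:
            old.refcount -= 1

    def assign_value(self, name, value):          # $name = value;
        self._release(name)
        self.vars[name] = Zval(value)

    def assign_var(self, dest, src):              # $dest = $src;
        zval = self.vars[src]
        self._release(dest)
        zval.refcount += 1                        # share it; nothing is copied yet
        self.vars[dest] = zval

    def append(self, name, suffix):               # $name .= suffix;
        zval = self.vars[name]
        if zval.refcount > 1:                     # shared: separate before writing
            zval.refcount -= 1
            zval = Zval(zval.value)
            self.vars[name] = zval
        zval.value += suffix


symbols = SymbolTable()
symbols.assign_value("str1", "Hello World")
symbols.assign_var("str2", "str1")
print(symbols.vars["str1"] is symbols.vars["str2"])   # True: one shared ZVAL
symbols.append("str2", "!")
print(symbols.vars["str1"].value)                     # Hello World
print(symbols.vars["str2"].value)                     # Hello World!
```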

Getting back to our benchmark: since the array we’re looping over is pretty big, we’re going to be copying a lot of memory, and it’s this copying that actually causes the performance loss. Now, you might wonder why the foreach loop is slower even when we’re not modifying the array. I’m going to address that question, in full detail, a little further ahead.

This is what the foreach test looks like when we attempt to modify the array inside the loop.

$array = range(1,100000);
$start = microtime(true);
foreach ($array as $key => $value) {
  $array[$key] += 1; // Invokes COW
}
$end = microtime(true);
$time = $end - $start;
printf("Completed in %.6f seconds\n", $time);

This is what the for-loop test looks like when we attempt to modify the array inside the loop.

$array = range(1,100000);
$start = microtime(true);
for ($i = 0, $c = count($array); $i < $c; $i++) {
  $array[$i] += 1;
}
$end = microtime(true);
$time = $end - $start;
printf("Completed in %.6f seconds\n", $time);

So, one way to solve this performance problem, when you want to modify the array from inside a foreach loop (which is more convenient to use in most cases), is to use a reference. A reference also solves the double-memory problem: when we invoke COW by breaking the refcount of the array’s ZVAL, we double the amount of memory needed to iterate over it, whereas with a reference there is hardly any extra memory in use.

The following chart demonstrates the differences in memory consumption between a foreach loop where copy-on-write is invoked, a foreach loop where a reference is used instead, and a for loop where the array is modified directly. All tests were done using 1,000-element arrays. Here is what the foreach test looks like when it modifies the array through a reference.

$array = range(1,100000);
$start = microtime(true);
foreach ($array as $key => &$value) {
  $value += 1;
}
$end = microtime(true);
unset($value); // make sure you destroy the reference
$time = $end - $start;
printf("Completed in %.6f seconds\n", $time);

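Pay special attention to line 7, where we unset the variable we used as a reference in order to destroy the reference. Otherwise $value remains a reference to the last element of $array, and you could run into unexpected behavior if you use the same variable later on: a later foreach ($array as $value), for instance, would keep overwriting that last element.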

Here are the results of the benchmark using a foreach loop with a reference to modify the array during the loop. Compared with the for loop doing the same thing, they are quite close this time. In the tests where foreach invoked COW behavior, foreach took over 4 times as long as the for loop in PHP 5.4 and over 5 times as long in PHP 5.3 (PHP 5.4 brought significant engine optimizations, which account for its much better numbers overall). With references we’ve closed the gap considerably, and there is hardly any real performance difference.

Test                          PHP 5.3 foreach   PHP 5.3 for   PHP 5.4 foreach   PHP 5.4 for
Modify in loop (by reference) 0.034114          0.027206      0.011769          0.011421
(all times in seconds)

Note that if you use a for loop to iterate over an array and modify the array inside the loop, you can seriously break your loop by appending or removing elements. With foreach you don’t have the same problem, since you’re iterating over a copy: modifications made to the original array do not affect your copy, and the loop remains intact. This key difference may make a foreach loop far more desirable than a for loop in most scenarios, regardless of any performance differences. Also consider that unless you’re working with enormous arrays, the performance gains are hardly worth the extra code, judging by these benchmarks. In my personal opinion, foreach loops afford you much more convenience in many scenarios.

For example, a foreach loop automatically resets the internal array pointer for you before it begins iteration, which ensures that it always starts at the beginning of the array. It also keeps track of its own position separately from the internal pointer, so you can’t break the loop by moving the pointer yourself with calls to next(), prev(), or reset(). Here’s a demonstration…

$array = array(1,2,3,4);
echo 'key($array): ' . key($array) . "\n";
/* let's move the pointer */
echo 'next($array): ' . next($array) . "\n";

foreach ($array as $value) { // foreach reset it for us
  echo "$value\n";
  /* notice foreach doesn't care about this pointer */
  if (!next($array)) reset($array); // let's keep moving the pointer
  echo 'key($array): ' . key($array) . "\n";
}
echo 'key($array): ' . key($array) . "\n";

We get…

key($array): 0
next($array): 2
1
key($array): 2
2
key($array): 3
3
key($array): 0
4
key($array): 1
key($array): 1

Notice the foreach loop continues to work just fine.

$array = array(1,2,3,4);
foreach ($array as $value) {
  /* This should be pretty obvious */
  unset($array);
  echo "$value\n";
  var_dump(isset($array));
}

Look Ma, no arrays!

1
bool(false)
2
bool(false)
3
bool(false)
4
bool(false)
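The pointer behavior in the first example can be modeled in Python. PhpArray below is a hypothetical toy, values only with integer keys 0 through n − 1: key(), next() and reset() move an internal pointer, while foreach iterates over a snapshot with its own private position, which is the behavior the earlier output shows. It is not how the engine implements foreach.

```python
class PhpArray:
    """Toy array with a PHP-style internal pointer (illustrative, values only)."""

    def __init__(self, values):
        self.values = list(values)
        self.pos = 0                      # the internal array pointer

    def key(self):
        return self.pos if self.pos < len(self.values) else None

    def next(self):
        self.pos += 1
        return self.values[self.pos] if self.pos < len(self.values) else False

    def reset(self):
        self.pos = 0
        return self.values[0] if self.values else False

    def foreach(self):
        """Iterate over a snapshot with a private position; self.pos is never read."""
        for value in list(self.values):
            yield value


arr = PhpArray([1, 2, 3, 4])
log = [f"key($array): {arr.key()}", f"next($array): {arr.next()}"]
for value in arr.foreach():
    log.append(str(value))
    if not arr.next():                    # same pointer juggling as the PHP example
        arr.reset()
    log.append(f"key($array): {arr.key()}")
log.append(f"key($array): {arr.key()}")
print("\n".join(log))
```

The recorded log matches the PHP output line for line.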

A Low-Level Analysis of foreach

So, I promised to explain why foreach loops were still slightly slower than for loops even when we didn’t modify the array (and so never invoked COW). The answer lies in the guts of the PHP engine. It reveals itself when we look at the opcodes generated by the foreach construct that let us iterate over the array.

Your PHP script is run in two phases. In the first phase, the engine reads your PHP code, breaks it into tokens (lexing), and parses it. In the second phase, it compiles your code into bytecodes (called opcodes) and then executes them. You can hook into the Zend engine and ask it to give you the opcodes as they are generated or executed.

Here’s the code we used…

<?php
$array = array(1,2,3,4);
foreach ($array as $key => $value) {
  echo "$key => $value\n";
}
?>

Here’s what the opcodes for the foreach loop would look like…

Line  #   Opcode             Return
2     0   INIT_ARRAY         IS_TMP_VAR ~0
      1   ADD_ARRAY_ELEMENT  IS_TMP_VAR ~0
      2   ADD_ARRAY_ELEMENT  IS_TMP_VAR ~0
      3   ADD_ARRAY_ELEMENT  IS_TMP_VAR ~0
      4   ASSIGN
3     5   FE_RESET           IS_VAR $2
      6   FE_FETCH           IS_VAR $3
      7   ZEND_OP_DATA       IS_TMP_VAR ~5
      8   ASSIGN
      9   ASSIGN
4     10  ADD_VAR            IS_TMP_VAR ~7
      11  ADD_STRING         IS_TMP_VAR ~7
      12  ADD_VAR            IS_TMP_VAR ~7
      13  ADD_CHAR           IS_TMP_VAR ~7
      14  ECHO
5     15  JMP
      16  SWITCH_FREE        IS_UNUSED
6     17  RETURN

The key to the extra cost of foreach is what’s happening with opcodes 6 – 9 in the above table, which run on every iteration of the foreach construct. PHP has to fetch the current element from the iterator and then assign it to the variables in our construct on every pass. In our benchmark that means doing this 100,000 times, which adds up to hundreds of thousands of fetch-and-assign opcodes that our for loop, which never reads the elements, doesn’t execute. However, do not panic! PHP executes opcodes very fast. As you can see in our benchmark, the entire loop takes only about 7 milliseconds (7/1000 of a second) on PHP 5.4. Granted, that’s still about 3 milliseconds slower than our for loop test, but completely unnoticeable in practice. If your PHP code really did have serious performance problems, this is unlikely to be one worth focusing on.

Arrays Within Arrays

Using multidimensional arrays is possible in PHP because, as we’ve seen earlier, an array is just a hashtable, right? Well, when I said I was simplifying in my diagram earlier, which demonstrated how the PHP array structure looked, I wasn’t lying. To give you an even more elaborate picture (yet I’m still simplifying) let’s take a look at a multidimensional array.

$array = array(
         4        => 1,
         'foo'    => 'bar',
         -16      => true,
                     'baz',
         'array2' => array(
                     "PHP",
                     "Arrays"
                     )
        );

var_dump($array);

Alright so all I did here was take our previous array and add another array to it. So then the above code would show us that we have a multidimensional array…

array(5) {
  [4]=>
  int(1)
  ["foo"]=>
  string(3) "bar"
  [-16]=>
  bool(true)
  [5]=>
  string(3) "baz"
  ["array2"]=>
  array(2) {
    [0]=>
    string(3) "PHP"
    [1]=>
    string(6) "Arrays"
  }
}

Now just imagine what this looks like when I present it to you in a diagram similar to our first array…

PHP Multidimensional Array

Here I’ve factored in how each portion of this array is broken up into different units of memory and how they are all related. As you can see, we start at the very top with the variable we just defined, $array, which is in blue. The variable points to a ZVAL, which is in red (strictly speaking, the variable name is a key in a hashtable, the symbol table, that points to the ZVAL; but again, I’m trying to simplify). The ZVAL points to a HashTable, which is in light blue. The HashTable points to a BucketArray, which is in orange. The BucketArray lets us reach a whole bunch of Buckets, which are in purple. The Buckets themselves point to other ZVALs. Notice that I singled out strings in green for the ZVALs that point to strings. This is because the memory for the string itself can be allocated separately from the ZVAL.

Here’s one thing you should notice right away in this picture: there’s no way to get from any element of $array["array2"] directly back to $array. The reason is that big light-blue HashTable that gets in your way. Remember, the HashTable leads us to the BucketArray, which leads us to the Buckets, but there’s no way to get to the Buckets without the HashTable. This is also what makes arrays with references behave a little differently than references anywhere else.

Take the following example where we create a reference to a string value and try to modify a copy of the variable in a function.

$string = "Hello World!";

function modify_string($string) {
  $string = "Hello PHP!";
}

$string_reference = &$string; // Creates a reference
modify_string($string_reference);
var_dump($string_reference);

As expected…

string(12) "Hello World!"

However, let’s see what happens when we try the same thing with an array.

$string = "Hello World!";

function modify_array(array $array) {
  $array[0] = "Hello PHP!";
}

$array[0] = &$string;
modify_array($array);
var_dump($array); // WTF?

We get…

array(1) {
  [0]=>
  &string(10) "Hello PHP!"
}

If you don’t understand what’s happening here, I’ll refer you to the diagram below (perhaps it will give you a clearer picture of what’s going on). Remember that each element in the array is represented by a Bucket that points to a ZVAL. Here, this bucket happens to point to a ZVAL that’s also used by the variable $string (that’s what happens when we assign something by reference), which ultimately points to our string. So if we change either $array[0] or $string, we end up changing the same string, since both take us to the same ZVAL, and there’s only one string. This might seem weird, but I assure you it’s perfectly intended behavior. You might even wonder how on earth we managed to break out of the local scope, but that’s the thing with references: they do not abide by scope. They follow the ZVAL no matter which scope it may be in.

PHP Array Using References

Note that here we do not use pass-by-reference, we do not return anything from the function (by reference or otherwise), and we do not use call-time-pass-by-reference either. None of those behaviors are at play here. What’s really happening is that the array makes it possible for references to travel with it even when it’s copied, which is something you might not have expected. To prove that this variable is indeed copied into the function’s local scope and not being passed by reference we can test the following code.

$string = "Hello World!";

function modify_array(array $array) {
  $array[0] = "Hello PHP!";
  $array[] = "This element only exists in the local scope";
  var_dump($array);
}

$array[0] = &$string;
modify_array($array);
var_dump($array);

As you can see, the first var_dump (inside the function) shows the new element we appended in the function’s local scope. But back in the global scope, the array is left without this new element.

array(2) {
  [0]=>
  &string(10) "Hello PHP!"
  [1]=>
  string(43) "This element only exists in the local scope"
}
array(1) {
  [0]=>
  &string(10) "Hello PHP!"
}

So you are definitely making a copy of the array. It just so happens that when you copy the Bucket with the shared ZVAL, you end up at the same ZVAL for that one Bucket!
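An analogy in Python may help. Python lists hold references, so copying a list copies its slots but not the objects they point to. In this sketch, a small hypothetical Ref class plays the shared ZVAL, and modify_array copies the list the way a by-value parameter copies a PHP array (it returns the local copy only so we can inspect it, which the PHP function does not do).

```python
class Ref:
    """Stands in for a ZVAL shared by reference between several slots."""

    def __init__(self, value):
        self.value = value

    def __repr__(self):
        return f"&{self.value!r}"


def modify_array(array):
    array = list(array)            # by-value parameter: the slots are copied...
    array[0].value = "Hello PHP!"  # ...but slot 0 still points at the shared Ref
    array.append("This element only exists in the local scope")
    return array                   # returned only so we can inspect the local copy


string = Ref("Hello World!")       # plays the role of $string
array = [string]                   # $array[0] = &$string;
local = modify_array(array)
print(array)          # [&'Hello PHP!']
print(local)          # [&'Hello PHP!', 'This element only exists in the local scope']
print(string.value)   # Hello PHP!
```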

By now you should be able to see why this doesn’t work the other way around.

$string = "Hello World!";

function modify_array(array $array) {
  $array[0] = "Hello PHP!";
  $string = "This element only exists in the local scope";
  $array[] = &$string;
  var_dump($array);
}

$array[0] = &$string;
modify_array($array);
var_dump($array);

This outputs the following…

array(2) {
  [0]=>
  &string(10) "Hello PHP!"
  [1]=>
  &string(43) "This element only exists in the local scope"
}
array(1) {
  [0]=>
  &string(10) "Hello PHP!"
}

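Remember: it’s the copy of the array in the function’s local scope that we modified, not the array in the global scope. The $string inside the function is also a local variable, distinct from the global $string, so the reference we appended points to a value that only the local copy of the array knows about. Don’t let the references confuse you.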
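If you actually do want the caller to see the appended element, the array itself has to be passed by reference. A minimal sketch follows; `modify_array_by_ref` is a hypothetical name chosen to keep it apart from the examples above. With `array &$array` in the signature there is no local copy of the array, so the element appended inside the function, including the reference to the function’s local `$string`, is still there after the call returns.

```php
<?php
// Sketch: passing the array by reference, so the function works on the
// caller's array instead of on a local copy.
$string = "Hello World!";

function modify_array_by_ref(array &$array) {   // hypothetical name
    $array[0] = "Hello PHP!";                    // still writes through to the global $string
    $string = "Appended inside the function";    // a local variable, not the global $string
    $array[] = &$string;                         // the array keeps this value alive after return
}

$array = [];
$array[0] = &$string;
modify_array_by_ref($array);

echo count($array), "\n";   // 2
echo $string, "\n";         // Hello PHP!
echo $array[1], "\n";       // Appended inside the function
```

Declaring the parameter as `&$array` makes the side effect visible in the function’s signature, unlike a reference that quietly rides along inside a copied array.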

No More Arrays!

If you’re now either exasperated or really excited about PHP arrays, then either way I’ve done my job!

:)

Perhaps next time I’ll introduce you to the innards of the PHP Object…

If you have any comments, questions, or suggestions about anything I’ve explained here, or about what else you’d like me to cover regarding the internals of PHP arrays, please feel free to leave a comment below. If you have ideas about what you’d like me to discuss regarding PHP objects next time, leave those as well, and I will try to make my next post as informative and entertaining as possible.

17 Responses to “A Closer Look Into PHP Arrays: What You Don’t See”

  1. Hirvine
    October 29, 2012 at 3:23 pm #

    Awesome article. I knew about the structs and C arrays, but your images are brilliant. I’m not sure how long it took you to write this, but it’s much appreciated. Great post!

  2. October 29, 2012 at 6:15 pm #

    Incredible post, many thanks to you, as I am just introducing myself to the C side of PHP and I don’t yet quite understand many of the new things I’ve seen :)

    Really looking forward to your next post on PHP objects!

    Greetings.

  3. michael stevens
    October 29, 2012 at 9:23 pm #

    Very informational, thanks!

  4. October 29, 2012 at 10:59 pm #

    Nice post, a complete discussion of arrays. But you could also discuss PHP’s array functions.

  5. Kate
    November 6, 2012 at 8:15 am #

    Thanks for sharing the great news! It was included into a digest of the hottest and the most interesting PHP news: http://www.zfort.com/blog/php-digest-november-5-2012-zfort-group/

  6. Vladimir S.
    December 19, 2012 at 12:15 pm #

    Thank you for the article! It’s very interesting.

    But I didn’t understand how PHP maintains the elements’ order.
    In your example the array keys are ordered in such a way that their hash offsets appear to be sequential: 1 – 1 – 2 – 2. And everything looks simple in the figure `PHP Array Structure`. However, if we change the keys’ order to 4, -16, foo, 5, the figure won’t change, as I understand it, but PHP will preserve the new order.
    Could you explain this magic? :)

  7. Valdimir S.
    December 25, 2012 at 7:00 am #

    Hello, again :)

    I’ve found an answer to my previous question, so I’m posting it here in case anyone else is interested in this too.

    Bucket structure is defined as follows:
    typedef struct bucket {
      ulong h;
      uint nKeyLength;
      void *pData;
      void *pDataPtr;
      struct bucket *pListNext;
      struct bucket *pListLast;
      struct bucket *pNext;
      struct bucket *pLast;
      const char *arKey;
    } Bucket;

    As you can see, there are two pairs of pointers: pListNext/pListLast and pNext/pLast. The latter pair (pNext/pLast) is the one mentioned in this article: it links the buckets that fall under the same slot in the hash table. The former pair (pListNext/pListLast) links all buckets in the order they were inserted, which is what maintains the array’s order and is what’s used to iterate over the array.

  8. January 20, 2013 at 7:14 am #

    I am completely blown away; it’s rare to come across something this hyper-informative and well-written at the same time. This is a digital treasure, thank you!

  9. Martin Konecny
    January 23, 2013 at 8:59 pm #

    Awesome article, thanks!

  10. Aura Acosta
    January 31, 2013 at 12:17 pm #

    I have a question: how can I prevent a float value from changing its decimals? It is very important to preserve the exact value, e.g.

    [3] => Array
    (
        [0] => 23.00
        [stock_id] => 23.00
        [1] => HUEVO EL CALVARIO 23 KGR
        [description_item] => HUEVO EL CALVARIO 23 KGR
        [2] => 232.4
        [quantity] => 232.4
        [3] => 24.915232358003
        [unit_price] => 25
        [4] => 5790.3
        [tot_partida] => 5790.3
        [5] => A08
        [loc_code] => A08
        [6] => 2013-01-15
        [tran_date] => 2013-01-15
        [7] => 25
        [8] => 0
        [unit_tax] => 0
        [9] => 4
        [category_id] => 4
        [10] => HUEVO
        [description] => HUEVO
        [11] => kg
        [units] => kg
        [12] => 0
        [provision] => 0
    )

    I need these two to be equal:
    [3] => 24.915232358003
    [unit_price] => 25

    The difference greatly affects my result.

    Thanks

  11. Rahul A
    June 29, 2013 at 7:27 am #

    Nice blog. Awesome! Through this article I understood the strength of arrays.

  12. prabhakar n. rao
    October 11, 2013 at 5:45 am #

    Despite being in software teaching, I really didn’t know how PHP arrays function. I liked this blog, thanks a lot.
    Please continue this yeoman’s service, God bless you.

  13. sigmato
    October 17, 2013 at 7:57 am #

    PHP arrays are really complex in structure. This is a really great article. Most people really don’t know these things, even if they program a bit.

  14. Hieu Vo
    April 10, 2014 at 11:35 am #

    definitely awesome, I learned a lot of useful things from this article!

Pingbacks/Trackbacks

  1. Sherif Ramadan: A Closer Look Into PHP Arrays: What You Don’t See : Atom Wire - October 29, 2012

    [...] a new post Sherif Ramadan takes an in-depth look at PHP arrays and what happens behind the scenes when they’re put to [...]

  2. Bookmarks for October 29th | Chris’s Digital Detritus - October 29, 2012

    [...] A Closer Look Into PHP Arrays: What You Don’t See – This entry was posted in Web Bookmarks and tagged php, programming by chris. Bookmark the permalink. [...]

  3. Best-of-the-Web 11 | David Müller: Webarchitektur - November 3, 2012

    [...] A Closer Look Into PHP Arrays: What You Don’t See – PHP arrays examined from every angle, with a few insights into the internal structure of PHP itself. [...]
