The “keep me signed in” Problem
Most websites that serve user-specific content require you to log in, and during that sign-in process you are bound to come across the “remember me” feature. It usually appears as a checkbox next to the username/password fields and the sign-in button on the authentication page.
If you check this box to remain signed in, the server sends a cookie back in the HTTP response headers. That cookie carries information that allows the server to re-authenticate you on future requests after your existing user session has expired. I’ve seen a lot of developers implement this feature incorrectly with PHP, so I’m going to explain in detail how you should be doing it and what you should avoid. Here’s a hint about a common implementation mistake: this feature has nothing to do with your existing session. It should extend your authentication mechanism, but it should definitely not rely on the existing session, since that session can expire or be lost at any time and without warning. Here’s a more detailed explanation of how it should work.
The Sign-In Process
First, your users have to authenticate in order to gain access to a server-based session. With PHP this can be as simple as validating the username/password credentials submitted in the HTTP request against the credentials stored for that user in your database, or however else you implement this. If the authentication is successful you would normally start a session using PHP’s session handler. A simple call to session_start() does the following: it starts the session handler; checks the request for a valid session ID (or SID) under the name given by the session.name directive; opens the existing session file on the server, or creates a new one if none exists for the supplied SID (the session file is also locked); and unserializes any session data into the $_SESSION superglobal. Once script execution ends, or the session is explicitly closed, whatever is left in $_SESSION is serialized and written back to the session file and the exclusive lock on the file is released. It is then your job to store some data in the session to indicate to your subsequent PHP code that this user has authenticated successfully, so that future requests using this session do not require the user to enter their username/password again. This could be as simple as adding the user ID to the $_SESSION superglobal. You could also use a boolean value such as $_SESSION['authenticated'] = TRUE; but in most cases you still need to know the user ID from the database, so it makes sense to store that instead.
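As a rough illustration of the sign-in step described above, the sketch below verifies a submitted password and then records the user ID in the session. The $userRow layout (id and password_hash keys) and both function names are assumptions made for this example; password_verify() and session_regenerate_id() are standard PHP functions.

```php
<?php
/**
 * Returns the user ID if the submitted password matches the stored hash,
 * or null otherwise. $userRow stands in for a row fetched from a
 * hypothetical users table with 'id' and 'password_hash' columns.
 */
function authenticate(array $userRow, string $password): ?int
{
    if (!isset($userRow['id'], $userRow['password_hash'])) {
        return null;
    }
    return password_verify($password, $userRow['password_hash'])
        ? (int) $userRow['id']
        : null;
}

/**
 * Marks the current session as authenticated by storing the user ID.
 * Intended to be called after session_start().
 */
function remember_in_session(int $userId): void
{
    // Issue a fresh session ID on login to guard against session fixation.
    if (session_status() === PHP_SESSION_ACTIVE) {
        session_regenerate_id(true);
    }
    $_SESSION['userid'] = $userId;
}
```

In a real request you would fetch the row by username, call authenticate(), and only call remember_in_session() when it returns a user ID.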
The Problem
This all works great until the garbage collector comes around and eventually removes those session files on the server. The reason is that session data is meant to be temporary. You should expect to lose it at some point, and losing it should never be a problem, since your real persistence layer is your database or some other permanent form of storage. PHP’s session handler comes equipped with a number of directives that control how sessions work. I’ll explain the directives that matter most in this situation.
- session.gc_divisor
- The gc_divisor is the divisor the garbage collector uses, together with gc_probability, to determine how often it should run. This number is usually set to 100.
- session.gc_probability
- The gc_probability is used along with the divisor to get the probability of running the garbage collector each time a request starts a session in PHP. This number is usually set to 1 and should be raised or lowered depending on the load of the server, the number of users, and the number of requests. The formula is simply gc_probability divided by gc_divisor equals the chance that the garbage collector will be invoked. So if these numbers are set to 1 and 100, respectively, the garbage collector runs on roughly 1 out of every 100 requests. You may want the garbage collector to run more often if you have a large number of session files on the server and you don’t want to slow PHP or the server down by deleting huge numbers of files at once. A higher probability may help in this situation. Conversely, if you get too many requests too fast, the garbage collector (GC) itself may be slowing you down and the probability may need to be lowered.
- session.gc_maxlifetime
- The gc_maxlifetime directive tells the garbage collector how long (in whole seconds) a session file may remain in the session directory before it is considered garbage. This is usually determined by comparing the file’s atime or mtime against the system’s current time (depending on your file system this information may not be available at all, in which case the garbage collector is effectively disabled). If the difference exceeds the number of seconds specified in this directive when the garbage collector comes around, the file is flagged for deletion and removed.
- session.save_path
- This is the path PHP uses to store session data when the session handler is set to save sessions to files, which is the default behavior. This path needs to be writable by PHP, and there are some things to consider when the path is shared by multiple scripts using different session directives. For example, if two scripts specify different session.gc_maxlifetime values while using the same session.save_path, the lowest value effectively decides when files are deleted. Also, since PHP’s session handler does not protect against things like session collisions, you should not increase session.gc_maxlifetime beyond 20 or 30 minutes (depending on the server load). Otherwise you substantially increase your chances of a session collision if the traffic to the server is high enough. It’s not entirely unreasonable for two session requests to be generated in the same microsecond, which could result in two sessions being assigned the same session ID, and now one of the two users may be logged into the other user’s account through no fault of their own.
- session.cookie_lifetime
- This should not be confused with the directives affecting the session file on your server. The session cookie is what gets sent to the user, and session.cookie_lifetime sets how long that cookie lives in seconds (0, the default, means until the browser is closed). The cookie contains the session name and the session ID of the user’s session. The session name is specified by the session.name directive. The session ID is the name of the session file on your server (without any path), which is just a random hash generated by PHP according to the session.hash_function and session.hash_bits_per_character directives. The default hash function is normally 0, which selects MD5, a 128-bit hash; 1 selects SHA-1. As of PHP 5.3 you can instead give the directive the name of any algorithm returned by the hash_algos() function, which lists the hashing algorithms supported on your system. The session.hash_bits_per_character directive may be set to 4, 5, or 6, and tells PHP how many bits of the binary hash output to store in each character of the session ID. The default is usually 4, so a 128-bit MD5 sum results in a session ID of 32 alphanumeric characters. (Both hash directives were removed in PHP 7.1 in favor of session.sid_length and session.sid_bits_per_character.)
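The arithmetic behind these directives is easy to check. The following sketch computes the garbage collection chance from gc_probability and gc_divisor, and the session ID length from the hash size and bits per character. The two function names are made up for this illustration; they are not part of PHP.

```php
<?php
/** Chance (0.0 to 1.0) that a given request triggers garbage collection. */
function gc_chance(int $gcProbability, int $gcDivisor): float
{
    return $gcProbability / $gcDivisor;
}

/** Number of characters needed to encode $hashBits at $bitsPerChar bits each. */
function session_id_length(int $hashBits, int $bitsPerChar): int
{
    return (int) ceil($hashBits / $bitsPerChar);
}

echo gc_chance(1, 100), "\n";          // 0.01
echo session_id_length(128, 4), "\n";  // 32
echo session_id_length(128, 5), "\n";  // 26
echo session_id_length(128, 6), "\n";  // 22
```

On a live system the inputs can be read with ini_get('session.gc_probability') and ini_get('session.gc_divisor').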
So once a user’s session has expired, the PHP code that depends on the session for validating authentication will fail and the user will be forced to log in again. This can be frustrating for the user, depending on how often they use the service and how long it’s been between requests. I’ve seen a lot of PHP developers make the very poor choice of raising the session.gc_maxlifetime directive to something ridiculous like 24 hours or even 30 days to keep the user logged in. This is just plain wrong and can lead to horrible compromises of your system. This is NOT the way to implement a remember-me feature for your users. For one, if you allow the garbage collector to keep files around for hours or even days at a time, you have greatly increased the probability that PHP generates a new session ID for one of your users that already exists. Remember, PHP’s session handler does not protect against session collisions. It has no idea whether a file with this session ID already exists in your session.save_path. It doesn’t bother to check. It just generates a random hash, and if the file already exists it’s overwritten. By lowering the gc_maxlifetime of your sessions you force the garbage collector to remove older data and narrow down the possibility of a collision.
Think of it this way. If your server is getting around 100K hits per hour from about 1K unique visitors per hour, and each visitor generates an average of four or five session hits per hour, the probability that one of them will make a request against an expired session is fairly high, depending on the time between hits. This all assumes your gc_maxlifetime directive is set to 20 or 30 minutes (at the highest). If you increase this directive to 30 days, the chance that any of them will hit an expired session within the hour (causing the session handler to generate a new session) drops to effectively zero. Meanwhile, PHP generates a new session file every time you call session_start() unless the user has sent a valid SID with the request. So if a user has cookies turned off, or their browser has expired the cookie sent by your session handler, the server is constantly generating new sessions for all of these users and is not deleting any old sessions until they reach the 30-day mark. It’s possible to fill up your server with hundreds of thousands of new session files each day in this manner. In a month you could have millions of session files, and the possibility of a collision becomes far more real.
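To put rough numbers on the buildup of session files, the sketch below estimates how many files sit on disk at steady state for a given rate of new sessions and a given gc_maxlifetime. It is a back-of-the-envelope model that assumes files are removed as soon as they expire; the function name and the one-new-session-per-visitor rate are assumptions for illustration. Clients that never return the cookie create a new session on every hit, so the real rate can be far higher.

```php
<?php
/**
 * Rough steady-state count of session files on disk: new sessions created
 * per hour multiplied by the number of hours each file survives before GC.
 */
function estimated_session_files(int $newSessionsPerHour, int $maxLifetimeSeconds): int
{
    return (int) round($newSessionsPerHour * $maxLifetimeSeconds / 3600);
}

// 1K new sessions per hour with a 30-minute lifetime
echo estimated_session_files(1000, 30 * 60), "\n";        // 500
// The same rate with a 30-day lifetime
echo estimated_session_files(1000, 30 * 24 * 3600), "\n"; // 720000
```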
The next mistake I see some PHP developers make down this path is moving session handling away from the session handler and into their database and user-land code. This is also a pretty bad idea. For one, your database is already your biggest bottleneck 9 times out of 10 in virtually any application. Second, if you aren’t using the session handler you’re spending a lot of time writing code in user space that has already been written for you. You are only increasing the chances of shipping buggy code without thorough testing, or ultimately just wasting time reinventing the wheel. PHP’s session handler is a lot more powerful than most people think. You can even define your own custom session handler if you want, but the default session handler has worked just fine, even on large-scale systems, in all the cases I have worked on. I’ve had no real trouble so far getting sessions to work on large clusters using just PHP’s default session handler. If you’re going to add Memcached or Redis on the side for whatever purposes your application requires, that’s fine, but the session handler works great for handling sessions. The additional functionality you include through your database or another cache is an added bonus, and that is what the remember-me feature is all about. It should have nothing to do with your session at all. It is a completely separate piece of functionality that works alongside your existing user session code, but it isn’t a replacement for, or an extension of, the session itself.
The Solution
What you do want for the remember-me feature is a completely random value that’s stored in your database alongside the user’s credentials. For example, you probably already have a user table that stores the username and password for each user. Simply add a column to that table for the remember-me token. You normally want this to be a VARCHAR or TEXT column, since it will store an encoded string produced from a binary source of entropy. You likely also want a unique constraint on this column so that your DBMS refuses to store two rows with the same value. This prevents users from inadvertently being signed into each other’s accounts in case of a collision. Next you want to produce a completely random string for this column. It should have nothing to do with the existing data about the user. I see a lot of people making the mistake of hashing the username/password together with a Unix timestamp, for example. This is just wrong and you don’t ever want to do it. The reason is that, if I’m trying to compromise your system, it’s easier to reverse engineer the algorithm you used to build the hash than it is to brute-force my way through. Once I figure out how you formulate the hash I can reproduce it and attempt to gain unlawful access to the system. This might not sound very plausible to the average person, but for a skilled infosec guru it’s child’s play. A hash built from time() in PHP is not cryptographically secure: if I make enough requests to your server to generate multiple hashes, I can eventually work out how the hash is constructed. There’s a lot of trial and error involved, but it’s still easier than brute-forcing 2 to the power of 128 possibilities per user. The Unix timestamp makes your algorithm time-sensitive, and that gives me a predictable input to work from.
Data from /dev/urandom, on the other hand, is not time-sensitive in any way that is useful to an attacker. I can’t tell you what /dev/urandom produced 10 seconds ago any more than I can tell you what it will produce now or in the future. I can, however, tell you exactly what time() produced 10 seconds ago, what it produces now, and what it will produce in the future.
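The difference can be demonstrated with a deliberately weak scheme. In the sketch below, weak_token() is a hypothetical, simplified version of the "hash some user data with a timestamp" mistake (it is not taken from any real application), and recover_timestamp() shows how little work it takes to reproduce such a token once the scheme is known.

```php
<?php
/** Hypothetical weak scheme: token derived from the username and a Unix timestamp. */
function weak_token(string $username, int $timestamp): string
{
    return md5($username . $timestamp);
}

/**
 * Tries every timestamp in [$from, $to] and returns the one that reproduces
 * $token, or null if none does.
 */
function recover_timestamp(string $token, string $username, int $from, int $to): ?int
{
    for ($t = $from; $t <= $to; $t++) {
        if (hash_equals($token, weak_token($username, $t))) {
            return $t;
        }
    }
    return null;
}

$issuedAt = 1700000000; // arbitrary example timestamp
$token    = weak_token('alice', $issuedAt);

// An attacker who knows the scheme only has to search a small time window:
// a two-hour window is just 7,201 candidates.
var_dump(recover_timestamp($token, 'alice', $issuedAt - 3600, $issuedAt + 3600) === $issuedAt); // bool(true)
```

No comparable search exists for a token read from /dev/urandom, because there is no small set of inputs to enumerate.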
So instead, to produce this value we can use the following code:
/*
 * Works on both Windows and UNIX/Linux platforms (requires the mcrypt extension).
 */
$key = bin2hex(mcrypt_create_iv(100, MCRYPT_DEV_URANDOM));
// Or use base64_encode() for transport instead
$key = base64_encode(mcrypt_create_iv(100, MCRYPT_DEV_URANDOM)); // Notice the key is not hashed

/*
 * Will only work on UNIX/Linux platforms
 */
$fp = fopen('/dev/urandom', 'rb');
/*
 * We use the 'b' flag for binary so that PHP won't attempt
 * to translate any of the data, such as line break characters,
 * on Windows/*nix platforms.
 */
$key = bin2hex(fread($fp, 100)); // Notice the key is not hashed
// Or use base64_encode() for transport instead
$key = base64_encode(fread($fp, 100)); // Notice the key is not hashed
fclose($fp);
On a Linux system you can read the data from /dev/urandom directly using fopen(), as in the second example, and produce the value from the entropy source that way. The mcrypt_create_iv() example, however, works on both Windows and Linux as of PHP 5.3 and produces completely random data. Notice that I’m not hashing the data at all. I’m simply encoding the binary data into a human-readable hexadecimal representation, since it has to be transported over HTTP in a cookie, where raw binary would need to be URL-encoded. You can leave it as is and just URL-encode the binary data before transport, but that will likely take up more space, and you usually want to keep your HTTP response headers fairly small. I chose 100 bytes as the payload size (200 characters once hex-encoded). You can increase or decrease this to suit your needs, but in most cases you want to keep it at no less than 100. Each additional byte multiplies the number of values an attacker would have to try, so the difficulty of a brute-force attempt grows exponentially with the payload size; the higher the better.
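On PHP 7 and later the same idea can be expressed with random_bytes(), which draws from the operating system's cryptographically secure random source on every supported platform. This matters because the mcrypt extension was deprecated in PHP 7.1 and is no longer bundled as of PHP 7.2. The wrapper name generate_remember_token() is an assumption for this sketch.

```php
<?php
/**
 * Returns a hex-encoded token built from $bytes bytes of CSPRNG output.
 * random_bytes() throws an exception if no suitable randomness source exists.
 */
function generate_remember_token(int $bytes = 100): string
{
    return bin2hex(random_bytes($bytes));
}

$key = generate_remember_token();
echo strlen($key), "\n"; // 200 (two hex characters per byte)
```

As in the examples above, the random bytes are only encoded for transport, never hashed.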
Next you want to store this value in the user’s row in your user table whenever the user re-authenticates. So whenever they type in their username/password to sign in with the “remember-me” box checked, you use this code to generate the random string and store it in the database for that user. You then send it to them as a cookie during that same request.
// Be sure to store the $key value in your database
setcookie("rememberme", $key, time()+3600*24*30); // Set the cookie to expire after 30 days
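Since this cookie effectively acts as a credential, it may be worth restricting how the browser handles it. The sketch below builds an options array for the array form of setcookie() available in PHP 7.3 and later. The helper name and the particular flag choices are assumptions for illustration, not requirements of the approach described here.

```php
<?php
/**
 * Builds the options array for a remember-me cookie lasting $days days,
 * counted from the Unix timestamp $now.
 */
function remember_cookie_options(int $days, int $now): array
{
    return [
        'expires'  => $now + $days * 24 * 3600,
        'path'     => '/',
        'secure'   => true,  // only send the cookie over HTTPS
        'httponly' => true,  // keep the cookie out of reach of JavaScript
        'samesite' => 'Lax', // do not send it on most cross-site requests
    ];
}

// Usage during a web request, before any output is sent (PHP 7.3+):
// setcookie('rememberme', $key, remember_cookie_options(30, time()));
```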
Now, in the authentication mechanism for your application, wherever you check the session to verify the user is still logged in, you add a second check for this cookie in the event the user’s session does not exist. If the cookie value matches the key stored in your database for a user, you know they’re still authenticated, since they chose to remain signed in for X number of days. You then simply populate a new session for that user and proceed as normal.
session_start(); // Start the session handler
if (!empty($_SESSION['userid']) && is_valid_userid($_SESSION['userid'])) {
    // This means both the session file exists and contains a valid userid in the database.
    // So the user is authenticated and we can proceed as normal.
    /*
    Handle authenticated procedures here...
    */
}
else {
    // This means the session file doesn't exist, is empty, or there is no valid userid
    // This is where we will check for a 'rememberme' cookie if one exists
    if (!empty($_COOKIE['rememberme']) && is_valid_rememberme_cookie($_COOKIE['rememberme'])) {
        // The user has a valid rememberme cookie and the token matches a user in the database
        /* The session is already started so just generate the session data accordingly as if the user has already authenticated */
        $_SESSION['userid'] = get_userid_from_database($_COOKIE['rememberme']); // You need to implement this functionality yourself
    }
    else {
        // User is not authenticated and has no remember me cookie, so redirect the user to the login page.
        header('Location: http://www.example.com/login');
        exit; // Stop here so no protected code runs after the redirect header
    }
}
Notice we don’t put any username in the cookie, since we already have a unique constraint in the database to prevent duplicate values in that column. We always know the value is going to be unique, and whatever row it matches is the user it belongs to. Also, since this cookie is regenerated each time the user re-authenticates, it helps avoid the problem of the user being remembered on multiple machines. If they choose to be remembered on one machine and then re-authenticate on another machine, also choosing to be remembered, a new cookie is generated, rendering the old one ineffective.
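The database side of this check can be sketched with PDO as follows. The table name users, the column name remember_token, and both function names are assumptions for this example; find_userid_by_token() plays the role of the lookup helpers left unimplemented in the code above.

```php
<?php
/** Stores a new token for the user, replacing any previous one. */
function store_remember_token(PDO $db, int $userId, string $token): void
{
    $stmt = $db->prepare('UPDATE users SET remember_token = ? WHERE id = ?');
    $stmt->execute([$token, $userId]);
}

/** Returns the ID of the user that owns $token, or null if no row matches. */
function find_userid_by_token(PDO $db, string $token): ?int
{
    if ($token === '') {
        return null; // never match rows that simply have no token set
    }
    $stmt = $db->prepare('SELECT id FROM users WHERE remember_token = ?');
    $stmt->execute([$token]);
    $id = $stmt->fetchColumn();
    return $id === false ? null : (int) $id;
}
```

Because store_remember_token() overwrites the previous value, a token issued on one machine stops matching as soon as the user signs in with the box checked on another, which is the behavior described above.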
It’s important to note that you should have a mechanism in place either to recover when the database rejects the value as a duplicate, or to notify the user of an unrecoverable error. The chances of this happening, however, are pretty slim given the size of the random string and the entropy sources involved. Set the size high enough and even with a huge database of users it’s fairly low-risk. Given that I can produce over 100K unique keys using /dev/urandom on a dual-core machine in less than 600 milliseconds, and consistently produce unique random data over the course of a few days for millions of rows in the database, this approach is quite reasonable even for large systems. Here’s how I tested this method over the course of about 7 days for ten million users and came up with only 1 collision out of every one hundred billion attempts. That’s an average 0.000000001% chance of collision even on high-load systems according to my own tests, and keep in mind you can decrease these chances further with better sources of entropy and larger payloads, depending on your needs.
ini_set('memory_limit', -1);
$key = array(); // Collected keys
$start = microtime(true);
for ($i = 0; $i < 100000; $i++) {
    $key[] = bin2hex(mcrypt_create_iv(100, MCRYPT_DEV_URANDOM));
}
$end = microtime(true);
$time = sprintf('%0.03f ms', ($end - $start) * 1000);
$c = count($key);               // Total keys generated
$u = count(array_unique($key)); // Distinct keys
$n = $c - $u;                   // Repeats
echo "Generated $c keys in $time. There were $n/$c repeat keys.\n";
echo "Sample Keys:\n";
for ($i = 0; $i < 10; $i++) {
    echo "{$key[$i]}\n";
}
I ran similar code between 1 and 10 times per minute for a period of 1 week against a database of a few million users (all random data used for testing) and came up with just one collision. You can easily reduce even this number by orders of magnitude with just a little bit of work. The fact that a collision can happen doesn’t say anything about the effectiveness of the approach, since the unique constraint in your database prevents it from becoming a security risk. The approach is effective because it doesn’t rely on the session at all and works well alongside your existing session-based mechanisms, but more importantly because the token is completely random and can’t be reverse engineered (it is not a hash of some deterministic data).
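The recovery mechanism for a rejected duplicate can be as simple as generating another token and trying again. The sketch below assumes a caller-supplied callable, here called $tryStore, that returns false when the database rejects the token as a duplicate and true once it is stored. The function name, the retry limit, and the use of random_bytes() (PHP 7+) are choices made for this illustration.

```php
<?php
/**
 * Generates tokens until $tryStore accepts one, up to $maxAttempts times.
 * $tryStore receives the candidate token and must return true once it is
 * stored, or false if the unique constraint rejected it.
 */
function issue_unique_token(callable $tryStore, int $maxAttempts = 3, int $bytes = 100): string
{
    for ($attempt = 1; $attempt <= $maxAttempts; $attempt++) {
        $token = bin2hex(random_bytes($bytes));
        if ($tryStore($token)) {
            return $token;
        }
    }
    // Unrecoverable: let the caller decide how to notify the user.
    throw new RuntimeException("Could not store a unique token after $maxAttempts attempts");
}
```

With PDO, $tryStore would typically wrap the UPDATE in a try/catch and return false when the driver reports a unique-constraint violation.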
Synopsis
So, in summary, what you don’t want to do when implementing remember-me functionality in your own applications is make it a feature of the session itself. That’s a big no-no, since it means relying on the session lifetime, at least in PHP, and we explored how risky that is with the default PHP session handler. You do want to implement it as a completely separate feature, and you should not rely on any existing data, or a hash of the user’s data, to generate the random key. You don’t want to expose any sensitive user information either, so don’t send the user ID, username, or password in the cookie, just as your session cookie does not send any of this information. Instead, everything is kept on the server and away from prying eyes. You don’t want attackers to be able to reverse engineer how the key is generated (just as with the session), so it’s completely random and not a hash of some data that can be reconstructed.
