Sherif's Tech Blog

Just another guy on the Internet with a keyboard…

Browsing the Web

When you browse the web today, compared to just 12 years ago, you find it’s a much more competitive marketplace. There wasn’t much commercialization taking place on the web in the mid to late 90s. eBay and amazon were just about it as far as consumer-based competition was concerned. The rest of the dot-com start-ups were trying to make their money by reaching large audiences very quickly and profiting from that reach through advertisers, who at the time were the parties most interested in said reach.

Some Things Never Change

Surprisingly enough though, some things haven’t changed much since. Many of these companies are still trying to gain a larger portion of the advertising market share through the web. When you consider some of the most popular websites on the web today, namely Google, facebook, yahoo, twitter, and probably still myspace, you realize all of these companies have something in common. They’re all trying to make a buck by getting the attention of parties interested in spending lots of ad dollars. Who are these advertisers? Pretty much any entity that is willing to pay money for exposing its advertisements to people.
Google seems to have made the biggest stride in this area, with more than 98% of its revenues coming directly from advertising over the last decade. Facebook won’t disclose any real financial data publicly yet, so there’s no telling for sure if they are making any significant profits from their advertising revenues. Yahoo at some point had seemed to be succeeding in gaining a significant portion of this market share, but apparently has bombed recently as its stock has plummeted and its advertising business has slowly started to fall apart. As for twitter and myspace, they’re still both privately owned companies with little public financial disclosure, but – much like facebook – while they appear to be quite popular they don’t appear to have made any significant strides in advertising either (myspace still getting the bulk of its advertising from Google AdSense).

Looks Can Be Deceiving

Sometimes looks can be quite deceiving, though. For instance, when you consider some of the raw numbers that these companies claim – head held high – like the number of users they serve, or the number of people they employ, or the amount of revenue they generate, not all of it is put into clear perspective.

For example, facebook recently claimed it had 800 million users. This is indeed a huge number of users. If you were to compare it to the average website on the web today or even the average corporation, this makes it seem like a huge company. But facebook only employs around 2,000 people and none of their users pay a single dime for using the service. The service is free, so clearly 800 million people aren’t facebook’s customers. Then again, what does facebook actually do for its 800 million users? Well, we know a lot about what they don’t do. They certainly don’t write all of the software that their users come back for so often. There are tens of thousands of facebook developers that either develop facebook apps or build on top of the facebook platform just for the benefit of having access to the huge social graph that gathers under the umbrella that is facebook. So nothing about their software actually makes their service unique per se, just that they have access to a huge userbase which interests a lot of parties. They also don’t do much of anything significant with what software they have built. If we consider their largest and probably most significant work it would have to be Cassandra. HipHop would probably rank second. Their SDK is poorly documented and poorly supported, for the most part. Photos? Well, they do claim to have the largest photo sharing application on the web, but then again flickr isn’t far behind. And it’s not like facebook photos does anything really interesting with the actual photos apart from tagging (clearly their most popular feature). Other sites have actually done a far better job with photo sharing than facebook even though they may not be as popular.
So in all reality these numbers – once put into some perspective – actually indicate pretty negative things about facebook even though it tries to put them in a positive light with a flashy spin (in the hope that this will only boost their already growing popularity, of course). The reality, however, is that this only indicates how under-staffed, under-paid, and over-committed facebook really is.

Here are some charts to demonstrate what these numbers actually indicate with some contrast for visualization.

Even if we consider every single user on facebook to be a customer of facebook, as a company, this would mean they are gaining the attention of about half the number of customers Microsoft currently attracts with its dominant Windows operating system market share. They are neck and neck with Google in terms of users, but facebook claims to have surpassed Google in pageviews. There are probably very good reasons as to why that is. Let’s not forget people who visit facebook are usually there to do a lot of things: chat with friends, browse every photo their friend ever took of their toe-nail, armpit, mirror-shot, bar-hopping-night, passed-out-magic-marker-art, and just about every other pointless moment their friends have ever captured on camera and uploaded to facebook for the world to see, not to mention the hundreds of wall-posts, messages, and I-Like hits the site gets every day from all of its users. In a single second facebook will probably have received around 2,000 photo uploads from its users. In about the time it has taken me to write this blog post they will probably have gotten around 3 million new photo uploads. Each one of those uploads will generate a page view. Each time you play farmville or use an app that lets you know when your friend made a funny face or tried to somehow, through the powers of dark magic, poke you over TCP/IP, that also generates what facebook considers a page view.

The point I’m trying to make here is that facebook is clearly a content-provider. But just like Google, they aren’t the ones producing all of the content. However, unlike Google, facebook does actually facilitate a place for you to store and share all of this content and encourages it profusely by not setting a lot of stringent restrictions on how much content you can share.


If we took the amount of content users shared as an indication that a company was powerful in reach, however, we would have to say Hotmail is probably more of a social networking tool than facebook and twitter combined. Consider that Hotmail has only around 364 million users and they share billions of emails every day. This would indicate more active sharing than facebook has ever openly claimed, even though Hotmail has less than half the userbase. Additionally, if we compare the number of developers that actually work on supporting the service that these users are making use of, facebook clearly carries a higher user load per developer than any of these other companies.

Stepping Outside of the Browser for a Minute

So it’s not clear that twitter, facebook, myspace, or any of these other very popular social networking sites are actually proving to be nearly as profitable as they are popular, despite them clearly being a great tool for people to communicate. The reason for this is mainly the advertising risk. Twitter, facebook, and myspace, while all clearly strong in numbers (any company that has to support a site with users in the millions is nothing to laugh at), are still weak in strategy. There’s no good reason for a serious advertiser (with billions of dollars to invest) to make long-term commitments to a user base that may very well disappear in a few years.

If you think of what Twitter is really all about, for example, it reminds me of when we used to pass short little notes around in class during high school on tiny snippets of paper with messages like “see you at lunch…” or something silly like that. You basically send messages to people who choose to read them in tiny bite-sized pieces. This is nothing fascinating. The only reason the service is even popular is because a lot of people – at one point – found their friends using it (just like myspace and facebook evolved) and decided to use it as well. That’s the thing about these social-based sites. They are easy to gather around and just as easy to turn away from. Eventually someone gets bored and a collapse of the social graph brings down the whole thing.

Google, on the other hand, doesn’t actually want to keep you within its borders all day long. It actually wants to get you the information you were looking for and out the door as quickly as possible with little to no distraction along the way. That is quite the opposite of what facebook aims to do, because facebook understands its only value is in how long its users choose to stay. So they will do everything in their power to keep their users there longer. Google knows its users only come back because they know something else is out there to be found and that Google is probably going to find it for them more quickly (since it’s done so numerous times in the past with great success).

If you’re looking for a nice read on the subject of Google – by the way – I’d recommend this title right off my recent bookshelf: Googled: The End of the World as We Know It by Ken Auletta.

It’s Not Browsing, It’s Searching

It makes sense that someone would turn to their friends or colleagues to find out which of the latest blockbuster movies is popular when deciding on purchasing a DVD or paying a visit to the movie theater, for example. But it might not make a whole lot of sense that someone would spend their time searching facebook when they’re looking for information on World War II history, or a photo of the point-contact transistor at Bell Labs in 1947. People are probably more likely to turn to Google or Wikipedia for that type of search, even though I have no doubt people are starting conversations on facebook groups or posting messages on people’s walls with very similar questions as you read this. To be fair, however, facebook and twitter probably do have good uses in searching for information as well. For example, you might hear about some new trend or some recent local, national, or even global event that just took place on one of these social networking sites, but it’s not like you can’t hear about that on the news either.

It’s just people’s naturally insatiable curiosity that won’t stop them from asking questions or looking for answers. This is why Google has become so popular today. They found the ultimate way to keep users coming back for more. Facebook, while it appears to have done the same thing, apparently does it for all the wrong reasons. Google’s mission is to solve the problem of search (people are always asking questions about something) by making use of the world’s information as it becomes available (i.e. the web, books, newspapers, television, etc.). However, facebook’s mission (while not even clear at this stage) seems to publicly say it’s attempting to do something fancy with something called the “social graph” (yes I really put a quote-unquote on that). Internally, though, facebook developers don’t seem really sure what they’re doing. They seem to blog a lot about all these neat new technologies the company is getting its hands on, but what about this social graph? What does it do? How does it really make our lives better? Because we can communicate with our friends? No, that can’t be it. We’ve been communicating with our friends long before facebook ever showed up. Is it because facebook makes it easier to see what our friends are doing? No, that can’t be it either, because our friends still have to show us what they’re doing for us to see it (so instead of you seeing your friend getting drunk at the bar you get to see them in a photo on facebook drunk at the bar). Clearly, that isn’t making anyone’s life better (at least not yet). Is it because facebook offers a platform where you can connect millions of people and extract large quantities of useful information out of this so-called social graph? Well, to whose benefit is that, exactly?
I don’t know about you, but I really don’t want everyone having access to all of my information just because I want to use their app to see what books my friends are reading (and yes, there are few to virtually no restrictions on how much information each app you use can access about you). I really don’t care to know how many people poke how many other people every day instead of getting off the computer or pulling their eyes away from their smart phone for a minute to take a look at the people in their immediate presence for a change. I don’t find anything about that useful, do you? Perhaps some might, but who? It’s probably not the average Joe. It would probably be someone with a serious vested interest in you. Someone like, say, a company that produces hundred-dollar designer jeans and seeks yet another way to pry into your personal details to figure out just how much influence they can muster to get you and all of your friends to buy those really expensive jeans that will just make your butt look spectacular.

At the end of the day, the numbers speak for themselves.

But, you’re just browsing the web. It’s not like any of this occurs to the average person on a regular basis as they “just browse the web”…

Google April Fools Day Pranks: Who’s the Fool?

Google does it again this year with another great April Fools’ Day prank, but this one tops them all as Google doubles up this year! That’s right, they’ve set up not one, but two April Fools’ Day pranks on their website this year. Though the question remains: who really is the fool in these elaborate pranks?

Google has a geeky sense of humor that appeals to both the witty, technically savvy individuals and the technically challenged yet equally witty. When Google announced “The Virgle Project: The Adventure of Many Lifetimes” in 2008, where they outlined a plan to colonize Mars, it was clearly a hoax. However, NASA has already made tremendous efforts to plan missions to Mars. Have we also forgotten the space race that led man to walk on the Moon, or even the International Space Station? These pranks may be funny, but they are also not out of the realm of possibility – and that is what Google has always tried to do (aim for nothing short of revolutionary change).

Today Google placed a link on their home page that reads Gmail Motion: Turn your email into a true body of work. All pun intended, I’m sure.

Google Hiring Autocompleters

Additionally, if you’ve used Google to run a search today you’ll notice a subtle new link at the bottom of your auto-complete dialogue box. Yes, you read that correctly! It’s a job offer from Google to become an Auto-Completer. Yet another brilliant demonstration of how Google is always looking to surprise us while stealthily showing off its technologies. You may recall the PigeonRank prank back in 2002, when Google used the joke to boast about its PageRank technology, though they remained both surreptitious and humorous as always in doing so. Of course, Google is always in the news around April Fools’ Day; ComputerWorld.com wrote about the Google Gmail prank and even the New York Times and the Washington Post mentioned Google in the news today, so clearly they do get a lot of publicity from these pranks, which has to be good for business. For example, the Huffington Post elaborates on some of the last-minute pranks you can play on your friends this April Fools’ Day, mentioning the Google Search Settings language feature. Did you know, for example, that you can change your language preferences on Google to things like Hacker, Elmer Fudd or even Klingon? Good luck trying to get back to English from there!

The reality, though, is that Google has always been secretive about revealing what really goes on behind their technology. Where does this obsession come from? Well, one of the co-founders of Google – Larry Page – has a fascination with Nikola Tesla, whom he admired for his visionary brilliance, but whose story also taught him caution: Tesla was never well recognized in his time and was eventually ostracized by the scientific community for his wild claims of technological advancement. Tesla was also secretive about his inventions.

So is Google’s Gmail Motion really a possibility of technology we may see in our lifetime? Well, the answer is yes. The technology for motion interfaces has already been around for some time, though it has not been highly commercialized as of yet. Have you ever watched the Tom Cruise blockbuster hit Minority Report? Well, it’s real, believe it or not. Steven Spielberg didn’t actually make this one up!

So, who really is the fool in these jokes, I ask? Because it seems to me that Google just might have the last laugh. While all those other little companies are inventing petty little WordPress widgets and obsessing over their facebook Like buttons, Google may be on the verge of making available technology that will change the way we have come to use computers in our everyday lives…

The Eventfulness of our Lives

Today the world is a very eventful place. Everything that is going on around us is inextricably connected to some event. We have trouble building connections between all of the ongoing events as the world population grows profusely and time doesn’t stand still for us to circulate this information quickly enough. So today we rely on technology to increase the effectiveness and efficiency of our communications. From the telephone, to the Internet, and beyond, we are only as effective, in this modern society, as the tools that enable us to communicate.
The biggest challenge we’ve probably had to face so far is being able to find a way to make use of all of the information we possess with the technology we have amassed over the years. While this may seem like a trivial thing at first, it is not without its trials and tribulations. As we all have come to realize by now, it makes no sense to transfer existing data and information from physical media to virtual media just in order to say that we have utilized the capabilities of modern computers and their vast storage capacities. Obviously this is not a notable accomplishment. Instead, we come to realize that people will consume media in all formats. The true success story is in being able to have computers interpret or manipulate this data in a way that is useful to large groups of people. Not only that, but clearly we need to make this information easily accessible and well organized so that it may serve a diverse audience. It’s pretty clear that Google has taken on a significant portion of this effort over the years.
While having access to information in and of itself is very important, it’s equally important to be able to contribute information and have it become accessible to others. Thus social networking and social media, in general, have been increasingly utilized as a broadcasting mechanism for the average individual. The ability of millions of Egyptians to have their voices heard (figuratively speaking) on facebook and micro-blogging web sites like twitter was a significant factor in the Egyptian revolution in January of this year, for example. It was so significant, in fact, that the Egyptian government felt it was necessary to unplug virtually the entire country from the Internet so that it would not pose a further threat to the already escalating situation. Governments all over the world are quickly realizing how much power the Internet gives people today by making information quickly and easily accessible as well as useful. China has demonstrated quite clearly that it intends to filter and control the information that it makes available to its people over the Internet by filtering web search results and blocking entire parts of the web and even specific services. Iran has either slowed social networking websites to a crawl or blocked them entirely in the past. The American government just seems to play the role of intelligence-gatherer by harnessing the Internet as a tool to keep close tabs on people.
How, day-to-day, mundane events such as “I just bought a new car…” or “Going on ski trip next week…” can be of any significance to a vast multitude of people is still unclear, but that shouldn’t stop us from making the information easily accessible to a diverse audience. We can deemphasize the information or further disassociate it from more wide-spread or significant events. Not only that, but we can even present the information in abstract as it pertains to other more specific information that we do happen to be searching for (i.e. I don’t care who is buying a new car or who is going on a ski trip, for example, but I may be interested in how many other people have also purchased the same car that I did this year or even how many people liked a particular ski resort that I am planning on visiting). It is this level of disambiguation among relational data or subsets of that data, which enables us to make information even more useful and easy to replicate and reproduce.

Just as in the case of karma systems making it easier to identify certain information as being more valuable, there can be mechanisms in place to more efficiently utilize the existing abundance of information and deliver it to the proper channels (i.e. the ones that would find it more useful).

Developing Your own Search Engine

I recently decided to explore the possibility of developing my own search engine. This involved understanding a few key concepts of how a search engine fundamentally works as well as some independent research on what popular languages would be suitable for such a project. It seems that while a basic LAMP stack might prove too fragile for a large-scale search engine, it is not impossible. Building a basic web crawler with PHP/MySQL may not be as horrible an idea as one might think!
It’s only mid-December and already I can feel the onslaught of cold as winter approaches. With a few inches of snow piling up outside my window, I warm myself up with a fresh cup of coffee and plug back in to my laptop. Yet again I am exploring the wonderful world of search.
We’ve come a long way from the Google Search of 1999 and the Yahoo Search of the past. Bing has recently stepped onto the web scene (as a mass-marketed, re-branded web solution). Google still tops the lot in sheer volume of search traffic, but aren’t we forgetting about all of the other search engines on our massive list of web search?
The reason it sparked my interest to find out more about some of the requirements and specifications necessary for developing your own search engine was due to a recent project that involved seeking out and identifying broken links on a web site. A part of the project involved writing a small light-weight bot that would crawl only the internal links that pointed to your web site and detect any broken links in order to notify the web master. This is actually one fundamental part of what a search engine does. In my efforts I needed to find the most effective and efficient methods to crawl a site internally in order to find all of its internal links without using any site-map or file-system structures. It needed to be done strictly over HTTP and this is what led me to some key search engine and web crawler articles.
The most difficult part of writing this bot was having to rely solely on the links that the web pages of the site provided. Since a dynamic or database-driven web site can potentially have thousands of pages it proved more difficult than I had originally anticipated to map out this link structure directly over an HTTP crawl.
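To make the idea of an HTTP-only crawl a little more concrete, here is a minimal sketch of a breadth-first crawl loop. It is not the bot described above, just one way such a loop could look. The function name crawlSite, the $maxPages safety limit, and the $fetchLinks callback (which is assumed to return an HTTP status code plus the internal links found on a page) are all hypothetical names introduced for illustration.

```php
<?php
// Breadth-first crawl over a site's internal links (illustrative sketch).
// $fetchLinks is a hypothetical callable: given a URL, it is assumed to return
// [HTTP status code, array of absolute internal URLs found on that page].
function crawlSite(string $startUrl, callable $fetchLinks, int $maxPages = 1000): array {
    $queue   = [$startUrl];
    $visited = [];
    $broken  = [];
    while ($queue && count($visited) < $maxPages) {
        $url = array_shift($queue);
        if (isset($visited[$url])) continue;   // never fetch the same page twice
        $visited[$url] = true;
        [$status, $links] = $fetchLinks($url);
        if ($status >= 400) {                  // treat 4xx/5xx responses as broken links
            $broken[] = $url;
            continue;
        }
        foreach ($links as $link) {
            if (!isset($visited[$link])) $queue[] = $link;
        }
    }
    return ['visited' => array_keys($visited), 'broken' => $broken];
}
```

Passing the fetcher in as a callback keeps the loop itself free of any network code, so it can be tried against a small in-memory map of pages before being pointed at a real site.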

HTML Parsing
The first challenge was trying to obtain all the links provided by the <a> tags in the page. At first I thought to use a regex search pattern, but later on it proved more cumbersome than necessary. So, instead I turned to a common solution using the PHP libxml DOM object. With DOM you can easily obtain all of the <a> HTML tags on the page through the getElementsByTagName method. Once we have those tags we can iterate through the resulting node list and collect the HREF attribute of each one into an array.
Some of the problems further presented by parsing the HTML could involve javascript, improper markup, or other external requirements. In the spirit of keeping things simple I chose to ignore these underlying factors.
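As a rough illustration of the DOM approach, the sketch below collects the HREF of every <a> tag using PHP's DOMDocument. The function name extractLinks is made up for this example, and like the approach described above it ignores javascript-generated links entirely.

```php
<?php
// Collect the href value of every <a> tag in an HTML string (illustrative sketch).
function extractLinks(string $html): array {
    if (trim($html) === '') return [];        // loadHTML() rejects empty input
    $dom = new DOMDocument();
    // Real-world markup is often invalid; collect libxml warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $anchor) {
        $href = trim($anchor->getAttribute('href'));
        if ($href !== '') $links[] = $href;   // skip anchors without an href
    }
    return array_values(array_unique($links));
}
```

The links come back exactly as written in the markup, so relative paths still need to be resolved against the page URL before they can be crawled.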

The Domain Name Space Problem
The second challenge was identifying which of the links on the page were pointing to another page on the same website and which were pointing to external websites. The domain name-space is so vast that it has become increasingly difficult to identify the validity of a domain without having to rely on DNS. Since TLDs have grown so tall and stretched so wide that they now encompass non-ASCII (internationalized) characters and even ccTLD concatenation, many of my test methods failed. Again, for purposes of simplification I chose to ignore the one-off problems and deal with the most common scenarios and their proven solutions.
I solved this problem by writing a very simplistic function to help me break down the individual components of the URL into some very distinct properties and return them in an array. The code below demonstrates the basic functionality of this method.

function breakURI($URI) {
  // Identify four key components of a URI: the protocol, the domain, the port and the path.
  // The domain is further broken down into TLD and subdomains, and the path into path + query string.
  $pattern = "/^(?<protocol>[a-z]+:\/\/)?(?<domain>[a-z0-9\.-]*\.[a-z]{2,})(?<port>:\d+)?(?<path>[\/?].*)?$/i";
  if (!preg_match($pattern, $URI, $match)) return false;
  // Get the query string from the path (if any) and move it to its own element in the array
  if (isset($match['path']) && strstr($match['path'], "?")) {
    $match['querystring'] = substr(strstr($match['path'], "?"), 1);
    $match['path'] = str_replace('?' . $match['querystring'], '', $match['path']);
  }
  // Strip the leading colon from the port
  if (isset($match['port'])) $match['port'] = substr($match['port'], 1);
  // Verify that the domain is indeed a valid FQDN and classify its sub-parts
  $match['subdomains'] = explode(".", $match['domain']);
  $match['fqdn'] = $match['domain'];
  $match['tld'] = array_pop($match['subdomains']);
  $match['domain'] = array_pop($match['subdomains']);
  // Reject empty labels (e.g. "a..example.com") and domains that start or end with a hyphen
  foreach ($match['subdomains'] as $domain) if (strlen($domain) < 1) return false;
  if (substr($match['domain'], 0, 1) == '-' || substr($match['domain'], -1) == '-') return false;
  // Drop the numeric keys that preg_match adds alongside the named groups
  foreach (array_keys($match) as $var) if (is_numeric($var)) unset($match[$var]);
  if (empty($match['protocol'])) $match['protocol'] = 'http://'; // Protocol is assumed to be HTTP if not supplied
  if (!isset($match['path']) || strlen($match['path']) < 1) $match['path'] = '/'; // Path is assumed to be root if no path is supplied
  $add_port = (!empty($match['port']) && $match['port'] != "80") ? ':' . $match['port'] : '';
  $match['URI'] = $match['protocol'] . $match['fqdn'] . $add_port . $match['path'];
  if (isset($match['querystring'])) $match['URI'] .= '?' . $match['querystring'];
  return $match;
}

Since all web crawlers need to start somewhere, it was easy enough to work from a web site’s home page and then identify relative and absolute links from there. It’s assumed that the home page should contain links to all of the key areas of the web site and that all other pages should point back to the home page. This also led me to a deeper understanding of a primary component in SEO: search engines like web sites that make it easy to discover their site-map through proper linking.

Link Resolution
I had to make a few basic assumptions in sifting through the links of each page with my crawler. Whenever a link leads to a ‘javascript:’ or ‘mailto:’ it is safe to ignore it for the purposes of crawling the site, since it is not likely to point to anything in which the crawler will be interested. Whenever the link does not contain a domain name it is likely to be an internal link. Since some websites will use a relative-path link for some or all of their pages, I needed to ensure that those paths were properly used in forming the complete URL. Anything with a relative path is obviously considered an internal link. For those links that contained an actual domain name or full URI, I needed to determine if the domain name of the web site was a part of this URL to ensure that it was still pointing internally to the same web site. In my test runs, sub-domains and IP addresses proved somewhat problematic to my methods. I worked around this by simply resolving all IP addresses to their rDNS (where available) or just ignoring them altogether. Allowing the crawler to blindly follow external links was, no doubt, a bad idea since this would lead to an overload of my modest development server. This is why I had to be very careful in making sure the crawler was only following internal links.
During my early test runs the bot had actually followed a twitter link, which caused it to crawl thousands of pages on twitter and many other sites that just flooded my db with hundreds of thousands of external links. Needless to say I learned that such a bot would need constant monitoring when working in a small development environment that might not be able to handle that much traffic.
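The rules above can be sketched roughly as follows. This is only one possible reading of them, built on PHP's parse_url rather than the breakURI function shown earlier. The function name resolveInternalLink is hypothetical, hosts are compared by exact (case-insensitive) match so sub-domains count as external, and "../" segments are deliberately not normalized to keep the example short.

```php
<?php
// Return the absolute URL for an internal link, or null if the link should be skipped
// (illustrative sketch; sub-domains are treated as external and "../" is not normalized).
function resolveInternalLink(string $href, string $pageUrl): ?string {
    $href = trim($href);
    // Skip empty links, in-page anchors and non-HTTP schemes such as javascript: or mailto:
    if ($href === '' || $href[0] === '#') return null;
    if (preg_match('/^(javascript|mailto|tel|data):/i', $href)) return null;

    $base = parse_url($pageUrl);
    if ($base === false || !isset($base['scheme'], $base['host'])) return null;
    $origin = $base['scheme'] . '://' . $base['host'] . (isset($base['port']) ? ':' . $base['port'] : '');

    // Protocol-relative links ("//host/path") inherit the scheme of the current page
    if (substr($href, 0, 2) === '//') $href = $base['scheme'] . ':' . $href;

    if (preg_match('/^[a-z][a-z0-9+.-]*:\/\//i', $href)) {
        // Absolute URL: internal only when its host matches the page's host
        $target = parse_url($href);
        if ($target === false || !isset($target['host'])) return null;
        return strcasecmp($target['host'], $base['host']) === 0 ? $href : null;
    }
    if ($href[0] === '/') return $origin . $href;   // root-relative path

    // Relative path: append it to the directory of the current page
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $href;
}
```

With a check like this in front of the crawl queue, a stray twitter link simply resolves to null and never gets followed.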

Indexing: The Most Cumbersome Search Engine Task
An actual crawler-search-engine is made up of more than one part. It usually has the crawler, which is the automated bot that follows the links on each page to explore a significant portion of the web. It also has an index, which stores the information about the web pages the bot has crawled and sorts them in terms of components that make up its indexing algorithms. The simple bot I described earlier was just one basic part of a search engine, but it was the most relevant part to my particular project.
Later, I decided to take my interests a little further by starting a personal project on the side that involved developing a simple indexing algorithm. This is where everything got really interesting! It appears that different search engines have different ways of indexing the web.
When we use a search engine like Google to search for things that interest us we aren’t really searching the web, but a sort-of copy of the web that is stored in Google’s index. This index doesn’t actually contain every web page on the world-wide-web, but a rather significant portion of it (billions and billions of web pages). It is significant enough so that we can safely assume that when we search Google we are in fact searching the web. This index is also updated constantly, but it isn’t always 100% accurate. Sometimes search engines will fail in finding the most relevant information for your search. The reason for this is that there is so much information on the web and so little information is provided by your search query in order to accurately determine the most relevant results in a reasonable amount of time and without causing the user to supply additional information about their search.
Search engines aim to solve this problem by developing various indexing algorithms that ask additional questions – not to the user, but to the pages in its index. These questions focus mainly on where your search terms fall on the web pages found to contain those terms and how frequent they are on those pages. That’s the most fundamental aspect of most indexing algorithms, but many search engines today take it a lot further. Google, for example, will attempt to exclude your pages from its index if you use terms or phrases too often (possible spamming). Use them too many times and the page is not likely relevant; use them too few times and the page is also not likely relevant; use them in your title, but not your URL, and the page is likely ranked lower by the indexing algorithm.
At this point what I had was a very simple bot that constantly crawled through all the pages on a particular web site and stored the pages in a MySQL database. Another script would sort this index using only a few factors (namely keyword location, frequency, and external link relevancy). This could be achieved either by writing your own daemon or by going about it the easy way, like I did, and using a custom script run by a cron job. I noticed how quickly the index could grow if I let the bot crawl external links: my database went from just a few megabytes to several gigabytes within a couple of weeks.
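The storage side can be sketched in a few lines. Note the swap: this uses Python’s built-in `sqlite3` module rather than MySQL so that it runs without a database server, and the `pages` table layout and function names are assumptions for the example, not my actual schema.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) the page store. ':memory:' keeps it in RAM."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        " url TEXT PRIMARY KEY,"
        " title TEXT NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def store_page(conn, url, title, body):
    """Insert a crawled page, replacing any older copy of the same URL."""
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, title, body) VALUES (?, ?, ?)",
        (url, title, body),
    )
    conn.commit()


def index_size(conn):
    """Number of distinct pages currently stored."""
    (count,) = conn.execute("SELECT COUNT(*) FROM pages").fetchone()
    return count


conn = open_index()
store_page(conn, "http://example.test/", "Home", "first crawl")
store_page(conn, "http://example.test/", "Home", "second crawl")  # re-crawl
store_page(conn, "http://example.test/a", "A", "another page")
print(index_size(conn))
```

Keying the table on the URL means a re-crawl overwrites the old copy instead of adding a new row, which slows the growth a little but does nothing about the flood of new URLs that external links bring in.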

The Search
Now all I needed was an actual search component: a script that would return results for a particular search phrase by gathering things like titles, web snippets, links, and cached pages. This is all very similar to what Google does, albeit on a much, much smaller scale. I didn’t intend to write a fully functional search engine, so some of my techniques are quite error-prone and most will probably fail the scalability test. Still, for a personal project, I’m quite psyched about taking the search engine further. I may one day consider releasing the code and open-sourcing the project through a distributed revision control system like git or mercurial. Who knows? For now I’m just happy to share some of my minuscule findings.
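Here is a minimal sketch of what such a search script might look like. It is not my actual code: the page dictionaries, `make_snippet`, and `search` are hypothetical, matching is plain case-insensitive substring counting, and ranking is by hit count alone.

```python
def make_snippet(body, term, width=30):
    """Return a short excerpt of `body` centred on the first match."""
    pos = body.lower().find(term.lower())
    if pos == -1:
        return body[: 2 * width].strip()
    start = max(0, pos - width)
    end = min(len(body), pos + len(term) + width)
    return body[start:end].strip()


def search(pages, term):
    """Return title, link, and snippet for each page containing `term`,
    ordered by how many times the term occurs (most hits first)."""
    results = []
    for page in pages:
        hits = page["body"].lower().count(term.lower())
        if hits:
            results.append(
                {
                    "title": page["title"],
                    "link": page["url"],
                    "snippet": make_snippet(page["body"], term),
                    "hits": hits,
                }
            )
    results.sort(key=lambda r: r["hits"], reverse=True)
    return results


# Assumption: a hand-written index instead of a real database query.
PAGES = [
    {"url": "http://example.test/a", "title": "Crawlers",
     "body": "A crawler follows links. Every crawler needs a queue."},
    {"url": "http://example.test/b", "title": "Indexes",
     "body": "An index stores what the crawler found."},
    {"url": "http://example.test/c", "title": "Weather",
     "body": "Sunny with a chance of rain."},
]

results = search(PAGES, "crawler")
for r in results:
    print(r["title"], r["link"], r["snippet"])
```

Substring matching is one of the error-prone shortcuts I mentioned: a search for "art" would also match "start". Splitting pages into words first, as a real index does, avoids that.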

Other Uses
So what possible reason would there be to develop your own search engine? It’s true that trying to do what Google has already done falls into the cliché of “re-inventing the wheel”, but that shouldn’t stop us from learning from, experimenting with, or redesigning the wheel. There could be a number of other practical applications for these techniques. For example, one might need a search engine for a private intranet. You may need to index the web in a different manner than the proprietary methods used by Google, Yahoo, or MSN. Smaller search engines that do not get as much traffic as the big ones can still be very useful in some niches.

Find the Weather on Google by Typing W

I can now find out the weather where I am on Google with a single keystroke, by typing the letter ‘w’. Sound simple enough? This is a result of Google’s latest product launch, Google Instant. Google launched Instant on September 8th, 2010, and claims that it will forever change the way people search.

Marissa Mayer, VP of Search Products and User Experience, hosted the Google Instant launch event, along with Ben Gomes, at the San Francisco Museum of Modern Art this September. Marissa spoke about Google’s continued efforts to improve quality and user experience, and how Google Instant ties in to those objectives.

Do you remember the Google Doodle with the bubbles just before the launch of Google Instant? The NY Times and other news outlets even published stories about the Doodle, wondering what Google meant by creating such a strange Doodle with no explanation as to its meaning or purpose. This Doodle did stir up a lot of speculation, and Google made note of this at its launch event.

The engineers behind Google Instant spoke at the launch and explained the tremendous amount of time and effort it took to build such a powerful product and all the challenges it entailed. Everything Google does is meant to make it easier for users to search. With this new enhancement to Google Search, Google says it can now save its users over a billion hours every year. The estimate is based on the product saving the average user 2-5 seconds per search, multiplied by the several billion searches that are performed on Google every day. The improved interface and infrastructure are not only fast, but predictive in their understanding of search queries and their relevance.
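The arithmetic behind an estimate like that is easy to check. The sketch below is my own back-of-the-envelope calculation, not Google’s; the daily search counts are assumptions I picked to show how sensitive the total is, and the function name is made up.

```python
SECONDS_PER_HOUR = 3600
DAYS_PER_YEAR = 365


def hours_saved_per_year(seconds_per_search, searches_per_day):
    """Total hours saved in a year across all users."""
    seconds_per_day = seconds_per_search * searches_per_day
    return seconds_per_day * DAYS_PER_YEAR / SECONDS_PER_HOUR


# Assumed daily search volumes, chosen only for illustration.
low = hours_saved_per_year(2, 1_800_000_000)   # 2 s saved, 1.8 billion/day
high = hours_saved_per_year(5, 2_000_000_000)  # 5 s saved, 2 billion/day

print(f"low estimate:  {low:,.0f} hours per year")
print(f"high estimate: {high:,.0f} hours per year")
```

With these assumed volumes the low end works out to 365 million hours a year and the high end to just over a billion, so the billion-hour figure only holds near the top of the 2-5 second range or with a higher search volume.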

Because Google tries to keep its main product family-friendly and safe for its users, the Instant search feature does not return results for queries that may be related to adult content, for example. A guest at the event made note of this when she asked why her last name would not return results in Google Instant. The obvious answer is that the last name Slutsky contains the partial search term “slut”, which does return results for potentially porn-related sites, images, videos, etc. When Google Instant is unable to return results for such unsafe search queries, it will instead ask you to press enter to see the search results.

You can still click the Search button of course, but Google has added features to make search even easier by adding keyboard navigation. The up and down arrow keys will now run the search queries for the predictions returned by Instant and the right arrow key will function as the “I’m Feeling Lucky” button. The tab key acts as the auto-complete, which also runs your query through Google Instant.

Since the new interface entails less mouse usage, such as clicking buttons and scrolling through search results, Google plans to release this product on mobile and other platforms within the coming months. It makes for less typing and clicking in order to find what you’re looking for, and anyone who uses a mobile phone to browse the Internet knows that isn’t always easy. Even on phones that offer QWERTY keyboards, typing out long queries on Google may not be a quick task. This affirms Google’s intent to make the user experience easier, faster, and more meaningful, as co-founder Larry Page defines the world’s best search engine as ‘something that knows exactly what you mean and gives you back exactly what you need’.

It’s important to note that Google Instant is not just search-as-you-type. In fact, the actual search begins before your typing is complete. Typing the letter ‘w’ on its own has already alerted Google Squared to find the weather for my area (based on the geolocation of my IP address) and generated multiple other possibilities for what I may be searching for, so the search may be completed before I’ve finished entering my query. This is what makes Instant so different from a simple auto-complete search box. The results aren’t just based on all the possibilities in the index that begin with the letter ‘w’, but, more predictively, on what the most relevant search query would be given the information available to Google so far. The odds that someone typing ‘w’ was actually searching for ‘w’ are so minute they aren’t worth considering, but if you really did mean to search for ‘w’ you can always hit enter or click the search button to have Google display the results for that exact query. Instant can always be turned off by clicking the link to the right of the search box, but I prefer to get my search results even faster, as I have found Instant to be very useful and interactive so far.
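The difference between matching a prefix and predicting a query can be shown with a toy example. This is only a guess at the general idea, not how Google Instant works: the query log, its counts, and the `predict` function are all invented, and popularity is the only signal used.

```python
def predict(prefix, query_counts):
    """Guess the full query for what has been typed so far.

    Picks the most frequently searched past query that starts with
    `prefix`, and falls back to the prefix itself if nothing matches.
    """
    candidates = [q for q in query_counts if q.startswith(prefix)]
    if not candidates:
        return prefix
    return max(candidates, key=lambda q: query_counts[q])


# Assumption: a made-up log of past queries and how often each was run.
QUERY_LOG = {
    "weather": 900,
    "walmart": 700,
    "wikipedia": 650,
    "w": 3,
    "youtube": 1000,
}

print(predict("w", QUERY_LOG))    # the guess after one keystroke
print(predict("wa", QUERY_LOG))   # the guess changes as more is typed
print(predict("zz", QUERY_LOG))   # nothing matches, so use the raw text
```

The literal query ‘w’ is in the log too, but it is so rare that the prediction skips straight past it, which is the behaviour described above. A real system would presumably also weigh location, language, and freshness.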