Sherif's Tech Blog

Just another guy on the Internet with a keyboard…

The Internet Blackout: SOPA

Today a large number of popular websites participated in the Internet Blackout, a protest against SOPA. While some websites decided to block their service from the public entirely in order to raise awareness about SOPA and urge Americans to act, others took a more subtle approach.

For example, websites like reddit, mozilla, imgur, and wikipedia’s English site have chosen to black out completely. While wikipedia’s blackout is easier to get around, other sites like reddit, mozilla, and imgur are pretty much going to leave you empty-handed for the day. Google simply blacked out their logo today and provided a direct link on their front page. If you’re visiting Google for the first time today you might have been redirected to the announcement page urging you to contact your senator. Beyond that they don’t seem to nag you like other sites are doing. Sites like WordPress, php.net, and Google took an elegant but effective approach to the blackout. They demonstrated their anti-SOPA stance while still keeping their visitors happy. Sites like Bing and Yahoo, however, don’t seem to have participated in today’s blackout in any noticeable way.

Internet Blackout - SOPA

Read more about the SOPA Strike here.

What Programming Language Should I Learn

So you want to start learning a programming language? The first question you might have is what language you should start with. Unlike when we’re born, where we don’t get to pick the first language we’ll learn to speak, read, and write in, in the computer science world you have a choice. However, as a programmer you have a vast array of languages to choose from, and sometimes people ask me “which is the easiest?”

The truth is whoever you ask will tell you their language is the easiest, best, most powerful, or whatever reason they can think of for you to learn and use that language! As a joke I posted this video on youtube along with others about individual languages, but this one seemed to get the most hits. Is it that people are very inclined to find the one ultimate programming language or just laugh about the rest? Who really knows, but it’s funny…

This is kind of like asking a multilingual person which language they think I’ll find the easiest to learn. While there may be some valid answers, most of them will probably be subjective and they won’t answer the true question one should be asking. That is, what am I going to be using the language for. Just like when you decide to learn French, because you’re either moving to France, or would like to communicate with someone who speaks French, similarly, you pick a programming language because you would like to communicate with a computer in a way that meets your objectives. What you intend to do with the language, and what it can do for you, however, may not always be so apparent at first.

I only recently realized how truly complicated it may be for someone who does not come from a programming or technical background to actually choose their first programming language. Having recently examined a comprehensive list of programming languages on wikipedia, I found that there are currently over 600 programming languages to choose from, and that doesn’t even include the more than 300 dialects of BASIC and various esoteric languages. This also probably doesn’t account for some of the lesser-known dialects or derivatives of these languages. Since not all programming languages have an official specification, they may very well be implemented in dozens of different ways in smaller niches.

If we look at programming languages broken up into categorical or even chronological lists, the information still doesn’t make the choice any easier, or the lists any more useful. However, if you simply group languages by their strongest generational origins you can narrow the list down to just a few dozen languages. If you take away some of the older generations and put emphasis on the languages highest in popularity and active use/development, you come up with just a few languages and their respective dialects. This still isn’t informative enough to help someone decide on their first language, so I split up languages based on their strongest usage, and in the world of computer science this comes down to two broad categories: systems programming and utility or application programming. The most notable distinction between the two is that systems programming aims to provide software that communicates with hardware, while application programming usually aims to provide software for the user sitting at the computer. So while software like a text editor or word processor is considered application software, software like a disk formatting/partitioning utility is considered systems software. Your operating system has to deal with the hardware in your computer directly in order to give application programs a means to do things like write to your computer’s memory, while the operating system’s page tables and memory manager control how this hardware is being used by the various application programs.

C

C is still, by far, one of the most popular programming languages around, and even though it is best known for systems programming, many people still use it to develop application software. It hasn’t lost its popularity or its power.

C is not a language you usually pick to write every-day utility applications. If you choose to start learning C be prepared to start learning a lot of other systems programming concepts and technical hardware documentation as well. Most Computer Science majors take C as one of their first programming language courses in college. This is important, because there is a huge amount of software that’s written in C. For example, most operating system software is written in C. There may be some C++ in there, but for the most part you’ll find a lot of linux distributions are made up of a huge amount of C code and much smaller portions written in either C++ or some other similar language. You may hear about Assembly language as well when learning or working with C. Essentially, when a C program is compiled into a native binary, the compiler translates it into assembly and then into machine code. You take a high-level language like C and, eventually, to get it to run on the machine it has to become low-level machine instructions the architecture can execute. C is still a high-level programming language, but it is also recognized for staying close to the hardware, much closer than most languages, though not as close as assembly, which is a low-level programming language. Don’t let this confuse you, however. C is a powerful language and in fact many of the popular languages you will likely hear about or discover in this article are implemented in C. For example, the standard PHP and Python interpreters are written in C, and the Java virtual machine is written largely in C and C++.

However, C can be tough. Writing non-buggy C code is costly. It can take a lot of time, because you either have to find the libraries you need and integrate them or write them yourself. C is an imperative, procedural language. It also teaches programming with side-effects, which is very different from languages like Scheme where you are encouraged to program without side-effects. A C program can feel like one big global scope where anything can affect anything else. So you have to be very careful about managing your memory in C. You have to worry about pointers and data types everywhere in your code. You have the basic constructs like IFs, loops, and functions, but ultimately you have to learn to do a lot of things other programming languages make a lot easier, because they already have the functionality of many of these popular C libraries built right into the language.
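To make the side-effects point a little more concrete, here is a minimal sketch. It's written in Python rather than C or Scheme just to keep it short, and the names (total, add_to_total, add) are made up for illustration. One function quietly changes shared state, the other depends only on its arguments.

```python
# Shared, mutable state that any function in the program can touch.
total = 0

def add_to_total(amount):
    """Impure: changes the global `total` as a side effect."""
    global total
    total += amount
    return total

def add(a, b):
    """Pure: the result depends only on the arguments."""
    return a + b

first = add_to_total(5)    # total becomes 5
second = add_to_total(5)   # same call, different result: total becomes 10
pure_first = add(5, 5)
pure_second = add(5, 5)    # same call, same result every time
print(first, second, pure_first, pure_second)
```

The first function gives a different answer each time you call it with the same argument, which is exactly the kind of thing you have to keep track of yourself in a language that leans on shared state.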

So, unless you plan on designing an API for a larger program or building some system utility, C may not be the right language for you to learn. If you’re a compsci major you’re probably going to learn it as your first language whether you want to or not, but let’s face it, you chose the degree…

BASIC

BASIC has been around for quite a while as well and it has hundreds of dialects. It was popularized by many hobbyists during the 80s and grew further in popularity on Windows during the 90s with Microsoft’s Visual Basic suite, which attempted to keep the language as simple but as powerful as possible. BASIC is not very difficult to learn. Depending on the dialect it may be interpreted or compiled, and it has declined in popularity over the last decade. It might not be the best language to work with, but it is still high on the hobbyist’s list. Much like other languages that were once popular to learn just as a hobby and were fun to play with (like LOGO, which borrowed heavily from LISP), not many people take it seriously.

BASIC has the essential control structures you’d find in almost any language like IFs, loops, and GOTOs, but it was fundamentally built on the concept of sequential programming where the entire program is built on one huge sequence of instructions. There are subroutines (like functions) and some dialects implement a lot of other modern features, but for the most part it’s great for when you want to learn programming for fun. If you’re serious about building cross-platform or enterprise-level applications BASIC is far from a first choice.

Java

Java and the other Java-based languages stand out for their compile-once, run-anywhere trait, as opposed to many other compiled languages where you write the code once and then have to compile it for each different platform you choose to run it on. With Java, if you compile your code to Java bytecode to run in the JVM you will only need to compile it once. The JVM (or the Java Virtual Machine) can pretty much run on any platform (Windows, Linux, Mac OS, etc…) and deals with each system’s specifics on your program’s behalf. This enables programmers to compile their Java code on any machine just once and run it on virtually any other platform without having to recompile for that specific platform. The JVM ships as part of the JRE (or the Java Runtime Environment), which executes the bytecode by interpreting it and compiling the frequently used parts to native code as the program runs, so Java behaves partly like an interpreted language as well. Java’s popularity hasn’t declined much over the years and it’s gained quite the reputation, later adopting open source initiatives.

Java is also popularly taught in compsci courses in colleges, institutes, and universities around the world. It’s similar to C in that it is a statically typed language and has methods, basic loops, and other familiar constructs. However, Java is an object-oriented language, while C is pretty much procedural in paradigm. You can build structs and things in C, but Java makes abstraction a whole lot easier with its OOP features. You can often get a whole lot more done in a fraction of the time it might take you to do the same in C. So developing day-to-day applications in Java is a lot more common than with C. It’s just that a lot of the folks that have learned C and know it well have stuck to it over the decades and continue using it. Java is a much newer language. It appeared around the mid 90s, but it has proven itself in the last 16 years or so. C has been around since the early 70s and hasn’t changed much. The most current standard of C is C11; its predecessor was C99. Java is at Standard Edition 7.
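Here is a rough sketch of the difference between the two styles. It's in Python rather than C or Java so both halves fit in one short listing, and the account example and its names are purely hypothetical. The first half keeps plain data and free-standing functions apart, roughly like a C struct. The second half bundles the data and its rules into a class, which is closer to how Java wants you to work.

```python
# Procedural style: plain data plus free-standing functions that operate on it.
def make_account(owner, balance):
    return {"owner": owner, "balance": balance}

def deposit(account, amount):
    # Nothing stops other code from changing account["balance"] directly.
    account["balance"] += amount

# Object-oriented style: the data and the operations on it live together.
class Account:
    def __init__(self, owner, balance=0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount):
        # The rule about valid deposits travels with the data it protects.
        if amount <= 0:
            raise ValueError("deposit must be positive")
        self.balance += amount

record = make_account("alice", 100)
deposit(record, 50)

acct = Account("alice", 100)
acct.deposit(50)
print(record["balance"], acct.balance)
```

Both versions end up with the same balance. The difference is where the rules live, and that matters more and more as a program grows.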

Java is also considered a fast and secure language for a number of reasons. It’s debatable whether all of these reasons hold true, but for the most part they’re built on solid ground. First, Java code runs in the JVM, or the Java Virtual Machine, which means the VM can verify the compiled bytecode of the program and make sure it’s valid Java bytecode before executing it. Second, Java code is cross-platform, so the same bytecode behaves the same way across different platforms without much concern over the underlying libraries. Java is expected to be very performant because of its JVM. The JVM still runs as an ordinary program on top of the operating system, but it compiles the frequently executed parts of your program to native machine code while it runs, so hot code ends up running much closer to the hardware than it would in a purely interpreted VM. Between Java and scripting languages like Perl, Python or PHP this might be an advantage, but between C and Java it can go either way. In most cases C would easily out-perform Java, but in a few cases it might go the other way around.

PHP

PHP is probably the most popular language on the web. It has many followers and a huge open source community. It’s an interpreted language that was originally developed for producing dynamic web pages. However, today it is seen as a general purpose language. What makes PHP so great is that it works very well with web servers. You can install it as a web server module or run it on the command line. It has many useful built-in features that make web development easier right out of the box. PHP is also built on share-nothing architecture so it scales very easily and doesn’t require much configuration. It offers automatic memory management and it’s somewhat loosely typed so its data types may not be very suitable for edge cases, but that can be debated. For most general purposes PHP works great, but like BASIC it attracts a lot of hobbyists given that it lowers the bar of entry.

Unlike with C, in PHP you do not have to worry about managing your own memory. You can easily build data structures, connect to external resources like databases or other libraries directly through the PHP extensions, and generate output to standard streams without a lot of fuss. It’s easy to take a general idea and implement it in PHP very quickly. Many people do this with Python and Perl as well to get a working prototype up and running. However, if you build a lot of prototypes, you know that they end up getting tossed out when you start building the real thing. Regardless, PHP is a great language to get code working quickly and is very similar in syntax to languages like C and Perl. The downside is that PHP and Perl are also considered very ugly by many and have extensions with poor implementations or interfaces or leaky memory. Not everything about PHP or Perl is great, but it works. At the end of the day it takes a fraction of the time to write PHP or Perl code that does the same thing as the equivalent C, and with less possibility of certain kinds of bugs, since these languages are usually very forgiving and try to account for user error where possible.

PHP is extended in C and is built around the Zend Engine, which is the PHP virtual machine. PHP has different SAPIs, or Server APIs, for different web servers and platforms. Among the most popular are probably the Apache httpd module, known as mod_php, and the CGI/FastCGI SAPIs. The difference between the two is basically like running PHP inside your webserver as a part of the webserver program (mod_php) versus running another program alongside your webserver that talks to it through CGI (the Common Gateway Interface), which is what the CGI/FastCGI SAPIs are built around. There are lots of different implementations, but the module running as a part of the webserver has traditionally been considered the faster option. PHP also has a CLI SAPI, which allows you to run PHP directly from the command line. You could use this to build command-line scripts like you would with Bash on *nix systems. However, most people don’t use PHP to build command-line programs. It’s not the most performant programming language, but it works well for things like the web where you want to build dynamic websites or applications: tiny programs that execute for a very short period of time and run independently of one another. When you look into building things like long-running daemons, you usually turn away from PHP and head for languages like C or even Java.

Other General Purpose Languages

There are many languages used both for web development and as general purpose languages that are also dynamically or loosely typed and offer automatic memory management and even web server modules just like PHP. Languages like Python, Perl, and Ruby are also exceedingly popular and quite similar to PHP in many ways, though they are not all based on the same generational languages. Of course shell scripting is also going to fit under general purpose in most cases, so Bash, sed, AWK, etc. are also great languages to know.

To some people’s surprise, javascript is now becoming somewhat of a general purpose language itself. Recent runtimes like Node.js, built on Google’s V8 engine, make using javascript outside the browser faster and a little more powerful than some of its earlier ancestors. One of the best things about javascript is its non-blocking nature and event-driven capabilities. It’s a great language for automating event-driven tasks by setting up listeners and such. It’s got a lot of uses on the web and offers multiple paradigms as well.
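If the idea of "setting up listeners" is new to you, here is a minimal sketch of the pattern. It's written in Python to stay consistent with the other examples on this page, and the EventEmitter class is a toy I made up for illustration, not Node.js’s actual API. It also runs everything synchronously, so it only shows the listener half of the story, not the non-blocking part.

```python
class EventEmitter:
    """A toy emitter: register callbacks by event name, then fire them."""

    def __init__(self):
        self._listeners = {}

    def on(self, event, callback):
        # Remember the callback under the given event name.
        self._listeners.setdefault(event, []).append(callback)

    def emit(self, event, *args):
        # Call every listener registered for this event, in order.
        for callback in self._listeners.get(event, []):
            callback(*args)

log = []
emitter = EventEmitter()
emitter.on("message", lambda text: log.append("got: " + text))
emitter.on("message", lambda text: log.append("length: " + str(len(text))))

emitter.emit("message", "hello")
emitter.emit("unknown")  # no listeners registered, so nothing happens
print(log)
```

You describe what should happen when something occurs and let the emitter decide when to call you, instead of writing one long sequence of instructions.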

Beyond

Beyond just looking at what all of these programming languages can do for you, it’s important to realize one language isn’t always enough to do what you need. If you’re going to start learning a programming language it’s easier to pick one that won’t require a lot of time to set up and configure. Something like Python or PHP or even javascript is easy to just install and start writing code in, and the best part is you can run that code instantly without having to compile anything and see the result right away. These languages aren’t very hard to learn because they have a lot of free online resources and documentation, and a lot of people already use them, so you shouldn’t have too much trouble finding quick tutorials or examples of code that show you how to write short and useful programs. But of course your mileage may vary!

Over time, when you have learned your first programming language very well you may find the need to do some things that aren’t always very easy or even possible with that language (or you may never experience this depending on the language and what you’re doing). This may lead you to start using another language in place of or along side of that language for a similar project or a different project. If you’re a hobbyist doing this for fun you might not be so inclined to learn more languages, but if you’re a professional you will probably need to learn many languages over the years. It doesn’t hurt to have a long list of programming languages on your resume for a job and it certainly won’t hurt to already have some experience with a language you’ll be using on a new project at work. However, most programmers will be quite proficient in just two or three languages and have some overall understanding of others. This is usually all you need in the majority of cases.

Browsing the Web

When you browse the web today, compared to just 12 years ago, you find it’s a much more competitive marketplace. There wasn’t much commercialization taking place on the web in the mid to late 90s. eBay and amazon were just about it, as far as consumer-based competition was concerned. The rest of the dot com start-ups were trying to make their money by reaching large audiences very quickly and profiting from that reach through advertisers, at the time the parties most interested in said reach.

Some Things Never Change

Surprisingly enough though, some things haven’t changed much since. Many of these companies are still trying to gain a larger portion of the advertising market share through the web. When you consider some of the most popular websites on the web today, namely Google, facebook, yahoo, twitter, and probably still myspace, you realize all of these companies have something in common. They’re all trying to make a buck by getting the attention of parties interested in spending lots of ad dollars. Who are these advertisers? Pretty much any entity that is willing to pay money for exposing its advertisements to people. Google seems to have made the biggest stride in this area, with more than 98% of its revenues coming directly from advertising over the last decade. Facebook won’t disclose any real financial data publicly, yet. So there’s no telling for sure if they are making any significant profits from their advertising revenues just yet. Yahoo at some point had seemed to be succeeding in gaining a significant portion of this market share, but apparently has bombed recently as its stock has plummeted and its advertising business has slowly started to fall apart. As for twitter and myspace, neither offers much public financial disclosure, but, much like facebook, while they appear to be quite popular they don’t appear to have made any significant strides in advertising either (myspace still getting the bulk of its advertising from Google Adsense).

Looks Can Be Deceiving

Sometimes looks can be quite deceiving, though. For instance, when you consider some of the raw numbers that these companies claim, head held high, like the number of users they serve, or the number of people they employ, or the amount of revenue they generate, not all of it is put into clear perspective.

For example, facebook recently claimed it had 800 million users. This is indeed a huge number of users. If you were to compare it to the average website on the web today or even the average corporation, this makes it seem like a huge company. But facebook only employs around 2,000 people and none of their users pay a single dime for using the service. The service is free, so clearly 800 million people aren’t facebook’s customers. Then again what does facebook actually do for its 800 million users? Well, we know a lot about what they don’t do. They certainly don’t write all of the software that their users come back for so often. There are tens of thousands of facebook developers that either develop facebook apps or build on top of the facebook platform just for the benefit of having access to the huge social graph that gathers under the umbrella that is facebook. So nothing about their software actually makes their service unique per se, just that they have access to a huge userbase which interests a lot of parties. They also don’t do much of anything significant with what software they have built. If we consider their largest and probably most significant work it would have to be Cassandra. HipHop would probably rank second. Their SDK is poorly documented and poorly supported, for the most part. Photos? Well, they do claim to have the largest photo sharing application on the web, but then again flickr isn’t far behind. And it’s not like facebook photos does anything really interesting with the actual photos apart from tagging (clearly their most popular feature). Other sites have actually done a far better job with photo sharing than facebook even though they may not be as popular.
So in all reality these numbers, once put into some perspective, actually indicate pretty negative things about facebook even though it tries to put them in a positive light with a flashy spin (in hope that this will only boost their already growing popularity of course). The reality, however, is that this only indicates how under-staffed, under-paid, and over-committed facebook really is.
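To put the staffing claim into one number, here is a quick back-of-the-envelope calculation in Python. It uses only the two figures quoted above (800 million users, roughly 2,000 employees), and both are claims rather than audited numbers, so treat the result as a rough ratio.

```python
# Figures as quoted in this post; both are approximate claims.
users = 800_000_000
employees = 2_000

users_per_employee = users // employees
print(f"{users_per_employee:,} users per employee")
```

That works out to something like 400,000 users for every single person on the payroll.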

Here are some charts to demonstrate what these numbers actually indicate with some contrast for visualization.

Even if we consider every single user on facebook to be a customer of facebook, as a company, this would mean they are gaining the attention of about half the number of customers Microsoft currently attracts with its dominant Windows operating system market share. They are neck and neck with Google in terms of users, but facebook claims to have surpassed Google in pageviews. There are probably very good reasons as to why that is. Let’s not forget people who visit facebook are usually there to do a lot of things. Chat with friends, browse every photo their friend ever took of their toe-nail, armpit, mirror-shot, bar-hopping-night, passed-out-magic-marker-art, and just about every other pointless moment their friends have ever captured on camera and uploaded to facebook for the world to see, not to mention the hundreds of wall-posts, messages, and I-Like hits the site gets every day from all of its users. In a single second facebook will probably have received around 2,000 photo uploads from its users. In about the time it has taken me to write this blog post they will probably have gotten around 3 million new photo uploads. Each one of those uploads will generate a page view. Each time you play farmville or use an app that lets you know when your friend made a funny face or tried to somehow, through the powers of dark magic, poke you over TCP/IP, that also generates what facebook considers a page-view.
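If you want to check how those two upload figures fit together, the arithmetic is short. This is a rough Python sketch that assumes the rate of about 2,000 uploads per second stays constant, which it almost certainly doesn’t.

```python
# Both figures are the rough estimates quoted above.
uploads_per_second = 2_000
total_uploads = 3_000_000

seconds = total_uploads / uploads_per_second
minutes = seconds / 60
print(f"{seconds:.0f} seconds, or about {minutes:.0f} minutes")
```

So 3 million uploads at that rate is roughly 25 minutes of facebook’s day.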

The point I’m trying to make here is that facebook is clearly a content-provider. But just like Google, they aren’t the ones producing all of the content. However, unlike Google, facebook does actually facilitate a place for you to store and share all of this content and encourages it profusely by not setting a lot of stringent restrictions on how much content you can share.


If we took the amount of content users shared as an indication that a company was powerful in reach, however, we would have to say Hotmail is probably more of a social networking tool than facebook and twitter combined. Consider that Hotmail has only around 364 million users and they share billions of emails every day. This would indicate more active sharing than facebook has ever openly claimed, even though Hotmail has less than half the userbase. Additionally, if we compare the number of developers that actually work on supporting the service these users are making use of, facebook clearly has a higher user-to-developer load ratio than any of these other companies.

Stepping Outside of the Browser for a Minute

So it’s not clear that twitter, facebook, myspace or any of these other very popular social networking sites are actually proving to be nearly as profitable as they are popular, despite clearly being a great tool for people to communicate. The main reason for this is the advertising risk. Twitter, facebook, and myspace, all clearly strong in numbers (any company that has to support a site with users in the millions is nothing to laugh at), are still weak in strategy. There’s no good reason for a serious advertiser (with billions of dollars to invest) to make long-term commitments to a user base that may very well disappear in a few years.

If you think of what Twitter is really all about, for example, it reminds me of when we used to pass short little notes around in class during high school on tiny snippets of paper with messages like “see you at lunch…” or something silly like that. You basically send messages in tiny bite-sized pieces to people who choose to read them. This is nothing fascinating. The only reason the service is even popular is because a lot of people, at one point, found their friends using it (just like myspace and facebook evolved) and decided to use it as well. That’s the thing about these social-based sites. They are easy to gather around and just as easy to turn away from. Eventually someone gets bored and a collapse of the social graph brings down the whole thing.

Google, on the other hand, doesn’t actually want to keep you within its borders all day long. It actually wants to get you the information you were looking for and out the door as quickly as possible with little to no distraction along the way. That’s quite the opposite of what facebook aims to do, because facebook understands its only value is in how long its users choose to stay. So they will do everything in their power to keep their users there longer. Google knows its users only come back because they know something else is out there to be found and that Google is probably going to find it for them more quickly (since it’s done so numerous times in the past with great success).

If you’re looking for a nice read on the subject of Google, by the way, I’d recommend this title right off my recent bookshelf: Googled: The End of the World as We Know It by Ken Auletta.

It’s Not Browsing, It’s Searching

It makes sense that someone would turn to their friends or colleagues to find out which of the latest blockbuster movies is popular when deciding on purchasing a DVD or paying a visit to the movie theater, for example. But it might not make a whole lot of sense that someone would spend their time searching facebook when they’re looking for information on World War II history, or a photo of the point contact transistor at Bell Labs in 1947. People are probably more likely to turn to Google or Wikipedia for that type of search, even though I have no doubt people are starting conversations in facebook groups or posting messages on people’s walls with very similar questions as you read this. To be fair, however, facebook and twitter probably do have good uses in searching for information as well. For example, you might hear about some new trend or some recent local, national, or even global event that just took place on one of these social networking sites, but it’s not like you can’t hear about that on the news either.

It’s just people’s naturally insatiable curiosity that won’t stop them from asking questions or looking for answers. This is why Google has become so popular today. They found the ultimate way to keep users coming back for more. Facebook, while it appears to have done the same thing, apparently does it for all the wrong reasons. Google’s mission is to solve the problem of search (people are always asking questions about something) by making use of the world’s information as it becomes available (i.e. the web, books, newspapers, television, etc…). However, facebook’s mission (while not even clear at this stage) seems to publicly say it’s attempting to do something fancy with something called the “social graph” (yes I really put a quote-unquote on that). However, internally, facebook developers aren’t really sure what they’re doing. They seem to blog a lot about all these neat new technologies the company is getting its hands on, but what about this social graph? What does it do? How does it really make our lives better? Because we can communicate with our friends? No, that can’t be it. We’ve been communicating with our friends long before facebook ever showed up. Is it because facebook makes it easier to see what our friends are doing? No, that can’t be it either, because our friends still have to show us what they’re doing for us to see it (so instead of you seeing your friend getting drunk at the bar you get to see them in a photo on facebook drunk at the bar). Clearly, that isn’t making anyone’s life better (at least not yet). Is it because facebook offers a platform where you can connect millions of people and extract large quantities of useful information out of this so-called social graph? Well, to whose benefit is that, exactly?
I don’t know about you, but I really don’t want everyone having access to all of my information just because I want to use their app to see what books my friends are reading (and yes, there are very few to virtually no restrictions on how much information each app you use can access about you). I really don’t care to know how many people poke how many other people every day instead of getting off the computer or pulling their eyes away from their smart phone for a minute to take a look at the people in their immediate presence for a change. I don’t find anything about that useful, do you? Perhaps some might, but who? It’s probably not the average joe. It would probably be someone with a serious vested interest in you. Someone like, say, a company that produces hundred dollar designer jeans and seeks yet another way to pry into your personal details to figure out just how much influence they can muster to get you and all of your friends to buy those really expensive jeans that will just make your butt look spectacular.

At the end of the day, the numbers speak for themselves.

But, you’re just browsing the web. It’s not like any of this occurs to the average person on a regular basis as they “just browse the web”…

Viral Videos and the Web

There’s a lot of power in viral videos on the web today. You can express an opinion or a thought and deliver it to millions of people around the world with just a few minutes or even seconds of video. YouTube didn’t become popular because it did something revolutionary with video or because it developed any significant technology that made video better on the web during its early days. In fact, it grew too large too quickly to keep up with the demand of its users, and thus took the deal with Google to secure funding and the backing of a web company that had the infrastructure necessary to expand the service. YouTube did, however, put the power of speech (or in this case video) back into the hands of the average person.

Television networks had been broadcasting what they saw fit for decades before the web and the Internet ever came along. They do some market research, try to figure out what people want to watch and what forms of entertainment are most in demand, and then try to figure out a way to produce that content and broadcast it so that they can make a profit. There’s a key difference between that and what YouTube did. Sure, there are plenty of users on YouTube that will still try to submit some copyright-infringing video clip that some big-time production studio will try to get taken off, but there are also plenty of videos on YouTube that are completely user-generated content. Take “Charlie Bit My Finger”: as of today that video has gotten more than 377 million hits in just around four and a half years. That’s an average of 2 to 3 hits per second for four and a half years. So why are people clicking on Charlie Bit My Finger two or three times per second for years to watch a child and an infant in a home video? For the same reasons millions of people watched America’s Funniest Home Videos on broadcast network television for decades. Except that it isn’t America’s Funniest Home Videos anymore, it isn’t owned by any network, and there isn’t a TV Guide listing for when the show will air. There also isn’t a studio stepping in to edit your video. Individual users choose to share their own videos and the whole world decides for themselves if they’d like to watch.
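To show where the “2 to 3 hits per second” figure comes from, here is a quick back-of-the-envelope check in Python. It is only a sketch: the function name is mine, and the 365.25-day year is an assumption used to turn years into seconds.

```python
# Rough check of the average view rate quoted above.
# Assumption: a year is 365.25 days long (only used for unit conversion).
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def hits_per_second(total_hits: float, years: float) -> float:
    """Return the average number of hits per second over the given period."""
    return total_hits / (years * SECONDS_PER_YEAR)


# 377 million hits over four and a half years, as quoted in the text.
rate = hits_per_second(377e6, 4.5)
print(f"{rate:.2f} hits per second")
```

The result lands between 2 and 3, which matches the estimate in the text.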

Now, I’m going to assume that since hundreds of thousands of people have voted this video up over the years (or “liked” the video more than they “disliked” it) that the general audience finds this entertaining. But we didn’t have to pay an executive a six-figure salary and hire a marketing team to spend tens of thousands of man hours to figure this out in order to get there. It just works… Because people let it work.

So viral videos do have a significant impact on the web today. We apparently like watching other people, or at least watching some creative expression of what they have to say, whether that’s serious, comedic, or something else entirely. Sites like GoAnimate.com are pretty popular today, allowing you to easily make and distribute your own animated videos and take your blog viral. I just wanted to include a brief demonstration of how easy it is to put your own videos on the web these days. It took me exactly 3 minutes to sign up for a free account at GoAnimate.com, produce this video, and post it on my blog. But shooting your own home videos can be just as easy.



GoAnimate.com: Facebook Changes Everything by GoogleGuy


How Much Storage Does the World Really Need?

The question of exactly how much storage space humanity needs is a rather tough one to answer, because both our needs and our technologies change on a rather erratic basis. However, I believe that a single Yottabyte of storage space is sufficient to store all of the world’s data. In order to thoroughly explain how I arrived at this answer I’ll have to show you where technology – around storage space – began, where it has arrived today, and further explore possible avenues it may head down in the future.

A few observations illustrate how our needs and technologies change and, with them, our estimate of how much digital storage we need. As an example, compression was not as prevalent in daily computer usage just a few decades ago as it is today. Now, due to the advances in compression algorithms, we can store images with millions of pixels and video at incredibly high resolutions in just a fraction of the space it took only 15 years ago. Not only that, but we also rely on compression to transmit data faster, over networks and between devices. In the 1990s not everyone cared to store or watch video on their computers, and people consumed nowhere near as much video media as they do today. DVDs replaced VHS cassettes in just a few years and CD players have largely been displaced by hand-held devices like the iPod. So let’s review some history of storage technology and then jump into modern-day storage.

Punch-cards to Magnetics to Flash

Punch cards (or punched cards) actually trace back to the 19th century, long before the invention of the modern PC, and were still used for computer storage well into the 20th century. They are rather cumbersome to both produce (write) and consume (read). They would not be capable of efficiently storing anywhere near the amount of data the average person stores today. Later, as computers became more and more widespread in everyday use, we moved to magnetic tapes. The problem with this type of storage medium is that it’s slow and prone to failure. Just a few decades ago we came up with a more resilient form of storage, also based on magnets, called HDDs (or hard-disk drives), which are made up of magnetic platters and heads that read and write to those platters. They were a lot more durable and lasted far longer than magnetic tapes, but even more recently, in the past decade or so, we came up with flash drive technology. Its name hints at how it works, and it derives from the same technology as the flash memory chips that hold your BIOS and other firmware. Data is stored by moving electrons onto or off an insulated gate inside each memory cell, and the presence or absence of that charge represents the stored bits. Inexpensive consumer flash drives are cheap to buy, but they also tend to be slow to write and they wear out after a limited number of write cycles.
Because flash drives wear down more quickly over time due to the nature of how they are built, we normally don’t rely on them for long-term or mission-critical storage. Let’s put it this way: you won’t find banks relying on flash drives to store their financial data any time soon. However, because storage is cheap and easily replaceable, we can expect that the average person will replace their hard drive every three to four years, if not more often. This is about how long it can take for a hard drive to fail or start showing signs of failure.
The future may be in flash, but unless flash can deliver the speed and reliability that modern storage needs demand, it may itself be replaced by some other future technology. It’s hard to say where we will end up, but at present SSDs and SATA II hard drives offer the speed and reliability most of us require.

What is a Yottabyte

First things first: if you’re not very computer savvy, a yottabyte is 2^80 bytes by the binary convention (strictly speaking that is a yobibyte; the SI yottabyte is 10^24 bytes), or to put this in terms some of us may be more familiar with, it’s equivalent to about 1 trillion terabytes. To give you a perspective of just how much storage space that is, consider that the average notebook or netbook device comes factory-standard with around 200-500 GB (gigabytes) of storage space and the average consumer desktop PC usually comes with a standard 500 GB to 1 TB of storage. Since it’s not unusual for many of us to own both a laptop (or notebook computer) and a desktop PC, we can assume that the average person normally has around a terabyte of storage space at their personal disposal from their combined personal computer devices. Consider that many of us also own smart phones these days, given the growth of the smart-phone market, and these devices too can come factory-ready with several dozen GB of storage. This means that it’s not unrealistic for the average person to consume more than a TB of storage just for personal use.
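To make the “about 1 trillion terabytes” figure concrete, here is a small Python sketch that works it out under both the binary and the decimal convention. The constant names are my own and are there purely for illustration.

```python
# Size of a "yottabyte" under the two common conventions.
YOBIBYTE = 2 ** 80    # binary convention (the 2^80 figure used in the text)
YOTTABYTE = 10 ** 24  # SI (decimal) convention

TEBIBYTE = 2 ** 40    # binary "terabyte"
TERABYTE = 10 ** 12   # SI terabyte

# How many terabytes fit in a yottabyte, in each convention.
binary_count = YOBIBYTE // TEBIBYTE
decimal_count = YOTTABYTE // TERABYTE

# How much larger the binary unit is than the decimal one.
ratio = YOBIBYTE / YOTTABYTE

print(f"{binary_count:,} TiB per YiB")
print(f"{decimal_count:,} TB per YB")
print(f"binary / decimal = {ratio:.3f}")
```

Either way you count it, the answer is on the order of a trillion terabytes. The binary unit is roughly 21% larger than the decimal one.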

According to census estimates from the US and other governments, there are about seven billion people in the world today. Now, not all of them have access to or are capable of using computer devices. However, if we pretend that every man, woman and child on earth had access to a computer and was capable of using one, we can allocate a specific portion of storage space for their personal consumption based on some modern usage statistics we’ve come to know today. If we divide the yottabyte of storage space I estimated for humanity equally amongst every person on earth, we can allocate around 157-158 terabytes (counted in binary units) of storage space for everyone’s personal consumption. This is roughly two orders of magnitude more than the terabyte or so most people have access to in storage today.
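The per-person figure is easy to reproduce. This is a minimal sketch, assuming a round world population of seven billion and a yottabyte of 2^80 bytes. The variable names are mine.

```python
# Split one yottabyte (binary convention, 2**80 bytes) evenly across the world.
WORLD_POPULATION = 7_000_000_000  # assumption: a round seven billion people
TOTAL_BYTES = 2 ** 80

per_person_bytes = TOTAL_BYTES / WORLD_POPULATION

# The same share expressed in binary (TiB) and decimal (TB) terabytes.
per_person_tib = per_person_bytes / 2 ** 40
per_person_tb = per_person_bytes / 10 ** 12

print(f"{per_person_tib:.1f} TiB per person")
print(f"{per_person_tb:.1f} TB per person")
```

The 157-158 figure comes out when you count in binary terabytes. In decimal terabytes the same share is a bit over 170.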

What Does Everyone Store?

Everyone’s needs are different, but basically we can break down storage by the types of media consumed on a regular basis.

  • Photos
  • Video
  • Audio
  • Software programs such as applications, games, etc…
  • Documents or raw data

This pretty much covers the broad categories we can fit data into when we consider personal usage. Everybody loves taking pictures today. That’s evident from the 800 million Facebook members who upload photos to share online at a rate of over 100 million photos per day. People also love watching video on their computers. That’s also evident from the hundreds of millions of videos being uploaded to YouTube every month. There’s no doubt we love games and gaming consoles too, based on the millions of popular game consoles sold over the last few years. Another thing we seem to love almost unanimously is documents, like email. Whether you store word documents on your computer for school, work, or other miscellaneous uses, you are likely to transmit them from one device to another at any given time. We share text messages, IMs, and video conferences on a regular basis. Many of us store tabular data at work, like spreadsheet documents. Of course everyone is in need of utility software like word processors, spreadsheet programs like Excel, presentation software, operating system software, video or photo editing software, and much more in order to make use of all this data being stored. So, if we evenly break up the allocated storage space per person into these 5 broad categories (157 divided by 5 is a little over 31), we can say that roughly 32 TiB of physical storage should be plenty to retain each of these types of data.

Putting it into Perspective

To give you a more comprehensible picture of what this amount of storage can hold let’s look at it in relative terms we use today. Let’s see what we can store on 32 Terabytes of space:

  • Photos
    • We can store more than 4 million 20 Megapixel high-resolution photos
      • Consider that an iPhone or most smart-phone cameras are only capable of around 2 to 5 megapixels and that a 2 megapixel image can fill a screen resolution of 1920×1080 pixels.
  • Video
    • We can store more than 7,000 hours of 1080p Bluray digital video.
    • Similarly that’s around 16,000 hours of DVD-quality (480p) video, or about 8,000 DVD movies.
  • Audio
    • We can store roughly 50,000 hours of CD-quality Audio
    • With MP3 or AAC compression that can translate to around 150,000 hours. So basically you can store a music library of several hundred thousand titles with this amount of space. That’s likely more than most studios and radio stations keep on hand at a single location.
  • Software
    • We can also store thousands of computer programs and games – if not tens of thousands (more than the average person usually stores today).
    • Similarly this is like being able to store Microsoft Windows 7 — 2,000 times over
  • Documents
    • You can potentially store several million, or even billion, documents with this amount of space. Documents vary greatly in size, depending of course on how much data you store in each one, but we can still safely assume that if the average person stores no more than a few megabytes of data per document, this gives us plenty of room to store an archive of documents for an entire lifetime.
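The estimates in the list above can be roughly reproduced with a few lines of Python. Treat this as a sketch, not a measurement. The photo size and the two video bitrates are my own assumptions, picked because they land near the figures above. Real Blu-ray discs often use higher bitrates, which would lower the hour count. Only the CD audio rate (44.1 kHz, 16-bit, stereo) is a fixed standard.

```python
# How much media fits in a 32 TiB budget, under stated assumptions.
BUDGET_BYTES = 32 * 2 ** 40  # 32 TiB per category

# Assumed sizes and bitrates (illustrative only; real files vary widely).
PHOTO_20MP_BYTES = 8 * 10 ** 6    # assumption: ~8 MB per 20-megapixel JPEG
HD_VIDEO_BPS = 10 * 10 ** 6       # assumption: ~10 Mbit/s for 1080p video
DVD_VIDEO_BPS = 5 * 10 ** 6       # assumption: ~5 Mbit/s for DVD-quality video
CD_AUDIO_BPS = 44_100 * 16 * 2    # CD audio: 44.1 kHz, 16 bits, 2 channels


def item_count(budget_bytes: int, item_bytes: int) -> int:
    """Number of whole items of a given size that fit in the budget."""
    return budget_bytes // item_bytes


def hours_of_media(budget_bytes: int, bits_per_second: int) -> float:
    """Hours of a constant-bitrate stream that fit in the budget."""
    return budget_bytes * 8 / bits_per_second / 3600


photos = item_count(BUDGET_BYTES, PHOTO_20MP_BYTES)
hd_hours = hours_of_media(BUDGET_BYTES, HD_VIDEO_BPS)
dvd_hours = hours_of_media(BUDGET_BYTES, DVD_VIDEO_BPS)
cd_hours = hours_of_media(BUDGET_BYTES, CD_AUDIO_BPS)

print(f"{photos:,} photos")
print(f"{hd_hours:,.0f} hours of 1080p video")
print(f"{dvd_hours:,.0f} hours of DVD-quality video")
print(f"{cd_hours:,.0f} hours of CD-quality audio")
```

Change any of the assumed sizes and the counts shift accordingly, but the orders of magnitude stay the same.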

Now, I don’t know about you, but I actually don’t even have the time to take 4 million photos, let alone the time to spend looking through them. At an average rate of just 1 second per photo, viewing a photo album of that size alone would require nearly 7 weeks of looking through photos without time to eat, sleep, or even go to the bathroom. Extrapolate that to the amount of time it would take to snap that many pictures, store them, upload them, edit and organize them, and we’re probably looking at a decade’s worth of photo archives for a single person. That’s plenty by my estimates. If you consider compression and storing these photos at even lower resolutions, we can push that number to nearly 100 million photos without sacrificing too much in quality.
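Here is the viewing-time arithmetic spelled out as a quick Python sketch, using the numbers from the paragraph above (4 million photos at one second each, around the clock).

```python
# Time needed to look at every photo once, non-stop.
PHOTOS = 4_000_000
SECONDS_PER_PHOTO = 1

total_seconds = PHOTOS * SECONDS_PER_PHOTO
days = total_seconds / 86_400  # 86,400 seconds in a day
weeks = days / 7

print(f"{days:.1f} days, or {weeks:.1f} weeks")
```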

Looking Into the Future

The actual problem with storage isn’t that we don’t have enough of it or that we can’t afford to get more, because storage is actually very cheap. To give you an idea, I bought a 2TB external USB hard drive a few weeks ago for just under $100, and you can get even faster storage media in the terabytes at incredibly low prices today. In fact, HDDs cost a small fraction of what they did just 8 years ago. The problem of storage today is that we don’t have the infrastructure in place to make it accessible fast enough. Consider that if you have a camera phone you may also have a digital camera and even a camcorder. Now you likely also have a desktop or laptop computer to which you transfer your photos and videos. It’s not enough to store this data locally; we upload and share our photos and videos online as well. On top of that, hard drives are prone to failure. It’s no longer a question of if a disk will fail, but when. So we create backups in multiple locations in order to preserve our data. Now we have the problem of data synchronization. How do you know which device or location stores the most up-to-date version of the file you’ve transmitted from one location to another? This is where some say storing your data in the cloud makes sense. It’s because data centers are built with redundant storage like RAID arrays that can sync quickly and consistently. They also have disaster recovery plans in place, with offsite or tape backup facilities where necessary.
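To illustrate the synchronization question, here is a deliberately simple Python sketch. It compares copies of a file by content hash to see whether they agree and uses the last-modified time to guess which one is newest. The `FileCopy` type and the location labels are hypothetical, and real sync tools are more careful than this, since clocks on different devices can disagree.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileCopy:
    """One copy of a file held somewhere (hypothetical type for illustration)."""
    location: str    # label such as "laptop" or "cloud"
    modified: float  # last-modified time as a Unix timestamp
    content: bytes   # the file's contents


def digest(copy: FileCopy) -> str:
    """SHA-256 hash of the contents; equal hashes mean equal contents."""
    return hashlib.sha256(copy.content).hexdigest()


def in_sync(copies: list[FileCopy]) -> bool:
    """True when every copy has identical contents."""
    return len({digest(c) for c in copies}) == 1


def newest(copies: list[FileCopy]) -> FileCopy:
    """The copy with the latest modification time (trusts each device's clock)."""
    return max(copies, key=lambda c: c.modified)


copies = [
    FileCopy("laptop", 100.0, b"draft v1"),
    FileCopy("phone", 200.0, b"draft v2"),
    FileCopy("cloud", 150.0, b"draft v1"),
]
print(in_sync(copies))          # the three copies do not all match
print(newest(copies).location)  # the phone holds the latest edit
```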

It’s not just about hardware capability either; software capability plays an equally important role in keeping data in sync. When I store my word documents on Google Docs I have the freedom to modify and restore them from virtually any location or device as long as I have access to the Internet. I can even share them and modify them with others in real time and retain a revision history. This is great and it’s fast, but word documents are cheap. They don’t take up much space. We can’t say the same for other media like audio or video, which tend to take up far more space. If you don’t have a high-bandwidth connection you can’t exactly stream DVD-quality video in real time. What’s more, you won’t find it feasible to transfer several hundred DVDs even with high-speed Internet like Verizon FiOS and Comcast’s Xfinity (capable of up to 50 Mbps download and 10 Mbps upload). Even with these speeds it’d take you several days to back up a few dozen of your favorite DVD movies to an online backup facility.
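To get a feel for the upload times involved, here is a rough Python estimate. It is a sketch under stated assumptions: three dozen discs, a single-layer DVD size of 4.7 GB, and a link that sustains its full advertised 10 Mbps upload with no protocol overhead. Real transfers will be slower.

```python
# Estimated time to upload a DVD collection over a home connection.
def transfer_days(num_discs: int, disc_gb: float, link_mbps: float) -> float:
    """Days needed to push num_discs of disc_gb gigabytes through the link."""
    total_bits = num_discs * disc_gb * 10 ** 9 * 8
    seconds = total_bits / (link_mbps * 10 ** 6)
    return seconds / 86_400


# Assumptions: 36 discs, 4.7 GB each, 10 Mbps sustained upload.
single_layer = transfer_days(36, 4.7, 10)
# Same collection if every disc were a dual-layer 8.5 GB DVD.
dual_layer = transfer_days(36, 8.5, 10)

print(f"{single_layer:.1f} days for single-layer discs")
print(f"{dual_layer:.1f} days for dual-layer discs")
```

Even in this best case the upload takes days, and that is for a few dozen discs, not several hundred.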

Google introduced an idea to make truly high-speed Internet in the home possible. I’m not just talking about the 50 or 100 Megabit speeds that the cable companies and ISPs are offering today, but up to 1 Gigabit of bandwidth for the average person. Let me remind you that even some small businesses don’t have this type of bandwidth capability today.

However, with that kind of bandwidth capacity you’d be able to stream Blu-ray quality video in 1080p and simultaneously transfer several terabytes of data over the Internet every day with ease. In fact, those speeds are so fast that if they were any faster your hard drive’s write speeds might not be able to keep up with your network transfer rates.
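A quick Python sketch shows what a 1 Gigabit link could move in a day if it ran flat out. That is an idealized assumption with no overhead, and the function names are mine.

```python
# What a fully saturated link can move, ignoring protocol overhead.
def megabytes_per_second(link_gbps: float) -> float:
    """Link speed converted from gigabits per second to megabytes per second."""
    return link_gbps * 10 ** 9 / 8 / 10 ** 6


def terabytes_per_day(link_gbps: float) -> float:
    """Terabytes transferred in 24 hours at the given sustained speed."""
    bytes_per_second = link_gbps * 10 ** 9 / 8
    return bytes_per_second * 86_400 / 10 ** 12


rate_mb = megabytes_per_second(1)
daily_tb = terabytes_per_day(1)
print(f"{rate_mb:.0f} MB/s sustained, about {daily_tb:.1f} TB per day")
```

At that rate the link delivers data about as fast as a typical consumer hard drive can write it, which is the point about disks struggling to keep up.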

Being able to share data has its advantages. Similarly, choosing between local and online storage involves trade-offs that we are only beginning to tackle. Whether we will make a permanent move to the cloud or not remains to be seen. What we do know so far is that no single data center in the world has exceeded the Zettabyte threshold for storage capacity; not even Google! A Zettabyte is 1,000 Exabytes or 1 million Petabytes, and a Petabyte is 1,000 Terabytes. So in order to reach a Yottabyte of world storage space we’d have to have 1,000 or more data centers each holding about a Zettabyte of storage. That’s not accounting for personal storage devices like your laptops, desktops, smart phones, etc… There are no concrete figures for me to prove whether or not the world has indeed reached or exceeded one Yottabyte in storage space with all these devices and data centers combined, simply because it’s hard to say who’s actually using what amount of space on their personal storage devices at any given time… However, it’s been estimated that the world passed the Zettabyte mark last year, in 2010, based on data center and Internet usage metrics and the number of personal computing devices sold over that time period. That leaves us one prefix step, or three orders of magnitude, away from the next threshold.
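If the prefixes are starting to blur together, this small Python sketch lays out the decimal ladder and checks the ratios mentioned above. The helper name `bytes_for` is my own.

```python
# The SI (decimal) storage prefixes, each 1,000 times the previous one.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def bytes_for(prefix: str) -> int:
    """Number of bytes in one unit of the given decimal prefix."""
    return 1000 ** (PREFIXES.index(prefix) + 1)


# Ratios referred to in the text.
exa_per_zetta = bytes_for("zetta") // bytes_for("exa")
peta_per_zetta = bytes_for("zetta") // bytes_for("peta")
zetta_per_yotta = bytes_for("yotta") // bytes_for("zetta")

for name in PREFIXES:
    print(f"1 {name}byte = 10^{len(str(bytes_for(name))) - 1} bytes")
print(f"{zetta_per_yotta:,} zettabytes make one yottabyte")
```

So a thousand zettabyte-sized data centers would add up to a single yottabyte.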

With that said, I don’t think we will continue to consume greater amounts of storage space on a global level once we’ve reached the Yottabyte threshold. I think at that threshold technology will have developed ways to consume that data more easily without the need for growing storage. This is based on things like memory expansion, the growing trends of cloud storage and Internet bandwidth increases, and even compression ratios. I wrote an article just a few months ago about how cheap memory has become. It’s due to this dramatic drop in the cost of memory over the years that computers are now able to rely on slower storage like flash, since we can do a lot more caching of files that are read more often. When you read from disk less, you lower I/O rates and disk seeks and ultimately increase performance.
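The caching idea can be shown in a few lines of Python. This is only a toy sketch. The “disk” is a stand-in function that counts how often it is called, and I use the standard library’s `functools.lru_cache` as the in-memory cache. The file names are made up.

```python
from functools import lru_cache

# Counts how many times the simulated disk is actually touched.
stats = {"disk_reads": 0}


def read_from_disk(name: str) -> str:
    """Stand-in for a slow disk read (hypothetical; no real I/O happens)."""
    stats["disk_reads"] += 1
    return f"contents of {name}"


@lru_cache(maxsize=128)
def read_cached(name: str) -> str:
    """Serve repeat requests from memory; only misses reach the disk."""
    return read_from_disk(name)


# Five requests, but only two distinct files.
for name in ["a.txt", "b.txt", "a.txt", "a.txt", "b.txt"]:
    read_cached(name)

info = read_cached.cache_info()
print(f"disk reads: {stats['disk_reads']}, cache hits: {info.hits}")
```

Five requests cost only two trips to the disk. The more memory you can spare for the cache, the more often slow storage gets skipped.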

For example, because it’s actually faster and cheaper for me to search for the exact information I need on Google at the time I need it, rather than store millions of documents covering a broad assortment of information and then seek that information when it’s needed, I don’t bother consuming that storage myself. Instead I let a company that has entire data centers filled with machines take on the cost of the bandwidth, storage space, and compute power necessary to collect and sort that information for me. Google relies on building massive networks of low-cost commodity computers that can cache the indexes that make it possible to produce such instantaneous results for your searches at any given time. This spares me a few petabytes of data and produces even more accurate and speedy results than I would have attained on my own. Ultimately it’s saving me storage space, and this is one of the reasons I believe we will one day reach a point where the concern is no longer storage, but how quickly we can transmit the data being stored.

I may want to save a few PDF documents or web pages here and there from Wikipedia on a particular topic I find interesting for my research, but ultimately I will never need to store the entirety of Wikipedia on my local machine. Mostly because I will never have the time to read through it all. Besides, it’s faster for me to search and read only the bits and pieces I need at that particular time over the web than it would be for me to do it locally, even if I did store all that information (remember, Wikipedia is one such organization, with data centers powerful enough to out-perform the read/write/compute capabilities of my tiny little machine).