4.19.2004

- Kill Bill 2 is most awesome! I think it's the fastest 2 hrs ever spent in a movie hall.
QT - I salute you!
Rating: 5/5

4.12.2004

Just got tickets to an Itzhak Perlman concert. Going on the 25th! Awesome!

The Secret Source of Google's Power
(from: http://blog.topix.net/archives/000016.html)

Much is being written about Gmail, Google's new free webmail system. There's something deeper to learn about Google from this product than the initial reaction to the product features, however. Ignore for a moment the observations about Google leapfrogging their competitors with more user value and a new feature or two. Or Google diversifying away from search into other applications; they've been doing that for a while. Or the privacy red herring.
No, the story is about seemingly incremental features that are actually massively expensive for others to match, and the platform that Google is building which makes it cheaper and easier for them to develop and run web-scale applications than anyone else.

I've written before about Google's snippet service, which required that they store the entire web in RAM. All so they could generate a slightly better page excerpt than other search engines.

Google has taken the last 10 years of systems software research out of university labs, and built their own proprietary, production quality system. What is this platform that Google is building? It's a distributed computing platform that can manage web-scale datasets on 100,000 node server clusters. It includes a petabyte, distributed, fault tolerant filesystem, distributed RPC code, probably network shared memory and process migration. And a datacenter management system which lets a handful of ops engineers effectively run 100,000 servers. Any of these projects could be the sole focus of a startup.



Speculation: Gmail's Architecture and Economics
Let's make some guesses about how one might build a Gmail.

Hotmail has 60 million users. Gmail's design should be comparable, and should scale to 100 million users. It will only have to support a couple of million in the first year though.

The most obvious challenge is the storage. You can't lose people's email, and you don't want to ever be down, so data has to be replicated. RAID is no good; when a disk fails, a human needs to replace the bad disk, or there is risk of data loss if more disks fail. One imagines the old ENIAC technician running up and down the aisles of Google's data center with a shopping cart full of spare disk drives instead of vacuum tubes. RAID also requires more expensive hardware -- at least the hot swap drive trays. And RAID doesn't handle high availability at the server level anyway.

No. Google has 100,000 servers. [nytimes] If a server/disk dies, they leave it dead in the rack, to be reclaimed/replaced later. Hardware failures need to be instantly routed around by software.

Google has built their own distributed, fault-tolerant, petabyte filesystem, the Google Filesystem. This is ideal for the job. Say GFS replicates user email in three places; if a disk or a server dies, GFS can automatically make a new copy from one of the remaining two. Compress the email for a 3:1 storage win, then store each user's email in three locations, and the raw storage need is approximately equal to the user's mail size.
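The storage arithmetic above can be sketched in a few lines of Python. This is only an illustration of the guess in the text, not how GFS actually accounts for space; the function name and its parameters are made up for the example, and the defaults (3:1 compression, three replicas) are the figures assumed in the paragraph.

```python
def raw_storage_gb(mail_gb, compression_ratio=3.0, replicas=3):
    """Raw disk used for one user's mail after compression and replication.

    compression_ratio and replicas default to the article's guesses
    (3:1 compression, three copies); neither is a confirmed GFS setting.
    """
    if compression_ratio <= 0 or replicas < 1:
        raise ValueError("compression_ratio must be > 0 and replicas >= 1")
    # Compression shrinks the data, replication multiplies it back up.
    return mail_gb * replicas / compression_ratio


if __name__ == "__main__":
    # With 3:1 compression and 3 copies, the two effects cancel out.
    print(raw_storage_gb(1.0))
```

With the default values, one gigabyte of mail costs roughly one gigabyte of raw disk, which is the point the paragraph makes: the replication is paid for by the compression.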

The Gmail servers wouldn't be top-heavy with lots of disk. They need the CPU for indexing and page view serving anyway. No fancy RAID card or hot-swap trays, just 1-2 disks per 1U server.

It's straightforward to spreadsheet out the economics of the service, taking into account average storage per user, cost of the servers, and monetization per user per year. Google apparently puts the operational cost of storage at $2 per gigabyte. My napkin math comes up with numbers in the same ballpark. I would assume the yearly monetized value of a webmail user to be in the $1-10 range.
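A rough version of that spreadsheet, as a Python sketch. The $2 per gigabyte and the $1-10 yearly value per user come from the paragraph above; everything else is an assumption for illustration, in particular the average storage per user (a hypothetical 0.1 GB here) and treating the $2 figure as a cost per gigabyte per year, which the text does not specify.

```python
def yearly_margin_per_user(avg_storage_gb, cost_per_gb, revenue_per_user):
    """Yearly revenue per user minus the storage cost of that user's mail."""
    return revenue_per_user - avg_storage_gb * cost_per_gb


def break_even_storage_gb(cost_per_gb, revenue_per_user):
    """Storage per user at which storage cost equals yearly revenue."""
    return revenue_per_user / cost_per_gb


if __name__ == "__main__":
    COST_PER_GB = 2.0        # from the text; assumed here to be per year
    AVG_STORAGE_GB = 0.1     # hypothetical average, not from the text
    for revenue in (1.0, 10.0):  # the $1-10 range guessed in the text
        margin = yearly_margin_per_user(AVG_STORAGE_GB, COST_PER_GB, revenue)
        limit = break_even_storage_gb(COST_PER_GB, revenue)
        print(f"revenue ${revenue:.2f}: margin ${margin:.2f}, "
              f"break-even at {limit:.1f} GB per user")
```

Under these assumptions the service breaks even somewhere between 0.5 GB and 5 GB of stored mail per user, so the result depends heavily on how much of their quota people actually fill.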


Cheap Hardware
Here's an anecdote to illustrate how far Google's cultural approach to hardware cost differs from the norm, and what that means as a component of their competitive advantage.

In a previous job I specified 40 moderately-priced servers to run a new internet search site we were developing. The ops team overrode me; they wanted 6 more expensive servers, since they said it would be easier to manage 6 machines than 40.

What this does is raise the cost of a CPU second. We had engineers that could imagine algorithms that would give marginally better search results, but if the algorithm was 10 times slower than the current code, ops would have to add 10X the number of machines to the datacenter. If you've already got $20 million invested in a modest collection of Suns, going 10X to run some fancier code is not an option.

Google has 100,000 servers.

Any sane ops person would rather go with a fancy $5000 server than a bare $500 motherboard plus disks sitting exposed on a tray. But that's a 10X difference in the cost of a CPU cycle. And the cheaper option frees up the algorithm designers to invent better stuff.

Without cheap CPU cycles, the coders won't even consider algorithms that the Google guys are deploying. They're just too expensive to run.
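The cost argument in this section is simple multiplication, sketched below in Python. The dollar figures are the ones quoted above; the function names are invented for the example, and the sketch assumes a $500 box and a $5000 server deliver the same CPU throughput, a simplification the anecdote implies but does not state.

```python
def hardware_cost_after_slowdown(current_investment, slowdown):
    """Hardware spend needed to serve the same load with code `slowdown` times slower."""
    return current_investment * slowdown


def cost_per_cycle_ratio(expensive_server, cheap_server):
    """Relative cost of a CPU cycle, assuming equal throughput per server."""
    return expensive_server / cheap_server


if __name__ == "__main__":
    # A 10X slower algorithm on a $20 million install base.
    print(hardware_cost_after_slowdown(20_000_000, 10))
    # $5000 server versus $500 bare motherboard.
    print(cost_per_cycle_ratio(5000, 500))
```

The two numbers mirror each other: a shop on expensive hardware needs ten times the budget to run a ten times slower algorithm, while a shop paying a tenth as much per cycle can run the same algorithm for what the first shop already spends.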

Google doesn't deploy bare motherboards on exposed trays anymore; they're on at least the fourth iteration of their cheap hardware platform. Google now has an institutional competence building and maintaining servers that cost a lot less than the servers everyone else is using. And they do it with fewer people.

Think of the little internal factory they must have to deploy servers, and the level of automation needed to run that many boxes. Either network boot or a production line to pre-install disk images. Servers that self-configure on boot to determine their network config and load the latest rev of the software they'll be running. Normal datacenter ops practices don't scale to what Google has.

What are all those OS Researchers doing at Google?
Rob Pike has gone to Google. Yes, that Rob Pike -- the OS researcher, the member of the original Unix team from Bell Labs. This guy isn't just some labs hood ornament; he writes code, lots of it. Big chunks of whole new operating systems like Plan 9.

Look at the depth of the research background of the Google employees in OS, networking, and distributed systems. Compiler Optimization. Thread migration. Distributed shared memory.

I'm a sucker for cool OS research. Browsing papers from Google employees about distributed systems, thread migration, network shared memory, GFS, makes me feel like a kid in Tomorrowland wondering when we're going to Mars. Wouldn't it be great, as an engineer, to have production versions of all this great research?

Google engineers do!


Competitive Advantage
Google is a company that has built a single very large, custom computer. It's running their own cluster operating system. They make their big computer even bigger and faster each month, while lowering the cost of CPU cycles. It's looking more like a general purpose platform than a cluster optimized for a single application.

While competitors are targeting the individual applications Google has deployed, Google is building a massive, general purpose computing platform for web-scale programming.

This computer is running the world's top search engine, a social networking service, a shopping price comparison engine, a new email service, and a local search/yellow pages engine. What will they do next with the world's biggest computer and most advanced operating system?

4.08.2004

- Ouch! Methinks me hurt me shoulder a little bit. 135 lbs may be chicken feed for all you regular gym rats, but for a novice like me... even that is a big step up from 115. And now it hurts (a bit). Note to self: Never skip the progression.
- The PLAT competition (Paplu.. remember the trading competition I had in Dec?) is back. This time, I have huge problems - it's right towards the end of my submission time... so I can't include much of the results in my report. However, my prof insists I add as much as possible. It's more data, but more work for me :-( ... and if I thought most of the work would be done with my first draft - I've got another thing coming. This is 'hellmonth' for me ;-)

4.07.2004

I'm way behind on my thesis report. It was due today and I'll have to mail my prof(s) and tell him(them) I can only turn it in on Friday at the earliest. Since I have to work on that, my DBI project is down the drain. And I'm working on HW now. I still have to find a bunch of papers for my talk and send them to my DBI prof. Not to mention move this weekend (I don't even know where yet!!). More administrative work to be done. And I have to grade homeworks and present a lecture on Saturday!

BUT I've watched three great soccer games (if only the highlights), executed a good dive in class today, and am doing well in the footy tipping league!

Life is good.
- IT'S HAPPENED AGAIN !!!
4-1 was the first-leg score. AC Milan - holders, superpowers, and the world's best defence - were hammered 4-0 in the second leg by Depo. Man oh man! I was just telling a friend the other day that I should bet 5 bucks on AC Milan, Chelsea and Monaco going through to the semis. The odds would've been so long that I could've made HEAPS... How I wish I'd gone through with it. Dramatic! Unprecedented! Remarkable! I'm running out of adjectives here...
- Sadly, for all the drama, this game wasn't quite as good as the other two. In fact, to say Milan looked off color would be an understatement. They looked like they were playing with many nights of partying and no sleep behind them...
- Oh, and Porto went through with a 2-2 draw (4-2 aggregate).
- Wow! We're going to see a new Champions League winner for sure now.
- 100 Hours! The difference between a potentially magical season with an unprecedented (for the club) three titles in a year - and total European dominance - and a potentially disastrous season, where they will have to fight off disappointment and a lot of nerves, not to mention a surging Chelsea, just to salvage the Premiership. Arsenal has gone from superb to desperate in 100 hours. My favorite English club (and top five among European ones) is out of the Champs' league. Sad! But it's nice to see Ranieri and the boys get some reward.
- Guess who joins Arsenal? RMFC!! Their loan to Monaco came back to bite them in the butt! Morientes scored a crucial second away goal last time around and this time sparked off a resurgence - to take Monaco through on away goals after their two contests ended 5-5 on aggregate. Towards the end of the first half, Real were up 5-2 and surely then, with all their talent, only a brave man would've bet against them. But a happy man he would've been. An eternal optimist... and firm believer in sporting folklore.
- I might have seen two of my fave teams go out - but it makes me happy to be a sports nut - to witness great games like these. It was a good day for the Champions' League.

4.04.2004

- Ok let's see. Friday night dinner (Malaysian), then videogames and movie (Hellboy).
- Hellboy - 3.5/5 (because I like action movies)
Pros: Not too much mush, lots of action, true to the comic book, Selma Blair's power (flames) awesome!
Cons: Little longer than could be, monsters could've been a little more imaginative
- Saturday morning: Family b'day party with cake, croissants and tea. Lunch at Ram's place. Chilled and watched TV all day. Dinner at night (Chuy's). Lots of calls :-) B'day over.
- Back at home... and back to work. NO good!
- Maybe I'll run the Texas Roundup 10k in 2 weeks. Let's see. There's also the 5k challenge next month.

4.02.2004

- 'Running Free' has one of my fave bass lines ... makes you want to get up and pump the drums or the bass!!
- Bond's birthday today. Planned to go to Dallas and have a combined b'day party. But with a deadline of April 7th for a report, 8th for a homework and 9th for a project, don't know if I can! Let's see.
- Love the way Arsenal's playing. Just hope they can go on to at least two of the three trophies they're on target for. Would REALLY like to see them lift the Champs' league. Bergkamp and Henry in particular deserve it. Also, it would REALLY tick off the 'red devils' and their legion. Man oh man... that would be just awesome!
- Where's all my mail (snail mail) disappearing to? Sheesh... this is the problem with moving so often.

4.01.2004

- Ok, it's 8.15 am and I've already had a few people trying to pull a fools' day stunt. Man, people start early! Got a call from a friend saying my advisor does not want to work with me anymore. (yeah, rite! I'll believe you when you tell me he told you and not me!!). Got an email from a guy expressing his "dismay" over the match fixing in Pak (the test series) and how the first test was all screwed up (jeez! give me a break!). Got another email about Yahoo Research Labs wanting to hire me. (Yeah rite! I didn't even apply there... ). Sheesh! If someone doesn't come up with something more exciting soon...