Tuesday, August 28, 2007

Amazon S3 for Video Hosting

In an earlier post I wrote that the next YouTube could be built on Amazon’s Web Services (AWS) by two guys in a dorm for roughly $0. Hosting videos on Amazon S3 is indeed very easy.

All you need to do is to sign up for an Amazon S3 account and upload your videos with the Amazon S3 Firefox Organizer (available here: S3Fox). To see this in action check out this interesting Amazon S3 video hosting tutorial by Perry Lawrence on YouTube.

You can simply include the hosted files in your web pages. However, it is also possible to create nice web applications or mashups using the video content on S3. The book Amazon.com Mashups by Francis Shanahan will help developers get familiar with Amazon Web Services. The examples in this book demonstrate how to integrate AWS with APIs from Yahoo!, eBay, Google and YouTube. Check out this sample chapter on the Amazon Simple Storage Service (S3) available at Wrox.
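
To make "include the hosted files in your web pages" a bit more concrete, here is a minimal Python sketch. It assumes the uploaded object is publicly readable and uses the virtual-hosted-style S3 address (bucket name as a subdomain of s3.amazonaws.com). The bucket name, object key and link title are made up for illustration.

```python
from html import escape
from urllib.parse import quote


def s3_public_url(bucket: str, key: str) -> str:
    """Build the virtual-hosted-style URL of a publicly readable S3 object."""
    # Keep "/" unescaped so keys with folder-like prefixes stay readable.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


def video_link_html(bucket: str, key: str, title: str) -> str:
    """Return an HTML link that points at the hosted video file."""
    url = s3_public_url(bucket, key)
    return f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'


if __name__ == "__main__":
    # Hypothetical bucket and key, for illustration only.
    print(video_link_html("my-video-bucket", "demos/intro clip.flv", "Intro clip"))
```

The same URL can be handed to whatever video player your page uses; the only S3-specific part is how the address is put together.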

It has never been easier to create your own YouTube!

Thursday, August 23, 2007

Building Smart Web 2.0 Applications

Successful internet companies have learned the hard way how to use the collective intelligence of users:
  • Amazon, Netflix and others have built recommendation engines
  • Google has invented a unique page ranking system based on links
In his new book, Programming Collective Intelligence: Building Smart Web 2.0 Applications, Toby Segaran teaches the secrets of harnessing the power of user generated content. This practical book takes you into the world of machine learning and statistics, and explains how to draw conclusions about user experience, marketing, personal tastes, and human behavior in general. Each algorithm is described clearly and concisely with code that can immediately be used on your web site, blog, Wiki, or specialized application. The book explains:
  • Collaborative filtering techniques that enable online retailers to recommend products or media
  • Methods of clustering to detect groups of similar items in a large dataset
  • Search engine features -- crawlers, indexers, query engines, and the PageRank algorithm
  • Optimization algorithms that search millions of possible solutions to a problem and choose the best one
  • Bayesian filtering, used in spam filters for classifying documents based on word types and other features
  • Using decision trees not only to make predictions, but to model the way decisions are made
  • Predicting numerical values rather than classifications to build price models
  • Support vector machines to match people in online dating sites
  • Non-negative matrix factorization to find the independent features in a dataset
  • Evolving intelligence for problem solving -- how a computer develops its skill by improving its own code the more it plays a game
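
To give a feel for the first item on the list, collaborative filtering, here is a small Python sketch of user-based recommendations. It is not code from the book: the ratings are invented, and cosine similarity is just one possible way to compare users, chosen here for brevity.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: score on a 1-5 scale}.
RATINGS = {
    "alice": {"A": 5.0, "B": 3.0, "C": 4.0},
    "bob": {"A": 4.0, "B": 2.0, "C": 5.0, "D": 4.0},
    "carol": {"A": 1.0, "B": 5.0, "D": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Rank items the user has not rated by similarity-weighted average score."""
    totals, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue
        for item, score in other_ratings.items():
            if item not in ratings[user]:
                totals[item] = totals.get(item, 0.0) + sim * score
                weights[item] = weights.get(item, 0.0) + sim
    ranked = [(totals[item] / weights[item], item) for item in totals]
    return sorted(ranked, reverse=True)


if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

For "alice" the only unseen item is "D", and its predicted score leans towards bob's rating because bob's tastes are closer to hers than carol's.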

Wednesday, August 22, 2007

Web 2.0 meets Telco 2.0

The closed telecommunications business model is in trouble. Large operators find it difficult to introduce new profitable services using their expensive infrastructure. Some of them have realized this and are rushing to enable 3rd parties to implement new services by publishing interfaces to their networks. The prime example is British Telecom, which has made major investments in implementing its open 21st Century Network. Several APIs are now available for messaging, call control and mobile applications to create interesting mashups.

The Web21C SDK is a set of libraries that makes it simple for developers to consume Web Services exposed by BT. Many services can be accessed easily:
  • Messaging - send SMS messages
  • Voice call - place phone calls from applications
  • Conference call - place and control conference calls
  • Location - determine the geographic location (latitude, longitude, altitude) of a mobile device
  • Authentication - create and control authentication realms for applications, including management and authentication of users
  • Inbound SMS - receive and process SMS text messages from any mobile network
  • Contacts - give users the ability to build and maintain a list of buddies and set their availability
  • Information about me (IAM) - store and retrieve data about an individual in key value pairs
The Web21C SDK allows product and service developers, from major enterprises to one-man programmer start-ups, to integrate their new applications with BT services in a single line of code.

Competitive pricing model: Web21C SDK services are charged through a credit system with each service having an associated credit cost per invocation.
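
A credit system like this is easy to budget for. The Python sketch below shows the arithmetic only; the service names and credit costs are hypothetical placeholders, since the actual Web21C price list is not reproduced here.

```python
# Hypothetical credit cost per invocation (NOT the real Web21C price list).
CREDIT_COSTS = {"sms": 2, "voice_call": 5, "location": 3}


def credits_needed(usage, costs):
    """Total credits for a usage plan given as {service: number of invocations}."""
    return sum(costs[service] * count for service, count in usage.items())


if __name__ == "__main__":
    # 100 text messages and 10 location lookups with the placeholder costs above.
    print(credits_needed({"sms": 100, "location": 10}, CREDIT_COSTS))
```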

"No other telco in the world is doing this," says Dirk Wood of the Web21C team. "BT is ahead of the game. This product shows that BT is in the business of becoming a fundamental part of the new world where innovation is agile and can originate from anywhere. This is central to what web 2.0 really means, it's where all services are going and we're right at the leading edge of this movement."

ProgrammableWeb has a great how-to article on the big picture of telephony APIs and mashups.

Tuesday, August 21, 2007

Disruption in Telecom: Voice & Messaging

The proliferation of the internet and wireless broadband connectivity is disrupting the telecom industry status quo. Innovative IP based voice and messaging services challenge traditional phone services.

The Telco 2.0 blog (STL) has teamed with Telecom TV to cover this topic in a very interesting panel discussion. Their Chief Analyst Martin Geddes explains how the value proposition of telephony is likely to invert over the next 10-15 years:
  • access was in short supply (traditional telco)
  • users' time and attention are in short supply (next wave)
  • value is shifting from providing access to serving the real needs of users
    (usability, relevance, transactions, trust)
Keith Wallington represents a disruptor on the panel: Truephone. They offer free software that brings Internet/Wi-Fi based VoIP to mobile phones. That means free or very cheap mobile calls and messaging. He explains:
"the ability to charge a human being ... for a minute or a second of air time will tend to zero"

So the future is in rich services that create real value for the users. The user is the king!

Check out the full video here:
Telco 2.0 on TV: Disruption & Innovation in Voice & Messaging.

Friday, August 17, 2007

Top 10 APIs for Web Mashups

Mashups are web applications that combine data from two or more external sources into an integrated experience. Content used in mashups is sourced from a third party via a public interface or API. Many people are experimenting with mashups using Microsoft, Google, eBay, Amazon, Flickr, and Yahoo APIs, which has led to the creation of Mashup Editors.
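
At its core a mashup is a join between data sets that come from different places. Here is a minimal Python sketch of that idea. The two dictionaries stand in for responses from two external APIs (say, a photo service and a geographic names service); the records are sample data and no real API is called.

```python
# Stand-ins for responses from two external APIs (sample data, no real calls).
PHOTOS = [
    {"title": "Harbour at dusk", "place": "Sydney"},
    {"title": "Old town square", "place": "Prague"},
    {"title": "Unknown beach", "place": "Atlantis"},
]
COORDINATES = {
    "Sydney": (-33.87, 151.21),
    "Prague": (50.08, 14.44),
}


def mash_up(photos, coordinates):
    """Attach coordinates to each photo; skip photos that cannot be located."""
    combined = []
    for photo in photos:
        location = coordinates.get(photo["place"])
        if location is None:
            continue
        lat, lon = location
        combined.append({**photo, "lat": lat, "lon": lon})
    return combined


if __name__ == "__main__":
    for record in mash_up(PHOTOS, COORDINATES):
        print(record)
```

The combined records are what you would then plot on a map, which is the classic shape of a photo-plus-maps mashup.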

ProgrammableWeb is a great resource on mashups. Based on their latest stats the top 10 APIs for web 2.0 mashups of all time are:

However, recently popular mashups also use these APIs:
  • Twitter - What are you doing?
  • Skype - create applications that work together with Skype
  • GeoNames - eight million geographical names available under a free Creative Commons licence
Check out the full list with ~500 APIs on ProgrammableWeb and their tutorial on how to make your own web mashup!

Wednesday, August 15, 2007

Erlang: The Programming Language for Multicore CPUs

The future of computing is going to be concurrent. Sun has just announced the UltraSPARC T2 CPU, which they call the World's fastest microprocessor. With 8 cores and 64 threads it makes a true system on a chip. Even desktop systems have multi-core processors nowadays. However, traditional software is not well prepared to effectively utilize a large number of cores.

Concurrent programming is hard, and most programming languages do not make it any easier. Erlang, on the other hand, is ideally positioned for this new world. It was designed from the ground up to take advantage of parallel and multi-core architectures.

Erlang is a concurrent functional programming language and runtime system. It was designed to support distributed, fault-tolerant, soft-real-time, non-stop applications. Erlang was originally a proprietary language within Ericsson, but was released as open source in 1998.

Erlang programs usually scale very well on multi-core systems. Joe Armstrong explains why:

"Back in the old days (20 odd years ago) there were two models of concurrency:
  • Shared state concurrency
  • Message passing concurrency
Now the whole world went one way (towards shared state), and we went the other"

The Erlang concurrency model differs from other languages by not having any shared state. If a process wants to communicate with another, it does so by sending messages. This method scales better than methods that use shared memory for communication.
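
The message-passing style can be illustrated outside Erlang too. The Python sketch below is only an analogy: Python threads are not Erlang processes, and the function names are my own. The point is that the worker keeps its state private and the rest of the program talks to it purely through messages on queues.

```python
import queue
import threading


def counter_process(inbox: queue.Queue, outbox: queue.Queue) -> None:
    """Own a private total and change it only in response to messages."""
    total = 0  # local state, never touched by any other thread
    while True:
        message = inbox.get()
        if message == "stop":
            outbox.put(total)  # reply with the result, then terminate
            return
        total += message


def run_demo() -> int:
    inbox, outbox = queue.Queue(), queue.Queue()
    worker = threading.Thread(target=counter_process, args=(inbox, outbox))
    worker.start()
    for n in range(1, 6):
        inbox.put(n)  # send a message instead of modifying shared memory
    inbox.put("stop")
    result = outbox.get()
    worker.join()
    return result


if __name__ == "__main__":
    print(run_demo())
```

Because nobody else can reach `total`, there is nothing to lock, which is the property that makes this model comparatively easy to spread over many cores.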

You can learn much more about Erlang in Joe Armstrong's new book:
Programming Erlang: Software for a Concurrent World

It's very readable and does not require prior experience with functional languages. The book is packed with examples and encourages experimenting; in fact the first chapter explains the installation of Erlang. A reviewer calls it "the most important programming language book this decade".

To have some fun check out this article and video from 1990: Erlang Now!

More resources:

Tuesday, August 14, 2007

Open Source Realtime 3D Engine for Flash

Papervision3D is a high performance 3D engine for Flash. It features linear texture mapping, optimized for rendering speed and quality. It has been designed to be simple and easy to use.

Papervision3D is in public beta and available on Google Code under the open source MIT License. The project has a Wiki and a Blog as well.

Check out these stunning demos to get a glimpse of the possibilities:

Papervision3D based rich internet applications would look amazing on these real 3D displays!

Make Voice Calls from the Browser

Skype made phone calls over the internet easy. Ribbit plans to take it one step further by enabling VoIP phone calls directly from the browser. The Flex-based RibbitPhone Component runs solely on the Flash Player, so no additional downloads are necessary. It will give Rich Internet Applications the ability to make and receive calls, record, send and receive voicemail, and add and manage contacts, enabling true ‘one-click calling’.

The RibbitPhone pre-release is expected on 3rd September and the public beta release is planned for 3rd October. Can Flash based VoIP finally appear in Rich Internet Applications?

Ribbit's services and product road map will be unveiled for the first time today (August 13th, 2007) at 9AM (PST), during the keynote at 360 Flex in Seattle, Washington.

Friday, August 10, 2007

Google Offers up to 250GB Storage for GMail and Picasa!

With the Google shared storage plan, you can purchase additional storage for your files, pictures, or emails. Google services (e.g. email and photos) will share your single new storage space. The initial price offering:
  • 6 GB ($20.00 per year)
  • 25 GB ($75.00 per year)
  • 100 GB ($250.00 per year)
  • 250 GB ($500.00 per year)
This extra storage acts as overflow when you run out of free storage space in either service. More details here: Google Paid Storage
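
A quick calculation shows how the yearly price per gigabyte drops with the larger plans. The small Python sketch below uses only the sizes and prices from the list above; the function name is my own.

```python
# Plan sizes in GB and yearly prices in USD, taken from the list above.
PLANS = {6: 20.00, 25: 75.00, 100: 250.00, 250: 500.00}


def price_per_gb(plans):
    """Return {size_gb: yearly price per GB}, rounded to cents."""
    return {size: round(price / size, 2) for size, price in plans.items()}


if __name__ == "__main__":
    for size, unit_price in price_per_gb(PLANS).items():
        print(f"{size} GB plan: ${unit_price:.2f} per GB per year")
```

That works out to roughly $3.33, $3.00, $2.50 and $2.00 per GB per year for the four plans.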

Thursday, August 9, 2007

Web 3.0 Defined by Google CEO

Google CEO Eric Schmidt was recently asked to define Web 3.0. He gave a great definition:

"My prediction would be that Web 3.0 will be ultimately seen as applications that are pieced together" - with the characteristics that the applications are:
  • relatively small
  • the data is in the cloud (or network)
  • can run on any device (PC or mobile phone)
  • very fast and very customizable
  • distributed virally (by social networks or e-mail)
That's a very different application model than we have ever seen in computing.
There are low barriers to entry - applications are easy to develop and work everywhere.

Here is Schmidt's full Web 3.0 definition on YouTube.

Monday, August 6, 2007

Amazon EC2, S3 + Hadoop = Open Source Utility Computing on Google Scale

Google is the undisputed King of Scalability. The High Scalability blog collected the open secrets of the Google Architecture. Two important components are:
These distributed technologies are the foundations of their scalable and reliable storage and processing clusters.
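
On the processing side, one model Google has published is MapReduce. To show its shape, here is a single-process Python sketch that counts successful hits per path in a few web access log lines. It is an illustration only: it does not use any Hadoop API (Hadoop jobs are normally written in Java), and the log lines and their simplified layout are invented.

```python
from collections import defaultdict

# Hypothetical log lines in a simplified "client method path status" layout.
LOG_LINES = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /video/42 200",
    "10.0.0.1 GET /video/42 200",
    "10.0.0.3 GET /missing 404",
]


def map_phase(line):
    """Emit (path, 1) for each successful request."""
    client, method, path, status = line.split()
    if status == "200":
        yield path, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Sum the counts collected for one path."""
    return key, sum(values)


def count_hits(lines):
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_hits(LOG_LINES))
```

In a real cluster the map and reduce functions run on many machines in parallel and the framework takes care of the grouping step; the functions themselves stay about this simple.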

The open source Hadoop project implements these functions so you can build a Google-like cluster in your data center. In case you do not have hundreds of servers at your disposal, ask Amazon for help. The Amazon EC2 and S3 utility computing services can host your cluster at a reasonable price. Check out this tutorial by Tom White who illustrates how to use Hadoop and Amazon Web Services together using a large collection of web access logs:
Yahoo has recently announced its support for Hadoop.

"Looking ahead and thinking about how the economics of large scale computing continue to improve, it's not hard to imagine a time when Hadoop and Hadoop-powered infrastructure is as common as the LAMP (Linux, Apache, MySQL, Perl/PHP/Python) stack that helped to power the previous growth of the Web."

Thursday, August 2, 2007

YouTube Architecture Unveiled

YouTube has grown rapidly to serve 100+ million videos per day. How could they manage this incredible growth? What is the architecture behind YouTube that supports this extreme scalability?

In this Google Tech Talk Cuong Do discusses the scalability challenges that have arisen during YouTube's short but extraordinary history. Cuong is currently an engineering manager at YouTube/Google. He was part of the engineering team that scaled the YouTube software and hardware infrastructure from its infancy to its current scale.

Interesting bits:
  • Initial team consisted of 2 sysadmins, 2 architects, 2 developers, 2 network engineers and a DBA
  • Based on Apache, Python and MySQL
  • Most popular content is moved to a CDN (content delivery network)
  • Many more details in the video and notes...
The notes of the Tech Talk are also available on the High Scalability Blog.

Wednesday, August 1, 2007

Internet Map Shows Connection Density and City-to-City Links

What does the Internet look like? How does it evolve? The DIMES project aims to collect data to answer these questions by mapping the structure and topology of the internet. It is a distributed scientific project with thousands of participating volunteers. You can also join by downloading the DIMES agent here.

Chris Harrison has spectacular maps based on the data collected by the DIMES project. The Internet Map shows connection density and city-to-city connections. It is also available in high resolution PNG.