As part of implementing the many products and services offered by Google, we have built a collection of systems and tools that simplify the storing and processing of large-scale data sets, and the construction of heavily used public services based on these data sets. These systems are intended to work well in Google's computational environment, which consists of large numbers of commodity machines connected by commodity networking hardware. Our systems handle issues like storage reliability and availability in the face of machine failures, and our processing tools make it relatively easy to write robust computations that run reliably and efficiently on thousands of machines. In this talk I'll highlight some of the systems we have built, and discuss some challenges and future directions for new systems.
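The processing tools mentioned in the abstract follow a MapReduce-style programming model: the programmer supplies a map function that emits key/value pairs and a reduce function that combines values sharing a key, and the system handles distribution and failures. Below is a minimal single-machine sketch of that model in Python; the word-count task, function names, and toy input are illustrative and not Google's actual API.

```python
from collections import defaultdict

def map_phase(documents):
    """Map step: emit a (word, 1) pair for each word in each document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    """Reduce step: sum the counts for each distinct word."""
    counts = defaultdict(int)
    for word, count in pairs:
        counts[word] += count
    return counts

if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog"]
    print(dict(reduce_phase(map_phase(docs))))  # {'the': 2, 'quick': 1, ...}
```

In a real system the map and reduce steps run in parallel across many machines, with the framework partitioning the intermediate pairs by key; the sketch above collapses all of that into two in-process function calls.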
Twitter's large-scale software distribution for production servers using BitTorrent.
Relational databases, such as MySQL, PostgreSQL, and various commercial products, have served us well for many years. Lately, however, there has been much discussion about whether the relational model is reaching the end of its lifespan, and what may come after it.
Felix von Leitner's original paper on BSD and Linux I/O performance.
Dan Kegel's updated paper on high-performance network I/O programming on Unix.
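For a concrete flavor of the event-driven style these papers survey, here is a minimal non-blocking echo server using Python's standard selectors module, which wraps epoll or kqueue where the OS provides them. The port number and echo behavior are illustrative only; this is a sketch of the readiness-notification pattern, not production code.

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(server):
    """Accept a new connection and register it for read readiness."""
    conn, _ = server.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    """Echo received bytes back; unregister on EOF."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data)  # a real server would buffer partial writes
    else:
        sel.unregister(conn)
        conn.close()

server = socket.socket()
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("localhost", 9000))
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ, accept)

while True:
    # Block until some socket is ready, then invoke its stored callback.
    for key, _ in sel.select():
        key.data(key.fileobj)
```

A single thread multiplexes all connections here, which is exactly the property that makes this pattern scale to thousands of mostly idle sockets where a thread-per-connection design would not.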
Nice slides about Ruby on Rails and scaling.