Hypertable has been introduced: an open source, distributed database modeled on Google's Bigtable, but available to anyone. It scales databases to meet any need.

http://www.linux-mag.com/id/6645

Traditional database technology such as MySQL works well so long as your data fits on a single machine. As soon as your data exceeds the capacity of a single machine, the problem becomes far more difficult. Hypertable is an open source, distributed database, specifically designed to overcome this scaling barrier. It is modeled after Google's Bigtable, which has been successfully deployed at Google for several years and underpins many of their major services. This technology was designed to be massively scalable and, to that end, certain traditional database features were sacrificed, most notably transactions and table joins. Though the lack of support for these features makes Hypertable unsuitable for certain classes of applications, such as financial applications, there is a large set of Web applications for which this technology is well suited.

System Overview

The Hypertable data model consists of a multi-dimensional table of information that can be queried using a single primary key. The first dimension of the table is the row key. The row key is the primary key and defines the order in which the table data is physically stored. The second dimension is the column family. This dimension is somewhat analogous to a traditional database column. The third dimension is the column qualifier. Within each column family, there can be a theoretically infinite number of qualified instances. For example, if we were building a URL tagging service, we might define the column families content, url, and tag. Within the "tag" column family there could be an infinite number of qualified instances, such as tag:science, tag:theater, tag:good, etc. The fourth and final dimension is the time dimension.
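As a rough illustration of the four dimensions listed above, the following Python sketch models a cell key as a tuple and stores the URL tagging example in an ordinary in-memory dict. It is not Hypertable's API or storage format: `CellKey`, `scan_row`, `qualifiers` and the sample rows are hypothetical names invented for this example.

```python
from typing import NamedTuple


class CellKey(NamedTuple):
    """Hypothetical key for one cell; field names are illustrative only."""
    row: str               # row key: the primary key, defines storage order
    column_family: str     # e.g. "content", "url", "tag"
    column_qualifier: str  # e.g. "science" in tag:science ("" if unused)
    timestamp: int         # time dimension; small integers stand in here


# Toy table for the URL tagging example: cell key -> value.
table = {
    CellKey("example.com/b", "tag", "science", 2): "1",
    CellKey("example.com/a", "tag", "good", 1): "1",
    CellKey("example.com/a", "url", "", 1): "http://example.com/a",
    CellKey("example.com/a", "tag", "theater", 3): "1",
}


def scan_row(table, row):
    """Return the (key, value) pairs of one row, in sorted key order."""
    return [(k, v) for k, v in sorted(table.items()) if k.row == row]


def qualifiers(table, row, family):
    """List the qualifiers stored under one column family of a row."""
    return [k.column_qualifier for k, _ in scan_row(table, row)
            if k.column_family == family]


print(qualifiers(table, "example.com/a", "tag"))
```

Sorting the keys places all cells of a row next to each other, which mirrors the statement that the row key defines the physical storage order. The last field of the key stands for the time dimension.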
The time dimension consists of a timestamp that is usually auto-assigned by the system and represents the insertion time of the cell in nanoseconds since the epoch. Conceptually, a table in Hypertable can be thought of as a three-dimensional Excel spreadsheet with timestamped versions of each cell. This data model is more versatile than that used by Distributed Hash Table (DHT) technology in that it supports efficient traversal of elements in primary key order.

The Hypertable system is made up of several components: some number of RangeServers, a Master, the Client Library, and Hyperspace. RangeServers are responsible for managing ranges of physical table data. In general, there will be a RangeServer running on each machine in the cluster. There is a single Master that is responsible for meta operations such as table creation/deletion and range assignment. Client data does not move through the Master, so temporary Master failures do not affect typical client operations such as scanning and updating. Though there is a single Master, the system has been designed to support hot standbys. The Client Library is what gets linked into an application to give it access to the Hypertable system; it provides APIs for creating, updating, scanning, and deleting tables. Hyperspace is somewhat analogous to Google's Chubby service (see labs.google.com/papers/chubby.html). It is a distributed lock manager and provides a global filesystem for storing small amounts of metadata. At present, Hyperspace is implemented as a single server, but it will be distributed and highly available in a future release.

Scaling: How it is Achieved

The key to Hypertable's ability to scale is the way it manages table data. Tables are broken into a set of contiguous row ranges, each of which is managed by a RangeServer. Initially each table consists of a single range that spans the entire row key space.
As the table fills with data, the range will eventually exceed a size threshold (200 MB by default) and split into two ranges, using the middle row key as the split point. One of the ranges will stay on the same RangeServer that held the original range and the other will be reassigned to another RangeServer by the Master. This splitting process continues for all of the ranges as they continue to grow. Active ranges consume some amount of system resources (e.g. memory and CPU) on the RangeServer machine. As load increases on the cluster as a whole, new machines can be added to provide more capacity.

[...]
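The range-splitting scheme described above can be sketched in a few lines of Python. This is a toy, single-process model, not Hypertable code: the `Range` and `Table` classes, the `pick_server` stand-in for the Master, the choice to keep the lower half on the original server, and the tiny threshold in the usage lines are all assumptions made for illustration. Only the 200 MB default and the split at the middle row key come from the text.

```python
import bisect

SPLIT_THRESHOLD = 200 * 1024 * 1024  # 200 MB default quoted in the article


class Range:
    """Hypothetical row range: rows with key <= end_row (None = unbounded)."""

    def __init__(self, end_row, server, rows=None):
        self.end_row = end_row
        self.server = server
        self.rows = dict(rows or {})  # row key -> stored bytes

    def size(self):
        return sum(self.rows.values())


class Table:
    def __init__(self, servers, threshold=SPLIT_THRESHOLD):
        self.servers = list(servers)  # needs at least two servers
        self.threshold = threshold
        # Initially a single range spans the entire row key space.
        self.ranges = [Range(None, self.servers[0])]

    def find_range(self, row):
        """Binary-search the bounded end rows for the range owning `row`."""
        ends = [r.end_row for r in self.ranges[:-1]]  # last one is unbounded
        return bisect.bisect_left(ends, row)

    def pick_server(self, exclude):
        """Stand-in for the Master: other server holding the fewest ranges."""
        load = {s: 0 for s in self.servers if s != exclude}
        for r in self.ranges:
            if r.server in load:
                load[r.server] += 1
        return min(load, key=load.get)

    def insert(self, row, nbytes):
        i = self.find_range(row)
        rng = self.ranges[i]
        rng.rows[row] = rng.rows.get(row, 0) + nbytes
        if rng.size() > self.threshold and len(rng.rows) > 1:
            self.split(i)

    def split(self, i):
        old = self.ranges[i]
        keys = sorted(old.rows)
        mid = keys[(len(keys) - 1) // 2]  # middle row key is the split point
        lower = {k: v for k, v in old.rows.items() if k <= mid}
        upper = {k: v for k, v in old.rows.items() if k > mid}
        # Assumption: the lower half stays put, the upper half is reassigned.
        moved_to = self.pick_server(exclude=old.server)
        self.ranges[i:i + 1] = [
            Range(mid, old.server, lower),
            Range(old.end_row, moved_to, upper),
        ]


# Usage with an artificially small threshold so a split happens quickly.
table = Table(["rs1", "rs2", "rs3"], threshold=100)
for key in ["a", "b", "c", "d"]:
    table.insert(key, 30)
print([(r.end_row, r.server) for r in table.ranges])
```

Keeping the ranges sorted by their end row lets a lookup find the owning range with a binary search, which is one simple way to route a row key to the server responsible for it.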