[opensuse-es] [OT] Hypertable

7 Aug 2008

      Hypertable has been released: an open source distributed database
modeled on Google's Bigtable, but available to anyone. It scales
databases to meet any need.

http://www.linux-mag.com/id/6645

Traditional database technology such as MySQL works well so long as
your data fits on a single machine. As soon as your data exceeds the
capacity of a single machine, the problem gets exponentially more
difficult. Hypertable is an open source, distributed database,
specifically designed to overcome this scaling barrier. It is modeled
after Google's Bigtable, which has been successfully deployed at
Google for several years and underpins many of their major services.

This technology was designed to be massively scalable and to that end,
certain traditional database features were sacrificed, most notably
transactions and table joining. Though lack of support for these
features makes Hypertable unsuitable for certain classes of
applications, such as financial applications, there is a large set of
Web applications for which this technology is well suited.

System Overview

The Hypertable data model consists of a multi-dimensional table of
information that can be queried using a single primary key. The first
dimension of the table is the row key. The row key is the primary key
and defines the order in which the table data is physically stored.
The second dimension is the column family. This dimension is somewhat
analogous to a traditional database column. The third dimension is the
column qualifier.

Within each column family, there can be a theoretically infinite
number of qualified instances. For example, if we were building a URL
tagging service, we might define column families content, url, and
tag. Within the "tag" column family there could be an infinite number
of qualified instances, such as tag:science, tag:theater, tag:good,
etc. The fourth and final dimension is the time dimension.

This dimension consists of a timestamp that is usually auto-assigned
by the system and represents the insertion time of the cell in
nanoseconds since the epoch. Conceptually, a table in Hypertable can
be thought of as a three-dimensional Excel spreadsheet with
timestamped versions of each cell. This data model is more versatile
than that used by Distributed Hash Table (DHT) technology in that it
supports efficient traversal of elements in primary key order.
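
To make the four dimensions concrete, here is a minimal in-memory sketch
in Python. It is not the Hypertable API: CellKey, ToyTable, put and scan
are hypothetical names chosen for illustration, and the sketch only
assumes what the article states (a cell is addressed by row key, column
family, column qualifier and a nanosecond timestamp, and cells can be
traversed in row-key order).

```python
import time
from bisect import insort
from typing import NamedTuple


class CellKey(NamedTuple):
    """Hypothetical four-part cell address (not a Hypertable type)."""
    row: str
    family: str
    qualifier: str
    timestamp_ns: int


class ToyTable:
    """In-memory stand-in for a table; cells are kept sorted by key."""

    def __init__(self):
        self._cells = []  # sorted list of (CellKey, value) pairs

    def put(self, row, family, qualifier, value, timestamp_ns=None):
        # Auto-assign a nanosecond timestamp when the caller gives none.
        if timestamp_ns is None:
            timestamp_ns = time.time_ns()
        key = CellKey(row, family, qualifier, timestamp_ns)
        insort(self._cells, (key, value))

    def scan(self, start_row, end_row):
        """Yield cells with start_row <= row < end_row, in row-key order."""
        for key, value in self._cells:
            if start_row <= key.row < end_row:
                yield key, value


# The URL tagging example from the article: one row, several tag qualifiers.
table = ToyTable()
table.put("org.example/home", "url", "", "http://example.org/", timestamp_ns=3)
table.put("com.example/page", "tag", "theater", "1", timestamp_ns=2)
table.put("com.example/page", "tag", "science", "1", timestamp_ns=1)

tags = [key.qualifier for key, _ in table.scan("com", "org")]
```

Because the cells are kept sorted, a range scan is a simple ordered walk,
which is the property a hash-based layout cannot offer.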

The Hypertable system is made up of several components: some number of
RangeServers, a Master, the Client Library, and Hyperspace.
RangeServers are responsible for managing ranges of physical table
data. In general, there will be a RangeServer running on each machine
in the cluster. There is a single Master that is responsible for meta
operations such as table creation/deletion and range assignment.
Client data does not move through the Master, so temporary Master
failures do not affect typical client operations such as scanning and
updating. Though there is a single Master, the system has been
designed to support hot standbys.
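
Since client data does not pass through the Master, a client has to work
out for itself which RangeServer holds a given row key. The article does
not describe how Hypertable does this, so the following Python sketch is
only an assumption-laden illustration: RangeLocator, locate and the
server names are hypothetical, and the range assignments are simply held
in a sorted in-memory list.

```python
from bisect import bisect_right


class RangeLocator:
    """Maps a row key to the server holding its range (hypothetical helper)."""

    def __init__(self, assignments):
        # assignments: (start_row, server_name) pairs. Each range runs from
        # its start_row up to, but not including, the next start_row.
        ordered = sorted(assignments)
        self._starts = [start for start, _ in ordered]
        self._servers = [server for _, server in ordered]

    def locate(self, row_key):
        # The rightmost range whose start_row <= row_key owns the key.
        index = bisect_right(self._starts, row_key) - 1
        if index < 0:
            raise KeyError(f"no range covers {row_key!r}")
        return self._servers[index]


# Three ranges covering the whole key space, one per (made-up) server.
locator = RangeLocator([("", "rs1"), ("g", "rs2"), ("p", "rs3")])
server_for_apple = locator.locate("apple")
server_for_zebra = locator.locate("zebra")
```

Once the client holds such a mapping, reads and writes can go straight to
the owning RangeServer, which is why a brief Master outage need not
interrupt scanning and updating.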

The client library is what gets linked into an application to give it
access to the Hypertable system and provides APIs for creating,
updating, scanning, and deleting tables. Hyperspace is somewhat
analogous to Google's Chubby service (see
labs.google.com/papers/chubby.html). It is a distributed lock manager
and provides a global filesystem for storing small amounts of
metadata. At present, Hyperspace is implemented as a single server but
will be distributed and highly available in a future release.

Scaling: How it is Achieved

The key to Hypertable's ability to scale is the way it manages table
data. Tables are broken into a set of contiguous row ranges, each of
which is managed by a RangeServer. Initially each table consists of a
single range that spans the entire row key space. As the table fills
with data, the range will eventually exceed a size threshold (default
is 200MB) and will split into two ranges using the middle row key as a
split point.

One of the ranges will stay on the same RangeServer that held the
original range and the other will get reassigned to another
RangeServer by the Master. This splitting process continues for all of
the ranges as they continue to grow. Active ranges will consume some
amount of system resources (e.g. memory and CPU) on the RangeServer
machine. As load increases on the cluster as a whole, new machines can
get added to provide more capacity.
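
The split rule described above can be sketched in a few lines of Python.
This is an illustration under stated assumptions, not Hypertable code:
Range and maybe_split are hypothetical names, range size is approximated
as the sum of key and value lengths, and the 200MB default is taken to
mean 200 * 1024 * 1024 bytes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Default threshold from the article, interpreted here as 200 MiB.
SPLIT_THRESHOLD_BYTES = 200 * 1024 * 1024


@dataclass
class Range:
    """A contiguous span of row keys (hypothetical toy structure)."""
    start_row: str                 # inclusive lower bound
    end_row: Optional[str]         # exclusive upper bound, None = unbounded
    rows: Dict[str, bytes] = field(default_factory=dict)

    def size_bytes(self) -> int:
        # Rough size estimate: encoded key length plus value length.
        return sum(len(k.encode("utf-8")) + len(v) for k, v in self.rows.items())


def maybe_split(rng: Range, threshold: int = SPLIT_THRESHOLD_BYTES) -> List[Range]:
    """Return [rng] if it is under the threshold, else two halves
    split at the middle row key."""
    if rng.size_bytes() <= threshold or len(rng.rows) < 2:
        return [rng]
    keys = sorted(rng.rows)
    split_key = keys[len(keys) // 2]
    lower = Range(rng.start_row, split_key,
                  {k: rng.rows[k] for k in keys if k < split_key})
    upper = Range(split_key, rng.end_row,
                  {k: rng.rows[k] for k in keys if k >= split_key})
    return [lower, upper]


# A tiny threshold forces a split so the behaviour is easy to see.
initial = Range("", None, {k: b"x" * 10 for k in ["a", "b", "c", "d"]})
parts = maybe_split(initial, threshold=20)
```

In the system the article describes, one of the two resulting ranges
would stay on the original RangeServer and the Master would assign the
other to a different server; the sketch stops at producing the halves.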

..........................................


Juan Erbes