Most
of us are using Mysql database and majority don’t know how to choose the data
base engines, what are the different types of storage engines available in
mysql and how they differ from each other. In this article let me give you a
brief idea about the Storage Engines and what are the limitations and where to
use these various storage engines.
One
of the greatest things about MySQL, other than being free, widely supported and
fast, is the flexibility of choosing different storage engines for different
tables. These storage engines act as handlers for different table types. Thus
MySQL storage engines include both those that handle transaction-safe tables
and those that handle non-transaction-safe tables along with many others. MySQL
does this through their Pluggable Storage Engine Architecture.
Comparing Transaction and
Nontransaction Engines
MySQL
supports several storage engines that act as handlers for different table
types. MySQL storage engines include both those that handle transaction-safe
tables and those that handle nontransaction-safe tables.
As
of MySQL 5.1, MySQL Server uses a pluggable storage engine architecture that
enables storage engines to be loaded into and unloaded from a running MySQL
server
NOTE:--
Prior to MySQL 5.1.38, the pluggable storage engine architecture is supported
on Unix platforms only and pluggable storage engines are not supported on
Windows
To
determine which storage engines your server supports by using the SHOW ENGINES statement.
The value in the Support column
indicates whether an engine can be used. A value of YES, NO,
or DEFAULT indicates
that an engine is available, not available, or available and currently set as
the default storage engine.
The
listing shows the full list of available database engines.
There
are a number of ways you can specify the storage engine to use. The simplest
method,
if
you have prefer a particular engine type that fits most of your database needs
then you can set the
default
engine type within the MySQL configuration file using the following
commands(using the
option
storage_engine or when starting the database server (by supplying the
–default-storage
engine
or –default-table-type options on the command line).
The
default storage engine for MySQL prior to version 5.5 was MyISAM. For MySQL 5.5
and later,
the
default storage engine is InnoDB. Choosing the right storage engine is an
important strategic decision, which will impact future development. In this
tutorial, we will be using MyISAM, InnoDB, Memory and CSV storage engines. If
you are new to MySQL and your are studying the MySQL database management
system, then this is not much of a concern. If you are planning a production
database, then things become more complicated.
Other
storage engines may be available from third parties and community members that
have used the Custom Storage Engine interface.
NOTE
Third
party engines are not supported by MySQL. For further information,
documentation, installation guides, bug reporting or for any help or assistance
with these engines, please contact the developer of the engine directly.
Third
party engines that are known to be available include the following:
PrimeBase XT (PBXT):
PBXT has been designed for modern, web-based, high concurrency environments.
RitmarkFS:
RitmarkFS enables you to access and manipulate the file system using SQL
queries. RitmarkFS also supports file system replication and directory change
tracking.
Distributed
Data Engine: The Distributed Data Engine is an Open Source project that is
dedicated to provide a Storage Engine for distributed data according to
workload statistics.
mdbtools: A
pluggable storage engine that enables read-only access to Microsoft Access .mdb
database files.
solidDB for MySQL:
solidDB Storage Engine for MySQL is an open source, transactional storage
engine for MySQL Server. It is designed for mission-critical implementations
that require a robust, transactional database. solidDB Storage Engine for MySQL
is a multi-threaded storage engine that supports full ACID compliance with all
expected transaction isolation levels, row-level locking, and Multi-Version
Concurrency Control (MVCC) with nonblocking reads and writes.
The InnoDB Plugin:
The InnoDB Plugin for MySQL permits users to replace the “built-in” version of
InnoDB in MySQL 5.1 with a new release of InnoDB that offers new features,
improved performance, enhanced reliability and new capabilities for flexibility
and ease of use. Among the new features of the InnoDB Plugin are “Fast index
creation”, table and index compression, file format management and new
INFORMATION_SCHEMA tables.
More
flexibility is offered by allowing you to specify the default storage engine to
be used MySQL,
the
most obvious is to specify the engine type when creating the table:
You
can also alter the storage engine used in an existing table:
However,
you should be careful when altering table types in this way, as making a
modification to a type that does not support the same indexes, field types or
sizes may mean that you lose data. If you specify a storage engine that doesn’t
exist in the current database then a table of type MyISAM (the
default) is created instead.
You
can set the default storage engine to be used during the current session by
setting the storage_engine variable:
Before
taking any decision about which engine we need to choose, first we need to
think about the
different
core functionality provided by each engine that allow us to differentiate
between them. We can divide up the core functionality into four areas; the
supported field and data types, locking types, indexing and transactions. Some
engines have unique functionality that can also drive your decision.
The
MySQL pluggable storage engine architecture enables a database professional to
select a specialized storage engine for a particular application need while
being completely shielded from the need to manage any specific application
coding requirements. The MySQL server architecture isolates the application
programmer and DBA from all of the low-level implementation details at the
storage level, providing a consistent and easy application model and API. Thus,
although there are different capabilities across different storage engines, the
application is shielded from these differences.
The
pluggable storage engine architecture provides a standard set of management and
support services that are common among all underlying storage engines. The
storage engines themselves are the components of the database server that
actually perform actions on the underlying data that is maintained at the
physical server level.
This
efficient and modular architecture provides huge benefits for those wishing to
specifically target a particular application need—such as data warehousing,
transaction processing, or high availability situations—while enjoying the
advantage of utilizing a set of interfaces and services that are independent of
any one storage engine.
The
application programmer and DBA interact with the MySQL database through
Connector APIs and service layers that are above the storage engines. If
application changes bring about requirements that demand the underlying storage
engine change, or that one or more storage engines be added to support new
needs, no significant coding or process changes are required to make things
work. The MySQL server architecture shields the application from the underlying
complexity of the storage engine by presenting a consistent and easy-to-use API
that applies across storage engines.
Field and Data Types
Although
all of the engines support the common data types, i.e., integers, reals and
character based
storage,
not all engines support other field types, particularly the BLOB (binary large
object) or TEXT types. Other engines may support only limited character widths
and data sizes.
These
limitations, while directly affecting the information you store may also have a
related effect
to
the types of searches you perform, or the indexes you create on that
information. In turn, these differences can affect the performance and
functionality of your application as you may have to make decisions about
functionality based on the storage engine choice you make for the type of data
you are storing.
Locking
Locking
within database engines defines how access and updates to information are
controlled.
When
an object in the database is locked for updating, other processes cannot modify
(or in some
cases
read) the data until the update has completed.
Locking
not only affects how many different applications can update the information in
the database, it can also affect queries on that data. The reason for this is
that the queries may be accessing data that may be being altered or updated. In
general, such delays are minimal. The bulk of the locking mechanism is devoted
to preventing multiple processes updating the same data. Since both additions
(INSERT statements) and alterations (UPDATE statements) to the data require
locking, you can imagine that multiple applications using the ame database can
have a significant impact.
Locks
are supported by different storage engines at different object levels, and
these levels affect the concurrency of access to the information. Three
different levels are supported, table locking, block locking and row locking.
Table locking is most commonly supported and is the locking provided in MyISAM.
It locks an entire table during an update. This will limit the number of
applications that are updating a specific table to just one, and this can
affect heavily used multi-user databases because it introduces delays into the
update process.
Page
level locking is used by the Berkeley DB storage engine and locks data
according to the page (8Kb) of information that is being uploaded. When performing
updates across a range of locations within the database, the locking is not a
problem, but because adding rows involves locking the final 8Kb of the data
structure, adding large numbers of rows, particularly of small data, can be a
problem.
Row
level locking provides the best concurrency; only individual rows within a
table are locked, which means that many applications can be updating different
rows of the same table without causing a lock situation. Only the InnoDB
storage engine supports row level locking.
Indexing
Indexing
can dramatically increase the performance when searching and recovering data
from the database. Different storage engines provide different indexing
techniques and some may be better suited for the type of data you are storing.
Some
storage engines simply do not support indexing at all either because they use
the indexing of the underlying tables (in the MERGE engine for example) or
because the data storage method does not allow indexing (FEDERATED or BLACKHOLE
engines).
Transactions
Transactions
provide data reliability during the update or insert of information by enabling
you to add data to the database, but only to commit that data when other
conditions and stages in the application execution have completed successfully.
For example, when transferring information from one account to another you
would use transactions to ensure that both the debit from one account and the
credit to the other completed successfully. If either process failed, you could
cancel the transaction and the changes would be lost. If the process
completed,then we would confirm it by committing the changes.
List of storage engines
MySQL supported storage
engines:
MyISAM
InnoDB
Memory
CSV
Merge
Archive
Federated
Blackhole
Example
Storage Engines:
The
MyISAM engine is the default engine in most MySQL installations and is a
derivative of the original ISAM engine type supported in the early versions of
the MySQL system. The engine provides the best combination of performance and
functionality, although it lacks transaction capabilities (use the InnoDB or
BDB engines) and uses table-level locking.
Unless
you need transactions, there are few databases and applications that cannot
effectively be stored using the MyISAM engine. However, very high-performance
applications where there are large numbers of data inserts/updates compared to
the number of reads can cause performance problem for the MyISAM engine. It was
originally designed with the idea that more than 90% of the database access to
a MyISAM table would be reads, rather than writes.
With
table-level locking, a database with a high number of row inserts or updates becomes
a performance bottleneck as the table is locked while data is added. Luckily
this limitation also works well within the restrictions of a non-transaction
database.
Limitations
1.Crash
recovery can be a time-consuming process owing to MyISAM’s lack of a
transaction log. Expect to have to perform such a recovery during your
peak-usage period. (If it can happen, it almost certainly will.)
2.
MyISAM does not support or enforce foreign key constraints.
3.
All UPDATE queries to the same table are serialized — that is to say they
carried out one at a time — and block all other queries, including SELECTs,
from executing. This effect is pronounced on most busy multi-user applications.
4.
MyISAM supports concurrent INSERTs only in certain cases.
5.
Maximum of 64 indexes per row,
When to use MyISAM
1.
Your application demands full-text search capabilities. Rather than push all
your data into MyISAM tables to gain full-text searching, it may be viable to
split your dataset into data that must be indexed for full-text searching — and
stored using MyISAM — and data that should be stored using a transactional
engine, such as InnoDB. A scheduled background job may then asynchronously
update your MyISAM full-text indexes and provide links from the InnoDB data as
appropriate. This is a common example of how to gain the best from all worlds.
2.
Your application is effectively single-user — there are very few concurrent
queries hitting the MySQL server.
3.
You are performing limited testing or development where performance is not
under scrutiny.
MERGE
The
MERGE storage engine, also known as the MRG_MyISAM engine, is a collection of
identical MyISAM tables that can be used as one. You can then execute queries
that return the results from multiple tables as if they were just one table.
Each table merged must have the same table definition.
The
MERGE table is particularly effective if you are logging data directly or
indirectly into a MySQL database and create an individual table per day, week
or month and want to be able to produce aggregate queries from multiple tables.
There are limitations to this however, you can only merge MyISAM tables and the
identical table definition restriction is strictly enforced. Although this
seems like a major issue, if you had used one of the other table types (for
example InnoDB) then the merge probably wouldn’t be required.
Limitations
1.
You can use only identical MyISAM tables for a MERGE table.
2.
You cannot use a number of MyISAM features in MERGE tables. For example, you
cannot create FULLTEXT indexes on MERGE tables. (You can, of course, create
FULLTEXT indexes on the underlying MyISAM tables, but you cannot search the
MERGE table with a full-text search.)
3.
If the MERGE table is non-temporary, all underlying MyISAM tables must be
non-temporary, too. If the MERGE table is temporary, the MyISAM tables can be
any mix of temporary and non-temporary.
4.
MERGE tables use more file descriptors. If 10 clients are using a MERGE table
that maps to 10 tables, the server uses (10 × 10) + 10 file descriptors. (10
data file descriptors for each of the 10 clients, and 10 index file descriptors
shared among the clients.)
5.
Key reads are slower.Key reads are slower. When you read a key, the MERGE
storage engine needs to issue a read on all underlying tables to check which
one most closely matches the given key. To read the next key, the MERGE storage
engine needs to search the read buffers to find the next key. Only when one key
buffer is used up does the storage engine need to read the next key block.
When to Use MERGE
1.
Easily manage a set of log tables. For example, you can put data from different
months into separate tables, compress some of them with myisampack, and then
create a MERGE table to use them as one.
2.
Obtain more speed. You can split a big read-only table based on some criteria,
and then put individual tables on different disks. A MERGE table on this could
be much faster than using the big table.
3.
Perform more efficient searches. If you know exactly what you are looking for,
you can search in just one of the split tables for some queries and use a MERGE
table for others. You can even have many different MERGE tables that use
overlapping sets of tables.
4.
Perform more efficient repairs. It is easier to repair individual tables that
are mapped to a MERGE table than to repair a single large table.
5.
Instantly map many tables as one. A MERGE table need not maintain an index of
its own because it uses the indexes of the individual tables. As a result,
MERGE table collections are very fast to create or remap. (Note that you must
still specify the index definitions when you create a MERGE table, even though
no indexes are created.
6.
If you have a set of tables from which you create a large table on demand, you
should instead create a MERGE table on them on demand. This is much faster and
saves a lot of disk space.
7.
Exceed the file size limit for the operating system. Each MyISAM table is bound
by this limit, but a collection of MyISAM tables is not.
8.
You can create an alias or synonym for a MyISAM table by defining a MERGE table
that maps to that single table. There should be no really notable performance
impact from doing this (only a couple of indirect calls and memcpy() calls for
each read).
MEMORY
The
MEMORY storage engine (previously known as the HEAP storage engine) stores all
data in memory; once the MySQL server has been shut down any information stored
in a MEMORY database will have been lost. However, the format of the individual
tables is kept and this enables you to create temporary tables that can be used
to store information for quick access without having to recreate the tables
each time the database server is started.
Long
term use of the MEMORY storage engine is not generally a good idea, because the
data could so easily be lost. However, providing you have the RAM to support
the databases you are working on, use of MEMORY based tables is an efficient
way of running complex queries on large data sets and benefitting from the
performance gains.
The
best way to use MEMORY tables is to use a SELECT statement to select a larger
data set from your original, disk-based, tables and then sub-analyse that
information for the specific elements you want.
The
FEDERATED storage engine (added in MySQL 5.03) enables you to access data from
remote MySQL database (other databases may be supported in the future) as if it
were a local database. In effect, the MySQL server acts as a proxy to the
remote server, using the MySQL client access library to connect to the remote
host, execute queries and then reformat the data into the localized format.
In
essence, it is a way for a server, rather than a client, to access a remote
database and can be an effective way of combining data from multiple hosts or
of copying specific data from remote databases into local tables without the
use of data exports and imports.
ARCHIVE
The
ARCHIVE storage engine supports only the INSERT and SELECT statements, but does
support most of the MySQL field types. Information stored in an ARCHIVE storage
engine table is compressed and cannot be modified and so ARCHIVE tables are
perfect for storing log data (which you don’t want to be able to change) or
information that is no longer in active use (for example, old invoicing or
sales data).
While
the information is stored very efficient, care should be taken when accessing
data stored in the ARCHIVE tables. Because the information is compressed,
selects have to read the entire table, and that also means decompressing the
information. This can obviously increase the time taken to perform complex
searches and retrievals. If you are performing a large number of queries on the
information in these tables it may be easier to temporarily copy your data to
another, uncompressed, data type such as MyISAM.
The
CSV storage engine stores data not in a binary format, but in the form a CSV
(Command Separated Values) file. Because of this, there are limitations to the
data stored. It is not an efficient method for storing large volumes of data,
or larger data types like BLOB, although such types are supported. There is
also no indexing. However, because the data is stored in the CSV format it is
exceedingly portable; these CSV files generated can easily be imported into
many different software packages, including Excel, OpenOffice and database
systems like Access or FileMaker.
In
general, the CSV engine is impractical as a general database engine. It is,
however, probably the most effective and easiest method for data exchange. What
makes it so convenient is that we can use SELECT and INSERT statements to
create the database, which in turn means that we can easily produce CSV files
based on queries of other data.
With
some careful work, the CSV storage engine can also be used as an effective way
of getting information into MySQL. Here, you can create the tables first,
shutdown the MySQL server, copy over CSV files that you have exported from
Excel, Access or another database, and you can then import the data and copy it
over to MyISAM or InnoDB tables.
The
Blackhole storage engine accepts but does not store data and retrievals always
return an empty set. The functionality can be used in distributed database
design where data is automatically replicated, but not stored locally. Although
you can create tables and indexes, all SQL statements that would add or update
information to the database are executed without actually writing any data. The
database structure is retained, however, and you can create any indexes on the
(non-existent) information that you want.
Although
this seems like a futile exercise, it does allow you to test out database
structures and play with table definitions without actually creating any data.
Even more useful, however, is that SQL statements on BLACKHOLE databases are
written to the binary log, and therefore are replicated to slave databases.
You
can use this functionality to update one or more slaves directly without
writing any local data. There are a number of potential uses for this
functionality.
ISAM
The
ISAM storage engine was the original engine type available with versions of
MySQL up until MySQL 3.23, when the MyISAM storage engine was introduced. ISAM
has a number of different limitations that make it impractical as a database
engine. These include the storage format, which is
native
to the platform (and therefore not portable between systems), a maximum table
size of just 4GB and limited text searching facilities. Indexes are also more
limited. Since MyISAM is supported on the same platforms as ISAM, and provides
better compatibility, portability and performance.
ISAM
is included for backwards compatibility, you certainly shouldn’t use ISAM for
new databases, use MyISAM instead.
Berkeley DB (BDB)
The
Berkeley DB (or BDB) engine is based on the technology provided by the Berkeley
DB storage system developed by SleepyCat software. BDB is a hash based storage
mechanism, and the keys to the hash values are stored very efficiently. This
makes the recovery of information–especially when accessed directly using a
unique key incredibly quick, and by far the quickest of the available database
types. Recovering full records is even quicker if you the data is short enough
to be stored with the unique key (i.e., under 1024 bytes long). BDB is also one
of only two types of storage engine that support transactions.
BDB
is, however, limited in other ways. Although it uses page locking, locking only
8192 bytes of a table, rathter than the entire table, during an update this can
cause problems if you are performing a large number of updates in the same page
(for example, inserting many rows). There is unfortunately no way round this.
Sequential data access–for example a large quantity of rows matching
non-indexed data–can be a lot slower because the data needs to be scanned row
by row.
Recovery
of information with BDB tables can also be a problem. Data in BDB is stored in
a combination of the key index, the data file and binary data logs. A loss of
data in any of these sections, even just one of the data logs, can make the
data in the database totally unrecoverable.
Where
BDB shines therefore is in locations where you can access specific blocks of
data by a unique key that does not frequently change.
The
InnoDB Engine is provided by Innobase Oy and supports all of the database functionality
(and more) of MyISAM engine and also adds full transaction capabilities (with
full ACID (Atomicity, Consistency, Isolation, and Durability) compliance) and
row level locking of data.
The
key to the InnoDB system is a database, caching and indexing structure where
both indexes and data are cached in memory as well as being stored on disk.
This enables very fast recovery, and works even on very large data sets. By
supporting row level locking, you can add data to an InnoDB table without the
engine locking the table with each insert and this speeds up both the recovery
and storage of information in the database.
As
with MyISAM, there are few data types that cannot effectively be stored in an
InnoDB database. In fact, there are no significant reasons why you shouldn’t
always use an InnoDB database. The
management
overhead for InnoDB is slightly more onerous, and getting the optimization
right for the sizes of in-memory and on disk caches and database files can be
complex at first. However, it also means that you get more flexibility over
these values and once set, the performance benefits can easily outweigh the
initial time spent. Alternatively, you can let MySQL manage this automatically
for you.
If
you are willing (and able) to configure the InnoDB settings for your server,
then I would recommend that you spend the time to optimize your server
configuration and then use the InnoDB engine as the default.
Limitations
1.
Queries that result in large scans of the tablespace are often slower when using
InnoDB.
2.
Consistency is only maintained if the underlying operating system and hardware
can guarantee buffer flushes. This limitation is inherent in all transactional
database management systems.
3.
InnoDB tables consume a greater amount of space on-disk than their MyISAM
equivalents. This is now largely irrelevant given the ubiquity of large
(multiple hundreds of gigabytes) hard disk drives.
When to use InnoDB
1.
You are developing an application that requires ACID compliance. At the very
least, your application demands the storage layer support the notion of
transactions.
2.
You require expedient crash recovery. Almost all production sites fall into
this category, however MyISAM table recovery times will obviously vary from one
usage pattern to the next. To estimate an accurate figure for your environment,
try running myisamchk over a many-gigabyte table from your application’s
backups on hardware similar to what you have in production. While recovery
times of MyISAM tables increase with growth of the table, InnoDB table recovery
times remain largely constant throughout the life of the table.
3.
Your web site or application is mostly multi-user. The database is having to
deal with frequent UPDATEs to a single table and you would like to make better
use of your multi-processing hardware.
NDB Cluster
Another
well-known storage engine. It allows one to cluster tables. That means you have
multiple masters, which all can do inserts, updates and deletes on the same
table. NDB has row-level locking, but not full multi-version concurrency
control. Due to some architectural limitations of how the MySQL server executes
joins, they perform rather poorly with NDB storage engine. For that reason, it
is best used for single table primary key lookups. So, you would not likely
port your entire web-facing database to NDB Cluster for example.
EXAMPLE
The
EXAMPLE storage engine is a stub engine that does nothing. Its purpose is to
serve as an example in the MySQL source code that illustrates how to begin
writing new storage engines. As such, it is primarily of interest to
developers.
To
enable the EXAMPLE storage engine if you build MySQL from source, invoke
configure with the --with-example-storage-engine option.
To
examine the source for the EXAMPLE engine, look in the storage/example
directory of a MySQL source distribution.
When
you create an EXAMPLE table, the server creates a table format file in the database
directory. The file begins with the table name and has an .frm extension. No
other files are created. No data can be stored into the table. Retrievals
return an empty result.
The
EXAMPLE storage engine does not support indexing.
The
IBMDB2I storage engine is designed as a fully featured transaction-capable
storage engine that enables MySQL to store its data in DB2 tables running on
IBM i. With the IBMDB2I storage engine, data can be shared between MySQL
applications and applications coded for native DB2 for i interfaces.
IBMDB2I
provides ACID-compliant transactions, support for foreign key constraints, full
crash recovery, radix-tree-based indexes, and the unique ability to enable DB2
for i applications to see and update table data in real time.
More
information about the storage engine and its interaction with DB2 for i can be
found in IBM's Using DB2 for i as a Storage Engine for MySQL Redbook
publication, at IBM DB2 for i a Storage Engine for MySQL Redbook.
Note
The
IBMDB2I storage engine was introduced in MySQL 5.1.33 and considered GA in
MySQL 5.1.35. It was removed in MySQL 5.1.54.
Summary
As
you may have been able to conclude from the above summary of the different
storage engines available, there are few reasons not to use either the MyISAM
or InnoDB engine types. MyISAM will do in most situations, but if you have a
high number of updates or inserts compared to your searches and selects then
you will get better performance out of the InnoDB engine. To get the best
performance out of InnoDB you need to tweak the parameters for your server,
otherwise there is no reason not to use it.
The
MERGE engine is an exceedingly effective way of querying data from multiple,
identically defined, tables. The MEMORY engine is the best way to perform a
large number of complex queries on data that would be inefficient to search on
a disk based engine. The CSV engine is a great way to export data that could be
used in other applications. BDB is excellent for data that has a unique key
that is frequently accessed. The following table provides an overview of some
storage engines provided with MySQL: