Home InterBase Firebird Community About us Search Help English Russian
Everything about Borland InterBase and Firebird
Get Firebird power with FIBPlus
 
Anonymous Login / Register
Site News
Third-party Product News
Programming
Miscellaneous
Encyclopedia
FAQ


Search the site


Subscribe the site news

Sections:
News
E-mail:


InterBase World is a Full Member of Firebird Foundation



Login


Hits 2622341
2471
Hosts 152100
794
Visitors 771498
1110

6



Powered by Bitrix Site Manager - Content Management & Portal Solutions

Home / Article archives / Programming

To BLOB or not to BLOB, that is the question...


01/17/2003  | Views: 8129

Source: Ivan Prenosil's InterBase site

Some notes that can help with decision whether to store long strings in BLOB or in CHAR/VARCHAR columns.

Author: Ivan Prenosil. Original text

Differences between CHAR and VARCHAR

Many people believe that VARCHAR is better because it stores only actual data, while CHAR is stored in full length. It is not true. In fact, both CHAR and VARCHAR are stored in memory buffer in their full, declared length; when the row is stored on disk, then RLE compression algorithm is used to compress whole row, i.e. CHARs, VARCHARs, INTEGERs, DATEs, etc. all together. So if you want to save space, CHARs are slightly better than VARCHARs (the differenece is that VARCHAR stores string length in two bytes).
There is also a bug that causes that VARCHAR does not properly clean string tail if you assign shorter string, thus causing worse compression. (this problem is fixed in Firebird-0.9.4)

Many people also believe that VARCHAR sends over network only actual data, while CHAR is sent in full length. It is not true either. Communication between client and server is done via messages of fixed length. For this reason both CHAR and VARCHAR are sent in their full declared length. (this problem is fixed in IB-6.5)

So decision whether use CHAR or VARCHAR should be based solely on application's requirements. E.g. store fixed length codes in CHAR, store names in VARCHAR (to allow correct concatenating).


Advantages/disadvantages of BLOBs vs. VARCHARs

All comments in this paragraph referring VARCHAR type are valid for CHAR type too.
Each comment ends with BLOB + or VARCHAR + mark to indicate which data type is better.
  • Do you know maximum length of your data ?
    With VARCHARs you need to declare maximum string length.
    With blobs you do not need to worry about it.
    BLOB +
  • Do you need to store very long strings ?
    Single VARCHAR column is restricted to 32K bytes (i.e. about 10K Unicode characters).
    Maximum blob size is (according to Operation Guide):
      1Kb page size =>  64 Mb
      2Kb page size => 512 Mb
      4Kb page size =>   4 Gb
      8Kb page size =>  32 Gb 
    BLOB +
  • Do you need to store many long text columns in single table ?
    Total row length (uncompressed) is restricted to 64K. VARCHARs are stored in row directly, so you can't store many long strings in one row.
    Blobs are represented by their blob-id, so they use up only 8 bytes from 64K maximum.
    BLOB +
  • Do you want to minimize conversation between client and server ?
    VARCHAR data are fetched along with other row data in one fetch operation, and usually several rows are sent over network at once.
    Each single blob needs extra open/fetch operation.
    VARCHAR +
  • Do you want to minimize amount of data transferred between client and server ?
    With blobs you have advantage that after fetching row you can decide whether or not you want to fetch blob data too.
    With VARCHAR you have disadvantage in that they are transferred across network in full declared length (Too long VARCHAR's degrade performance significantly even on LAN, not to mention dial-up.)
    This problem is fixed in IB-6.5, which sends only actual data of VARCHAR columns.
    BLOB +   /   ( VARCHAR + for IB-6.5)
  • Do you want to minimize used space ?
    VARCHARs are RLE compressed (in fact whole row except blobs is compressed). At most 128 bytes can be compressed to 2 bytes. This means that even empty VARCHAR(32000) will occupy 500+2 bytes.
    Blobs are not compressed, but empty (i.e. null) blob will occupy only 8 bytes of blob-id (and it will be later RLE compressed). Non-empty blob can be stored either on the same page as other row data (if it fits), or on separate page. Small blob that fits on data page has overhead about 40 bytes (or slightly more). Large blob has the same 40 bytes overhead on data page, plus 28 bytes overhead on each blob page (30 bytes on first one). One blob page can't contain more than one blob (i.e. blob pages are not shared like data pages). E.g. for 4K page size, if you store 5K blob, two blob-type pages will be allocated, which means that you lose 3K of space ! In other words - the larger page size, the higher probability that small blobs will fit on data page, but also more wasted space if separate blob pages are needed for large blobs.
    VARCHAR + (except VARCHARs with extremely large declared length, or tables with lots of NULL blobs)
  • Do you need table with extremely large number of rows ?
    Each row is identified by DB_KEY, which is 64 bit value, where 32 bits represents relation id and 32 bits are used to locate the row. So theoretical maximum number of rows in one table is 2^32 (but for several reasons the real maximum is even lower). Blob-ids are assigned from the same address space as DB_KEYs, it means the more blobs in table, the less DB_KEYs will remain to address rows. On the other hand when stored rows are wide (e.g. if they contain long VARCHARs), then less rows fit on data page and many DB_KEY values will remain unasigned anyway.
    varchar + ?
  • Do you want good performance ?
    Because large blobs are stored off data pages, they increase "density" of rows on data pages and thus cache efficiency (reduce number of i/o operations during search).
    BLOB +
  • Do you need to perform search on the contents of text columns ?
    On VARCHAR column you can use operators like '=', '>', BETWEEN, IN (), case sensitive LIKE and STARTING, case insensitive CONTAINING. In most cases index can be used to speed search up.
    Blobs can't be indexed, and you are restricted to LIKE, STARTING, and CONTAINING operators. You can't directly compare blobs with operators '=', '>' etc. (unless you use UDF), so you can't e.g. join tables on blob fields.
    VARCHAR +
  • Do you want to search contents of these texts with CONTAINING ?
    CONTAINIG can be used to perform case-insensitive search of contents of VARCHAR field. (not using index)
    Because it is not possible to define collation order for blob columns, you can't use fully case insensitive search with national characters on blob columns (only lower half of character set will be case insensitive). (As a workaround you can use UDF.)
    VARCHAR +
  • Do you need to UPPERcase contents of text column ?
    You can use built-in function UPPER() on VARCHAR, but not on blob. (Also CAST, MIN, MAX can't be used with blobs)
    VARCHAR +
  • It is not possible to sort by blob column. (and GROUP BY, DISTINCT, UNION, JOIN ON)
    It is not possible to concatenate blob columns.
    VARCHAR +
  • There is no built-in conversion function (CAST) for converting blob to VARCHAR or VARCHAR to blob.
    (But it is possible to write UDF for this purpose.)
    draw
  • It is not possible to assign value to blob directly in SQL command,
    e.g. INSERT INTO tab(MyBlob) VALUES('abc'); (But it is possible to use UDF for converting string to blob).
    VARCHAR +

    Firebird-0.9.4 already has this functionality
    draw
  • Sort file size.
    Sometimes when InterBase needs to sort result set (e.g. for ORDER BY, DISTINCT, UNION without ALL, etc.) it creates temporary sort file. This file contains values from all columns you are selecting, i.e. either full VARCHAR strings, or only short blob-ids. It means blobs can keep sort files smaller.
    BLOB +
  • Blobs can have assigned recommended-segment-size, blob-sub_type and use blob-filters.
    VARCHARs do not support such functionality.
    BLOB +
  • Blobs exist in two flavours - segmented (default one) and stream.
    Stream blob is unstructured, while segmented preserves data in the form it was put into blob, i.e. it preserves segment boundaries.
    VARCHARs do not support such functionality.
    BLOB +
  • Do you need good security on these text columns ?
    To retrieve data from table, you need to have granted SELECT privilege.
    To retrieve blob, you need to know only blob-id (stored in table), but InterBase will not check whether you have any rights to table blob belongs to. It means that everybody who knows or guess right blob-id can read the blob without any rights to table. (You can try it with ISQL and BLOBDUMP command.)
    VARCHAR +
  • What tools do you use.
    Final decision (whether prefer blob or varchar) can be determined by tools used to access the database (Delphi components, middleware ...). Some tools can have problems with correct handling blobs, some other tools can have restrictions on VARCHARs (e.g. limit to 255 characters or inability to distinguish between empty and null string).
    draw

Copyright © 2002 Ivan Prenosil

Back to the news section


 Last articles 
09/24/2007 IBSurgeon would like to inform you about a series of free Firebird technical seminars

09/23/2007 Firebird System Tables (Parts II+III)

09/20/2007 Firebird Maestro 7.9 released

09/19/2007 And the Winner is —

09/19/2007 Unable to access fireruby lib - with firebird 2.0.x and ror

news archive