Copyright (C) 2004-2006 The Regents of the University of California.
Copyright (C) 2007 Los Alamos National Security, LLC.
This material was produced under U.S. Government contract DE-AC52-06NA25396 for Los Alamos National Laboratory (LANL), which is operated by Los Alamos National Security, LLC for the U.S. Department of Energy. The U.S. Government has rights to use, reproduce, and distribute this software. NEITHER THE GOVERNMENT NOR LOS ALAMOS NATIONAL SECURITY, LLC MAKES ANY WARRANTY, EXPRESS OR IMPLIED, OR ASSUMES ANY LIABILITY FOR THE USE OF THIS SOFTWARE. If software is modified to produce derivative works, such modified software should be clearly marked, so as not to confuse it with the version available from LANL.
Additionally, this library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. Accordingly, this library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
Los Alamos Computer Code LA-CC-06-105
The SIF library depends on no libraries other than libc. It has been compiled on only a few platforms: 32-bit machines running Visual Studio(R) on Windows 2000(R) and Windows XP(R), and GNU/Linux with large file support (64-bit) turned on. In theory, SIF should compile on platforms that do not support 64-bit files, but this has not been tested. SIF has not been tested on Cygwin.
Open up the solution file sif.sol and click Build...Compile. This should build the sif.dll file, which you may put in a folder that is accessible by your PATH variable.
We use the autotools, specifically automake and autoconf. First, your system needs to be inspected with autoconf by doing
./configure
The library is installed under /usr/local by default. To change the prefix, run ./configure --prefix=/desired/prefix
After configuring your build environment, you are ready to build the library. The first command below builds the library while the second command installs it and the include files.
make
make install
The compilation of your own code against SIF is not covered in this document since it's a topic that others have certainly covered better than I could. However, there are two helpful points to note: SIF builds a shared library libsif.so, which it puts in prefix/lib, and it puts a header file sif-io.h in prefix/include. You will need to include sif-io.h to have access to SIF functions and you will need to link against libsif.so. We provide a pkg-config file for your convenience called sif-io.pc, which is put in prefix/lib/pkgconfig. If you wish to use it, make sure your PKG_CONFIG_PATH includes that directory. Be sure your C_INCLUDE_PATH is set to include prefix/include and your LD_LIBRARY_PATH includes prefix/lib.
Every tile has a small tile header describing it: its uniformity and its byte offset location on disk (if applicable). A block refers to a unit of space on disk for storing a tile. The terms differ so that the physical entity of an image (a tile) is easily distinguished from the storage space used to store it (a block). A block need not contain a tile, in which case it is an unused block, and it can be reclaimed for later use. The next table shows how all of these image elements (tiles, blocks, slices) are related to one another and their relative sizes.
| Relevant to | Unit | Size |
| Image | image_width | User-defined |
| Image | image_height | User-defined |
| Image/Tile/Block | bands | User-defined |
| Image | image_wpixels | ceil(image_width / tile_width) * tile_width |
| Image | image_hpixels | ceil(image_height / tile_height) * tile_height |
| Image | image_data_units | image_wpixels * image_hpixels * bands |
| Image | image_bytes | image_data_units * data_unit_size |
| Tile/Slice/Block | tile_width | User-defined |
| Tile/Slice/Block | tile_height | User-defined |
| Tile | tile_data_units | tile_width * tile_height * bands |
| Tile | tile_bytes | tile_data_units * data_unit_size |
| Block | block_data_units | tile_width * tile_height * bands |
| Block | block_bytes | block_data_units * data_unit_size |
| Slice | slice_data_units | tile_width * tile_height * 1 |
| Slice | slice_bytes | slice_data_units * data_unit_size |
| Data Unit | data_unit_size | User-defined |
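As a rough illustration of the table above, the derived sizes could be computed as in the following C sketch. The names sif_sizes_sketch, ceil_div, and compute_sizes are hypothetical and are not part of the SIF library. The sketch assumes that tile_bytes is tile_data_units * data_unit_size, which agrees with the tile_bytes header field described later.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the derived quantities; not part of sif-io.h. */
typedef struct {
    int64_t image_wpixels;
    int64_t image_hpixels;
    int64_t image_data_units;
    int64_t image_bytes;
    int64_t tile_data_units;
    int64_t tile_bytes;
    int64_t slice_data_units;
    int64_t slice_bytes;
} sif_sizes_sketch;

/* Integer ceiling division for positive operands. */
int64_t ceil_div(int64_t a, int64_t b)
{
    return (a + b - 1) / b;
}

/* Computes the derived sizes from the user-defined quantities. */
sif_sizes_sketch compute_sizes(int64_t image_width, int64_t image_height,
                               int64_t bands, int64_t tile_width,
                               int64_t tile_height, int64_t data_unit_size)
{
    sif_sizes_sketch s;

    assert(image_width > 0 && image_height > 0 && bands > 0);
    assert(tile_width > 0 && tile_height > 0 && data_unit_size > 0);

    /* Image dimensions rounded up to a whole number of tiles. */
    s.image_wpixels = ceil_div(image_width, tile_width) * tile_width;
    s.image_hpixels = ceil_div(image_height, tile_height) * tile_height;
    s.image_data_units = s.image_wpixels * s.image_hpixels * bands;
    s.image_bytes = s.image_data_units * data_unit_size;

    /* A tile spans all bands; a slice is a single band of a tile. */
    s.tile_data_units = tile_width * tile_height * bands;
    s.tile_bytes = s.tile_data_units * data_unit_size;
    s.slice_data_units = tile_width * tile_height;
    s.slice_bytes = s.slice_data_units * data_unit_size;
    return s;
}
```

For a 100 by 50 image with 3 bands, 64 by 64 tiles, and 2-byte data units, the padded image is 128 by 64 pixels and each tile occupies 24576 bytes.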
Compression in the SIF format is quite simple. A slice is uniform if every data unit in the slice is the same and the common pixel value for the slice is called its uniform pixel value. A tile is uniform if all of its slices are uniform. Slices within the same tile need not have the same uniform pixel value.
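The uniformity definitions above can be sketched in C as follows. This is only an illustration, not the library's implementation: the function names are hypothetical, data units are compared as raw byte sequences, and the tile buffer is assumed to hold its slices consecutively, one band after another.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if every data unit in the slice has the same bytes as the
   first data unit, and 0 otherwise. */
int slice_is_uniform(const unsigned char *slice, size_t n_units,
                     size_t unit_size)
{
    size_t i;

    assert(slice != NULL && n_units > 0 && unit_size > 0);
    for (i = 1; i < n_units; i++) {
        if (memcmp(slice, slice + i * unit_size, unit_size) != 0)
            return 0;
    }
    return 1;
}

/* Returns 1 if all slices of the tile are uniform. Slices may have
   different uniform pixel values. Assumes the slices are stored
   back to back in the buffer (an assumption made for this sketch). */
int tile_is_uniform(const unsigned char *tile, size_t bands,
                    size_t slice_units, size_t unit_size)
{
    size_t b;

    assert(tile != NULL && bands > 0);
    for (b = 0; b < bands; b++) {
        const unsigned char *slice = tile + b * slice_units * unit_size;
        if (!slice_is_uniform(slice, slice_units, unit_size))
            return 0;
    }
    return 1;
}
```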
The next figure shows an example of an image, tile headers for a few tiles, and the layout of various elements in the file. The red and white blocks correspond to used blocks and unused blocks, respectively. Unused blocks occur when a previously used block is freed up. Let's examine five tiles (i, j, k, m, and n), their associated tile headers, and their placement on disk.
Some functions only check for shallow uniformity when performing their operations while others consider intrinsic uniformity. The sif_consolidate function helps free up data blocks by checking for uniformity of underlying block rasters, labeling them as shallow uniform if it finds them to be intrinsically uniform, and freeing up the disk blocks used by them. The sif_consolidate function also reduces external fragmentation by moving the used blocks to the front of the file and the unused blocks to the back. If the sif_header::consolidate flag is set, this consolidation process is performed on the file's closing.
Sometimes the tile width does not divide the image width or the tile height does not divide the image height. The next Figure illustrates this. The border tile overlaps the border of the image. In cases such as these, only tile pixels within the image boundary are examined for uniformity. Border tiles cause some internal fragmentation but it is often negligible.
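To make the border case concrete, the following C sketch computes how many pixels of a tile lie inside the image along one axis. The function name valid_extent is hypothetical; the library may compute this differently.

```c
#include <assert.h>

/* Returns the number of pixels of tile index t that fall inside the
   image along one axis, where image_extent is the image width (or
   height) and tile_extent is the tile width (or height). */
int valid_extent(int image_extent, int tile_extent, int t)
{
    int remaining;

    assert(image_extent > 0 && tile_extent > 0 && t >= 0);
    remaining = image_extent - t * tile_extent;
    if (remaining <= 0)
        return 0;               /* tile lies entirely outside the image */
    return remaining < tile_extent ? remaining : tile_extent;
}
```

With an image width of 100 and a tile width of 64, the first tile column is fully inside the image, while only 36 pixel columns of the second tile column would be examined for uniformity.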
The choice of tile width and tile height depends on several factors. One is the number of bytes needed to store the tile header relative to the bytes needed to store the tile raster. It is also useful to characterize the kind of uniformity expected in the image, such as the largest non-uniform region size and the number of these regions.
The SIF format facilitates the storage of (key, value)-paired meta-data. A (key, value) pair is called a meta-data item. A meta-data item is referred to by its key and the element data is its value. Keys are case-sensitive.
SIF has two types of meta-data values, strings and binary byte sequences. String values must be represented by null-terminated character arrays. Binary byte sequences permit the storage of arbitrary data in a SIF file. String values are stored as binary byte sequences. If an attempt is made to retrieve a meta-data item as a string but the value is not a null-terminated byte sequence, an error is returned.
Any meta-data key beginning with _sif_ is reserved for special meta-data, as defined by the SIF file format specification. The following reserved meta-data keys are currently in use:
_sif_proj
_sif_agree
The following example sets a meta-data field on the SIF file pointed to by the sif_file pointer file:
sif_set_meta_data(file, "model_file", "/afs/clue/gadm/833/hyper.model");
Now let's retrieve it and print it:
printf("model file: %s\n", sif_get_meta_data(file, "model_file"));
Now let's try to store an array V of 32 doubles using native byte order,
sif_set_meta_binary(file, "my_32_doubles", V, sizeof(double) * 32);
Let's now retrieve it and print it out.
double *buf;
int nbytes, i;
buf = sif_get_meta_data_binary(file, "my_32_doubles", &nbytes);
if (nbytes != sizeof(double) * 32) {
    printf("Something bad happened.\n");
} else {
    for (i = 0; i < 32; i++) {
        printf("my_double %d: %5.8f\n", i, buf[i]);
    }
}
Since a copy of all the meta-data in a SIF file is always stored in memory, the meta-data feature is only intended for light use. In a future version, storage of a large meta-data footprint will be viable.
The SIF file format does not establish a set of data types. The underlying pixel values are data-typeless to the SIF I/O library. The library permits the user to store a sif_header::user_data_type field but the field's value does not influence the behavior of SIF routines. This scheme of ignoring the underlying data type works if it can be guaranteed that two pixel values p(x1,y1,b1) and p(x2,y2,b2) are the same if and only if the underlying byte sequences of these values are the same. Thus, SIF only needs to know the size of each underlying byte sequence that represents a single pixel value.
Agreeing to a data type convention provides a guarantee that the data type of pixels can easily be determined. We encourage users to set the "_sif_agree" meta-data value to a string indicating the data type convention used. If the "_sif_agree" meta-data string is not set, all bets are off, and no guarantees can be made about the data type of the pixels. Alternatively, the sif_get_agreement and sif_set_agreement functions can be used to get and set the agreement string.
We define one data type convention, "simple". The type codes for the sif_header::user_data_type field are defined below.
| Value of user_data_type | Corresponding Data Type |
| 0 or SIF_SIMPLE_UINT8 | unsigned char or uint8_t (little-endian) |
| 1 or SIF_SIMPLE_INT8 | char or int8_t |
| 2 or SIF_SIMPLE_UINT16 | uint16_t |
| 3 or SIF_SIMPLE_INT16 | int16_t |
| 4 or SIF_SIMPLE_UINT32 | uint32_t |
| 5 or SIF_SIMPLE_INT32 | int32_t |
| 6 or SIF_SIMPLE_UINT64 | uint64_t |
| 7 or SIF_SIMPLE_INT64 | int64_t |
| 8 or SIF_SIMPLE_FLOAT32 | IEEE-754 32-bit float |
| 9 or SIF_SIMPLE_FLOAT64 | IEEE-754 64-bit float |
For example, suppose we've created a new SIF file with a data_unit_size of 4. Now let's write some code to indicate the "simple" convention: use unsigned 32-bit integers as the data type and use big-endian as the byte-order of the data units. Then, we'll print out these codes using functions that manipulate the compound type code (which we store in the sif_header::user_data_type in the file's header) to give the base type code (i.e. the data type irrespective of the byte order) and the endian code.
int base_code = SIF_SIMPLE_UINT32;
int endian_code = SIF_SIMPLE_BIG_ENDIAN;
int compound_code = SIF_SIMPLE_TYPE_CODE(base_code, endian_code);
sif_set_agreement(file, SIF_AGREEMENT_SIMPLE);
sif_set_user_data_type(file, compound_code);
printf("Base Data Type: %d\n"
       "Compound Data Type: %d\n"
       "Endian: %d\n",
       SIF_SIMPLE_BASE_TYPE_CODE(compound_code),
       compound_code,
       SIF_SIMPLE_ENDIAN(compound_code));
Alternatively, you can use the sif_simple_create function to create a file using the "simple" data type convention.
The SIF file begins with a fixed-size header followed by sif_header::n_tiles fixed-size tile headers, followed by a variable number of fixed-size blocks. Finally, the meta-data is written after all the data blocks. The file header and tile headers are placed at the beginning of the file since their size does not change, although their values may change. This means that the large data blocks need not be moved forward in the file. Meta-data is written after the data blocks since the number of meta-data items can change; thus, the approach eliminates the need to move data blocks after inexpensive meta-data operations. Unfortunately, there is a danger that after a new data block is allocated, there may not be enough space on the partition for the meta-data, in which case the file's meta-data cannot be safely written back. This kind of data loss is uncommon if efforts are made to ensure adequate disk space is available to the scientific programs that use SIF.
The overall layout of a SIF file from the first byte to the last is shown in the following table. The file header remains of constant size and most header fields are immutable. The tile headers are of constant size since the data unit size and number of bands are immutable quantities. The inclusion of routines for changing tile dimensions, image dimensions, endianness, and data types is currently under consideration. The block region is of variable size and precedes the meta-data region. Consequently, the location of the meta-data region changes as the size of the block region changes. The size of the meta-data region changes as meta-data fields are modified, added, or removed. It was anticipated that meta-data would be modified less frequently than the data blocks. If the meta-data were to precede the block region, a small increase in the size of the meta-data region would result in the need to move the entire block region, which is costly when the block region is large. This is a casual justification for our choice of storing the meta-data after the block region. Also under consideration is the ability to store the meta-data before the block region, employing preallocation strategies to minimize moves of the block region due to meta-data region resizing.
| File Header |
| Tile Header 1 (starts at header->header_bytes) |
| Tile Header 2 |
| ... |
| Tile Header n_tiles (starts at file->base_location) |
| Block 1 |
| Block 2 |
| ... |
| Block n_blocks |
| Meta-data Item 1 (starts after the last byte of the last block) |
| Meta-data Item 2 |
| ... |
| Meta-data Item n_meta_data_items |
The absolute byte offset for each file header field is shown in the next table. The second column is the name of the field as stored in the sif_header struct, which is populated when a SIF file is opened. Integers and doubles are signed and stored in big-endian (or network) byte order. Note that in SIF Format Version code 1, doubles were stored little-endian; we realized this was confusing, so they are stored big-endian in versions 2 and higher. The header_bytes field enables the format to be changed without advancing the version code. Specifically, non-essential header fields can be added but they will be ignored by earlier versions of the I/O library.
The only fields that can change following the first write of a raster to an image are the defragmentation, consolidation, and intrinsic write flags as well as the georeferencing transform, and key count. If the caller wishes to change the image dimensions, data type, or tile dimensions after the first raster write, it must be done manually.
| Absolute Offset | Name | Description | Type |
| 0 | header_bytes | The header size in bytes including the space needed for header_bytes. | 32-bit int (b.e.) |
| 4 | magic_number | The magic number "!**SIF**". | 8 8-bit chars |
| 12 | version | The version of the SIF file format used for the target file. This field is not the version of the SIF I/O library used to write the file. | 32-bit int |
| 16 | width | The width of the image in pixels. | 32-bit int |
| 20 | height | The height of the image in pixels. | 32-bit int |
| 24 | bands | The depth of the image in pixels. | 32-bit int |
| 28 | n_keys | The number of meta data fields stored. | 32-bit int |
| 32 | n_tiles | The number of tiles stored. | 32-bit int |
| 36 | tile_width | The width of each tile and slice in pixels. | 32-bit int |
| 40 | tile_height | The height of each tile and slice in pixels. | 32-bit int |
| 44 | tile_bytes | The number of bytes to store a single tile with all bands. | 32-bit int |
| 48 | n_tiles_across | The number of tiles for a single row of tiles on an image. | 32-bit int |
| 52 | data_unit_size | The size of a single data unit. | 32-bit int |
| 56 | user_data_type | A user-defined constant to represent the data type of the pixels, meaningful to the caller. | 32-bit int |
| 60 | defragment | When set, defragments the file during close. | 32-bit int |
| 64 | consolidate | When set, consolidates the file during close. | 32-bit int |
| 68 | intrinsic_write | When set, newly dirtied tiles are checked for intrinsic uniformity when written. | 32-bit int |
| 72 | tile_hd_bytes | The number of bytes to store a single tile header on disk. | 32-bit int |
| 76 | n_unif_flags | The number of bytes to store the uniform flags in the tile header. | 32-bit int |
| 80 | aff_geo_trans | The affine geo-referencing transform. | Six 64-bit IEEE-754 doubles (b.e.) |
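The following C sketch shows one way the integer fields in the table above might be decoded from a raw header buffer. It is not the library's own reader: the SKETCH_OFF_ constants and the function names are hypothetical, chosen only to mirror the offsets listed in the table.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for a few of the absolute offsets in the table. */
#define SKETCH_OFF_HEADER_BYTES 0
#define SKETCH_OFF_MAGIC        4
#define SKETCH_OFF_VERSION      12
#define SKETCH_OFF_WIDTH        16
#define SKETCH_OFF_HEIGHT       20
#define SKETCH_OFF_BANDS        24

/* Decodes a signed 32-bit big-endian integer from four bytes. */
int32_t read_be32(const unsigned char *p)
{
    uint32_t v;

    assert(p != NULL);
    v = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
        ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    if (v <= (uint32_t)INT32_MAX)
        return (int32_t)v;
    /* Two's complement conversion without relying on
       implementation-defined narrowing. */
    return -(int32_t)(~v) - 1;
}

/* Returns 1 if the eight bytes at offset 4 match the magic number. */
int has_sif_magic(const unsigned char *header)
{
    assert(header != NULL);
    return memcmp(header + SKETCH_OFF_MAGIC, "!**SIF**", 8) == 0;
}
```

A caller would typically read at least the first 16 bytes of the file into a buffer, check has_sif_magic, and then use header_bytes to decide how much more of the header to read.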
The tile headers store information about the uniformity or non-uniformity of each tile. If the tile is uniform, the uniform_pixel_value fields have meaning, and the i'th value is the uniform pixel value for the i'th band. The block_num field is set to -1 if the tile header corresponds to a uniform tile. The first advancing index is the horizontal tile index and the second, the vertical tile index. This corresponds to how tiles are read and written from and to buffers in the image, i.e. the x coordinate of the pixels advances before the y.
| Relative Offset (to the previous unit) | Name | Description | Type |
| 0 | uniform_pixel_value[0] | The value to fill band 0 if the tile is uniform. Otherwise, the value is meaningless | User defined (of size data_unit_size) |
| data_unit_size | uniform_pixel_value[1] | The value to fill band 1 if the tile is uniform. Otherwise, the value is meaningless | User defined (of size data_unit_size) |
| ... | ... | ... | ... |
| i*data_unit_size | uniform_pixel_value[i] | The value to fill band i if the tile is uniform. Otherwise, the value is meaningless | User defined (of size data_unit_size) |
| ... | ... | ... | ... |
| (bands - 1) * data_unit_size | uniform_pixel_value[bands - 1] | The value to fill the last band if the tile is uniform. Otherwise, the value is meaningless | User defined (of size data_unit_size) |
| r=bands * data_unit_size | uniform_flags | An array of bits, the i'th bit is TRUE if the i'th slice is uniform. | h=ceil(bands/8) 8-bit characters |
| r+h | block_num | The block number where this tile is stored if it is non-uniform. This value is -1 if uniform. | 32-bit int |
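The offsets in the tile header table can be worked out with a small C sketch such as the one below. The type and function names are hypothetical. The total size assumes block_num is the last field of the tile header, as the table suggests; the tile_hd_bytes field stored in the file header remains the authoritative value.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type; not part of sif-io.h. */
typedef struct {
    size_t flags_offset;     /* r = bands * data_unit_size */
    size_t flags_bytes;      /* h = ceil(bands / 8) */
    size_t block_num_offset; /* r + h */
    size_t total_bytes;      /* r + h + 4, assuming block_num is last */
} tile_header_layout;

/* Computes the relative offsets of the tile header fields. */
tile_header_layout tile_header_layout_for(size_t bands, size_t data_unit_size)
{
    tile_header_layout l;

    assert(bands > 0 && data_unit_size > 0);
    l.flags_offset = bands * data_unit_size;
    l.flags_bytes = (bands + 7) / 8;
    l.block_num_offset = l.flags_offset + l.flags_bytes;
    l.total_bytes = l.block_num_offset + 4;
    return l;
}

/* Relative offset of uniform_pixel_value[band]. */
size_t uniform_value_offset(size_t band, size_t data_unit_size)
{
    return band * data_unit_size;
}
```

For 3 bands of 4-byte data units this gives r = 12, h = 1, a block_num offset of 13, and 17 bytes in total.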
The meta-data item byte layout is simple. Again, integer length fields are assumed to be big-endian.
| Relative Offset (to the previous unit) | Name | Description | Type |
| 0 | key_length | The number of bytes to store the key including the null terminator. | 32-bit int (b.e.) |
| 4 | key | The key as a string. | key_length bytes |
| 4+key_length | value_length | The number of bytes to store the value including the null terminator (if the value is non-binary). | 32-bit int (b.e.) |
| 8+key_length | value | The value as a byte sequence. | value_length bytes |
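As an illustration of this layout, the C sketch below serializes a string-valued meta-data item into a caller-supplied buffer. The function names write_be32 and encode_meta_string_item are hypothetical and the code is not taken from the library; it simply follows the offsets in the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value in big-endian byte order. */
void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Encodes a (key, value) item whose value is a string. Both lengths
   include the null terminator. Returns the number of bytes written,
   or 0 if the buffer is too small. */
size_t encode_meta_string_item(unsigned char *out, size_t out_cap,
                               const char *key, const char *value)
{
    size_t key_length, value_length, total;

    assert(out != NULL && key != NULL && value != NULL);
    key_length = strlen(key) + 1;
    value_length = strlen(value) + 1;
    total = 4 + key_length + 4 + value_length;
    if (total > out_cap)
        return 0;

    write_be32(out, (uint32_t)key_length);                    /* offset 0 */
    memcpy(out + 4, key, key_length);                         /* offset 4 */
    write_be32(out + 4 + key_length, (uint32_t)value_length); /* 4+key_length */
    memcpy(out + 8 + key_length, value, value_length);        /* 8+key_length */
    return total;
}
```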
The first release of the SIF I/O library (0.9) and SIF File Format (code 1) was internal, while the second release (1.0 and code 2) was the first public release. Version 1 assumes integers in the header, tile headers, and meta-data headers are big-endian and doubles are little-endian. Realizing this was confusing, version 2 assumes doubles in the headers (namely sif_header::affine_geo_transform) are also big-endian. Files can be written using older versions of the SIF File Format using the sif_use_file_format_version function.
The following table lists the file versions supported by each version of the SIF I/O library.
| SIF Software Version | Read | Write |
| 0.9 | 1 | 1 |
| 1.0 | 1-2 | 1-2 |
The number of pixels along the x-coordinate axis is given by the image width, and the number of pixels along the y-coordinate axis is given by the image height. Each tile is referenced by a tile coordinate, (tx, ty). tx is the index of the tile with respect to the x-coordinate axis and ty is the index of the tile with respect to the y-coordinate axis. The x index is the fastest advancing index; the y index, the second fastest advancing; and the band index, the slowest advancing index. The absolute offset q, in data units, is computed from a pixel coordinate (x,y,b) as follows:
q=(b * image_width * image_height) + (image_width * y + x)
The absolute tile index r is similarly computed,
r=(n_tiles_across*ty)+tx
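The two formulas above translate directly into C. The function names pixel_offset and tile_index are hypothetical; the arithmetic is a transcription of the expressions for q and r.

```c
#include <assert.h>
#include <stdint.h>

/* q = (b * image_width * image_height) + (image_width * y + x) */
int64_t pixel_offset(int64_t image_width, int64_t image_height,
                     int64_t x, int64_t y, int64_t b)
{
    assert(x >= 0 && x < image_width);
    assert(y >= 0 && y < image_height);
    assert(b >= 0);
    return b * image_width * image_height + image_width * y + x;
}

/* r = (n_tiles_across * ty) + tx */
int64_t tile_index(int64_t n_tiles_across, int64_t tx, int64_t ty)
{
    assert(tx >= 0 && tx < n_tiles_across && ty >= 0);
    return n_tiles_across * ty + tx;
}
```

For a 10 by 5 image, the pixel at (3, 2) in band 1 has q = 50 + 20 + 3 = 73; with 4 tiles across, the tile at (1, 2) has r = 9.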
The SIF library performs extensive error checking for I/O errors, memory allocation errors, and errors in the parameters passed to SIF functions. The sif_file::error code is set to 0 or SIF_ERROR_NONE during normal operation. No sif-io.h function resets this flag, so the caller must do so if the error is deemed non-fatal and the caller wishes to perform further operations on the file. Most SIF functions return immediately the first time an error is encountered. Memory allocated during an operation resulting in an error is deallocated prior to returning. The value returned by non-void functions during an error depends on the expected range of values for that function. If a pointer is usually returned, 0 is returned; if a positive number is usually returned, a non-positive number is returned; and if a non-positive number is usually returned, a number greater than zero is returned. Because return values on error are not consistent across functions, callers should test the sif_file::error flag rather than check the return value. The sif_get_error_description function returns a string description of an error code, which callers may conveniently use when reporting errors.
The sif_is_possibly_sif_file function checks whether a file could possibly be a SIF file. The present version only checks whether the magic number is valid. Future versions of the library will ensure:
Bundled with the SIF library are functions for manipulating SIF files conforming to the "simple" data type convention. These functions begin with sif_simple_.
The sif_simple_create and sif_simple_create_defaults functions are both used to create a SIF file conforming to the "simple" data type convention. The latter function sets defaults related to consolidation, uniformity checking, defragmentation, and tile size. Native byte order is used to store the image rasters; however all header fields, tile header fields, and meta-data length fields are all stored in big-endian byte order, regardless of the endian of the image rasters. The sif_simple_set_endian function must be called after creating the file and prior to performing any image I/O if the file's image endian is changed. Undefined behavior occurs when the endian field is changed after performing image I/O.
When data blocks are written to or read from a file, the blocks are converted to the appropriate byte order prior to writing to the file or after reading from it. sif_simple_ functions may not be used unless the file is opened with the sif_simple_create, sif_simple_create_defaults, or sif_simple_open function.
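The byte order conversion described above amounts to reversing the bytes of each data unit when the file's raster byte order differs from the host's. The C sketch below shows the idea; host_is_little_endian and swap_data_units are hypothetical names, and the library's own conversion routine may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 on a little-endian host and 0 on a big-endian host. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Reverses the byte order of each data unit in place. */
void swap_data_units(unsigned char *buf, size_t n_units, size_t unit_size)
{
    size_t u, i;

    assert(buf != NULL && unit_size > 0);
    for (u = 0; u < n_units; u++) {
        unsigned char *p = buf + u * unit_size;
        for (i = 0; i < unit_size / 2; i++) {
            unsigned char tmp = p[i];
            p[i] = p[unit_size - 1 - i];
            p[unit_size - 1 - i] = tmp;
        }
    }
}
```

A reader on a little-endian host handling a big-endian raster would call swap_data_units on the buffer after reading; data units of size 1 are left unchanged.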
The sif_simple_set_raster and sif_simple_get_raster functions are used to write and read a rectangular region, respectively. Only one band can be read or written at a time. The offsets and dimensions of the region are in pixel units, not tile units. The sif_simple_is_shallow_uniform function checks whether the tiles comprising a rectangular region are stored as shallow uniform.
The sif_simple_get_tile_slice and sif_simple_set_tile_slice functions read and write a slice. The sif_simple_fill_tile_slice function fills a slice with a constant value. The sif_is_slice_shallow_uniform function checks whether a slice is stored as shallow uniform in the file.
The conformity of a file to the "simple" data type convention can be verified with the sif_is_simple_file or sif_is_simple_file_by_name functions. The first function assumes the file has already been opened with sif_open while the second accepts a filename.
SIF allocates enough memory to hold two image blocks in memory for each open SIF file. When the sif_simple_open (for update), sif_simple_create, or sif_simple_create_defaults functions are used to open or create a SIF file conforming to the "simple" data type convention, a buffer is also allocated for converting the byte order of image rasters. The buffer is initially the size of a block. When a call is made to sif_simple_set_raster with a raster larger than the size of the buffer, the buffer is enlarged appropriately. Note that this buffer is not needed if the file is opened for read-only access, since the byte order conversion is performed on the caller's buffer. All of a file's memory buffers are deallocated during close.
We provide the sif-util command for you to use to create, inspect and manipulate SIF files at a UNIX or DOS shell. The first argument is the name of the file to manipulate; the second argument, the name of the operation to perform; and the remaining arguments, the parameters of the operation.
sif-util filename operation operation-args
The following operations are supported by sif-util. Arguments are mandatory unless enclosed in square brackets. Indicated in parentheses is whether the file must be writable to perform the operation.
consolidate on/off/now (writable): when set to on, a file opened for update is consolidated whenever it is closed. When set to now the consolidate flag in the file's header is unchanged and consolidation is performed immediately on the file.
create width height bands sdt [consolidate=on/off] [defragment=on/off] [tw=int] [th=int] [intrinsic=on/off] [endian=native] (writable): creates an image file conforming to the "simple" data type convention. The dimensions of the image are width by height with bands bands. The simple data type code for the image is defined with sdt, which must be between 0 and 9 or a string type identifier (uint8, uint16, uint32, uint64, int8, int16, int32, int64, float, double). When the defragment flag (default=on) is set, the file is scheduled for defragmentation whenever it is closed. When the consolidate flag (default=on) is set, the file is scheduled for consolidation whenever it is closed. The tw (default=64) and th (default=64) parameters are the width and height of the tiles in the image. When the intrinsic write flag (default=on) is set, whenever a raster is written to a file, it is checked for intrinsic uniformity. Native byte order is used for storing the image raster unless the endian is set to big or little.
defragment on/off/now (writable): when set to on, a file opened for update is defragmented whenever it is closed. When set to now the defragment flag in the file's header is unchanged and defragmentation is performed immediately on the file.
get-md key: returns the meta-data field referred to by key. If the meta-data is binary, each non-printable character is printed with \xYY where YY is the character's hexadecimal code. New lines are printed with \n.
has-key key: prints "yes" if a meta-data field with key key is stored in the file, and "no" otherwise.
header: prints out the file's header in a human-readable format.
intrinsic on/off (writable): when set, whenever a raster is written to the file, it is checked for intrinsic uniformity.
is-sif: writes "yes" if the file is a SIF file, and "no" otherwise.
is-simple: writes "yes" if the file is a SIF file and conforms to the "simple" data type convention, and "no" otherwise.
list-keys filename: lists the meta-data keys in the file. Each key is printed on a separate line.
list-md filename: lists the meta-data in the file. Each key=value pair is printed on a separate line. Each non-printable character is printed with \xYY where YY is the character's hexadecimal code. New lines are printed with \n.
region-to-pnm [x y w h] [band=int/all]: prints the file's raster in PNM format. This output can be redirected to a PNM file and then converted to another image format using your favorite image converter. When x, y, w, and h are set, the region is read starting at (x,y) with width w and height h. When the band parameter (default=all) is set to a non-negative integer value, a specific band is written. By default, the entire image is printed.
set-md filename key value (writable): sets the meta-data field referred to by key with the string value value. If the meta-data is binary, each non-printable character must be written with \xYY where YY is the character's hexadecimal code. New lines must be written with \n.
statistics: prints some file statistics, including the number of tiles that are shallow uniform, hidden uniform, or intrinsically non-uniform. The number of blocks in the block region as well as the number of unused blocks is also printed. These statistics may be useful to decide whether to defragment or consolidate a file. Also printed is the percentage of space saved by the compression and the space that could be saved if consolidation were performed.
tile-to-pnm tx ty [band=int/all]: prints out the tile in the file with tile coordinates tx and ty in PNM format. This output can be redirected to a PNM file and then converted to another image format using your favorite image converter. When the band parameter (default=all) is set to a non-negative integer value, a specific band is written.
When the PNM output operations region-to-pnm and tile-to-pnm are used, the pixel values are assumed to be of unsigned type. If the number of bands to write is 3, the PPM subformat is used with each of the three bands representing a separate color (band[0]=R, band[1]=G, band[2]=B). When writing a single band, the PGM format is used. If the image contains any other number of bands, the PAM format is used. The data unit size must not exceed 2 bytes, i.e. only uint8 and uint16 are supported. It is assumed the image raster is stored in native byte order if the file does not conform to the "simple" data type convention. The image raster is translated into proper ASCII decimal form (PPM or PGM format) or big-endian byte order (PAM format) prior to being outputted.
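The escaping rule described for the get-md and list-md operations (new lines as \n, other non-printable characters as \xYY) can be sketched in C as follows. This is only an illustration of the rule, not the sif-util source: the function name escape_meta_value is hypothetical, and the use of uppercase hexadecimal digits is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* Writes an escaped, null-terminated copy of the n input bytes to out
   and returns its length. The caller must supply at least 4 * n + 1
   bytes, the worst case when every byte becomes \xYY. */
size_t escape_meta_value(const unsigned char *in, size_t n,
                         char *out, size_t out_cap)
{
    size_t i, len = 0;

    assert(in != NULL && out != NULL);
    assert(out_cap >= 4 * n + 1);
    for (i = 0; i < n; i++) {
        if (in[i] == '\n') {
            out[len++] = '\\';          /* new line becomes \n */
            out[len++] = 'n';
        } else if (isprint(in[i])) {
            out[len++] = (char)in[i];   /* printable bytes pass through */
        } else {
            /* Uppercase hex digits are an assumption of this sketch. */
            len += (size_t)snprintf(out + len, out_cap - len,
                                    "\\x%02X", (unsigned)in[i]);
        }
    }
    out[len] = '\0';
    return len;
}
```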