Creating checkpoints (CPs) is a widely used technique for providing fault tolerance in distributed computing systems.

Most existing software for checkpointing parallel programs (OpenMPI, MVAPICH, DMTCP) uses a synchronous approach, in which all branches of the program save their local CPs simultaneously. This causes considerable overhead during CP creation because of the increased load on the input/output subsystem of the computer system. Reducing this overhead by reducing the size of distributed CPs is therefore an important problem, and one approach to solving it is CP compression.

HBICT (Hash Based Incremental Checkpointing Tool) is a package that optimizes checkpoints in terms of creation time and size using universal and delta compression algorithms.
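The general idea behind hash-based incremental checkpointing can be illustrated with a minimal Python sketch. This is not HBICT's actual implementation. It assumes that a process image is a byte string split into fixed-size blocks (a hypothetical `BLOCK_SIZE` of 4 KiB) and that each block is hashed with SHA-1. Only blocks whose hash differs from the previous checkpoint are stored, each compressed with `zlib` as a stand-in for a universal compressor. The function names and the checkpoint layout are hypothetical. A real delta compressor would go further and encode only the differences inside each changed block instead of storing the whole block.

```python
import hashlib
import zlib

# Hypothetical block size; real tools may choose it differently.
BLOCK_SIZE = 4096


def block_hashes(image: bytes, block_size: int = BLOCK_SIZE) -> list[bytes]:
    """Return one SHA-1 digest per fixed-size block of the image."""
    return [
        hashlib.sha1(image[i:i + block_size]).digest()
        for i in range(0, len(image), block_size)
    ]


def make_incremental_checkpoint(image: bytes, prev_hashes: list[bytes],
                                block_size: int = BLOCK_SIZE):
    """Store only blocks whose hash changed since the previous checkpoint.

    Returns (checkpoint, hashes). The checkpoint dict layout is hypothetical:
    the total image size plus a map from block index to zlib-compressed block.
    """
    hashes = block_hashes(image, block_size)
    changed = {}
    for idx, digest in enumerate(hashes):
        # New block (image grew) or modified block: save it compressed.
        if idx >= len(prev_hashes) or prev_hashes[idx] != digest:
            start = idx * block_size
            changed[idx] = zlib.compress(image[start:start + block_size])
    return {"size": len(image), "blocks": changed}, hashes


def restore(base: bytes, checkpoint: dict, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild an image by applying an incremental checkpoint to a base image."""
    size = checkpoint["size"]
    # Truncate or zero-pad the base image to the checkpointed size.
    buf = bytearray(base[:size].ljust(size, b"\0"))
    for idx, data in checkpoint["blocks"].items():
        block = zlib.decompress(data)
        start = idx * block_size
        buf[start:start + len(block)] = block
    return bytes(buf)
```

For example, if only one byte of a three-block image changes between checkpoints, the second checkpoint contains just the compressed copy of that one block. Unchanged blocks are recovered from the previous image when restoring.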