APIs

int FTI_Init(const char *configFile, MPI_Comm globalComm)

Initializes FTI.

This function initializes the FTI context and prepares the heads to wait for checkpoints. FTI processes should never get out of this function. In case of a restart, checkpoint files should be recovered and in place at the end of this function.

Return

integer FTI_SCES if successful.

Parameters
  • configFile: FTI configuration file.

  • globalComm: Main MPI communicator of the application.

int FTI_Status()

It returns the current status of the recovery flag.

This function returns the current status of the recovery flag.

Return

integer FTI_Exec.reco.

int FTI_InitType(fti_id_t *type, int size)

Registers a new data type in FTI runtime.

This function initializes a data type. The type is treated as a black box by FTI; the runtime only requires its size. Types built this way are saved as byte arrays when using the HDF5 format.

Return

int FTI_SCES on success, otherwise FTI_NSCS.

Parameters
  • type: Output parameter for the data type handle.

  • size: The data type size in bytes.

fti_id_t FTI_InitCompositeType(char *name, size_t size, FTIT_H5Group *h5g)

Initializes an empty composite data type.

Creates a composite data type that can contain other data types as fields. Fields can be added using FTI_AddScalarField and FTI_AddVectorField.

Return

fti_id_t A handle to represent the new type.

Parameters
  • name: An optional type name

  • size: The total size of the complex data type

  • h5g: An optional H5 group identifier

int FTI_AddScalarField(fti_id_t id, char *name, fti_id_t fid, size_t offset)

Adds a scalar field to a composite type.

Adds a scalar field to a complex data type at a given offset.

Return

integer FTI_SCES when successful, FTI_NSCS otherwise

Warning

Do note that FTI does not check for memory boundaries within the data type. Specifying a wrong offset leads to undefined behavior. This can be avoided using the offsetof() macro.

Parameters
  • id: The composite data type handle

  • name: An optional field name

  • fid: The field data type handle

  • offset: Offset of the field (use offsetof)
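Because FTI does not check offsets, they are best taken from the compiler rather than computed by hand. The sketch below uses only standard C (`offsetof` from `<stddef.h>`); the struct `particle_t` is hypothetical, and the FTI calls in the comment assume scalar type handles named `FTI_INTG` and `FTI_DBLE`, which should be checked against the `fti.h` of your FTI version.

```c
#include <stddef.h> /* offsetof, size_t */

/* Hypothetical application type to be described as an FTI composite type. */
typedef struct {
    int    id;
    double mass;
    char   tag;
} particle_t;

/* Offsets as laid out by the compiler, padding included. With FTI these
 * values would be passed as the 'offset' arguments, e.g. (assumed handles):
 *   fti_id_t t = FTI_InitCompositeType("particle", sizeof(particle_t), NULL);
 *   FTI_AddScalarField(t, "id",   FTI_INTG, offsetof(particle_t, id));
 *   FTI_AddScalarField(t, "mass", FTI_DBLE, offsetof(particle_t, mass));
 */
size_t particle_id_offset(void)   { return offsetof(particle_t, id); }
size_t particle_mass_offset(void) { return offsetof(particle_t, mass); }
size_t particle_tag_offset(void)  { return offsetof(particle_t, tag); }
```

On common 64-bit ABIs such as x86-64, `mass` starts at byte 8 rather than 4 because of alignment padding, which is exactly the kind of mistake a hand-computed offset would introduce.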

int FTI_AddVectorField(fti_id_t id, char *name, fti_id_t tid, size_t offset, int ndims, int *dim_sizes)

Adds an n-dimensional vector field to a composite data type.

Adds an N-dimensional array field to a complex data type at a given offset.

Return

integer FTI_SCES when successful, FTI_NSCS otherwise

Warning

Do note that FTI does not check for memory boundaries within the data type. Specifying a wrong offset leads to undefined behavior. This can be avoided using the offsetof() macro.

Parameters
  • id: The composite data type handle

  • name: The field name

  • tid: The field data type handle

  • offset: Offset of the field (use offsetof)

  • ndims: The number of dimensions for the field

  • dim_sizes: Array of lengths for each dimension
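A simple consistency check before registering a vector field is that the product of the dimension lengths times the element size equals the size of the struct member. The sketch below assumes a hypothetical struct `cell_block_t`; the order in which FTI expects the dimensions is not stated in this reference, so the check only verifies the total size, and the FTI call in the comment is illustrative.

```c
#include <stddef.h> /* size_t */

/* Hypothetical type with a 2-D array member. */
typedef struct {
    int    step;
    double grid[4][8];
} cell_block_t;

/* Number of elements described by an (ndims, dim_sizes) pair. */
size_t element_count(int ndims, const int *dim_sizes) {
    size_t n = 1;
    for (int i = 0; i < ndims; i++)
        n *= (size_t)dim_sizes[i];
    return n;
}

/* Returns 1 if dims {4, 8} cover exactly the bytes of 'grid'. With FTI the
 * field would then be registered, e.g.
 *   int dims[2] = {4, 8};
 *   FTI_AddVectorField(t, "grid", FTI_DBLE,
 *                      offsetof(cell_block_t, grid), 2, dims);
 */
int grid_dims_match(void) {
    int dims[2] = {4, 8};
    cell_block_t b;
    return element_count(2, dims) * sizeof(double) == sizeof b.grid;
}
```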

int FTI_GetStageDir(char *stageDir, int maxLen)

Places the FTI staging directory path into ‘stageDir’.

This function places the FTI staging directory path in ‘stageDir’. If the allocated size is not sufficient, no action is performed and FTI_NSCS is returned.

Return

integer FTI_SCES if successful, FTI_NSCS else.

Parameters
  • stageDir: pointer to allocated memory region.

  • maxLen: size of allocated memory region in bytes.
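The buffer contract can be modeled in isolation: the path is copied only if it fits, and otherwise the buffer is left untouched. The sketch below is a hypothetical stand-in, not FTI's implementation; `SKETCH_SCES`/`SKETCH_NSCS` replace FTI_SCES/FTI_NSCS, and counting the terminating NUL against `maxLen` is an assumption.

```c
#include <string.h> /* strlen, memcpy */

#define SKETCH_SCES 0    /* stand-in for FTI_SCES */
#define SKETCH_NSCS (-1) /* stand-in for FTI_NSCS */

/* Hypothetical model of the documented behavior: copy 'path' into
 * 'stageDir' only if it fits, NUL terminator included (assumption);
 * otherwise perform no action and report failure. */
int copy_stage_dir(const char *path, char *stageDir, int maxLen) {
    size_t need = strlen(path) + 1;
    if (maxLen <= 0 || need > (size_t)maxLen)
        return SKETCH_NSCS;
    memcpy(stageDir, path, need);
    return SKETCH_SCES;
}
```

A caller would therefore size the buffer generously (for example with `PATH_MAX` where available) and always check the return value before using the path.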

int FTI_GetStageStatus(int ID)

Returns status of staging request.

This function returns the status of the staging request corresponding to ID. The ID is returned by the function ‘FTI_SendFile’. The status may be one of five possible values:
  • FTI_SI_FAIL: stage request failed

  • FTI_SI_SCES: stage request succeeded

  • FTI_SI_ACTV: stage request is currently being processed

  • FTI_SI_PEND: stage request is pending

  • FTI_SI_NINI: there is no stage request with this ID

Return

integer Status of staging request on success, FTI_NSCS else.

Parameters
  • ID: ID of staging request.

Note

If the status is FTI_SI_NINI, the ID is either invalid or the request has already finished (succeeded or failed). In the latter case, ‘FTI_GetStageStatus’ first returns FTI_SI_FAIL or FTI_SI_SCES and frees the stage request resources; subsequent calls then return FTI_SI_NINI.
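In application code, one would typically call `FTI_GetStageStatus(ID)` in a loop until the request leaves the pending/active states. The sketch below models only that decision logic over a simulated sequence of statuses; the enum values are hypothetical stand-ins for the FTI_SI_* constants, whose real values come from `fti.h`.

```c
/* Stand-ins for FTI_SI_FAIL, FTI_SI_SCES, FTI_SI_ACTV, FTI_SI_PEND,
 * FTI_SI_NINI (real values are defined by FTI). */
enum stage_status { SI_FAIL, SI_SCES, SI_ACTV, SI_PEND, SI_NINI };

/* Returns 1 while the caller should keep polling the request. */
int stage_should_poll(enum stage_status s) {
    return s == SI_PEND || s == SI_ACTV;
}

/* Walks through observed statuses and returns the first one that ends
 * polling. SCES or FAIL is reported only once; a later query for the same
 * ID yields NINI. Returns SI_NINI if every observed status was transient. */
enum stage_status first_terminal(const enum stage_status *seen, int n) {
    for (int i = 0; i < n; i++)
        if (!stage_should_poll(seen[i]))
            return seen[i];
    return SI_NINI;
}
```

Because the final status is consumed by the call that returns it, the caller should store that value rather than query the same ID again.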

int FTI_SendFile(char *lpath, char *rpath)

Copies file asynchronously from ‘lpath’ to ‘rpath’.

This function may be used to copy a file that is local to the node asynchronously to the PFS, via the FTI head process. The file is not removed after a successful transfer; however, if it is stored in the directory returned by ‘FTI_GetStageDir’, it will be removed during ‘FTI_Finalize’.

Return

integer Request handle (ID) on success, FTI_NSCS else.

Parameters
  • lpath: absolute path of the local file.

  • rpath: absolute path of the remote file.

If staging is enabled but there is no head process, staging is performed synchronously (i.e., by the calling rank).

int FTI_InitGroup(FTIT_H5Group *h5group, char *name, FTIT_H5Group *parent)

Initializes an HDF5 group.

Initializes a group defined by the user. If parent is NULL, the parent is set to the root group.

Return

integer FTI_SCES if successful.

Parameters
  • h5group: H5 group that we want to initialize

  • name: Name of the H5 group

  • parent: Parent H5 group

int FTI_setIDFromString(char *name)

Searches in the protected variables for a name. If not found it allocates and returns the ID.

This function searches for a given name in the protected variables and returns the respective id for it.

Return

integer id of the variable.

Parameters
  • name: Name of the protected variable to search

int FTI_getIDFromString(char *name)

Searches in the protected variables for a name. If not found it allocates and returns the ID.

This function searches for a given name in the protected variables and returns the respective id for it.

Return

integer id of the variable.

Parameters
  • name: Name of the protected variable to search

int FTI_RenameGroup(FTIT_H5Group *h5group, char *name)

Renames an HDF5 group.

This function renames an HDF5 group defined by the user.

Return

integer FTI_SCES if successful.

Parameters
  • h5group: H5 group that we want to rename

  • name: New name of the H5 group

int FTI_Protect(int id, void *ptr, int32_t count, fti_id_t tid)

Sets or resets the pointer and type of a protected variable.

This function stores a pointer to a data structure, its size, its ID, its number of elements and the type of its elements. The list of these structures is the data that will be stored during a checkpoint and loaded during a recovery. If the dataset was already registered, the function resets the pointer, size, number of elements and element type.

Return

integer FTI_SCES if successful.

Parameters
  • id: ID for searches and update.

  • ptr: Pointer to the data structure.

  • count: Number of elements in the data structure.

  • tid: The data type handle for the variable
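The "set or reset" behavior can be pictured as an upsert into a registry keyed by ID. The sketch below is a hypothetical model of that bookkeeping, not FTI's internal data structure; `protect_sketch` and `protected_bytes` are illustrative names, and the element size stands in for the type handle.

```c
#include <stddef.h> /* size_t */
#include <stdint.h> /* int32_t */

#define MAX_VARS 16

/* Hypothetical registry entry mirroring what FTI_Protect records. */
typedef struct { int id; void *ptr; int32_t count; size_t elem_size; } entry_t;

static entry_t registry[MAX_VARS];
static int n_entries = 0;

/* Adds a new entry, or resets pointer/count/type of an existing ID.
 * Returns 0 on success, -1 if the registry is full. */
int protect_sketch(int id, void *ptr, int32_t count, size_t elem_size) {
    for (int i = 0; i < n_entries; i++) {
        if (registry[i].id == id) { /* already registered: reset it */
            registry[i].ptr = ptr;
            registry[i].count = count;
            registry[i].elem_size = elem_size;
            return 0;
        }
    }
    if (n_entries == MAX_VARS)
        return -1;
    registry[n_entries++] = (entry_t){ id, ptr, count, elem_size };
    return 0;
}

/* Bytes a checkpoint would write for all registered variables. */
size_t protected_bytes(void) {
    size_t total = 0;
    for (int i = 0; i < n_entries; i++)
        total += (size_t)registry[i].count * registry[i].elem_size;
    return total;
}
```

In practice this is why, after growing an array with `realloc`, calling FTI_Protect again with the same ID and the new pointer and count keeps the checkpoint consistent with the program's data.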

int FTI_SetAttribute(int id, FTIT_attribute attribute, FTIT_attributeFlag flag)

Adds descriptive attributes to a protected variable.

This function sets descriptive attributes of a protected variable. The variable must be protected and have an ID assigned before the call. The flag can be any combination of FTI_ATTRIBUTE_NAME and FTI_ATTRIBUTE_DIM, combined with the bitwise OR operator. The attributes appear in the metadata files when a checkpoint is taken. When setting the dimensions of a dataset, the first dimension is the leading dimension, i.e., the dimension that is stored contiguously in a flat matrix representation.

Return

integer FTI_SCES if successful.

Parameters
  • id: ID of the variable.

  • attribute: structure that holds the attributes values.

  • flag: flag to indicate which attributes to set.
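"First dimension is the leading dimension" means the first listed dimension varies fastest in memory. The sketch below computes the flat offset under that convention and shows flag combination with `|`; the enum values are hypothetical stand-ins for FTI_ATTRIBUTE_NAME and FTI_ATTRIBUTE_DIM.

```c
#include <stddef.h> /* size_t */

/* Stand-ins for FTI_ATTRIBUTE_NAME / FTI_ATTRIBUTE_DIM (values assumed). */
enum { ATTR_NAME = 1 << 0, ATTR_DIM = 1 << 1 };

/* Flat offset of element idx[] when dims[0] is the leading (contiguous)
 * dimension: offset = i0 + dims[0] * (i1 + dims[1] * (i2 + ...)). */
size_t leading_first_offset(int ndims, const size_t *dims, const size_t *idx) {
    size_t off = 0;
    for (int d = ndims - 1; d >= 0; d--)
        off = off * dims[d] + idx[d];
    return off;
}
```

For a row-major C array `double a[ny][nx]`, the x index is contiguous, so under this convention the dimension attribute would list `nx` first; the flag argument for setting both name and dimensions would be the OR of the two FTI flags.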

int FTI_DefineGlobalDataset(int id, int rank, FTIT_hsize_t *dimLength, const char *name, FTIT_H5Group *h5group, fti_id_t tid)

Defines a global dataset (shared among application processes)

This function defines a global dataset which is shared among all ranks. To assign subsets to the dataset, the user has to call the function ‘FTI_AddSubset’. The parameter ‘did’ of that function corresponds to the global dataset ID defined here.

Return

integer FTI_SCES if successful.

Parameters
  • id: ID of the dataset.

  • rank: Rank of the dataset.

  • dimLength: Length of each dimension.

  • name: Name of the dataset in HDF5 file.

  • h5group: Group of the dataset. If NULL, then “/”.

  • tid: FTI data type handle.

int FTI_AddSubset(int id, int rank, FTIT_hsize_t *offset, FTIT_hsize_t *count, int did)

Assigns a FTI protected variable to a global dataset.

This function assigns the protected dataset with ID ‘id’ to a global dataset with ID ‘did’. The parameters ‘offset’ and ‘count’ specify the selection of the subset inside the global dataset (‘offset’ and ‘count’ correspond to ‘start’ and ‘count’ in the HDF5 function ‘H5Sselect_hyperslab’; for questions on what they define, please consult the HDF5 documentation).

Return

integer FTI_SCES if successful.

Parameters
  • id: Corresponding variable ID.

  • rank: Rank of the dataset.

  • offset: Starting coordinates in global dataset.

  • count: number of elements for each coordinate.

  • did: Corresponding global dataset ID.
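A common way to obtain ‘offset’ and ‘count’ is a block decomposition of the global dataset across ranks. The sketch below computes them for the 1-D case; `hsize_sketch` is a hypothetical stand-in for FTIT_hsize_t, and the FTI call in the comment is illustrative only.

```c
/* Stand-in for FTIT_hsize_t (the real typedef comes from fti.h). */
typedef unsigned long long hsize_sketch;

/* 1-D block decomposition of n global elements over nprocs > 0 ranks.
 * The first (n % nprocs) ranks receive one extra element. Writes the
 * hyperslab start ('offset') and length ('count') for 'rank'. With FTI
 * the result would be passed as, e.g.
 *   FTI_AddSubset(id, 1, &offset, &count, did);
 */
void block_subset(hsize_sketch n, int nprocs, int rank,
                  hsize_sketch *offset, hsize_sketch *count) {
    hsize_sketch base = n / (hsize_sketch)nprocs;
    hsize_sketch rem  = n % (hsize_sketch)nprocs;
    hsize_sketch r    = (hsize_sketch)rank;
    *count  = base + (r < rem ? 1 : 0);
    *offset = r * base + (r < rem ? r : rem);
}
```

With 10 elements over 3 ranks this yields (offset, count) = (0, 4), (4, 3), (7, 3): the subsets tile the global dataset without gaps or overlap.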

int FTI_UpdateGlobalDataset(int id, int rank, FTIT_hsize_t *dimLength)

Updates global dataset (shared among application processes)

Updates only the rank and the number of elements for each coordinate direction.

Parameters
  • id: ID of the dataset.

  • rank: Rank of the dataset.

  • dimLength: Length of each dimension.

int FTI_GetDatasetRank(int did)

Returns the rank of a shared dataset.

Return

integer rank of dataset.

Parameters
  • did: ID of the dataset.

FTIT_hsize_t *FTI_GetDatasetSpan(int did, int rank)

Returns a static array of the dataset dimensions.

Parameters
  • did: ID of the dataset.

  • rank: Rank of the dataset.

int FTI_RecoverDatasetDimension(int did)

Loads the dataset dimensions from the checkpoint file into dataset ‘did’.

Parameters
  • did: ID of the dataset.

int FTI_DefineDataset(int id, int rank, int *dimLength, char *name, FTIT_H5Group *h5group)

Defines the dataset.

This function gives FTI all information needed by HDF5 to correctly save the dataset in the checkpoint file.

Return

integer FTI_SCES if successful.

Parameters
  • id: ID for searches and update.

  • rank: Rank of the array

  • dimLength: Length of each dimension

  • name: Name of the dataset in HDF5 file.

  • h5group: Group of the dataset. If NULL then “/”

int32_t FTI_GetStoredSize(int id)

Returns size saved in metadata of variable.

This function returns the size of the variable with the given ID as saved in the metadata. This may differ from the size of the variable in the program. If it is called during recovery, it returns the size from the metadata file; if it is called after a checkpoint, it returns the size saved in the temporary metadata. If no size is saved in the metadata, it returns 0.

Return

int32_t Returns size of variable or 0 if size not saved.

Parameters
  • id: Variable ID.

void *FTI_Realloc(int id, void *ptr)

Reallocates a dataset to its size at the last checkpoint.

This function loads the checkpoint data size from the metadata file, reallocates memory and updates the data size information.

Return

ptr Pointer if successful, NULL otherwise.

Parameters
  • id: Variable ID.

  • ptr: Pointer to the variable.
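The effect of resizing a buffer to a stored checkpoint size can be sketched with plain `realloc`. This is a standalone analogue, not FTI's implementation: `resize_to_stored` is a hypothetical helper, and treating the value from FTI_GetStoredSize as a size in bytes is an assumption.

```c
#include <stddef.h> /* size_t */
#include <stdint.h> /* int32_t */
#include <stdlib.h> /* realloc */

/* Standalone analogue: 'stored_bytes' plays the role of
 * FTI_GetStoredSize(id), assumed to be in bytes. On success returns the
 * (possibly moved) buffer and sets *count to the element count; on failure
 * returns NULL and leaves the original buffer valid, as realloc does. */
void *resize_to_stored(void *ptr, int32_t stored_bytes, size_t elem_size,
                       int32_t *count) {
    if (stored_bytes <= 0 || elem_size == 0 ||
        (size_t)stored_bytes % elem_size != 0)
        return NULL;
    void *p = realloc(ptr, (size_t)stored_bytes);
    if (p == NULL)
        return NULL;
    *count = (int32_t)((size_t)stored_bytes / elem_size);
    return p;
}
```

After a successful resize, the variable would be registered again with FTI_Protect using the new pointer and element count, so that later checkpoints use the updated size.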

int FTI_BitFlip(int datasetID)

Bit-flip injection following the injection instructions.

This function injects the given number of bit-flips, at the given frequency and in the given location (rank, dataset, bit position).

Return

integer FTI_SCES if successful.

Parameters
  • datasetID: ID of the dataset where to inject.
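At the memory level, a single bit-flip injection toggles one bit of the dataset's buffer. The sketch below shows that operation in standard C; `flip_bit` is a hypothetical helper and says nothing about how FTI chooses the rank, dataset, frequency or bit position.

```c
#include <stddef.h> /* size_t */

/* Toggles bit number 'bit' in 'buf' (bit 0 = least significant bit of
 * byte 0), mimicking the effect of an injected fault on the data. */
void flip_bit(void *buf, size_t bit) {
    unsigned char *bytes = buf;
    bytes[bit / 8] ^= (unsigned char)(1u << (bit % 8));
}
```

Flipping the same bit twice restores the original value, which makes the helper convenient for checking whether a recovered dataset matches the pre-injection data.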

int FTI_Checkpoint(int id, int level)

Takes the checkpoint and triggers the post-checkpoint work.

This function starts by blocking on a receive if the previous checkpoint was offline. Then, it updates the checkpoint information, writes the checkpoint data, creates the metadata and triggers the post-processing work. This function is complementary to the FTI_Listen function in terms of communication.

Return

integer FTI_SCES if successful.

Parameters
  • id: Checkpoint ID.

  • level: Checkpoint level.

int FTI_InitICP(int id, int level, bool activate)

Initialize an incremental checkpoint.

This function defines the environment for the incremental checkpointing mechanism. The iCP mechanism consists of three functions: FTI_InitICP, FTI_AddVarICP and FTI_FinalizeICP. The two functions FTI_InitICP and FTI_FinalizeICP define the iCP region within which the user may write the protected variables in any order. The iCP region is active when the expression passed through ‘activate’ evaluates to TRUE.

Return

integer FTI_SCES if successful.

Parameters
  • id: Checkpoint ID.

  • level: Checkpoint level.

  • activate: Boolean expression.

Note

This function is not blocking for POSIX, FTI-FF and HDF5, but is blocking for MPI-IO. This is due to the collective open call in MPI-IO.

int FTI_AddVarICP(int varID)

Writes a variable into the checkpoint file.

With this function, the user may write the protected datasets in any order into the checkpoint file. However, before the call to FTI_FinalizeICP, all protected variables must have been written into the file.

Return

integer FTI_SCES if successful.

Parameters
  • varID: Protected variable ID.
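The iCP rule (any order, but every protected variable written before FTI_FinalizeICP) amounts to tracking a set of written variables. The sketch below is a hypothetical application-side tracker using a 64-bit mask; the `icp_*` names are illustrative and variable indices are assumed to lie in 0..63.

```c
#include <stdint.h> /* uint64_t */

/* Hypothetical tracker for an iCP region with up to 64 variables. */
typedef struct { uint64_t expected, written; } icp_tracker;

/* Start of the region (cf. FTI_InitICP): record which variables must be
 * written; bit i set means variable index i is protected. */
void icp_begin(icp_tracker *t, uint64_t expected_mask) {
    t->expected = expected_mask;
    t->written = 0;
}

/* Records that variable 'idx' (< 64) was written (cf. FTI_AddVarICP). */
void icp_add(icp_tracker *t, unsigned idx) {
    t->written |= (uint64_t)1 << idx;
}

/* True once every expected variable has been written, in any order,
 * i.e. when calling FTI_FinalizeICP would satisfy the requirement. */
int icp_complete(const icp_tracker *t) {
    return (t->written & t->expected) == t->expected;
}
```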

int FTI_FinalizeICP()

Finalize an incremental checkpoint.

This function finalizes an incremental checkpoint. In contrast to FTI_InitICP, this function is blocking and collective on the communicator FTI_COMM_WORLD.

Return

integer FTI_SCES if successful.

int FTI_Recover()

It loads the checkpoint data.

This function loads the checkpoint data from the checkpoint file and it updates some basic checkpoint information.

Return

integer FTI_SCES if successful.

int FTI_Snapshot()

Takes an FTI snapshot or recovers the data if it is a restart.

This function loads the checkpoint data from the checkpoint file in case of a restart. Otherwise, it checks whether the current iteration requires a checkpoint; if so, it determines the checkpoint level, writes the data to the files and notifies the head of the node that a checkpoint has been taken. The checkpoint ID and counters are updated.

Return

integer FTI_SCES if successful.
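The "which checkpoint level" decision can be illustrated with a simple periodic policy. The sketch below is hypothetical and is not FTI's scheduling code: it counts intervals in iterations for simplicity, and choosing the highest due level when several are due is an assumption of the sketch.

```c
/* Hypothetical policy: intervals[l - 1] is the period, in iterations, of
 * checkpoint level l (1..4); 0 disables that level. If several levels are
 * due, the highest one is returned. Returns 0 when no checkpoint is due. */
int due_level(long iteration, const long intervals[4]) {
    int level = 0;
    for (int l = 1; l <= 4; l++)
        if (intervals[l - 1] > 0 && iteration % intervals[l - 1] == 0)
            level = l;
    return level;
}
```

For intervals {10, 50, 0, 100}, iteration 20 selects level 1, iteration 50 selects level 2 and iteration 100 selects level 4, since the more expensive, more resilient level subsumes the cheaper ones in this policy.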

int FTI_Finalize()

It closes FTI properly on the application processes.

This function notifies the FTI processes that the execution is over, frees some data structures and closes FTI. If this function is not called on the application processes, the FTI processes will never finish (deadlock).

Return

integer FTI_SCES if successful.

int FTI_RecoverVar(int id)

Recovers given variable.

Return

integer FTI_SCES if successful.

Parameters
  • id: ID of the variable to be recovered.
