APIs
-
int
FTI_Init
(const char *configFile, MPI_Comm globalComm)
Initializes FTI.
This function initializes the FTI context and prepares the heads to wait for checkpoints. FTI processes should never return from this function. In case of a restart, the checkpoint files should be recovered and in place by the end of this function.
- Return
integer FTI_SCES if successful.
- Parameters
configFile
: FTI configuration file.
globalComm
: Main MPI communicator of the application.
-
int
FTI_Status
()
Returns the current status of the recovery flag.
This function returns the current status of the recovery flag.
- Return
integer FTI_Exec.reco.
-
int
FTI_InitType
(fti_id_t *type, int size)
Registers a new data type in the FTI runtime.
This function initializes a data type. The type is treated as a black box by FTI, so the runtime only requires its size. Types built this way are saved as byte arrays when using the HDF5 format.
- Return
int FTI_SCES on success, otherwise FTI_NSCS.
- Parameters
type
: Output parameter for the data type handle.
size
: The data type size in bytes.
-
fti_id_t
FTI_InitCompositeType
(char *name, size_t size, FTIT_H5Group *h5g)
Initializes an empty composite data type.
Creates a composite data type that can contain other data types as fields. Fields are added with FTI_AddScalarField and FTI_AddVectorField.
- Return
fti_id_t A handle to represent the new type.
- Parameters
name
: An optional type name.
size
: The total size of the complex data type.
h5g
: An optional H5 group identifier.
-
int
FTI_AddScalarField
(fti_id_t id, char *name, fti_id_t fid, size_t offset)
Adds a scalar field to a composite type.
Adds a scalar field to a complex data type at a given offset.
- Return
integer FTI_SCES when successful, FTI_NSCS otherwise
- Warning
Do note that FTI does not check for memory boundaries within the data type. Specifying a wrong offset leads to undefined behavior. This can be avoided using the offsetof() macro.
- Parameters
id
: The composite data type handle.
name
: An optional field name.
fid
: The field data type handle.
offset
: Offset of the field (use offsetof).
-
int
FTI_AddVectorField
(fti_id_t id, char *name, fti_id_t tid, size_t offset, int ndims, int *dim_sizes)
Adds an n-dimensional vector field to a composite data type.
Adds an N-dimensional array field to a complex data type at a given offset.
- Return
integer FTI_SCES when successful, FTI_NSCS otherwise
- Warning
Do note that FTI does not check for memory boundaries within the data type. Specifying a wrong offset leads to undefined behavior. This can be avoided using the offsetof() macro.
- Parameters
id
: The composite data type handle.
name
: The field name.
tid
: The field data type handle.
offset
: Offset of the field (use offsetof).
ndims
: The number of dimensions of the field.
dim_sizes
: Array of lengths for each dimension.
-
int
FTI_GetStageDir
(char *stageDir, int maxLen)
Places the FTI staging directory path into ‘stageDir’.
This function places the FTI staging directory path in ‘stageDir’. If the allocated size (‘maxLen’) is not sufficient, no action is performed and FTI_NSCS is returned.
- Return
integer FTI_SCES if successful, FTI_NSCS otherwise.
- Parameters
stageDir
: Pointer to an allocated memory region.
maxLen
: Size of the allocated memory region in bytes.
-
int
FTI_GetStageStatus
(int ID)
Returns the status of a staging request.
This function returns the status of the staging request corresponding to ID. The ID is returned by the function ‘FTI_SendFile’. The status is one of the following five values:
FTI_SI_FAIL - Stage request failed
FTI_SI_SCES - Stage request succeeded
FTI_SI_ACTV - Stage request is currently being processed
FTI_SI_PEND - Stage request is pending
FTI_SI_NINI - There is no stage request with this ID
- Return
integer Status of the staging request on success, FTI_NSCS otherwise.
- Parameters
ID
: ID of the staging request.
- Note
If the status is FTI_SI_NINI, the ID is either invalid or the request has already finished (succeeded or failed). In the latter case, ‘FTI_GetStageStatus’ returns FTI_SI_FAIL or FTI_SI_SCES once and frees the stage request resources; subsequent calls then return FTI_SI_NINI.
-
int
FTI_SendFile
(char *lpath, char *rpath)
Copies a file asynchronously from ‘lpath’ to ‘rpath’.
This function may be used to copy a node-local file asynchronously to the PFS via the FTI head process. The file is not removed after a successful transfer; however, if it is stored in the directory returned by ‘FTI_GetStageDir’, it will be removed during ‘FTI_Finalize’.
- Return
integer Request handle (ID) on success, FTI_NSCS otherwise.
- Parameters
lpath
: Absolute path of the local file.
rpath
: Absolute path of the remote file.
If staging is enabled but there is no head process, the staging is performed synchronously (i.e. by the calling rank).
-
int
FTI_InitGroup
(FTIT_H5Group *h5group, char *name, FTIT_H5Group *parent)
Initializes an HDF5 group.
Initializes a user-defined group. If parent is NULL, the root group is used as the parent.
- Return
integer FTI_SCES if successful.
- Parameters
h5group
: H5 group that we want to initialize.
name
: Name of the H5 group.
parent
: Parent H5 group.
-
int
FTI_setIDFromString
(char *name)
Searches the protected variables for a name. If it is not found, it allocates and returns a new ID.
This function searches for a given name in the protected variables and returns the respective id for it.
- Return
integer id of the variable.
- Parameters
name
: Name of the protected variable to search
-
int
FTI_getIDFromString
(char *name)
Searches the protected variables for a name. If it is not found, it allocates and returns a new ID.
This function searches for a given name in the protected variables and returns the respective id for it.
- Return
integer id of the variable.
- Parameters
name
: Name of the protected variable to search
-
int
FTI_RenameGroup
(FTIT_H5Group *h5group, char *name)
Renames an HDF5 group.
This function renames a user-defined HDF5 group.
- Return
integer FTI_SCES if successful.
- Parameters
h5group
: H5 group that we want to rename.
name
: New name of the H5 group.
-
int
FTI_Protect
(int id, void *ptr, int32_t count, fti_id_t tid)
Sets or resets the pointer and type of a protected variable.
This function stores a pointer to a data structure, its size, its ID, its number of elements and the type of the elements. This list of structures is the data that will be stored during a checkpoint and loaded during a recovery. If the dataset was already registered, the function resets the pointer, size, number of elements and element type.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID for searches and update.
ptr
: Pointer to the data structure.
count
: Number of elements in the data structure.
tid
: The data type handle for the variable.
-
int
FTI_SetAttribute
(int id, FTIT_attribute attribute, FTIT_attributeFlag flag)
Adds descriptive attributes to a protected variable.
This function sets descriptive attributes of a protected variable. The variable must be protected and have an ID assigned before the call. The flag can be any combination of the following flags:
FTI_ATTRIBUTE_NAME
FTI_ATTRIBUTE_DIM
Flags are combined with the bitwise OR operator. The attributes appear in the metadata files when a checkpoint is taken. When setting the dimensions of a dataset, the first dimension is the leading dimension, i.e. the dimension that is stored contiguously in a flat matrix representation.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID of the variable.
attribute
: Structure that holds the attribute values.
flag
: Flag indicating which attributes to set.
-
int
FTI_DefineGlobalDataset
(int id, int rank, FTIT_hsize_t *dimLength, const char *name, FTIT_H5Group *h5group, fti_id_t tid)
Defines a global dataset (shared among application processes).
This function defines a global dataset that is shared among all ranks. To assign subsets to the dataset, the user has to call the function ‘FTI_AddSubset’. The parameter ‘did’ of that function corresponds to the global dataset ID defined here.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID of the dataset.
rank
: Rank (number of dimensions) of the dataset.
dimLength
: Length of each dimension.
name
: Name of the dataset in the HDF5 file.
h5group
: Group of the dataset. If NULL, “/” is used.
tid
: FTI data type handle.
-
int
FTI_AddSubset
(int id, int rank, FTIT_hsize_t *offset, FTIT_hsize_t *count, int did)
Assigns an FTI protected variable to a global dataset.
This function assigns the protected dataset with ID ‘id’ to a global dataset with ID ‘did’. The parameters ‘offset’ and ‘count’ specify the selection of the subset inside the global dataset; they correspond to ‘start’ and ‘count’ in the HDF5 function ‘H5Sselect_hyperslab’. For details on what they define, please consult the HDF5 documentation.
- Return
integer FTI_SCES if successful.
- Parameters
id
: Corresponding variable ID.
rank
: Rank of the dataset.
offset
: Starting coordinates in the global dataset.
count
: Number of elements for each coordinate.
did
: Corresponding global dataset ID.
-
int
FTI_UpdateGlobalDataset
(int id, int rank, FTIT_hsize_t *dimLength)
Updates a global dataset (shared among application processes).
Only the rank and the number of elements in each coordinate direction are updated.
- Parameters
id
: ID of the dataset.
rank
: Rank of the dataset.
dimLength
: Length of each dimension.
-
int
FTI_GetDatasetRank
(int did)
Returns the rank of a shared dataset.
- Return
integer rank of dataset.
- Parameters
did
: ID of the dataset.
-
FTIT_hsize_t *
FTI_GetDatasetSpan
(int did, int rank)
Returns a static array of the dataset dimensions.
- Parameters
did
: ID of the dataset.
rank
: Rank of the dataset.
-
int
FTI_RecoverDatasetDimension
(int did)
Loads the dataset dimensions from the checkpoint file into dataset ‘did’.
- Parameters
did
: ID of the dataset.
-
int
FTI_DefineDataset
(int id, int rank, int *dimLength, char *name, FTIT_H5Group *h5group)
Defines the dataset.
This function gives FTI all information needed by HDF5 to correctly save the dataset in the checkpoint file.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID for searches and update.
rank
: Rank of the array.
dimLength
: Length of each dimension.
name
: Name of the dataset in the HDF5 file.
h5group
: Group of the dataset. If NULL, “/” is used.
-
int32_t
FTI_GetStoredSize
(int id)
Returns the size of a variable saved in the metadata.
This function returns the size of the variable with the given ID as saved in the metadata. This may differ from the current size of the variable in the program. If this function is called during recovery, it returns the size from the metadata file; if it is called after a checkpoint, it returns the size saved in the temporary metadata. If no size is saved in the metadata, it returns 0.
- Return
int32_t Returns size of variable or 0 if size not saved.
- Parameters
id
: Variable ID.
-
void *
FTI_Realloc
(int id, void *ptr)
Reallocates a dataset to the size of the last checkpoint.
This function loads the checkpoint data size from the metadata file, reallocates memory and updates the data size information.
- Return
ptr Pointer to the reallocated variable if successful, NULL otherwise.
- Parameters
id
: Variable ID.
ptr
: Pointer to the variable.
-
int
FTI_BitFlip
(int datasetID)
Bit-flip injection following the injection instructions.
This function injects the given number of bit-flips, at the given frequency and in the given location (rank, dataset, bit position).
- Return
integer FTI_SCES if successful.
- Parameters
datasetID
: ID of the dataset where to inject.
-
int
FTI_Checkpoint
(int id, int level)
Takes the checkpoint and triggers the post-checkpoint work.
This function starts by blocking on a receive if the previous checkpoint was offline. Then, it updates the checkpoint information, writes the checkpoint data, creates the metadata and triggers the post-processing work. This function is complementary to the FTI_Listen function in terms of communication.
- Return
integer FTI_SCES if successful.
- Parameters
id
: Checkpoint ID.
level
: Checkpoint level.
-
int
FTI_InitICP
(int id, int level, bool activate)
Initializes an incremental checkpoint.
This function defines the environment for the incremental checkpointing (iCP) mechanism. The iCP mechanism consists of three functions: FTI_InitICP, FTI_AddVarICP and FTI_FinalizeICP. FTI_InitICP and FTI_FinalizeICP delimit the iCP region, within which the user may write the protected variables in any order. The iCP region is active when the expression passed through ‘activate’ evaluates to TRUE.
- Return
integer FTI_SCES if successful.
- Parameters
id
: Checkpoint ID.
level
: Checkpoint level.
activate
: Boolean expression.
- Note
This function is not blocking for POSIX, FTI-FF and HDF5, but is blocking for MPI-IO because of the collective open call in MPI-IO.
-
int
FTI_AddVarICP
(int varID)
Writes a variable into the checkpoint file.
With this function, the user may write the protected datasets in any order into the checkpoint file. However, before the call to FTI_FinalizeICP, all protected variables must have been written into the file.
- Return
integer FTI_SCES if successful.
- Parameters
varID
: Protected variable ID.
-
int
FTI_FinalizeICP
()
Finalizes an incremental checkpoint.
This function finalizes an incremental checkpoint. In contrast to FTI_InitICP, this function is collective on the communicator FTI_COMM_WORLD and blocking.
- Return
integer FTI_SCES if successful.
-
int
FTI_Recover
()
Loads the checkpoint data.
This function loads the checkpoint data from the checkpoint file and updates some basic checkpoint information.
- Return
integer FTI_SCES if successful.
-
int
FTI_Snapshot
()
Takes an FTI snapshot or recovers the data if it is a restart.
In case of a restart, this function loads the checkpoint data from the checkpoint file. Otherwise, it checks whether the current iteration requires a checkpoint; if so, it determines the checkpoint level, writes the data to the files and informs the head of the node that a checkpoint has been taken. The checkpoint ID and counters are updated.
- Return
integer FTI_SCES if successful.
-
int
FTI_Finalize
()
Closes FTI properly on the application processes.
This function notifies the FTI processes that the execution is over, frees some data structures and shuts FTI down. If this function is not called on the application processes, the FTI processes never finish (deadlock).
- Return
integer FTI_SCES if successful.
-
int
FTI_RecoverVar
(int id)
Recovers the given variable.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID of the variable to be recovered.