APIs
-
int FTI_Init(const char *configFile, MPI_Comm globalComm)
Initializes FTI.
This function initializes the FTI context and prepares the heads to wait for checkpoints. FTI processes should never return from this function. In case of a restart, checkpoint files should be recovered and in place at the end of this function.
- Return
integer FTI_SCES if successful.
- Parameters
configFile
: FTI configuration file.
globalComm
: Main MPI communicator of the application.
-
int FTI_Status()
Returns the current status of the recovery flag.
- Return
integer FTI_Exec.reco.
-
int FTI_InitType(FTIT_type *type, int size)
Initializes a data type.
This function initializes a data type. The only information needed is the size of the data type; the rest is a black box for FTI. In the HDF5 format, types are saved as byte arrays.
- Return
integer FTI_SCES if successful.
- Parameters
type
: The data type to be initialized.
size
: The size of the data type to be initialized.
-
int FTI_InitComplexType(FTIT_type *newType, FTIT_complexType *typeDefinition, int length, size_t size, char *name, FTIT_H5Group *h5group)
Initializes a complex data type.
This function initializes a complex data type. The new type can only consist of fields of flat FTI types (no arrays). The type definition must include:
length => number of fields in the new type
field[].type => type of each field in the new type
field[].name => name of each field in the new type
field[].rank => number of dimensions of each field
field[].dimLength[] => length of each dimension of each field
- Return
integer FTI_SCES if successful.
- Parameters
newType
: The data type to be initialized.
typeDefinition
: Structure definition of the new type.
length
: Number of fields in the structure.
size
: Size of the structure.
name
: Name of the structure.
h5group
: Group of the type.
-
void FTI_AddSimpleField(FTIT_complexType *typeDefinition, FTIT_type *ftiType, size_t offset, int id, char *name)
Adds a simple field to a complex data type.
This function adds a field to the complex data type. Use the offsetof macro to set the offset. The first ID must be 0, and each subsequent ID must increase by 1. If name is NULL, FTI sets the name to “T${id}”. Rank and dimLength are set to 1.
- Return
integer FTI_SCES if successful.
- Parameters
typeDefinition
: Structure definition of the complex data type.
ftiType
: Type of the field.
offset
: Offset of the field (use offsetof).
id
: ID of the field (start with 0).
name
: Name of the field (pass NULL to use the default).
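The offsets mentioned above can be obtained with the standard C `offsetof` macro. The following is a minimal sketch that assumes a hypothetical application struct `Particle` and a hypothetical helper `particle_field_offset`; neither is part of FTI. It only shows how field IDs starting at 0 could be paired with byte offsets.

```c
#include <assert.h>
#include <stddef.h> /* offsetof, size_t */

/* Hypothetical application struct used for illustration only. */
typedef struct {
    int    id;
    double mass;
    char   tag;
} Particle;

/* Hypothetical helper (not an FTI function): returns the byte offset of
 * the field with the given index (0 = id, 1 = mass, 2 = tag). The index
 * plays the role of the field ID, which starts at 0 and grows by 1. */
size_t particle_field_offset(int index)
{
    static const size_t offsets[] = {
        offsetof(Particle, id),
        offsetof(Particle, mass),
        offsetof(Particle, tag),
    };
    assert(index >= 0 && index < 3);
    return offsets[index];
}
```

Each returned value is what one would pass as the `offset` argument for the field with the matching ID. Using `offsetof` rather than hand-computed byte counts keeps the offsets correct when the compiler inserts padding between fields.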
-
void FTI_AddComplexField(FTIT_complexType *typeDefinition, FTIT_type *ftiType, size_t offset, int rank, int *dimLength, int id, char *name)
Adds a complex (array) field to a complex data type.
This function adds a field to the complex data type. Use the offsetof macro to set the offset. The first ID must be 0, and each subsequent ID must increase by 1. If name is NULL, FTI sets the name to “T${id}”.
- Return
integer FTI_SCES if successful.
- Parameters
typeDefinition
: Structure definition of the complex data type.
ftiType
: Type of the field.
offset
: Offset of the field (use offsetof).
rank
: Rank of the array.
dimLength
: Dimension length for each rank.
id
: ID of the field (start with 0).
name
: Name of the field (pass NULL to use the default).
-
int FTI_GetStageDir(char *stageDir, int maxLen)
Places the FTI staging directory path into ‘stageDir’.
This function places the FTI staging directory path in ‘stageDir’. If the allocation size is not sufficient, no action is performed and FTI_NSCS is returned.
- Return
integer FTI_SCES if successful, FTI_NSCS else.
- Parameters
stageDir
: Pointer to the allocated memory region.
maxLen
: Size of the allocated memory region in bytes.
-
int FTI_GetStageStatus(int ID)
Returns the status of a staging request.
This function returns the status of the staging request corresponding to ID. The ID is returned by the function ‘FTI_SendFile’. The status is one of five possible values:
FTI_SI_FAIL - Stage request failed
FTI_SI_SCES - Stage request succeeded
FTI_SI_ACTV - Stage request is currently being processed
FTI_SI_PEND - Stage request is pending
FTI_SI_NINI - There is no stage request with this ID
- Return
integer Status of staging request on success, FTI_NSCS else.
- Parameters
ID
: ID of staging request.
- Note
If the status is FTI_SI_NINI, the ID is either invalid or the request has already finished (succeeded or failed). In the latter case, the first call to ‘FTI_GetStageStatus’ after completion returns FTI_SI_FAIL or FTI_SI_SCES and frees the stage request resources; the following call then returns FTI_SI_NINI.
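The status lifecycle described in the note can be illustrated with a small stand-alone model. This is a sketch under stated assumptions, not FTI's implementation: the `StageStatus` enum, its numeric values, the fixed-size table and all `mock_*` functions are hypothetical stand-ins for the FTI_SI_* statuses and the library's internal bookkeeping.

```c
#include <assert.h>

/* Hypothetical stand-ins for the FTI_SI_* statuses; the numeric values
 * are arbitrary and are not taken from FTI. */
typedef enum { SI_FAIL, SI_SCES, SI_ACTV, SI_PEND, SI_NINI } StageStatus;

#define MAX_REQUESTS 8

/* Toy request table: a slot stays in use until its final status is read. */
static StageStatus table[MAX_REQUESTS];
static int in_use[MAX_REQUESTS];

/* Registers a request in the given state; returns its ID, or -1 if full. */
int mock_add_request(StageStatus initial)
{
    for (int id = 0; id < MAX_REQUESTS; ++id) {
        if (!in_use[id]) {
            in_use[id] = 1;
            table[id] = initial;
            return id;
        }
    }
    return -1;
}

/* Changes the state of an existing request (ignored for unknown IDs). */
void mock_set_status(int id, StageStatus status)
{
    if (id >= 0 && id < MAX_REQUESTS && in_use[id]) {
        table[id] = status;
    }
}

/* Reports the status. Reading a final status (failed or succeeded)
 * releases the slot, so the next query for the same ID yields SI_NINI. */
StageStatus mock_get_status(int id)
{
    if (id < 0 || id >= MAX_REQUESTS || !in_use[id]) {
        return SI_NINI;
    }
    StageStatus status = table[id];
    if (status == SI_FAIL || status == SI_SCES) {
        in_use[id] = 0; /* free the request slot */
    }
    return status;
}
```

In this model a caller that polls until it sees a final status must record that status itself, because a second query for the same ID no longer distinguishes a finished request from an invalid one.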
-
int FTI_SendFile(char *lpath, char *rpath)
Copies a file asynchronously from ‘lpath’ to ‘rpath’.
This function may be used to copy a node-local file asynchronously to the PFS via the FTI head process. The file is not removed after a successful transfer; however, if it is stored in the directory returned by ‘FTI_GetStageDir’, it is removed during ‘FTI_Finalize’.
- Return
integer Request handle (ID) on success, FTI_NSCS else.
- Parameters
lpath
: Absolute path of the local file.
rpath
: Absolute path of the remote file.
- Note
If staging is enabled but there is no head process, the staging is performed synchronously (i.e., by the calling rank).
-
int FTI_InitGroup(FTIT_H5Group *h5group, char *name, FTIT_H5Group *parent)
Initializes an HDF5 group.
This function initializes a group defined by the user. If parent is NULL, the parent is set to the root group.
- Return
integer FTI_SCES if successful.
- Parameters
h5group
: H5 group to initialize.
name
: Name of the H5 group.
parent
: Parent H5 group.
-
int FTI_setIDFromString(char *name)
Searches the protected variables for a name. If the name is not found, it allocates and returns the ID.
This function searches for a given name in the protected variables and returns the respective ID for it.
- Return
integer ID of the variable.
- Parameters
name
: Name of the protected variable to search for.
-
int FTI_getIDFromString(char *name)
Searches the protected variables for a name. If the name is not found, it allocates and returns the ID.
This function searches for a given name in the protected variables and returns the respective ID for it.
- Return
integer ID of the variable.
- Parameters
name
: Name of the protected variable to search for.
-
int FTI_RenameGroup(FTIT_H5Group *h5group, char *name)
Renames an HDF5 group.
This function renames an HDF5 group defined by the user.
- Return
integer FTI_SCES if successful.
- Parameters
h5group
: H5 group to rename.
name
: New name of the H5 group.
-
int FTI_Protect(int id, void *ptr, int32_t count, FTIT_type type)
Sets or resets the pointer and type of a protected variable.
This function stores a pointer to a data structure, its size, its ID, its number of elements and the type of the elements. The list of these structures is the data that is stored during a checkpoint and loaded during a recovery. If the dataset was already registered, the function resets the pointer, size, number of elements and element type.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID for searches and update.
ptr
: Pointer to the data structure.
count
: Number of elements in the data structure.
type
: Type of elements in the data structure.
-
int FTI_DefineGlobalDataset(int id, int rank, FTIT_hsize_t *dimLength, const char *name, FTIT_H5Group *h5group, FTIT_type type)
Defines a global dataset (shared among application processes).
This function defines a global dataset which is shared among all ranks. To assign subsets to the dataset, the user has to call the function ‘FTI_AddSubset’. The parameter ‘did’ of that function corresponds to the global dataset ID defined here.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID of the dataset.
rank
: Rank of the dataset.
dimLength
: Dimension length for each rank.
name
: Name of the dataset in the HDF5 file.
h5group
: Group of the dataset. If NULL then “/”.
type
: FTI type of the dataset.
-
int FTI_AddSubset(int id, int rank, FTIT_hsize_t *offset, FTIT_hsize_t *count, int did)
Assigns an FTI protected variable to a global dataset.
This function assigns the protected dataset with ID ‘id’ to a global dataset with ID ‘did’. The parameters ‘offset’ and ‘count’ specify the selection of the subset inside the global dataset (‘offset’ and ‘count’ correspond to ‘start’ and ‘count’ in the HDF5 function ‘H5Sselect_hyperslab’; for questions on what they define, please consult the HDF5 documentation).
- Return
integer FTI_SCES if successful.
- Parameters
id
: Corresponding variable ID.
rank
: Rank of the dataset.
offset
: Starting coordinates in the global dataset.
count
: Number of elements for each coordinate.
did
: Corresponding global dataset ID.
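As an illustration of how ‘offset’ and ‘count’ might be chosen, the sketch below splits a one-dimensional global dataset into contiguous blocks, one per process. The helper `block_partition` and the typedef `hsize` (a stand-in for FTIT_hsize_t) are hypothetical and not part of FTI; the even block split is just one possible decomposition.

```c
#include <assert.h>

/* Stand-in for FTIT_hsize_t; an assumption made for this example only. */
typedef unsigned long long hsize;

/* Hypothetical helper (not an FTI function): splits `global_len` elements
 * across `nranks` processes as evenly as possible. The first
 * (global_len % nranks) processes receive one extra element. On return,
 * *offset is the start index and *count the number of elements owned
 * by process `rank`. */
void block_partition(hsize global_len, int nranks, int rank,
                     hsize *offset, hsize *count)
{
    assert(nranks > 0 && rank >= 0 && rank < nranks);
    hsize base  = global_len / (hsize)nranks;
    hsize extra = global_len % (hsize)nranks;
    hsize r     = (hsize)rank;

    *count  = base + (r < extra ? 1 : 0);
    /* Processes before this one hold `base` elements each, plus one
     * extra element for each of the first min(r, extra) processes. */
    *offset = r * base + (r < extra ? r : extra);
}
```

For a one-dimensional dataset, the resulting values would be stored in single-element arrays and passed as ‘offset’ and ‘count’ with a dataset rank of 1. The blocks produced this way are disjoint and together cover the whole global dataset.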
-
int FTI_UpdateGlobalDataset(int id, int rank, FTIT_hsize_t *dimLength)
Updates a global dataset (shared among application processes).
Only the rank and the number of elements for each coordinate direction are updated.
- Parameters
id
: ID of the dataset.
rank
: Rank of the dataset.
dimLength
: Dimension length for each rank.
-
int FTI_GetDatasetRank(int did)
Returns the rank of a shared dataset.
- Return
integer Rank of the dataset.
- Parameters
did
: ID of the dataset.
-
FTIT_hsize_t *FTI_GetDatasetSpan(int did, int rank)
Returns a static array of the dataset dimensions.
- Parameters
did
: ID of the dataset.
rank
: Rank of the dataset.
-
int FTI_RecoverDatasetDimension(int did)
Loads the dataset dimensions from the checkpoint file into dataset ‘did’.
- Parameters
did
: ID of the dataset.
-
int FTI_DefineDataset(int id, int rank, int *dimLength, char *name, FTIT_H5Group *h5group)
Defines the dataset.
This function gives FTI all information needed by HDF5 to correctly save the dataset in the checkpoint file.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID for searches and update.
rank
: Rank of the array.
dimLength
: Dimension length for each rank.
name
: Name of the dataset in the HDF5 file.
h5group
: Group of the dataset. If NULL then “/”.
-
int32_t FTI_GetStoredSize(int id)
Returns the size of a variable as saved in the metadata.
This function returns the size of the variable with the given ID as saved in the metadata. This may differ from the size of the variable in the program. If called during recovery, it returns the size from the metadata file; if called after a checkpoint, it returns the size saved in the temporary metadata. If no size is saved in the metadata, it returns 0.
- Return
int32_t Size of the variable, or 0 if no size is saved.
- Parameters
id
: Variable ID.
-
void *FTI_Realloc(int id, void *ptr)
Reallocates a dataset to the last checkpoint size.
This function loads the checkpoint data size from the metadata file, reallocates memory and updates the data size information.
- Return
ptr Pointer if successful, NULL otherwise.
- Parameters
id
: Variable ID.
ptr
: Pointer to the variable.
-
int FTI_BitFlip(int datasetID)
Bit-flip injection following the injection instructions.
This function injects the given number of bit-flips, at the given frequency and in the given location (rank, dataset, bit position).
- Return
integer FTI_SCES if successful.
- Parameters
datasetID
: ID of the dataset where to inject.
-
int FTI_Checkpoint(int id, int level)
Takes the checkpoint and triggers the post-checkpoint work.
This function starts by blocking on a receive if the previous checkpoint was offline. Then it updates the checkpoint information, writes the checkpoint data, creates the metadata and triggers the post-processing work. This function is complementary to the FTI_Listen function in terms of communications.
- Return
integer FTI_SCES if successful.
- Parameters
id
: Checkpoint ID.
level
: Checkpoint level.
-
int FTI_InitICP(int id, int level, bool activate)
Initializes an incremental checkpoint.
This function defines the environment for the incremental checkpointing (iCP) mechanism. The iCP mechanism consists of three functions: FTI_InitICP, FTI_AddVarICP and FTI_FinalizeICP. The two functions FTI_InitICP and FTI_FinalizeICP define the iCP region, within which the user may write the protected variables in any order. The iCP region is active when the expression passed through ‘activate’ evaluates to TRUE.
- Return
integer FTI_SCES if successful.
- Parameters
id
: Checkpoint ID.
level
: Checkpoint level.
activate
: Boolean expression.
- Note
This function is non-blocking for POSIX, FTI-FF and HDF5, but blocking for MPI-IO. This is due to the collective open call in MPI-IO.
-
int FTI_AddVarICP(int varID)
Writes a variable into the checkpoint file.
With this function, the user may write the protected datasets into the checkpoint file in any order. However, all protected variables must have been written into the file before the call to FTI_FinalizeICP.
- Return
integer FTI_SCES if successful.
- Parameters
varID
: Protected variable ID.
-
int FTI_FinalizeICP()
Finalizes an incremental checkpoint.
This function finalizes an incremental checkpoint. In contrast to FTI_InitICP, this function is collective on the communicator FTI_COMM_WORLD and blocking.
- Return
integer FTI_SCES if successful.
-
int FTI_Recover()
Loads the checkpoint data.
This function loads the checkpoint data from the checkpoint file and updates some basic checkpoint information.
- Return
integer FTI_SCES if successful.
-
int FTI_Snapshot()
Takes an FTI snapshot or recovers the data if it is a restart.
This function loads the checkpoint data from the checkpoint file in case of a restart. Otherwise, it checks whether the current iteration requires checkpointing. If it does, the function determines the checkpoint level, writes the data to the files and informs the head of the node that a checkpoint has been taken. The checkpoint ID and counters are updated.
- Return
integer FTI_SCES if successful.
-
int FTI_Finalize()
Closes FTI properly on the application processes.
This function notifies the FTI processes that the execution is over, frees some data structures and closes FTI. If this function is not called on the application processes, the FTI processes will never finish (deadlock).
- Return
integer FTI_SCES if successful.
-
int FTI_RecoverVar(int id)
Recovers the given variable.
- Return
integer FTI_SCES if successful.
- Parameters
id
: ID of the variable to be recovered.