colossalai.utils

colossalai.utils.checkpoint(function, *args)

Checkpoint the computation while preserving the RNG states; modified from PyTorch torch.utils.checkpoint

Parameters
  • function – the forward-pass function to checkpoint. It should know how to handle the inputs passed in as a tuple.

  • args – tuple containing inputs to the function

Returns

Output of running function on *args
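
Example: a minimal sketch of activation checkpointing inside a module's forward pass; the module, shapes, and data below are illustrative only.

    import torch
    import torch.nn as nn
    from colossalai.utils import checkpoint

    class Block(nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = nn.Linear(256, 256)

        def _inner(self, x):
            return torch.relu(self.linear(x))

        def forward(self, x):
            # activations of _inner are recomputed during backward instead of being stored
            return checkpoint(self._inner, x)

    x = torch.randn(8, 256, requires_grad=True)
    out = Block()(x)
    out.sum().backward()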

colossalai.utils.print_rank_0(msg, logger=None)

Print a message and optionally save it to a log. This is executed only on the rank-0 process.

Parameters
  • msg – A str message to output

  • logger – python logger object, defaults to None
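
Example: a minimal sketch, assuming the distributed environment has already been initialized (e.g. via colossalai.launch) so that the process rank is known.

    from colossalai.utils import print_rank_0

    print_rank_0("starting epoch 1")   # printed only by the rank-0 process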

colossalai.utils.sync_model_param_in_dp(model)

Make sure model parameters are consistent across processes in data-parallel mode

Parameters

model – a PyTorch nn.Module whose parameters are checked for consistency

colossalai.utils.clip_grad_norm_fp32(parameters, max_norm, norm_type=2)

Clips the gradient norm of an iterable of parameters whose gradients are in fp32.

This is adapted from torch.nn.utils.clip_grad.clip_grad_norm_, with added functionality to handle model-parallel parameters. Note that the gradients are modified in place.

Parameters
  • parameters (Iterable[Tensor] or Tensor) – an iterable of Tensors or a single Tensor that will have gradients normalized

  • max_norm (float or int) – max norm of the gradients

  • norm_type (float or int) – type of the used p-norm. Can be 'inf' for infinity norm.

Returns

Total norm of the parameters (viewed as a single vector).

Return type

float
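
Example: a minimal sketch showing the call pattern on a plain module with fp32 gradients; handling model-parallel parameters additionally requires an initialized parallel context, which is assumed to be set up elsewhere.

    import torch
    import torch.nn as nn
    from colossalai.utils import clip_grad_norm_fp32

    model = nn.Linear(128, 128)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

    loss = model(torch.randn(4, 128)).sum()
    loss.backward()

    # clip the global gradient norm to 1.0 before the optimizer step
    total_norm = clip_grad_norm_fp32(model.parameters(), max_norm=1.0)
    optimizer.step()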

colossalai.utils.get_current_device()

Returns the index of a currently selected device (gpu/cpu).

colossalai.utils.synchronize()

Similar to cuda.synchronize(). Waits for all kernels in all streams on a CUDA device to complete.

colossalai.utils.empty_cache()

Similar to cuda.empty_cache(). Releases all unoccupied cached memory currently held by the caching allocator.
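
Example: a short sketch combining the three device utilities above, assuming a CUDA device is available.

    import torch
    from colossalai.utils import get_current_device, synchronize, empty_cache

    device = get_current_device()              # currently selected device
    x = torch.randn(1024, 1024, device=device)
    y = x @ x                                  # CUDA kernels launch asynchronously
    synchronize()                              # wait for all kernels to finish
    del x, y
    empty_cache()                              # return cached blocks to the allocator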

colossalai.utils.set_to_cuda(models)

Send the model(s) to the GPU.

Parameters

models – an nn.Module or a list of nn.Module

colossalai.utils.report_memory_usage(message, logger=None, report_cpu=False)

Calculate and print memory usage (in GB)

Parameters
  • message – a prefix message to add to the log output

  • logger – python logger object, defaults to None

  • report_cpu (bool) – whether to report CPU memory usage as well, defaults to False

Raises

EnvironmentError – raise error if no distributed environment has been initialized
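
Example: a minimal sketch; a distributed environment must have been initialized beforehand (e.g. via colossalai.launch), otherwise the EnvironmentError above is raised.

    from colossalai.utils import report_memory_usage

    report_memory_usage("after forward pass")
    # report_memory_usage("after forward pass", logger=my_logger, report_cpu=True)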

class colossalai.utils.Timer

A timer object that helps to log execution times and provides different tools to assess them.

start()

First synchronize CUDA, reset the clock, and then start the timer.

stop(keep_in_history=False)

Stop the timer and record the start-stop time interval.

Parameters

keep_in_history (bool, optional) – whether to record this start-stop interval in the history, defaults to False

Returns

the start-stop interval

Return type

int

get_history_mean()

Return the mean of all start-stop time intervals in the history.

Returns

mean of time intervals

Return type

int

get_history_sum()

Add up all the start-stop time intervals in the history.

Returns

sum of time intervals

Return type

int

get_elapsed_time()

Return the last start-stop time interval. Use it only when the timer is not running.

Returns

the last time interval

Return type

int

reset()

Clear the timer and its history.
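
Example: a minimal sketch timing a repeated operation; the workload is illustrative and the intervals are assumed to be reported in seconds.

    import torch
    from colossalai.utils import Timer

    timer = Timer()
    for _ in range(3):
        timer.start()
        _ = torch.randn(1024, 1024) @ torch.randn(1024, 1024)
        interval = timer.stop(keep_in_history=True)   # this start-stop interval

    print(timer.get_history_mean())   # average interval over the 3 runs
    print(timer.get_history_sum())    # total time over the 3 runs
    timer.reset()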

class colossalai.utils.MultiTimer(on=True)

An object that contains multiple timers

Parameters

on (bool) – whether the timer is enabled. Default is True

start(name)

Start the timer specified by name.

Parameters

name (str) – the timer's key

stop(name, keep_in_history)

Stop the timer specified by name.

Parameters
  • name (str) – the timer's key

  • keep_in_history (bool) – whether to record this start-stop interval in the timer's history

get_timer(name)

Get a timer by its name.

Parameters

name (str) – the timer's key

Returns

the timer with the given name

Return type

Timer

reset(name=None)

Reset timers.

Parameters

name (str, optional) – if a name is given, only the named timer is reset and the others are left untouched; defaults to None
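
Example: a minimal sketch that times the forward and backward passes separately; the model and data are illustrative.

    import torch
    import torch.nn as nn
    from colossalai.utils import MultiTimer

    model = nn.Linear(64, 64)
    timers = MultiTimer(on=True)

    timers.start("forward")
    out = model(torch.randn(8, 64))
    timers.stop("forward", keep_in_history=True)

    timers.start("backward")
    out.sum().backward()
    timers.stop("backward", keep_in_history=True)

    fwd_timer = timers.get_timer("forward")
    print(fwd_timer.get_history_mean())
    timers.reset("forward")   # reset only the "forward" timer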

colossalai.utils.accumulate_gradient(model, optimizer, dataloader, accumulate_size, gradient_handlers=None, lr_scheduler=None)

Turn on gradient accumulation so that gradients are accumulated over accumulate_size steps before each parameter update.

Parameters
  • model (torch.nn.Module) – your model object

  • optimizer (torch.optim.Optimizer) – your optimizer object

  • dataloader (Iterable) – your dataloader object

  • accumulate_size (int) – the number of steps to accumulate gradients

  • gradient_handlers (List[colossalai.engine.BaseGradientHandler]) – list of gradient handler objects. Default is None

  • lr_scheduler (torch.optim.lr_scheduler._LRScheduler) – your lr scheduler object. Default is None
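
Example: a hedged sketch of enabling gradient accumulation; it assumes the call returns the wrapped model, optimizer, dataloader and lr_scheduler in that order, which should be verified against the library source.

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader, TensorDataset
    from colossalai.utils import accumulate_gradient

    model = nn.Linear(32, 10)
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    dataset = TensorDataset(torch.randn(64, 32), torch.randint(0, 10, (64,)))
    train_loader = DataLoader(dataset, batch_size=4)

    # accumulate gradients over 4 steps before each parameter update
    # (the return order below is an assumption, not stated in the documentation above)
    model, optimizer, train_loader, lr_scheduler = accumulate_gradient(
        model, optimizer, train_loader, accumulate_size=4
    )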

class colossalai.utils.DataParallelSampler(dataset, shuffle=False, seed=0, drop_last=False)

A data sampler for distributed data parallelism

Parameters
  • dataset (torch.utils.data.Dataset) – a Dataset instance

  • shuffle (bool, optional) – whether to shuffle data, defaults to False

  • seed (int, optional) – the random seed, defaults to 0

  • drop_last (bool, optional) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller, defaults to False

set_epoch(epoch)

Sets the epoch for this sampler. When shuffle=True, this ensures all replicas use a different random ordering for each epoch. Otherwise, the next iteration of this sampler will yield the same ordering.

Parameters

epoch (int) – Epoch number.
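
Example: a minimal sketch, assuming the distributed environment has been initialized so the sampler can query the data-parallel rank and world size; the dataset is illustrative.

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from colossalai.utils import DataParallelSampler

    dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))
    sampler = DataParallelSampler(dataset, shuffle=True, seed=0, drop_last=True)
    loader = DataLoader(dataset, batch_size=16, sampler=sampler)

    for epoch in range(3):
        sampler.set_epoch(epoch)   # give every epoch a different shuffling order
        for data, label in loader:
            pass                   # training step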

colossalai.utils.get_dataloader(dataset, shuffle=False, seed=1024, add_sampler=True, drop_last=False, pin_memory=False, num_workers=0, **kwargs)

Set up a deterministic dataloader (also configures worker seeding, the sampler, and whether to shuffle)

Parameters
  • dataset (torch.utils.data.Dataset) – the dataset to load

  • shuffle (bool, optional. Default is False) – whether to shuffle the dataset

  • seed (int, optional. Default is 1024) – random worker seed, defaults to 1024

  • add_sampler (bool, optional. Default is True) – whether to add a DataParallelSampler to the dataloader for distributed data parallel training

  • drop_last (bool, optional. Default is False) – drop the last incomplete batch of data

  • pin_memory (bool, optional. Default is False) – whether to pin memory address in CPU memory

  • num_workers (int, optional. Default is 0) – number of worker processes for this dataloader

Returns

an object of torch.utils.data.DataLoader

Return type

torch.utils.data.DataLoader
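
Example: a minimal sketch, assuming the distributed environment has been initialized (required when add_sampler=True) and that extra keyword arguments such as batch_size are forwarded to torch.utils.data.DataLoader.

    import torch
    from torch.utils.data import TensorDataset
    from colossalai.utils import get_dataloader

    dataset = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))
    train_loader = get_dataloader(dataset,
                                  shuffle=True,
                                  seed=1024,
                                  add_sampler=True,
                                  drop_last=True,
                                  pin_memory=True,
                                  num_workers=2,
                                  batch_size=16)

    for data, label in train_loader:
        pass   # training step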