Colossal-AI provides a collection of parallel training components. We aim to let you develop distributed deep learning models the same way you write single-GPU models, and Colossal-AI offers easy-to-use APIs to help you kickstart your training process. To better understand how Colossal-AI works, we recommend reading this documentation in the following order.
- If you are not familiar with distributed systems or have never used Colossal-AI, you should first jump into the Concepts section to get a sense of what we are trying to achieve. This section also provides some background knowledge on distributed training.
- Next, you can follow the basics tutorials. This section will cover the details about how to use Colossal-AI.
- Afterwards, you can try out the features provided in Colossal-AI by reading the features section. We provide a codebase for each tutorial. These tutorials cover the basic usage of Colossal-AI for simple tasks such as data parallelism and mixed precision training.
- Lastly, if you wish to apply more complicated techniques, such as running hybrid parallelism on GPT-3, the advanced tutorials section is the place to go!
We always welcome suggestions and discussions from the community, and we would be more than willing to help you if you encounter any issue. You can raise an issue here or create a discussion topic in the forum.