PavelNikolaichev/CNN_CUDA

CNN implementation on CUDA using C++

This is a deliberately simplified implementation of a CNN (Convolutional Neural Network) in C++, using CUDA for GPU acceleration. It is intended for educational purposes and is not tuned for performance or accuracy. Several features were omitted for simplicity, including proper data loading, a CLI, metrics, inference, and model persistence.

I tried to squeeze out as much performance as possible without resorting to complex optimizations, but my limited time forced some trade-offs. For example, the memory footprint is large because of an improper iterator for data loading and extra copying in the layers.

To simplify data handling, I used templates for most of the CUDA code, which should make it much easier to support different data types out of the box.
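As a rough illustration of the template idea, here is a minimal sketch of a type-generic activation. The names `relu`, `relu_forward`, and `HOST_DEVICE` are hypothetical and not taken from the repository; the macro only lets the same helper compile both as plain C++ and under nvcc, where a kernel could call `relu<T>` once per thread.

```cpp
#include <cstddef>
#include <vector>

// Under nvcc, make the helper callable from both host and device code;
// in a plain C++ build the macro expands to nothing.
#ifdef __CUDACC__
#define HOST_DEVICE __host__ __device__
#else
#define HOST_DEVICE
#endif

// Hypothetical helper (not from the repository): ReLU for any arithmetic type T.
template <typename T>
HOST_DEVICE inline T relu(T x) {
    return x > T(0) ? x : T(0);
}

// Host-side reference loop over a buffer. A CUDA kernel would instead
// apply relu<T> to one element per thread.
template <typename T>
std::vector<T> relu_forward(const std::vector<T>& input) {
    std::vector<T> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = relu(input[i]);
    }
    return output;
}
```

The same function works for `float`, `double`, or `int` buffers without any per-type duplication, which is the benefit the templates are meant to provide.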

Installation and Running

To compile and run the code, you need to have a CUDA-capable GPU and the CUDA toolkit installed. You can follow these steps:

  1. Clone the repo. Modify main.cu to change hyperparameters or dataset paths if needed (note that the file names I use for the unpacked MNIST data may differ slightly from yours).
  2. Compile using CMake (I recommend opening the project in CLion, since it handles most of the setup automatically; make sure to select a compiler toolchain that includes CUDA):
    cmake --build ./cmake-build-<build-type> --target CNN_CUDA -j $(nproc)
    You can most likely use nvcc directly instead of CMake, but I haven't tested that.
  3. Run the executable:
     ./cnn_cuda

Ensure the MNIST dataset files are in the same directory as the executable, or modify the paths in the code accordingly.
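If you are unsure whether a file path points at a valid unpacked MNIST image file, a quick header check can help. The sketch below is not part of the repository; `IdxImageHeader`, `read_be32`, and `read_idx_image_header` are hypothetical names. It relies only on the IDX layout used by MNIST image files: four big-endian 32-bit integers (magic number 0x00000803, image count, rows, columns).

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Header of an MNIST IDX image file: four big-endian 32-bit integers.
struct IdxImageHeader {
    std::uint32_t magic;  // 0x00000803 for image files
    std::uint32_t count;  // number of images
    std::uint32_t rows;   // image height in pixels
    std::uint32_t cols;   // image width in pixels
};

// Decode a big-endian 32-bit integer from four raw bytes.
inline std::uint32_t read_be32(const unsigned char* b) {
    return (std::uint32_t(b[0]) << 24) | (std::uint32_t(b[1]) << 16) |
           (std::uint32_t(b[2]) << 8) | std::uint32_t(b[3]);
}

// Hypothetical helper: read and validate the 16-byte header at `path`.
inline IdxImageHeader read_idx_image_header(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    unsigned char raw[16];
    if (!in.read(reinterpret_cast<char*>(raw), sizeof raw)) {
        throw std::runtime_error("truncated IDX header in " + path);
    }
    IdxImageHeader h{read_be32(raw), read_be32(raw + 4),
                     read_be32(raw + 8), read_be32(raw + 12)};
    if (h.magic != 0x00000803u) {
        throw std::runtime_error("not an IDX image file: " + path);
    }
    return h;
}
```

Calling `read_idx_image_header` with whatever path main.cu uses should either return the image count and dimensions or throw with a message that names the offending path.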

I haven't added CLI arguments or proper inference because my goal was to implement the core CNN in CUDA as a small challenge, and I don't think they matter much for this demo. The same goes for block_size and grid_size: they are hardcoded in most places, which is a candidate for future improvement.
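One possible direction for that improvement is to derive grid_size from the problem size instead of hardcoding it. The helper below is a minimal sketch, not the repository's code; `grid_size_for` is a hypothetical name, and it uses the usual ceiling division so that grid_size * block_size covers every element.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of blocks needed so that
// grid_size * block_size >= n (ceiling division).
inline std::size_t grid_size_for(std::size_t n, std::size_t block_size) {
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}
```

For 1000 elements and a block_size of 256 this gives 4 blocks. Because the last block may be only partly used, a kernel launched this way still needs a bounds check on its element index.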
