In this project, David, Achraf, and Hector trained several deep learning models for English-to-Spanish machine translation. Our models differ either in their architecture or in the embeddings used to feed data to them. The variants are:
- LSTM (Character-Level; Word-Level with Manual (Trainable Layer), BERT (Frozen), or Word2Vec (Frozen) embeddings)
- GRU (With and Without Attention)
The main file is a small terminal dashboard that makes it easier to choose which architecture to train. More details about the setup are in the next part.
git clone https://github.com/ML-DL-Teaching/deep-learning-project-2025-dl_team_16.git
cd deep-learning-project-2025-dl_team_16

Project structure:
├── main.py # Interactive CLI to run any model
├── environment.yml # Conda dependencies
├── data/ # Input data and processing scripts
│   ├── process_data.py # Preprocessing script
│   ├── spa.txt # Raw input text
│   └── processed.txt # Preprocessed output (generated)
├── EMBEDDINGS/ # Word/character embedding modules
│   ├── CARACTERLEVEL/
│   └── WORDLEVEL/
│       ├── BERT/
│       ├── MANUAL/
│       └── WORD2VEC/
├── MODELS/ # Deep learning model scripts
│   ├── LSTM/
│   │   ├── CARACTER LEVEL/
│   │   └── WORDLEVEL/
│   └── GRU/
│       ├── ATTENTION/
│       └── NO ATTENTION/

conda env create -f environment.yml
conda activate your-env-name

You also need to install PyTorch with CUDA support in the environment (CUDA 12.1 in our case):
pip install torch==2.4.0+cu121 torchaudio==2.4.0+cu121 torchvision==0.19.0+cu121 \
  --index-url https://download.pytorch.org/whl/cu121

- Download Required Resources
  If you're using Word2Vec, download GoogleNews-vectors-negative300.bin.gz and place it in the data/ directory.
- spa.txt is already in the data/ folder of the GitHub repo.
- Run the Data Preprocessing Script
  python data/process_data.py
  This will generate processed.txt for training.
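
Before launching the dashboard, the setup steps above can be sanity-checked with a short script. The sketch below is not part of the repository: the function name `preflight` and its parameters are hypothetical, and it only checks the file names mentioned in this README plus whether PyTorch can see a CUDA device.

```python
import importlib.util
from pathlib import Path


def preflight(repo_root=".", use_word2vec=False):
    """Return a list of problems found; an empty list means the setup looks ready.

    Hypothetical helper: it only checks files named in this README.
    """
    data_dir = Path(repo_root) / "data"
    problems = []

    if not (data_dir / "spa.txt").is_file():
        problems.append("data/spa.txt is missing")
    if not (data_dir / "processed.txt").is_file():
        problems.append("data/processed.txt is missing; run python data/process_data.py")
    if use_word2vec and not (data_dir / "GoogleNews-vectors-negative300.bin.gz").is_file():
        problems.append("data/GoogleNews-vectors-negative300.bin.gz is missing")

    # Import torch only if it is installed, so the file checks still run without it.
    if importlib.util.find_spec("torch") is None:
        problems.append("torch is not installed")
    else:
        import torch
        if not torch.cuda.is_available():
            problems.append("torch is installed but CUDA is not available")

    return problems
```

Calling `preflight(use_word2vec=True)` from the repository root and printing the returned list gives a quick overview of what is still missing before training.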

python main.py
Follow the interactive prompts to choose and run the model you want.
You will be prompted to select:
- Model Type
  - LSTM
  - GRU
- Subtype (for LSTM)
  - Character-Level
  - Word-Level
- Embedding Strategy (for Word-Level)
  - Manual
  - BERT
  - Word2Vec
After your selections, the corresponding training script will automatically be executed.
📝 Example run:
Select Model Type:
1. LSTM
2. GRU
> 1
LSTM selected. Choose level:
1. Character Level
2. Word Level
> 2
Choose embedding type:
1. Manual
2. BERT
3. Word2Vec
> 1
Running: MODELS/LSTM/WORDLEVEL/MANUAL/word_level_manual_lstm.py
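
To illustrate how such a menu can map selections to a training script, here is a minimal sketch of a dispatcher. It is an assumption about the general approach, not the actual contents of main.py: the names `SCRIPTS`, `choose`, `select`, and `main` are hypothetical, only the script path shown in the example run is registered, and the GRU branch stops after the first menu because its prompts are not described here.

```python
import subprocess
import sys

# Only this path is confirmed by the example run; other variants would be
# registered the same way (their script paths are not listed in this README).
SCRIPTS = {
    ("LSTM", "Word Level", "Manual"):
        "MODELS/LSTM/WORDLEVEL/MANUAL/word_level_manual_lstm.py",
}


def choose(title, options, read=input):
    """Print a numbered menu and return the option the user picks."""
    print(title)
    valid = {str(i): name for i, name in enumerate(options, start=1)}
    for number, name in valid.items():
        print(f"{number}. {name}")
    while True:
        answer = read("> ").strip()
        if answer in valid:
            return valid[answer]
        print("Please enter one of the listed numbers.")


def select(read=input):
    """Walk the menus and return the selection as a tuple."""
    model = choose("Select Model Type:", ["LSTM", "GRU"], read)
    if model != "LSTM":
        return (model,)
    level = choose("LSTM selected. Choose level:",
                   ["Character Level", "Word Level"], read)
    if level != "Word Level":
        return (model, level)
    embedding = choose("Choose embedding type:",
                       ["Manual", "BERT", "Word2Vec"], read)
    return (model, level, embedding)


def main():
    selection = select()
    script = SCRIPTS.get(selection)
    if script is None:
        print(f"No script registered for {selection} in this sketch.")
        return
    print(f"Running: {script}")
    # Run the training script with the same Python interpreter as the CLI.
    subprocess.run([sys.executable, script], check=True)
```

Calling `main()` starts the prompts. Keeping the selection-to-path mapping in one dictionary means a new model variant only needs one extra entry, and passing `read` as a parameter lets the menu logic be exercised without a terminal.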