Encoder-Decoder Model with Attention

This repository implements both the LSTM Encoder-Decoder model and the LSTM Encoder-Decoder model with attention. The models are trained on the Multi30k en-de dataset. Their implementations are based on the following papers: the LSTM Encoder-Decoder model is based on this paper, and the LSTM Encoder-Decoder model with attention is based on this paper.

Requirements

Install the required packages (datasets, torch, matplotlib, nltk):

```
pip install -r requirements.txt
```

Additionally, you can use your GPU for training via CUDA. Note that the installed CUDA version must be compatible with the installed torch version.
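A quick way to check whether torch can see a CUDA device and which CUDA version it was built against (this snippet is a generic check, not part of the repository):

```python
import torch

# Confirm that a CUDA device is visible and report the CUDA build of torch.
print("CUDA available:", torch.cuda.is_available())
print("torch CUDA build:", torch.version.cuda)
if torch.cuda.is_available():
    print("Device:", torch.cuda.get_device_name(0))
```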

Datasets

This repository provides implementations for two datasets: WMT14 fr-en and Multi30k en-de. Data preprocessing can be performed using the data.py script. You can adjust the settings by modifying the variables dataset, src_lang, tgt_lang, and reverse_flag. The dataset variable accepts either "wmt14" or "multi30k". For the src_lang and tgt_lang pairs, the WMT14 dataset supports "en" and "fr" (or vice versa), while the Multi30k dataset supports "en" and "de" (or vice versa). When reverse_flag is set to True, the input sequence is reversed during preprocessing. Additionally, due to the large size of the WMT14 dataset, you can pass a percent parameter (a value between 0.01 and 1) as the second argument to the load_data function to download only a fraction of the dataset.
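For illustration, the percent-based subsetting corresponds naturally to the split-slicing syntax of the datasets library. The sketch below loads 1% of WMT14 fr-en directly through datasets; it is independent of this repository's load_data function and is only meant to show the idea:

```python
from datasets import load_dataset

# Download only the first 1% of the WMT14 fr-en training split.
# The "train[:1%]" slicing syntax is handled by the datasets library itself.
wmt14_small = load_dataset("wmt14", "fr-en", split="train[:1%]")

# Each example is a translation pair, e.g. {'en': '...', 'fr': '...'}.
print(wmt14_small[0]["translation"])
```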

|                    | WMT14 | WMT14 (1%) | Multi30k |
|--------------------|-------|------------|----------|
| # of sentences     | 36M   | 326693     | 29000    |
| Vocab size (en)    | 380M  | 108130     | 10475    |
| Vocab size (fr/de) | 400M  | 141797     | 19338    |

Encoder-Decoder Model

Encoder-Decoder Model diagram
The Encoder-Decoder model is implemented based on the paper. You can train the model using the train.py script. The training configuration can be adjusted by modifying the following variables:

- dataset: pkl_file, src_lang, tgt_lang
- model parameters: max_target_length, hidden_size, num_layers, batch_size
- training parameters: learning_rate, teacher_forcing_ratio, num_iters, clip_value
- debugging parameters: print_every, num_samples

At the end of training, a random sample of size num_samples is drawn from the validation dataset to evaluate the translation performance. The BLEU score is also computed, and the training results are visualized with a loss curve and TensorBoard.
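The actual training loop lives in train.py. The following is a minimal, self-contained sketch of how a single optimization step using hyperparameters such as teacher_forcing_ratio and clip_value could look; the Encoder and Decoder classes here are hypothetical stand-ins, not the repository's implementations.

```python
import random
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Hypothetical minimal LSTM encoder (not the repository's class)."""
    def __init__(self, vocab_size, hidden_size, num_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)

    def forward(self, src):                       # src: (batch, src_len)
        outputs, state = self.lstm(self.embed(src))
        return outputs, state                     # final state seeds the decoder

class Decoder(nn.Module):
    """Hypothetical minimal LSTM decoder (not the repository's class)."""
    def __init__(self, vocab_size, hidden_size, num_layers):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.lstm = nn.LSTM(hidden_size, hidden_size, num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, token, state):              # token: (batch, 1)
        output, state = self.lstm(self.embed(token), state)
        return self.out(output), state            # logits: (batch, 1, vocab)

def train_step(encoder, decoder, optimizer, src, tgt,
               teacher_forcing_ratio=0.5, clip_value=1.0):
    """One optimization step with per-step teacher forcing and gradient clipping."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    _, state = encoder(src)
    token = tgt[:, :1]                            # assume position 0 is the <sos> token
    loss = 0.0
    for t in range(1, tgt.size(1)):
        logits, state = decoder(token, state)
        loss = loss + criterion(logits.squeeze(1), tgt[:, t])
        # With probability teacher_forcing_ratio, feed the ground-truth token;
        # otherwise feed the model's own prediction.
        use_teacher = random.random() < teacher_forcing_ratio
        token = tgt[:, t:t + 1] if use_teacher else logits.argmax(-1)
    loss.backward()
    torch.nn.utils.clip_grad_norm_(
        list(encoder.parameters()) + list(decoder.parameters()), clip_value)
    optimizer.step()
    return loss.item() / (tgt.size(1) - 1)
```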
Encoder-Decoder loss curve

BLEU Score: 23.97
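The evaluation code is not shown here; since nltk is among the requirements, a BLEU score of this kind can be computed with nltk's corpus_bleu. The snippet below is a generic example with made-up sentences, not the repository's evaluation code:

```python
from nltk.translate.bleu_score import corpus_bleu, SmoothingFunction

# Hypothetical example data: each hypothesis is a list of tokens, and each
# entry in `references` is a list of one or more tokenized reference sentences.
hypotheses = [["a", "man", "rides", "a", "bike"]]
references = [[["a", "man", "is", "riding", "a", "bicycle"]]]

# Smoothing avoids zero scores when some higher-order n-grams never match.
score = corpus_bleu(references, hypotheses,
                    smoothing_function=SmoothingFunction().method1)
print(f"BLEU: {score * 100:.2f}")
```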

Encoder-Decoder Model with Attention

Encoder-Decoder Model with Attention diagram
The Encoder-Decoder Model with attention is implemented based on the paper. You can train the model using the train_attention_model.py script. The training configuration is similar to that of the previous Encoder-Decoder model, where you can modify the same variables. Additionally, you can adjust the dropout rate via the dropout_p variable. The process after training completion remains the same.
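For reference, a Bahdanau-style additive attention module can be sketched as follows. This is a generic illustration of the mechanism, not the repository's exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdditiveAttention(nn.Module):
    """Additive (Bahdanau-style) attention over encoder outputs."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W_query = nn.Linear(hidden_size, hidden_size, bias=False)
        self.W_keys = nn.Linear(hidden_size, hidden_size, bias=False)
        self.v = nn.Linear(hidden_size, 1, bias=False)

    def forward(self, query, keys):
        # query: (batch, hidden)          previous decoder hidden state
        # keys:  (batch, src_len, hidden) encoder outputs
        scores = self.v(torch.tanh(self.W_query(query).unsqueeze(1) + self.W_keys(keys)))
        weights = F.softmax(scores.squeeze(-1), dim=-1)   # (batch, src_len)
        context = torch.bmm(weights.unsqueeze(1), keys)   # (batch, 1, hidden)
        return context, weights

# Shape check with random tensors (illustrative sizes).
attn = AdditiveAttention(hidden_size=256)
q = torch.randn(4, 256)           # decoder state for a batch of 4
k = torch.randn(4, 10, 256)       # encoder outputs over 10 source tokens
context, weights = attn(q, k)     # context: (4, 1, 256), weights: (4, 10)
```

The context vector is concatenated with the decoder input at each step, which is what lets the model attend to different source positions instead of relying on a single fixed-length encoding.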
Attention Encoder-Decoder loss curve

BLEU Score: 24.88
