This repository was archived by the owner on Jul 7, 2023. It is now read-only.

Refine automatic mixed precision support via hyper param #1681

Merged
afrozenator merged 7 commits into tensorflow:master from vinhngx:v1.14.0-AMP-hparams
Aug 30, 2019


@vinhngx commented Aug 28, 2019
Contributor

This continues #1637 and responds to @afrozenator's comments in #1680.

In this PR, we reorganize automatic mixed precision training support to provide a cleaner implementation and an easier interface based on hyperparameters.

In particular, GPU automatic mixed precision training can now be enabled for all tensor2tensor models by setting the flag gpu_automatic_mixed_precision, which turns on the hyperparameter of the same name. For example:

Transformer:

PROBLEM=translate_ende_wmt32k
MODEL=transformer
HPARAMS=transformer_big
DATA_DIR=/data/translate_ende_wmt32k
TRAIN_DIR=/tmp/$MODEL-$HPARAMS

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --train_steps=100000 \
  --eval_steps=1000 \
  --gpu_automatic_mixed_precision=True

ResNet:

PROBLEM=image_imagenet224
MODEL=resnet
HPARAMS=resnet_50
DATA_DIR=/data/ImageNet
TRAIN_DIR=/tmp/$HPARAMS

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problem=$PROBLEM \
  --model=$MODEL \
  --hparams_set=$HPARAMS \
  --output_dir=$TRAIN_DIR \
  --hparams='batch_size=256' \
  --worker_gpu=8 \
  --gpu_automatic_mixed_precision=True

This replaces the previous approaches: setting the environment variable TF_ENABLE_AUTO_MIXED_PRECISION, which is a non-programmatic approach, or passing gpu_auto_mixed_precision directly to the optimizer, which requires modifying each model to call the optimizer with the mixed precision option.
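For comparison, the pre-PR gating logic can be sketched in plain Python. The helper name legacy_amp_enabled and its environ parameter are hypothetical; the boolean condition mirrors the old check visible in the tensor2tensor/utils/optimize.py review thread, where AMP was enabled either by an optimizer argument or by the environment variable.

```python
import os
from typing import Mapping


def legacy_amp_enabled(gpu_auto_mixed_precision: bool,
                       environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper mirroring the pre-PR check.

    AMP was turned on either by an explicit optimizer argument (which each
    model had to pass through) or by the process-wide environment variable
    TF_ENABLE_AUTO_MIXED_PRECISION set to "1".
    """
    return gpu_auto_mixed_precision or environ.get(
        "TF_ENABLE_AUTO_MIXED_PRECISION", "0") == "1"


print(legacy_amp_enabled(False, {"TF_ENABLE_AUTO_MIXED_PRECISION": "1"}))  # True
print(legacy_amp_enabled(False, {}))  # False
```

Passing environ explicitly only keeps the sketch testable; the original check read os.environ directly.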

@googlebot added the cla: yes (PR author has signed CLA) label Aug 28, 2019
Review comment on tensor2tensor/utils/optimize.py (outdated):
 opt = tf.contrib.tpu.CrossShardOptimizer(opt)
-if gpu_auto_mixed_precision or os.environ.get(
-    "TF_ENABLE_AUTO_MIXED_PRECISION", "0") == "1":
+if hparams.gpu_automatic_mixed_precision:

Contributor


getattr(hparams, "gpu_automatic_mixed_precision", False) is preferable, since people may pass hparams that don't define this parameter, for example in tests.
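The reviewer's point can be illustrated with a stdlib-only sketch. Here types.SimpleNamespace stands in for a tensor2tensor hparams object (an assumption for illustration), and amp_enabled is a hypothetical helper: direct attribute access raises AttributeError when an hparams set was built without the field, while getattr with a default of False degrades gracefully.

```python
from types import SimpleNamespace


def amp_enabled(hparams) -> bool:
    """Hypothetical helper: read the AMP hparam, defaulting to False.

    getattr with a default avoids an AttributeError for hparams objects
    that never defined gpu_automatic_mixed_precision (e.g. in tests).
    """
    return bool(getattr(hparams, "gpu_automatic_mixed_precision", False))


full = SimpleNamespace(batch_size=256, gpu_automatic_mixed_precision=True)
minimal = SimpleNamespace(batch_size=256)  # no AMP field, as in some tests

print(amp_enabled(full))     # True
print(amp_enabled(minimal))  # False

try:
    _ = minimal.gpu_automatic_mixed_precision  # direct access fails
except AttributeError as err:
    print("direct access:", type(err).__name__)  # direct access: AttributeError
```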

Contributor Author


good one. I fixed this

-memory_height=1
+memory_height=1,
+# Whether to use GPU automatic mixed precision (via graph rewrite)
+gpu_automatic_mixed_precision=False

Contributor


This is good, but as in your earlier PR, can you set this to True based on a flag?

I.e., after we create the hparams in t2t_trainer, flip this on based on your flag.
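The requested flow (build the hparams first, then flip the AMP field on if the flag is set) can be sketched as follows. This is a hypothetical, stdlib-only illustration: argparse stands in for the trainer's flag library, and make_hparams and create_hparams_with_flags are invented names, not tensor2tensor functions. The default of False mirrors the hparam added in this review.

```python
import argparse
from types import SimpleNamespace


def make_hparams() -> SimpleNamespace:
    """Hypothetical stand-in for building a registered hparams set.

    AMP defaults to off, matching the new hparam default in this review.
    """
    return SimpleNamespace(memory_height=1, gpu_automatic_mixed_precision=False)


def parse_flags(argv):
    # argparse stands in for the trainer's flag library (an assumption).
    parser = argparse.ArgumentParser()
    parser.add_argument("--gpu_automatic_mixed_precision",
                        type=lambda s: s.lower() == "true", default=False)
    return parser.parse_args(argv)


def create_hparams_with_flags(argv) -> SimpleNamespace:
    """Build hparams first, then flip AMP on if the flag requests it."""
    flags = parse_flags(argv)
    hparams = make_hparams()
    if flags.gpu_automatic_mixed_precision:
        hparams.gpu_automatic_mixed_precision = True
    return hparams


print(create_hparams_with_flags(["--gpu_automatic_mixed_precision=True"])
      .gpu_automatic_mixed_precision)  # True
print(create_hparams_with_flags([]).gpu_automatic_mixed_precision)  # False
```

In this sketch the flag only ever turns the hparam on, so an hparams set that already enables AMP is not switched off by the flag's default of False.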

Contributor Author


I'll add the flag back to the trainer and turn the hparam on accordingly.

Contributor Author


done

@afrozenator left a comment

Contributor


Left a few comments; thanks for the changes.

@vinhngx commented Aug 30, 2019
Contributor Author

Thanks for the feedback, @afrozenator. Let me know if the latest revision works.

@afrozenator

Contributor

Thanks a lot @vinhngx for contributing this in the first place and now making it better!

Will merge it in shortly.

@vinhngx commented Aug 30, 2019
Contributor Author

Great, thanks. I'm closing #1680 then.

@afrozenator afrozenator merged commit d973bc8 into tensorflow:master Aug 30, 2019
tensorflow-copybara pushed a commit that referenced this pull request Aug 30, 2019
PiperOrigin-RevId: 266390503


3 participants