
@eracah commented Aug 17, 2022

This PR:

  • removes log_data from LoggerDestination and its subclasses
  • adds log_metrics and log_hyperparameters to LoggerDestination and to FileLogger, InMemoryLogger, ProgressBarLogger, WandBLogger, and TensorboardLogger
  • adds log_traces to FileLogger
  • removes data, data_fit, data_epoch, and data_batch from Logger
  • adds log_traces, log_metrics, and log_hyperparameters to Logger
  • replaces all log_data calls with log_metrics, log_hyperparameters, or log_traces calls
  • replaces all data_fit calls with log_hyperparameters
  • replaces all data_epoch and data_batch calls with log_metrics
  • removes all LogLevel usage except in log_artifacts
  • along for the ride:
    • moved the train metrics before the next_batch call so that train metrics are logged against the correct batch
    • reordered the pre-commit hooks so that pyright runs last
    • configured isort to fix the wandb import bug
  • reformats how things are logged. Examples below
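The API reshaping in the list above can be sketched roughly as follows. This is a minimal, self-contained illustration that borrows only the method names from this PR (log_hyperparameters, log_metrics, log_traces); the signatures, the optional step argument, the fan-out in Logger, and the RecordingDestination class are assumptions for illustration, not Composer's actual code.

```python
from typing import Any, Dict, List, Optional, Tuple


class LoggerDestination:
    """Base destination: every hook is a no-op, so subclasses override only what they use."""

    def log_hyperparameters(self, hyperparameters: Dict[str, Any]) -> None:
        pass

    def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
        pass

    def log_traces(self, traces: Dict[str, Any]) -> None:
        pass


class RecordingDestination(LoggerDestination):
    """Hypothetical destination that keeps hyperparameters and metrics in memory."""

    def __init__(self) -> None:
        self.hyperparameters: Dict[str, Any] = {}
        self.metrics: List[Tuple[Optional[int], str, Any]] = []

    def log_hyperparameters(self, hyperparameters: Dict[str, Any]) -> None:
        self.hyperparameters.update(hyperparameters)

    def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
        for key, value in metrics.items():
            self.metrics.append((step, key, value))

    # log_traces is inherited as a no-op, so traces never end up stored as metrics here.


class Logger:
    """Forwards each call to every destination; no LogLevel argument is involved."""

    def __init__(self, destinations: List[LoggerDestination]) -> None:
        self.destinations = destinations

    def log_hyperparameters(self, hyperparameters: Dict[str, Any]) -> None:
        for destination in self.destinations:
            destination.log_hyperparameters(hyperparameters)

    def log_metrics(self, metrics: Dict[str, Any], step: Optional[int] = None) -> None:
        for destination in self.destinations:
            destination.log_metrics(metrics, step)

    def log_traces(self, traces: Dict[str, Any]) -> None:
        for destination in self.destinations:
            destination.log_traces(traces)


# Migration: data_fit(...) -> log_hyperparameters(...); data_epoch/data_batch(...) -> log_metrics(...)
destination = RecordingDestination()
logger = Logger([destination])
logger.log_hyperparameters({"blurpool/num_blurpool_layers": 0})
logger.log_metrics({"trainer/epoch": 0}, step=0)
logger.log_traces({"example/trace": True})  # hypothetical trace key; ignored by this destination
```

One plausible benefit of separate hooks, consistent with the discussion below about empty algorithm charts: a destination that has no use for traces simply does not override log_traces, instead of receiving them as metrics.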

Reformatting Examples

FileLogger

  • FileLogger before:
[FIT][batch=0]: { "blurpool/num_blurpool_layers": 0, "blurpool/num_blurconv_layers": 0, }
[EPOCH][batch=0]: { "epoch": 0, }
[EPOCH][batch=469]: { "metrics/train/Accuracy": 0.8292, }
  • FileLogger after:
[hyperparameter]: blurpool/num_blurpool_layers: 0 
[hyperparameter]: blurpool/num_blurconv_layers: 0
[metric][batch=0]: trainer/epoch: 0 
[metric][batch=469]: metrics/train/Accuracy: 0.9453 
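A rough sketch of how the FileLogger "after" lines above could be produced: one line per key, tagged either [hyperparameter] (no batch) or [metric][batch=N]. The helper names format_hyperparameters and format_metrics are hypothetical; this only reproduces the output shape and is not FileLogger's implementation.

```python
from typing import Any, Dict, List


def format_hyperparameters(hyperparameters: Dict[str, Any]) -> List[str]:
    # Hyperparameters are not tied to a batch, so they get no [batch=...] tag.
    return [f"[hyperparameter]: {key}: {value}" for key, value in hyperparameters.items()]


def format_metrics(metrics: Dict[str, Any], batch: int) -> List[str]:
    # One line per metric, tagged with the batch at which it was logged.
    return [f"[metric][batch={batch}]: {key}: {value}" for key, value in metrics.items()]


lines = format_hyperparameters(
    {"blurpool/num_blurpool_layers": 0, "blurpool/num_blurconv_layers": 0}
)
lines += format_metrics({"trainer/epoch": 0}, batch=0)
lines += format_metrics({"metrics/train/Accuracy": 0.9453}, batch=469)
print("\n".join(lines))
```

Emitting one key per line, rather than a dict per event as in the "before" output, makes the file easy to grep for a single metric.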

ProgressBarLogger

  • ProgressBarLogger log to console before:
[FIT]: { "blurpool/num_blurpool_layers": 0, "blurpool/num_blurconv_layers": 0, }
[EPOCH][batch=0/512]: { "epoch": 0, }
[EPOCH][batch=469/512]: { "metrics/train/Accuracy": 0.8292, }
  • ProgressBarLogger log to console after:
[hyperparameter]: blurpool/num_blurpool_layers: 0
[hyperparameter]: blurpool/num_blurconv_layers: 0
[batch=0/512]: trainer/epoch: 0
[batch=469/512]: metrics/train/Accuracy: 0.9609
  • ProgressBarLogger locally before:
eval           Batch    128:  100%|█████████████████████████| 157/157 [00:01<00:00, 132.27ba/s, metrics/eval/Accuracy=0.8830]                 
eval           Batch    256:  100%|█████████████████████████| 157/157 [00:01<00:00, 127.26ba/s, metrics/eval/Accuracy=0.9390]                 
eval           Batch    384:  100%|█████████████████████████| 157/157 [00:01<00:00, 130.67ba/s, metrics/eval/Accuracy=0.9522]                 
eval           Batch    512:  100%|█████████████████████████| 157/157 [00:00<00:00, 161.95ba/s, metrics/eval/Accuracy=0.9571]                 
train                         100%|█████████████████████████| 512/512 [00:23<00:00, 35.27ba/s, loss/train=0.1819]
  • ProgressBarLogger locally after (no change?):
eval           Batch    128:  100%|█████████████████████████| 157/157 [00:01<00:00, 125.96ba/s, metrics/eval/Accuracy=0.9083]         
eval           Batch    256:  100%|█████████████████████████| 157/157 [00:01<00:00, 125.61ba/s, metrics/eval/Accuracy=0.9239]         
eval           Batch    384:  100%|█████████████████████████| 157/157 [00:01<00:00, 120.01ba/s, metrics/eval/Accuracy=0.9494]         
eval           Batch    512:  100%|█████████████████████████| 157/157 [00:01<00:00, 128.42ba/s, metrics/eval/Accuracy=0.9321]         
train                         100%|█████████████████████████| 512/512 [00:23<00:00, 31.37ba/s, loss/train=0.9353] 

ProgressBarLogger in a notebook before:
[Screenshot omitted: Screen Shot 2022-08-17 at 2 16 15 PM]

ProgressBarLogger in a notebook after (no change?):
[Screenshot omitted: Screen Shot 2022-08-17 at 2 17 36 PM]

ProgressBarLogger from remote logs:

  3% 1/29 [00:00<00:20,  1.39ba/s, loss/train=0.4214]
  3% 1/29 [00:01<00:20,  1.39ba/s, loss/train=0.3871]
  7% 2/29 [00:01<00:18,  1.45ba/s, loss/train=0.3871]
  7% 2/29 [00:02<00:18,  1.45ba/s, loss/train=0.3705]
 10% 3/29 [00:02<00:17,  1.49ba/s, loss/train=0.3705]
 10% 3/29 [00:02<00:17,  1.49ba/s, loss/train=0.3852]
 14% 4/29 [00:02<00:16,  1.48ba/s, loss/train=0.3852]
 14% 4/29 [00:03<00:16,  1.48ba/s, loss/train=0.3254]

ProgressBarLogger in remote interactive session:
[Screenshot omitted: Screen Shot 2022-08-23 at 12 07 08 PM]

ProgressBarLogger log_to_console remote:

[epoch=2][batch=27/29]: loss/train: 0.1747
[epoch=2][batch=28/29]: trainer/global_step: 86
[epoch=2][batch=28/29]: trainer/batch_idx: 28
[epoch=2][batch=28/29]: trainer/grad_accum: 1
[epoch=2][batch=28/29]: loss/train: 0.1650
[epoch=3][batch=0/10]: epoch: 3
[epoch=3][batch=0/10]: trainer/global_step: 87
[epoch=3][batch=0/10]: metrics/eval/CrossEntropy: 0.1488
[epoch=3][batch=0/10]: metrics/eval/Accuracy: 0.9547
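The remote log_to_console lines above combine an [epoch=E][batch=i/N] prefix with per-type value formatting: counters such as trainer/global_step print as plain integers, while losses and accuracies print with four decimals. A small sketch that reproduces that shape follows; the function names are hypothetical, and the four-decimal rule is inferred from the sample output rather than taken from ProgressBarLogger's code.

```python
from numbers import Integral
from typing import Any


def format_value(value: Any) -> str:
    # Counters such as trainer/global_step are integers and print exactly;
    # floats (losses, accuracies) are rounded to 4 decimal places.
    if isinstance(value, Integral):
        return str(value)
    if isinstance(value, float):
        return f"{value:.4f}"
    return str(value)


def console_line(epoch: int, batch_idx: int, batches_per_epoch: int, key: str, value: Any) -> str:
    # Prefix shows the epoch and the batch position within that epoch.
    return f"[epoch={epoch}][batch={batch_idx}/{batches_per_epoch}]: {key}: {format_value(value)}"


print(console_line(2, 28, 29, "trainer/global_step", 86))
print(console_line(2, 28, 29, "loss/train", 0.16501))
```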

Solves JIRA Issues:
CO-585
CO-916
CO-207
CO-838

@eracah requested a review from mvpatel2000 August 17, 2022 00:31
@eracah mentioned this pull request Aug 17, 2022
@mvpatel2000

Why does the before have empty algo charts?

@mvpatel2000 left a comment

This is a massive lift, huge kudos! No comments on the actual implementation; it looks the same as in the last PR / previous review, so LGTM. I left a few minor comments otherwise.

I'm happy to approve once the tests look good. Definitely want to make sure we test all the loggers before rolling it out. Maybe add some research folks who use the loggers more often to validate that the usage screenshots look good? Besides that, this looks good to merge for me.

As an aside, the diff looks odd and is showing changes from another PR I merged into dev, maybe because dev hasn't been merged into this branch yet.

@eracah commented Aug 17, 2022

Why does the before have empty algo charts?

These were the traces that were getting logged as metrics.

@eracah marked this pull request as ready for review August 18, 2022 00:17
@eracah requested review from a team, bandish-shah, and dskhudia as code owners August 18, 2022 00:17
@mvpatel2000

I'll be OOO, but this LGTM. Not approving because some lint checks are failing, but the logic looks great. Thanks for dealing with some nasty merges.

@eracah requested a review from mvpatel2000 August 18, 2022 17:26

@hanlint left a comment

Thanks, this is a significant improvement to our logging API. I am OK with deferring the documentation to a different PR.

But I had one significant concern about the API (noted in a comment below); perhaps we can resolve it offline.

@eracah commented Aug 22, 2022

  • accuracy metrics don't show up in wandb when using the yahp entrypoint, but they do show up when using a custom entrypoint

Fixed: the error was caused by passing step explicitly to wandb.log. If the step value is less than wandb's internal step counter, the data point gets dropped.
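To illustrate the failure mode described above without depending on wandb, here is a toy model of a run whose step counter only moves forward. MonotonicRun is hypothetical and only mimics the behavior stated in this comment (a point logged with an explicit step lower than the run's current step is dropped); its exact step-advancing rule is a simplification, not wandb's implementation.

```python
from typing import Any, Dict, List, Optional, Tuple


class MonotonicRun:
    """Toy stand-in for a tracker whose internal step never moves backwards."""

    def __init__(self) -> None:
        self.step = 0  # next step the run would assign on its own
        self.history: List[Tuple[int, Dict[str, Any]]] = []
        self.dropped: List[Tuple[int, Dict[str, Any]]] = []

    def log(self, data: Dict[str, Any], step: Optional[int] = None) -> None:
        if step is None:
            step = self.step  # implicit step: use the internal counter
        if step < self.step:
            # Explicit step is behind the internal counter: the point is discarded.
            self.dropped.append((step, data))
            return
        self.history.append((step, data))
        self.step = step + 1  # toy rule: the counter advances past every recorded step


run = MonotonicRun()
for _ in range(5):
    run.log({"example/value": 1.0})  # implicit steps 0..4; counter is now 5
run.log({"metrics/eval/Accuracy": 0.95}, step=3)  # 3 < 5, so this point is dropped
run.log({"metrics/eval/Accuracy": 0.95})  # no explicit step, so it is kept at step 5
```

In this model, letting the tracker assign the step (omitting the explicit step argument) is what keeps the accuracy point from being silently discarded.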

  • I can't figure out why Jenkins is failing

Strange isort issue: I have to manually reorder the imports in wandb_logger.py.

@mvpatel2000 left a comment

LGTM. Can we open a Jira ticket for the ProgressBarLogger logs?

I suspect this will have some issues we'll discover with internal testing, but no reason to hold this off any longer.

@dskhudia left a comment

As this is already reviewed, accepting to unblock.

@eracah requested review from hanlint and bandish-shah and removed the request for a team, bandish-shah, and hanlint August 23, 2022 18:47

@hanlint left a comment

Made some comments here -- looking good! Using Request Changes to hold this PR until #1419 is merged.

@hanlint left a comment

Approving to unblock now that #1419 is merged

@eracah commented Aug 26, 2022

Approving to unblock now that #1419 is merged

🙌

@eracah merged commit 8d4bf9c into mosaicml:dev Aug 26, 2022