Releases: mosaicml/composer

v0.11.1

16 Nov 21:41

🚀 Composer v0.11.1

Composer v0.11.1 is released! Install via pip:

pip install --upgrade mosaicml==0.11.1

Bug Fixes

  • Fixes for Notebooks (#1659)
  • Documentation updates and fixes (#1685, #1696, #1702, #1709)
  • Addressed warnings and speed improvements for Torchmetrics (#1674)
  • Fixes to Gated Linear Units method (#1575, #1689)
  • Set NCCL_ASYNC_ERROR_HANDLING ENV variable in Composer launcher to enable distributed timeout (#1695)
  • Fix epoch count when eval is called before fit (#1697)
  • Constrain PyTorch package versions to avoid unintended upgrades (#1688)
  • Fix Optimizer state sharding issue with FSDP (#1732)
  • Raise ValueError if an evaluation dataloader of infinite length is specified

Full Changelog: v0.11.0...v0.11.1

v0.11.0

25 Oct 00:36

🚀 Composer v0.11.0

Composer v0.11.0 is released! Install via pip:

pip install --upgrade mosaicml==0.11.0

New Features

  1. 🧰 FSDP Beta Support

    Composer now supports PyTorch FSDP! PyTorch FSDP is a strategy for distributed training, similar to PyTorch DDP, that distributes work using data-parallelism only. On top of this, FSDP uses model, gradient, and optimizer sharding to dramatically reduce device memory requirements, and enables users to easily scale and train large models.

    Here's how easy it is to use FSDP with Composer:

    import torch.nn as nn
    from composer import Trainer
    from composer.models import ComposerModel
    
    class Block(nn.Module):
        ...
    
    # Your custom model
    class Model(nn.Module):
        def __init__(self, n_layers):
            super().__init__()
            self.blocks = nn.ModuleList([
                Block(...) for _ in range(n_layers)
            ])
            self.head = nn.Linear(...)
        def forward(self, inputs):
            ...
    
        # FSDP Wrap Function
        def fsdp_wrap_fn(self, module):
            return isinstance(module, Block)
    
        # Activation Checkpointing Function
        def activation_checkpointing_fn(self, module):
            return isinstance(module, Block)
    
    # ComposerModel wrapper, used by the Trainer
    # to compute loss, metrics, etc.
    class MyComposerModel(ComposerModel):
    
        def __init__(self, n_layers):
            super().__init__()
            self.model = Model(n_layers)
            ...
    
        def forward(self, batch):
            ...
    
        def eval_forward(self, batch, outputs=None):
            ...
    
        def loss(self, outputs, batch):
            ...
    
    # Pass your ComposerModel and fsdp_config into the Trainer
    composer_model = MyComposerModel(n_layers=3)
    fsdp_config = {
        'sharding_strategy': 'FULL_SHARD',
        'min_params': 1e8,
        'cpu_offload': False, # Not supported yet
        'mixed_precision': 'DEFAULT',
        'backward_prefetch': 'BACKWARD_POST',
        'activation_checkpointing': False,
        'activation_cpu_offload': False,
        'verbose': True
    }
    
    trainer = Trainer(
        model=composer_model,
        fsdp_config=fsdp_config,
        ...
    )
    
    trainer.fit()

    For more information, please see our FSDP docs.

  2. 🚰 Streaming v0.1

    We've spun off Streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset, enabling users to stream training data from cloud-based object stores. Streaming ships with built-in support for popular open source datasets (ADE20K, C4, COCO, Enwiki, ImageNet, etc.).

    To get started, install the Streaming PyPI package:

    pip install mosaicml-streaming

    You can use the streaming Dataset class with the PyTorch native DataLoader class as follows:

    import torch
    from streaming import Dataset
    
    dataloader = torch.utils.data.DataLoader(dataset=Dataset(remote='s3://...'))

    For more information, please check out the Streaming docs.

  3. ✔👉 Simplified Checkpointing Interface

    With this release we've greatly simplified configuration of loading and saving checkpoints in Composer.

    To save checkpoints to S3, all you need to do is:

    • Set save_folder to the full URI of your save destination (e.g. 's3://my-bucket/{run_name}/checkpoints')
    • Optionally, set save_filename to the pattern you want for your checkpoint file names

    from composer.trainer import Trainer
    
    # Checkpoint saving to S3.
    trainer = Trainer(
        model=model,
        save_folder="s3://my-bucket/{run_name}/checkpoints",
        run_name='my-run',
        save_interval="1ep",
        save_filename="ep{epoch}.pt",
        save_num_checkpoints_to_keep=0,  # delete all checkpoints locally
        ...
    )
    
    trainer.fit()
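
    The {run_name} and {epoch} placeholders above are filled in for each run and each checkpoint. As a rough illustration only, the sketch below approximates that expansion with Python's built-in str.format to show the object path a checkpoint would end up at. The helper name checkpoint_uri is hypothetical, and Composer's own formatting may support more fields than the two shown here.

```python
# Hypothetical helper (not part of Composer): approximates placeholder
# expansion in save_folder / save_filename using plain str.format.
def checkpoint_uri(save_folder: str, save_filename: str, run_name: str, epoch: int) -> str:
    folder = save_folder.format(run_name=run_name)
    filename = save_filename.format(epoch=epoch)
    # Join folder and filename with exactly one slash.
    return f"{folder.rstrip('/')}/{filename}"


print(checkpoint_uri("s3://my-bucket/{run_name}/checkpoints", "ep{epoch}.pt", "my-run", 13))
```

    With the values from the example above, this prints the same URI that the loading example passes as load_path.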

    Likewise, to load checkpoints from S3, all you have to do is:

    • Set load_path to the full URI of your desired checkpoint file (e.g. 's3://my-bucket/my-run/checkpoints/ep13.pt')

    from composer.trainer import Trainer
    
    # Checkpoint loading from S3.
    new_trainer = Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration="10ep",
        load_path="s3://my-bucket/my-run/checkpoints/ep13.pt",
    )
    
    new_trainer.fit()

    For more information, please see our Checkpointing guide.

  4. 𐄳 Improved Distributed Experience

    We've made it easier to write your own custom distributed entry points by exposing our distributed API. You can now leverage all of our helpful distributed functions and contexts.

    For example, suppose we need to download a dataset in a distributed training application. To avoid race conditions where different ranks try to write the dataset to the same place, we need to ensure that only local rank 0 downloads the dataset while the other ranks wait:

    import datetime
    from composer.trainer.devices import DeviceGPU
    from composer.utils import dist
    
    dist.initialize_dist(DeviceGPU(), datetime.timedelta(seconds=30)) # Initialize distributed module
    
    if dist.get_local_rank() == 0: # Download dataset on local rank zero
        dataset = download_my_dataset()
    dist.barrier() # All ranks wait until dataset is downloaded
    
    # Create and train your model!

    For more information, please check out our Distributed API docs.

Bug Fixes

  • fix loss and eval_forward for HF models (#1597)
  • add more robust casting to int for fsdp min_params (#1608)
  • Deepspeed Docs Typo (#1605)
  • Fix mmdet typo (#1618)
  • Blurpool idempotent (#1625)
  • When model is not on meta device, initialization should occur on compute device not CPU (#1623)
  • Auto resumption (#1615)
  • Adjust speed monitor (#1645)
  • Hot fix console logging (#1643)
  • Lazy Logging + pretty print dict for hparams (#1653)
  • Fix many failing notebook tests (#1646)

v0.10.1

06 Oct 00:17

🚀 Composer v0.10.1

Composer v0.10.1 is released! Install via pip:

pip install --upgrade mosaicml==0.10.1

New Features

  1. 𐄷 Weight Standardization

    Weight Standardization reparametrizes convolutional weights such that the fan-in dimensions have zero mean and unit standard deviation. This can slightly improve performance at the expense of 5% lower throughput. It has been used in several papers to train with smaller batch sizes, with normalization layers other than batch norm, and for transfer learning.
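
    To make the reparametrization concrete, here is a minimal pure-Python sketch for the fan-in weights of a single output channel, flattened into a list. The function name standardize_filter, the use of the population standard deviation, and the eps term are assumptions made for illustration; they are not taken from Composer's implementation.

```python
import statistics
from typing import List


def standardize_filter(weights: List[float], eps: float = 1e-10) -> List[float]:
    """Shift and scale one filter's fan-in weights to zero mean and unit std.

    Illustrative only: population std and eps are assumptions of this sketch.
    """
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)
    return [(w - mean) / (std + eps) for w in weights]


standardized = standardize_filter([1.0, 2.0, 3.0, 4.0])
print(statistics.fmean(standardized), statistics.pstdev(standardized))
```

    The printed mean is approximately 0 and the printed standard deviation is approximately 1, up to floating-point error.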

    Using Weight Standardization with the Composer Trainer:

    import composer
     
    # Apply Weight Standardization (when training is initialized)
    weight_std = composer.algorithms.WeightStandardization()
    
    # Train with Weight Standardization
    trainer = composer.trainer.Trainer(
        ...
        algorithms=[weight_std]
    )
    trainer.fit()

    Using Weight Standardization with the Composer functional interface:

    import composer
    from torchvision.models import resnet50
     
    my_model = resnet50()
     
    # Apply weight standardization to model
    my_model = composer.functional.weight_standardization(my_model)

    Please see the Weight Standardization Method Card for more details.

Bug Fixes

  • Fix for checkpoints not being saved automatically at the end of a run (#1552)
  • Fix Onnx export for Composer HuggingFaceModels (#1557)
  • Fix for MIoU metric producing NaNs (#1558)
  • CometML logger documentation updates and fixes (#1567, #1570, #1571)
  • WandB image visualizer fix (#1591)

Full Changelog: v0.10.0...v0.10.1

v0.10.0

22 Sep 06:25

🚀 Composer v0.10.0

Composer v0.10.0 is out! This latest release adds support for CometML Experiment tracking, automatic selection of evaluation batch size, API enhancements for Evaluation/Logging/Metrics and a preview of our new streaming datasets repository!

pip install --upgrade mosaicml==0.10.0

New Features

  1. ☄️ Comet Experiment Tracking (#1490)

    We've added support for the popular Comet experiment tracker! To enable, simply create the logger and pass it to the Trainer object at initialization:

    from composer import Trainer
    from composer.loggers import CometMLLogger
    
    cometml_logger = CometMLLogger()
    
    trainer = Trainer(
        ...
        loggers=[cometml_logger],
    )

    Please see our Logging and CometMLLogger docs pages for details on usage.

  2. 🪄 Automatic Evaluation Batch Size Selection (#1417)

    Composer now supports eval_batch_size='auto', which will choose the right evaluation batch size to avoid CUDA OOMs! Now, in conjunction with grad_accum='auto', you can run the same code on any hardware with no changes necessary. This makes it easy to add evaluation to a training script without having to pick and choose the right batch sizes to avoid CUDA OOMs.
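
    One common way to pick a batch size that avoids out-of-memory errors is to start large and halve on failure. The sketch below illustrates that general idea only; it is not Composer's implementation. The names find_eval_batch_size, run_eval, and fake_eval are hypothetical, and MemoryError stands in for a CUDA OOM error.

```python
from typing import Callable


def find_eval_batch_size(run_eval: Callable[[int], None], start: int) -> int:
    """Halve the batch size until run_eval succeeds (hypothetical helper)."""
    batch_size = start
    while batch_size >= 1:
        try:
            run_eval(batch_size)
            return batch_size
        except MemoryError:  # stand-in for a CUDA out-of-memory error
            batch_size //= 2
    raise RuntimeError("Evaluation does not fit in memory even at batch size 1")


def fake_eval(batch_size: int) -> None:
    # Pretend the device can hold at most 48 samples at once.
    if batch_size > 48:
        raise MemoryError


print(find_eval_batch_size(fake_eval, start=256))
```

    In this toy setup the search tries 256, 128, and 64 before settling on 32.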

  3. 🎯 Evaluation API Changes (#1479)

    The Evaluation API has been updated to be consistent with the Trainer API. If the eval_dataloader was provided to the Trainer during initialization, eval can be invoked without needing to provide anything additional:

    trainer = Trainer(
        eval_dataloader=...
    )
    trainer.eval()

    Alternatively, the eval_dataloader can be passed directly to the eval() method:

    trainer = Trainer(
        ...
    )
    trainer.eval(
        eval_dataloader=...
    )

    The eval_dataloader can be a PyTorch DataLoader or, for multiple metrics, a list of Evaluator objects.

  4. 🪵 Simplified Logging (#1416)

    We've significantly simplified our internal logging interface:

    • Removed the use of LogLevel throughout the logging, which was a mostly unused feature. Filtering logs is now the responsibility of the logger.
    • For better compatibility with external logging interfaces such as CometML or Weights & Biases, loggers now support the following methods: log_metrics, log_hyperparameters, and log_artifacts. Previous calls to data_fit, data_epoch, ... have been removed.

  5. 🎯 validate --> eval_forward (#1411, #1419)

    Previously, ComposerModel implemented the validate(batch: Any) -> Tuple[Any, Any] method, which returned an (input, target) tuple, and the Trainer handled updating the metrics. In v0.10, we return control over metric updates to the user.

    Now, models instead implement def eval_forward(batch: Any) which returns the outputs of evaluation, and also def update_metric(batch, outputs, metric) which updates the metric.

    An example implementation for classification can be found in our ComposerClassifier base class:

        def update_metric(self, batch: Any, outputs: Any, metric: Metric) -> None:
            _, targets = batch
            metric.update(outputs, targets)
    
        def eval_forward(self, batch: Any, outputs: Optional[Any] = None) -> Any:
            return outputs if outputs is not None else self.forward(batch)

  6. 🕵️‍♀️ Evaluator changes

    The Evaluator class now stores evaluation metric names instead of metric instances. For example:

    glue_mrpc_task = Evaluator(
        label='glue_mrpc',
        dataloader=mrpc_dataloader,
        metric_names=['BinaryF1Score', 'Accuracy']
    )

    These metric names are matched against the metrics returned by the ComposerModel. The metric instances are now stored as deep copies in the State class as state.train_metrics or state.eval_metrics.
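
    As a rough sketch of the name-matching step described above, the helper below picks metrics out of a dictionary by name and deep-copies them. The function select_metrics, the exact-match rule, and the error on unknown names are assumptions made for illustration, not Composer's actual matching logic.

```python
import copy
from typing import Dict, List


def select_metrics(model_metrics: Dict[str, object], metric_names: List[str]) -> Dict[str, object]:
    """Return deep copies of the named metrics (hypothetical helper)."""
    missing = [name for name in metric_names if name not in model_metrics]
    if missing:
        raise ValueError(f"Metrics not provided by the model: {missing}")
    # Deep copies keep each evaluator's metric state independent.
    return {name: copy.deepcopy(model_metrics[name]) for name in metric_names}


# Lists stand in for metric objects in this toy example.
model_metrics = {"Accuracy": [], "BinaryF1Score": [], "MIoU": []}
print(list(select_metrics(model_metrics, ["BinaryF1Score", "Accuracy"])))
```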

  7. 🚧 Streaming Datasets Repository Preview

    We're in the process of splitting out streaming datasets into its own repository! Streaming datasets is a high-performance drop-in replacement for Torch IterableDataset objects and enables you to stream your training data from cloud-based object stores. For an early preview, please check out the Streaming repo.

  8. ❌ YAHP deprecation

    We are deprecating support for yahp, our hyperparameter configuration tool. Support for it will be removed in the next minor release of Composer. We recommend that users migrate to OmegaConf or Hydra.

v0.9.0

16 Aug 06:11

🚀 Composer v0.9.0

Excited to share the release of Composer v0.9.0, which comes with an Inference Export API, beta support for Apple Silicon and TPU training, as well as expanded usability of NLP-related speed-up methods. This release includes 175 commits from 34 contributors, including 10 new contributors 🙌!

pip install --upgrade mosaicml==0.9.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.9.0

New Features

  1. 📦 Export for inference APIs

    Train with Composer and deploy anywhere! We have added a dedicated export API as well as an export training callback to allow you to export Composer-trained models for inference, supporting popular formats such as TorchScript and ONNX.

    For example, here's how to export a model in TorchScript format:

    from composer.utils import export_for_inference
    
    # Invoking export with a trained model
    export_for_inference(model=model,
                         save_format='torchscript',
                         save_path=model_save_path)

    Here's an example of using the training callback, which automatically exports the model at the end of training to ONNX format:

    from composer import Trainer
    from composer.callbacks import ExportForInferenceCallback
    
    # Initializing Trainer with the export callback
    callback = ExportForInferenceCallback(save_format='onnx',
                                          save_path=model_save_path)
    trainer = Trainer(model=model,
                      callbacks=callback,
                      train_dataloader=dataloader,
                      max_duration='10ep')
    
    # Model will be exported at the end of training
    trainer.fit()

    Please see our Exporting for Inference notebook for more information.

  2. 📈 ALiBi support for BERT training

    You can now use ALiBi (Attention with Linear Biases; Press et al., 2021) when training BERT models with Composer, delivering faster training and higher accuracy by leveraging shorter sequence lengths.

    ALiBi improves the quality of BERT pre-training, especially when pre-training uses shorter sequence lengths than the downstream (fine-tuning) task. This allows models with ALiBi to reach higher downstream accuracy with less pre-training time.

    Example of using ALiBi as an algorithm with the Composer Trainer:

    import composer
    
    # Create an instance of a BERT masked language model
    model = composer.models.create_bert_mlm()
    
    # Apply ALiBi (when training is initialized)
    alibi = composer.algorithms.Alibi(max_sequence_length=1024)
    
    # Train with ALiBi
    trainer = composer.trainer.Trainer(
        model=model,
        train_dataloader=train_dataloader,
        algorithms=[alibi]
    )
    trainer.fit()

    Example using the Composer Functional API:

    import composer
    import composer.functional as cf
    
    # Create an instance of a BERT masked language model
    model = composer.models.create_bert_mlm()
    
    # Apply ALiBi and expand the model's maximum sequence length to 1024
    cf.apply_alibi(model=model, max_sequence_length=1024)

    ALiBi can also now be extended to work with custom models by registering your attention and embedding layers. Please see our ALiBi method card for more information.

  3. 🧐 Entry point for GLUE tasks pre-training and fine-tuning

    You can now easily pre-train and fine-tune NLP models across all GLUE (General Language Understanding Evaluation) tasks through one simple entry point! The entry point handles model saving and loading, spawns GLUE tasks in parallel across all available GPUs, and delivers a highly efficient evaluation of model performance.

    Example of launching the entrypoint:

    # This runs pre-training followed by fine-tuning.
    # --training_scheme can take either pretrain, finetune, or all depending on the task!
    python run_glue_trainer.py -f glue_example.yaml --training_scheme all

    Please see our GLUE entrypoint notebook for more information.

  4. 🤖 TPU support (in beta)

    You can now use Composer to train your models on TPUs! TPU support is in beta and currently covers single-core TPU training only. Try it out, explore optimizations, and share your feedback and feature requests with us so we can make it better for you and for the community.

    To use TPUs with Composer, simply specify a tpu device:

    # Set device to `tpu`
    trainer = composer.trainer.Trainer(
        model=model,
        train_dataloader=train_dataloader,
        max_duration=train_epochs,
        device='tpu')
    
    # Run fit
    trainer.fit()

    Please see our Training with TPUs notebook for more information.

  5. 🍎 Apple Silicon support (beta)

    Leverage Apple Silicon chips to train your models with Composer by providing the device='mps' argument:

    trainer = Trainer(
        ...,
        device='mps'
    )

    We use the latest PyTorch MPS backend to execute the training. This requires torch version ≥1.12 and macOS 12.3+.

    For more information on training with Apple M chips, see the PyTorch 1.12 blog and our API Reference for Composer specific details.

  6. 🚧 Contrib repository

    Got a new method idea, or published a paper and want those methods to be easily accessible? We've created the mcontrib repository, with a lightweight process to contribute new algorithms. We're happy to work directly with you to benchmark these methods and eventually “promote” them to Composer for use by end customers.

    Please check out the README for details on how to contribute a new algorithm. For more details on how to write speed-up methods, see our notebook on custom speed-up methods.

Additional API Changes

  1. 🔢 Passes Module

    The order in which algorithms are run matters significantly during composition. With this release we refactored algorithm passes into their own passes module. Users can now register custom passes (for custom algorithms) with the Engine. Please see #1377 for more information.

  2. 🗄️ Default Checkpoint Extension

    The CheckpointSaver now defaults to using the *.pt extension for checkpoint filenames. Please see #1370 for more information.

  3. 👁️ Models Refactor

    Most vision models (ResNet, MNIST, ViT, EfficientNet) have been refactored from classes to a factory function. For example ComposerResNet -> composer_resnet.

    # before
    from composer.models import ComposerResNet
    model = ComposerResNet(...)
    
    # after
    from composer.models import composer_resnet
    model = composer_resnet(...)

    The same refactor has been done for NLP as well, e.g. BERTModel -> create_bert_mlm and create_bert_classification.

    See #1227 (vision) and #1130 (NLP) for more details.

  4. ➕ Misc API Changes

    • BreakEpochException has been removed.
    • state.is_model_deepspeed has been moved to composer.utils.is_model_deepspeed.
    • Helper function monitored_barrier has been added to composer distributed.

Bug Fixes

  • Add informative error for infer batch size issues (#1401)
  • Fix ImagenetDatasetHparams bug (#1392), resolves #1111
  • Fix hparams error condition checking (#1394)
  • Fix AMP resumption with grad scaler (#1376)
  • Auto Grad Accum Cache Clearing (#1380), fixes issue reported in #1331
  • Fix default precision (#1369)
  • Fix the profiler on multi-node training (#1358), resolves #1270
  • Retry SFTP on Size Mismatch (#1300)
  • Fix scheduler edge cases (#1350), resolves #1077
  • Fix a race condition in the object store logger (#1328)
  • Fix WandB load from checkpoint (#1326)
  • Fix Notebook Progress Bars (#1313)

v0.8.2

27 Jul 23:36

🚀 Composer v0.8.2

Composer v0.8.2 is released! Install via pip:

pip install --upgrade mosaicml==0.8.2

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.2

🐛 Bug Fixes

  1. Fixed Notebook Progress Bars in Colab

    Fixes a bug introduced by #1264 which causes Composer running in Colab notebooks to error out with:
    UnsupportedOperation: fileno.

    Closes #1312. Fixed in PR #1314.

Changelog

v0.8.1...v0.8.2

v0.8.1

22 Jul 23:23
8418a67

🚀 Composer v0.8.1

Composer v0.8.1 is released! Install via pip:

pip install --upgrade mosaicml==0.8.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.1

🎁 New Features

  1. 🖼️ Image Visualizer

    The ImageVisualizer callback periodically logs the training and validation images when using the WandB logger. This is great for validating your dataloader pipeline, especially if extensive data augmentations are used. Also, when training on a semantic segmentation task, the callback can log the target segmentation mask and the predicted segmentation mask by setting the argument mode='segmentation'. See PR #1266 for more details. Here is an example of using the ImageVisualizer callback:

    from composer import Trainer
    from composer.callbacks import ImageVisualizer
    
    # Callback to log 8 training images after every 100 batches
    image_visualizer = ImageVisualizer()
    
    # Construct trainer
    trainer = Trainer(
        ...,
        callbacks=image_visualizer
    )
    
    # Train!
    trainer.fit()

    Here is an example visualization from the training set of ADE20k:

  2. 📶 TensorBoard Logging

    You can now log metrics and losses from your Composer training runs with TensorBoard! See #1250 and #1283 for more details. All you have to do is create a TensorboardLogger object and add it to the list of loggers in your Trainer object like so:

    from composer import Trainer
    from composer.loggers import TensorboardLogger
    
    tb_logger = TensorboardLogger(log_dir="./my_tensorboard_logs")
    
    trainer = Trainer(
        ...
        # Add your Tensorboard Logger to the trainer here.
        loggers=[tb_logger],
    )
    
    trainer.fit()

    For more information, see this tutorial.

  3. 🔙 Multiple Losses

    Adds support for multiple losses. If a model returns a tuple of losses, they are summed before the loss.backward() call. See #1240 for more details.
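
    The behavior described above amounts to reducing a tuple of losses to a single value before backpropagation. A minimal sketch, using plain floats in place of tensors and a hypothetical helper name total_loss:

```python
from typing import Sequence, Union

Loss = Union[float, Sequence[float]]


def total_loss(loss: Loss) -> float:
    """Sum a tuple or list of losses; pass a single loss through unchanged."""
    if isinstance(loss, (tuple, list)):
        return sum(loss)
    return loss


print(total_loss((0.5, 0.25)))  # two loss terms from one forward pass
print(total_loss(1.5))          # a single loss is returned as-is
```

    With real tensors, the summed value is what loss.backward() would then be called on.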

  4. 🌎️ Stream Datasets from HTTP URIs

    You can now specify an HTTP URI for a Streaming Dataset remote. See #1258 for more details. For example:

    from composer import Trainer
    from composer.datasets.streaming import StreamingDataset
    from torch.utils.data import DataLoader
    
    # Construct the Dataset
    dataset = StreamingDataset(
        ...,
        remote="https://example.com/dataset/",
    )
    
    # Construct the DataLoader
    train_dl = DataLoader(dataset)
    
    # Construct the Trainer
    trainer = Trainer(
        ...,
        train_dataloader=train_dl,
    )
    
    # Train!
    trainer.fit()

    For more information on streaming datasets, see this tutorial.

  5. 🏄️ GPU Devices default to TF32 Matmuls

    Beginning with PyTorch 1.12, the default behavior for computing FP32 matrix multiplies on NVIDIA Ampere devices was switched from TF32 to FP32. See PyTorch documentation here.

    Since Composer is designed specifically for ML training with a focus on efficiency, we choose to preserve the old default of using TF32 on Ampere devices. This leads to significantly higher throughput when training in single precision, without impacting training convergence. See PR #1275 for implementation details.

  6. 👋 Set the Device ID for GPU Devices

    Specify the device ID within a DeviceGPU to train on when instantiating a Trainer object instead of using the local ID! For example,

    from composer import Trainer
    from composer.trainer.devices.device_gpu import DeviceGPU
    
    # Specify to use GPU 3 to train
    device = DeviceGPU(device_id=3)
    
    # Construct the Trainer
    trainer = Trainer(
        ...,
        device=device
    )
    
    # Train!
    trainer.fit()

  7. BERT and C4 Updates

    We make some minor adjustments to our bert-base-uncased.yaml training config. In particular, we make the global train and eval batch sizes a power of 2. This maintains divisibility when using many GPUs in multi-node training. We also adjust the max_duration so that it converts cleanly to 70,000 batches.

    We also upgrade our StreamingDataset C4 conversion script (scripts/mds/c4.py) to use a multi-threaded reader. On a 64-core machine we are able to convert the 770GB train split to .mds format in ~1.5hr.

  8. 📂 Set a prefix when using a S3ObjectStore

    When using S3ObjectStore for applications like checkpointing, it can be useful to provide path prefixes, mimicking folder/subfolder directories like on a local filesystem. When prefix is provided, any objects uploaded with S3ObjectStore will be stored at f's3://{self.bucket}/{self.prefix}{object_name}'.
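
    The format string quoted above can be tried out directly. The sketch below is a standalone function, not the S3ObjectStore class itself, and the name s3_uri is hypothetical. It follows the quoted format literally, so the prefix is concatenated as given and this sketch includes the trailing slash in the prefix; whether the real class normalizes slashes is not stated here.

```python
def s3_uri(bucket: str, object_name: str, prefix: str = "") -> str:
    """Build the destination URI using the format quoted in the release note."""
    return f"s3://{bucket}/{prefix}{object_name}"


print(s3_uri("my-bucket", "ep1.pt"))
print(s3_uri("my-bucket", "ep1.pt", prefix="my-run/checkpoints/"))
```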

  9. ⚖️ Scale the Warmup Period of Composer Schedulers

    Added a new flag scale_warmup to schedulers that will scale the warmup period when a scale schedule ratio is applied. The default is False, which preserves the previous behavior. See #1268 for more details.
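
    As a sketch of what the flag changes, assume the scale schedule ratio (called ssr here) is a simple multiplier on durations. The helper effective_warmup and the rounding via int() are assumptions made for illustration and are not taken from Composer's scheduler code.

```python
def effective_warmup(t_warmup: int, ssr: float, scale_warmup: bool = False) -> int:
    """Warmup length in batches after applying the scale schedule ratio.

    With scale_warmup=False the warmup is left untouched; with True it is
    scaled by ssr like the rest of the schedule (assumed behavior).
    """
    return int(t_warmup * ssr) if scale_warmup else t_warmup


print(effective_warmup(1000, ssr=0.5))                     # warmup unchanged
print(effective_warmup(1000, ssr=0.5, scale_warmup=True))  # warmup halved
```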

  10. 🧊 Stochastic Depth on Residual Blocks

    Residual blocks are detected automatically and replaced with stochastic versions. See #1253 for more details.

🐛 Bug Fixes

  1. Fixed Progress Bars

    Fixed a bug where the progress bars jumped around and did not stream properly when tailing the terminal over the network. Fixed in #1264, #1287, and #1289.

  2. Fixed S3ObjectStore in Multithreaded Environments

    Fixed a bug where boto3 crashed when creating the default session in multiple threads simultaneously (see boto/boto3#1592). Fixed in #1260.

  3. Retry on ChannelException errors in the SFTPObjectStore

    Catch ChannelException SFTP transient error and retry. Fixed in #1245.
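
    Retrying on a transient error generally looks like the loop below. This is a generic sketch, not the SFTPObjectStore code: TransientError stands in for Paramiko's ChannelException so the example runs without Paramiko, and with_retries, num_attempts, and backoff are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Stand-in for a transient SFTP error such as ChannelException."""


def with_retries(fn: Callable[[], T], num_attempts: int = 3, backoff: float = 0.0) -> T:
    """Call fn, retrying on TransientError up to num_attempts times in total."""
    if num_attempts < 1:
        raise ValueError("num_attempts must be at least 1")
    for attempt in range(num_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == num_attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * (2 ** attempt))  # exponential backoff
    raise AssertionError("unreachable")


calls = {"n": 0}


def flaky_upload() -> str:
    # Fails on the first two calls, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("channel closed")
    return "uploaded"


print(with_retries(flaky_upload))
```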

  4. Treating S3 Permission Denied Errors as Not Found Errors

    We update our handling of botocore 403 ClientErrors to interpret them as FileNotFoundErrors. We do this because of a situation that occurs when a user has no S3 credentials configured, and tries to read from a bucket with public files. For privacy, Amazon S3 raises 403 (Permission Denied) instead of 404 (Not Found) errors. As such, PR #1249 treats 403 ClientErrors as FileNotFoundErrors.
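
    The error translation described above can be sketched as follows. FakeClientError is a stand-in for botocore's ClientError so that the example runs without botocore, and fetch_or_not_found is a hypothetical helper; the real change lives in PR #1249.

```python
from typing import Callable


class FakeClientError(Exception):
    """Stand-in for botocore's ClientError (carries an HTTP status code)."""

    def __init__(self, status_code: int):
        super().__init__(f"HTTP {status_code}")
        self.response = {"ResponseMetadata": {"HTTPStatusCode": status_code}}


def fetch_or_not_found(fetch: Callable[[str], bytes], key: str) -> bytes:
    """Call fetch(key), translating 403/404 client errors to FileNotFoundError."""
    try:
        return fetch(key)
    except FakeClientError as e:
        status = e.response["ResponseMetadata"]["HTTPStatusCode"]
        if status in (403, 404):
            raise FileNotFoundError(f"Object not found: {key}") from e
        raise  # any other client error is propagated unchanged


def denied(key: str) -> bytes:
    raise FakeClientError(403)


try:
    fetch_or_not_found(denied, "data/missing.bin")
except FileNotFoundError as err:
    print(err)
```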

  5. Fixed Parsing of grad_accum in the TrainerHparams

    Fixes an error where the command line override --grad_accum led to incorrect parsing. Fixed in #1256.

  6. Fixed Example YAML Files

    Our recipe configurations (YAML) are updated to the latest version, and a test was added to enforce correctness moving forward. Fixed in #1235 and #1257.

Changelog

v0.8.0...v0.8.1

v0.8.0

01 Jul 04:15

🚀 Composer v0.8.0

Composer v0.8.0 is released! Install via pip:

pip install --upgrade mosaicml==0.8.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.8.0

New Features

  1. 🤗 HuggingFace ComposerModel

    Train your HuggingFace models with Composer! We introduced a HuggingFaceModel that converts your existing 🤗 Transformers models into a ComposerModel.

    For example:

    import transformers
    from composer import Trainer
    from composer.models import HuggingFaceModel
    
    # Define the model
    hf_model = transformers.AutoModelForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
    
    # Convert it into a ComposerModel
    model = HuggingFaceModel(hf_model)
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        model=model,
    )
    
    # Train!
    trainer.fit()

    For more information, see the example on fine-tuning a pretrained BERT with Composer.

  2. 🫕 Fused Layer Norm

    Fused LayerNorm replaces implementations of torch.nn.LayerNorm with apex.normalization.fused_layer_norm. The fused kernel provides increased GPU utilization.

    For example:

    from composer.trainer import Trainer
    from composer.algorithms import FusedLayerNorm
    
    # Initialize the algorithm
    alg = FusedLayerNorm()
    
    # Construct the trainer
    trainer = Trainer(
        algorithms=alg,
    )
    
    # Train!
    trainer.fit()

    See the method card for more information.

  3. 💾 Ignore Checkpoint Parameters

    If you have a checkpoint and don't want to restore some elements of the checkpoint to the state, we added a load_ignore_keys parameter. Any specified (nested) keys will be ignored. Glob syntax is supported!

    For example, to restore a checkpoint without the seed:

    from composer import Trainer
    
    trainer = Trainer(
        ...,
        load_path="path/to/my/checkpoint.pt",
        load_ignore_keys=["state/rank_zero_seed", "rng"],
    )

    See the Trainer API Reference for more information.
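
    To illustrate how slash-separated key paths and glob patterns can select nested entries, here is a small sketch using Python's fnmatch module on a plain dictionary. The helper drop_ignored and the toy checkpoint layout are hypothetical, and Composer's own matching rules may differ in detail.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def drop_ignored(checkpoint: Dict[str, Any], patterns: List[str], parent: str = "") -> Dict[str, Any]:
    """Return a copy of checkpoint without keys whose path matches a pattern."""
    kept: Dict[str, Any] = {}
    for key, value in checkpoint.items():
        path = f"{parent}/{key}" if parent else key
        if any(fnmatchcase(path, pattern) for pattern in patterns):
            continue  # ignored: not restored
        kept[key] = drop_ignored(value, patterns, path) if isinstance(value, dict) else value
    return kept


checkpoint = {"state": {"rank_zero_seed": 42, "model": "weights"}, "rng": [1, 2, 3]}
print(drop_ignored(checkpoint, ["state/rank_zero_seed", "rng"]))
print(drop_ignored(checkpoint, ["state/*"]))
```

    The first call keeps only the model entry under state; the second uses a glob to drop everything under state while keeping rng.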

  4. 🪣 Object Stores

    Composer v0.8.0 introduces an abstract Object Store API to support multiple object store drivers, such as boto3 (for Amazon S3) and Paramiko (for SFTP), in addition to the existing libcloud implementation.

    For example, if you are training on AWS where credentials are available in the environment, here's how to save checkpoints to an S3 object store via Boto3.

    from composer import Trainer
    from composer.loggers import ObjectStoreLogger
    from composer.utils.object_store import S3ObjectStore
    
    logger = ObjectStoreLogger(
        object_store_cls=S3ObjectStore,
        object_store_kwargs={
            # These arguments will be passed into the S3ObjectStore -- e.g.:
            # object_store = S3ObjectStore(**object_store_kwargs)
            # Refer to the S3ObjectStore class for documentation
            'bucket': 'my-bucket',
        },
    )
    
    trainer = Trainer(
        ...,
        loggers=logger,
    )
    
    # Train!
    trainer.fit()

    See the Object Store API Reference for more information.

  5. 🪨 Artifact Metadata

    Composer automatically logs the epoch, batch, sample, and token counts as metadata when storing artifacts in Weights & Biases. See the API Reference for more information.

API Changes

  1. ✂️ Gradient Clipping is now an Algorithm

    To clean up the Trainer, we moved gradient clipping into an Algorithm. The grad_clip_norm argument in the Trainer is deprecated and will be removed in a future version of Composer. Instead, use the Gradient Clipping algorithm:

    For example:

    from composer.algorithms import GradientClipping
    from composer.trainer import Trainer
    
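    # Configure gradient clipping (example values; pick a type and threshold for your model)
    gradient_clipping = GradientClipping(clipping_type='norm', clipping_threshold=1.0)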
    
    # Configure the trainer
    trainer = Trainer(
        ...,
        algorithms=gradient_clipping,
    )
    
    # Train!
    trainer.fit()

    See the method card for more information.

  2. 🕒️ Removed batch_num_samples and batch_num_tokens from the state.

    State properties batch_num_samples and batch_num_tokens have been removed. Instead, use State.timestamp for token and sample tracking.

  3. 🧑‍🤝‍🧑 DDP Sync Strategy

    We changed the default DDP Sync Strategy to MULTI_AUTO_SYNC, as FORCED_SYNC doesn't work with all algorithms.

  4. 🏃 Moved the run_name into the State

    The run_name has been added to the State object, so it is persisted with checkpoints. It has been removed from the Logger.

Bug Fixes

  • In the Object Store Logger, added retries for credential validation and restricted credential validation to global rank zero. (#1144)
  • Fixed a bug in the speed monitor where it returned negative wall clock times. (#1123)
  • Fixed a bug where block-wise Stochastic Depth could freeze the trainer. (#1087)
  • Fixed a bug in the MLPerfCallback where sample counts were incorrect on per-sharded datasets. (#1156)

Changelog

v0.7.1...v0.8.0

v0.7.1

07 Jun 00:21

🚀 Composer v0.7.1

Composer v0.7.1 is released! Install via pip:

pip install --upgrade mosaicml==0.7.1

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.1

Bug Fixes

  • Upgraded to wandb>=0.12.17 to fix an incompatibility with protobuf>=4 (wandb/wandb#3709)

Changelog

v0.7.0...v0.7.1

v0.7.0

24 May 00:56

🚀 Composer v0.7.0

Composer v0.7.0 is released! Install via pip:

pip install --upgrade mosaicml==0.7.0

Alternatively, install Composer with Conda:

conda install -c mosaicml mosaicml=0.7.0

New Features

  1. 🏎️ FFCV Integration

    Composer supports FFCV, a fast dataloader for image datasets. We've found FFCV can speed up ResNet-56 training by 16%, in addition to existing speed-ups already supported by Composer! It's easy to use FFCV with any existing image dataset:

    import ffcv
    from ffcv.fields.decoders import IntDecoder, SimpleRGBImageDecoder
    from torchvision.datasets import ImageFolder
    
    from composer import Trainer
    from composer.datasets.ffcv_utils import write_ffcv_dataset, ffcv_monkey_patches
    
    # Convert the dataset to FFCV format
    # This step needs to be done only once per dataset
    dataset = ImageFolder(...)
    ffcv_dataset_path = "my_ffcv_dataset.ffcv"
    write_ffcv_dataset(dataset=dataset, write_path=ffcv_dataset_path)
    
    # In FFCV v0.0.3, len(dataloader) is expensive. Fix that via a monkeypatch
    ffcv_monkey_patches()
    
    # Construct the train dataloader
    train_dl = ffcv.Loader(
        ffcv_dataset_path,
        ...
    )
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        train_dataloader=train_dl,
    )
    
    # Train using FFCV!
    trainer.fit()

    See our notebook on training with FFCV for a full example.

  2. ✅ Autoresume from Checkpoints

    When setting autoresume=True, Composer can automatically resume from an existing checkpoint before starting a new training run. Specifically, the trainer will look in the save_folder (and any loggers that save artifacts) for the latest checkpoint; if none is found, then it'll start from the beginning.

    This feature does not require a different entrypoint to distinguish between starting a new training run and automatically resuming from an existing one, making it easy to use Composer on preemptible spot instances in the cloud. Simply set autoresume=True, point the instance to your training script, and Composer will handle the rest!

    from composer import Trainer
    
    # When using `autoresume`, it is required to specify the
    # `run_name`, so Composer will know which training run to
    # resume
    run_name = "my_autoresume_training_run"
    
    trainer = Trainer(
        ...,
        run_name=run_name,
        # specify where to save checkpoints
        save_folder="./my_autoresume_training_run",
        autoresume=True,
    )
    
    # Train! Composer will handle loading an existing
    # checkpoint or starting a new training run
    trainer.fit()

    See the Trainer API Reference for more information.
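
    The "look for the latest checkpoint, otherwise start fresh" decision described above can be sketched without Composer. This is an illustration only: find_latest_checkpoint and the ep<N>.pt naming scheme are assumptions made for the example, not Composer's actual lookup logic or file names.

```python
import re
import tempfile
from pathlib import Path


def find_latest_checkpoint(save_folder):
    """Return the checkpoint with the highest epoch number, or None."""
    pattern = re.compile(r"ep(\d+)\.pt$")  # hypothetical naming scheme
    folder = Path(save_folder)
    if not folder.is_dir():
        return None
    best_epoch, best_path = -1, None
    for path in folder.iterdir():
        match = pattern.search(path.name)
        # Compare epochs numerically so that ep10 sorts after ep2
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path


with tempfile.TemporaryDirectory() as tmp:
    # Empty folder: nothing to resume from, so start from the beginning
    fresh_start = find_latest_checkpoint(tmp)

    # After a few saves, the newest checkpoint is picked up
    for epoch in (1, 2, 10):
        (Path(tmp) / f"ep{epoch}.pt").touch()
    latest_name = find_latest_checkpoint(tmp).name
```

    Because the same script handles both cases, a restarted instance can rerun it unchanged.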

  3. ♻️ Reuse the Trainer

    Want to train on multiple dataloaders sequentially? Each trainer object now supports multiple calls to Trainer.fit(), so you can continue training an existing model on a new dataloader, with new schedulers, all while using the same model and trainer object.

    For example:

    from torch.utils.data import DataLoader
    
    from composer import Trainer
    
    train_dl_1 = DataLoader(...)
    trainer = Trainer(
        model=model,
        max_duration='5ep',
        train_dataloader=train_dl_1,
    )
    
    # Train once!
    trainer.fit()
    
    # Train again with a new dataloader for another 5 epochs
    train_dl_2 = DataLoader(...)
    trainer.fit(
        train_dataloader=train_dl_2,
        duration='5ep',
    )

    See the Trainer API Reference for more information.

  4. ⚖️ Eval or Predict Only? No Problem

    You can evaluate or predict on an existing model, without having to supply a train dataloader or training duration argument -- they're now optional.

    import torchmetrics
    from torch.utils.data import DataLoader
    
    from composer import Trainer
    
    # Construct the trainer
    trainer = Trainer(model=model)
    
    # Evaluate!
    eval_dl = DataLoader(...)
    trainer.eval(
        dataloader=eval_dl,
        metrics=torchmetrics.Accuracy(),
    )
    
    # Examine evaluation metrics
    print("Eval metrics", trainer.state.metrics['eval'])
    
    # Or, predict!
    predict_dl = DataLoader(...)
    trainer.predict(dataloader=predict_dl)

    See the Trainer API Reference for more information.

  5. 🛑 Early Stopper and Threshold Stopper Callbacks

    The Early Stopper and Threshold Stopper callbacks end training early when the target metrics are met:

    from composer import Trainer
    from composer.callbacks.early_stopper import EarlyStopper
    from torchmetrics.classification.accuracy import Accuracy
    
    # Construct the callback
    early_stopper = EarlyStopper(
        monitor="Accuracy",
        dataloader_label="eval",
        patience=2,
    )
    
    # Construct the trainer
    trainer = Trainer(
        ...,
        callbacks=early_stopper,
        max_duration="100ep",
    )
    
    # Train!
    # Training will end early if the accuracy does not improve
    # over two epochs
    trainer.fit()
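
    The patience rule in the comment above can be sketched as a small standalone function. This is a simplified illustration, assuming a metric where higher is better and that "improve" means strictly exceeding the best value seen so far; epochs_until_stop is a hypothetical helper, and the real callback may differ in its details.

```python
def epochs_until_stop(metric_history, patience):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(metric_history, start=1):
        if value > best:
            # New best value: reset the counter
            best = value
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch
    return None


# Accuracy peaks at epoch 2 and fails to improve in epochs 3 and 4
stop_epoch = epochs_until_stop([0.60, 0.70, 0.69, 0.70, 0.75], patience=2)

# Accuracy improves every epoch, so training runs to completion
no_stop = epochs_until_stop([0.50, 0.60, 0.70], patience=2)
```

    Note that the later value of 0.75 is never reached in the first run, which is the trade-off of a small patience value.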

  6. 🪵 Load Checkpoints from Loggers

    It's now possible to restore checkpoints from loggers that support file artifacts (such as the Weights & Biases Logger). No need to download your checkpoints manually anymore.

    from composer import Trainer
    from composer.loggers import WandBLogger
    
    # Configure the W&B Logger
    wandb_logger = WandBLogger(
        # set to True to capture artifacts, like checkpoints
        log_artifacts=True,
        init_params={
            'project': 'my-wandb-project-name',
        },
    )
    
    # Then, to train and save checkpoints to W&B:
    trainer = Trainer(
        ...,
        loggers=wandb_logger,
        save_folder="/tmp/checkpoints",
        save_interval="1ep",
        save_artifact_name="epoch{epoch}.pt",
    )
    
    # Finally, to load checkpoints from W&B
    trainer = Trainer(
        ...,
        load_object_store=wandb_logger,
        load_path="epoch1.pt:latest",
    )

  7. ⌛ Wall Clock, Evaluation, and Prediction Time Tracking

    The timestamp object measures wall clock time via three new fields: total_wct, epoch_wct, and batch_wct. These fields track the total elapsed training time, the elapsed training time of the current epoch, and the time to train the last batch. Read the wall clock time via a callback:

    from composer import Callback, Trainer
    
    class MyCallback(Callback):
        def batch_end(self, state, logger):
            # Callback hooks receive the training state and the logger
            print(f"Total wct: {state.timestamp.total_wct}")
            print(f"Epoch wct: {state.timestamp.epoch_wct}")
            print(f"Batch wct: {state.timestamp.batch_wct}")
    
    # Construct the trainer with this callback
    trainer = Trainer(
        ...,
        callbacks=MyCallback(),
    )
    
    # Train!
    trainer.fit()
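
    How the three fields relate to one another can be sketched with a small standalone accumulator. This is a simplified model, not Composer's implementation: WallClockSketch and its hook names are hypothetical, durations are assumed to be datetime.timedelta values, and only time spent in batches is counted.

```python
from datetime import timedelta


class WallClockSketch:
    """Accumulates total, per-epoch, and last-batch durations."""

    def __init__(self):
        self.total_wct = timedelta(0)
        self.epoch_wct = timedelta(0)
        self.batch_wct = timedelta(0)

    def on_epoch_start(self):
        # Per-epoch and per-batch clocks restart; the total keeps running
        self.epoch_wct = timedelta(0)
        self.batch_wct = timedelta(0)

    def on_batch_end(self, batch_duration):
        self.batch_wct = batch_duration
        self.epoch_wct += batch_duration
        self.total_wct += batch_duration


clock = WallClockSketch()
clock.on_epoch_start()
clock.on_batch_end(timedelta(seconds=2))
clock.on_batch_end(timedelta(seconds=3))
clock.on_epoch_start()
clock.on_batch_end(timedelta(seconds=4))
```

    After the second epoch's first batch, the total covers all three batches while the epoch and batch values cover only the most recent one.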

    In addition, the training state object has two new fields for tracking time during evaluation and prediction: eval_timestamp and predict_timestamp. These fields, just like any others on the state object, are accessible to algorithms, callbacks, and loggers.

  8. Training DeepLabv3+ on the ADE20k Dataset

    DeepLabv3+ is a common baseline model for semantic segmentation tasks. We provide a ComposerModel implementation for DeepLabv3+ built using torchvision and mmsegmentation for the backbone and head, respectively.

    We found the DeepLabv3+ baseline can be significantly improved using the new PyTorch pre-trained weights. Additional gains are made through a hyperparameter sweep.

    We benchmark our DeepLabv3+ model on a single 8xA100 machine using ADE20k, a popular semantic segmentation dataset. The final results on ADE20k are:

    Model mIoU Time-to-Train
    Unoptimized DeepLabv3+ 44.17 +/-...