forecasting_demos

Multi-model training, tuning, and serving are common tasks in machine learning. They involve training and tuning multiple models on the same or different data segments. The data segments typically correspond to different locations, products, or groups of locations or products. Using distributed compute to train hundreds or thousands of models takes less time than running them one at a time in a single Python process, because the data and the model training, tuning, and inferencing can be split into batches and run in parallel!

These notebooks demonstrate how to use Ray v2 for quick and easy distributed forecasting - a special case of multi-model training, tuning, inferencing, and prediction. You will learn how to convert existing code so it can run in parallel on multiple compute nodes. The compute can be cores on your laptop or clusters in the cloud.
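
As a rough illustration of the one-model-per-segment pattern, here is a minimal sketch that uses only the Python standard library, not Ray. The data, the `fit_segment` function, and the "model" (the mean of the last two observations) are all made up for this example and do not come from the notebooks.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean

# Hypothetical toy data: ride counts per pickup location, one segment per key.
rides_by_location = {
    "loc_a": [10, 12, 14, 16],
    "loc_b": [5, 5, 6, 8],
    "loc_c": [20, 18, 19, 23],
}


def fit_segment(item):
    """Fit a stand-in "model" on one segment: forecast = mean of the last 2 points."""
    location, history = item
    forecast = mean(history[-2:])
    return location, forecast


# Each segment is independent of the others, so the fits can be handed to
# a pool of workers instead of running in a single loop.
with ThreadPoolExecutor(max_workers=3) as pool:
    forecasts = dict(pool.map(fit_segment, rides_by_location.items()))

print(forecasts)
```

A thread pool keeps the sketch short, but threads do not speed up CPU-bound model fitting in CPython. With Ray, the same shape is typically written by decorating the function with `@ray.remote`, calling `fit_segment.remote(...)` once per segment, and collecting the results with `ray.get(...)`, which lets the work spread across cores or cluster nodes.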

Ray can be used with any AI/ML Python library! But, in these notebooks, we will demo:

Data

These notebooks use the public NYC Taxi rides dataset.


👩 Setup Instructions for Anyscale

We recommend running Ray on Anyscale so you can develop on a personal laptop, then quickly spin up resources in a cloud to run the same code on bigger compute resources.

To configure an Anyscale cluster configuration, use the latest Ray (v2.2 at the time of writing) on a Python 3.8 ML docker image, for example `anyscale/ray-ml:2.2.0-py38-gpu`. Don't worry, if you don't need an expensive GPU, you can remove it per cluster just before you spin one up. An `ml` docker image comes with standard ML libraries, e.g. pandas and matplotlib, already installed. Python 3.8 is important, since at the time of writing Prophet still has this dependency.

The first time you configure your cluster:

  1. In your browser, open `console.anyscale.com`.
  2. Click on `Configurations` > `Create a new environment`.
  3. Give the configuration a name, for example `myname-forecasting`.
  4. Select a base docker image, for example `anyscale/ray-ml:2.2.0-py38-gpu`.
  5. Specify `Pip packages` in this order:
      protobuf==3.19.*
      Cython
      numba
      numpy==1.21.6
      pystan==2.19.1.1
      cmdstanpy==0.9.68
      prophet==1.0
      plotly
      statsforecast==1.3.1
      scikit-learn
      pyarrow==10.0.0
      statsmodels
      ax-platform
      gpytorch
      scipy
      seaborn
      torch
      kats
    For PyTorch Forecasting, also add these:
      ray_lightning
      pytorch-forecasting
      mlflow
  6. For PyTorch Forecasting, specify `Conda packages` in this order:
      tqdm
      grpcio-tools
      tensorflow
      tensorboard
      tensorboardx
  7. Put your GitHub repo in the `Post build commands` section:
    • If you have a project name:
      • `git clone your-git-repo-url ../your-project-name/`
    • Otherwise, if you do not have a project:
      • `git clone your-git-repo-url`
  8. Click `Create`.
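
Once a cluster built from this configuration is running, a quick sanity check from a notebook can confirm that the Python version and the key packages are in place. The sketch below is one possible way to do that with the standard library. The helper names and the list of modules to check are illustrative assumptions, not part of Anyscale or Ray (note that `scikit-learn` is imported as `sklearn`).

```python
import sys
from importlib.util import find_spec

# Assumption: import names for a handful of the pip packages listed above.
REQUIRED_MODULES = [
    "numpy", "prophet", "statsforecast", "sklearn", "pyarrow", "statsmodels",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


def python_matches(major, minor, version_info=sys.version_info):
    """Check whether the interpreter version is exactly major.minor."""
    return (version_info[0], version_info[1]) == (major, minor)


print("Python 3.8:", python_matches(3, 8))
print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

An empty "Missing modules" list and `Python 3.8: True` suggest the environment was built as intended. The check only looks for the modules and does not verify the pinned versions.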


The first time you spin up a cluster:

  1. In your browser, open `console.anyscale.com`.
  2. Click on `Clusters` > `Create`.
  3. Give the cluster a name.
  4. Select a project that the cluster belongs to.
  5. Select the cluster environment that you just created, for example `myname-forecasting`, and its latest version.
  6. Leave the default radio button on `Compute config` = `Create a one-off configuration`.
  7. Select a default cloud config from your organization, e.g. AWS, region=us-west-2, zones=any.
  8. Node types: this is where you can delete the GPU node if you are not going to use it, for example remove `g4dn.4xlarge`. You can also specify the min/max number of worker nodes, memory, and the AWS spot instances option here.
  9. Click `Start`.
  10. Wait until the cluster is ready, then click the `Jupyter` button.

By default, Anyscale automatically shuts down your cluster after 2 hours of inactivity. That way you don't have to worry about accidentally leaving it running over a weekend.


From now on, whenever you want to spin up a cluster, it will be quicker:

  • In your browser, open `console.anyscale.com`.
  • Click on `Clusters` > `Created by me`.
  • Click on the cluster.
  • Click `Start`.
  • Wait until the cluster is ready, then click the `Jupyter` button.


🎓 To further speed up your development process (especially convenient if you are contributing to open-source Ray), use Anyscale Workspaces to develop and save your code directly in the cloud, instead of on your laptop!


Let's have fun 😜 and thank you 🙏.