Learn how to create Azure Machine Learning datasets to access data for your local or remote experiments using the Azure Machine Learning Python SDK. See the Securely access data article to understand how datasets fit into Azure Machine Learning's overall data-access workflow.

AzureML provides two basic assets for working with data:

Data Store

Each Azure ML workspace comes with a default datastore:
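A minimal sketch of retrieving that default datastore, assuming a workspace `config.json` is available locally:

```python
from azureml.core import Workspace

ws = Workspace.from_config()            # loads workspace details from config.json
datastore = ws.get_default_datastore()  # the workspace's built-in default datastore
print(datastore.name)
```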

It is also accessible through the Azure Portal (under the same resource group as your Azure ML Workspace).

Datastores are linked to workspaces and store connection information to Azure storage services, so that you can refer to them by name rather than having to memorise the connection information and secret. Use the Datastore class to perform management operations, including registering, listing, getting, and removing datastores.

A data store is a repository for storing and managing collections of data. It includes not just databases, but also simpler store types such as plain files, emails, and so on. A database, in turn, is a collection of bytes that a database management system keeps track of.

DataSet

A data set is a collection of data. In the case of tabular data, a data set corresponds to one or more database tables, where each column of a table represents a particular variable and each row represents a particular record of the data set in question. (Wikipedia)

Each workspace comes with a default datastore.

Connect to, or create, a datastore backed by one of the multiple data-storage options that Azure provides. For example:

To register a datastore via an account key:
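A sketch of registering a blob container as a datastore using an account key. The datastore, container, and account names below are placeholders:

```python
from azureml.core import Datastore

# Register an Azure Blob container as a datastore, authenticating with an account key.
datastore = Datastore.register_azure_blob_container(
    workspace=ws,                       # an existing Workspace object
    datastore_name='<datastore_name>',
    container_name='<container_name>',
    account_name='<account_name>',
    account_key='<account_key>',
)
```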

By using a SAS token:
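The same registration call accepts a SAS token in place of the account key; names are again placeholders:

```python
from azureml.core import Datastore

# Register the container using a SAS token instead of an account key.
datastore = Datastore.register_azure_blob_container(
    workspace=ws,
    datastore_name='<datastore_name>',
    container_name='<container_name>',
    account_name='<account_name>',
    sas_token='<sas_token>',
)
```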

The workspace object ws has access to its datastores via
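The `datastores` property returns a dictionary of name-to-Datastore mappings, which can be iterated like so:

```python
# ws.datastores maps each registered datastore's name to its Datastore object
for name, datastore in ws.datastores.items():
    print(name, datastore.datastore_type)
```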

Any datastore that is registered to the workspace can thus be accessed by name.
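For example, using `Datastore.get` with a placeholder name:

```python
from azureml.core import Datastore

# Look up a registered datastore by name.
datastore = Datastore.get(ws, datastore_name='<datastore_name>')
```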

The account name and account key may then be used to connect to the Datastore directly in Azure Storage Explorer.

Blob Datastore

A blob datastore is useful if you're dealing with numerous files in various places.

To download the data from the blob datastore:
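A sketch of pulling files from the datastore to the local machine; the local target path is hypothetical:

```python
# Download everything under a given remote prefix to a local folder.
datastore.download(
    target_path='./local_data',        # local destination (hypothetical)
    prefix='<path/on/datastore>',      # remote folder to pull; omit to download everything
    overwrite=True,
)
```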

Read from Datastore Explorer

Data Reference

Creating a DataReference as a mount:
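A sketch, assuming `datastore` is a connected Datastore object and the path is a placeholder:

```python
# Create a DataReference that mounts the given datastore path on the compute target.
data_ref = datastore.path('<path/on/datastore>').as_mount()
```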

The above step is done after connecting to the basic resources, such as the Workspace, ComputeTarget, and Datastore.

Alternatively, you can access the DataReference as a download:
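The download variant copies the data onto the compute target instead of mounting it:

```python
# Create a DataReference that downloads the data onto the compute target.
data_ref = datastore.path('<path/on/datastore>').as_download()
```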

Consume DataReference in ScriptRunConfig

Add this DataReference to a ScriptRunConfig as follows.
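A sketch of wiring the DataReference into a ScriptRunConfig; the source directory and script name are placeholders:

```python
from azureml.core import ScriptRunConfig

src = ScriptRunConfig(
    source_directory='.',
    script='script.py',
    arguments=[str(data_ref)],   # expands to $AZUREML_DATAREFERENCE_<name>
)
# Register the reference so the run knows to mount/download it.
src.run_config.data_references = {
    data_ref.data_reference_name: data_ref.to_config(),
}
```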

The command-line argument str(data_ref) expands to the environment variable $AZUREML_DATAREFERENCE_example_data. Finally, data_ref.to_config() instructs the run to mount the data on the compute target and to set the environment variable to the correct value.

Without specifying an argument

Specify a path_on_compute to reference your data without the need for CLI arguments.
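A sketch of pinning the mount location, so the script can read from a known path instead of a CLI argument; the path shown is hypothetical:

```python
data_ref = datastore.path('<path/on/datastore>').as_mount()
data_ref.path_on_compute = '/tmp/data'  # fixed location on the compute target
```

The training script can then simply read from `/tmp/data` directly.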

Create Dataset

From local data

A dataset can be created and registered from a folder on your local workstation. Note that src_dir must point to a folder, not a file.
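A sketch using `Dataset.File.upload_directory`; the target path and dataset name are placeholders:

```python
from azureml.core import Dataset

datastore = ws.get_default_datastore()

# src_dir must be a folder; its contents are uploaded to the datastore.
dataset = Dataset.File.upload_directory(
    src_dir='./data',
    target=(datastore, '<path/on/datastore>'),
)
dataset = dataset.register(workspace=ws, name='<dataset_name>')
```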

Warning: upload_directory is an experimental method that might change at any time. For additional information, see https://aka.ms/azuremlexperimental.

From a datastore

The following code snippet demonstrates how to create a Dataset given a relative path on the datastore. Note that the path may point to a folder (e.g. local/test/) or a single file (e.g. local/test/data.tsv).
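A sketch, using the folder path from the example above:

```python
from azureml.core import Dataset

datastore = ws.get_default_datastore()

# The path may be a folder ('local/test/') or a single file ('local/test/data.tsv').
dataset = Dataset.File.from_files(path=(datastore, 'local/test/'))
```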

From outputs using OutputFileDatasetConfig
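A sketch of capturing a run's output as a dataset with OutputFileDatasetConfig; the destination path, script name, and argument name are placeholders:

```python
from azureml.core import ScriptRunConfig
from azureml.data import OutputFileDatasetConfig

# Files the script writes to this output location are persisted to the datastore.
output = OutputFileDatasetConfig(destination=(datastore, 'outputs/'))

src = ScriptRunConfig(
    source_directory='.',
    script='script.py',
    arguments=['--output_dir', output],  # resolved to a writable path at runtime
)
```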

Upload to datastore

To upload a local directory ./data/:
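A sketch, assuming `ws` is a connected Workspace and the target path is a placeholder:

```python
datastore = ws.get_default_datastore()

# Upload the local folder ./data to the given path on the datastore.
datastore.upload(
    src_dir='./data',
    target_path='<path/on/datastore>',
    overwrite=True,
)
```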

This will copy the entire local directory ./data to the default datastore of the workspace ws.

Create dataset from files in datastore

To create a dataset from a directory <path/on/datastore> on the datastore:
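A sketch; the dataset name used for registration is a placeholder:

```python
from azureml.core import Dataset

# Build a file dataset from the directory on the datastore.
dataset = Dataset.File.from_files(path=(datastore, '<path/on/datastore>'))

# Optionally register it in the workspace for reuse by name.
dataset = dataset.register(workspace=ws, name='<dataset_name>')
```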

Use Dataset

ScriptRunConfig

To reference data from a dataset in a ScriptRunConfig, you may either mount or download it:

Path on compute: Both as_mount and as_download accept an (optional) parameter path_on_compute. This specifies where the data is made available on the compute target.

This data can be used in a remote run, such as in mount-mode:

Use the following code in run.py:
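A sketch of the control script; the dataset, compute target, and experiment names are placeholders:

```python
# run.py — submits train.py with the dataset mounted on the compute target
from azureml.core import Dataset, Experiment, ScriptRunConfig, Workspace

ws = Workspace.from_config()
dataset = Dataset.get_by_name(ws, name='<dataset_name>')

src = ScriptRunConfig(
    source_directory='.',
    script='train.py',
    compute_target='<compute_target>',
    arguments=[dataset.as_mount()],   # use dataset.as_download() to copy instead
)
run = Experiment(ws, '<experiment_name>').submit(src)
```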

The following code can be used in train.py:
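On the compute target, the mount (or download) location arrives as a command-line argument, so the training script can read it like any local path:

```python
# train.py — the data path arrives as the first command-line argument
import os
import sys

data_dir = sys.argv[1]   # mount point set by ScriptRunConfig
print('Files available:', os.listdir(data_dir))
```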