
azure-machinelearning-clientlibrary-python's Introduction

Microsoft Azure Machine Learning Python client library for Azure ML Studio

NOTE This content is no longer maintained. Visit the Azure Machine Learning Notebook project for sample Jupyter notebooks for ML and deep learning with Azure Machine Learning using the Python SDK.

The preview of Azure Machine Learning Python client library lets you access your Azure ML Studio datasets from your local Python environment.

You can download datasets that are available in your ML Studio workspace, or intermediate datasets from experiments that were run. You can upload new datasets and update existing datasets. The data is optionally converted to/from a Pandas DataFrame.

This is a technology preview. The APIs exposed by the library and the REST endpoints it connects to are subject to change.

Installation

The SDK has been tested with Python 2.7, 3.3 and 3.4.

It has a dependency on the following packages:

  • requests
  • python-dateutil
  • pandas

You can install it from PyPI:

pip install azureml

Usage

Note: We recommend that you use the Generate Data Access Code feature from Azure Machine Learning Studio in order to get Python code snippets that give you access to your datasets. The code snippets include your workspace id, authorization token, and other necessary identifiers to get to your datasets.

Accessing your workspace

You'll need to obtain your workspace id and token in order to get access to your workspace.

from azureml import Workspace

ws = Workspace(workspace_id='4c29e1adeba2e5a7cbeb0e4f4adfb4df',
               authorization_token='f4f3ade2c6aefdb1afb043cd8bcf3daf')

If you're using AzureML in a region other than South Central US you'll also need to specify the endpoint:

from azureml import Workspace

ws = Workspace(workspace_id='4c29e1adeba2e5a7cbeb0e4f4adfb4df',
               authorization_token='f4f3ade2c6aefdb1afb043cd8bcf3daf',
               endpoint='https://europewest.studio.azureml.net/')

Specify workspace via config

If you don't want to store your access tokens in code, you can put them in a configuration file instead. The SDK looks for ~/.azureml/settings.ini and, if present, uses it:

[workspace]
id=4c29e1adeba2e5a7cbeb0e4f4adfb4df
authorization_token=f4f3ade2c6aefdb1afb043cd8bcf3daf
api_endpoint=https://studio.azureml.net
management_endpoint=https://management.azureml.net

And then the workspace can be created without arguments:

from azureml import Workspace

ws = Workspace()
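As an illustration of the ini format only (not the SDK's actual loading code), the standard configparser module can parse the same settings:

```python
# Illustrative sketch: parsing settings in the same ini format as
# ~/.azureml/settings.ini with the standard library (Python 3's configparser;
# Python 2 calls it ConfigParser). The real SDK's parsing may differ.
import configparser

SETTINGS = """\
[workspace]
id=4c29e1adeba2e5a7cbeb0e4f4adfb4df
authorization_token=f4f3ade2c6aefdb1afb043cd8bcf3daf
api_endpoint=https://studio.azureml.net
management_endpoint=https://management.azureml.net
"""

config = configparser.ConfigParser()
config.read_string(SETTINGS)

workspace_id = config['workspace']['id']
endpoint = config['workspace']['api_endpoint']
print(workspace_id, endpoint)
```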

Accessing datasets

To enumerate all datasets in a given workspace:

for ds in ws.datasets:
    print(ds.name)

Just the user-created datasets:

for ds in ws.user_datasets:
    print(ds.name)

Just the example datasets:

for ds in ws.example_datasets:
    print(ds.name)

You can access a dataset by name (which is case-sensitive):

ds = ws.datasets['my dataset name']

By index:

ds = ws.datasets[0]

Dataset metadata

Every dataset has metadata in addition to its content.

Some metadata values are assigned by the user at creation time:

print(ds.name)
print(ds.description)
print(ds.family_id)
print(ds.data_type_id)

Others are values assigned by Azure ML:

print(ds.id)
print(ds.created_date)
print(ds.size)

See the SourceDataset class for more on the available metadata.

Reading contents

You can import the dataset contents as a pandas DataFrame object. The data_type_id metadata on the dataset is used to determine how to import the contents.

frame = ds.to_dataframe()

If a dataset is in a format that cannot be deserialized to a pandas DataFrame, the dataset object will not have a to_dataframe method.

You can still read those datasets as text or binary, then parse the data manually.

Read the contents as text:

text_data = ds.read_as_text()

Read the contents as binary:

binary_data = ds.read_as_binary()

You can also just open a stream to the contents:

with ds.open() as file:
    binary_data_chunk = file.read(1000)

This gives you more control over the memory usage, as you can read and parse the data in chunks.
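For example, a chunked-read loop might look like the following; io.BytesIO stands in for the stream returned by ds.open() so the sketch runs without a workspace:

```python
# Sketch of chunked reading. io.BytesIO stands in for the file-like object
# that ds.open() would return, so this runs without a workspace.
import io

stream = io.BytesIO(b"col1,col2\n1,2\n3,4\n" * 1000)

chunks = []
while True:
    chunk = stream.read(1000)   # read at most 1000 bytes at a time
    if not chunk:
        break
    chunks.append(chunk)        # each chunk could be parsed incrementally here

data = b"".join(chunks)
print(len(data))
```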

Accessing intermediate datasets

You can access the intermediate datasets at the output ports of the nodes in your experiments.

Note that the default binary serialization format (.dataset) for intermediate datasets is not supported. Make sure to use a Convert to TSV or Convert to CSV module and read the intermediate dataset from its output port.

First, get the experiment, using the experiment id:

experiment = ws.experiments['my experiment id']

Then get the intermediate dataset object:

ds = experiment.get_intermediate_dataset(
    node_id='5c457225-68e3-4b60-9e3a-bc55f9f029a4-565',
    port_name='Results dataset',
    data_type_id=DataTypeIds.GenericCSV
)

To determine the values to pass to get_intermediate_dataset, use the Generate Data Access Code command on the module output port in ML Studio.

You can then read the intermediate dataset contents just like you do for a regular dataset:

frame = ds.to_dataframe()

You can also use open, read_as_text and read_as_binary.

Note that intermediate datasets do not have any metadata available.

Creating a new dataset

After you've manipulated the data, you can upload it as a new dataset on Azure ML.

This will serialize the pandas DataFrame object to the format specified in the data_type_id parameter, then upload it to Azure ML.

dataset = workspace.datasets.add_from_dataframe(
    dataframe=frame,
    data_type_id=DataTypeIds.GenericCSV,
    name='my new dataset',
    description='my description'
)

If you want to serialize the data yourself, you can upload the raw data. Note that you still have to indicate the format of the data.

raw_data = my_own_csv_serialization_function(frame)
dataset = workspace.datasets.add_from_raw_data(
    raw_data=raw_data,
    data_type_id=DataTypeIds.GenericCSV,
    name='my new dataset',
    description='my description'
)

After it's added, it's immediately accessible from the datasets collection.

If you attempt to create a new dataset with a name that matches an existing dataset, an AzureMLConflictHttpError will be raised.

from azureml import AzureMLConflictHttpError

try:
    workspace.datasets.add_from_dataframe(
        dataframe=frame,
        data_type_id=DataTypeIds.GenericCSV,
        name='not a unique name',
        description='my description'
    )
except AzureMLConflictHttpError:
    print('Try again with a unique name!')

To update an existing dataset, you can use update_from_dataframe or update_from_raw_data:

name = 'my existing dataset'
dataset = workspace.datasets[name]

dataset.update_from_dataframe(dataframe=frame)

You can optionally change the name, description or the format of the data too:

name = 'my existing dataset'
dataset = workspace.datasets[name]

dataset.update_from_dataframe(
    dataframe=frame,
    data_type_id=DataTypeIds.GenericCSV,
    name='my new name',
    description='my new description'
)

If you attempt to create a new dataset with an invalid name, or if Azure ML rejects the dataset for any other reason, an AzureMLHttpError will be raised. AzureMLHttpError is raised when the HTTP status code indicates a failure. A detailed error message can be displayed by printing the exception, and the HTTP status code is stored in the status_code field.

from azureml import AzureMLHttpError

try:
    workspace.datasets.add_from_dataframe(
        dataframe=frame,
        data_type_id=DataTypeIds.GenericCSV,
        name='invalid:name',
        description='my description'
    )
except AzureMLHttpError as error:
    print(error.status_code)
    print(error)

Services Usage

The services subpackage allows you to easily publish and consume AzureML Web Services. Currently only Python 2.7 is supported for services because the back end only has Python 2.7 installed.

Publishing

Python functions can either be published using the @publish decorator or by calling the publish method directly. To publish a function using the decorator you can do:

from azureml import services

@services.publish(workspace, workspace_token)
@services.types(a = float, b = float)
@services.returns(float)
def func(a, b):
    return a / b

This publishes a function which takes two floating point values and divides them. Alternately you can publish a function by calling the publish method directly:

my_func = publish(my_func, workspace, workspace_token, files_list, endpoint=None)

If a function has no source file associated with it (for example, you're developing inside a REPL environment), then the function's byte code is serialized. If the function refers to any global variables, those will also be serialized using Pickle. In this mode, all of the state you refer to needs to be defined already (e.g. your published function should come after any other functions it calls).
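The mechanism can be illustrated with the standard marshal and pickle modules. This is a simplified sketch of the idea, not the SDK's actual implementation:

```python
# Simplified illustration (not SDK code): serialize a function's byte code
# with marshal and its referenced globals with pickle, then rebuild it.
import marshal
import pickle
import types

RATE = 2.5  # a global the function refers to; it must exist before "publishing"

def scale(x):
    return x * RATE

# Serialize the byte code and the global it depends on.
code_bytes = marshal.dumps(scale.__code__)
globals_bytes = pickle.dumps({'RATE': RATE})

# "On the server": rebuild the function from the serialized pieces.
restored_globals = pickle.loads(globals_bytes)
restored = types.FunctionType(marshal.loads(code_bytes), restored_globals)
print(restored(4))
```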

If a function is saved on disk, then the entire module the function is defined in will be serialized and re-executed on the server to get the function back. In this mode the entire contents of the file are serialized and the order of the function definitions doesn't matter.

After the function is published there will be a "service" property on the function. This object has several properties of interest:

  • url — the endpoint for invoking the function
  • api_key — the API key required to invoke the function
  • help_url — a human-readable page that describes the parameters and results of the function, and includes sample code for calling it from various languages
  • service_id — a unique GUID identifying the service in your workspace; you can reuse this ID to update the service once it's published

You can specify a list of files which should be published along with the function. The resulting files will be stored in a subdirectory called 'Script Bundle'. The list of files can be one of:

  • (('file1.txt', None), ) — file is read from disk
  • (('file1.txt', b'contents'), ) — file contents are provided
  • ('file1.txt', 'file2.txt') — files are read from disk and written with the same filename
  • ((('file1.txt', 'destname.txt'), None), ) — file is read from disk and written with a different destination name
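To make these formats concrete, here is a hypothetical normalize_file_entry helper (not part of the SDK) that reduces each entry to a (source, destination, contents) tuple:

```python
# Hypothetical helper, for illustration only: normalize the accepted
# file-list entry formats into (source_path, destination_name, contents).
def normalize_file_entry(entry):
    if isinstance(entry, str):              # 'file1.txt' — read from disk, same name
        return (entry, entry, None)
    name, contents = entry
    if isinstance(name, tuple):             # ('file1.txt', 'destname.txt') pair
        source, dest = name
        return (source, dest, contents)
    return (name, name, contents)           # ('file1.txt', None) or ('file1.txt', b'contents')

print(normalize_file_entry('file1.txt'))
print(normalize_file_entry(('file1.txt', b'contents')))
print(normalize_file_entry((('file1.txt', 'destname.txt'), None)))
```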

The various formats for each filename can be freely mixed and matched. Files can also be attached using the @attach decorator:

@publish(...)
@attach('file1.txt')
def f(x):
    pass

And this supports the same file formats as the list.

If you are using AzureML from a different geography (for example West Europe or East Asia) you'll need to specify the endpoint to connect to. The endpoint is your region plus "management.azureml.net", for example: https://europewest.management.azureml.net
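The rule above can be sketched as a one-line helper; the region string here is just an example:

```python
# Sketch of the endpoint rule described above: region + "management.azureml.net".
def management_endpoint(region):
    return 'https://{0}.management.azureml.net'.format(region)

print(management_endpoint('europewest'))
```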

Consumption

Existing services can be consumed using the service decorator. An empty function body is supplied and the resulting function becomes invokable and calls the published service:

from azureml import services

@services.service(url, api_key)
@services.types(a = float, b = float)
@services.returns(float)
def func(a, b):
    pass

Controlling publishing / consumption

There are several decorators which are used to control how the invocation occurs.

types(**kwargs)

Specifies the types used for the arguments of a published or consumed service.

The type annotations are optional and are used for providing information which allows the service to interoperate with other languages. The type information will be seen on the help page of the published service. If the type information is not provided, a Python-specific format will be used and other languages may not be able to call the service.

Supported types are: int, bool, float, unicode.

When an unsupported type is specified the type will be serialized using an internal representation based upon Python's Pickle protocol. This will prevent the web service from being used with other languages.
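The fallback can be illustrated with the standard pickle module: a value of an unsupported type round-trips within Python, but the payload is opaque to non-Python callers. A simplified sketch, not SDK code:

```python
# Illustration only: an unsupported type is serialized with Python's pickle
# protocol, so only Python clients can decode the resulting payload.
import pickle

class Point(object):          # not one of int, bool, float, unicode
    def __init__(self, x, y):
        self.x = x
        self.y = y

payload = pickle.dumps(Point(1, 2))   # opaque bytes to other languages
restored = pickle.loads(payload)
print(restored.x, restored.y)
```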

When working with strings you need to use the unicode data type. This is because the string data type used for interop is actually a Unicode string and Python's "str" objects are actually byte arrays.


returns(return_type)

Specifies the return type for a published service.

Like the parameter types this is also optional, and when omitted an internal Python format will be used and interoperability with other languages may be reduced.

Supported types are: int, bool, float, unicode.

When an unsupported type is specified the type will be serialized using an internal representation based upon Python's Pickle protocol. This will prevent the web service from being used with other languages.

When working with strings you need to use the unicode data type. This is because the string data type used for interop is actually a Unicode string and Python's "str" objects are actually byte arrays.

service_id(id)

Specifies the service ID for a service. When publishing to the same service ID the service is updated instead of having a new service created.

name(name)

Specifies a friendly name for a service. By default the name is the function name, but this allows names with spaces or other characters which are not allowed in function names.

attach(name, contents)

Attaches a file to the payload to be uploaded.

If contents is omitted the file is read from disk. If name is a tuple it specifies the on-disk filename and the destination filename.

dataframe_service

Indicates that the function operates on a data frame. The function will receive a single input in the form of a data frame, and should return a data frame object. The schema of the data frame is specified with this decorator.

@publish(...)
@dataframe_service(a = int, b = int)
@returns(int)
def myfunc(df):
    return pandas.DataFrame([df['a'][i] + df['b'][i] for i in range(df.shape[0])])

This code can then be invoked either with:

myfunc(1, 2)

or:

myfunc.map([[1,2], [3,4]])

input_name

Specifies the name of the input the web service expects to receive. Defaults to 'input1'. Currently this is only supported on consumption.

output_name

Specifies the name of the output the web service returns. Defaults to 'output1'. Currently this is only supported on consumption.


azure-machinelearning-clientlibrary-python's People

Contributors

crwilcox, huguesv, rloutlaw


azure-machinelearning-clientlibrary-python's Issues

"Can't pickle function objects"

We support pickling function objects, but there are some cases where they can be embedded and we fail to pickle them. Maybe we should try more aggressively to serialize these?

class C:
    def __init__(self):
        self.a = lambda: 42
        self.b = 42

inst = C()

@services.publish
def test(name):
    return inst.b

Make pandas dependency optional

Some uses of this library do not require pandas, so it would be nice to be able to install it without having to track down numpy/pandas builds.

Dataframe Service not working with Excel Add-in

Hi,

I want to access the entire data at once in the web service, hence I used the dataframe service.
But when I try to access it from the Azure ML Excel Add-in, it throws a DataframetoRObject error stating "object of type numpy.int32 has no len()".

Can someone point out where I am going wrong?
Also, how exactly do I access the entire dataframe sent from Excel in the web service function?

Thanks

Difficult to test service locally

If I declare my service with the publish decorator, it deploys every time I import the file.

However, sometimes I want to import the file so I can use the function locally. Deploying the service is part of my publish process, not normal execution.

It would be nice if the @publish decorator (or a similar one) did not publish by default, but required an extra call to do so. For example:

@publish('...', '...')
def my_func(a, b):
    return a + b

assert my_func(1, 2) == 3   # validate locally

if '--publish' in sys.argv:    # or whatever condition I choose, perhaps in a different file
    my_func.publish()
    assert my_func.service(1, 2) == 3   # validate remote

Cannot deploy file with UTF-8 BOM

The _get_source function does not handle encodings well, in particular, it breaks on a UTF-8 BOM even if the rest of the file is ASCII-compatible.

Would be better as:

source = codecs.open(ourfile, 'r', 'utf-8-sig').read().encode('ascii', errors='strict')

But it would probably be better overall if the entire code generation used Unicode throughout (probably via a StringIO instance to avoid the concatenation performance penalties, though those aren't generally going to be a big deal here).

publish should include error code from JSON response

Right now we only include the HTTP status code, but if you have an invalid workspace ID the body includes:

{
  "error": {
    "code": "InvalidWorkspaceId",
    "message": "Invalid workspace ID provided. Verify the workspace ID is correct and try again."
  }
}

We should pull this out and display it to the user if it is available.
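What the issue asks for could be sketched like this: parse the response body as JSON and surface error.code and error.message when present. Illustrative only, not the SDK's actual error handling:

```python
# Sketch: extract the error code/message from a response body like the one
# above, falling back gracefully when the fields are absent.
import json

body = '''{
  "error": {
    "code": "InvalidWorkspaceId",
    "message": "Invalid workspace ID provided. Verify the workspace ID is correct and try again."
  }
}'''

parsed = json.loads(body)
error = parsed.get('error', {})
print(error.get('code'), '-', error.get('message'))
```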

Unable to publish service, @services.returns(float) annotation doesn't work

I went through the tutorial and got stuck at the publish service fragment:

from azureml import services
@services.publish(workspace_id, authorization_token)
@services.types(activ=int, beaver=float, time=int)
@services.returns(float)
def beaver_body_temp_predictor(activ, beaver, time):
    return regressor.predict([activ, beaver, time])

In the line @services.returns(float) I get the error

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-19-3efde61743a5> in <module>()
      2 @services.publish(workspace_id, authorization_token)
      3 @services.types(activ=int, beaver=float, time=int)
----> 4 @services.returns(float)
      5 def beaver_body_temp_predictor(activ, beaver, time):
      6     return regressor.predict([activ, beaver, time])
...
~/anaconda3_501/lib/python3.6/site-packages/azureml/services.py in _get_main_source(function)
    626         # we're marshalling the arguments in.
    627         main_source += u'    for i in range(df1.shape[0]):' + chr(10)
--> 628         for arg in _get_args(function):
    629             arg_type = _get_arg_type(arg, function)
    630             if pandas is not None and arg_type is pandas.DataFrame:

~/anaconda3_501/lib/python3.6/site-packages/azureml/services.py in _get_args(func)
    481     if args.varargs is not None:
    482         all_args.append(args.varargs)
--> 483     if args.keywords is not None:
    484         all_args.append(args.keywords)
    485     return all_args


AttributeError: 'Arguments' object has no attribute 'keywords'

Report better error on GetWorkspaceFailed

When publishing to the US end point with a Europe workspace ID we get this response:

500
{
  "error": {
    "code": "GetWorkspaceFailed",
    "message": "Internal error. The error code has been logged. If you retry and see this error again, report the error code and the request ID to the online forum."
  }
}

But we just report "Failed to publish function: Internal error". We should surface the more specific message from the response in the Python SDK.

Saving dataset with size over 4 mb

The function workspace.datasets.add_from_dataframe could not save a dataset larger than 4 MB to Azure ML. It raised an error: AzureMLHttpError: Maximum request length exceeded.

Difficult to deploy a package

I've deployed a package for use by my service in the following way:

# If we've published, we need to add the package to sys.path
# so the following import succeeds.
import os, sys
deployed_package = os.path.abspath(r'Script Bundle\vsop.zip')
if os.path.isfile(deployed_package):
    sys.path.append(deployed_package)

from vsop.planets import *

@publish('...', '...', files=[('vsop.zip', None)])
def get_all_planets(year, month, day, hour):
    ...

It would be nice if we could take the package name (or imported module) in a decorator, create and deploy the ZIP file automatically, and update sys.path on the server automatically.

Cannot deploy web service from Azure Notebooks Python 3.5.1 with azureml package version 0.2.7

Hi folks,

I'm finding that I can't deploy a web service from inside a Python 3.5.1 notebook on Azure Notebooks. I'm not sure whether this Python version is on the list that you officially support, but since Azure Notebooks is a Microsoft service, I thought you might like to know about the problem.

The following code snippet (featured as an example on the repo) works fine for me in Python 2.7.6 but not in Python 3.5.1 notebooks:

import azureml
workspace = '1e--redacted--f8'
workspace_token = '91--redacted--5c'
ws = azureml.Workspace(workspace, workspace_token)

from azureml import services
@services.publish(workspace, workspace_token)
@services.types(a = float, b = float)
@services.returns(float)
def func(a, b):
    return a / b

The error messages I receive in Python 3.5.1 notebooks with version 0.2.7 of the azureml package installed are:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-1fc96cecc6c2> in <module>()
      7 @services.publish(workspace, workspace_token)
      8 @services.types(a = float, b = float)
----> 9 @services.returns(float)
     10 def func(a, b):
     11     return a / b

/home/nbuser/anaconda3_410/lib/python3.5/site-packages/azureml/services.py in do_publish(func)
    939     if not callable(func_or_workspace_id):
    940         def do_publish(func):
--> 941             func.service = _publish_worker(func, files, func_or_workspace_id, workspace_id_or_token, endpoint)
    942             return func
    943         return do_publish

/home/nbuser/anaconda3_410/lib/python3.5/site-packages/azureml/services.py in _publish_worker(func, files, workspace_id, workspace_token, management_endpoint)
    812     workspace_id, workspace_token, _, management_endpoint = azureml._get_workspace_info(workspace_id, workspace_token, None, management_endpoint)
    813 
--> 814     script_code = _get_source(func) + chr(10)
    815     ret_type = _get_annotation('return', func)
    816 

/home/nbuser/anaconda3_410/lib/python3.5/site-packages/azureml/services.py in _get_source(function)
    702         source = services_file.read()
    703 
--> 704     main_source = _get_main_source(function)
    705 
    706     source += chr(10) + main_source

/home/nbuser/anaconda3_410/lib/python3.5/site-packages/azureml/services.py in _get_main_source(function)
    626         # we're marshalling the arguments in.
    627         main_source += u'    for i in range(df1.shape[0]):' + chr(10)
--> 628         for arg in _get_args(function):
    629             arg_type = _get_arg_type(arg, function)
    630             if pandas is not None and arg_type is pandas.DataFrame:

/home/nbuser/anaconda3_410/lib/python3.5/site-packages/azureml/services.py in _get_args(func)
    481     if args.varargs is not None:
    482         all_args.append(args.varargs)
--> 483     if args.keywords is not None:
    484         all_args.append(args.keywords)
    485     return all_args

AttributeError: 'Arguments' object has no attribute 'keywords'

Please let me know if there is any more information you need. Thanks for your help!

Inspecting the DOM in IE

I read in the tests readme:
"You'll need the experiment id (appears in URL), the node id (can be found in the HTML DOM), the port name (displayed as a tooltip when you hover on the output port) and the data type id."

Could you please explain more in detail how to inspect the DOM in IE and where to look exactly for that information?

Unable to transform Dataset into pandas DataFrame

Hi,
I did some data processing in Azure ML Studio and saved the intermediate results as a Dataset.
I'm now trying to explore that dataset in a jupyter notebook but don't manage to convert it to a pandas DataFrame.
I use the following code :

print(type(ds))
ds.to_dataframe()

but get the following error:

AttributeError: 'SourceDataset' object has no attribute 'to_dataframe'

Any idea how to solve that? I tried _to_dataframe() and it didn't work either.
Thanks for the help.

Kernel Error while loading the Jupyter note book

When I opened the Principles of Machine Learning module 5, i.e. the Bias-Variance Trade-off, most of the content such as images, graphs, and code is not showing properly, and the top right corner shows a kernel error. I opened it through Firefox on Windows.

Please help me on this account!

Thanks,
Krishna

Support for other versions of Python and Conda

Hi,

Azure ML now allows one to choose between Python 2.7.7, 2.7.11 and 3.5 (with their respective Conda versions). As far as I can tell, the library can only deploy as a service to an instance running 2.7.7. It would be great if an option was added so we could choose from the versions mentioned above.

Attached is the script to reproduce the observation. Please add your own workspace ID and workspace authorisation token.

script.zip

Experiment fails with custom docker base image

This might not be this library, but Azure ML Experiments. I'm submitting a TensorFlow estimator and getting "conda not found" errors when the docker image for the experiment is built.

I'm using the tensorflow/tensorflow:latest-gpu-py3 docker image on a BatchAI Linux DSVM of Standard_NC6. I'm not using any other dependencies that wouldn't exist on this base image.

The script is a MNIST CNN job (written with TensorFlow) can be seen at https://github.com/damienpontifex/batchai-tfconfig-workaround/blob/master/mnist.py

The experiment setup code used is:

from azureml.train.dnn import TensorFlow
from azureml.core import Workspace, Experiment
from azureml.core.compute import ComputeTarget, BatchAiCompute
from azureml.core.compute_target import ComputeTargetException

ws = Workspace(subscription_id='<subscription-id>', resource_group='ml', workspace_name='pontify')

try:
  compute_target = ComputeTarget(workspace=ws, name='<compute-name>')
except ComputeTargetException:
  compute_config = BatchAiCompute.provisioning_configuration(vm_size='STANDARD_NC6', vm_priority='lowpriority', autoscale_enabled=True, cluster_min_nodes=0, cluster_max_nodes=3)
  compute_target = ComputeTarget.create(ws, '<compute-name>', compute_config)
  compute_target.wait_for_completion(show_output=True)
  print(compute_target.get_status())

exp = Experiment(workspace=ws, name='mnist')

ds = ws.get_default_datastore()
script_params = {
  '--model-folder': ds.as_mount()
}
est = TensorFlow(
  source_directory='.', script_params=script_params, entry_script='mnist.py',
  compute_target=compute_target, use_gpu=True,
  node_count=3, worker_count=3, parameter_server_count=1, distributed_backend='ps',
  use_docker=True, custom_docker_base_image='tensorflow/tensorflow:latest-gpu-py3')
run = exp.submit(config=est)
run

The full experiment log is:

Logging into Docker registry: pontify0218013767.azurecr.io
Login Succeeded
Docker login(s) took 2.0648694038391113 seconds
Building image with name pontify0218013767.azurecr.io/azureml/azureml_601f5c943281fa956a21be4d099f73c8
Sending build context to Docker daemon    170kB

Step 1/13 : FROM tensorflow/tensorflow:latest-gpu-py3
latest-gpu-py3: Pulling from tensorflow/tensorflow
8ee29e426c26: Pulling fs layer
6e83b260b73b: Pulling fs layer
e26b65fd1143: Pulling fs layer
40dca07f8222: Pulling fs layer
b420ae9e10b3: Pulling fs layer
a579c1327556: Pulling fs layer
b440bb8df79e: Pulling fs layer
de3b2ccf9562: Pulling fs layer
9d9bb1fc2021: Pulling fs layer
fd8417f445f6: Pulling fs layer
ae12176de4be: Pulling fs layer
79fcb4b65373: Pulling fs layer
f400084f9b81: Pulling fs layer
e307428456fa: Pulling fs layer
0cf825aad3c9: Pulling fs layer
d6194e5926fa: Pulling fs layer
e9ff58a10f66: Pulling fs layer
40dca07f8222: Waiting
b420ae9e10b3: Waiting
a579c1327556: Waiting
b440bb8df79e: Waiting
de3b2ccf9562: Waiting
9d9bb1fc2021: Waiting
fd8417f445f6: Waiting
ae12176de4be: Waiting
79fcb4b65373: Waiting
f400084f9b81: Waiting
e307428456fa: Waiting
0cf825aad3c9: Waiting
d6194e5926fa: Waiting
e9ff58a10f66: Waiting
e26b65fd1143: Verifying Checksum
e26b65fd1143: Download complete
8ee29e426c26: Verifying Checksum
8ee29e426c26: Download complete
6e83b260b73b: Verifying Checksum
6e83b260b73b: Download complete
40dca07f8222: Verifying Checksum
40dca07f8222: Download complete
b420ae9e10b3: Verifying Checksum
b420ae9e10b3: Download complete
a579c1327556: Verifying Checksum
a579c1327556: Download complete
de3b2ccf9562: Verifying Checksum
de3b2ccf9562: Download complete
8ee29e426c26: Pull complete
6e83b260b73b: Pull complete
e26b65fd1143: Pull complete
40dca07f8222: Pull complete
b420ae9e10b3: Pull complete
b440bb8df79e: Verifying Checksum
b440bb8df79e: Download complete
a579c1327556: Pull complete
b440bb8df79e: Pull complete
de3b2ccf9562: Pull complete
fd8417f445f6: Verifying Checksum
fd8417f445f6: Download complete
9d9bb1fc2021: Verifying Checksum
9d9bb1fc2021: Download complete
79fcb4b65373: Verifying Checksum
79fcb4b65373: Download complete
e307428456fa: Verifying Checksum
e307428456fa: Download complete
ae12176de4be: Verifying Checksum
ae12176de4be: Download complete
0cf825aad3c9: Verifying Checksum
0cf825aad3c9: Download complete
f400084f9b81: Verifying Checksum
f400084f9b81: Download complete
e9ff58a10f66: Verifying Checksum
e9ff58a10f66: Download complete
d6194e5926fa: Verifying Checksum
d6194e5926fa: Download complete
9d9bb1fc2021: Pull complete
fd8417f445f6: Pull complete
ae12176de4be: Pull complete
79fcb4b65373: Pull complete
f400084f9b81: Pull complete
e307428456fa: Pull complete
0cf825aad3c9: Pull complete
d6194e5926fa: Pull complete
e9ff58a10f66: Pull complete
Digest: sha256:4252dd3dd509e608c8722157aad1a5fedfcd3b76ad422dec8340b14dea6cbb7c
Status: Downloaded newer image for tensorflow/tensorflow:latest-gpu-py3
 ---> 6243acd2b19f
Step 2/13 : USER root
 ---> Running in 4629c602519f
 ---> bad4fa7b62ba
Removing intermediate container 4629c602519f
Step 3/13 : RUN mkdir -p $HOME/.cache
 ---> Running in 84def190ffc1
 ---> 1e849194f2d7
Removing intermediate container 84def190ffc1
Step 4/13 : WORKDIR /
 ---> 1446ac2e1692
Removing intermediate container 4c0f73b6654b
Step 5/13 : COPY azureml-setup/99brokenproxy /etc/apt/apt.conf.d/
 ---> 7c4a4edb6b24
Step 6/13 : RUN if dpkg --compare-versions `conda --version | grep -oE '[^ ]+$'` lt 4.4.0; then conda install conda==4.4.11 -c anaconda; fi
 ---> Running in 35a0d3821e2b
/bin/sh: 1: conda: not found
dpkg: error: --compare-versions takes three arguments: <version> <relation> <version>

Type dpkg --help for help about installing and deinstalling packages [*];
Use 'apt' or 'aptitude' for user-friendly package management;
Type dpkg -Dhelp for a list of dpkg debug flag values;
Type dpkg --force-help for a list of forcing options;
Type dpkg-deb --help for help about manipulating *.deb files;

Options marked [*] produce a lot of output - pipe it through 'less' or 'more' !
 ---> 610a4fd1fc1a
Removing intermediate container 35a0d3821e2b
Step 7/13 : COPY azureml-setup/mutated_conda_dependencies.yml azureml-setup/mutated_conda_dependencies.yml
 ---> 614f8e5e176e
Step 8/13 : RUN ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_079d27a647dac0e3b594f81064adeded -f azureml-setup/mutated_conda_dependencies.yml && ldconfig
 ---> Running in eba5b41dc90d
/bin/sh: 1: conda: not found
Docker image build failed
Removing any dangling images
The command '/bin/sh -c ldconfig /usr/local/cuda/lib64/stubs && conda env create -p /azureml-envs/azureml_079d27a647dac0e3b594f81064adeded -f azureml-setup/mutated_conda_dependencies.yml && ldconfig' returned a non-zero code: 127
Deleted: sha256:614f8e5e176e4f8af6acc935e70b9ddba858dfd4e493d68175a8e73a6cdf941e
Deleted: sha256:610a4fd1fc1a561bc80fc8d78ce2a0660d2e90a0e5e98dedf2059db6b226525e
Deleted: sha256:7c4a4edb6b246454186e18c4f7ecf601922ab366712f07889e531fce4e8333ee
Deleted: sha256:1446ac2e1692b49b7a144cd034cadb239a41069804665bef874e7fabf66d0d5f
Deleted: sha256:1e849194f2d7dbc1c4d405f5d71b1323077e650ebfeb22c0b5b30e34b98d5b23
Deleted: sha256:bad4fa7b62baeb8ceb8e1b7d78f1ced1d8e72c7edf8a754760c59b490fb9c0cf
Docker build took 129.79894399642944 seconds
Total task took 131.93517518043518 secs

Improve error messages when interacting with data sets

Error messages in the IPython Notebook are not clear enough. Right now, the error messages from the azureml package do not tell me what went wrong. For example, if I push two datasets with the same name to AzureML, there is a name conflict, but the error message is "AzureMLConflictHttpError: Request ID: 9a59338a-e27b-4d9d-a20b-ec4ceb6d6fc8 2015-11-15 20:15:42Z", which says nothing about the cause.

Having issues publishing a Python function on a new AzureML workspace as a web service

I created a new AzureML workspace in West Central US from the new portal. I am trying to operationalize a Python script (one that I am able to operationalize on South Central) and am getting errors.

ValueError: Failed to publish function: Internal error. The error code has been logged. If you retry and see this error again, report the error code and the request ID to the online forum.
Set azureml.services._DEBUG = True to enable writing predictIris.req/predictIris.res files

I did pass the endpoint parameter in @services.publish() to the management endpoint of West Central US.

For what it is worth, there does seem to be one change in the workspace auth token for this new workspace created through the new portal. The token seems to be encoded (i.e. it has a `==` at the end). The older workspace where publishing works does not have this encoding on its tokens. I am not sure if there is some decoding I have to do before passing it to @services.publish(). BTW, I tried base64-decoding the token I see in Studio and it does not seem to be an ASCII string.
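The check described above can be reproduced locally. This is purely a diagnostic sketch (not part of the azureml API) that tests whether a token string is valid base64 and, if so, whether it decodes to printable ASCII:

```python
import base64
import binascii

def describe_token(token):
    """Rough classification of an auth token string (diagnostic only)."""
    try:
        raw = base64.b64decode(token, validate=True)
    except binascii.Error:
        return 'not valid base64'
    if all(32 <= b < 127 for b in raw):
        return 'base64 encoding of printable ASCII'
    return 'base64 encoding of binary data'

print(describe_token('SGVsbG8gd29ybGQ='))  # base64 encoding of printable ASCII
print(describe_token('not base64!!'))      # not valid base64
```

A token ending in `==` that decodes to non-ASCII bytes is most likely base64-encoded binary key material, which would normally be passed to the SDK as-is rather than decoded first.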

Service based on GBM model can't be consumed

After I published a web service based on a GBM model, the service can't be consumed. Below is the code I used to set up the service, along with a screenshot of the error when calling the service. I did not have this problem when publishing a service based on a linear model. Any insights will be greatly appreciated.

from azureml import Workspace
from sklearn.ensemble import GradientBoostingRegressor
ws = Workspace(
    workspace_id='b2bbeb56a1d04e1599d2510a06c59d87',
    authorization_token='<removed>',
    endpoint='https://studioapi.azureml.net'
)
experiment = ws.experiments['b2bbeb56a1d04e1599d2510a06c59d87.f-id.911630d13cbe4407b9fe408b5bb6ddef']
ds = experiment.get_intermediate_dataset(
    node_id='a0a931cf-9fb3-4cb9-83db-f48211be560c-323',
    port_name='Results dataset',
    data_type_id='GenericCSV'
)
frame = ds.to_dataframe()

mydata = frame

# create X and y
feature_cols = ['crim', 'zn', 'indus', 'chas', 'nox', 'rm', 'age', 'dis', 'rad', 'tax', 'ptratio', 'black', 'lstat']
X = mydata[feature_cols]
y = mydata.medv

# fit model with the best set of parameter values
params = {'n_estimators': 500, 'max_depth': 2, 'min_samples_split': 4,
          'learning_rate': 0.1, 'loss': 'ls', 'random_state': 0}

gbm = GradientBoostingRegressor(**params)

gbm.fit(X, y)

# set up web service
from azureml import services
@services.publish('b2bbeb56a1d04e1599d2510a06c59d87', '<removed>')
@services.types(crim=float, zn=float, indus=float, chas=float, nox=float, rm=float, 
                age=float, dis=float, rad=float, tax=float, ptratio=float, black=float, lstat=float)
@services.returns(float)
def mygbm(crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat):
    # predict the label
    feature_vector = [crim, zn, indus, chas, nox, rm, age, dis, rad, tax, ptratio, black, lstat]
    return gbm.predict(feature_vector)

# information about the web service
print("url: " + mygbm.service.url + "\n")
print("api_key: " + mygbm.service.api_key + "\n")
print("help_url: " + mygbm.service.help_url + "\n")
print("service id: " + mygbm.service.service_id + "\n")   

The screenshot: [screenshot of the error returned when calling the service, not reproduced here]
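One likely culprit (an assumption on my part, not confirmed from the screenshot): `GradientBoostingRegressor.predict` expects a 2-D array (one row per sample) and returns a NumPy array, while `@services.returns(float)` declares a plain scalar. A small self-contained sketch of the conversion, using a toy model:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy single-feature model so the sketch is self-contained.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 1.0, 2.0, 3.0])
gbm = GradientBoostingRegressor(n_estimators=10, random_state=0).fit(X, y)

pred = gbm.predict([[1.5]])  # predict wants 2-D input: one row per sample
print(type(pred))            # a numpy.ndarray, not a plain float
value = float(pred[0])       # the scalar a returns(float) service expects
```

In the decorated function that would mean `return float(gbm.predict([feature_vector])[0])` rather than `return gbm.predict(feature_vector)`.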
