neuro-inc / cookiecutter-neuro-project


Neuro Platform Project Template

License: Apache License 2.0

Python 83.11% Makefile 5.99% Jupyter Notebook 8.15% Dockerfile 2.75%
cookiecutter mlops neuro template

cookiecutter-neuro-project's People

Contributors: anayden, atemate, atselousov, dependabot-preview[bot], dependabot[bot], jane-gorlova, mariyadavydova, neu-ro-github-bot[bot], pre-commit-ci[bot], serhiy-storchaka, yevheniisemendiak


cookiecutter-neuro-project's Issues

Mute some "no such job" errors

In some make commands, we run neuro kill <job-name> to free up the name <job-name>. Since in most cases a job with this name does not exist or is not running, we need to mute the output of neuro kill (both stdout and stderr):

Cannot kill job setup-goods-on-shelves-detection: {"error": "no such job setup-goods-on-shelves-detection"}
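A minimal sketch of the muting, assuming the recipe should also tolerate a non-zero exit code from neuro kill ("setup-my-project" is a placeholder job name):

```shell
# Discard both stdout and stderr of `neuro kill`, and ignore its exit
# status with `|| true` so `make` does not abort when no such job exists.
neuro kill setup-my-project >/dev/null 2>&1 || true
```

The `|| true` part matters in addition to the redirects: without it, make would stop the recipe on the non-zero exit code from `neuro kill`.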

Docker API error during `make setup`

Saving job-c8131d2a-a0a4-457c-9815-1ccc2a07d252 -> image://artemyushkovskiy/neuromation-test-project:latest
Creating image image://artemyushkovskiy/neuromation-test-project:latest image from the job container
ERROR: Docker API error: Failed to save job 'job-c8131d2a-a0a4-457c-9815-1ccc2a07d252': DockerError(502, '<html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.17.3</center>\r\n</body>\r\n</html>\r\n')
make[1]: *** [Makefile:64: setup] Error 7

Set default package name to `modules`

When set to modules, the user will be able to put Python files into `modules/` and import them as

from modules import train

However, they can override it (typing a short project name, for example) and import it as

from my_project import train

Simplify Makefile

The following variables are used only once:

PROJECT_PATH_STORAGE?=storage:{{cookiecutter.project_slug}}
CODE_PATH_STORAGE?=$(PROJECT_PATH_STORAGE)/$(CODE_PATH)
DATA_PATH_STORAGE?=$(PROJECT_PATH_STORAGE)/$(DATA_PATH)
NOTEBOOKS_PATH_STORAGE?=$(PROJECT_PATH_STORAGE)/$(NOTEBOOKS_PATH)
REQUIREMENTS_PIP_STORAGE?=$(PROJECT_PATH_STORAGE)/$(REQUIREMENTS_PIP)
REQUIREMENTS_APT_STORAGE?=$(PROJECT_PATH_STORAGE)/$(REQUIREMENTS_APT)
RESULTS_PATH_STORAGE?=$(PROJECT_PATH_STORAGE)/$(RESULTS_PATH)

PROJECT_PATH_ENV?=/project
CODE_PATH_ENV?=$(PROJECT_PATH_ENV)/$(CODE_PATH)
DATA_PATH_ENV?=$(PROJECT_PATH_ENV)/$(DATA_PATH)
NOTEBOOKS_PATH_ENV?=$(PROJECT_PATH_ENV)/$(NOTEBOOKS_PATH)
REQUIREMENTS_PIP_ENV?=$(PROJECT_PATH_ENV)/$(REQUIREMENTS_PIP)
REQUIREMENTS_APT_ENV?=$(PROJECT_PATH_ENV)/$(REQUIREMENTS_APT)
RESULTS_PATH_ENV?=$(PROJECT_PATH_ENV)/$(RESULTS_PATH)

Let's not declare them and instead just inline their values. For example, instead of:

.PHONY: upload-data
upload-data:  ### Upload data directory to the platform storage
	$(NEURO_CP) $(DATA_PATH) $(DATA_PATH_STORAGE)

We'll use:

.PHONY: upload-data
upload-data:  ### Upload data directory to the platform storage
	$(NEURO_CP) $(DATA_PATH) $(PROJECT_PATH_STORAGE)/$(DATA_PATH)

Or more readable:

.PHONY: upload-data
upload-data:  ### Upload data directory to the platform storage
	$(NEURO_CP) $(DATA_PATH) $(PROJECT_PATH_STORAGE)/data

Fix cleanup_jobs.py script

Problems:

  1. It can fail to kill a job yet still report it as killed (example: delete the ~/.nmrc file and run the script).
  2. The CircleCI workflow should not fail its last step when the script does not succeed.
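For the CI side, one common pattern is to tolerate the cleanup step's failure explicitly (a sketch; the cleanup_jobs.py path is taken from the issue title):

```shell
# Even under `set -e` (the usual CI shell mode), `|| true` keeps the
# step green when the cleanup script fails for any reason.
set -e
python cleanup_jobs.py || true
echo "cleanup step finished"
```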

Extend Makefile to run most commands locally

When I was working through the tutorial https://neu.ro/docs/how_to_train_your_model, I ran into a problem with my setup:

[ay@archlinux nlp-from-scratch]$ make training 
neuro run \
	--name training-nlp-from-scratch \
	--preset cpu-small \
	--volume storage:nlp-from-scratch/data:/project/data:ro \
	--volume storage:nlp-from-scratch/rnn:/project/rnn:ro \
	--volume storage:nlp-from-scratch/results:/project/results:rw \
	--env PLATFORMAPI_SERVICE_HOST="." \
	image:neuromation-nlp-from-scratch \
	"python -u /project/rnn/char_rnn_classification_tutorial.py"
Job ID: job-24f6b127-5680-4234-a6a7-c387eb0f3640 Status: pending
Name: training-nlp-from-scratch
Http URL: https://training-nlp-from-scratch--artemyushkovskiy.jobs-staging.neu.ro
Shortcuts:
  neuro status training-nlp-from-scratch  # check job status
  neuro logs training-nlp-from-scratch    # monitor job stdout
  neuro top training-nlp-from-scratch     # display real-time job telemetry
  neuro kill training-nlp-from-scratch    # kill job
Status: pending Initializing
Status: pending ContainerCreating
Status: failed Error (Server listening on 0.0.0.0 port 22.  Server listening on :: port 22.  [] Slusarski Traceback (most recent call last):   File "/project/rnn/char_rnn_classification_tutorial.py", line 122, in <module>     print(category_lines['Italian'][:5]) KeyError: 'Italian' )
Terminal is attached to the remote job, so you receive the job's output.
Use 'Ctrl-C' to detach (it will NOT terminate the job), or restart the job
with `--detach` option.

Server listening on 0.0.0.0 port 22.
Server listening on :: port 22.
[]
Slusarski
Traceback (most recent call last):
  File "/project/rnn/char_rnn_classification_tutorial.py", line 122, in <module>
    print(category_lines['Italian'][:5])
KeyError: 'Italian'

So at this point I'd like to test my project setup (the paths stored in Makefile variables) locally. Something like make training-local would be useful for debugging the project without copying anything to the Storage, waiting for the job to be scheduled, etc.
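A hypothetical target along these lines (a sketch; TRAINING_COMMAND is assumed to hold the training entry point, and any /project/... paths inside it would need to resolve locally, e.g. via a symlink or a path variable):

```make
.PHONY: training-local
training-local:  ### Run the training command locally, without the platform
	$(TRAINING_COMMAND)
```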

HTTP authentication in jobs started from Makefile

Currently --no-http-auth is being used, which means there is no authentication by default. We are not communicating that to the user, and I don't think it matches their expectations either. Let's extract the flag into a variable, leave it unset by default, and explain to the user how to turn authentication off should they need that.
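A sketch of extracting the flag into a variable (HTTP_AUTH, JUPYTER_NAME, and CUSTOM_ENV_NAME are hypothetical names standing in for the template's real variables):

```make
# Authentication stays on by default; users opt out explicitly:
#   make jupyter HTTP_AUTH=--no-http-auth
HTTP_AUTH?=

.PHONY: jupyter
jupyter:
	neuro run $(HTTP_AUTH) --name $(JUPYTER_NAME) $(CUSTOM_ENV_NAME)
```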

Setup terminates after pip install during neuro save

Reproducible in this repo: https://github.com/neuromation/course-fastai-nlp.git

> make setup
...
  Found existing installation: matplotlib 3.1.1
    Uninstalling matplotlib-3.1.1:
      Successfully uninstalled matplotlib-3.1.1
  Found existing installation: scikit-learn 0.21.3
    Uninstalling scikit-learn-0.21.3:
      Successfully uninstalled scikit-learn-0.21.3
command terminated with exit code 137
Connection to ssh-auth.neuro-ai.org.neu.ro closed.
make: *** [Makefile:48: setup] Error 137

Add real paths to `make help`

Example:
	upload_code  Upload code directory to Storage

would look better as

	upload_code  Upload code directory: /home/artem/project/modules -> storage://artem/project/modules

But it's complicated to do this with the current make help approach: the path values live in Makefile variables that are expanded by make itself, while the descriptions live in Makefile comments, which make never expands.
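One workaround (a sketch) is an explicit help recipe that prints the already-expanded paths instead of extracting descriptions from comments:

```make
.PHONY: help
help:
	@echo "upload-code   Upload code directory: $(CODE_PATH) -> $(PROJECT_PATH_STORAGE)/$(CODE_PATH)"
	@echo "upload-data   Upload data directory: $(DATA_PATH) -> $(PROJECT_PATH_STORAGE)/$(DATA_PATH)"
```

The trade-off is that every target's description has to be maintained in the help recipe by hand rather than next to the target itself.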

`setup` doesn't work

	$(NEURO_CP) $(REQUIREMENTS_APT) $(REQUIREMENTS_APT_STORAGE)
	$(NEURO_CP) $(REQUIREMENTS_PIP) $(REQUIREMENTS_PIP_STORAGE)

should be

	neuro cp $(REQUIREMENTS_APT) $(REQUIREMENTS_APT_STORAGE)
	neuro cp $(REQUIREMENTS_PIP) $(REQUIREMENTS_PIP_STORAGE)

Expand commands options in Makefile

Since one of the purposes of the Makefile is to teach users to work with neuro, let's use full option names instead of the short ones.

Example:

	neuro cp -r -p -u -T $(DATA_PATH) $(DATA_PATH_STORAGE)

becomes

	neuro cp --recursive --progress --update --no-target-directory $(DATA_PATH) $(DATA_PATH_STORAGE)

Refine template parameters

Look at the following output:

full_name [Your name]: Mariya Davydova
email [Your email address (e.g. [email protected])]: [email protected]
project_name [Name of the project]: neuro-tutorial
project_slug [neuro-tutorial]: 
project_short_description [A short description of the project]: Tutorial
code_directory [modules]: rnn
Select year_from:
1 - 2019
Choose from 1 (1) [1]: 
year_to [2019]: 
Select license:
1 - BSD 2-Clause License
2 - BSD 3-Clause License
3 - MIT license
4 - ISC license
5 - Apache Software License 2.0
6 - no
Choose from 1, 2, 3, 4, 5, 6 (1, 2, 3, 4, 5, 6) [1]: 3

Some fields are hard to understand, like year_from and year_to.
Other fields have a strange-looking option selector, like license.

Rename and make repo public

Cookiecutter templates usually have 'cookiecutter' in the repo name, so I suggest doing two things ASAP:

  • Rename the repo to 'cookiecutter-neuro-project'
  • Make it public

Not clear what's the `slug`

During the installation, cookiecutter asks:

email [Your email address (e.g. [email protected])]: 
project_name [Name of the project]: 
project_slug [name-of-the-project]: 
project_short_description [A short description of the project]:
...

From the user's point of view, it's unclear what the project_slug is and how it differs from project_name. We need to add a better description, for example: project_slug [ID of the project. Can contain only alphanumeric characters and dashes. Default: name-of-the-project]

Exclude cache files while uploading files from `modules/`

Copy 'file:///home/ay/github/temp/name-of-the-project/modules' => 'storage://artemyushkovskiy/name-of-the-project/modules'
'file:///home/ay/github/temp/name-of-the-project/modules' ...
'file:///home/ay/github/temp/name-of-the-project/modules/__pycache__' DONE
'file:///home/ay/github/temp/name-of-the-project/modules' DONE

__pycache__ definitely should not be uploaded.
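It's worth checking whether neuro cp supports exclusion patterns; as a workaround, the caches can be cleaned before upload. A sketch (the modules/ layout below is created only to make the demo self-contained):

```shell
# Demo setup (hypothetical layout): a modules/ tree with a bytecode cache.
mkdir -p modules/__pycache__ modules/sub
touch modules/__pycache__/train.cpython-37.pyc modules/sub/train.py

# Drop Python bytecode caches before uploading, since the upload copies
# the directory recursively.
find modules -type d -name '__pycache__' -prune -exec rm -rf {} +
find modules -type f -name '*.pyc' -delete
```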

Refine template parameters [again]

Let's use descriptive names and values instead of default values.
For example: code_directory [A valid python module name, e.g. "modules"].

Add project name suffix to job names

Tried running make jupyter while another job with the same name was running and got an error. It was easy for me to identify the root cause, but that might not be the case for those who are not familiar with neuro. I was in a situation where I already had a job named jupyter running and wanted to keep it.
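A minimal sketch of the naming scheme (variable names are assumptions):

```make
# Derive job names from the project slug so that two projects can run
# the same target (e.g. `make jupyter`) side by side.
PROJECT_SLUG?={{cookiecutter.project_slug}}
JUPYTER_NAME?=jupyter-$(PROJECT_SLUG)
TRAINING_NAME?=training-$(PROJECT_SLUG)
```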

Add `--no-tty` to all `neuro exec` commands

Neuro exec uses --tty (for interactive shell) by default:

  # Executes a single command in the container and returns the control:
  neuro exec --no-tty my-job ls -l

We need to add the --no-tty flag to all neuro exec commands.

Think about `data` folder

It is said that ML engineers hardly ever have data and code in the same place. Code usually goes on SSD, while data is stored on slow drives and often shared between many projects.

We should think about this folder usage.

Solution to the mis-branding problem

Problem: the project template with its make commands is aimed at absolute beginners, who will type make run instead of neuro run. Since we cannot get rid of make for now (can we?), let's prefix all Makefile targets with neuro-:
make neuro-setup
make neuro-upload-code
make neuro-run-training

so that our users don't forget that they are working with the Neuromation Platform 😄

Change requirements position

In an initial version I suggested the following placement of requirements:

requirements/
    apt.txt ## environment requirements, like vim
    pip.txt ## pip requirements

However, this location seems counter-intuitive: in Python projects, pip requirements are normally put in a requirements.txt file. So I suggest doing the same and updating the Makefile accordingly:

requirements.txt   ## pip requirements
environment-requirements.txt   ## vim, etc

Change Makefile variables order

Put them in the following order:

  • The variables that have to be changed (like TRAINING_COMMAND)
  • The variables that are likely to be changed (like DISABLE_HTTP_AUTH)
  • The variables that may be changed (like paths and job names)
  • The variables that are unlikely to be changed (like APT_COMMAND)

Add comments to the first two groups.

Handle invalid project names correctly

We need to correctly handle projects whose names contain characters that are invalid in a Python module name ("Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability." https://www.python.org/dev/peps/pep-0008/).

Currently, we allow some invalid names (for example, names with dashes), and some features may break later (for example, make lint will fail because detection-kit is not a valid Python package name).
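A possible guard, sketched as a shell check (a real cookiecutter template would do this in a pre-generation hook; the PEP 8-style pattern below is an assumption about the desired rule):

```shell
# Reject names that are not valid Python module names: short,
# all-lowercase, optional underscores. "detection-kit" is a placeholder.
name="detection-kit"
if ! printf '%s' "$name" | grep -Eq '^[a-z][a-z0-9_]*$'; then
    # A real hook would exit non-zero here to abort project generation.
    echo "error: '$name' is not a valid Python module name" >&2
fi
```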

Improve help message on `make setup`

The make help message needs more information on what's happening during make setup:

  • where the project name chosen by the user is used (along with its derivatives, such as the one defined by the transformation project_name.lower().replace(" ", "_").replace("-", "_")),
  • which directories (locally, on storage) are created,
  • which images are created/pulled

Mount storage project root on `rw` during `make setup`

.PHONY: setup
setup: ### Setup remote environment
        neuro kill $(SETUP_NAME) >/dev/null 2>&1
        neuro run \
                --name $(SETUP_NAME) \
                --preset cpu-small \
                --detach \
                --volume $(PROJECT_PATH_STORAGE):$(PROJECT_PATH_ENV):ro \
                --env PLATFORMAPI_SERVICE_HOST="." \
                $(BASE_ENV_NAME) \
                'sleep 1h'

should be:

                --volume $(PROJECT_PATH_STORAGE):$(PROJECT_PATH_ENV):rw \
