microsoft / archai

Accelerate your Neural Architecture Search (NAS) through fast, reproducible and modular research.

Home Page: https://microsoft.github.io/archai

License: MIT License

Python 99.70% Dockerfile 0.07% Shell 0.06% PowerShell 0.18%

python pytorch machine-learning deep-learning neural-architecture-search nas automated-machine-learning model-compression darts petridish

archai's Introduction

Archai accelerates your Neural Architecture Search (NAS) through fast, reproducible and modular research, enabling the generation of efficient deep networks for various applications.


Installation

Archai can be installed through various methods; however, we recommend using a virtual environment such as conda or pyenv.

To install Archai via PyPI, the following command can be executed:

pip install archai

Archai requires Python 3.8+ and PyTorch 1.7.0+ to function properly.

For further information, please consult the installation guide.

Quickstart

In this quickstart example, we apply Archai to Natural Language Processing to find Pareto-optimal Transformer configurations according to a set of objectives.

Creating the Search Space

We start by importing the TransformerFlexSearchSpace class which represents the search space for the Transformer architecture:

from archai.discrete_search.search_spaces.nlp.transformer_flex.search_space import TransformerFlexSearchSpace

space = TransformerFlexSearchSpace("gpt2")

Defining Search Objectives

Next, we define the objectives we want to optimize. In this example, we use NonEmbeddingParamsProxy, TransformerFlexOnnxLatency, and TransformerFlexOnnxMemory to define the objectives:

from archai.discrete_search.api.search_objectives import SearchObjectives
from archai.discrete_search.evaluators.nlp.parameters import NonEmbeddingParamsProxy
from archai.discrete_search.evaluators.nlp.transformer_flex_latency import TransformerFlexOnnxLatency
from archai.discrete_search.evaluators.nlp.transformer_flex_memory import TransformerFlexOnnxMemory

search_objectives = SearchObjectives()
search_objectives.add_objective(
   "non_embedding_params",
   NonEmbeddingParamsProxy(),
   higher_is_better=True,
   compute_intensive=False,
   constraint=(1e6, 1e9),
)
search_objectives.add_objective(
   "onnx_latency",
   TransformerFlexOnnxLatency(space),
   higher_is_better=False,
   compute_intensive=False,
)
search_objectives.add_objective(
   "onnx_memory",
   TransformerFlexOnnxMemory(space),
   higher_is_better=False,
   compute_intensive=False,
)

Initializing the Algorithm

We use the EvolutionParetoSearch algorithm to conduct the search:

from archai.discrete_search.algos.evolution_pareto import EvolutionParetoSearch

algo = EvolutionParetoSearch(
   space,
   search_objectives,
   None,
   "tmp",
   num_iters=5,
   init_num_models=10,
   seed=1234,
)

Performing the Search

Finally, we call the search() method to start the NAS process:

algo.search()

The algorithm will iterate through different network architectures, evaluate their performance based on the defined objectives, and ultimately produce a frontier of Pareto-optimal results.
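To make "Pareto-optimal" concrete, here is a minimal, self-contained sketch of dominance filtering over already-evaluated candidates. It is not Archai's implementation: the function names `dominates` and `pareto_frontier` and the sample numbers are assumptions made up for illustration. Only the objective names and their `higher_is_better` directions come from the quickstart above.

```python
from typing import Dict, List

# Direction of each objective, mirroring the higher_is_better flags in the quickstart.
HIGHER_IS_BETTER: Dict[str, bool] = {
    "non_embedding_params": True,
    "onnx_latency": False,
    "onnx_memory": False,
}


def dominates(a: Dict[str, float], b: Dict[str, float]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly better on at least one."""
    strictly_better = False
    for name, maximize in HIGHER_IS_BETTER.items():
        if a[name] == b[name]:
            continue
        a_is_better = (a[name] > b[name]) if maximize else (a[name] < b[name])
        if not a_is_better:
            return False  # `a` is worse on this objective, so it cannot dominate
        strictly_better = True
    return strictly_better


def pareto_frontier(candidates: List[Dict[str, float]]) -> List[Dict[str, float]]:
    """Keep only the candidates that no other candidate dominates."""
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


# Hypothetical evaluation results (illustrative values, not measured).
candidates = [
    {"non_embedding_params": 5e6, "onnx_latency": 0.10, "onnx_memory": 40.0},
    {"non_embedding_params": 5e6, "onnx_latency": 0.20, "onnx_memory": 50.0},
    {"non_embedding_params": 9e6, "onnx_latency": 0.30, "onnx_memory": 70.0},
]
frontier = pareto_frontier(candidates)
```

In this toy data the second candidate is dropped because the first matches it on parameters and beats it on latency and memory, while the first and third are kept because each wins on a different objective.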

Tasks

Archai provides a set of end-to-end tasks that demonstrate its capabilities.

Documentation

The official documentation also provides a series of notebooks.

Support

If you have any questions or feedback about the Archai project or the open problems in Neural Architecture Search, please feel free to contact us using the following information:

Email: [email protected]
Website: https://github.com/microsoft/archai/issues

We welcome any questions, feedback, or suggestions you may have and look forward to hearing from you.

Team

Archai has been created and maintained by Shital Shah, Debadeepta Dey, Gustavo de Rosa, Caio Mendes, Piero Kauffmann, Chris Lovett, Allie Del Giorno, Mojan Javaheripi, and Ofer Dekel at Microsoft Research.

Contributions

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repositories using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Trademark

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

License

This project is released under the MIT License. Please review the file for more details.


archai's Issues

[BUG] typos and feedback on getting_started/notebooks/discrete_search/search_space.ipynb

Describe the bug

First, a question about API naming: why is "archai" used again in the module name archai_model? It seems redundant when the long import path archai.discrete_search.api.archai_model already establishes that we are in the archai package. Is the Archai API going to provide multiple model APIs besides "ArchaiModel"? Then just call it "Model". Perhaps you should also have an __init__.py that flattens this namespace a bit; for example, you could lift ArchaiModel out of the archai_model.py file so you can write from archai.discrete_search.api import ArchaiModel.
We can now wrap a DummyModel instance into an ArchaiModel: => Why is this a DummyModel? I don't see that term used anywhere in the code that follows. Perhaps you could write "wrap a dummy model instance" if you are trying to explain that the resulting model is somehow a "dummy"... why is it a dummy? Should ArchaiModel be renamed DummyModel? Perhaps your class MyModel used to be class DummyModel, because I saw the output of cell [11] change when I executed it.
What is the significance of the structure in the archid 'L=2, K=3, H=16'? Is this always required, or is it entirely user defined? How unique do these ids have to be? Some explanation would be nice. You do say "Architecture ids are used to identify and deduplicate architectures during search", but a bit more would help, like "Architecture ids are used to identify a unique model architecture and the contents are up to you; the idea is that they need to be able to identify unique architectures generated during the search process".
Why aren't "save_weights" and "load_weights" on ArchaiModel instead of DiscreteSearchSpace? I mean, I understand why: ArchaiModel knows nothing about my torch model. But it just seems clunky and non-object-oriented. If I were to use this API I'd subclass ArchaiModel and move a lot of stuff there, including the saving and loading of weights and of the archid components (layers, kernel size and hidden dimensions), allowing the CNNSearchSpace to focus on what it does, which is really just random_sample... I think that would look cleaner anyway...
!cat arch.json is not cross-platform; perhaps this could be print(open('arch.json').read())
The crossover example only blends 2 models, ignoring the rest of the list. Is this a valid implementation? Wouldn't it be better to consider all the models in the list? The code is not really that much more complicated:

    @overrides
    def crossover(self, model_list: List[ArchaiModel]) -> ArchaiModel:
        
        new_config = {
            'nb_layers': self.rng.choice([m.arch.nb_layers for m in model_list]),
            'kernel_size': self.rng.choice([m.arch.kernel_size for m in model_list]),
            'hidden_dim': self.rng.choice([m.arch.hidden_dim for m in model_list]),
        }
        
        crossover_model = MyModel(**new_config)
        
        return ArchaiModel(
            arch=crossover_model, archid=self.get_archid(crossover_model)
        )

and test it with:

  models = [ss.random_sample() for _ in range(10)]
  [print(m.archid) for m in models]
  ss.crossover(models).archid

"used for multiple tasks" => I know archai uses the term "task" to mean a type of model (what the model does is the task) but this language will be unclear to new readers. How about "used for a set of commonly used model types" or something...
from archai.discrete_search.search_spaces.cv import SegmentationDagSearchSpace is much nicer; I see this has an __init__.py that lifts SegmentationDagSearchSpace out of segmentation_dag.search_space.
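To illustrate the archid and crossover points raised in this issue without depending on Archai or PyTorch, here is a small standalone sketch. Everything in it is hypothetical: `CnnConfig` stands in for the notebook's MyModel hyperparameters, and `get_archid`, `get_archid_hash` and `crossover` are illustrative helpers, not Archai APIs. It shows that an id only needs to be a deterministic function of the fields that define the architecture, and that crossover can sample each field from all parents.

```python
import hashlib
import json
import random
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class CnnConfig:
    """Hypothetical stand-in for the hyperparameters of the notebook's MyModel."""
    nb_layers: int
    kernel_size: int
    hidden_dim: int


def get_archid(cfg: CnnConfig) -> str:
    """Readable id in the same 'L=2, K=3, H=16' style the notebook prints."""
    return f"L={cfg.nb_layers}, K={cfg.kernel_size}, H={cfg.hidden_dim}"


def get_archid_hash(cfg: CnnConfig) -> str:
    """Alternative id: a hash of the sorted config, useful when configs get large."""
    payload = json.dumps(asdict(cfg), sort_keys=True).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()


def crossover(parents: List[CnnConfig], rng: random.Random) -> CnnConfig:
    """Pick each field independently from any parent in the list."""
    return CnnConfig(
        nb_layers=rng.choice([p.nb_layers for p in parents]),
        kernel_size=rng.choice([p.kernel_size for p in parents]),
        hidden_dim=rng.choice([p.hidden_dim for p in parents]),
    )


rng = random.Random(0)  # seeded so the example is repeatable
parents = [CnnConfig(2, 3, 16), CnnConfig(4, 5, 32), CnnConfig(6, 7, 64)]
child = crossover(parents, rng)
print(get_archid(child))
```

Two configs with identical fields always get the same id under either scheme, which is the property deduplication during search relies on.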

[BUG] getting_started requirements.txt

Describe the bug

To Reproduce
Steps to reproduce the behavior:

Create new python 3.10 conda environment
pip install archai
Try and run the getting-started notebooks

Expected behavior
should work


Desktop (please complete the following information):

OS: [Windows 11]
Virtual Environment [conda,]
Python Version [3.10.1]


fear_ranking [NEW]

The code for "FEAR: Ranking Architectures by their Feature Extraction Capabilities" is missing

Could you please share the link for this repository thanks

[BUG] einops version 0.6.1 is probably required as dependency with archai 1.0.0

Describe the bug
einops package version 0.6.1 is probably needed for discrete_search with archai 1.0.0
Doesn't work with 0.5.0 but works with 0.6.1. Haven't tested at other versions.

To Reproduce
Steps to reproduce the behavior:

Install archai 1.0.0 with python setup.py install
cd tasks/text_generation
python search.py -h
See the following error if einops is 0.5.0:
Traceback (most recent call last):
File "./search.py", line 9, in <module>
from archai.discrete_search.evaluators.nlp.transformer_flex_latency import (
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/evaluators/nlp/transformer_flex_latency.py", line 17, in <module>
from archai.discrete_search.search_spaces.nlp.transformer_flex.search_space import (
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/__init__.py", line 5, in <module>
from archai.discrete_search.search_spaces.nlp.tfpp import TfppSearchSpace
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/tfpp/__init__.py", line 1, in <module>
from .backbones import *
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/tfpp/backbones/__init__.py", line 1, in <module>
from .codegen.model import CodeGenForCausalLM, CodeGenConfig
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/tfpp/backbones/codegen/model.py", line 32, in <module>
from .block import CodeGenBlock
File "path/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/tfpp/backbones/codegen/block.py", line 15, in <module>
from ...mixed_op import MixedAttentionBlock
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/tfpp/mixed_op.py", line 13, in <module>
from .ops import OPS
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/tfpp/ops/__init__.py", line 8, in <module>
from .local_attention import LocalMHA
File "path/python3.8/site-packages/archai-1.0.0-py3.8.egg/archai/discrete_search/search_spaces/nlp/tfpp/ops/local_attention.py", line 10, in <module>
from einops import rearrange, repeat, pack, unpack
ImportError: cannot import name 'pack' from 'einops' (/path/python3.8/site-packages/einops/__init__.py)

Fix
Probably need to bump up the einops version in setup.py
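Until the pin is bumped, a quick local check along these lines can tell you whether the installed einops is at least the version reported to work above. This is only a sketch: `parse_version`, `installed_version` and `einops_is_new_enough` are hypothetical helper names, and the 0.6.1 threshold is simply the version this issue found to work, not a verified minimum.

```python
from importlib import metadata
from typing import Optional, Tuple

MIN_EINOPS = (0, 6, 1)  # version reported to work in this issue (assumed threshold)


def parse_version(text: str) -> Tuple[int, ...]:
    """Parse the leading numeric components, e.g. '0.6.1' -> (0, 6, 1)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def einops_is_new_enough(version_text: Optional[str]) -> bool:
    """True only if a version is present and is at least MIN_EINOPS."""
    return version_text is not None and parse_version(version_text) >= MIN_EINOPS


if not einops_is_new_enough(installed_version("einops")):
    print("einops is missing or older than 0.6.1; try: pip install 'einops>=0.6.1'")
```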

[BUG] visualizing graphs in evaluators.ipynb

Describe the bug

arch.view() is cool, this must be a pytorch feature? Unfortunately, for this to work on windows one needs to manually install the graphviz installer from https://www.graphviz.org/download/#windows, I wonder if there's a better solution that integrates more nicely into jupyter?

What if we did this instead?

m = ss.random_sample()
import torch.onnx
x = torch.zeros(1, 3, 64, 64)  # don't know if there's an easier way to gin up a valid input for the model.
torch.onnx.export(m.arch, x, 'model.onnx', do_constant_folding=True)

# m.arch.view()
!pip install netron
import netron
netron.start('model.onnx', 8081)

Everyone has netron installed, and this works nicely on windows and the result is very pretty:

You can also add this to make the diagram show up inline:

from IPython.display import IFrame
IFrame("http://localhost:8081", width=1000, height=1000)

[BUG] idea to reduce duplicate code in algos.ipynb

Describe the bug

When going through the notebooks in the intended order, there is a lot of duplicate code in algos.ipynb, you could eliminate that duplicate code using this trick:

import nbimporter
from search_space import MyModel, CNNSearchSpaceExt as CNNSearchSpace

Dependencies not fully listed

Code requires redis as a dependency

TypeError: PoolBN.forward: `input` must be present

Hello, I have installed Archai following the tutorial. But when I run python scripts/main.py --algos darts, the following error occurs:

Traceback (most recent call last):
File "/home/hjy/PalmNAS/archai/scripts/main.py", line 11, in <module>
from archai.nas.exp_runner import ExperimentRunner
File "/home/hjy/PalmNAS/archai/archai/nas/exp_runner.py", line 11, in <module>
from archai.nas.model_desc_builder import ModelDescBuilder
File "/home/hjy/PalmNAS/archai/archai/nas/model_desc_builder.py", line 14, in <module>
from archai.nas.operations import StemBase, Op
File "/home/hjy/PalmNAS/archai/archai/nas/operations.py", line 133, in <module>
class PoolBN(Op):
File "/home/hjy/PalmNAS/archai/archai/nas/operations.py", line 162, in PoolBN
def forward(self, x):
File "/home/hjy/miniconda3/envs/nas/lib/python3.9/site-packages/overrides/overrides.py", line 88, in overrides
return _overrides(method, check_signature, check_at_runtime)
File "/home/hjy/miniconda3/envs/nas/lib/python3.9/site-packages/overrides/overrides.py", line 114, in _overrides
_validate_method(method, super_class, check_signature)
File "/home/hjy/miniconda3/envs/nas/lib/python3.9/site-packages/overrides/overrides.py", line 135, in _validate_method
ensure_signature_is_compatible(super_method, method, is_static)
File "/home/hjy/miniconda3/envs/nas/lib/python3.9/site-packages/overrides/signature.py", line 97, in ensure_signature_is_compatible
ensure_all_positional_args_defined_in_sub(
File "/home/hjy/miniconda3/envs/nas/lib/python3.9/site-packages/overrides/signature.py", line 211, in ensure_all_positional_args_defined_in_sub
raise TypeError(f"{method_name}: `{super_param.name}` must be present")
TypeError: PoolBN.forward: `input` must be present
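The traceback suggests that the overrides package compares the subclass method's signature with the superclass method's and rejects it because no parameter named `input` exists in PoolBN.forward(self, x). The sketch below reproduces that kind of check with the standard `inspect` module. It is not the overrides library's code, and `Base`, `Renamed` and `Matching` are hypothetical classes used only to show the idea.

```python
import inspect
from typing import Callable, List


def missing_named_params(super_method: Callable, sub_method: Callable) -> List[str]:
    """Names of positional-or-keyword parameters of `super_method` that `sub_method` lacks."""
    sub_names = set(inspect.signature(sub_method).parameters)
    return [
        name
        for name, param in inspect.signature(super_method).parameters.items()
        if param.kind is inspect.Parameter.POSITIONAL_OR_KEYWORD and name not in sub_names
    ]


class Base:
    def forward(self, input):
        raise NotImplementedError


class Renamed(Base):
    def forward(self, x):  # parameter renamed, so callers using input=... would break
        return x


class Matching(Base):
    def forward(self, input):  # same parameter name as the base class
        return input


print(missing_named_params(Base.forward, Renamed.forward))   # ['input']
print(missing_named_params(Base.forward, Matching.forward))  # []
```

If this reading is right, either renaming the parameter to match the base class or using a version of overrides that does not enforce signatures would avoid the error; which one applies here is not confirmed by the issue.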

Exciting project!

Hey all, I'm from the Ray team. Really excited to see this repository!

I saw that there were some things that needed to be monkey patched on Ray Tune. Is there anything we should do to make that integration easier?

Happy to jump on a call to chat more.

[BUG] why is training_epochs a float in algos.ipynb

Describe the bug

This code looks weird to me:

partial_tr = PartialTrainingValAccuracy(training_epochs=0.001, progress_bar=True)

I've never thought of the #epochs as a floating point number before, what is the value of making it a float?
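One possible reading, stated here as an assumption rather than as Archai's documented behavior, is that a fractional value means "train on that fraction of one pass over the data". Under that assumption the conversion to a step count could look like the sketch below; `steps_for_epochs` is a hypothetical helper and 938 is just an example batch count.

```python
import math


def steps_for_epochs(training_epochs: float, batches_per_epoch: int) -> int:
    """Convert a possibly fractional epoch count into a whole number of training steps."""
    if training_epochs <= 0 or batches_per_epoch <= 0:
        raise ValueError("training_epochs and batches_per_epoch must be positive")
    # Round up so that even a tiny fraction still trains for at least one batch.
    return max(1, math.ceil(training_epochs * batches_per_epoch))


print(steps_for_epochs(0.001, 938))  # a very short smoke-test run
print(steps_for_epochs(1, 938))      # one full epoch
```

Read this way, a value like 0.001 is a convenient way to make a notebook cell finish quickly while still exercising the training loop.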

[BUG] algos.ipynb does its own train-validation split?

Describe the bug

Why is this:

        # Loads the dataset
        tr_data = dataset_provider.get_train_dataset()
        
        # Train-validation split
        tr_data, val_data = torch.utils.data.random_split(
            tr_data, lengths=[len(tr_data) - 1_000, 1_000], 
            generator=torch.Generator().manual_seed(42)
        )

not just this?

        # Loads the datasets
        tr_data = dataset_provider.get_train_dataset()
        val_data = dataset_provider.get_val_dataset()

Dependencies not installed properly by running install.sh

Following was the output after running install.sh

+ conda install -y -c conda-forge pickle5
Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: \ 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  - pickle5 -> python[version='3.6.*|3.7.*']
  - pickle5 -> python[version='>=3.6,<3.7.0a0|>=3.7,<3.8.0a0']

Your python: python=3.8

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with each other:



Package setuptools conflicts for:
pickle5 -> python[version='>=3.6,<3.7.0a0'] -> pip -> setuptools
python=3.8 -> pip -> setuptools
Package certifi conflicts for:
python=3.8 -> pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
pickle5 -> python[version='>=3.6,<3.7.0a0'] -> pip -> setuptools -> certifi[version='>=2016.09|>=2016.9.26']
Package ca-certificates conflicts for:
python=3.8 -> openssl[version='>=1.1.1g,<1.1.2a'] -> ca-certificates
pickle5 -> python[version='>=3.6,<3.7.0a0'] -> openssl[version='>=1.0.2o,<1.0.3a'] -> ca-certificates
Package python_abi conflicts for:
python=3.8 -> pip -> setuptools -> python_abi=3.8[build=*_cp38]

After running install.sh
tried running main.py

got following error

import yaml

ModuleNotFoundError: No module named 'yaml'

So I installed yaml using conda.

Once I fixed it - another error popped up

from overrides import EnforceOverrides

ModuleNotFoundError: No module named 'overrides'
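Rather than discovering missing modules one crash at a time, a short pre-flight check can list them all up front. This is a sketch using only the standard library; `missing_modules` is a hypothetical helper, and the module list contains just the two names that failed above (note that the `yaml` module is provided by the PyYAML package).

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


required = ["yaml", "overrides"]  # the two imports that failed in this report
missing = missing_modules(required)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are importable.")
```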

[BUG] crash in algos.ipynb when I try and run it on my cuda device...

Describe the bug

I don't know if this is a windows thing or not but when I run the PartialTrainingValAccuracy on my cuda device the parallel_partial_tr block crashes with:

error 18:24:45.920: Raw kernel process exited code: 3
error 18:24:45.922: Error in waiting for cell to complete Error: Canceled future for execute_request message before replies were done
    at t.KernelShellFutureHandler.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:33213)
    at c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:52265
    at Map.forEach (<anonymous>)
    at y._clearKernelState (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:52250)
    at y.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:45732)
    at c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:17:139244
    at Z (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:2:1608939)
    at Kp.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:17:139221)
    at qp.dispose (c:\Users\clovett\.vscode\extensions\ms-toolsai.jupyter-2023.1.2000312134\out\extension.node.js:17:146518)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)
warn 18:24:45.923: Cell completed with errors {
  message: 'Canceled future for execute_request message before replies were done'

I wonder if this description included in your markdown is missing the device="cuda" parameter on the PartialTrainingValAccuracy constructor?

RayParallelObjective(
    PartialTrainingValAccuracy(training_epochs=1),
    num_gpus=0.5, # 2 jobs per gpu available
    max_calls=1
)

Because this is what you have in the code a bit later on:

    RayParallelEvaluator(
        PartialTrainingValAccuracy(training_epochs=1, device='cuda'),
        num_gpus=0.5, # 2 jobs per gpu available
        max_calls=1
    ),

So you might want to mention here that this will require your machine have GPU and CUDA python setup... I did and so this worked on my machine, but a heads up might be necessary for other readers... is there a "first notebook" entry point to all these notebooks?

error while running python setup.py install

creating /opt/anaconda3/lib/python3.7/site-packages/archai-0.4.2-py3.7.egg
Extracting archai-0.4.2-py3.7.egg to /opt/anaconda3/lib/python3.7/site-packages
  File "/opt/anaconda3/lib/python3.7/site-packages/archai-0.4.2-py3.7.egg/archai/data_aug/search.py", line 343
    , aug, cutout,
    ^
SyntaxError: invalid syntax

Integration with pytorch lightning?

Hi, I like your approach with this repo. I have a question... Do you have any thoughts about joining forces with the pytorch lightning project?

Kind regards,
Christofer

[BUG] TypeError: 'NoneType' object is not subscriptable

Describe the bug

Playing with evaluators.ipynb, every so often when I re-run the cell

onnx_latency_obj = AvgOnnxLatency(input_shape=(1, 3, 64, 64))
onnx_latency_obj.evaluate(model=ss.random_sample(), dataset_provider=None, budget=None)

I get this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
d:\git\microsoft\archai\docs\getting_started\notebooks\discrete_search\evaluators.ipynb Cell 15 in <cell line: 2>()
      1 onnx_latency_obj = AvgOnnxLatency(input_shape=(1, 3, 64, 64))
----> 2 onnx_latency_obj.evaluate(model=ss.random_sample(), dataset_provider=None, budget=None)

File d:\git\microsoft\archai\archai\discrete_search\evaluators\onnx_model.py:69, in AvgOnnxLatency.evaluate(self, model, dataset_provider, budget)
     67 # Exports model to ONNX
     68 exported_model_buffer = io.BytesIO()
---> 69 torch.onnx.export(
     70     model.arch,
     71     self.sample_input,
     72     exported_model_buffer,
     73     input_names=[f"input_{i}" for i in range(len(self.sample_input))],
     74     **self.export_kwargs,
     75 )
     77 exported_model_buffer.seek(0)
     79 # Benchmarks ONNX model

File c:\Users\clovett.REDMOND\Anaconda3\envs\archai\lib\site-packages\torch\onnx\__init__.py:305, in export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, opset_version, do_constant_folding, dynamic_axes, keep_initializers_as_inputs, custom_opsets, export_modules_as_functions)
     39 r"""
     40 Exports a model into ONNX format. If ``model`` is not a
     41 :class:`torch.jit.ScriptModule` nor a :class:`torch.jit.ScriptFunction`, this runs
   (...)
    301     model to the file ``f`` even if this is raised.
...
    431     scales = g.op("Constant", value_t=torch.tensor(scales_constant, dtype=torch.float32))
    432 return scales

TypeError: 'NoneType' object is not subscriptable 
(Occurred when translating upsample_nearest2d).

Desktop (please complete the following information):

OS: [Windows]
Virtual Environment [conda]
Python Version [3.8.13]

[NEW] Use a single plot library across Archai

Is your feature request related to a problem? Please describe.
Right now, we have imports for matplotlib, plotly and seaborn, which at the end of the day, do the same thing.

Describe the solution you'd like
Use a single plot library for the package.

Describe alternatives you've considered
Either matplotlib or preferentially plotly.

[BUG] some minor notebook bugs

Describe the bug

This is unfiltered "first impression" feedback from playing with some of the new Jupyter notebooks. The notebooks are great, but there's a lot of overlap across them; is that by design? For example, the notebook docs\getting_started\notebooks\cv\pl_trainer.ipynb seems to be low level and doesn't show the connection to archai as much as algos.ipynb, which ties things together and already shows how to create the MnistDatasetProvider. So is this redundancy by design, or is it just a work in progress?

Feedback on docs\getting_started\notebooks\cv\pl_trainer.ipynb

Some typos :

Archai's offers => Archai offers
and exposes proper name for methods => ??? Not clear what this means, is it renaming methods so they fit our discrete_search interface?
searches' evaluators => the possessive noun is a bit clunky, especially with a lower-case noun in plural possessive form; I think it reads better as just "search evaluators".

Every time I execute this block the val_loss increases:

After about 10 runs I see this:

If I re-execute the initial code blocks to recreate the dataset and model it drops back to around 2. Is the model stateful or something? Perhaps model = Model() should be in the last block not the block that defines the model?

When I set max_epochs=100 instead of max_steps=1, the progress bar output is confusing as it says "1/1":

But I think it did do the 100 epochs?

When I ask for the test dataset: test_dataset = dataset_provider.get_test_dataset() I get a weird error saying archai.datasets.cv.mnist_dataset_provider — WARNING — Testing set not available for mnist. Returning validation set which is weird. It would be nice if we could show proper procedure here even with MNist and add the test code block trainer.test(model, DataLoader(test_dataset)) at the end.
It is unfortunate that the PlTrainer takes "accelerator='gpu'" whereas the PartialTrainingValAccuracy in algos.ipynb takes device="cuda"...

Using NAS for Regression

Is Archai designed to handle image classification problems exclusively, or could one use it for regression-type problems, e.g. where the model input is a sequence and the model output is a float?

[BUG] unit test failures due to external dataset downloads

Describe the bug

We've seen some test failures caused by remote servers being unavailable, like this one:
https://github.com/microsoft/archai/actions/runs/4621878743/jobs/8173850396

Perhaps we should figure out how to cache all these remote datasets in an azure blob store and fetch them from there so that unit tests are not dependent on the availability of external datasets.
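The caching idea can be sketched independently of any particular storage backend. In the example below, `fetch_cached` is a hypothetical helper and the `download` callable stands in for whatever would actually fetch the bytes (an external server, or a blob store as proposed above); nothing here is an existing Archai or Azure API.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def fetch_cached(name: str, cache_dir: Path, download: Callable[[], bytes]) -> Path:
    """Return the cached file for `name`, calling `download` only on a cache miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        # Write to a temporary name first so an interrupted download
        # never leaves a truncated file that looks like a valid cache entry.
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(download())
        partial.replace(target)
    return target


calls: List[int] = []


def fake_download() -> bytes:
    """Stand-in for a real network fetch; records how often it is invoked."""
    calls.append(1)
    return b"dataset-bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = fetch_cached("mnist.bin", Path(tmp), fake_download)
    second = fetch_cached("mnist.bin", Path(tmp), fake_download)
    content = second.read_bytes()
```

With a layer like this in front of the dataset providers, a test run would only touch the remote source on the first miss, so later runs would not fail when that source is down.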

To Reproduce
See https://github.com/microsoft/archai/actions/runs/4621878743/jobs/8173850396


[BUG] `ConfigSearchSpace.load_arch` archid inconsistency

Describe the bug
Architecture id inconsistency when loading from a file using ConfigSearchSpace

To Reproduce

p = ArchParamTree({'a': DiscreteChoice([1, 2, 3])})
ss = ConfigSearchSpace(lambda x: x.pick('a'), p)
m = ss.random_sample()
ss.save_arch(m, 'tmp.json')
assert ss.load_arch('tmp.json').archid == m.archid
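
For comparison, here is a sketch of one way an id can be made stable across a save/load round trip: derive it only from the serialized configuration. This is not how ConfigSearchSpace computes archid (the report does not say); `config_archid`, `save_config` and `load_config` are hypothetical helpers used for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def config_archid(config: dict) -> str:
    """Hypothetical id: hash of a canonical JSON encoding of the config."""
    # sort_keys and fixed separators make the encoding independent of
    # dictionary ordering and whitespace.
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def save_config(config: dict, path: Path) -> None:
    path.write_text(json.dumps(config))


def load_config(path: Path) -> dict:
    return json.loads(path.read_text())


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "tmp.json"
    original = {"a": 2}
    save_config(original, path)
    restored = load_config(path)
    # The id depends only on the content, so it survives the round trip.
    assert config_archid(restored) == config_archid(original)
```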

Custom dataset

How do I use custom datasets?

I want to add script for AlexNet under cifar10_models

This is an awesome initiative.
I want to contribute to this repository, helping to make it better and learning in the process.
I want to add code for the AlexNet architecture in the archai/archai/cifar10_models directory.
@debadeepta @sytelus

[BUG] AvgOnnxLatency blows up if you have installed multiple Onnx RuntimeProviders

Describe the bug

The AvgOnnxLatency evaluator creates an OnnxRuntime InferenceSession, and InferenceSession requires that you specify the runtime provider when multiple are available; otherwise it throws an error.

To Reproduce
Steps to reproduce the behavior:

Install multiple OnnxRuntime providers (e.g. CPU and CUDA).
Run this code:

import onnxruntime as rt
onnx_session = rt.InferenceSession('model.onnx')

See error

ValueError: This ORT build has ['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'] enabled. Since ORT 1.9, you are required to explicitly set the providers parameter when instantiating InferenceSession. For example, onnxruntime.InferenceSession(..., providers=['TensorrtExecutionProvider', 'CUDAExecutionProvider', 'CPUExecutionProvider'], ...)

Expected behavior

AvgOnnxLatency should allow me to specify which device to use, like this: e = AvgOnnxLatency(device='gpu').
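
A sketch of how such a device argument could be translated into the providers list that InferenceSession asks for. The `select_providers` helper and the device-to-provider mapping are assumptions for illustration, not the evaluator's actual behavior; the provider names are the ones listed in the error message above.

```python
from typing import List, Sequence

# Assumed mapping from a user-facing device name to ONNX Runtime provider
# names, in order of preference (CPU is kept as a fallback for 'gpu').
_PREFERRED_PROVIDERS = {
    "cpu": ["CPUExecutionProvider"],
    "gpu": ["CUDAExecutionProvider", "CPUExecutionProvider"],
}


def select_providers(device: str, available: Sequence[str]) -> List[str]:
    """Return the preferred providers for `device` that are actually available."""
    if device not in _PREFERRED_PROVIDERS:
        raise ValueError(f"Unknown device: {device!r}")
    chosen = [p for p in _PREFERRED_PROVIDERS[device] if p in available]
    if not chosen:
        raise RuntimeError(f"No provider available for device {device!r}")
    return chosen


# With onnxruntime installed, the result could be passed on like this:
#   import onnxruntime as rt
#   providers = select_providers("gpu", rt.get_available_providers())
#   session = rt.InferenceSession("model.onnx", providers=providers)
available = ["TensorrtExecutionProvider", "CUDAExecutionProvider", "CPUExecutionProvider"]
print(select_providers("gpu", available))
```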

[BUG] `FastHfDatasetProvider` producing wrong dataset splits

Describe the bug
FastHfDatasetProvider.from_hub is changing the original dataset splits.

To Reproduce

from transformers import AutoTokenizer
from archai.datasets.nlp.fast_hf_dataset_provider import FastHfDatasetProvider

tokenizer = AutoTokenizer.from_pretrained('gpt2')

dataset_config = {
    'dataset_name': 'wikitext',
    'dataset_config_name': 'wikitext-103-raw-v1',
}


dp = FastHfDatasetProvider.from_hub(
    **dataset_config, tokenizer=tokenizer, num_workers=10
)

d = dp.get_train_dataset(10)
print(len(d))  # gives me 28617

d = dp.get_val_dataset(10)
print(len(d)) # gives me 11908516

In this example, training and validation sets are inverted.

Interpretation of the model viz

Hi,
Could you please help me interpret the model viz?
I couldn't figure out what c_{k-2}, c_{k-1}, 0 and 1 ... are.
Is there a way to visualize the type of connection (concat, from the yaml file) and the number of filters?

Thank you.

No __init__.py in archai/algos/didarts

fix in PR #8

Ideas for improving Petridish

Hi...

I have an idea that I would like to hear your thoughts about.
In the petridish algorithm, you gradually grow a neural network according to some objective.

What if you combined petridish with ideas from this paper:
https://arxiv.org/abs/2006.04647
https://github.com/BayesWatch/nas-without-training

So instead of incremental steps, you did a Monte Carlo Tree Search using ideas from above to find suitable candidates in the search.

So you alternate between two modes:

One pass you add several growing steps at once.
(Basically traversing from the root node to the candidate found in the MCTS search.)
You do the MCTS using a search without training.

This is at least something I would like to try out...

//Christofer

[BUG] some typos in algos.ipynb

Describe the bug

This doesn't work on windows:

!ls ./out_evo

But this does:

%ls out_evo
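
Another option that avoids shell commands altogether is to list the directory from Python, which behaves the same on Windows and Linux. A small sketch (the `list_dir` helper is a hypothetical name; `out_evo` is the output folder used above):

```python
from pathlib import Path
from typing import List


def list_dir(path: str) -> List[str]:
    """Return the sorted entry names in `path`, or an empty list if it is missing."""
    directory = Path(path)
    if not directory.is_dir():
        return []
    return sorted(entry.name for entry in directory.iterdir())


print(list_dir("out_evo"))
```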

What does sinalized mean: # Used to sinalized if this evaluator is compute intensive.
'# The evaluation cached is built' => perhaps "cache" is better here?
Given most people use the dark theme, I wonder if we can improve the black on dark gray problem in the plots:

The MoBananasSearch search seems to have failed saying:

2023-02-06 19:07:03,551 - archai.discrete_search.algos.bananas — WARNING —  No mutations found after 30 tries for each one of the 10 parents.
2023-02-06 19:07:03,552 - archai.discrete_search.algos.bananas — INFO —  Found 0 new architectures satisfying constraints.
2023-02-06 19:07:03,552 - archai.discrete_search.algos.bananas — INFO —  No new architectures found. Stopping search ...

But I think this is saying "No new useful mutations found" and so it is choosing to stop and report what it found so far? I think the warning is misleading.

It would be very interesting to compare the models found by each search. Their Pareto curves look very similar, but are the model architectures the same or wildly different? MoBananasSearch found one bigger model with 0.989 accuracy that was twice as big, but everything else looks very similar in the range of latencies and accuracies. Some kind of conclusion blurb would be nice at the end of this notebook...

Questions about DARTS

For the DARTS complexity analysis, does anyone know how to derive the (k+1)*k/2 expression? Why 2 input nodes? How will the calculated value change if graph isomorphism is considered? Why "2+3+4+5" learnable edges? If a connection is absent, shouldn't the paper omit the added 1, since it does not contribute to the learnable edge configurations at all?
Why do the weights for normal cells and reduction cells need to be trained separately, as shown in Figures 4 and 5 below?
How should the nodes be arranged so that the NAS search actually converges with minimum error? Note: not every node is connected to every other node.
Why is GDAS 10 times faster than DARTS?
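
On the first question, the arithmetic can be worked through under the usual reading of the DARTS cell. The following are assumptions here, not something stated in this thread: 2 input nodes, 4 intermediate nodes, each intermediate node keeps 2 predecessors after discretization, and there are 7 non-zero operations. Under that reading, intermediate node k can receive edges from the 2 inputs and the k-1 earlier intermediate nodes, giving k+1 candidate edges, and choosing 2 of them gives C(k+1, 2) = (k+1)*k/2.

```python
from math import comb, prod

NUM_INPUT_NODES = 2         # assumed: outputs of the two previous cells
NUM_INTERMEDIATE_NODES = 4  # assumed cell size
NUM_NONZERO_OPS = 7         # assumed number of candidate ops, excluding "zero"

# Intermediate node k (1-based) sees both input nodes plus the k-1 earlier
# intermediate nodes, i.e. k+1 candidate incoming edges.
candidate_edges = [
    NUM_INPUT_NODES + (k - 1) for k in range(1, NUM_INTERMEDIATE_NODES + 1)
]
learnable_edges = sum(candidate_edges)  # 2 + 3 + 4 + 5

# Keeping 2 predecessors out of k+1 candidates: C(k+1, 2) = (k+1)*k/2.
predecessor_choices = [comb(edges, 2) for edges in candidate_edges]

# Each of the 2 kept edges per node picks one of the non-zero ops.
num_discrete_cells = prod(predecessor_choices) * NUM_NONZERO_OPS ** (
    2 * NUM_INTERMEDIATE_NODES
)

print(candidate_edges, learnable_edges)
print(predecessor_choices, num_discrete_cells)
```

This gives 14 learnable edges and roughly 10^9 discrete cells, without accounting for graph isomorphism.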

[BUG] my CNNSearchSpace implementation had to change in order for distributed training to work.

Describe the bug
I started with the quickstart CNNSearchSpace, but when I used it in an AsyncModelEvaluator it started crashing, because in that case the mutate and crossover methods are called with ArchaiModel objects that have an uninitialized arch model. All they have is an archid.

To Reproduce

See cnn_search_space.py and compare this with the one in the quickstart notebook.

Expected behavior

We need to change the EvolutionarySearchSpace API so that it encourages the creation of a search space that works in both sync and async modes.
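
One possible shape for such a search space, sketched with plain Python stand-ins rather than the real Archai classes: the archid itself encodes the configuration, so mutate can rebuild what it needs even when the arch object is missing. `Candidate`, `encode`, `decode` and `mutate` are hypothetical names used only for this illustration.

```python
import json
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    """Hypothetical stand-in for a model wrapper that may carry only an id."""
    archid: str
    arch: Optional[dict] = None  # None when only the id was passed around


def encode(config: dict) -> str:
    # Canonical JSON so equal configs always produce the same id.
    return json.dumps(config, sort_keys=True)


def decode(archid: str) -> dict:
    return json.loads(archid)


def mutate(parent: Candidate, rng: random.Random) -> Candidate:
    # Fall back to decoding the id when the arch itself is not available.
    config = dict(parent.arch) if parent.arch is not None else decode(parent.archid)
    key = rng.choice(sorted(config))
    config[key] = max(1, config[key] + rng.choice([-1, 1]))
    return Candidate(archid=encode(config), arch=config)


# The parent carries no arch, as in the async case described above.
parent = Candidate(archid=encode({"depth": 3, "width": 16}))
child = mutate(parent, random.Random(0))
print(child.archid)
```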

[BUG] onnx_model.py is generating onnxruntime.capi.onnxruntime_pybind11_state.Fail

To Reproduce

from archai.discrete_search.evaluators.onnx_model import AvgOnnxLatency
from archai.discrete_search.search_spaces.config import ArchConfig
from search_space.hgnet import StackedHourglass
from archai.discrete_search.api import ArchaiModel

arch_config = ArchConfig.from_file('config.json')
model = StackedHourglass(arch_config, num_classes=18)
archid = "123"
am = ArchaiModel(model, archid)
input_shape = (1, 3, 256, 256)

lat = AvgOnnxLatency(input_shape=input_shape, export_kwargs={'opset_version': 11})
lat.evaluate(am)

Expected behavior
Should just work.

Desktop (please complete the following information):

OS: Windows 11
Virtual Environment: conda
Python Version: 3.10

Additional context

Removing the tmpfile and writing to a file named "model.onnx" and then loading that in the ONNX inference session works fine. So there is some weird interplay between with tempfile.NamedTemporaryFile(delete=False) as tmp_file: and the ONNX inference session. Perhaps we could just write the onnx model to the given --output folder somewhere near the checkpoints?

I posted a minimal repro here, we'll see what they say:
microsoft/onnxruntime#15295
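
A sketch of the workaround described above, using a temporary directory instead of a named temporary file so that the exported file is fully written and closed before anything else opens it by name. Whether an open handle is really the cause here is an assumption (the Python documentation notes that on Windows a NamedTemporaryFile may not be openable a second time while it is still open). `with_exported_file` and its callbacks are hypothetical names.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_exported_file(
    export: Callable[[Path], None],
    consume: Callable[[Path], T],
    file_name: str = "model.onnx",
) -> T:
    """Export to a file inside a temporary directory, then consume it by path."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = Path(tmp_dir) / file_name
        # The exporter opens and closes the file itself, so no handle from
        # this function is open when the consumer reads it.
        export(path)
        return consume(path)


# In real use, `export` could wrap torch.onnx.export(model, sample, str(path))
# and `consume` could build an onnxruntime.InferenceSession from str(path).
size = with_exported_file(
    export=lambda path: path.write_bytes(b"placeholder model bytes"),
    consume=lambda path: len(path.read_bytes()),
)
print(size)
```

The consumer must release the file before returning, otherwise removing the temporary directory can fail on Windows.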

[BUG] Remove extra argument for EvolutionParetoSearch in tasks/text_generation/search.py

Bug description
tasks/text_generation/search.py#L129 has an extra argument "None" which is not required in archai/discrete_search/algos/evolution_pareto.py#L32

To Reproduce
Steps to reproduce the behavior:

cd tasks/text_generation
python search.py
See error:
Traceback (most recent call last):
  File "./search.py", line 126, in <module>
    algo = EvolutionParetoSearch(
TypeError: __init__() got multiple values for argument 'num_iters'

Possible Fix
Remove the 'None'
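
The failure mode can be reproduced in isolation. In the sketch below, `Search` is a hypothetical class whose signature is made up for illustration (it is not the real EvolutionParetoSearch signature); it only shows how one extra positional argument shifts the later arguments and collides with a keyword argument.

```python
class Search:
    """Hypothetical constructor with three positional parameters and one option."""

    def __init__(self, search_space, objectives, output_dir, num_iters=10):
        self.search_space = search_space
        self.objectives = objectives
        self.output_dir = output_dir
        self.num_iters = num_iters


def try_build(*args, **kwargs) -> str:
    try:
        Search(*args, **kwargs)
    except TypeError as error:
        return str(error)
    return "ok"


# The extra None pushes "out" into the num_iters slot, which is then also
# supplied by keyword, so Python reports multiple values for 'num_iters'.
print(try_build("space", "objectives", None, "out", num_iters=5))

# Without the extra argument the call succeeds.
print(try_build("space", "objectives", "out", num_iters=5))
```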

Visualize Model Searched[REG]

Description

I'm trying to visualize the model that has been searched.
I'm using the DARTS algorithm and have set a directory for saving the plots in the darts.yaml file under "nas.search.trainer.plotsdir". However, when I run it, I get the error shown in the traceback below.

Traceback (most recent call last):
  File "main.py", line 74, in <module>
    main()
  File "main.py", line 70, in main
    runner.run(search=not args.no_search, eval=not args.no_eval)
  File "/opt/conda/lib/python3.8/site-packages/archai/nas/exp_runner.py", line 49, in run
    search_result = self.run_search(conf['nas']['search'])
  File "/opt/conda/lib/python3.8/site-packages/archai/nas/exp_runner.py", line 35, in run_search
    return search.search(conf_search, model_desc_builder, trainer_class, finalizers)
  File "/opt/conda/lib/python3.8/site-packages/archai/nas/searcher.py", line 61, in search
    model_desc, search_metrics = self.search_model_desc(conf_search, model_desc,
  File "/opt/conda/lib/python3.8/site-packages/archai/nas/searcher.py", line 142, in search_model_desc
    search_metrics = arch_trainer.fit(data_loaders)
  File "/opt/conda/lib/python3.8/site-packages/archai/common/trainer.py", line 117, in fit
    self.post_epoch(data_loaders)
  File "/phinet_nas/darts_phinets_bilevel_arch_trainer.py", line 68, in post_epoch
    super().post_epoch(data_loaders)
  File "/opt/conda/lib/python3.8/site-packages/archai/nas/arch_trainer.py", line 53, in post_epoch
    self._draw_model()
  File "/opt/conda/lib/python3.8/site-packages/archai/nas/arch_trainer.py", line 74, in _draw_model
    draw_model_desc(self.model.finalize(), filepath=plot_filepath,
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 947, in __getattr__
    raise AttributeError("'{}' object has no attribute '{}'".format(
AttributeError: 'Model' object has no attribute 'finalize'

How can I visualize the model for each epoch?
I see there is a script, vis_model_desc.py. How can I use it? Its argument requires a modelDesc. I have the final model description in the yaml file. How can I convert it into a modelDesc that this script accepts?
