

dpro's Issues

AssertionError: No explicit directory found under server_logs

I tried to run the analysis script but got the error below. Here is my command:

python3 analyze.py --option optimize --platform TENSORFLOW --comm_backend NCCL --nccl_algo RING --pretty --path capture_file_tf --workspace capture_file_tf

Log:

[2021-01-03 14:49:35] [analyze.py:16] INFO - Namespace(ckpt=False, clean=False, comm_backend='NCCL', cost_model_tmp_dir='./', debug_traces=False, del_queue=False, delay_ratio=1.1, disable_revise=False, filter=None, force=False, full_trace=False, head=None, heat_window_size=5, logging_level='INFO', mcmc_beta=100, metadata_path=None, nccl_algo='RING', no_mutation=False, optimizer='MCMC', option='optimize', path='capture_file_tf', pcap_file_path=None, platform='TENSORFLOW', pretty=True, profile_duration=None, profile_start_step=None, progress=False, relabel=False, server_log_path=None, show_queue=False, simulate=False, sort=False, step_num=1, sub_option=None, trace_level='info', ucb_gamma=0.1, ucb_type='AVG', ucb_visual=False, update_barrier=False, workspace='capture_file_tf', xlsx=False, zmq_log_path=None)
[2021-01-03 14:49:37] [dataloader.py:19] INFO - Use TENSORFLOW metadata
WARNING:tensorflow:From /home/yuchen/repos/byteprofile-analysis/cost_model_xla/gen_dataset_utils.py:11: The name tf.GraphDef is deprecated. Please use tf.compat.v1.GraphDef instead.

WARNING:tensorflow:From /home/yuchen/repos/byteprofile-analysis/cost_model_xla/gen_dataset_utils.py:15: The name tf.NodeDef is deprecated. Please use tf.compat.v1.NodeDef instead.

WARNING:tensorflow:From /home/yuchen/repos/byteprofile-analysis/cost_model_xla/gen_dataset_utils.py:23: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.

2021-01-03 14:49:38.839543: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-01-03 14:49:38.875380: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: 
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:02:00.0
2021-01-03 14:49:38.876548: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 1 with properties: 
name: Tesla P100-PCIE-12GB major: 6 minor: 0 memoryClockRate(GHz): 1.3285
pciBusID: 0000:03:00.0
2021-01-03 14:49:38.877014: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2021-01-03 14:49:38.878811: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2021-01-03 14:49:38.880434: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2021-01-03 14:49:38.880841: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2021-01-03 14:49:38.882994: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2021-01-03 14:49:38.884649: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2021-01-03 14:49:38.889754: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-01-03 14:49:38.893663: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0, 1
Set cost model to use GPU 1
/home/yuchen/repos/byteprofile-analysis/capture_file_tf
Traceback (most recent call last):
  File "analyze.py", line 126, in <module>
    clct = Collector(path_list[0], comm_backend=args_.comm_backend, platform=args.platform)
  File "/home/yuchen/repos/byteprofile-analysis/collect.py", line 70, in __init__
    self.pm = PathManager(root_path)
  File "/home/yuchen/repos/byteprofile-analysis/trace_utils.py", line 751, in __init__
    self.dir_level = self.get_dir_level(self.path)
  File "/home/yuchen/repos/byteprofile-analysis/trace_utils.py", line 772, in get_dir_level
    level = recur_look_up(_dir)
  File "/home/yuchen/repos/byteprofile-analysis/trace_utils.py", line 770, in recur_look_up
    return 1 + recur_look_up(os.path.join(root, target_dir))
  File "/home/yuchen/repos/byteprofile-analysis/trace_utils.py", line 769, in recur_look_up
    assert target_dir is not None, "No explicit directory found under {}".format(root)
AssertionError: No explicit directory found under /home/yuchen/repos/byteprofile-analysis/capture_file_tf/server_logs

Here is my trace directory structure.

capture_file_tf
├── collect_data.sh
├── comm_traces
│   ├── server_0.pcap
│   ├── server_1.pcap
│   ├── worker_0.pcap
│   └── worker_1.pcap
├── log_option-optimize.txt
├── run_0
│   ├── bps_cache.pickle
│   ├── bps_comm_aligned.json
│   ├── bps_trace_final.json
│   ├── comm_timeline.json
│   ├── ip_to_rank.txt
│   ├── log_option-replay.txt
│   ├── server_timeline.json
│   ├── statistic.txt
│   ├── synthetic.json
│   ├── traces_0
│   │   ├── 0
│   │   │   ├── comm.json
│   │   │   ├── dag.gml
│   │   │   ├── final_graph.json
│   │   │   ├── final_graph.pbtxt
│   │   │   ├── run_meta.json
│   │   │   ├── temp.json
│   │   │   ├── temp_json.tar
│   │   │   ├── tensor_shapes.json
│   │   │   └── variables_meta.json
│   │   └── key_dict.txt
│   ├── traces_1
│   │   └── 0
│   │       ├── comm.json
│   │       ├── dag.gml
│   │       ├── final_graph.json
│   │       ├── graph.json
│   │       ├── run_meta.json
│   │       ├── temp.json
│   │       ├── temp_json.tar
│   │       ├── tensor_shapes.json
│   │       └── variables_meta.json
│   └── trail_dag.gml
└── server_logs
    ├── server_log_0.txt
    └── server_log_1.txt

7 directories, 37 files
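The traceback suggests that the directory-level lookup in trace_utils.py (recur_look_up) recursively descends into a subdirectory and asserts that one exists. It therefore fails on a directory that holds only files, such as server_logs. As a quick diagnostic, here is a minimal Python sketch that lists every directory under the trace root with no subdirectories, i.e. every place where that assertion could fire if the lookup descended into it. The helper name leaf_dirs is hypothetical and is not part of dPRO; it only inspects the layout and does not reproduce dPRO's actual lookup logic.

```python
import os
from typing import List


def leaf_dirs(root: str) -> List[str]:
    """Return paths (relative to root) of directories that contain no subdirectories.

    Hypothetical diagnostic helper, not dPRO code. Based on the traceback, the
    assertion "No explicit directory found under ..." fires when the level lookup
    descends into such a directory.
    """
    leaves = []
    # os.walk yields (dirpath, dirnames, filenames) for root and every directory below it.
    for dirpath, dirnames, _filenames in os.walk(root):
        # Skip the root itself; keep only directories with no child directories.
        if dirpath != root and not dirnames:
            leaves.append(os.path.relpath(dirpath, root))
    return sorted(leaves)
```

For the tree above, leaf_dirs("capture_file_tf") should return comm_traces, run_0/traces_0/0, run_0/traces_1/0 and server_logs, which makes it easy to see that server_logs is a files-only directory sitting next to the run_0 trace directory.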
