aws-neuron-driver's People

Contributors

amazon-auto, micwade-aws, samueldotj

aws-neuron-driver's Issues

No Package aws-neuron-dkms available

I started an Amazon Linux DLAMI with PyTorch 1.9.1 and followed the steps to install torch-neuron from this URL:

https://awsdocs-neuron.readthedocs-hosted.com/en/latest/neuron-intro/pytorch-setup/pytorch-install.html

While performing these steps:
# Install Neuron Driver
sudo yum versionlock delete aws-neuron-dkms
sudo yum install aws-neuron-dkms -y

Both commands fail:
versionlock delete has no matches
no package aws-neuron-dkms available

I checked the existing issue aws-neuron/aws-neuron-sdk#400, but it didn't help.
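
One thing worth ruling out with a "no package available" error is a missing repository definition. The following is a minimal sketch of such a check, not a confirmed diagnosis of this issue. The function name `neuron_repo_configured`, the default directory `/etc/yum.repos.d`, and the assumption that the repository section is called `[neuron]` (the repository name yum prints for this package in transaction output) are all illustrative.

```shell
# Hypothetical helper: succeeds if any .repo file in the given directory
# contains a line that is exactly "[neuron]".
# The directory defaults to /etc/yum.repos.d (an assumption).
neuron_repo_configured() {
    repo_dir="${1:-/etc/yum.repos.d}"
    # -q: no output, -s: ignore missing files, -x: whole-line match,
    # -F: treat the pattern as a fixed string, so the brackets are literal.
    grep -qsxF '[neuron]' "$repo_dir"/*.repo
}
```

A possible use is `neuron_repo_configured || echo "no [neuron] repository defined"` before retrying the install.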

Error 2 during apt install of aws-neuron-dkms 2.1.5.0 on Ubuntu 20.04

I'm seeing this error on an inf1.xlarge instance when trying to install the aws-neuron-dkms driver according to the docs:

DKMS make.log for aws-neuron-2.1.5.0 for kernel 5.11.0-1019-aws (x86_64)
Fri Oct  8 19:23:28 UTC 2021
make: Entering directory '/usr/src/linux-headers-5.11.0-1019-aws'
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_module.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_pci.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_mempool.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_dma.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_ring.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_ds.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_core.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_crwl.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_cdev.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_topsp.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_pid.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_reset.o
/var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_cdev.c: In function ‘ncdev_crwl_nc_range_mark’:
/var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_cdev.c:1131:3: error: ignoring return value of ‘copy_to_user’, declared with attribute warn_unused_result [-Werror=unused-result]
 1131 |   copy_to_user(param, &arg, sizeof(arg));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_cinit.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_mmap.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_p2p.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_nq.o
  CC [M]  /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_metrics.o
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:288: /var/lib/dkms/aws-neuron/2.1.5.0/build/neuron_cdev.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1849: /var/lib/dkms/aws-neuron/2.1.5.0/build] Error 2
make: Leaving directory '/usr/src/linux-headers-5.11.0-1019-aws'

Kernel info:
Linux localhost.lan 5.11.0-1019-aws #20~20.04.1-Ubuntu SMP Tue Sep 21 10:40:39 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

I've also tried a previous version (2.0.450.0) and received a similar error:

DKMS make.log for aws-neuron-2.0.450.0 for kernel 5.11.0-1019-aws (x86_64)
Fri Oct  8 19:28:46 UTC 2021
make: Entering directory '/usr/src/linux-headers-5.11.0-1019-aws'
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_module.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_pci.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_mempool.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_dma.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_ring.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_ds.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_core.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_crwl.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_cdev.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_topsp.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_pid.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_reset.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_cinit.o
/var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_cdev.c: In function ‘ncdev_crwl_nc_range_mark’:
/var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_cdev.c:1128:3: error: ignoring return value of ‘copy_to_user’, declared with attribute warn_unused_result [-Werror=unused-result]
 1128 |   copy_to_user(param, &arg, sizeof(arg));
      |   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_mmap.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_p2p.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_nq.o
  CC [M]  /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_metrics.o
cc1: all warnings being treated as errors
make[1]: *** [scripts/Makefile.build:288: /var/lib/dkms/aws-neuron/2.0.450.0/build/neuron_cdev.o] Error 1
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:1849: /var/lib/dkms/aws-neuron/2.0.450.0/build] Error 2
make: Leaving directory '/usr/src/linux-headers-5.11.0-1019-aws'
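
Both logs stop at the same statement: the return value of `copy_to_user` in `ncdev_crwl_nc_range_mark` is ignored, and because the build treats all warnings as errors, the unused-result warning aborts the compile. In a long parallel build log the failing line is easy to miss, so a small filter can help. This is a rough sketch; the function name `dkms_build_errors` is hypothetical, and the pattern assumes GCC-style `file:line:column: error:` diagnostics like the ones shown above.

```shell
# Hypothetical helper: print only the compiler error lines from a DKMS make.log.
dkms_build_errors() {
    log_file="$1"
    # GCC diagnostics have the form "path/file.c:LINE:COL: error: message".
    grep -E '^[^ ]+:[0-9]+:[0-9]+: error: ' "$log_file"
}
```

For example, `dkms_build_errors /var/lib/dkms/aws-neuron/2.1.5.0/build/make.log`, assuming the log sits in the build directory named in the output above.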

/dev/neuron0 is missing on AWS ECS Optimised AMI for Inferentia in Tokyo

Fingers crossed that this is the right place for this information.

As per https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html, I am using the AWS ECS optimised AMI for Inferentia (https://ap-northeast-1.console.aws.amazon.com/systems-manager/parameters/aws/service/ecs/optimized-ami/amazon-linux-2/inf/recommended/image_id/description?region=ap-northeast-1).

As of writing, this is ami-08e781fa005b6a4cf.

When the server starts, the /dev/neuron0 device is missing.

ll /dev/neuron*
ls: cannot access /dev/neuron*: No such file or directory
yum info aws-neuron-dkms
Loaded plugins: dkms-build-requires, priorities, update-motd, upgrade-helper
Installed Packages
Name        : aws-neuron-dkms
Arch        : noarch
Version     : 2.2.6.0
Release     : dkms
Size        : 393 k
Repo        : installed
Summary     : aws-neuron 2.2.6.0 dkms package
License     : Unknown
Description : Kernel modules for aws-neuron 2.2.6.0 in a DKMS wrapper.

This is fixed by reinstalling aws-neuron-dkms at the same version:

[root@ip-10-0-2-113 dev]# yum install aws-neuron-dkms
Loaded plugins: dkms-build-requires, priorities, update-motd, upgrade-helper
amzn2-core                                                                                                                                                       | 3.7 kB  00:00:00
Package aws-neuron-dkms-2.2.6.0-dkms.noarch already installed and latest version
Nothing to do
[root@ip-10-0-2-113 dev]# ls^C
[root@ip-10-0-2-113 dev]# yum reinstall aws-neuron-dkms
Loaded plugins: dkms-build-requires, priorities, update-motd, upgrade-helper
Resolving Dependencies
--> Running transaction check
---> Package aws-neuron-dkms.noarch 0:2.2.6.0-dkms will be reinstalled
--> Finished Dependency Resolution

Dependencies Resolved

========================================================================================================================================================================================
 Package                                          Arch                                    Version                                         Repository                               Size
========================================================================================================================================================================================
Reinstalling:
 aws-neuron-dkms                                  noarch                                  2.2.6.0-dkms                                    neuron                                   96 k

Transaction Summary
========================================================================================================================================================================================
Reinstall  1 Package

Total download size: 96 k
Installed size: 393 k
Is this ok [y/d/N]: y
Downloading packages:
aws-neuron-dkms-2.2.6.0.noarch.rpm                                                                                                                               |  96 kB  00:00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
  Installing : aws-neuron-dkms-2.2.6.0-dkms.noarch                                                                                                                                  1/1
Removing old aws-neuron-2.2.6.0 DKMS files...

-------- Uninstall Beginning --------
Module:  aws-neuron
Version: 2.2.6.0
Kernel:  4.14.248-189.473.amzn2.x86_64 (x86_64)
-------------------------------------

Status: This module version was INACTIVE for this kernel.

Running the post_remove script:
rmmod: ERROR: Module neuron is not currently loaded
depmod...

DKMS: uninstall completed.

------------------------------
Deleting module version: 2.2.6.0
completely from the DKMS tree.
------------------------------
Done.
Loading new aws-neuron-2.2.6.0 DKMS files...
Building for 4.14.248-189.473.amzn2.x86_64
Building initial module for 4.14.248-189.473.amzn2.x86_64
Done.

neuron.ko:
Running module version sanity check.

Running the pre_install script:
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/4.14.248-189.473.amzn2.x86_64/kernel/drivers/neuron//

Running the post_install script:
neuron

depmod...

DKMS: install completed.
  Verifying  : aws-neuron-dkms-2.2.6.0-dkms.noarch                                                                                                                                  1/1

Installed:
  aws-neuron-dkms.noarch 0:2.2.6.0-dkms

Complete!
[root@ip-10-0-2-113 dev]# ll /dev/neuron*
crw-rw-rw- 1 root root 248, 0 Nov 12 06:04 /dev/neuron0
[root@ip-10-0-2-113 dev]#
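
The workaround in the transcript above (check for the device node, reinstall the package if it is missing) could be scripted roughly as follows. This is a sketch under assumptions: the function names `neuron_device_present` and `ensure_neuron_device` are hypothetical, and the wrapper must run as root. It only automates the workaround and does not address why the module was inactive on first boot.

```shell
# Hypothetical helper: succeeds if at least one neuron* entry exists in the
# given directory (defaults to /dev).
neuron_device_present() {
    dev_dir="${1:-/dev}"
    for node in "$dev_dir"/neuron*; do
        # If the glob matched nothing, $node is the literal pattern and -e fails.
        [ -e "$node" ] && return 0
    done
    return 1
}

# Hypothetical wrapper: reinstall the driver package only when no device
# node is found. -y answers the confirmation prompt automatically.
ensure_neuron_device() {
    neuron_device_present || yum reinstall -y aws-neuron-dkms
}
```

Calling `ensure_neuron_device` from instance user data or a boot script is one possible placement.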
