Comments (18)
Hi @lastweek!
I have looked into the source code and found that LegoOS checks all the ports of the IB NIC, and if any of them uses Ethernet, it panics. However, the IB-capable NIC on CloudLab has two ports, one configured for IB and one for Ethernet, and this configuration is fixed and cannot be changed by software. Is it possible to use only the IB port for RDMA? If so, could you please give some instructions on modifying the LegoOS source code?
Thanks a lot!
from legoos.
Hi @fyc1007261,
Sorry for the inconvenience. The current driver does not support RoCE, so once RoCE is detected, it simply panics. (To be precise, I'm not sure whether it could work on RoCE; I don't remember whether I omitted some RoCE-related code in mlx4.)
The error message from make install is fine, as long as the kernel image is installed at /boot and you can find it in the GRUB menu.
"Please wait for enough IB MAD (number 7)" means both machines are waiting for MAD control messages from the InfiniBand switch. For the 1P-1M configuration, you need an IB switch, and both machines must be connected to it. What configuration are you using?
Hi @lastweek,
Thanks for your reply! I looked into the source code of LegoOS and found that if the driver finds that any port of the InfiniBand NIC is using Ethernet, it panics. However, the NIC on CloudLab has 2 ports, one using IB and the other using Ethernet. I may try to modify the LegoOS source code so that it thinks there is only one port and uses only that port. Is there anything I should pay attention to?
For the IB MAD problem, I am using another IB NIC, a QLogic QLE rather than a Mellanox one. I suspect it might not be supported by the driver?
Thanks so much for your help!
Hi @fyc1007261,
It's indeed a LegoOS bug. You should try the approach you proposed. Pay attention to the port number and make sure you are using the IB port.
About the QLogic QLE machine, are you running LegoOS on top of it? I don't think the mlx driver can work with that. Anyhow, can you tell me more about your hardware setup? Thanks.
Hi @lastweek,
It is true that the driver does not support QLE NICs. I am now trying the 1P-1M setting with Mellanox MX354A NICs and SX6036G/U1 IB switches (the Mellanox IS5035 is not available on CloudLab).
Hi @lastweek,
I finally succeeded in deploying the 1P-1M configuration by hard-coding all the num_ports variables to 1. Thanks a lot for your help!
Cool!!! Would you mind sharing your solution with us? Being able to run on CloudLab is a big deal!!
Hi @lastweek,
For the 1P-1M setting, I used r320 machines at Apt Utah with the CentOS 7 image and simply connected 3 raw PCs together (though only 2 are currently used). Then I modified the code in the drivers/ directory to hard-code all num_ports variables to 1 (because one of the r320 NIC's ports uses Ethernet). After this, just follow the instructions you provided on GitHub.
As for the storage node, there might be some problem with the CentOS image on CloudLab: so far I have not been able to install Linux 3.11.1 on it. I will keep trying.
Cool!! Let me know if you have issues installing 3.11.1. A very concise set of instructions:
1) Download 3.11.1 from kernel.org.
2) Copy /boot/config-3.10.xxx (the default config) into linux-3.11.1/ as .config.
3) make oldconfig
4) make modules_install && make install
5) Reboot into 3.11.1.
Let me know how it goes!
Hi @lastweek,
Something is still wrong with my CentOS setup or my 3.11.1 kernel, so I failed to install 3.11.1.
Is it possible to use a newer stable version such as 3.16.70? I found some differences in the kernel code that linux-modules uses, and I plan to modify some implementations in linux-modules to fit 3.16.70. Did the newer kernel only change some interfaces, or did it change some important internal code that might break LegoOS's storage node? In other words, is my plan likely to work?
Thanks a lot for your help!
Hi @fyc1007261,
That might work. I've done similar things (porting some old RDMA code to a 4.x kernel); that time I changed some protection-domain code and some other stuff. However, this might be time-consuming and error-prone. Before you proceed, can you share more details on the installation failure, e.g., panic messages?
Hi @lastweek,
I used these steps: cp /path/to/oldconfig .config -> make oldconfig (accepting the defaults for new options) -> make -> make modules_install -> make install. The 3.11.1 kernel just showed nothing after I pressed Enter to select 3.11.1 in the boot loader. I also tried the same steps in VMware and got the same result. VMware reports that the CPU has been disabled by the guest OS, and I cannot figure out where the problem is.
Have you ever run into such problems, or could you please give some suggestions? Thanks!
About your /path/to/oldconfig, which kernel version is it?
It is 3.10.0-957.12.2.el7.x86_64, which is the default version for CentOS 7 on CloudLab
Hi @fyc1007261, I uploaded an old config file from our machine. Though the machine is different, do you wanna give it a try?
Thanks so much! I will try it soon and report to you later.
Hi @lastweek,
Unfortunately, your config still doesn't work :(
I may try QEMU to find out what's going wrong inside the kernel.
By the way, I tried running the storage node with 3.16.70, but the processor monitor panicked with a fatal exception, saying "BUG: unable to handle kernel paging request at ffff880439d28b10". Might that be caused by the differences between the two kernel versions?
Thanks a lot for your support!
Hi @lastweek,
Thanks to your previous help, I have now succeeded in deploying 1P-1M-1S on Ubuntu 14.04 with the 3.11.1 kernel! I can run some simple Python scripts. It works quite well for printing messages, using standard-library modules (like time, copy, etc.), and using local modules (another Python script in the same folder). But when I import external modules (I tried numpy), the processor monitor panics, saying "unable to handle kernel paging request at <some address>". I wonder whether LegoOS supports external modules. I am using Python 2.7 with pip 19.2.1 and numpy 1.16.4; numpy was installed via pip. Could you please give some suggestions?
Thanks again for your patience!