wuklab / legoos

Disseminated, Distributed OS for Hardware Resource Disaggregation. USENIX OSDI 2018 Best Paper.

Home Page: http://LegoOS.io

License: GNU General Public License v2.0

Languages: Makefile 0.83%, C 94.77%, Assembly 1.02%, Shell 0.45%, C++ 2.30%, Objective-C 0.06%, Perl 0.31%, Lex 0.08%, Yacc 0.18%
Topics: datacenter, disaggregation, operating-system, rdma, whiskey

Contributors

chenyilun, chishiro, hythzz, lastweek, sumukh1991, yiying-zhang

legoos's Issues

Failed to reboot machine with Linux kernel 3.11.1

Hi, @lastweek

As LegoOS requires, I tried to install Linux kernel 3.11.1 on my server, which will act as the storage node. However, after installing kernel 3.11.1, the machine would not come back up after a reboot. I tried kernel 3.11.1 on both CentOS 7 and Ubuntu (14, 16, 18, 20).

On CentOS 7, after I rebooted the machine, the monitor showed a black screen with a cursor in the upper-left corner, and the system seemed to hang.

On Ubuntu (14, 16, 18, 20), the monitor showed "loading initial Ramdisk...." with a cursor at the start of the next line, and the system also seemed to hang.

Even in a virtual machine running Ubuntu, the system still hung and could not boot successfully with kernel 3.11.1.

I wonder whether there is a bug in this kernel, since it is not a long-term support version. I downloaded the kernel source code from the official Linux kernel site.

Could you please give us some suggestions? Looking forward to your reply.

Thanks!
Best regards

Compile Error with 1P-1M Setting

I encountered compile errors for both the processor manager and the memory manager.
I used Oracle VirtualBox (v5.2.22) and Vagrant (v2.2.1), and the guest OS was CentOS 7.2.
The main problem might be a configuration setting. I ran the following commands as the root user:

make defconfig
cp Documentation/configs/1P-1M-(Processor|Memory) .config
make

The final "make" command failed with the error below. I worked around it by commenting out line 39 of "managers/common.c", because I thought the storage manager is not needed for the 1P-1M setup. Is there any problem with this workaround?

...
  CC      managers/common.o
managers/common.c:39:3: Error: #error "Please adjust default storage node."
 # error "Please adjust default storage node."
   ^
make[1]: *** [managers/common.o] Error 1
make: *** [managers] Error 2

For the memory manager, I got another error, shown below. I think it is caused by the Makefile compiling things in the wrong order.
How can it be fixed? Alternatively, could you tell me which version of LegoOS is stable, and which version you used to evaluate performance in the OSDI '18 paper?
Finally, can LegoOS be tested with the processor manager and memory manager running on different nodes hosted on the same server?
Sorry for asking so many questions, but I hope you can answer them.

...
  AS      managers/memory/ramfs/piggy.o
managers/memory/ramfs/piggy.S: Assembler messages:
managers/memory/ramfs/piggy.S:15: Error: file not found: usr/general.o
make[3]: *** [managers/memory/ramfs/piggy.o] Error 1
make[2]: *** [managers/memory/ramfs] Error 2
make[1]: *** [managers/memory] Error 2
make: *** [managers] Error 2
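One alternative to commenting out line 39 of managers/common.c by hand is to make the check conditional on whether the build includes a storage node at all. A minimal sketch, assuming hypothetical macros CONFIG_HAS_STORAGE_NODE and DEF_STORAGE_NODE (illustrative names, not LegoOS's actual configuration symbols):

```c
/*
 * Hypothetical sketch: demand a default storage node only when the
 * build is configured with one. CONFIG_HAS_STORAGE_NODE and
 * DEF_STORAGE_NODE are illustrative names, not LegoOS symbols.
 */
#if defined(CONFIG_HAS_STORAGE_NODE) && !defined(DEF_STORAGE_NODE)
# error "Please adjust default storage node."
#endif

#ifdef CONFIG_HAS_STORAGE_NODE
static const int default_storage_node = DEF_STORAGE_NODE;
#else
/* No storage node in this setup (e.g. 1P-1M): use an invalid node ID. */
static const int default_storage_node = -1;
#endif

/* Returns 1 if a storage node is configured, 0 otherwise. */
static int has_storage_node(void)
{
	return default_storage_node >= 0;
}
```

With neither macro defined, as in a 1P-1M build, the file compiles and has_storage_node() returns 0; defining only CONFIG_HAS_STORAGE_NODE still triggers the original #error.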

The second memory node is not working when trying 1P-2M-1S with a GMM

Hi @lastweek,

We have successfully deployed 1P-1M-1S on CloudLab, and we are now experimenting with multiple processor/memory nodes. We used 5 nodes: #0 as the processor; #1 and #4 as memory; #2 as storage; and #3 as the global resource monitor. We also configured linux-modules/monitor/include/monitor_config.h so that the GMM knows the IDs of the memory nodes. After rebooting the processor and memory nodes, we ran make fit_install on the storage and GMM nodes, then make monitor_install on the GMM node and make storage_install on the storage node. However, when we ran an application that requires a large amount of memory, node #1 (the default memory node configured on all machines) used up all its memory and panicked, while node #4 did not seem to be doing any work. Is there anything we have left unconfigured, or anything we did wrong?

Thanks very much for your help!
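For illustration only: if every allocation is routed to the default memory node, a second memory node never receives work. The sketch below shows one hypothetical way a global monitor could choose among several memory nodes, picking the node with the most free pages that can still satisfy a request. struct mem_node and pick_memory_node are invented names; this is not the GMM's actual placement policy.

```c
#include <stddef.h>

/* Hypothetical per-node state a global monitor could track. */
struct mem_node {
	int id;                   /* node ID, e.g. as listed in monitor_config.h */
	unsigned long free_pages; /* last reported number of free pages */
};

/*
 * Return the ID of the memory node with the most free pages among
 * those with at least nr_pages free, or -1 if no node qualifies.
 */
static int pick_memory_node(const struct mem_node *nodes, size_t n,
			    unsigned long nr_pages)
{
	int best = -1;
	unsigned long best_free = 0;

	for (size_t i = 0; i < n; i++) {
		if (nodes[i].free_pages >= nr_pages &&
		    nodes[i].free_pages > best_free) {
			best = nodes[i].id;
			best_free = nodes[i].free_pages;
		}
	}
	return best;
}
```

With nodes #1 (100 free pages) and #4 (5000 free pages), a 200-page request goes to #4, and a request neither can satisfy returns -1.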

Compile error when building `linux-kernel` on storage node

The ib_alloc_pd function should be declared in LegoOS's ib_verbs.h. However, when I compile on my machine, the build picks up the CentOS kernel's ib_verbs.h instead, which has a different interface and different semantics. I wonder whether there is an error in the Makefile of linux-modules.

Kernel panics with "not syncing - no RoCE", while another cluster waits forever at "wait for enough IB MAD (number 7)"

Thanks very much for your quick reply on my last issue!

I have been trying to build a 1P-1M kernel, but it panics with: "not syncing - no RoCE".

I am quite sure that I am using the recommended InfiniBand NIC, and InfiniBand itself works well: the two machines can ping each other over it.

There is also a problem during make install, which reports: "Your kernel headers for kernel 4.0.0-lego+ cannot be found at /lib/modules/4.0.0-lego+/build or /lib/modules/4.0.0-lego+/source." Is this expected, or is it a real error? (I notice that modules are disabled in LegoOS, so should these directories exist at all?)

If it is not expected, where do you think the problem might be?

By the way, I also tried installing LegoOS on two other machines with a different InfiniBand NIC (not the recommended one) but exactly the same software configuration. These two kernels do not panic; instead, they get stuck at "Please wait for enough IB MAD (number 7)" and never continue. Can a different InfiniBand NIC cause this infinite wait?

I would really appreciate your help! Thanks again!

Modules Not Found

Hello @lastweek!
I am trying to compile and boot a default LegoOS setup (1P-1M) using two virtual machines, and I get the following error after running make install in the top-level LegoOS directory:

sh ./arch/x86/boot/install.sh 4.0.0-lego+ arch/x86/boot/bzImage \
System.map "/boot"
depmod: ERROR: could not open directory /lib/modules/4.0.0-lego+: No such file or directory
depmod: FATAL: could not search modules: No such file or directory
Kernel version 4.0.0-lego+ has no module directory /lib/modules/4.0.0-lego+

I believe the 4.0.0-lego+ directory is created after successfully running make in the linux-modules directory, which I was able to do.

Below is a detailed account of the complete process that I went through:
I first edited linux-modules/fit/fit_machine.c and net/lego/fit_machine.c to add the relevant LID and hostname mapping information. Then, starting in the top-level LegoOS directory, I did the following to compile the LegoOS kernel for the processor monitor:

make defconfig
cp Documentation/configs/1P-1M-Processor .config
make
cd linux-modules
make (I also switched to Linux kernel 3.11.1, as specified in the instructions)
cd ..
sudo make install 

For the memory monitor's kernel I followed a similar process, but, as specified, I also compiled the test user programs in the usr directory. I got the error above on both machines. I also attempted the install with a newer Linux kernel (4.4) and received the same error.

I am currently compiling LegoOS in a qemu-kvm VM on a host with the following specs:
InfiniBand NIC: MT27500 Family [ConnectX-3]
CPU: Intel(R) Xeon(R) CPU E5-2670 v3
OS: CentOS 7.2
GCC: 4.8.5

Both hosts are connected to a common switch. I have attached a folder with a copy of the LegoOS directory I am working with for your reference.

Please let me know if I can provide any more information.
Thanks!
LegoOS-Copy.zip

Port Storage/Global monitor to user level

They are currently in the kernel because our network layer (LITE, SOSP '17) is in the kernel. But there is really NO NEED for a kernel-level network stack here. Also, LITE's original authors (Shin-Yeh and Yiying) both see the need for a user-level RDMA stack.

We only need a good RPC framework, and there are many ways to build one: 1) the FaRM/LITE ring buffer, 2) HERD, 3) eRPC, and so on. Each favors either latency or throughput. The original ring buffer design is sufficient for now; which design to use is still to be decided.
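At its core, a FaRM/LITE-style ring buffer is a circular array of fixed-size message slots with a producer index and a consumer index. The sketch below is a purely local, single-machine C11 model of that idea, assuming exactly one producer and one consumer; it uses shared-memory atomics rather than RDMA writes, and RING_SLOTS, SLOT_SIZE, ring_push, and ring_pop are illustrative names, not LITE's API.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define RING_SLOTS 8  /* must be a power of two */
#define SLOT_SIZE  64 /* bytes per message slot */

/* Single-producer, single-consumer ring of fixed-size message slots. */
struct ring {
	_Atomic size_t head; /* next slot the consumer reads */
	_Atomic size_t tail; /* next slot the producer writes */
	char slots[RING_SLOTS][SLOT_SIZE];
};

/* Producer: copy msg into the next free slot; returns false if full. */
static bool ring_push(struct ring *r, const void *msg, size_t len)
{
	size_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	size_t head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (len > SLOT_SIZE || tail - head == RING_SLOTS)
		return false;
	memcpy(r->slots[tail & (RING_SLOTS - 1)], msg, len);
	/* Publish the slot only after its contents are written. */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/* Consumer: copy the oldest slot into out (SLOT_SIZE bytes); false if empty. */
static bool ring_pop(struct ring *r, void *out)
{
	size_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
	size_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head == tail)
		return false;
	memcpy(out, r->slots[head & (RING_SLOTS - 1)], SLOT_SIZE);
	/* Free the slot for the producer only after it has been copied out. */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}
```

Over RDMA, the producer would instead write slots directly into the receiver's registered memory and the consumer would poll them, but the head/tail bookkeeping follows the same pattern.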

Having user-level storage/global monitors would also:

  1. Greatly ease testing. No more kernel panics, no more.
  2. Simplify the storage code. No more awkward workarounds, no more.

Anyhow, this is a pure engineering effort. I'm not sure when we can finish it.

Some questions about LegoOS

Hi, @lastweek

While deploying LegoOS, I ran into some questions and hope you can help.

  1. According to the GitHub README, LegoOS can use RoCE. However, a RoCE port has no LID, or its LID is 0 (according to the ibstat output). If I want to use RoCE, how should I set the LID value?
  2. I also want to use VMs to deploy the processor and memory nodes, but the VM console goes black (no output) when I switch to the LegoOS kernel. I use virt+qemu. It seems I need to apply some configuration, such as option 1 in https://github.com/WukLab/LegoOS#442-setup-serial-connection. I don't understand the meaning of source path='/root/LegoOS-ttyS0'/. My machines don't have such a ttyS0 file.

Looking forward to your reply.

Best regards!

Possible network issues

I am very glad to see your wonderful ideas about current computing architecture. I wonder: as both the number and the variety of devices keep growing, will the network (wires and connections) grow explosively, since every pair of pComponents, mComponents, sComponents, and other new xComponents must be connected?
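A quick way to frame the concern: dedicating a wire to every pair of n components needs n(n-1)/2 links, which grows quadratically, whereas attaching each component once to a switched fabric needs n links. The helpers below (hypothetical names) only compute these two counts; they say nothing about how LegoOS itself is wired.

```c
/* Point-to-point links needed to connect every pair of n components. */
static unsigned long full_mesh_links(unsigned long n)
{
	return n * (n - 1) / 2;
}

/* Links needed when each component attaches once to a shared switch. */
static unsigned long switched_links(unsigned long n)
{
	return n;
}
```

For 100 components, a full mesh needs 4950 links, while a single switched fabric needs 100 (plus whatever links connect multiple switches together in a larger deployment).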

Running other page eviction policies

Hi @lastweek. Thank you so much for your help last time.

After reading the paper on LegoOS and learning about the different page eviction policies available, I wanted to try them out too, but I'm unable to find their implementation in the source code of this repository.

May I ask whether an implementation exists for us to play with?
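As a generic illustration of how two such policies differ, independent of the repository, the sketch below counts page misses under FIFO and LRU for a reference string and a fixed number of frames. find_frame, count_misses, and MAX_FRAMES are hypothetical names; this is not LegoOS code.

```c
#include <stddef.h>

#define MAX_FRAMES 16

/* Find page in frames[0..n); return its index or -1. */
static int find_frame(const int *frames, size_t n, int page)
{
	for (size_t i = 0; i < n; i++)
		if (frames[i] == page)
			return (int)i;
	return -1;
}

/*
 * Count misses for refs[0..nrefs) with nframes frames. Frames are kept
 * ordered oldest-first, so eviction always drops frames[0]. If lru is
 * nonzero, a hit moves the page to the newest end (LRU); otherwise hits
 * leave the order unchanged (FIFO). Returns (size_t)-1 if nframes is 0
 * or exceeds MAX_FRAMES.
 */
static size_t count_misses(const int *refs, size_t nrefs,
			   size_t nframes, int lru)
{
	int frames[MAX_FRAMES];
	size_t used = 0, misses = 0;

	if (nframes == 0 || nframes > MAX_FRAMES)
		return (size_t)-1;
	for (size_t r = 0; r < nrefs; r++) {
		int page = refs[r];
		int idx = find_frame(frames, used, page);

		if (idx >= 0) {
			if (lru) {
				/* Move the hit page to the most-recent end. */
				for (size_t i = (size_t)idx; i + 1 < used; i++)
					frames[i] = frames[i + 1];
				frames[used - 1] = page;
			}
			continue;
		}
		misses++;
		if (used == nframes) {
			/* Evict frames[0]: oldest loaded (FIFO) or least recent (LRU). */
			for (size_t i = 0; i + 1 < used; i++)
				frames[i] = frames[i + 1];
			used--;
		}
		frames[used++] = page;
	}
	return misses;
}
```

For the reference string 1, 2, 3, 1, 4, 1, 5 with 3 frames, FIFO incurs 6 misses and LRU incurs 5: LRU keeps page 1 after it is reused, while FIFO evicts it as the oldest-loaded page.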

Replacing IB with Eth

Hello @lastweek

I'm interested in replacing IB with Ethernet.

What I have in mind is simply to run each LegoOS component in a separate virtual machine, with each one talking to the others through an Ethernet driver such as e1000e and then through virtio on the host. For example, in a 1P-1M-1S configuration the path would be: pcomponent -> eth -> virtio -> host -> virtio -> eth -> mem-component or storage-component.

Do you think this is something we can achieve with minimal implementation effort?

Thanks,
Alireza
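One concrete difference when moving from RDMA to Ethernet is that every request must be framed and serialized explicitly before it goes on the wire. A minimal sketch, assuming a hypothetical 16-byte request header (struct msg_hdr, msg_hdr_pack, and msg_hdr_unpack are invented for illustration and are not part of LegoOS), encoded in network byte order with the POSIX htonl/ntohl functions:

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical request header sent from a pcomponent over Ethernet. */
struct msg_hdr {
	uint32_t opcode;   /* e.g. 1 = read page (illustrative value) */
	uint32_t src_node; /* sender's node ID */
	uint64_t addr;     /* virtual address being requested */
};

#define MSG_HDR_LEN 16

/* Serialize h into buf in network byte order (big-endian). */
static void msg_hdr_pack(const struct msg_hdr *h, unsigned char buf[MSG_HDR_LEN])
{
	uint32_t op = htonl(h->opcode);
	uint32_t src = htonl(h->src_node);
	uint32_t hi = htonl((uint32_t)(h->addr >> 32));
	uint32_t lo = htonl((uint32_t)h->addr);

	memcpy(buf, &op, 4);
	memcpy(buf + 4, &src, 4);
	memcpy(buf + 8, &hi, 4);
	memcpy(buf + 12, &lo, 4);
}

/* Parse a header produced by msg_hdr_pack back into host byte order. */
static void msg_hdr_unpack(const unsigned char buf[MSG_HDR_LEN], struct msg_hdr *h)
{
	uint32_t op, src, hi, lo;

	memcpy(&op, buf, 4);
	memcpy(&src, buf + 4, 4);
	memcpy(&hi, buf + 8, 4);
	memcpy(&lo, buf + 12, 4);
	h->opcode = ntohl(op);
	h->src_node = ntohl(src);
	h->addr = ((uint64_t)ntohl(hi) << 32) | ntohl(lo);
}
```

Serializing field by field (rather than sending the struct's raw bytes) keeps the wire format independent of compiler padding and host endianness on either side of the virtio path.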

Confusion between Physical and Virtual Caches

Hi @lastweek

I was reading your paper with great interest, and I noticed that it uses virtual caches. But current CPUs work with physically addressed caches, right?

This is my understanding:

Conventional System: Virtual Address -> TLB -> Physical Address -> SRAM Cache -> Memory

Your System: Virtual Address -> No Translation on CPU side! -> SRAM Cache -> Excache(DRAM Cache) -> Memory Component

So how did you manage to have a virtual CPU cache, given that CPU caches (SRAM) are designed to work with physical addresses?

Thanks
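To make the terms in the question concrete: a virtually indexed, virtually tagged cache derives both the set index and the tag directly from the virtual address, so no TLB lookup is needed on a hit. The sketch below assumes a hypothetical geometry (4 KB lines and 1024 sets); vcache_split and struct vcache_addr are illustrative names, and this is not a description of how LegoOS implements its Excache.

```c
#include <stdint.h>

/* Hypothetical geometry: 4 KB lines (one page each) and 1024 sets. */
#define LINE_SHIFT 12
#define SET_BITS   10

struct vcache_addr {
	uint64_t tag;    /* upper VA bits, compared on lookup */
	uint32_t set;    /* selects which set to search */
	uint32_t offset; /* byte within the line */
};

/* Split a virtual address into cache fields without any VA-to-PA translation. */
static struct vcache_addr vcache_split(uint64_t va)
{
	struct vcache_addr a;

	a.offset = (uint32_t)(va & ((1ULL << LINE_SHIFT) - 1));
	a.set = (uint32_t)((va >> LINE_SHIFT) & ((1ULL << SET_BITS) - 1));
	a.tag = va >> (LINE_SHIFT + SET_BITS);
	return a;
}
```

For example, virtual address 0x12345678 splits into offset 0x678, set 0x345, and tag 0x48. A real virtual cache must also distinguish address spaces (for instance by tagging entries with a process or address-space ID) and handle synonyms; the sketch omits both.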

Excache Replacement Policies

Hello,

I'm wondering why you chose to offer different replacement policies. I thought the default LRU policy (which is really just an approximation of LRU) would suffice for your case, but you also implemented FIFO.

What surprised me is that FIFO actually performs better than LRU (Figure 13), and I don't quite understand why.

Also, I was wondering why a fully associative Excache would not work here, leading you to choose a set-associative organization instead.

I would be thankful if you could provide some insights into this issue.

Thanks,
Alireza
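As a generic illustration of the associativity trade-off asked about: in a set-associative cache, a lookup searches only the few ways of one set, whereas a fully associative lookup may have to search every line. The sketch below assumes hypothetical sizes (64 sets, 4 ways) and uses FIFO replacement within each set; the names are invented, and this is not LegoOS's Excache code.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SETS 64 /* hypothetical number of sets */
#define WAYS     4  /* hypothetical associativity */

struct cache_set {
	uint64_t tags[WAYS];
	bool valid[WAYS];
	unsigned next_victim; /* FIFO pointer: next way to replace */
};

struct cache {
	struct cache_set sets[NUM_SETS];
};

/*
 * Look up a line number (address >> line shift). Only the WAYS entries
 * of one set are searched. On a miss, the line replaces the set's
 * oldest entry in FIFO order. Returns true on a hit.
 */
static bool cache_access(struct cache *c, uint64_t line)
{
	struct cache_set *s = &c->sets[line % NUM_SETS];
	uint64_t tag = line / NUM_SETS;

	for (unsigned w = 0; w < WAYS; w++)
		if (s->valid[w] && s->tags[w] == tag)
			return true; /* FIFO: no bookkeeping on a hit */

	s->tags[s->next_victim] = tag;
	s->valid[s->next_victim] = true;
	s->next_victim = (s->next_victim + 1) % WAYS;
	return false;
}
```

One general property visible here: FIFO does no bookkeeping on a hit, while LRU must update recency information on every access.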

Does LegoOS support socket programming? Memory node failed to compile...

Hi @lastweek @hythzz,

Thanks so much for your previous help! Our group is currently trying to run some more complex applications that use sockets on LegoOS. We set CONFIG_SOCKET_O_IB=y on all three (1P-1M-1S) nodes and enabled the SOCKET_SYSCALL configuration on the processor node. The processor node worked fine, while the memory node failed to compile with the following message:

net/built-in.o: In function `ibapi_sock_send_message':
/users/yifancai/LegoOS.skt/net/lego/fit_ibapi.c:268: undefined reference to `sock_send_message'
net/built-in.o: In function `ibapi_sock_receive_message':
/users/yifancai/LegoOS.skt/net/lego/fit_ibapi.c:274: undefined reference to `sock_receive_message'

I tried modifying some code so that the definitions could be found, but more errors occurred. Does LegoOS currently support socket programming? If so, are there any other configuration options that I should change?

Thanks again!
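The linker errors show that net/lego/fit_ibapi.c references sock_send_message and sock_receive_message, but no object linked into the memory node's kernel defines them. A common kernel pattern for this situation is a header that supplies stubs when a feature is compiled out. The sketch below uses hypothetical prototypes (the real LegoOS signatures are not shown in this thread) and assumes CONFIG_SOCKET_SYSCALL is the switch that enables the real definitions.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical prototypes: the actual signatures of sock_send_message()
 * and sock_receive_message() in LegoOS may differ.
 */
#ifdef CONFIG_SOCKET_SYSCALL
int sock_send_message(const void *buf, size_t len);
int sock_receive_message(void *buf, size_t len);
#else
/* Socket support compiled out (e.g. on a memory node): fail cleanly. */
static inline int sock_send_message(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -ENOSYS;
}

static inline int sock_receive_message(void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -ENOSYS;
}
#endif
```

With stubs like these, a node built without socket support still links, and any socket request that reaches it returns -ENOSYS instead of failing at link time.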

Accessing LegoOS via SSH

Hello @lastweek.

I am currently trying to set up LegoOS, and I am wondering whether it is easy to set up an SSH connection.
From the docs, it seems possible to send LegoOS's output to a serial port, but are there best practices for sending input as well, if SSH does not work?

Thank you!
