hewlettpackard / quartz
Quartz: A DRAM-based performance emulator for NVM
Home Page: https://github.com/HewlettPackard/quartz
License: Other
Hi, I'm trying to get Quartz to work on Skylake CPUs.
According to the paper, LDM_STALL is derived from L2 stalls, L3 hits, and L3 misses, which in turn are derived from performance counter events that differ across CPU microarchitectures. I looked up (with the papi_native_avail command) the events used on Haswell and found that most of them still exist on Skylake, except CYCLE_ACTIVITY:STALLS_L2_PENDING. The closest event I know of is CYCLE_ACTIVITY:STALLS_L2_MISS, which counts execution stalls while at least one L2 demand load is outstanding, but I'm not sure. Any idea which event is equivalent on Skylake?
By the way, I'm trying to access the native event counters instead of going through PAPI, for performance reasons. So I have to assemble the integer form of an event id, similar to the number 0x55305a3 here. Any useful references on how this event id is represented?
Hi guys!
I had a problem when I ran the command "sudo scripts/setupdev.sh load": it reports that kernel module loading failed. I don't know how to fix it; I was following the README step by step.
My OS is 4.15.0-46-generic #49-Ubuntu
And I believe the prerequisites have been installed successfully, because when I run apt-get install for each of them it says:
cmake is already the newest version (3.10.2-1ubuntu2).
libconfig-dev is already the newest version (1.5-0.4).
libnuma-dev is already the newest version (2.0.11-2.1).
uthash-dev is already the newest version (2.0.2-1).
I used "apt-get install linux-headers-$(uname -r)" to install the Linux headers, and it says:
linux-headers-4.15.0-46-generic is already the newest version (4.15.0-46.49).
I don't know if there is a version incompatibility problem. Could anybody do me a favour?
Big thanks!
I export the EMUL_LOCAL_PROCESSES environment variable with the number of emulated processes on the host. I also chose the NVM model and the DRAM+NVM model to run MPI programs and the PARSEC benchmarks as multiple processes, but I can't get it to work.
There is a fatal error in MPI_Finalize, and I don't know how to use this setup to test multiple programs. How can I deal with this problem?
I set Quartz to pure PM mode by setting physical_nodes = "0" in nvmemul.ini, and set both the read and write latency to 1000. Then I run a test program, which makes more than 100,000 malloc() calls, through runenv.sh; the runtime is about 0.13 seconds. If I run it without runenv.sh, the runtime is about 0.12 s. If I increase the read/write latencies to 10000 and run through runenv.sh, the runtime is about 0.22 s.
However, if I replace malloc()/free() with pmalloc()/pfree() in the program, the runtime is about 2.2 s. That means that in pure PM mode, pmalloc() and malloc() have an obvious performance gap. But based on my understanding of the README file, pmalloc() and malloc() should have similar performance in a pure PM environment. Am I missing something?
Hello, I have a question about emulating DRAM+NVM mode. In the nvmemul.ini file, which type of memory (DRAM or NVM) is affected by the latency settings? When I use NVM-only mode, I find the performance changes with different latencies (set in ./nvmemul.ini) even if I don't use pmalloc and pfree. Can you describe how to emulate DRAM+NVM mode? Thank you!
Looks like by default, only 3 CPUs are supported:
In /src/lib/cpu/known_cpus.h line 21:
cpu_model_t* known_cpus[] = {
&cpu_model_intel_xeon_ex_v3,
&cpu_model_intel_xeon_ex_v2,
&cpu_model_intel_xeon_ex,
0
};
My question is: can we add our own CPU model names to this list without causing any trouble, as long as the CPU is in one of the three supported processor families (Sandy Bridge, Ivy Bridge, and Haswell)?
scripts/runenv.sh <your_app>
This is the command mentioned for running an application.
I am trying to run the dhrystone-2.1 benchmark, but I am not sure how to run it on the Quartz tool.
Please let me know how to run the dhrystone-2.1 benchmark on Quartz.
My CPU is a 7th-gen Core i5, which is Kaby Lake rather than one of the three CPUs mentioned in the article. Can I build and run Quartz successfully?
Besides, can I run Quartz in a virtual machine with a Linux OS?
In my case (running Jikes RVM on NVM), I need a specific virtual memory range mapped to NVM, via some API like pmmap().
Can you give me some hints on where to start patching Quartz?
I tried to use Quartz on a machine with 12 CPUs. However, when I run htop after the emulation, only 2 CPUs seem to be active. How can I restore my CPUs to their initial state?
I don't understand from the documentation how to select the mode we want to use. As I understand it, we define the NVM parameters in the nvmemul.ini file, but how do we select which mode of the emulator will be used? Thanks.
I wrote a sample program that randomly allocates memory in DRAM (using malloc) and NVM (using pmalloc), plus a background thread that is supposed to print the total bytes allocated in NVM and DRAM every second.
#include <iostream>
#include <cstdlib>
#include <chrono>
#include <thread>
#include <pthread.h>
using namespace std::chrono;
size_t nvm_size = 0;
size_t dram_size = 0;
high_resolution_clock::time_point start;
high_resolution_clock::time_point stop;
bool status = true;
void print_all() {
stop = high_resolution_clock::now();
milliseconds time = duration_cast<milliseconds>(stop-start);
std::cout << time.count() << "\t" << nvm_size << "\t" << dram_size << std::endl;
}
void start_time() {
start = high_resolution_clock::now();
while (status) {
print_all();
std::this_thread::sleep_for(seconds(1));
}
}
void stop_time() {
status = false;
}
void add_nvm_size(size_t size) {
nvm_size += size;
}
void remove_nvm_size(size_t size) {
nvm_size -= size;
}
void add_dram_size(size_t size) {
dram_size += size;
}
void remove_dram_size(size_t size) {
dram_size -= size;
}
// void *allocate_nvm(size_t size) {
// return pmalloc(size);
// }
void *allocate_dram (size_t size) {
return malloc(size);
}
int main(int argc, char *argv[]) {
std::thread (start_time).detach();
int count=1;
while(count<=10000000) {
int random = rand() % 4;
if (random==0) {
allocate_dram (67108864);
add_dram_size(67108864);
// std::cout<<count<<"- Allocated in DRAM"<<"\tDRAM SIZE: "<<dram_size<<std::endl;
}
else if(random==1){
allocate_dram (67108864);
add_nvm_size(67108864);
// std::cout<<count<<"- Allocated in NVRAM"<<"\tNVRAM SIZE: "<<nvm_size<<std::endl;
}
else if(random==2){
if(dram_size>=67108864) {
remove_dram_size(67108864);
// std::cout<<count<<"- Freed from DRAM"<<"\tDRAM SIZE: "<<dram_size<<std::endl;
}
// else
// std::cout<<count<<"- Not Enough Memory Allocated in DRAM to be freed"<<"\tDRAM SIZE: "<<dram_size<<std::endl;
}
else if(random==3){
if(nvm_size>=67108864) {
remove_nvm_size(67108864);
// std::cout<<count<<"- Freed from NVRAM"<<"\tNVRAM SIZE: "<<nvm_size<<std::endl;
}
// else
// std::cout<<count<<"- Not Enough Memory Allocated in NVRAM to be freed"<<"\tNVRAM SIZE: "<<nvm_size<<std::endl;
}
count++;
}
stop_time();
return 0;
}
The program above outputs correctly outside Quartz: it prints a line every second. On the left is the time in milliseconds, followed by the bytes allocated in NVM and the bytes allocated in DRAM.
time NVM DRAM
0 201326592 0
1000 2885681152 1275068416
2000 30735859712 3288334336
3000 16911433728 138512695296
4040 37983617024 191797133312
5042 14159970304 129654325248
6361 38453379072 189918085120
7363 33554432000 108045271040
8365 15099494400 109521666048
9366 24763170816 117306294272
When I run this program with Quartz in hybrid mode, it prints output every 10 milliseconds instead:
0 268435456 7650410496
10 1879048192 10401873920
20 2415919104 6845104128
30 2483027968 12616466432
40 3556769792 11811160064
50 4496293888 11408506880
60 536870912 17783848960
70 11072962560 16575889408
80 8120172544 9663676416
90 8657043456 7583301632
100 4966055936 268435456
110 939524096 1006632960
120 1476395008 2617245696
130 1946157056 10066329600
140 1073741824 14898167808
150 2281701376 15502147584
160 1744830464 17448304640
...
So Quartz is not affecting the functionality of the thread, but it is affecting the thread's sleep time.
I have not set EMUL_LOCAL_PROCESSES; do I need to? Also, why would Quartz affect only the sleep time of an application thread?
When I run this:
./scripts/runenv.sh qemu-system --enable-kvm -cpu host -m 8192 -smp 2 -vcpu 0,affinity=0 -vcpu 1,affinity=1 -numa node,mem=4096,cpus=0 -numa node,mem=4096,cpus=1 -drive file=/home/temp/Dyang/centos7-200.qcow2,if=none,id=drive-virtio-disk,format=qcow2 -device virtio-blk-pci,bus=pci.0,drive=drive-virtio-disk,id=virtio-disk -net nic,model=virtio -net tap,script=no -monitor telnet:10.192.168.118:4444,server,nowait -balloon virtio
I get an unexpected error:
qemu-system: ……qemu-gfn/qemu/accel/kvm/kvm-all.c:2380: kvm_ipi_signal: Assertion `kvm_immediate_exit' failed
I set the debug level to 5 and found nothing in the Quartz output.
But when I run qemu-system without Quartz, it works.
In kvm_ipi_signal, kvm_cpu_kick is called to do atomic_set(&cpu->kvm_run->immediate_exit, 1).
In this reference (https://patchwork.ozlabs.org/patch/732808/):
The purpose of the KVM_SET_SIGNAL_MASK API is to let userspace "kick" a VCPU out of KVM_RUN through a POSIX signal. A signal is attached to a dummy signal handler; by blocking the signal outside KVM_RUN and unblocking it inside, this possible race is closed:
VCPU thread                       service thread
------------------------------------------------
check flag
                                  set flag
                                  raise signal
(signal handler does nothing)
KVM_RUN
However, one issue with KVM_SET_SIGNAL_MASK is that it has to take tsk->sighand->siglock on every KVM_RUN. This lock is often on a remote NUMA node, because it is on the node of a thread's creator. Taking this lock can be very expensive if there are many userspace exits (as is the case for SMP Windows VMs without Hyper-V reference time counter).
Since Quartz generates IPI interrupt-injection delay through remote NUMA node memory access, will this affect KVM? Does Quartz support QEMU? Does Quartz have some influence on KVM?
Quartz uses the two encodings 0x530cd3 and 0x5303d3 for the events MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM and MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM, respectively. However, these encodings are only documented in the Intel manual for Ivy Bridge, not Haswell. On Haswell, the encodings to be used should instead be 0x5304d3 and 0x5301d3, respectively.
Hello, my server's CPU is a Xeon E5-2630 v4 @ 2.20GHz, a Broadwell processor, and it's not supported.
Could you please modify the program to support this new processor?
My computer is a very common personal machine: Intel Core i3, Ubuntu 14.04. Can I install Quartz successfully?
Since Quartz doesn't have write memory latency implemented yet, as mentioned in the Limitations section of the README file, does this mean that any write operations performed in NVM-only mode or DRAM+NVM mode will have the same write latency as DRAM?
Hi, I'm trying to get Quartz to work on my Core (Skylake) CPUs.
According to the paper, the bandwidth model utilizes thermal control registers; on Xeon, the corresponding registers are THRT_PWR_DIMM[0:2]. I looked through the register documentation for Core, and there is no register named THRT_PWR_DIMM. Also, no Core register can set the max number of transactions during the 1 µs throttling time frame per power throttling. Is it possible for the bandwidth model to work on Core CPUs?
When I try to run a program I don't get the correct output, especially for NVM accesses. Is it due to this:
"tee: /sys/bus/event_source/devices/cpu/rdpmc: No such file or directory"?
Hey, I am getting the error "Unable to load kernel module" when I execute the command below:
sudo scripts/setupdev.sh load
How do I fix this?
I am having difficulty running bandwidth-model-building.sh: I get a segfault. I have checked the configuration files to make sure things are set as instructed, and with debugging I find that the segfault occurs when intel_xeon_ex_get_throttle_register's regs is set to 0x00 (image of the workspace below).
Would you have any suggestions on how to resolve this?
Additional Info:
Hello,
Bandwidth throttling worked for me only after also setting the THROTTLE_DDR_READ register in the __set_read_bw function, specifically for runs after the training phase, when the bandwidth model file is already present. Is this correct?
__set_read_bw() {
...
node->cpu_model->set_throttle_register(regs, THROTTLE_DDR_ACT,
read_bw_model.throttle_reg_val[point]);
//Added statement
node->cpu_model->set_throttle_register(regs, THROTTLE_DDR_READ,
read_bw_model.throttle_reg_val[point]);
...
}
Hi,
I found there is a pflush() function in the code. Do we need to call it in our user programs in order to inject the PM latency we want?
I've been trying to set up Quartz for a few days. After struggling far more than I should have, I've reached a dead end. When I attempt to run the make clean all command from within quartz-master/build, I run into the error pictured in the screenshot below. (This is my second time running the instruction, hence why it begins at 77%.)
I'm assuming the error is that nvmemul.ko is undefined. The only potential cause I can identify is that when I ran scripts/install.sh, the report said that 13 packages were not upgraded. I have configured my CMakeLists.txt file so that there were no errors during the cmake .. command. I haven't been able to find anything by searching, and a friend who is well-versed in Unix did not understand why this error occurred.
I have also tried following the build instructions from within quartz-master/src. The instructions are not clear about what is meant by "the emulator's source code root folder". However, this causes an error 4% into the cmake .. command, so I'm guessing quartz-master/src is not the solution.
Computer Information:
Intel® Core™ i7-2600S CPU @ 2.80GHz × 8 (Sandy Bridge, I believe)
AMD® Turks / AMD® Turks
Ubuntu 21.04, 64-bit
The Ubuntu installation is a partition running natively on a ~2013 iMac.
When I first run the bandwidth benchmark test in the benchmarktest directory, it shows "The number of physical nodes is greater than the number of memory-controller pci buses". The result figure is below:
It shows that the topology mc pci file was saved, but there is no data in /tmp/mc_pci_bus.
Then I run the bandwidth benchmark test again, and it shows that there is no complete memory-controller pci topology to be found and reports a segmentation fault. The result figure is below:
Thanks in advance for any help! My CPU model is Haswell.
I have read the README.md file and I am confused about the bandwidth emulation.
Consider a dual-socket NUMA environment in which node 1 is configured as a virtual NVM node. Does that mean all memory requests to node 1's local memory are affected by the bandwidth emulation, even if the process is running on node 0?
And what if a process running on node 1 accesses the local memory of node 0; will it be affected by the bandwidth emulation?
Sorry, these questions may seem stupid, because I'm not familiar with memory access in a NUMA environment.
My CPU is an Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz with two sockets.
When I load the module, it finds that this is Sandy Bridge.
Does Quartz support D+N (DRAM+NVM) mode on Sandy Bridge?
I run Quartz with my own C++ file, with the command:
g++ -I [Emulator_Path]/quartz/src/lib/ myprogram.cpp -L [Emulator_Path]/quartz/build/src/lib/ -lnvmemul
(It works well with a .c file and the gcc compiler.)
But it produces errors:
/usr/include/c++/6/ext/string_conversions.h: In constructor ‘__gnu_cxx::__stoa(_TRet ()(const _CharT, _CharT**, _Base ...), const char*, const _CharT*, std::size_t*, _Base ...)::_Save_errno::_Save_errno()’:
/usr/include/c++/6/ext/string_conversions.h:63:27: error: ‘errno’ was not declared in this scope
_Save_errno() : _M_errno(errno) { errno = 0; }
^
/usr/include/c++/6/ext/string_conversions.h: In destructor ‘__gnu_cxx::__stoa(_TRet ()(const _CharT, _CharT**, _Base ...), const char*, const _CharT*, std::size_t*, _Base ...)::_Save_errno::~_Save_errno()’:
/usr/include/c++/6/ext/string_conversions.h:64:23: error: ‘errno’ was not declared in this scope
~_Save_errno() { if (errno == 0) errno = _M_errno; }
^
/usr/include/c++/6/ext/string_conversions.h: In function ‘_Ret __gnu_cxx::__stoa(_TRet ()(const _CharT, _CharT**, _Base ...), const char*, const _CharT*, std::size_t*, _Base ...)’:
/usr/include/c++/6/ext/string_conversions.h:72:16: error: ‘errno’ was not declared in this scope
else if (errno == ERANGE
^
In file included from /usr/include/c++/6/bits/basic_string.h:5420:0,
from /usr/include/c++/6/string:52,
from /usr/include/c++/6/bits/locale_classes.h:40,
from /usr/include/c++/6/bits/ios_base.h:41,
from /usr/include/c++/6/ios:42,
from /usr/include/c++/6/ostream:38,
from /usr/include/c++/6/iostream:39,
from /home/lishuai/fwang/quartz/reram_test.cpp:2:
/usr/include/c++/6/ext/string_conversions.h:72:25: error: ‘ERANGE’ was not declared in this scope
else if (errno == ERANGE
The original error output has many more "'XXX' was not declared in this scope" messages, and I have already fixed some of them. But for the rest, I need help.
Has anybody met the same problem, or can you give some suggestions? Thank you.
Hello, I am trying to execute an application through the emulator. My application executes successfully on the native machine. I link it with the emulator by adding the following flags to the compilation:
-I/NVMemul/quartz/src/lib/ -L/NVMemul/quartz/build/src/lib/ -lnvmemul
However, when I try to execute the app through the runenv.sh script I receive the following error:
../quartz/scripts/../build/src/lib/libnvmemul.so
../quartz/scripts/../nvmemul.ini
../quartz/scripts/runenv.sh: line 57: 25128 Segmentation fault (core dumped) $@
I have executed applications successfully with these flags in the past. Is there anything else that I am missing?
Hello,
I have a problem with the NVM read delay.
In my case, as the size of the data increases, the NVM read delay seems to stop working in the middle of program execution; if the data size is small, it works well.
I attached a picture capturing, in debug mode, the part where the delay did not work.
What should I do?
I look forward to your reply.
My experiment setup:
When I compile Quartz on Ubuntu 16.04 with kernel 4.15.0.29, I get three errors:
Makefile:976: "Cannot use CONFIG_STACK_VALIDATION=y, please install libelf-dev, libelf-devel or elfutils-libelf-devel"
/home/hadi/code/quartz/build/src/dev/pmc.c: In function ‘pmc_ioctl_setcounter’:
/home/hadi/code/quartz/build/src/dev/pmc.c:171:9: error: implicit declaration of function ‘copy_from_user’ [-Werror=implicit-function-declaration]
if (copy_from_user(&q, (ioctl_query_setcounter_t*) arg, sizeof(ioctl_query_/home/hadi/code/quartz/build/src/dev/pmc.c: In function ‘pmc_ioctl_getpci’:
/home/hadi/code/quartz/build/src/dev/pmc.c:224:17: error: implicit declaration of function ‘copy_to_user’ [-Werror=implicit-function-declaration]
if (copy_to_user((ioctl_query_setgetpci_t*) arg, &q, sizeof(ioctl_q
I resolved the first error by installing libelf-dev. Note that this library is not included in the script scripts/install.sh. I resolved the other two errors by modifying pmc.c so that it includes linux/uaccess.h instead of asm/uaccess.h.
After making these changes, the build completes successfully.
I have used this sample code, where I use pmalloc for a linked list:
#include<stdio.h>
#include<stdlib.h>
#include "pmalloc.h"
typedef struct node
{
int data;
struct node *next;
}NODE;
void insertAtFront(NODE **head,int x)
{
NODE *new_node = (NODE*)pmalloc(sizeof(NODE));
new_node->data = x;
new_node->next = *head;
*head = new_node;
}
void insertAfter(NODE *prev,int x)
{
if(prev==NULL)
{
printf("prev can't be NULL\n");
return;
}
NODE *new_node = (NODE*)pmalloc(sizeof(NODE));
new_node->data = x;
new_node->next = prev->next;
prev->next = new_node;
}
void append(NODE **head,int x)
{
NODE *new_node = (NODE*)pmalloc(sizeof(NODE));
new_node->data = x;
new_node->next = NULL;
NODE *last = *head;
if(*head==NULL)
{
*head = new_node;
return;
}
while(last->next != NULL)
last = last->next;
last->next = new_node;
}
void printList(NODE *p)
{
while(p)
{
printf("%d->",p->data);
p = p->next;
}
printf("\n");
}
void deleteElement(NODE **p,int elem)
{
NODE *temp=*p;
NODE *prev;
if(temp != NULL && temp->data == elem) // if elem is at first node
{
*p = temp->next;
free(temp);
}
while(temp!=NULL && temp->data!=elem)
{
prev=temp;
temp=temp->next;
}
if(temp==NULL) return; // no such element
prev->next = temp->next;
free(temp);
}
void deleteAtPosition(NODE **p,int pos)
{
if(*p==NULL) return;
NODE *temp = *p;
if(pos==0)
{
*p = temp->next;
free(temp);
return;
}
int i;
for(i=0;temp!=NULL && i<pos-1;i++)
temp = temp->next; // ultimately gets previous node of the node to be deleted
if(temp==NULL || temp->next==NULL)
return;
NODE *next = temp->next->next;
free(temp->next);
temp->next = next;
}
int getLength(NODE *p)
{
int count = 0;
while(p)
{
count++;
p = p->next;
}
return count;
}
int getLengthRecursive(NODE *p)
{
if(p==NULL)
return 0;
return 1 + getLengthRecursive(p->next);
}
void swapNodes(NODE **p,int x, int y)
{
if(x==y)
return;
NODE *prevX=NULL, *prevY=NULL,*X=*p,*Y=*p;
while(X!=NULL && X->data != x)
{
prevX = X;
X = X->next;
}
while(Y!=NULL && Y->data != y)
{
prevY = Y;
Y = Y->next;
}
if(X==NULL || Y == NULL)
return;
if(prevX==NULL)
*p = Y;
else
prevX->next = Y;
if(prevY==NULL)
*p = X;
else
prevY->next = X;
NODE *temp = X->next;
X->next = Y->next;
Y->next = temp;
}
void reverse(NODE **p)
{
NODE *prev=NULL,*curr=*p,*next;
while(curr!=NULL)
{
next = curr->next;
curr->next=prev;
prev = curr;
curr = next;
}
*p = prev;
}
void reverseRecursive(NODE **p)
{
NODE *node = *p;
if(node == NULL)
return;
NODE *rest = (*p)->next;
if(rest==NULL)
return;
reverseRecursive(&rest);
node->next->next = node;
node->next = NULL;
*p = rest;
}
int main()
{
NODE *head = NULL;
append(&head,1);
insertAtFront(&head,2);
append(&head,3);
insertAfter(head->next,10);
printList(head);
printf("Length: %d \n",getLength(head));
printf("Length Recursive: %d \n", getLengthRecursive(head));
//deleteElement(&head,1);
printList(head);
//deleteAtPosition(&head,1);
printList(head);
printf("Length: %d \n",getLength(head));
printf("Length Recursive: %d \n", getLengthRecursive(head));
swapNodes(&head,2,1);
printList(head);
reverse(&head);
printList(head);
reverseRecursive(&head);
printList(head);
return 0;
}
My current directory contents look like this.
I have compiled the file using the following commands
gcc -I src/lib/ plinkedlist.c -L build/src/lib/ -lnvmemul
sudo scripts/setupdev.sh load
scripts/runenv.sh ./a.out
I get the correct program output, but the statistics report 0 NVM accesses, even though that shouldn't be the case.
Statistics Output:
===== STATISTICS (Thu Nov 23 22:22:17 2017) =====
PID: 18718
Initialization duration: 2136458 usec
Running threads: 0
Terminated threads: 1
== Running threads ==
== Terminated threads ==
Thread id [18718]
: cpu id: 0
: spawn timestamp: 632629839714
: termination timestamp: 632629839811
: execution time: 97 usecs
: stall cycles: 0
: NVM accesses: 0
: latency calculation overhead cycles: 0
: injected delay cycles: 0
: injected delay in usec: 0
: longest epoch duration: 0 usec
: shortest epoch duration: 0 usec
: average epoch duration: 0 usec
: number of epochs: 0
: epochs which didn't reach min duration: 0
: static epochs requested: 0
Is there any reason/mistake I'm making?
Hello, I would like to ask a question. My server's CPU model is a Xeon E5-2620 v4 @ 2.10GHz, and when executing the runenv.sh script I get the message: [16811] ERROR: No supported processor found. I want to determine whether this processor meets the requirements.
Hello,
during my experiments with pure PM mode, I found that the number of NVM accesses is very different in each trial, as follows.
I only changed the read and write latencies in nvmemul.ini.
Are there any other configurations I should set to get correct emulation results?
The program uses malloc() and free(), and I run the script after loading the nvmemul module:
scripts/runenv.sh prog.exe args
The following is the CPU information:
ERROR: ld.so: object 'scripts/../build/src/lib/libnvmemul.so' from LD_PRELOAD cannot be preloaded (cannot open shared object file): ignored.
/usr/bin/scala: line 19: cd: /usr/share/scala/bin
Hi, I am getting the above error when I try to run Scala programs. There are no issues when I run Java and C applications.
This is the code I am trying to run:
object ForLoop {
def main(args: Array[String]) {
var a = 0;
for( a <- 1 to 100){
println( "Value of a: " + a );
}
}
}
The above code works as usual with scala, but when run with Quartz it returns an error. The following is the command I used to run this code:
$ scripts/runenv.sh scala ForLoop
Note:
Hi,
I met the following error while compiling the Quartz code:
[root@localhost build]# make
[ 8%] Built target cpu
[ 82%] Built target nvmemul
[ 86%] Device]
make[5]: *** No rule to make target `/home/sbl/Quartz/quartz-master/build/src/dev/pmc.o', needed by `/home/sbl/Quartz/quartz-master/build/src/dev/nvmemul.o'. Stop.
make[4]: *** [module/home/sbl/Quartz/quartz-master/build/src/dev] Error 2
gmake[3]: *** [all] Error 2
make[2]: *** [src/dev/nvmemul.ko] Error 2
make[1]: *** [src/dev/CMakeFiles/dev_build.dir/all] Error 2
make: *** [all] Error 2
The environment I use is a 2-socket Xeon 5600, CentOS 7, Linux 4.10, gcc 4.8.5.
I have installed all the required packages listed in README.md, and I compile the code with the following steps:
mkdir build
cd build
cmake ..
make
and the aforementioned error occurs...
Any suggestions?
Thank you very much.
Hi,
I met the following error while compiling the Quartz code:
When I run make clean all following your steps, I get a problem:
[ 69%] Building C object src/lib/CMakeFiles/nvmemul.dir/stat.c.o
/home/ZHduan/quartz/src/lib/stat.c:19:20: fatal error: utlist.h: No such file or directory
#include "utlist.h"
^
compilation terminated.
make[2]: *** [src/lib/CMakeFiles/nvmemul.dir/stat.c.o] Error 1
make[1]: *** [src/lib/CMakeFiles/nvmemul.dir/all] Error 2
make: *** [all] Error 2
You said no specific Linux distribution or kernel version is required, so what's wrong?
The environment I use is an Intel(R) Xeon(R) CPU E5-2650 v3 @ 2.30GHz, CentOS, Linux 3.11.0, gcc version 4.8.5.
I'm running an OpenCV program with Quartz. The program reads lots of videos from a dataset and gets some frames from each video. But as the for loop goes on, the VideoCapture object's release function takes more and more time: at the beginning, release() takes a few milliseconds, then hundreds of milliseconds, and finally the program waits on release() for seconds.
Here is my program:
#include <fstream>
#include <iostream>
#include <string>
#include <cstdio>
#include <random>
#include <algorithm>
#include <opencv2/core/core.hpp>
#include <opencv2/core/version.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/highgui/highgui_c.h>
#include <opencv2/imgproc/imgproc.hpp>
#include <opencv2/opencv.hpp>
#include <sys/time.h>
#include "/home/liupai/hme-workspace/hme-opencv-test/quartz/src/lib/pmalloc.h"
using namespace std;
void ImageChannelToBuffer(const cv::Mat* img, char* buffer, int c)
{
int idx = 0;
for (int h = 0; h < img->rows; ++h) {
for (int w = 0; w < img->cols; ++w) {
buffer[idx++] = img->at<cv::Vec3b>(h, w)[c];
}
}
}
int data_size = 0;
int read_video_to_volume_datum(const char* filename, const int start_frm,
const int label, const int length, const int height, const int width,
const int sampling_rate, char** datum)
{
cv::VideoCapture cap;
cv::Mat img, img_origin;
int offset = 0;
int channel_size = 0;
int image_size = 0;
int use_start_frm = start_frm;
cout << "\n#######Start!!!! cap.open file" << endl;
cap.open(filename);
if (!cap.isOpened()) {
cout << "Cannot open " << filename << endl;
return false;
}
int num_of_frames = cap.get(CV_CAP_PROP_FRAME_COUNT) + 1;
if (num_of_frames < length * sampling_rate) {
cerr << filename << " does not have enough frames; having "
<< num_of_frames << endl;
return false;
}
offset = 0;
if (use_start_frm < 0) {
cerr << "start frame must be greater or equal to 0" << endl;
}
int end_frm = use_start_frm + length * sampling_rate - 1;
if (end_frm > num_of_frames) {
cerr << "end frame must be less or equal to num of frames, "
<< "filename: " << filename << endl;
}
if (use_start_frm) {
cout << "\033[31m"
<< "use_start_frm: " << use_start_frm
<< ", end_frame: " << end_frm
<< ", num_of_frames: " << num_of_frames
<< ", filename: " << filename
<< "\033[0m" << endl;
cap.set(CV_CAP_PROP_POS_FRAMES, use_start_frm - 1);
}
for (int i = use_start_frm; i <= end_frm; i += sampling_rate) {
if (sampling_rate > 1) {
cap.set(CV_CAP_PROP_POS_FRAMES, i);
}
if (height > 0 && width > 0) {
cap.read(img_origin);
if (!img_origin.data) {
cerr << filename << " has no data at frame " << i << endl;
if (*datum != NULL) {
pfree(datum, data_size);
}
cap.release();
return false;
}
cout << "resize img_origin" << endl;
cv::resize(img_origin, img, cv::Size(width, height));
} else {
cap.read(img);
}
if (!img.data) {
cerr << "Could not open or find file " << filename << endl;
if (*datum != NULL) {
pfree(datum, data_size);
}
cap.release();
return false;
}
if (i == use_start_frm) {
image_size = img.rows * img.cols;
channel_size = image_size * length;
data_size = channel_size * 3;
*datum = (char*)pmalloc(data_size*sizeof(char));
}
for (int c = 0; c < 3; c++) {
ImageChannelToBuffer(&img, *datum + c * channel_size + offset, c);
}
cout << "offset = " << offset << endl;
offset += image_size;
img_origin.release();
}
cout << "\033[32mstart cap.release()\033[0m" << endl;
struct timeval tv_begin, tv_end;
gettimeofday(&tv_begin, NULL);
cap.release();
gettimeofday(&tv_end, NULL);
cout << "cap.release(): " << 1000.0*(tv_end.tv_sec - tv_begin.tv_sec)
+ (tv_end.tv_usec - tv_begin.tv_usec)/1000.0 << " ms." << endl;
cout << "\033[32mend cap.release()\033[0m" << endl;
return true;
}
void shuffle_clips(vector<int>& shuffle_index){
std::random_device rd;
std::mt19937 g(rd());
std::shuffle(shuffle_index.begin(), shuffle_index.end(), g);
}
int main()
{
const string root_folder = "/home/liupai/hme-workspace/train-data/UCF-101/";
const string list_file = "/home/liupai/hme-workspace/workspace/C3D/C3D-nvram/examples/c3d_ucf101_finetuning/train_02.lst";
cout << "opening file: " << list_file << endl;
std::ifstream list(list_file.c_str());
vector<string> file_list_;
vector<int> start_frm_list_;
vector<int> label_list_;
vector<int> shuffle_index_;
int count = 0;
string filename;
int start_frm, label;
while (list >> filename >> start_frm >> label) {
file_list_.push_back(filename);
start_frm_list_.push_back(start_frm);
label_list_.push_back(label);
shuffle_index_.push_back(count);
count++;
}
shuffle_clips(shuffle_index_);
const int dataset_size = shuffle_index_.size();
const int batch_size = 30;
const int new_length = 8;
const int new_height = 128;
const int new_width = 171;
const int sampling_rate = 1;
char* datum = NULL;
int lines_id_ = 0;
const int max_iter = 20000;
for (int iter = 0; iter < max_iter; ++iter) {
for (int item_id = 0; item_id < batch_size; ++item_id) {
cout << "------> iter: " << iter << endl;
bool read_status;
int id = shuffle_index_[lines_id_];
read_status = read_video_to_volume_datum((root_folder + file_list_[id]).c_str(), start_frm_list_[id],
label_list_[id], new_length, new_height, new_width, sampling_rate, &datum);
if (read_status) {
pfree(datum, data_size);
}
lines_id_++;
if (lines_id_ >= dataset_size) {
// We have reached the end. Restart from the first.
cout << "Restarting data prefetching from start." << endl;
lines_id_ = 0;
}
}
}
cout << "$$$$$$$$$$$$$$ read file finish!!!!!!!!!!!!" << endl;
}
Here is a output:
# At the beginning
------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 65, end_frame: 72, num_of_frames: 179, filename: /home/liupai/hme-workspace/train-data/UCF-101/PlayingViolin/v_PlayingViolin_g24_c02.avi
...
start cap.release()
cap.release(): 3.018 ms.
end cap.release()
------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 1, end_frame: 8, num_of_frames: 202, filename: /home/liupai/hme-workspace/train-data/UCF-101/TrampolineJumping/v_TrampolineJumping_g18_c01.avi
...
start cap.release()
cap.release(): 3.062 ms.
end cap.release()
------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 81, end_frame: 88, num_of_frames: 296, filename: /home/liupai/hme-workspace/train-data/UCF-101/PommelHorse/v_PommelHorse_g12_c03.avi
...
start cap.release()
cap.release(): 2.453 ms.
end cap.release()
------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 272, filename: /home/liupai/hme-workspace/train-data/UCF-101/StillRings/v_StillRings_g22_c04.avi
...
start cap.release()
cap.release(): 2.146 ms.
end cap.release()
------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 225, end_frame: 232, num_of_frames: 252, filename: /home/liupai/hme-workspace/train-data/UCF-101/HeadMassage/v_HeadMassage_g08_c03.avi
...
start cap.release()
cap.release(): 2.136 ms.
end cap.release()
------> iter: 0
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 106, filename: /home/liupai/hme-workspace/train-data/UCF-101/Bowling/v_Bowling_g19_c07.avi
...
start cap.release()
cap.release(): 3.315 ms.
end cap.release()
# After about 400 iterations
------> iter: 437
#######Start!!!! cap.open file
use_start_frm: 113, end_frame: 120, num_of_frames: 376, filename: /home/liupai/hme-workspace/train-data/UCF-101/Kayaking/v_Kayaking_g13_c04.avi
...
start cap.release()
cap.release(): 301.021 ms.
end cap.release()
------> iter: 437
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 141, filename: /home/liupai/hme-workspace/train-data/UCF-101/ApplyLipstick/v_ApplyLipstick_g20_c04.avi
...
start cap.release()
cap.release(): 301.74 ms.
end cap.release()
------> iter: 437
#######Start!!!! cap.open file
use_start_frm: 209, end_frame: 216, num_of_frames: 230, filename: /home/liupai/hme-workspace/train-data/UCF-101/BlowDryHair/v_BlowDryHair_g18_c03.avi
...
start cap.release()
cap.release(): 302.311 ms.
end cap.release()
------> iter: 438
#######Start!!!! cap.open file
use_start_frm: 177, end_frame: 184, num_of_frames: 307, filename: /home/liupai/hme-workspace/train-data/UCF-101/BoxingPunchingBag/v_BoxingPunchingBag_g08_c01.avi
....
start cap.release()
cap.release(): 351.546 ms.
end cap.release()
------> iter: 438
#######Start!!!! cap.open file
use_start_frm: 49, end_frame: 56, num_of_frames: 113, filename: /home/liupai/hme-workspace/train-data/UCF-101/FrontCrawl/v_FrontCrawl_g21_c06.avi
...
start cap.release()
cap.release(): 292.598 ms.
end cap.release()
------> iter: 438
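As an aside, the gettimeofday() arithmetic in the snippet above is easy to get wrong; a small std::chrono helper (a sketch, not part of the original program) does the same millisecond measurement portably:

```cpp
#include <chrono>

// Portable alternative to the gettimeofday() arithmetic above: elapsed
// wall-clock time in milliseconds between two steady_clock time points.
double elapsed_ms(std::chrono::steady_clock::time_point begin,
                  std::chrono::steady_clock::time_point end) {
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

Usage would mirror the original: take `std::chrono::steady_clock::now()` before and after `cap.release()` and print `elapsed_ms(t0, t1)`.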
I just wanted to know if anyone has been able to run Scala programs on Quartz. If so, what changes did you make to run them?
If you hit this issue:
tee: /sys/bus/event_source/devices/cpu/rdpmc: No such file or directory
I think the problem might be that your Quartz build is running on a virtual machine. Per https://stackoverflow.com/questions/19763070/ubuntu-12-10-perf-stat-not-supported-cycles/44253130#44253130, I guess RDPMC is still unavailable on most virtual machines (at least I tried Ubuntu 14.04, 16.04, and 18.04 and CentOS 7.0 with Linux kernels 4.4 and 4.11, respectively).
I am still exploring other solutions to support virtual machines with Quartz.
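As a quick sanity check before loading the module, you can probe the sysfs knob that the setup script's `tee` writes to. This is a sketch, not part of Quartz; the helper name is mine:

```cpp
#include <fstream>
#include <string>

// Probe the sysfs knob that the setup script writes to via tee.
// Returns the current rdpmc mode, or -1 if the file is missing -- the
// symptom seen on most virtual machines, where the hypervisor does not
// expose the PMU to the guest.
int read_rdpmc_mode(const std::string& path) {
    std::ifstream f(path);
    int mode = -1;
    if (f) f >> mode;
    return mode;
}
```

On bare metal, `read_rdpmc_mode("/sys/bus/event_source/devices/cpu/rdpmc")` typically returns 1 or 2; -1 means the knob is absent, matching the "No such file or directory" failure above.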
When Quartz runs in DRAM+NVM mode, it emulates the NVM on one (remote) NUMA node and injects the extra latency (read latency, presumably).
So can I assume that memory accesses to the remote node's DRAM behave like NVM accesses?
If so, can I use numactl/mbind to bind the app's memory to that node so it runs in NVM? What should I change in nvmemul.ini?
Since the capacity of PCM can be larger than DRAM, can we set the NVM to be larger than the DRAM in the simulation?
Thank you very much.
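For the nvmemul.ini question above: the file is in libconfig syntax, and the latency numbers are what you would tune for an emulated-NVM experiment. The fragment below is illustrative only -- the values are placeholders, and the exact section/field names should be checked against the sample nvmemul.ini in your Quartz checkout:

```ini
# Illustrative fragment (libconfig syntax); values are placeholders.
latency:
{
    enable = true;
    read = 300;    # emulated NVM read latency in ns
    write = 600;   # emulated NVM write latency in ns
};
```

The DRAM/NVM capacity split itself is governed by which NUMA node backs the emulated NVM, so the NVM side can only be as large as the memory installed on that node.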