high-precision-congestion-control's People

Contributors

alibaba-oss, billykern, hiok2000, liyuliang001, rmiao

high-precision-congestion-control's Issues

compile error (trace_reader)

Hi yuliang, I have compiled the ns-3 simulator, but when I compile trace_reader I encounter the following problem.
[screenshot of the compile error]
The gcc and g++ version is 5.3.0.

Calculate instantaneous (real-time) throughput from trace

I'm capturing trace files from all nodes, but calculating the throughput of a flow does not seem to be easy, since the traces contain packet-level events. I have read the HPCC paper and it seems this has already been done, so I was wondering whether there is a script available for it. I couldn't find anything in the analysis folder. If not, what is the process to calculate flow throughput from the trace file, so that I can script it?

Thanks in advance for your help!

How to understand the 'ali_32host_10rack.txt' file

I found a topology file named "ali_32host_10rack.txt" in the "mix" directory. How should I interpret this topology? It appears to be a two-tier Clos architecture consisting of 28 switches. My confusion lies in whether there are 320 servers or if it follows the naming convention with 32 servers, each hosting 10 NPUs.

error: ambiguous overload for ‘operator<<’ (

I am getting an error while running the simulation as per the instructions given in the README file.
I am using ns-3.30.1 installed on my Ubuntu 20.04 LTS system.

From the simulation folder, I run the simulation with ./waf --run 'scratch/third mix/config.txt' as per the instructions.

Feedback on how to resolve this would be very helpful.

PFB the terminal output:

[ 695/1433] cxx: utils/print-introspected-doxygen.cc -> build/utils/print-introspected-doxygen.cc.4.o
In file included from ../utils/print-introspected-doxygen.cc:9:
../utils/print-introspected-doxygen.cc: In member function ‘void StaticInformation::DoGather(ns3::TypeId)’:
./ns3/log.h:322:44: error: ambiguous overload for ‘operator<<’ (operand types are ‘ns3::ParameterLogger’ and ‘StaticInformation*’)
  322 |           ns3::ParameterLogger (std::clog) << parameters;      \
../utils/print-introspected-doxygen.cc:269:3: note: in expansion of macro ‘NS_LOG_FUNCTION’
  269 |   NS_LOG_FUNCTION (this);
      |   ^~~~~~~~~~~~~~~
./ns3/log.h:409:20: note: candidate: ‘ns3::ParameterLogger& ns3::ParameterLogger::operator<<(T) [with T = StaticInformation*]’
  409 |   ParameterLogger& operator<< (T param)
      |                    ^~~~~~~~
In file included from /usr/include/c++/9/iostream:39,
                 from ../utils/print-introspected-doxygen.cc:1:
/usr/include/c++/9/ostream:691:5: note: candidate: ‘typename std::enable_if<std::__and_<std::__not_<std::is_lvalue_reference<_Tp> >, std::__is_convertible_to_basic_ostream<_Ostream>, std::__is_insertable<typename std::__is_convertible_to_basic_ostream<_Tp>::__ostream_type, const _Tp&, void> >::value, typename std::__is_convertible_to_basic_ostream<_Tp>::__ostream_type>::type std::operator<<(_Ostream&&, const _Tp&) [with _Ostream = ns3::ParameterLogger; _Tp = StaticInformation*; typename std::enable_if<std::__and_<std::__not_<std::is_lvalue_reference<_Tp> >, std::__is_convertible_to_basic_ostream<_Ostream>, std::__is_insertable<typename std::__is_convertible_to_basic_ostream<_Tp>::__ostream_type, const _Tp&, void> >::value, typename std::__is_convertible_to_basic_ostream<_Tp>::__ostream_type>::type = std::basic_ostream<char>&]’
  691 |     operator<<(_Ostream&& __os, const _Tp& __x)
      |     ^~~~~~~~
Waf: Leaving directory `/home/anindya/Anindya/2020-06-29_HPCC/High-Precision-Congestion-Control/simulation/build'
Build failed
 -> task in 'print-introspected-doxygen' failed (exit status 1): 
	{task 140561283524880: cxx print-introspected-doxygen.cc -> print-introspected-doxygen.cc.4.o}
['/usr/bin/g++', '-O0', '-ggdb', '-g3', '-std=gnu++11', '-Wno-error=deprecated-declarations', '-fstrict-aliasing', '-Wstrict-aliasing', '-pthread', '-pthread', '-I.', '-I..', '-I/usr/include/gtk-2.0', '-I/usr/lib/x86_64-linux-gnu/gtk-2.0/include', '-I/usr/include/pango-1.0', '-I/usr/include/atk-1.0', '-I/usr/include/gdk-pixbuf-2.0', '-I/usr/include/libmount', '-I/usr/include/blkid', '-I/usr/include/fribidi', '-I/usr/include/cairo', '-I/usr/include/pixman-1', '-I/usr/include/harfbuzz', '-I/usr/include/glib-2.0', '-I/usr/lib/x86_64-linux-gnu/glib-2.0/include', '-I/usr/include/uuid', '-I/usr/include/freetype2', '-I/usr/include/libpng16', '-I/usr/include/libxml2', '-DNS3_ASSERT_ENABLE', '-DNS3_LOG_ENABLE', '-DHAVE_PACKET_H=1', '-DHAVE_SQLITE3=1', '-DHAVE_IF_TUN_H=1', '-DHAVE_GSL=1', '../utils/print-introspected-doxygen.cc', '-c', '-o', 'utils/print-introspected-doxygen.cc.4.o']

Change Topology File Error

When I set the topology file to mix/topology.txt in the config file mix/config.txt (line 6), the simulation can run successfully.
However, if I use mix/fat.txt, the simulation cannot run, and the error message is shown below:

assert failed. cond="rate2kmin.find(rate) != rate2kmin.end()", msg="must set kmin for each link speed", file=../scratch/third.cc, line=830
terminate called without an active exception

Program received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.

Compilation failing with error: passing ‘const ns3::CommandLine’ as ‘this’ argument discards qualifiers

Hello,
I'm attempting to compile HPCC on an Ubuntu 16.04 LTS machine. I followed the instructions in #2 to get past the broadcom-node.h error, but after doing that I get a new error. I'm running gcc/g++ 5.4.0, btw.


[ 756/1761] cxx: src/wimax/test/wimax-fragmentation-test.cc -> build/debug/src/wimax/test/wimax-fragmentation-test.cc.3.o
../../src/core/test/command-line-test-suite.cc: In member function ‘void ns3::CommandLineTestCaseBase::Parse(const ns3::CommandLine&, int, ...)’:
../../src/core/test/command-line-test-suite.cc:64:24: error: passing ‘const ns3::CommandLine’ as ‘this’ argument discards qualifiers [-fpermissive]
   cmd.Parse (argc, args);
                        ^
In file included from ../../src/core/test/command-line-test-suite.cc:20:0:
./ns3/command-line.h:224:8: note:   in call to ‘void ns3::CommandLine::Parse(int, char**)’
   void Parse (int argc, char *argv[]);
        ^
Waf: Leaving directory `/home/jhl56/High-Precision-Congestion-Control/build/debug'
Build failed
 -> task in 'ns3-core-test' failed (exit status 1): 
	{task 140191403070416: cxx command-line-test-suite.cc -> command-line-test-suite.cc.3.o}
['/usr/bin/g++', '-O0', '-ggdb', '-g3', '-std=gnu++11', '-Wno-error=deprecated-declarations', '-fstrict-aliasing', '-Wstrict-aliasing', '-I../../src/core', '-fPIC', '-pthread', '-I.', '-I../..', '-DNS3_ASSERT_ENABLE', '-DNS3_LOG_ENABLE', '-DHAVE_PACKET_H=1', '-DHAVE_IF_TUN_H=1', '-DNS_TEST_SOURCEDIR="src/core/test"', '../../src/core/test/command-line-test-suite.cc', '-c', '-o', 'src/core/test/command-line-test-suite.cc.3.o']

Any help would be greatly appreciated.
Thanks

Bug in qbb-net-device.cc

Hi yuliang, I found a bug in the DequeueAndTransmit() method of qbb-net-device.cc. This method may end up calling itself recursively (SwitchNotifyDequeue calls SendPfc, which calls DequeueAndTransmit() again), and this recursion aborts the program.

Does HPCC require per-packet ACK?

Hi, I have a small question. HPCC needs ACKs to piggyback INT information, and I also notice that L2_ACK_INTERVAL is set to 1 in config.txt. Does this mean HPCC NICs need to support per-packet ACKs? Or can it also tolerate ACKs that are not per-packet?
Thank you.

Version

My system version may be inconsistent with the one required by HPCC's ns-3 code, which makes resolving the many software version dependencies very troublesome, so I want to solve this problem with Docker. Can you tell me the Ubuntu version, ns-3 version, and corresponding dependencies used during development?

About the throughput calculation formula

To observe the performance of the DCQCN algorithm, I ran the following command in the simulation directory:
python run.py --topo topology --trace flow --bw 25 --cc dcqcn --enable_trace 1

The content of topology.txt is:
7 1 6
0
0 1 100Gbps 0.001ms 0
0 2 100Gbps 0.001ms 0
0 3 100Gbps 0.001ms 0
0 4 100Gbps 0.001ms 0
0 5 100Gbps 0.001ms 0
0 6 100Gbps 0.001ms 0

The content of flow.txt is:
5
2 1 3 100 200000000 2
3 1 3 100 200000000 2
4 1 3 100 150000000 2
5 1 3 100 5000000 2
6 1 3 100 5000000 2
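For reference, this is how I read the two files above; a minimal sketch, assuming the first topology line is node_num switch_num link_num, the second line lists switch IDs, each link line is src dst rate delay error_rate, and each flow line is src dst priority dst_port size start_time (the field meanings are my guesses about the input format, not taken from the repo's documentation):

```python
# Hypothetical reading of the HPCC input formats above; field meanings
# are my assumptions, not confirmed by the repo.
def parse_topology(path):
    with open(path) as f:
        node_num, switch_num, link_num = map(int, f.readline().split())
        switches = set(map(int, f.readline().split()))   # switch node IDs
        links = []
        for _ in range(link_num):
            src, dst, rate, delay, err = f.readline().split()
            links.append((int(src), int(dst), rate, delay, float(err)))
    return node_num, switches, links

def parse_flows(path):
    with open(path) as f:
        flows = []
        for _ in range(int(f.readline())):   # first line: number of flows
            src, dst, pri, dport, size, start = f.readline().split()
            flows.append((int(src), int(dst), int(pri), int(dport),
                          int(size), float(start)))
    return flows
```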

After the simulation finished, mix_topology_flow_dcqcn.tr was generated in the mix directory, and I parsed this trace file with trace_reader.

To obtain the throughput of each flow, I used the following method: for a given flow, divide the difference in acknowledged size between adjacent ACK packets received by the sender by the time difference between them. For example, the following snippet shows two adjacent ACK packets received on sender #3:
...
2052481446 n:3 1:0 0 Recv ecn:0 0b000101 0b000301 100 10000 A 0x00 3 178993000 0 60
...
2052481697 n:3 1:0 0 Recv ecn:0 0b000101 0b000301 100 10000 A 0x00 3 178994000 0 60
...

The throughput over this interval is (178994000 B - 178993000 B) * 8 / (2052481697 ns - 2052481446 ns) = 31.87 Gbps.
[Question 1] Why is the throughput computed this way larger than the NIC bandwidth set in the simulation command (25 Gbps)?

Computing the throughput of all 5 flows in this way gives the result below:
[screenshot of the throughput curves]
[Question 2] Why doesn't the throughput follow the expected trend of "line rate at first, then a rapid decrease converging to fairness"?
[Question 3] What is the throughput calculation formula used in the paper?
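The calculation above can be scripted directly; a minimal sketch, assuming (as in the snippet) that field 0 of an ACK trace line is the timestamp in ns and field 13 is the acknowledged size in bytes:

```python
def ack_throughput_gbps(lines):
    """Throughput between adjacent ACKs: delta(acked bytes) * 8 / delta(ns)."""
    events = []
    for line in lines:
        f = line.split()
        events.append((int(f[0]), int(f[13])))  # (timestamp_ns, acked_bytes)
    rates = []
    for (t0, s0), (t1, s1) in zip(events, events[1:]):
        rates.append((s1 - s0) * 8 / (t1 - t0))  # bits per ns == Gbps
    return rates

acks = [
    "2052481446 n:3 1:0 0 Recv ecn:0 0b000101 0b000301 100 10000 A 0x00 3 178993000 0 60",
    "2052481697 n:3 1:0 0 Recv ecn:0 0b000101 0b000301 100 10000 A 0x00 3 178994000 0 60",
]
print(round(ack_throughput_gbps(acks)[0], 2))  # 31.87
```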

Regenerate fig 11c from HPCC paper

Hi, I am trying to recreate fig 11c (fat-tree topology with 50% background fbHadoop traffic). I use run.py to generate config files for each scheme. I am able to get expected results for HPCC and DCTCP, but for DCQCN and TIMELY my values are much more inflated than in the paper. I was wondering if there is any optimization done for DCQCN and TIMELY for the experiments in paper that I am missing out in my experiments, like some particular parameter value?

Thanks

The initial value of the rate on the first CNP

The rate on receiving the first CNP is set to 1.0 × the current rate here; however, the rate on the first CNP is set to 0 on Mellanox CX4. Could you please explain the reasoning behind this change?

error when build under simulation for ns3

[screenshot of the build error]

I am trying to build the ns-3 simulator with the command:

./waf --run 'scratch/third mix/config.txt'

and I get the error in the screenshot above. Any idea what might cause this? It's on Ubuntu.

pfc.txt

Hi yuliang. I tried three very large flows with the same destination, but pfc.txt is empty. The congestion control is DCQCN. Is this normal?

traffic gen for incast

Hi developers:

Could you point out how to use traffic_gen to add incast traffic? I guess traffic_gen in the current repo does not include a way to generate incast traffic.

How to count the number of PFC pause frames of a node?

In the pfc.txt file there are only the pause and recovery times of each node, so there is no way to count the number of PFC pause frames.

My idea is to analyze the .tr file to count the number of PFC pause frames received by a node, but I don't know how to do it. What would you suggest?

Thank you.

I want to draw a picture like this.

[figure: a compact and typical network scenario]

the code part

I am a bit curious where the code implementation of Algorithm 1 described in the paper is. It does not seem to be in RdmaClient.

Flow rate or window size

Dear all,

After reading the HPCC paper and the rdma-hw.cc code, I found that the algorithm introduced in the paper adjusts the window size, but rdma-hw.cc adjusts the flow rate. Is this because the RTT is assumed to be close to baseRTT, so that adjusting the window size and adjusting the flow rate are equivalent?

Thank you!

@liyuliang001

Tests Fail when configuring project with "--enable-tests"

Tests currently fail when the project is configured as described below.
The gcc/g++ version is 4.9.2 and Python is version 2.7.18.

The command used to configure is:
CC='path/to/4.9.2/gcc' CXX='/path/to/4.9.2/g++' ./waf configure --enable-examples --enable-tests

The tests below are failing, some additional examples are crashing.

FAIL: TestSuite drop-tail-queue
FAIL: TestSuite udp
FAIL: TestSuite ipv6-fragmentation
FAIL: TestSuite udp-client-server
FAIL: TestSuite routing-olsr-regression
CRASH: TestSuite time
FAIL: TestSuite devices-mesh-flame-regression
FAIL: TestSuite devices-mesh-dot11s-regression
FAIL: TestSuite epc-s1u-downlink
FAIL: TestSuite epc-s1u-uplink
FAIL: TestSuite animation-interface
FAIL: TestSuite ns3-tcp-state
FAIL: TestSuite lte-epc-e2e-data
FAIL: TestSuite ns3-tcp-no-delay
FAIL: TestSuite routing-aodv-regression
FAIL: TestSuite ns3-tcp-socket

Is this a concern? Does the RDMA implementation break compatibility with other protocols?

Question on PACKET_PAYLOAD_SIZE in config

Hi all.

According to config_doc.txt the PACKET_PAYLOAD_SIZE is in KB.

When I looked at CalculateRoute in scratch/third.cc, packet_payload_size is multiplied by 8, which seems to suggest that packet_payload_size is in bytes.

Could you please confirm if this is the case? Thank you!
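For concreteness, here is the arithmetic behind my reading (the link rate is a hypothetical example value; only the ×8 comes from third.cc):

```python
packet_payload_size = 1000   # the config value in question
link_rate_bps = 100e9        # hypothetical 100Gbps link
# third.cc multiplies by 8 before dividing by the link rate, which only
# yields a sensible serialization time if the value is in bytes:
tx_time_s = packet_payload_size * 8 / link_rate_bps
print(tx_time_s)  # 8e-08, i.e. 80 ns for a 1000-byte payload at 100Gbps
```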

Possible bug when creating more than one link between two nodes

If I want to create 4 links between one leaf switch and one spine switch, what should I write in topology.txt? If I write:
0 8 100Gbps 0.001ms 0
0 8 100Gbps 0.001ms 0
0 8 100Gbps 0.001ms 0
0 8 100Gbps 0.001ms 0
the flows are only transported on the fourth link (the problem persists even if the links have different bandwidths). This might mean that between two nodes the simulator only counts the final link in topology.txt, or that this isn't the right way to create multiple links between two nodes (if so, how should it be done?).
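The behavior I suspect can be illustrated in miniature; this is only a guess at the cause (a loader keyed by node pair), not the actual topology-loading code:

```python
# If the loader keys links by (src, dst), each repeated line replaces the
# previous one, and only the last of the four parallel links survives:
links = {}
for line in ["0 8 100Gbps 0.001ms 0"] * 4:
    src, dst, rate, delay, err = line.split()
    links[(src, dst)] = (rate, delay, err)
print(len(links))  # 1, not 4
```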

The parameters for DCQCN

Hi, Yuliang:

I have a problem with DCQCN's parameters. From the configuration in run.py and the description in your paper:

Kmin = 100KB × Bw/25Gbps and Kmax = 400KB × Bw/ 25Gbps according to our experiences (no vendor suggestion available). For DCTCP, we set Kmin = Kmax = 30KB × Bw/10Gbps according to [8].

For 100Gbps links, we set Kmin=400KB and Kmax=1600KB. But in third.cc, the headroom is only 3×BDP, which is smaller than Kmax?

uint32_t headroom = rate * delay / 8 / 1000000000 * 3;
std::cout << "switch head room size: " << headroom << std::endl;
sw->m_mmu->ConfigHdrm(j, headroom);

Isn't your DCQCN Kmin too large in the experiment? I see that in the DCQCN paper the parameters are Kmin=4KB, Kmax=200KB, while in the DCTCP paper the parameter is K=60KB. I think a lower K would make the average queue shallower and reduce the FCT for flows whose size is smaller than the BDP.

Why do you set the K value much larger? Is there anything I missed?
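For concreteness, plugging numbers into the headroom line above (the 1µs delay is a hypothetical per-link value I chose; only the 3×BDP formula and the 400KB/1600KB settings come from the text above):

```python
rate = 100_000_000_000   # 100Gbps in bps
delay = 1000             # hypothetical 1us link delay, in ns
# third.cc: headroom = rate * delay / 8 / 1000000000 * 3  (3 x BDP, in bytes)
headroom = rate * delay // 8 // 1_000_000_000 * 3
kmin, kmax = 400_000, 1_600_000   # 400KB / 1600KB for 100Gbps, per the quote
print(headroom, headroom < kmax)  # 37500 True
```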

About the setting of ecnbits

I read the DCQCN algorithm in rdma-hw, but I could not find where the ECN bits are set; I only found GetIpv4EcnBits in ReceiveUdp, after which a CNP is generated based on the ECN bits:

int RdmaHw::ReceiveUdp(Ptr<Packet> p, CustomHeader &ch){
	uint8_t ecnbits = ch.GetIpv4EcnBits();
	uint32_t payload_size = p->GetSize() - ch.GetSerializedSize();

	// TODO find corresponding rx queue pair
	Ptr<RdmaRxQueuePair> rxQp = GetRxQp(ch.dip, ch.sip, ch.udp.dport, ch.udp.sport, ch.udp.pg, true);
	if (ecnbits != 0){
		rxQp->m_ecn_source.ecnbits |= ecnbits;
		rxQp->m_ecn_source.qfb++;
	}

GetIpv4EcnBits itself only takes the last 2 bits of the ToS field:
uint8_t CustomHeader::GetIpv4EcnBits (void) const{ return m_tos & 0x3; }

I also skimmed red-queue, which seems unrelated to DCQCN.
So I would like to know where the setting of the ECN bits is implemented.

Large Topos?

Have you tried extremely large topologies?
I found that Ptr<RdmaHw> rdmaHw = CreateObject<RdmaHw>() or sw->SetAttribute("CcMode", UintegerValue(cc_mode)) in third.cc fails if the topology has more than 6K nodes.

Local Ack Timeout?

Hi, yuliang.
I find that all lost packets are recovered via NAK; the implementation does not include a local ACK timeout. Would retransmissions triggered by timeout affect the tail latency?

Some issues after closing ECN

After turning off ECN, the fct of small traffic will be very long, while the fct of large traffic will be very short. Why is this?
[screenshot 2024-05-29 14-11-02]

Only switch-node 320 generated in trace (fat-tree)

Hi all,

I have run the simulation and generated the traces using the trace reader according to the README. However, I noticed that for the fat-tree topology, only one switch node (node 320) is recorded by the trace reader. Can I check whether this is a configuration issue on my end?

Thanks!

DCQCN's RP algorithm implementation in rdma-hw.cc

Hi developers:

I have a problem with DCQCN's RP algorithm implementation in rdma-hw.cc. In the DCQCN paper, the rate reduction comes before the alpha update, but in rdma-hw.cc the rate decrease is scheduled after the alpha update:

```
// schedule alpha update
ScheduleUpdateAlphaMlx(q);
// schedule rate decrease
ScheduleDecreaseRateMlx(q, 1); // add 1 ns to make sure rate decrease is after alpha update
```
Could you point out the reason for that? 

about IRN

Hi yuliang, I have read your HPCC paper. It seems that this simulator implements IRN; I want to know how I should adjust the parameters to enable IRN.

Compile failed

Dr. Miao,
I have some problems compiling the HPCC source code. The error information is below:

[screenshot of the compile error]

My gcc version is 5.4.0.
Looking forward to your reply! Thanks a lot!

Packet loss occurs after PFC is triggered

Hi, yuliang.
I have run a 1000-scale incast simulation based on your code, where each flow consists of one BDP of packets (i.e., 8 MTU-sized packets).
PFC is triggered. Unfortunately, packet loss also occurs, which means that PFC does not take much effect.
Can you help me fix this issue with PFC, or give me some advice? Thanks a lot!

The topology I used is a two-tier topology (the same topology used in Homa), which consists of 9 ToR switches and 4 core switches connected via 40Gbps links. Each ToR switch connects to 16 servers via 10Gbps links.

make trace_reader succeeds, but trace_reader cannot read the .tr file!!!

Master, thanks for your guidance! I was afraid I couldn't describe it clearly in English, so I wrote in Chinese!

make trace_reader runs normally and produces a trace_reader binary. But when I then run ./trace_reader mix_topology_flow_dcqcn.tr, nothing happens at all. Opening mix_topology_flow_dcqcn.tr directly just shows garbled content.
Where might the problem be? Have you run into this before? Thanks for any pointers!
Afterwards I'd like to share the problems I encountered and their solutions under the issues.

[screenshot of the trace_reader invocation]

[screenshot of the .tr file contents]

Compile problem

When compiling this project, some problems occur (Ubuntu + Python 3.7.3).

The error information is as follows:
Waf: The wscript in '/home/hpcc' is unreadable
Traceback (most recent call last):
File "/home/hpcc/.waf3-1.7.11-edc6ccb516c5e3f9b892efc9f53a610f/waflib/Scripting.py", line 87, in waf_entry_point
set_main_module(Context.run_dir+os.sep+Context.WSCRIPT_FILE)
File "/home/hpcc/.waf3-1.7.11-edc6ccb516c5e3f9b892efc9f53a610f/waflib/Scripting.py", line 112, in set_main_module
Context.g_module=Context.load_module(file_path)
File "/home/hpcc/.waf3-1.7.11-edc6ccb516c5e3f9b892efc9f53a610f/waflib/Context.py", line 281, in load_module
exec(compile(code,path,'exec'),module.dict)
File "/home/hpcc/wscript", line 105
print name.ljust(25),
^
SyntaxError: invalid syntax

Does this have anything to do with the python version?

Can HPCC run with 40G or 10G network bandwidth?

I found that HPCC cannot run under 10G or 40G network bandwidth. I'm sure the bandwidth in the topology .txt file is 40G. The following error occurred. Where do you think the code needs to be modified?

[screenshot of the error]

fct and pfc files are empty

Hello, sorry for asking in Chinese; my English is limited and I was afraid I could not state my problem clearly in English.
My problem is as follows:
I am reproducing your experiments, namely Figures 11(a) and 11(c) in the paper. Using the generated 30%-load fbHdp traffic, the fat-tree topology, and the DCQCN algorithm, the fct and pfc files produced by the simulation are both empty. I don't know where it went wrong; I would appreciate your help, thank you very much.

Why are only N-1 ports on the switch-type node configured for ECN and headroom?

When configuring a switch-type node in the main(), why are only N-1 ports on the node configured? Where can I set the ECN and headroom for the remaining port?

[screenshot 2023-12-12_16-10]

error: ‘TimeChecker’ in namespace ‘ns3’ does not name a type

In file included from src/uan/bindings/ns3module.cc:1:0:
src/uan/bindings/ns3module.h:1514:10: error: ‘TimeChecker’ in namespace ‘ns3’ does not name a type
ns3::TimeChecker *obj;
^
Waf: Leaving directory `/home/jan/High-Precision-Congestion-Control-master/simulation/build'
Build failed
-> task in 'ns3module_uan' failed (exit status 1):
{task 140439915395472: cxx ns3module.cc -> ns3module.cc.7.o}
['/usr/bin/g++', '-O0', '-ggdb', '-g3', '-std=gnu++11', '-Wno-error=deprecated-declarations', '-fstrict-aliasing', '-Wstrict-aliasing', '-fPIC', '-pthread', '-fno-strict-aliasing', '-fwrapv', '-fstack-protector-strong', '-fno-strict-aliasing', '-fwrapv', '-fstack-protector-strong', '-fno-strict-aliasing', '-fvisibility=hidden', '-Wno-array-bounds', '-pthread', '-pthread', '-fno-strict-aliasing', '-fwrapv', '-fstack-protector-strong', '-fno-strict-aliasing', '-I.', '-I..', '-Isrc/uan/bindings', '-I../src/uan/bindings', '-I/usr/include/python2.7', '-I/usr/include/x86_64-linux-gnu/python2.7', '-I/usr/include/gtk-2.0', '-I/usr/lib/x86_64-linux-gnu/gtk-2.0/include', '-I/usr/include/gio-unix-2.0', '-I/usr/include/cairo', '-I/usr/include/pango-1.0', '-I/usr/include/atk-1.0', '-I/usr/include/pixman-1', '-I/usr/include/libpng12', '-I/usr/include/gdk-pixbuf-2.0', '-I/usr/include/harfbuzz', '-I/usr/include/glib-2.0', '-I/usr/lib/x86_64-linux-gnu/glib-2.0/include', '-I/usr/include/freetype2', '-I/usr/include/libxml2', '-DNS3_ASSERT_ENABLE', '-DNS3_LOG_ENABLE', '-DHAVE_PACKET_H=1', '-DHAVE_SQLITE3=1', '-DHAVE_IF_TUN_H=1', '-DHAVE_GSL=1', '-DNS_DEPRECATED=', '-DNS3_DEPRECATED_H', '-DNDEBUG', '-D_FORTIFY_SOURCE=2', '-DNDEBUG', '-D_FORTIFY_SOURCE=2', '-DNDEBUG', '-D_FORTIFY_SOURCE=2', 'src/uan/bindings/ns3module.cc', '-c', '-o', 'src/uan/bindings/ns3module.cc.7.o']

I keep getting this error when running the examples.

Observing packet drops at ingress on switches

Hi, I am running an experiment with an 8x8 leaf-spine topology (no over-subscription), using DRILL as the load balancer and the run.py script to start the simulations. I observe packet drops at the ingress ports despite having a large headroom (3 BDP). It appears that I receive more than a BDP of packets after pausing a port (seemingly more than my 3 BDP of headroom). Is this expected? Or is it something that needs to be resolved, since it impacts performance?

Regenerate fig 11a from HPCC paper

I'm trying to regenerate Fig. 11a, but some details are not clear.
It is written in the paper that we "either add incast traffic to 30% load traffic or run 50% load traffic. We generate the incast traffic by randomly selecting 60 senders and one receiver, each sending 500KB."
First, I generated the 30%-load Facebook Hadoop traffic, which contains about 1,000,000 flows. The paper doesn't say when the incast is added, so I arbitrarily added it at around line 400,000, but the result is not similar to the paper's.

Two questions:
1. Does the timing of the incast addition affect the results?
2. I found that run.py provides four sets of parameters for the DCQCN algorithm. Which one was used in the paper's simulations?

Thanks for taking the time to answer the question!

A problem about AI or MI

Hi, Yuliang. I have a question about the maxStage parameters. From the public HPCC key results, I found

Besides, HPCC’s maxStage=0, which is the same as the simulation setting in our paper. The maxStage=5 in the HPCC paper is due to a typo for the simulation.

while in the paper, it is claimed that

maxStage controls a simple tradeoff between steady-state stability and the speed to reclaim free bandwidth.

From the theoretical model and the ns-3 simulation, I believe that maxStage=0 (which means MI is used directly) is indeed better. But I notice that Google's Swift also uses AI and removed the MI stage of Timely. I wonder whether MI is too aggressive, causing other problems in a testbed or in practice. We have no testbed experience, but I guess there would be a lot of jitter and uncertainty in a large-scale testbed, which might make MI unsuitable. Or is MI indeed better and simply not adopted due to inertia?
Can you share your opinion or some experiences?

Thanks a lot
Best wishes

Trace reader not working as mentioned

I want to apply two conditions in the trace reader. The README says:
./trace_reader trace.tr sip=0x0b000101&dip=0x0b000201 will display only events with sip=0x0b000101 and dip=0x0b000201.

But this only shows events with sip=0x0b000101 and ignores dip=0x0b000201. If I run:
./trace_reader trace.tr dip=0x0b000201&sip=0x0b000101

then it only shows events with dip=0x0b000201 and ignores sip.

Is there a bug in the trace reader? otherwise, how should I apply both conditions at the same time?
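In the meantime I apply both conditions outside trace_reader with a small script; a sketch that string-matches the two addresses in trace_reader's text output (the field layout is inferred from sample trace lines, so treat it as an assumption). Note also that an unquoted & is consumed by the shell itself, so quoting the whole filter argument ('sip=0x0b000101&dip=0x0b000201') may be worth trying.

```python
def match_both(line, sip, dip):
    # Keep a trace line only if BOTH addresses appear among its fields.
    fields = line.split()
    return sip in fields and dip in fields

line = ("2052481446 n:3 1:0 0 Recv ecn:0 0b000101 0b000301 "
        "100 10000 A 0x00 3 178993000 0 60")
print(match_both(line, "0b000101", "0b000301"))  # True
print(match_both(line, "0b000101", "0b000201"))  # False
```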

There is a problem with the pfc.txt file!

[screenshot of pfc.txt]

Hello:

The data in my pfc.txt all looks like this, which does not seem normal. The second column is always 0; does that mean only node 0 experienced PFC? Did no other node trigger PFC at all? I believe other nodes would also trigger PFC, but this file did not record them. How should I go about analyzing PFC?
Thanks!

Question about the trace output file

Hi, Mr. Miao,
When I run the simulation with the following parameters:
--cc hp/timely/dcqcn --trace tmp_traffic --bw 100 --topo fat --hpai 50
I get an empty trace result. May I ask what the reason could be? Should I change the node IDs in the trace.txt file?
Best wishes,
Qian
