hmedal / lans
LArge LAbeled Netflow graph Simulator
License: GNU General Public License v3.0
Lastly, they are attempting to build/run the code on a Cray XC40 machine, and the version of mpi4py that LANS requires needs to be rebuilt for that machine. If your group now has access to Cray systems, could you try to provide a release that builds and installs cleanly on an XC40 machine (only if you have access to one from Cray)?
/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/indexing.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  self._setitem_with_indexer(indexer, value)
Traceback (most recent call last):
  File "create_3D_edge_attribute_histograms.py", line 90, in <module>
    merged_df = pd.read_csv(temp_folder + 'merged_dataframe' + ctu_files[w].split('.', 1)[0] + '.csv')
  File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 474, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 250, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 566, in __init__
    self._make_engine(self.engine)
  File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 705, in _make_engine
    self._engine = CParserWrapper(self.f, **self.options)
  File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1072, in __init__
    self._reader = _parser.TextReader(src, **kwds)
  File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3187)
  File "pandas/parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5930)
IOError: File /work/sharun/1000_bin/temp/merged_dataframe_6.csv does not exist
From Mandy Sack:
Issue:
When I run with only 1 data set (5.binetflow) I receive the following errors:
('Number of Processors: ', 16)
Traceback (most recent call last):
  File "Enterprise_Connection_With_Graph_Simulation.py", line 103, in <module>
    main()
  File "Enterprise_Connection_With_Graph_Simulation.py", line 50, in main
    graphList.append(choice(org_graphList[1:len(org_graphList)]))
  File "/opt/gd/lang/python-2.7.11/lib/python2.7/random.py", line 275, in choice
    return seq[int(self.random() * len(seq))]  # raises IndexError if seq is empty
IndexError: list index out of range
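The slice in main explains the failure: with only one input data set, org_graphList holds a single graph, so org_graphList[1:len(org_graphList)] is empty and random.choice raises on the empty sequence. A minimal reproduction (the list contents are hypothetical stand-ins):

from random import choice

org_graphList = ["graph_from_5.binetflow"]   # only one data set was supplied
tail = org_graphList[1:len(org_graphList)]   # [] -- everything after index 0
choice(tail)                                 # IndexError: list index out of range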
We should remove mention of an IDE
In the generate_edge function, the globals "innodes" and "outnodes" seem to reset between function calls.
These variables are assigned as empty global lists inside the create_graph function (they were previously declared outside of all functions so that they would always be initialized at import time, but changing their location has had no noticeable effect).
The nodeCreation function removes roles from innodes immediately after initializing it, clearing out every node with an in-degree of less than 1, and each time a node's in-degree is decremented, a check removes the node from the list if its new in-degree is less than 1. Even so, the code still chooses nodes that should have been removed from the innodes list as destination roles.
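A hypothetical sketch of the kind of stale-reference bug that would produce this behavior: rebinding a global list inside a function creates a brand-new object, so any candidate pool captured earlier keeps offering nodes that were since removed from the rebound list. The names follow the issue; the bodies are illustrative only.

innodes = ["a", "b", "c"]        # module-level declaration, as originally written

saved_pool = innodes             # e.g. a destination-role pool captured early

def create_graph():
    global innodes
    innodes = ["a", "b", "c"]    # rebind: replaces the old list with a new object

def remove_low_indegree(node):
    innodes.remove(node)         # mutates only the currently bound global object

create_graph()                   # saved_pool and innodes now name different lists
remove_low_indegree("b")
print("b" in innodes)            # False: removed from the rebound list
print("b" in saved_pool)         # True: the stale pool still offers "b" as a destination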
When generating nodes, the predefined in-degree is higher than the predefined out-degree; we need to check whether this is random variation or a systematic bias.
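One way to tell the two apart, sketched under the assumption of a pluggable degree sampler (sample_degrees below is a hypothetical stand-in for the generator's actual routine): in any realized directed graph the in-degree and out-degree totals must match edge-for-edge, so a gap in the predefined totals that persists across many seeds points to sampler bias rather than random noise.

import random

def sample_degrees(n, rng):      # hypothetical stand-in for the real sampler
    return ([rng.randint(0, 5) for _ in range(n)],
            [rng.randint(0, 5) for _ in range(n)])

gaps = []
for seed in range(100):
    rng = random.Random(seed)
    indeg, outdeg = sample_degrees(1000, rng)
    gaps.append(sum(indeg) - sum(outdeg))

mean_gap = sum(gaps) / float(len(gaps))
print(mean_gap)  # near 0 -> random variation; consistently one-sided -> bias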
Exception in thread "main" org.apache.hadoop.fs.ParentNotDirectoryException: Parent path is not a directory: file:/work/sharun/LargeScenarios/input_files/1.binetflow
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:523)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:694)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:313)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:118)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:124)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:492)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
at Properties$.main(Properties.scala:71)
at Properties.main(Properties.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Read 4 items
Error in if (substr(files[k], 1, 4) == "part") { :
missing value where TRUE/FALSE needed
Execution halted
However, I then ran into another error in Enterprise_Connection_With_Graph_Simulation.py at the line:
create_graph(temp_folder, graphList[rank], seed = seedlist[rank], startpoint = startIndex[rank])
I was not able to get past that error quickly, so pandas was rolled back to 0.19.1.
The second issue I ran into was with networkx, and only in LANS version 6. Networkx version 2 was available in the sponsor's environment. The following changes to Property.py (lines 62-66) make it compatible with both version 1 and 2 of networkx:

def getInDegree(self):
    return sorted(dict(self.G.in_degree()).values())

def getOutDegree(self):
    return sorted(dict(self.G.out_degree()).values())

I found this site helpful for the migration: https://networkx.github.io/documentation/stable/release/migration_guide_from_1.x_to_2.0.html
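For context on why the dict() wrapper works on both major versions (a small check, safe to run under either): networkx 1.x returns a plain dict from in_degree(), while 2.x returns a DegreeView that iterates (node, degree) pairs; dict() normalizes both, so .values() behaves the same.

import networkx as nx

G = nx.DiGraph()
G.add_edges_from([(1, 2), (3, 2)])
# 1.x: G.in_degree() -> {1: 0, 2: 2, 3: 0}
# 2.x: G.in_degree() -> InDegreeView iterating (node, degree) pairs
print(sorted(dict(G.in_degree()).values()))  # [0, 0, 2] on either version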
Simulated nodes do not always fit the histograms; we need to switch histogram use to exact values rather than probabilities.
This issue needs to be fixed in versions 5 and 6.1.
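A minimal sketch of the proposed switch, assuming the histograms are stored as value-to-count mappings (the storage format here is an assumption): instead of sampling each attribute independently with probability count/total, which only matches the histogram in expectation, emit the exact counts and shuffle.

import random

def sample_exact(histogram, rng=random):
    # Reproduce the histogram counts exactly, in random order.
    values = [v for v, count in histogram.items() for _ in range(count)]
    rng.shuffle(values)
    return values

hist = {"tcp": 3, "udp": 2, "icmp": 1}   # hypothetical attribute histogram
print(sample_exact(hist))                # always exactly 3 tcp, 2 udp, 1 icmp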
The first issue was noticed with pandas (the same issues appear on both versions of LANS). The version of pandas available on the system was 0.21.1; none of these issues are seen when using pandas version 0.19.1.
In the file role_mining.py, an error occurs at the lines:
feature_data = pd.read_csv(feature_file, delimiter=',', usecols=[0,1,2,3,4,5,6])
features = feature_data[[1,2,3,4,5,6]].as_matrix()
What I did to work around it before rolling back to pandas version 0.19.1 was to specify those columns explicitly, which got past that error.
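A sketch of that workaround in code (the file name is hypothetical; the exact column names in feature_file are not shown in the issue): newer pandas interprets feature_data[[1, 2, ...]] as label-based selection, which fails for named columns, so position-based selection with .iloc sidesteps it, and .values replaces the since-deprecated .as_matrix().

import pandas as pd

feature_data = pd.read_csv("features.csv", delimiter=",", usecols=[0, 1, 2, 3, 4, 5, 6])
# Select columns 1-6 by position rather than by the labels 1..6:
features = feature_data.iloc[:, 1:7].values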
Error with the line
str = each[1].split(",", 2)
in the function get_histograms; caused by an incorrect version of create_attribute_histograms.py, or by incorrect attribute files being used as inputs.
Creating the 3D histograms threw an error in which the merged dataframe would be created and then the code could not find the completed dataframe. This turned out to be a hardcoding issue: the code looked for a .csv file, while the actual result could be either .csv or .binetflow.
In the event of a .binetflow file, graph_gen5.py looked for a file with only the scenario name; e.g., if the input was 5.binetflow, the code would look for just 5.
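A sketch of an extension-agnostic lookup that would avoid both problems (the helper name is hypothetical; the merged_dataframe prefix and temp folder follow the traceback earlier on this page):

import os

def find_merged_dataframe(temp_folder, input_file):
    # Derive the scenario name whatever the input extension was: "5.binetflow" -> "5".
    scenario = os.path.splitext(os.path.basename(input_file))[0]
    for ext in (".csv", ".binetflow"):
        candidate = os.path.join(temp_folder, "merged_dataframe" + scenario + ext)
        if os.path.exists(candidate):
            return candidate
    raise IOError("No merged dataframe found for scenario " + scenario)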
Readme needs a few edits.
Hi Hugh, Mandy is traveling, so I'll do my best to describe the issues; she can correct me when she gets back.
The main issue was that if either the sport or dport field was empty, it made things very unhappy. The way Mandy worked around it was to replace the empty field with a dummy value of 0.
She also removed the spaces in the hex values for those fields and converted them to decimal.
All the CTU data set manipulation only needed to be done once per dataset, so it was easy to forget they were tweaked.
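A sketch of that one-time preprocessing (the Sport/Dport column names follow the .binetflow convention; the exact cleanup steps are inferred from the description above):

import pandas as pd

def clean_ports(df):
    for col in ("Sport", "Dport"):
        # Empty fields get the dummy value 0; spaces inside hex values are dropped.
        df[col] = df[col].fillna("0").astype(str).str.replace(" ", "")
        # Hex-encoded ports (e.g. "0x00a1") are converted to decimal.
        df[col] = df[col].apply(
            lambda v: str(int(v, 16)) if v.lower().startswith("0x") else v)
    return df

df = clean_ports(pd.read_csv("5.binetflow"))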