
lans's People

Contributors

col11, hmedal, jonathandavis552, sarahharun, zhangfangyan


lans's Issues

Get LANS running on Cray machine

Lastly, they are attempting to build/run the code on a Cray XC40 machine, and the version of mpi4py that LANS requires needs to be rebuilt for that machine. If your group now has access to a Cray system, could you try to provide a release that installs and works cleanly on an XC40?

LANS-V5: 3D histogram code failed to handle multiple input files

/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/core/indexing.py:115: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
self._setitem_with_indexer(indexer, value)
Traceback (most recent call last):
File "create_3D_edge_attribute_histograms.py", line 90, in <module>
merged_df = pd.read_csv(temp_folder + 'merged_dataframe_' + ctu_files[w].split('.', 1)[0] + '.csv')
File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 474, in parser_f
return _read(filepath_or_buffer, kwds)
File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 250, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 566, in __init__
self._make_engine(self.engine)
File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 705, in _make_engine
self._engine = CParserWrapper(self.f, **self.options)
File "/usr/local/python/lib/python2.7/site-packages/pandas-0.16.2-py2.7-linux-x86_64.egg/pandas/io/parsers.py", line 1072, in __init__
self._reader = _parser.TextReader(src, **kwds)
File "pandas/parser.pyx", line 350, in pandas.parser.TextReader.__cinit__ (pandas/parser.c:3187)
File "pandas/parser.pyx", line 594, in pandas.parser.TextReader._setup_parser_source (pandas/parser.c:5930)
IOError: File /work/sharun/1000_bin/temp/merged_dataframe_6.csv does not exist

Issue from Sponsor (IndexError: list index out of range)

From Mandy Sack:

Issue:
When I run with only 1 data set (5.binetflow) I receive the following errors:


('Number of Processors: ', 16)


Traceback (most recent call last):
File "Enterprise_Connection_With_Graph_Simulation.py", line 103, in <module>
main()
File "Enterprise_Connection_With_Graph_Simulation.py", line 50, in main
graphList.append(choice(org_graphList[1:len(org_graphList)]))
File "/opt/gd/lang/python-2.7.11/lib/python2.7/random.py", line 275, in choice
return seq[int(self.random() * len(seq))] # raises IndexError if seq is empty
IndexError: list index out of range
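The slice `org_graphList[1:len(org_graphList)]` is empty whenever only one graph was loaded, and `random.choice` raises `IndexError` on an empty sequence. A hedged sketch of a guard (the function name is hypothetical; the list name follows the traceback):

```python
from random import choice

def pick_additional_graph(org_graphList):
    """Pick a random graph other than the first, falling back to the
    only graph when a single data set (e.g. just 5.binetflow) was given."""
    candidates = org_graphList[1:]
    if not candidates:            # only one data set was loaded
        return org_graphList[0]   # fallback: reuse the single graph
    return choice(candidates)
```

Whether reusing the single graph is the right fallback, or the run should abort with a clear message, depends on what the simulation expects.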

In-degrees in the simulated graph do not match the original graph or the simulated nodes

In the generate_edge function, the global lists "innodes" and "outnodes" seem to reset between function calls. These variables are assigned as empty global lists inside the create_graph function (they were previously declared outside of all functions so that they would always be initialized at import time, but changing their location has had no noticeable effect).

The nodeCreation function removes entries from the roles within innodes immediately after initializing the list, clearing it of all nodes with an in-degree of less than 1. In addition, whenever a node's in-degree is decremented, a check tests whether the new in-degree is below 1 and removes the node from the list if it is. Despite both safeguards, the code still chooses nodes that should have been removed from the innodes list as destination roles.
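One plausible cause of a "global list keeps resetting" symptom in Python is rebinding the name inside a function instead of mutating the shared list. A small sketch of the distinction (illustrative only; not the LANS code):

```python
innodes = []  # module-level list, shared across functions

def reset_innodes_rebinding():
    # Without a `global innodes` declaration, this creates a NEW local
    # list; the module-level `innodes` that other functions read is
    # left untouched, so removals here are invisible elsewhere.
    innodes = []  # shadows the global

def reset_innodes_in_place():
    # Mutating in place (or declaring `global innodes` first) updates
    # the one list object that every function shares.
    del innodes[:]
```

If create_graph rebinds `innodes`/`outnodes` while nodeCreation and generate_edge mutate or read a different binding, removals in one place would not be seen in the other, which matches the behavior described above.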

LANS-V4: Error in Graph property calculation for scenario 3 and 1

Exception in thread "main" org.apache.hadoop.fs.ParentNotDirectoryException: Parent path is not a directory: file:/work/sharun/LargeScenarios/input_files/1.binetflow
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:523)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:531)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:504)
at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:694)
at org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter.setupJob(FileOutputCommitter.java:313)
at org.apache.spark.internal.io.HadoopMapReduceCommitProtocol.setupJob(HadoopMapReduceCommitProtocol.scala:118)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply$mcV$sp(FileFormatWriter.scala:124)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1.apply(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:57)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:121)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:101)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:58)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:56)
at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:74)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:114)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:135)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:132)
at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:113)
at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:87)
at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:87)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:492)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
at Properties$.main(Properties.scala:71)
at Properties.main(Properties.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:738)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Read 4 items
Error in if (substr(files[k], 1, 4) == "part") { :
missing value where TRUE/FALSE needed
Execution halted

Fix networkx issue (only in LANS v6)

However, then I ran into another error in Enterprise_Connection_With_Graph_Simulation.py at line:
create_graph(temp_folder,graphList[rank], seed = seedlist[rank], startpoint = startIndex[rank])
I was not able to get past that error quickly, and the version of pandas was rolled back to 0.19.1.
 
The 2nd issue I ran into was with networkx, and only in LANS version 6. Networkx version 2 was available in the sponsor's environment. The following changes to Property.py make it compatible with both version 1 and 2 of networkx.
 
Lines 62-66:
    def getInDegree(self):
        return sorted(dict(self.G.in_degree()).values())
 
    def getOutDegree(self):
        return sorted(dict(self.G.out_degree()).values())
 
I found this site helpful for the migration: https://networkx.github.io/documentation/stable/release/migration_guide_from_1.x_to_2.0.html
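The quoted fix works because `dict()` accepts both return types: networkx 1.x's `in_degree()` returns a dict `{node: degree}`, while 2.x returns a DegreeView that iterates as `(node, degree)` pairs. The same idea isolated as a helper (hypothetical name; stub inputs stand in for the two networkx APIs):

```python
def sorted_degrees(degree_view):
    """Return the sorted degree sequence from either networkx API.

    networkx 1.x: degree_view is a dict {node: degree}.
    networkx 2.x: degree_view iterates as (node, degree) pairs.
    dict() normalizes both, so .values() is always the degrees.
    """
    return sorted(dict(degree_view).values())
```

This is why a single wrapper such as `dict(self.G.in_degree())` keeps Property.py compatible with both major versions without any version sniffing.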

Fix pandas issue

This issue needs to be fixed in versions 5 and 6.1.

The first issue was noticed with pandas (the same issues appear in both versions of LANS). The version of pandas available on the system was 0.21.1; none of these issues are seen when using pandas version 0.19.1.
In the file role_mining.py, an error occurs at line:
    feature_data = pd.read_csv(feature_file,delimiter=',',usecols=[0,1,2,3,4,5,6])
    features = feature_data[[1,2,3,4,5,6]].as_matrix()
 
What I did to work around it before rolling back to pandas version 0.19.1 was to specify those columns explicitly, which got past that error.

Version 5: error in get_histograms

An error occurs at the line
str = each[1].split(",",2)
in the function get_histograms.

It is caused by an incorrect version of create_attribute_histograms.py, or by incorrect attribute files being used as inputs.

Creating 3D histograms throws a file-not-found error

Creating the 3D histograms threw an error: the merged dataframe would be created, and then the code could not find the completed dataframe. This turned out to be a hardcoding issue — the code looked for a .csv file, while the actual result could be either .csv or .binetflow.

Fix issues with running code for CTU datasets

Hi Hugh, Mandy is traveling so I’ll do my best to describe the issues, she can correct me when she gets back.
 
The main issue was that if either the sport or dport field was empty, it made things very unhappy. The way Mandy worked around it was to replace the empty field with a dummy value of 0.
 
She also removed the spaces in the hex values for those fields and converted them to decimal.
 
All of the CTU data set manipulation only had to be done once per dataset, so it was easy to forget the files had been tweaked.
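The two workarounds described above — empty sport/dport fields replaced with a dummy 0, and hex values de-spaced and converted to decimal — can be sketched as one normalization step (hypothetical function; the exact field formats in the CTU files are assumed):

```python
def normalize_port(field):
    """Normalize a sport/dport field from a CTU binetflow record:
    empty fields become the dummy value 0, and hex values (possibly
    containing stray spaces) are converted to decimal integers."""
    field = field.strip()
    if not field:
        return 0                      # dummy value for empty fields
    field = field.replace(" ", "")    # drop spaces inside hex values
    if field.lower().startswith("0x"):
        return int(field, 16)         # hex -> decimal
    return int(field)
```

Folding this into the ingest path would make the cleanup repeatable instead of a one-off manual edit that is easy to forget was applied.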
