jadianes / kdd-cup-99-spark

100.0 100.0 68.0 3.83 MB

PySpark solution to the KDDCup99

License: Other

Shell 0.29% R 0.33% Python 2.40% Jupyter Notebook 96.98%

kdd-cup-99-spark's Issues

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://quickstart.cloudera:8020/nfs/data/KDD99/kddcup.data

If MinMaxScaler().fit_transform(x) doesn't work

If your sklearn version is 0.19 or higher, you will get an error: ValueError: Expected 2D array, got 1D array instead

If your sklearn version is 0.18 or lower, you will get a warning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19

Solution:
MinMaxScaler(copy=False).fit_transform(x)

https://stackoverflow.com/questions/50546065/scikit-learn-minmax-scaler-doesnt-scale
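For anyone who still sees "Expected 2D array, got 1D array instead": that message means the scaler received an array of shape (n,) rather than (n_samples, n_features). A common way around it is to reshape the data into a single column before scaling. The snippet below is only a minimal sketch of that idea, not code from this repository; the array x and its values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical 1D feature; the values are made up for illustration.
x = np.array([1.0, 2.0, 3.0, 5.0])

# reshape(-1, 1) turns shape (4,) into (4, 1): four samples, one feature.
x_2d = x.reshape(-1, 1)

# fit_transform learns the column's min and max and rescales it to [0, 1].
scaled = MinMaxScaler().fit_transform(x_2d)

# ravel() flattens the result back to 1D if the rest of the code expects that.
print(scaled.ravel())
```

Whether copy=False is also needed depends on whether the surrounding code relies on x being modified in place.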

what does 'sys.argv[]' mean?

Hi there,
I ran into some problems while running your code, and I still can't understand them after 7 days of debugging and thinking, so I want to ask.
First, what does 'sys.argv' mean in this segment? The script always prints this usage line and quits. If I comment out the exit line, I get an error about 'sys.argv' instead.
    if __name__ == "__main__":
        if (len(sys.argv) != 3):
            print("Usage: /path/to/spark/bin/spark-submit --driver-memory 2g " + \
              "KDDCup99.py max_k kddcup.data.file")
            sys.exit(1)
Second, could you please tell me what this segment of the code does? I couldn't understand it. QUQ
    if __name__ == "__main__":
        if (len(sys.argv) != 3):
            print("Usage: /path/to/spark/bin/spark-submit --driver-memory 2g " + \
              "KDDCup99.py max_k kddcup.data.file")
            sys.exit(1)

        # set up environment
        max_k = int(sys.argv[1])
        data_file = sys.argv[2]
        conf = SparkConf().setAppName("KDDCup99") \
          #.set("spark.executor.memory", "2g")
        sc = SparkContext(conf=conf)

I would be greatly honored if you could answer my questions. Thank you for your time.
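For readers with the same question: sys.argv is the list of command-line arguments passed to the script, with the script name itself at index 0. The check len(sys.argv) != 3 therefore means "exactly two arguments are required", which per the usage message are max_k and the data file. Starting the script without those two arguments triggers the usage message and the exit. The sketch below mimics that check without Spark; parse_args is a hypothetical helper written for illustration, and the values 30 and kddcup.data are made-up sample arguments.

```python
def parse_args(argv):
    """Return (max_k, data_file), or None if the argument count is wrong."""
    # argv[0] is the script name, so two user arguments give len(argv) == 3.
    if len(argv) != 3:
        return None
    max_k = int(argv[1])   # first argument, converted from string to int
    data_file = argv[2]    # second argument, kept as a path string
    return max_k, data_file


# What sys.argv would look like with both arguments supplied.
print(parse_args(["KDDCup99.py", "30", "kddcup.data"]))  # (30, 'kddcup.data')

# What it looks like with no arguments: the script would print usage and exit.
print(parse_args(["KDDCup99.py"]))  # None
```

In the quoted segment, the lines after the check simply read those two values and then create the SparkConf and SparkContext that the rest of the job uses.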

kdd-cup-99-spark's Issues

Hello, how do I run the file?

How to solve this issue

If MinMaxScaler().fit_transform(x) doesn't work

what does 'sys.argv[]' mean?
