
pyspark-examples's Issues

AttributeError: 'NoneType' object has no attribute 'rdd'

Line #39:
keysDF = df.select(explode(map_keys(df.properties))).distinct().show()
throws the error below:

AttributeError Traceback (most recent call last)
<ipython-input-…> in <module>
----> 1 keysList = keysDF.rdd.map(lambda x:x[0]).collect()

AttributeError: 'NoneType' object has no attribute 'rdd'
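
A note on the cause: show() only prints the DataFrame and returns None, so the assignment on line #39 leaves keysDF as None and the later keysDF.rdd call fails. A minimal sketch of the fix, using the same names as the example:

from pyspark.sql.functions import explode, map_keys

# Build the DataFrame first; show() returns None, so call it on a separate line.
keysDF = df.select(explode(map_keys(df.properties))).distinct()
keysDF.show()

# keysDF is now a real DataFrame, so .rdd works as expected.
keysList = keysDF.rdd.map(lambda x: x[0]).collect()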

Error running PySpark from a Jupyter notebook on Windows: Exception: Java gateway process exited before sending its port number

Hello, I am trying to run the pyspark examples on a local Windows machine, in a Jupyter notebook using Anaconda. I followed this tutorial and did not run into any issues during installation. However, I still get the following error when running this example:

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.functions import to_timestamp, current_timestamp
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()


Exception Traceback (most recent call last)
<ipython-input-…> in <module>
5 from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType
6
----> 7 spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
226 sparkConf.set(key, value)
227 # This SparkContext may be an existing one.
--> 228 sc = SparkContext.getOrCreate(sparkConf)
229 # Do not update SparkConf for existing SparkContext, as it's shared
230 # by all sessions.

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
382 with SparkContext._lock:
383 if SparkContext._active_spark_context is None:
--> 384 SparkContext(conf=conf or SparkConf())
385 return SparkContext._active_spark_context
386

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
142 " is not allowed as it is a security risk.")
143
--> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
145 try:
146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
329 with SparkContext._lock:
330 if not SparkContext._gateway:
--> 331 SparkContext._gateway = gateway or launch_gateway(conf)
332 SparkContext._jvm = SparkContext._gateway.jvm
333

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
106
107 if not os.path.isfile(conn_info_file):
--> 108 raise Exception("Java gateway process exited before sending its port number")
109
110 with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number
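
This error usually means PySpark could not start the JVM, most often because JAVA_HOME is unset or points at an incompatible Java installation. A sketch of one common workaround, assuming hypothetical install paths that you would replace with your own:

import os

# Hypothetical paths -- replace with your actual Java and Spark locations.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_281"
os.environ["SPARK_HOME"] = r"C:\spark\spark-3.1.2-bin-hadoop3.2"

# findspark must run before pyspark is imported; it puts Spark on sys.path.
import findspark
findspark.init()

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

If the environment variables are already set system-wide, also check that the Java version matches your PySpark release (Spark 3.x generally requires Java 8 or 11).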



sparkbyexamples website not working

Hi,

Thank you for creating such awesome tutorials. I just want to let you know that the sparkbyexamples website is not working for some reason.


Please help to resolve this.

Exponential smoothing in PySpark

Hello, I have pandas code for exponential smoothing, but I am not able to do the same in PySpark.
def exponential_smoothing(x, alpha):
    result = []
    for value in x:
        if result:
            smoothed_value = alpha * value + (1 - alpha) * result[-1]
        else:
            smoothed_value = value
        result.append(smoothed_value)
    return result

def apply_exponential_smoothing(df, alpha):
    df['product_area_sales_value_N_mean_T'] = df.groupby(['area_id', 'product_id'])['product_area_sales_value_N_mean'].transform(lambda x: exponential_smoothing(x, alpha))
    df['product_area_sales_unit_N_mean_T'] = df.groupby(['area_id', 'product_id'])['product_area_sales_unit_N_mean'].transform(lambda x: exponential_smoothing(x, alpha))
    return df

tmp3 = apply_exponential_smoothing(tmp3, alpha=0.8)
This is the code. In PySpark I am not able to fetch the previous row's smoothed value; there is no built-in window function for this kind of recursive dependency. Please suggest a solution in Spark.
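
One possible approach (not from the original thread): exponential smoothing is sequential within each group, so it cannot be expressed with ordinary window functions, but Spark 3.x's applyInPandas lets you reuse the pandas logic per group (pyarrow must be installed). A sketch, assuming tmp3 is a Spark DataFrame with a 'date' column to order by (a hypothetical name) and that the new smoothed columns are doubles:

import pandas as pd
from pyspark.sql.types import StructType, StructField, DoubleType

def exponential_smoothing(values, alpha):
    result = []
    for value in values:
        if result:
            result.append(alpha * value + (1 - alpha) * result[-1])
        else:
            result.append(value)
    return result

def smooth_group(pdf: pd.DataFrame) -> pd.DataFrame:
    alpha = 0.8
    pdf = pdf.sort_values('date')  # hypothetical ordering column -- adjust to your data
    pdf['product_area_sales_value_N_mean_T'] = exponential_smoothing(
        pdf['product_area_sales_value_N_mean'], alpha)
    pdf['product_area_sales_unit_N_mean_T'] = exponential_smoothing(
        pdf['product_area_sales_unit_N_mean'], alpha)
    return pdf

# Output schema = input schema plus the two new smoothed columns.
out_schema = StructType(tmp3.schema.fields + [
    StructField('product_area_sales_value_N_mean_T', DoubleType()),
    StructField('product_area_sales_unit_N_mean_T', DoubleType()),
])

tmp3 = tmp3.groupBy('area_id', 'product_id').applyInPandas(smooth_group, out_schema)

Each group is collected into a single pandas DataFrame on one executor, so this works well when the per-group data fits in memory.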
