
pyspark-examples's Issues

AttributeError: 'NoneType' object has no attribute 'rdd'

Line #39:
keysDF = df.select(explode(map_keys(df.properties))).distinct().show()
throws the error below:

AttributeError Traceback (most recent call last)
<ipython-input-…> in <module>
----> 1 keysList = keysDF.rdd.map(lambda x:x[0]).collect()

AttributeError: 'NoneType' object has no attribute 'rdd'
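
A note on the cause: show() only prints the DataFrame and returns None, so the assignment on line #39 leaves keysDF as None and the later keysDF.rdd call fails. A minimal sketch of the fix, using the same names as the example:

from pyspark.sql.functions import explode, map_keys

# Build the DataFrame first; show() returns None, so call it on a separate line.
keysDF = df.select(explode(map_keys(df.properties))).distinct()
keysDF.show()

# keysDF is now a real DataFrame, so .rdd works as expected.
keysList = keysDF.rdd.map(lambda x: x[0]).collect()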

Error running PySpark from a Jupyter notebook on Windows: Exception: Java gateway process exited before sending its port number

Hello, I am trying to run the pyspark examples on a local Windows machine, in a Jupyter notebook using Anaconda. I followed this tutorial and did not run into any issues during installation. However, I still get the following error when running this example:

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.functions import to_timestamp, current_timestamp
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()


Exception Traceback (most recent call last)
<ipython-input-…> in <module>
5 from pyspark.sql.types import StructType, StructField, StringType, IntegerType, LongType
6
----> 7 spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\sql\session.py in getOrCreate(self)
226 sparkConf.set(key, value)
227 # This SparkContext may be an existing one.
--> 228 sc = SparkContext.getOrCreate(sparkConf)
229 # Do not update SparkConf for existing SparkContext, as it's shared
230 # by all sessions.

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in getOrCreate(cls, conf)
382 with SparkContext._lock:
383 if SparkContext._active_spark_context is None:
--> 384 SparkContext(conf=conf or SparkConf())
385 return SparkContext._active_spark_context
386

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in __init__(self, master, appName, sparkHome, pyFiles, environment, batchSize, serializer, conf, gateway, jsc, profiler_cls)
142 " is not allowed as it is a security risk.")
143
--> 144 SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
145 try:
146 self._do_init(master, appName, sparkHome, pyFiles, environment, batchSize, serializer,

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\context.py in _ensure_initialized(cls, instance, gateway, conf)
329 with SparkContext._lock:
330 if not SparkContext._gateway:
--> 331 SparkContext._gateway = gateway or launch_gateway(conf)
332 SparkContext._jvm = SparkContext._gateway.jvm
333

~\Anaconda3\envs\sparkenv\lib\site-packages\pyspark\java_gateway.py in launch_gateway(conf, popen_kwargs)
106
107 if not os.path.isfile(conn_info_file):
--> 108 raise Exception("Java gateway process exited before sending its port number")
109
110 with open(conn_info_file, "rb") as info:

Exception: Java gateway process exited before sending its port number
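
This error usually means PySpark could not start the JVM, most often because JAVA_HOME is unset or points at an incompatible Java installation. A sketch of one common workaround, assuming hypothetical install paths that you would replace with your own:

import os

# Hypothetical paths -- replace with your actual Java and Spark locations.
os.environ["JAVA_HOME"] = r"C:\Program Files\Java\jdk1.8.0_281"
os.environ["SPARK_HOME"] = r"C:\spark\spark-3.1.2-bin-hadoop3.2"

# findspark must run before pyspark is imported; it puts Spark on sys.path.
import findspark
findspark.init()

from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

If the environment variables are already set system-wide, also check that the Java version matches your PySpark release (Spark 3.x generally requires Java 8 or 11).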



sparkbyexamples website not working

Hi,

Thank you for creating such awesome tutorials. I just want to let you know that the sparkbyexamples website is not working for some reason.


Please help to resolve this.

Exponential smoothing in PySpark

Hello, I have pandas code for exponential smoothing, but I am not able to do the same in PySpark.
def exponential_smoothing(x, alpha):
    result = []
    for value in x:
        if result:
            smoothed_value = alpha * value + (1 - alpha) * result[-1]
        else:
            smoothed_value = value
        result.append(smoothed_value)
    return result

def apply_exponential_smoothing(df, alpha):
    df['product_area_sales_value_N_mean_T'] = df.groupby(['area_id', 'product_id'])['product_area_sales_value_N_mean'].transform(lambda x: exponential_smoothing(x, alpha))
    df['product_area_sales_unit_N_mean_T'] = df.groupby(['area_id', 'product_id'])['product_area_sales_unit_N_mean'].transform(lambda x: exponential_smoothing(x, alpha))
    return df

tmp3 = apply_exponential_smoothing(tmp3, alpha=0.8)
This is the code. In PySpark I am not able to fetch the previous row's smoothed value; there is no built-in window function for this kind of recursive dependency. Please suggest a solution in Spark.
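
One possible approach (not from the original thread): exponential smoothing is sequential within each group, so it cannot be expressed with ordinary window functions, but Spark 3.x's applyInPandas lets you reuse the pandas logic per group (pyarrow must be installed). A sketch, assuming tmp3 is a Spark DataFrame with a 'date' column to order by (a hypothetical name) and that the new smoothed columns are doubles:

import pandas as pd
from pyspark.sql.types import StructType, StructField, DoubleType

def exponential_smoothing(values, alpha):
    result = []
    for value in values:
        if result:
            result.append(alpha * value + (1 - alpha) * result[-1])
        else:
            result.append(value)
    return result

def smooth_group(pdf: pd.DataFrame) -> pd.DataFrame:
    alpha = 0.8
    pdf = pdf.sort_values('date')  # hypothetical ordering column -- adjust to your data
    pdf['product_area_sales_value_N_mean_T'] = exponential_smoothing(
        pdf['product_area_sales_value_N_mean'], alpha)
    pdf['product_area_sales_unit_N_mean_T'] = exponential_smoothing(
        pdf['product_area_sales_unit_N_mean'], alpha)
    return pdf

# Output schema = input schema plus the two new smoothed columns.
out_schema = StructType(tmp3.schema.fields + [
    StructField('product_area_sales_value_N_mean_T', DoubleType()),
    StructField('product_area_sales_unit_N_mean_T', DoubleType()),
])

tmp3 = tmp3.groupBy('area_id', 'product_id').applyInPandas(smooth_group, out_schema)

Each group is collected into a single pandas DataFrame on one executor, so this works well when the per-group data fits in memory.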
