Comments (3)
@FurcyPin I am going to close it with #163. If you encounter any similar cases, please let me know.
from pyspark-stubs.
That's a valid point, thank you for reporting.
Related:
- [DISCUSS][SQL][PySpark] Column name support for SQL functions on Apache Spark Developers List.
For simple unary and binary ((Column, Column) => Column) functions we can identify possible mismatches like this:
from operator import attrgetter, itemgetter
import re

from py4j.protocol import Py4JError
import requests
from toolz.curried import (
    filter, groupby, itemfilter, map, mapcat, pipe, sorted, valfilter, valmap
)
from toolz.functoolz import compose

from pyspark.sql import functions
from pyspark.sql.utils import ParseException

url = (
    "https://raw.githubusercontent.com/apache/spark/"
    "v2.4.3/sql/core/src/main/scala/org/apache/spark/sql/functions.scala"
)

# Raw strings keep the regex escapes (\s, \w) out of Python's string escaping.
func_pattern = re.compile(r"^\s*def ([\w_]+)\(([, :\w]*?)\): Column")
argtype_pattern = re.compile(r"[\w_]+: (\w+)")


def is_candidate(types):
    # Scala has a Column overload but no matching String overload.
    pref = pipe(types, map(itemgetter(slice(0, 2))), set)
    return (
        (("Column", ) in pref and ("String", ) not in pref) or
        (("Column", "Column") in pref and ("String", "String") not in pref)
    )


def takes_string(item):
    # Returns True when the PySpark wrapper rejects a plain string argument.
    # Assumes an active SparkContext, since the wrappers call into the JVM.
    name, types = item
    try:
        f = getattr(functions, name)
        if ("Column", "Column") in types:
            f("foo", "bar")
        else:
            f("foo")
        # Successfully applied string
        return False
    # Incorrect arity
    except TypeError:
        return False
    # No such function in PySpark
    except AttributeError:
        return False
    except ParseException:
        return False
    # Doesn't take String as the first argument
    except Py4JError:
        return True


pipe(
    requests.get(url).text.splitlines(),
    mapcat(func_pattern.findall),
    sorted,
    groupby(itemgetter(0)),
    valmap(compose(
        set,
        map(compose(tuple, argtype_pattern.findall)),
        map(itemgetter(1))
    )),
    valfilter(is_candidate),
    itemfilter(takes_string),
    list
)
which gives in 2.4.3:
['abs',
'array_repeat',
'ascii',
'base64',
'bitwiseNOT',
'lower',
'ltrim',
'rtrim',
'trim',
'unbase64',
'upper']
but excluding array_repeat (which is a different beast), these are already fixed in 3.0.0.pre0 (master).
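For anyone who wants to check the signature-matching step without Spark or network access, here is a minimal offline sketch of the same idea. It reuses the two regular expressions from the snippet above, but runs them on a few hand-written, simplified signature lines (an assumption for illustration, modeled on functions.scala rather than copied from it). The helper name collect_signatures is hypothetical, and the toolz pipeline is replaced with plain Python; it does not call PySpark, so it only covers the is_candidate part of the check.

```python
import re
from collections import defaultdict

# Same patterns as in the snippet above.
func_pattern = re.compile(r"^\s*def ([\w_]+)\(([, :\w]*?)\): Column")
argtype_pattern = re.compile(r"[\w_]+: (\w+)")

# Hand-written sample lines (assumed for illustration, bodies elided with ???).
sample = """
  def upper(e: Column): Column = ???
  def max(e: Column): Column = ???
  def max(columnName: String): Column = ???
  def atan2(y: Column, x: Column): Column = ???
  def atan2(yName: String, xName: String): Column = ???
""".splitlines()


def collect_signatures(lines):
    """Map each function name to the set of its argument-type tuples."""
    sigs = defaultdict(set)
    for line in lines:
        for name, args in func_pattern.findall(line):
            sigs[name].add(tuple(argtype_pattern.findall(args)))
    return dict(sigs)


def is_candidate(types):
    """True if there is a Column overload without a matching String overload."""
    pref = {t[:2] for t in types}
    return (
        (("Column",) in pref and ("String",) not in pref) or
        (("Column", "Column") in pref and ("String", "String") not in pref)
    )


signatures = collect_signatures(sample)
candidates = sorted(
    name for name, types in signatures.items() if is_candidate(types)
)
print(candidates)
```

On this sample only upper is flagged, because max and atan2 each have a String overload next to the Column one. Whether the PySpark wrapper then actually rejects a string still has to be checked against a running session, as takes_string does above.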