I work at IRMAR in Rennes, France.
pnavaro / big-data
Home Page: https://pnavaro.github.io/big-data
Python tools for big data
In notebook 14-FileFormats, the code imports pyarrow as pa but uses it as pq.
import pyarrow as pa
pq.write_to_dataset(table, root_path="test", filesystem=hdfs)
Good evening,
Sorry to bother you.
Would it be possible to put the corrections on GitHub?
Regards
Hello Mr Navarro,
I am having trouble opening my CSV file.
I did manage to install PySpark in PyCharm:
from pyspark.sql import SparkSession
spark = (
    SparkSession.builder
    .appName('Convert CSV to parquet')
    .master('local')
    .config('spark.hadoop.parquet.enable.summary-metadata', 'true')
    .getOrCreate()
)
spark.read.format('csv').options(header='true').load('data_small.csv')
df = spark.read.csv(u'D:\AILIS\bigdata\yellow_tripdata_2012_01.csv', header="true", inferSchema="true")
I get the following error message: pyspark.sql.utils.AnalysisException: Path does not exist: file:/D:/AILIS/bigdata/yellow_tripdata_2012_01.csv (even though the dataset really is there!)
It is apparently a Hadoop problem, but I do not understand how to solve it despite a lot of searching online... perhaps it is related to the master ('local')? You also say that the file has to be converted to Parquet in my HDFS home directory; what does that refer to?
May I download only 2 months? Unfortunately I do not have enough space on my computer to download all 12 months.
Thank you in advance
Wishing you happy holidays
Regards
Aïlis THOMAS
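One thing worth ruling out, offered as a guess rather than a diagnosis: in a regular (non-raw) Python string, `\b` is the escape sequence for a backspace character, so a Windows path containing `\bigdata` may not be the string it appears to be. The error message quoted above prints the path with `bigdata` intact, so this may well not be the actual cause. The sketch below uses only the standard library and the path from the question; the variable names are illustrative.

```python
from pathlib import PureWindowsPath

# In a regular string literal, "\b" is a single backspace character (0x08).
plain = "AILIS\bigdata"
# A raw string keeps the backslash and the letter "b" as two characters.
raw = r"AILIS\bigdata"

print(repr(plain))  # shows the \x08 escape
print(repr(raw))    # shows an escaped backslash followed by "bigdata"

# Two safer ways to spell the path from the question:
raw_path = r"D:\AILIS\bigdata\yellow_tripdata_2012_01.csv"
posix_path = PureWindowsPath(raw_path).as_posix()  # forward slashes
print(posix_path)
```

Either `raw_path` or `posix_path` could then be passed to `spark.read.csv(...)` in place of the original literal.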