Comments (5)
@michael1991 Thanks for raising this. Can you help me reproduce this issue? I tried the code below, but it worked fine for me.
from faker import Faker
import pandas as pd

# Assumes an active SparkSession named `spark` and a target table path `PATH`.
fake = Faker()
data = [{"ID": fake.uuid4(), "EventTime": "2023-03-04 14:44:42.046661",
         "FullName": fake.name(), "Address": fake.address(),
         "CompanyName": fake.company(), "JobTitle": fake.job(),
         "EmailAddress": fake.email(), "PhoneNumber": fake.phone_number(),
         "RandomText": fake.sentence(), "CityNameDummyBigFieldName": fake.city(), "ts": "1",
         "StateNameDummyBigFieldName": fake.state(), "Country": fake.country()} for _ in range(1000)]
pandas_df = pd.DataFrame(data)
hoodie_properties = {
    'hoodie.datasource.write.table.type': 'COPY_ON_WRITE',
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.datasource.write.hive_style_partitioning': 'true',
    'hoodie.datasource.write.recordkey.field': 'ID',
    'hoodie.datasource.write.partitionpath.field': 'StateNameDummyBigFieldName,CityNameDummyBigFieldName',
    'hoodie.table.name': 'test'
}
spark.sparkContext.setLogLevel("WARN")
df = spark.createDataFrame(pandas_df)
# The first write creates the table; the loop then upserts the same records 49 more times.
df.write.format("hudi").options(**hoodie_properties).mode("overwrite").save(PATH)
for i in range(1, 50):
    df.write.format("hudi").options(**hoodie_properties).mode("append").save(PATH)
Hi @ad1happy2go , glad to hear you again ~
Can you try partition column names that contain an underscore? I'm not sure, but enabling URL encoding for partitions together with an underscore in the partition column name might make this happen.
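A minimal sketch of the variation suggested here: the same write options as the repro above, but with underscore-named partition columns and URL encoding of the partition path switched on. The column names `r_date` and `r_hour` are borrowed from the path quoted elsewhere in this thread, `build_hudi_options` is a hypothetical helper, and whether this combination actually triggers the issue is unconfirmed.

```python
def build_hudi_options(table_name, record_key, partition_fields, url_encode=True):
    """Build Hudi write options for a COPY_ON_WRITE upsert (hypothetical helper)."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
        "hoodie.datasource.write.operation": "upsert",
        "hoodie.datasource.write.hive_style_partitioning": "true",
        "hoodie.datasource.write.recordkey.field": record_key,
        # Multiple partition columns are passed as one comma-separated string.
        "hoodie.datasource.write.partitionpath.field": ",".join(partition_fields),
        # Hudi expects the string "true" or "false" here.
        "hoodie.datasource.write.partitionpath.urlencode": str(url_encode).lower(),
    }


# Assumed partition columns with underscores in their names.
options = build_hudi_options("test", "ID", ["r_date", "r_hour"])
```

The resulting dict can be passed to the writer the same way as in the repro, for example `df.write.format("hudi").options(**options)`, provided the DataFrame actually contains `r_date` and `r_hour` columns.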
@michael1991
How many partitions are in the table? Is it possible to share the URI? I was not able to reproduce this, though.
@ad1happy2go The partitions are hourly, for example, gs://bucket/tables/hudi/r_date=2024-06-17/r_hour=00. The problem only occurs with two partition columns whose names contain an underscore; when we use a single partition column like yyyyMMddHH, it works fine. I'm not sure of the exact cause.
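To make the two layouts concrete, here is a rough sketch in plain Python that derives both partitioning schemes from one timestamp. It only mirrors the directory layout described above; Hudi builds the real paths itself, no URL encoding is applied here, and the helper names and the timestamp format are assumptions for illustration.

```python
from datetime import datetime


def hourly_partition(event_time):
    """Split a timestamp string into the two-column (r_date, r_hour) layout."""
    ts = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S.%f")
    return {"r_date": ts.strftime("%Y-%m-%d"), "r_hour": ts.strftime("%H")}


def single_column_partition(event_time):
    """Collapse the same timestamp into one yyyyMMddHH-style value."""
    ts = datetime.strptime(event_time, "%Y-%m-%d %H:%M:%S.%f")
    return ts.strftime("%Y%m%d%H")


def hive_style_path(base, partition):
    """Join column=value segments under a base path, in dict insertion order."""
    parts = [f"{key}={value}" for key, value in partition.items()]
    return "/".join([base.rstrip("/")] + parts)


event_time = "2024-06-17 00:15:30.000000"
two_column_path = hive_style_path("gs://bucket/tables/hudi", hourly_partition(event_time))
single_value = single_column_partition(event_time)
```

Adding `r_date` and `r_hour` columns built this way to the repro DataFrame would bring the test closer to the reported table layout than the city and state columns used earlier.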