salsify / arc-furnace
Need to melt, weave, and meld information together? Arc furnace will fuse anything you've got.
License: MIT License
In many cases the ArcFurnace nodes expect incoming rows to have certain fields; this is especially true for Hash and Equijoin nodes. These nodes should be resilient to missing data and log properly when expectations are not met, instead of failing with a stack trace (which they often do).
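The behaviour requested above could look roughly like the following sketch. It is plain Ruby and does not use any ArcFurnace API: `missing_fields` and `checked_row` are hypothetical helper names, and rows are assumed to be ordinary hashes keyed by field name.

```ruby
require 'logger'

# Hypothetical helper (not part of ArcFurnace): returns the required
# fields that are absent from the row or present with a nil value.
def missing_fields(row, required_fields)
  required_fields.select { |field| row[field].nil? }
end

# Hypothetical helper: returns the row unchanged when every required
# field is present; otherwise logs a warning and returns nil so the
# caller can skip the row instead of raising.
def checked_row(row, required_fields, logger: Logger.new($stderr))
  missing = missing_fields(row, required_fields)
  return row if missing.empty?

  logger.warn("Skipping row, missing fields #{missing.inspect}: #{row.inspect}")
  nil
end
```

A node could call something like `checked_row` before reading its key fields, so that a bad row produces one log line rather than a stack trace. Whether skipping or failing is the right policy is a design choice this sketch does not settle.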
There's no documentation on how to use or add to arc-furnace aside from the one basic example.
One of our projects has a ProductsPipeline that is a relatively simple implementation of the library. Here's a snippet:
require_relative 'constants'
require 'arc-furnace/pipeline'
require 'arc-furnace/excel_source'
require 'arc-furnace/all_fields_csv_sink'

class ProductsPipeline < ArcFurnace::Pipeline
  include Constants

  # create products source
  source :products_source,
         type: ArcFurnace::ExcelSource,
         params: {
           filename: :product_filename,
           encoding: 'ISO-8859-1'
         }

  transform :products_transform, params: { source: :products_source } do |hash|
    result = hash.deep_dup
    result[SALSIFY_ID] = result.delete(BLAH_ID)
    result
  end

  filter :filtered_products, params: { source: :products_transform, observed_products: :observed_products } do |row, params|
    params.fetch(:observed_products).add(row[BLAH_ID])
  end

  sink type: ArcFurnace::AllFieldsCSVSink,
       source: :filtered_products,
       params: { filename: "#{Dir.pwd}/products_import.csv" }
end
The source file is a 14 MB XLSX file, but the output file is a 71 MB CSV file. The output is five times larger, even though XLSX files tend to be larger than CSV files relative to the information they contain. I tried removing the filter and the file size was the same; I also spot-checked the two files and they look identical. I feel like something is going wrong with the AllFieldsCSVSink.
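One thing that may be worth ruling out before blaming the sink: XLSX is a ZIP-compressed container, so an uncompressed CSV of the same data can be larger on disk. A sparse sheet written with every column on every row would also add many empty cells. Neither is confirmed as the cause here. The sketch below is a small diagnostic for the second possibility. `empty_cell_ratio` is a hypothetical helper that takes any enumerable of row arrays.

```ruby
# Hypothetical diagnostic: returns the fraction of cells that are nil
# or empty strings across all rows (0.0 when there are no cells).
def empty_cell_ratio(rows)
  total = 0
  empty = 0
  rows.each do |row|
    row.each do |cell|
      total += 1
      empty += 1 if cell.nil? || cell.to_s.empty?
    end
  end
  total.zero? ? 0.0 : empty.to_f / total
end

# Possible usage with Ruby's csv library (streams the file row by row):
#   require 'csv'
#   puts empty_cell_ratio(CSV.foreach('products_import.csv'))
```

A ratio close to 1.0 would suggest the output is mostly padding from sparse columns. A low ratio would point back toward compression or toward the sink itself.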
Queue @dspangen
If there are no columns found matching the join key column, arc furnace should throw an error and not continue to process the file.
In the case of an equijoin the transform process finishes with no errors but it's dropped all rows on the floor and won't produce an output.
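A fail-fast check along the lines requested could be sketched as follows. This is not ArcFurnace code: `MissingJoinKeyError` and `assert_join_key_present!` are hypothetical names, and the rows are assumed to be already materialized as an array of hashes.

```ruby
# Hypothetical error class for a join key that never appears in the input.
class MissingJoinKeyError < StandardError; end

# Hypothetical guard: returns the rows unchanged if at least one row
# contains the join key; otherwise raises. Note that an empty input
# also raises, since no row can contain the key.
def assert_join_key_present!(rows, key)
  return rows if rows.any? { |row| row.key?(key) }

  raise MissingJoinKeyError, "No rows contain join key column #{key.inspect}"
end
```

Running a guard like this before the join would turn the silent empty output described above into an explicit error that names the missing column.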