Comments (10)
Let's see where the datasets can go
So apart from Licenses and mods we have:
(@hoijui consider organizing the definition.csv alphabetically, really would help)
doc/
gen/
run/
res/
src/
existing options
let's go through one by one to clarify where datasets would go
- doc/: NO → this is where we want to put explanatory documentation that embeds from res/ (though I don't fully understand the difference between res/media/ and res/assets/media/)
- gen/: NO → only generated files/outputs go here
- run/: NO → only for automation, helping build and keep the repo organized (to my understanding so far)
- res/: MAYBE → if the data is not SOURCE data that is constantly improved and worked with, and is used across doc/ and src/ equally, as we always want a "single source of truth", it makes sense → I'll write some examples in a bit
- src/: MAYBE → all files that are part of the true "Source" of the project should sit here (no binaries!, no explanatory data apart from #comments in the code), the first place to look, that is where the CAB Review according to DIN SPEC 3105 will look (apart from the docs to go through to help with understanding)!
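The assessment above can also be written down as a small lookup table, which makes it easier to compare with the candidates discussed further down. This is only a sketch of the opinions in this comment, not part of osh-dir-std; the names `DATASET_VERDICT` and `candidate_dirs` are made up for illustration.

```python
# Verdicts from the list above: may datasets live in this top-level directory?
# (Hypothetical helper, not part of osh-dir-std.)
DATASET_VERDICT = {
    "doc/": "NO",     # explanatory documentation
    "gen/": "NO",     # generated files/outputs only
    "run/": "NO",     # automation and repo housekeeping
    "res/": "MAYBE",  # data used as a resource, not constantly reworked source data
    "src/": "MAYBE",  # the true source of the project, no binaries
}


def candidate_dirs(verdicts):
    """Return the directories that are not ruled out, in sorted order."""
    return sorted(d for d, verdict in verdicts.items() if verdict != "NO")


print(candidate_dirs(DATASET_VERDICT))
```

Only res/ and src/ remain as existing candidates, which is why the rest of this comment looks at those two and at possible new directories.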
what about new directories?
I see only three options here:
- data/ → very generic, but would cover a lot (not only a good thing)
- datasets/ → very clear, might be a bit long as a name
- records/ → a bit more open than datasets/, all data records would go here, even scraped data
Pro/Con and resulting open questions:
- Is data/ or one of the others (datasets/, records/) a new main directory or part of one of the existing ones?
- Is records clear enough to not be confused with generated?
- How to differentiate collected, externally generated data from internally generated data that sits in gen?
I'll evaluate this now
from osh-dir-std.
Basically that means we're discussing:
- Where to put it? res/, src/ or <new>/
and
- How to name it? data/, datasets/ or records/
from osh-dir-std.
In another practical example, I have slightly different data:
I wrote a script that takes a git repo web URL (e.g. https://github.com/hoijui/osh-dir-std/), and by looking at that page's HTML source, decides whether the repo is public or not.
To come up with the code, I had to do some "research", going to different git repo hosting sites, and looking at the HTML source for their repos, both public and non-public (e.g. private) ones.
I then c&p out relevant parts, and collected them in a Markdown file, or say, two: public.md and private.md
Where do these belong?
- src/scraped/
- doc/scraped/
- res/data/scraped/
- data/scraped/
- ...
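To make the use case a bit more concrete, the check described above might look roughly like the following sketch. It is not the actual script: the marker strings are invented placeholders standing in for the excerpts collected in public.md and private.md, and `classify_repo_html` is a hypothetical name. It works on an HTML string that was already downloaded, so no network access is involved.

```python
# Placeholder markers; the real ones would be the HTML excerpts
# collected in public.md and private.md (assumption, not real data).
PUBLIC_MARKERS = ["<!-- example: public repo marker -->"]
PRIVATE_MARKERS = ["<!-- example: private repo marker -->"]


def classify_repo_html(html, public_markers=PUBLIC_MARKERS,
                       private_markers=PRIVATE_MARKERS):
    """Return "public", "private" or "unknown" for a repo page's HTML source."""
    # Private markers are checked first, so a page matching both is not
    # reported as public by accident.
    if any(marker in html for marker in private_markers):
        return "private"
    if any(marker in html for marker in public_markers):
        return "public"
    return "unknown"
```

The point for this discussion is that the collected excerpts are hand-gathered input the code was derived from, neither generated output nor documentation.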
from osh-dir-std.
I think it is a very relevant question to answer, maybe it helps to check again what higher level structure we have:
https://github.com/hoijui/osh-dir-std/blob/main/mod/unixish/definition.csv
Let me collect my thoughts, just a sec
PS: I don't fully understand your "scraped" use case yet, but will come back to that too
from osh-dir-std.
other possibly useful words:
- gather
- collect
- recordings
- collections
I like records a lot though!
It fits well for tabular data, of whatever dimensionality.
An issue with it is:
it describes the data format, while (most) other dir names describe the data (content). For example, we have a directory called doc/; it is not called text/. Then again, src/ is kind of in both categories.
from osh-dir-std.
Ok I suggest:
- res/datasets/: for scraped datasets and other data that is just there as a resource for other parts of the documentation and references*
- src/records/: for all source-related work data that is compiled manually or via external sources to help with development
this would also help (at least me) to better understand:
res/media/ and res/datasets/ as resources in source format, whilst all binary resources sit under res/assets/ 💡
* I think maybe even Survey data should go there? What about TSdCs related Technical specs of the overall Machine or external parts/modules that are proprietary?
Final thought
- in case (for a reason I can only estimate slightly right now) we only talk about resources and not at all about source of the project
Example A
I want to collect data from a machine to evaluate the precision and have this as reference data in my repository,
so what would I do?
- I would write a script-a in src/software/ with a src/calc/ logic file (isn't that also kind of software?) behind it, and some output generated through a src/sim/ simulation using that calculation as well.
- I would want to send this simulation output to ...?
  → would this go to dataset/records too? or is this a gen/sim/ output?
- Now I take src/software/script-a to run the test with the machine by talking through an API of a src/firmware/ and collect the data records in ...?
  → would this go to datasets/records too? or is this a src/test/ source now?
- This data now counts as my real life reference for further src/sim/ simulation runs to improve the src/mech/ and src/elec/ design (maybe even to improve the script, the software or firmware as well).
Example B
I want to create a reference data sheet for measurements out of a 3D analysis of a physical object,
from there I'll generate a parametric design, what would I do?
Example C
I want to scrape metadata from other similar hardware projects as a reference for my calculations,
design and compare with my own metadata/specs even for documentation purposes, what would I do?
Example D
I want to create a realistic image of my wind turbine rotor blade design,
by using data-points from an external Airfoil generator software, what would I do?
- [Concept Design step] I would go to the generator, input my preset rotor blade metadata from ...?
  → would this sit in datasets/records? or in gen/calc/ as it was calculated based on power/wind/size, so other machine config metadata?
- [Mech Design step] I would take the data points from the generator for a specific 2D profile and, with some help of a src/calc/ mathematical logic file (might also be embedded in the CAD program I'm using), create a nice 3D CAD model
- [Simulation Design step] Then I import that CAD model in src/mech to create a src/sim simulation, improve the design a bit and send it to src/anim/ for creating a photo-realistic image that will be sent to ...?
  → is this then to go to gen/anim/ or is this image a file that will sit under res/assets/media/img/?
as reference I used this tree view:
run/
res/
res/conf/
res/media/
res/media/img/
res/assets/
res/assets/media/
res/assets/media/img/
res/assets/media/vid/
res/assets/var/
src/
src/anim/
src/calc/
src/sim/
src/elec/
src/firmware/
src/mech/
src/software/
src/test/
gen/
gen/site/
gen/anim/
gen/calc/
gen/sim/
gen/software/
gen/firmware/
gen/elec/
gen/mech/
gen/doc/
gen/doc/assembly/
gen/doc/manuf/
gen/doc/usr/
gen/doc/recycling/
doc/
doc/assembly/
doc/manuf/
doc/usr/
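For anyone who wants to play through the examples on disk, the tree view can be recreated in a throwaway directory. The following is a minimal sketch, assuming Python with only the standard library; `TREE` is an abbreviated copy of the listing above and `create_tree` is a made-up helper name.

```python
import tempfile
from pathlib import Path

# Abbreviated copy of the tree view above (not every listed dir is included).
TREE = [
    "run", "res/conf", "res/media/img", "res/assets/media/img",
    "res/assets/media/vid", "res/assets/var",
    "src/anim", "src/calc", "src/sim", "src/elec", "src/firmware",
    "src/mech", "src/software", "src/test",
    "gen/sim", "gen/anim", "gen/calc",
    "doc/assembly", "doc/manuf", "doc/usr",
]


def create_tree(root, dirs=TREE):
    """Create every directory in `dirs` below `root`; return the top-level names."""
    root = Path(root)
    for rel in dirs:
        (root / rel).mkdir(parents=True, exist_ok=True)
    return sorted(p.name for p in root.iterdir() if p.is_dir())


with tempfile.TemporaryDirectory() as tmp:
    print(create_tree(tmp))
```

Candidate directories for the data (whatever name wins) can then be added to `TREE` to see how they sit next to the existing ones.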
from osh-dir-std.
Here also #8 for easier communication
from osh-dir-std.
I figured, file is actually a very good fit according to its definition:
- a folder, cabinet, or other container in which papers, letters, etc., are arranged in convenient order for storage or reference.
- a collection of papers, records, etc., arranged in convenient order: to make a file for a new account.
would it really be an option though? :/
src/files/bla.csv
... too general, right?
from osh-dir-std.
other options:
- charts
- knowledge
- compilations
- input
- documents
- archive
- text
- writing
- written_material
- excerpts
- extracts
- listings
- index
from osh-dir-std.
Hey sorry I totally missed this, but I actually like src/input/ very much. It indicates source files that are simply input for other design files/processes and might come from external/physical sources/measurements. It is then also not limited to datasets or records but could also be something else.
src/files/ is too generic!! So go with src/input/
from osh-dir-std.