
Comments (10)

timmwille commented on June 29, 2024

Let's see where the datasets can go.

So apart from Licenses and mods we have:

(@hoijui consider organizing the definition.csv alphabetically, really would help)

doc/
gen/
run/
res/
src/

existing options

let's go through one by one to clarify where datasets would go

  • doc/ : NO → this is where we want to put explanatory documentation that embeds from res/ (though I don't fully understand the difference between res/media/ and res/assets/media/)
  • gen/ : NO → only generated files/outputs go here
  • run/ : NO → only for automation, helping build and keep the repo organized (to my understanding so far)
  • res/ : MAYBE → makes sense if the data is not SOURCE data that is constantly improved and worked with, but is used across doc/ and src/ equally, since we always want a "single source of truth" → I'll write some examples in a bit
  • src/ : MAYBE → all files that are part of the true "Source" of the project should sit here (no binaries, no explanatory data apart from #comments in the code). It is the first place to look, and it is where the CAB review according to DIN SPEC 3105 will look (apart from the docs, which help with understanding)!

what about new directories?

I see only three options here:

  • data/ → very generic, but would cover a lot (not only a good thing)
  • datasets/ → very clear, might be a bit long as a name
  • records/ → a bit more open than datasets/, all data records would go here, even scraped data

Pro/Con and resulting open questions:

  • Is data/ (or one of the others: datasets/, records/) a new main directory or part of an existing one?
  • Is records/ clear enough not to be confused with generated data?
  • How do we differentiate collected or externally generated data from internally generated data that sits in gen/?

I'll evaluate this now

from osh-dir-std.

timmwille commented on June 29, 2024

Basically that means we're discussing:

  1. Where to put it?
  • res/
  • src/
  • <new>/

and

  2. How to name it?
  • data/
  • datasets/
  • records/
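
As a rough sketch, the two questions above combine into nine candidate paths. The snippet below only enumerates them. Reading <new>/ as "a new top-level directory" is an assumption of this sketch, and the names PARENTS, NAMES and candidate_paths are made up for illustration.

```python
from itertools import product

# Where to put it: inside res/, inside src/, or at the top level.
# The empty string stands for "<new>/", i.e. a new top-level directory (assumption).
PARENTS = ["res/", "src/", ""]

# How to name it: the three names under discussion.
NAMES = ["data/", "datasets/", "records/"]


def candidate_paths():
    """Return every parent/name combination as a directory path."""
    return [parent + name for parent, name in product(PARENTS, NAMES)]


if __name__ == "__main__":
    for path in candidate_paths():
        print(path)
```

The list includes res/datasets/ and src/records/ as well as the three top-level variants, so each proposal in the thread can be checked against the same set.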


hoijui commented on June 29, 2024

In another practical example, I have slightly different data:

I wrote a script that takes a git repo web URL (e.g. https://github.com/hoijui/osh-dir-std/) and, by looking at that page's HTML source, decides whether the repo is public or not.
To come up with the code, I had to do some "research": going to different git repo hosting sites and looking at the HTML source for their repos, both public and non-public (e.g. private) ones.
I then copied and pasted the relevant parts and collected them in a Markdown file, or rather two: public.md and private.md
Where do these belong?

  • src/scraped/
  • doc/scraped/
  • res/data/scraped/
  • data/scraped/
  • ...
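
To make the use case more concrete, here is a minimal sketch of how such collected snippets might feed a public/private check. It is not the actual script: the marker strings are hypothetical placeholders standing in for whatever public.md and private.md contain, and the names Markers and classify are invented for this example. It works on an HTML string only and does no network access.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Markers:
    """Substrings that hint at a public or a private repo page."""
    public: Tuple[str, ...]
    private: Tuple[str, ...]


def classify(html: str, markers: Markers) -> str:
    """Return 'private', 'public' or 'unknown' for a page's HTML source."""
    text = html.lower()
    # Private markers are checked first, so a page matching both counts as private.
    if any(m.lower() in text for m in markers.private):
        return "private"
    if any(m.lower() in text for m in markers.public):
        return "public"
    return "unknown"


# Hypothetical markers, NOT taken from any real hosting site.
EXAMPLE_MARKERS = Markers(
    public=('data-visibility="public"',),
    private=("sign in to view this repository",),
)

if __name__ == "__main__":
    print(classify('<div data-visibility="public"></div>', EXAMPLE_MARKERS))
```

In this reading, the Markdown files are input data the code depends on, which is what makes their location a question in the first place.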


timmwille commented on June 29, 2024

I think it is a very relevant question to answer, maybe it helps to check again what higher level structure we have:
https://github.com/hoijui/osh-dir-std/blob/main/mod/unixish/definition.csv

Let me collect my thoughts, just a sec

PS: I don't fully understand your "scraped" use case yet, but will come back to that too


hoijui commented on June 29, 2024

other possibly useful words:

  • gather
  • collect
  • recordings
  • collections

I like records a lot though!
It fits well for tabular data, for whatever dimensionality.
An issue with it is:
it describes the data format, while (most) other dir names describe the data (content). For example, we have a directory called doc/; it is not called text/. Then again, src/ is kind of in both categories.


timmwille commented on June 29, 2024

Ok I suggest:

  • res/datasets/ : for scraped datasets and other data that is just there as a resource for other parts of the documentation and references*
  • src/records/ : for all source-related working data that is compiled manually or from external sources to help with development

this would also help (at least me) to better understand res/media/ and res/datasets/ as resources in source format, while all binary resources sit under res/assets/ 💡

* I think maybe even Survey data should go there? What about TSdCs related Technical specs of the overall Machine or external parts/modules that are proprietary?


Final thought

  • in case (for a reason I can only estimate slightly right now) we only talk about resources
    and not at all about source of the project

Example A

I want to collect data from a machine to evaluate the precision and have this as reference data in my repository,
so what would I do?

  • I would write a script-a in src/software/ with a src/calc/ logic file behind it (isn't that also kind of software?)
    and some output generated through a simulation in src/sim/ that uses that calculation as well.
  • I would want to send this simulation output to ...?
    → would this go to datasets/records too? or is this a gen/sim/ output?
  • now I take src/software/script-a to run the test with the machine by talking through an API of a src/firmware/ and collect the data records in ...?
    → would this go to datasets/records too? or is this a src/test/ source now?
  • This data now counts as my real life reference for further src/sim/ simulation runs to improve the src/mech/ and src/elec/ design (maybe even to improve the script, the software or firmware as well).

Example B

I want to create a reference data sheet for measurements out of a 3D analysis of a physical object,
from there I'll generate a parametric design, what would I do?

Example C

I want to scrape metadata from other similar hardware projects as a reference for my calculations,
design and compare with my own metadata/specs even for documentation purposes, what would I do?

Example D

I want to create a realistic image of my wind turbine rotor blade design,
by using data-points from an external Airfoil generator software, what would I do?

  • [Concept Design step] I would go to the generator, input my preset rotor blade metadata from ...?
    → would this sit in datasets/records? or in gen/calc/ as it was calculated based on power/wind/size,
    so other machine config metadata?
  • [Mech Design step] I would take those data points from the generator for a specific 2D profile
    and, with some help from a src/calc/ mathematical logic file
    (might also be embedded in the CAD program I'm using),
    create a nice 3D CAD model
  • [Simulation Design step] Then I import that CAD model from src/mech/ to create a src/sim/ simulation,
    improve the design a bit and send it to src/anim/ for creating a photo-realistic image that will be sent to ...?
    → is this then to go to gen/anim/ or is this image a file that will sit under res/assets/media/img/?

as reference I used this tree view:

run/
res/
res/conf/
res/media/
res/media/img/
res/assets/
res/assets/media/
res/assets/media/img/
res/assets/media/vid/
res/assets/var/
src/
src/anim/
src/calc/
src/sim/
src/elec/
src/firmware/
src/mech/
src/software/
src/test/
gen/
gen/site/
gen/anim/
gen/calc/
gen/sim/
gen/software/
gen/firmware/
gen/elec/
gen/mech/
gen/doc/
gen/doc/assembly/
gen/doc/manuf/
gen/doc/usr/
gen/doc/recycling/
doc/
doc/assembly/
doc/manuf/
doc/usr/
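
One way to try the examples against this tree view is a small lookup that reports the deepest listed directory a file would fall under. This is only a sketch: the directory list is copied from the tree above, while the function name deepest_known_dir and the sample file names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# Directories copied from the tree view above.
TREE = """
run/ res/ res/conf/ res/media/ res/media/img/ res/assets/ res/assets/media/
res/assets/media/img/ res/assets/media/vid/ res/assets/var/
src/ src/anim/ src/calc/ src/sim/ src/elec/ src/firmware/ src/mech/
src/software/ src/test/
gen/ gen/site/ gen/anim/ gen/calc/ gen/sim/ gen/software/ gen/firmware/
gen/elec/ gen/mech/ gen/doc/ gen/doc/assembly/ gen/doc/manuf/ gen/doc/usr/
gen/doc/recycling/
doc/ doc/assembly/ doc/manuf/ doc/usr/
""".split()

KNOWN = {d.rstrip("/") for d in TREE}


def deepest_known_dir(file_path: str) -> Optional[str]:
    """Return the longest listed directory containing file_path, or None."""
    parts = PurePosixPath(file_path).parts[:-1]  # drop the file name
    for end in range(len(parts), 0, -1):
        candidate = "/".join(parts[:end])
        if candidate in KNOWN:
            return candidate + "/"
    return None


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the lookup.
    for p in ["res/assets/media/img/rotor.png", "src/records/run1.csv", "data/run1.csv"]:
        print(p, "->", deepest_known_dir(p))
```

A path such as src/records/run1.csv resolves only to src/, and data/run1.csv resolves to nothing, which shows that the tree currently has no dedicated place for datasets.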


timmwille commented on June 29, 2024

Here also #8 for easier communication


hoijui commented on June 29, 2024

I figured, file is actually a very good fit according to its definition:

  1. a folder, cabinet, or other container in which papers, letters, etc., are arranged in convenient order for storage or reference.
  2. a collection of papers, records, etc., arranged in convenient order: to make a file for a new account.

would it really be an option though? :/

src/files/bla.csv

... too general, right?


hoijui commented on June 29, 2024

other options:


timmwille commented on June 29, 2024

Hey, sorry, I totally missed this, but I actually like src/input/ very much. It indicates source files that are simply input for other design files/processes and that might come from external/physical sources/measurements. It is then also not limited to datasets or records but could be something else as well.

src/files/ is too generic!! So let's go with src/input/

