
Comments (8)

abey79 avatar abey79 commented on June 3, 2024

Thanks for the extensive, thought-provoking write-up!

First, I agree with the premises:

  1. The data model has a major impact on the software structure and abilities. The "polyline collection" model made vpype 1 very easy to write and maintain, but has obvious limitations, especially in an SVG-centric environment.
  2. Amongst its big features, vpype 2.0 should support at least the SVG primitives. (The other ones being path-level metadata and a more flexible I/O architecture, cf. the vpype-gcode integration discussion).

Objectives

Now, IMO the "v2 data model" should be designed against a clear set of objectives in terms of:
A) Which features should it support?
B) What is the expected performance?
C) How "clean"/easy-to-use/accessible should its API be? (for purposes of maintainability and use by third parties in plugins)

Before getting to your proposal, let me brain-dump what I currently have in mind for the above. This is obviously TBD and bound to evolve, but I guess it can serve as a starting point.

Feature set

Must have:

  • As I mentioned before, it should at least support the SVG geometric primitives: points, (poly-)lines, arcs, quad/cubic beziers. (Though points can technically be supported with vpype's data model, they are typically variously filtered out, badly handled, or ignored. This is due to the "plotter" focus. In V2, they should be explicitly supported and specially handled by the writers, e.g. to optionally generate a tiny circle so that they actually appear in the plotted material.)
  • Obviously, "basic", easy-to-implement operations such as translation, rotation, scaling, flipping, and re-ordering should be supported.
  • Cropping (i.e. at given x or y locations, not arbitrary masking) is needed too. I'm not sure how easy it is to implement on a bezier curve (see the sketch after this list).
  • Polyline conversion: some operations are bound to require pure polyline geometries for their implementation. For example: gcode export, or the squiggle command.
  • Support for path-level metadata, in addition to the current layer-level and global metadata. (Here I define "path" as a single-stroke graphic element, possibly made of multiple segments, each of which could be of a different type. So similar to SVG's <path> element, with the constraint of no discontinuity induced by some intermediate m command.)
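Regarding cropping on beziers: cropping at x = c reduces to root finding (the x component of a cubic bezier is itself a cubic polynomial in t) followed by curve splitting. A minimal sketch of the splitting part, using de Casteljau's algorithm (the complex-point representation and function name are my assumptions, not an existing vpype API):

```python
def split_cubic(p0: complex, p1: complex, p2: complex, p3: complex, t: float):
    """Split a cubic bezier at parameter t using de Casteljau's
    algorithm; returns the control points of the two halves."""
    p01 = p0 + t * (p1 - p0)
    p12 = p1 + t * (p2 - p1)
    p23 = p2 + t * (p3 - p2)
    p012 = p01 + t * (p12 - p01)
    p123 = p12 + t * (p23 - p12)
    pm = p012 + t * (p123 - p012)  # point on the curve at t
    return (p0, p01, p012, pm), (pm, p123, p23, p3)
```

Finding the split parameters themselves could be done with e.g. numpy.roots on the power-basis coefficients of Re(B(t)) - c, keeping only the real roots in [0, 1].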

Not sure:

  • Skewing came for free with raw point data and matrix operations, but might be non-trivial with some primitives. I'd happily drop it if it causes any trouble (or simply requires polyline conversion).
  • Segment-level metadata. I currently see no need for this but would happily be proven otherwise.

Excluded:

  • Everything related to computational geometry (boolean operations, predicates, etc.; basically everything provided by Shapely) should be outside of the scope IMHO. This is a somewhat personal take, based on the motivation and time (or lack thereof) needed to create and maintain such a collection of tools. Shapely is, and would remain, a dependency of vpype, and should be used whenever practical.

TODO:

  • It would be useful to establish a proper list of needed operations, based on existing vpype code and plug-ins.
  • This list should include metadata aspects as well (see my remark below as well).

Expected performance

Though they would be useful, I don't have hard numbers/benchmarks for this. Here is my subjective take for the time being:

  • I've always thought of vpype's performance in comparison to plotting. Processing a file should take significantly less time than is required to plot it. ~3 orders of magnitude seems reasonable: it's ok to spend 3s processing a file for a 1-hour plot.
  • 1-2 additional orders of magnitude are beneficial for batch processing (e.g. using Gnu Parallel, DoIt, or via the vsk save feature).
  • Corollary: the goal is certainly not to create a high-throughput engine suitable for use in, e.g., HPC or a high-traffic web server.
  • There are reasons to use vpype (or at least part of its feature set) on SBCs such as the Raspberry Pi. Reasonable performance is beneficial, though I/O often is the predominant factor in this context.

API

vpype is both a library (the vpype package) and a CLI application (the vpype_cli package). As a library, vpype is used by vpype_cli, vsketch, vpype's plug-ins, and, potentially, third-party packages (e.g. my Axidraw UX taxi, and I'm also aware of some Inkscape integration effort). Although I don't think vpype currently fully succeeds at this, my vision is to have a usable, Pythonic, fully-typed API. IMO, this objective takes precedence over improving beyond the (admittedly ill-defined) performance target mentioned earlier.

In theory, any data model can possibly be wrapped inside a nice API. In practice, the required implementation/maintenance effort will vary a lot depending on the chosen data model.

Side note: from my experience with adding some metadata support to vpype, the question of metadata handling and its impact on both the API and the actual, high-level features (i.e. vpype commands) is probably greater and trickier than the purely geometric aspects. Proper metadata handling must be well thought out and properly factored into the data model.

My comments on the "Numpy Nx5" proposal

In light of the feature set, your proposal covers what I think would be needed for vpype 2. As I implied earlier, I reject the idea of a custom implementation of advanced computational geometry features. I'd rather use Shapely for that.

In terms of performance, you make a strong point that this is probably the most performant implementation that pure Python can get. However, is this warranted? I'm not sure. In my experience, most real-world files contain paths that are either polylines or compound paths with very few primitives (e.g. a rounded rect: four lines and four arcs). In this context, the naive use of list[Segment] (where Segment is part of a class hierarchy with PolyLine, Arc, CubicBezier, etc.) is not that bad, because the list length will be small in the vast majority of cases (a single element for pure polylines, 8 elements for a rounded rect, etc.). So long as PolyLine is implemented with Numpy (like in vpype 1), this approach should scale correctly.
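For concreteness, a minimal sketch of what that "naive" class hierarchy could look like (all names and the complex-number representation are illustrative assumptions, not an actual vpype API):

```python
from dataclasses import dataclass

import numpy as np


class Segment:
    """Base class for single-stroke path segments."""


@dataclass
class PolyLine(Segment):
    vertices: np.ndarray  # 1D complex array, as in vpype 1


@dataclass
class Arc(Segment):
    start: complex
    mid: complex  # any point on the arc strictly between start and end
    end: complex


@dataclass
class CubicBezier(Segment):
    start: complex
    ctrl1: complex
    ctrl2: complex
    end: complex


# a "path" is then a (typically short) list of segments: a single
# PolyLine for pure polylines, 8 elements for a rounded rect, etc.
Path = list[Segment]
```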

As you might expect by now, my biggest issue is the work that would be required to wrap this into a proper API, compared to the "naive" implementation, which could be made cleanly enough that no layering would be needed API-wise.

Misc. specific issues I'm seeing with this format:

  • the arc definition is not unique (ctrl may be anywhere on the arc) and is possibly prone to numerical issues (if ctrl is very close to start and/or end)
  • redundancy makes it easy for the geometry structure to be invalid (e.g. a Point with start != end, etc.)
  • the info field reminds me of my C past (could be addressed by a structured numpy array)
  • next to no typing/static analysis is possible with this approach

Going forward

Building a reference implementation

I don't see myself going for a radical choice like your proposal without building a reference prototype with "regular", Pythonic code to explore the baseline in terms of performance and API. I've been thinking a bit about this lately, and might make making strides in that direction part of my new year's resolutions :) This should enable proper benchmarking and API exploration.

Improving on your proposal

As I suggested already, I think your proposal could be much improved by using structured arrays. In particular, the info field could be split into its components and given proper types. Proper column labels are nice too. This would come at the cost of the segment flip ability, but that is easy enough to implement "manually".
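Something along these lines, for illustration (field names and kind codes are made up, not a settled design):

```python
import numpy as np

# one row per segment, labelled columns instead of an anonymous Nx5
seg_dtype = np.dtype([
    ("start", np.complex128),
    ("ctrl1", np.complex128),
    ("ctrl2", np.complex128),
    ("end", np.complex128),
    ("kind", np.uint8),  # e.g. 0=point, 1=line, 2=quad, 3=cubic, 4=arc
])

segments = np.zeros(8, dtype=seg_dtype)
segments["kind"][:4] = 1      # e.g. the four line segments of a rounded rect
starts = segments["start"]    # typed, labelled column access

# flipping segment direction now requires swapping fields "manually"
segments["start"], segments["end"] = segments["end"].copy(), segments["start"].copy()
segments["ctrl1"], segments["ctrl2"] = segments["ctrl2"].copy(), segments["ctrl1"].copy()
```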

vpype2 sandbox

I'm going to make a sandbox branch with a vpype2_sandbox directory to store experiments and prototypes (I already have two that, as simple as they are, I'd like to keep track of). Don't hesitate to add stuff there (via PR). I'm not going to be picky or whatever; this is just a sandbox that will likely be removed altogether later on.


tatarize avatar tatarize commented on June 3, 2024

SVG Arc is a dumpster fire.

vpype 2.0 should support at least the SVG primitives.

I would advise against supporting the SVG arc primitive there. It's an absolute dumpster fire of a primitive. You can easily simulate the arc with quads, cubics, or circular arcs (which are really nice), unlike the weird parameters of the SVG arc. You can define a circular arc with just a start, an end, and a control point. Including that monstrosity basically makes all math impossible. I would highly recommend just removing them and replacing them with circular arcs or with bezier curves. The circular arc code is easy, supported by gcode controllers and a lot more graphics packages for display. Pick how many circular arcs you want to use, then subdivide whatever curve you want to simulate and pick three points that occur on the curve. Since you can always make a circular arc with a start, an end, and a control point, you call those three points an arc and it will do a really good job at that.
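To illustrate why the three-point parameterization is so pleasant: the center falls out of the textbook circumcenter formula. A sketch, assuming complex points and a helper name of my own (collinear inputs are degenerate, i.e. the "arc" is really a line):

```python
def arc_center(a: complex, b: complex, c: complex) -> complex:
    """Circumcenter of start a, on-arc control b, end c; the arc's
    radius is then abs(a - center)."""
    d = 2 * (a.real * (b.imag - c.imag)
             + b.real * (c.imag - a.imag)
             + c.real * (a.imag - b.imag))
    if abs(d) < 1e-12:
        raise ValueError("collinear points: this is a line, not an arc")
    ux = (abs(a) ** 2 * (b.imag - c.imag)
          + abs(b) ** 2 * (c.imag - a.imag)
          + abs(c) ** 2 * (a.imag - b.imag)) / d
    uy = (abs(a) ** 2 * (c.real - b.real)
          + abs(b) ** 2 * (a.real - c.real)
          + abs(c) ** 2 * (b.real - a.real)) / d
    return complex(ux, uy)
```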

svgpathtools has some fantastic math for various primitives, and it always gets really fuzzy for arcs.

  • Note: Point isn't actually an SVG primitive. They do have an SVGPoint() class, but it's not a thing you can just put in the design somewhere.

Gcode export

That's not necessarily directly related to line collections. I have some generic writer stuff for embroidery; it's just that there are a lot of commands and it has some notably complex parts.

https://github.com/EmbroidePy/pyembroidery/blob/master/pyembroidery/GenericWriter.py

Segment level metadata

Segment-level metadata. I currently see no need for this but would happily be proven otherwise.

ArcGIS does some things with this sort of stuff: they have associated dictionaries for settings or something, where the actual objects can carry a bunch of data. You don't need a different layer, and could even invent layers on the fly. You would be able to set a series of points and give them all an index number to draw a connect-the-dots.

Numpy Nx5

  • Typically if you're picking an arc control point you put it at about the middle of the arc, and this isn't hard to mess with. While not covered there, there are a couple of quirks with the arc. For example, if you have coincident start and end points it actually makes a ccw circle, with the far side being dictated by the control point. You could potentially make a point where start != end, or a quad or arc where control != control2, but usually nobody should be fiddling with this data. You can also easily make sets of segments where segment(n).end != segment(n+1).start, though in this case the understanding is that you moved to the new start location. It's highly useful to just pass an array as a shape. But generally these aren't invalid (per se), they just don't actually match. These shouldn't crash, and restoring the validity would be really straightforward if it was required for something.

In theory the code should work for data arranged in this way regardless of how it's created. And for speed there's always a lot of utility in making something fairly uniform out of primitives.
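On the uniformity point: lines and quadratic beziers can be represented exactly as cubics via standard degree elevation, so a single cubic-capable row type can carry everything except arcs. A sketch (helper names are mine):

```python
def line_to_cubic(p0: complex, p1: complex):
    """Exactly represent a line as a cubic bezier (thirds give a
    uniform parameterization)."""
    return p0, p0 + (p1 - p0) / 3, p1 - (p1 - p0) / 3, p1


def quad_to_cubic(q0: complex, q1: complex, q2: complex):
    """Exactly represent a quadratic bezier as a cubic (degree
    elevation)."""
    return q0, q0 + 2 * (q1 - q0) / 3, q2 + 2 * (q1 - q2) / 3, q2
```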

Structured Arrays

I tried to use this stuff before and it didn't really turn out to be extremely helpful, and it seemed kind of slow accordingly. You lose some very useful commands. You also lose the somewhat easy ability to convert to lists of pure python primitives of the same type. And if you're at the point of using a structured array, you'd maybe be better off writing normal C++ classes and hooking them in, which is basically the core of Shapely in GEOS, or whatever.

Addendum / Shapely doesn't do curves.

Part of the hope here is to make something really easy to use that fits generally into a lot of different use-cases. I've done work in lasers, embroidery, etc., and I really like keeping good curves around as long as possible. Shapely is sometimes nice, but notably it doesn't have curves. And you'd be surprised how quickly svg paths get really, really slow when you load up 20k heavy objects. While you can almost always get by with enough lines, there are some things like gcode where you can actually draw circular arcs directly, and without any methods of dealing with them you can't get to that step. It seems like a lot of the math isn't too hard or weird, especially when you get rid of elliptical arcs and use circular arcs with very nice parameterizations. I also do a bunch of cut planning for the laser software, and it's led me to use like 3 different types of path-like classes. This was the simplest thing that would speed things up a lot, and it could also include things like a "point here to dwell the laser", with specific metadata attached to the individual points; I could even store a different dwell time for each of the multipoints.


abey79 avatar abey79 commented on June 3, 2024

It recently occurred to me that your proposed memory representation for curves could serve as the basis for an extremely performant GPU-based renderer. That data could be parsed by a tessellation shader to generate vertices, which could in turn be processed by my existing "fat line" geometry shader. A highly compact buffer for fast CPU-GPU transfer, and 100% of the work done on the GPU. This should be fast.


tatarize avatar tatarize commented on June 3, 2024

I am of the rather firm opinion that using memory-based struct objects that are not always uniform tends to suffer compatibility problems, not insurmountable but often inefficient. But if I say these 5 complexes or 10 floats are my shape, then we can do a lot of very nice and very fast operations on the structure. I think I've run into problems just streaming the memory straight to disk and reloading it into other structures, which makes me pretty confident that a set number of uniform core primitives is going to be more useful. Even numpy has some serious limitations with its own struct objects (a lot of commands don't work well with them).

And yeah, a goodly amount of that is because you can store them in memory in very clearly sized blocks. So each 80 bytes is specifically able to be loaded as these complex objects merely by reading the entire chunk of memory and assigning it as complexes, or likely doing the same and assigning them as floats. This also makes the format quite cross-platform and pretty compact to save.

If I could get you onboard with this, or something decidedly very similar to this, it might make things easier on me (and you): I could likely use your highly performant GPU rendering or my 2-opt algorithm fairly interchangeably. Since all shapes would be fairly standardized, the interchange between different formats would be really easy. Even if using a system that doesn't take complexes, these can be turned into floats.

The file format to save this would basically be: take this chunk of memory and write it directly to disk; then load the same chunk from disk and reinterpret those values as this type. At the most basic level, floats and ints tend to work like this, so if you can build a very uniform structure and just set some rules to govern how we treat it, that definition would work not just for GPU vs CPU but also in any other programming system, since we're loading not just a primitive but one that is massively well supported (assuming we can just treat these as floats when needed). You may run into issues if your memory goes <complex> <complex> <int> <int> <bool> <complex> <complex>, since even if you want to load this into, say, a Java ByteBuffer, you wouldn't be able to just tell it to magic into a bunch of floats. The same likely applies to shifting from CPU to GPU, since your structs are going to be misaligned for the expectation there.
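A minimal numpy sketch of that round trip, assuming the uniform (N, 5) complex128 layout discussed above (80 bytes per segment; the file name is hypothetical, and a real interchange format would also need to pin byte order):

```python
import numpy as np

# uniform (N, 5) complex128 segment table: 80 bytes per row
segs = np.zeros((1000, 5), dtype=np.complex128)

# saving: dump the raw buffer straight to disk
segs.tofile("paths.bin")

# loading: read the bytes back and reinterpret them as complexes
restored = np.fromfile("paths.bin", dtype=np.complex128).reshape(-1, 5)

# the same buffer can be viewed as plain floats for consumers
# (GPU buffers, a Java ByteBuffer, ...) that don't speak complex
as_floats = segs.view(np.float64)  # shape becomes (N, 10)
```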

Yes, please strongly reconsider. Despite your earlier criticism, I still think this is the way to go, or something decidedly very similar to it. Obviously there are parts of the definitions, and even shape types, that could be converted and extended, but it really does seem like something you can store in highly structured memory and move around is the way to go.


abey79 avatar abey79 commented on June 3, 2024

To be entirely transparent, I've recently embarked on an entirely different trip. For a number of unrelated reasons, I've been intensively learning Rust over the last month or so, and I'm now exploring how this could play out in the vpype landscape. This is highly speculative stuff, and it tends to massively shift paradigms left and right. My current (again, highly speculative) question is: what if vpype-core (i.e. the current vpype package, as opposed to vpype-cli which contains all the commands etc.) was to be rewritten in Rust? It's too early for me to even try to begin to answer this, but it is overwhelmingly clear that, if it happened, the data would be modelled with proper, native Rust structures (it's actually an excellent fit).

That said, my earlier point was actually tangential to this discussion. Your data format would be very good for a GPU-based renderer, even if that buffer is constructed from other data structures for display purposes only. As a matter of fact, the vpype viewer already constructs such a buffer, except it contains only vertices since only polylines are currently supported.

I should also say that, although my viewer has plenty of limitations (due to its very narrow scope) and some visual glitches, I have yet to find anything coming close in terms of performance, even in the (high-level) Rust crates I've been exploring so far. (It could of course be very efficiently re-implemented with low-level GPU crates such as wgpu.) This might currently be a bit overlooked. I should probably extract this render engine to increase its general usefulness beyond vpype, especially if it were to include curve support.

Unrelated but of potential interest to you: the usvg crate appears to be the Rust equivalent of svgelements. It aims to abstract the complexity of attribute inheritance, fully parses transforms, and reduces everything (including rect, circle, etc.) to paths with only M, L, C and Z commands. Unsurprisingly, it is fast: ~30x faster than vpype for a particularly degenerate 90MB SVG I have on hand.


tatarize avatar tatarize commented on June 3, 2024

Yeah, it might be made correctly from what I can see. Basically svgelements is a bit misdesigned: the emphasis on the render tree being the same as the parsed tree is wrong. The correct core methodology would be to parse the DOM entirely, then use the given DOM tree to render shapes and paths, which could use Path or anything else. But the core method in svgelements has it take PathNode objects, call them Path, and sort of gloss over the distinctions within the spec. Basically svgelements needs to parse to a DOM; then you can arrange and set whatever values you want in that format, set CSS and other elements, and then convert that into actual Path objects, which could keep Rect objects as their own thing, or decompose them into Paths, or even segmentize the curves into points. So you could, for example, go from SVG -> DOM -> Shapely if you wanted. But you could also round-trip things, going from SVG -> DOM -> Editing -> SVG losslessly, or even go from None -> Build -> SVG much the same way an SVG writer does.

Basically, reading to a DOM-tree state and doing the regular bootstrapping that browsers do is the correct methodology there. My code currently goes SVG -> Shapes. I did write a super basic save routine since people kept asking and it was a hundred lines of code or whatever, and I've written a few different copies here and there, but it's notably going to be lossy since I mostly already processed everything.

I did try rewriting some old code I did in Java into python, and it got notably slower, so I was likely going to rewrite the core part of the algorithm in C++, though I guess I could look at Rust for that. I still want to use python, but maybe through some bindings. Rust and C should compile to roughly the same performance; Rust just ends up being safer.

Also, the Zingl stuff (https://zingl.github.io/bresenham.html) for rendering curves, though not able to be wildly parallelized, is pretty impressive, since you can render at the 1-pixel level using pure integer math without trying to figure out the optimal subdivision/segmentization math. I've considered a few times going through a process like porting that stuff to GRBL or something else, since the carry-forward error routines used to figure out the discrete steps are wildly under-utilized currently.
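For context, the line case of that carry-forward error idea is the classic Bresenham algorithm; Zingl's work extends the same integer error accumulation to ellipses and bezier curves. A sketch of the line case:

```python
def bresenham_line(x0: int, y0: int, x1: int, y1: int):
    """Rasterize a line with pure integer math by accumulating a
    signed error term; yields the pixels along the way."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        yield x0, y0
        if x0 == x1 and y0 == y1:
            return
        e2 = 2 * err
        if e2 >= dy:  # step in x
            err += dy
            x0 += sx
        if e2 <= dx:  # step in y
            err += dx
            y0 += sy
```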


abey79 avatar abey79 commented on June 3, 2024

If you're looking for a fast SVG loader and you agree with the choices made by usvg, I would strongly suggest going the route of making a Python wrapper over that library. maturin and PyO3 are mature tools that would make this process relatively painless, including the necessary hassle of building for all platforms when uploading to PyPI. Also, Rust is massively superior to C/C++ on so many levels!

edit: I'd be more than happy to help with a pyusvg project.


abey79 avatar abey79 commented on June 3, 2024

Some quick and dirty benchmarks with what I have so far, which include SVG parsing, cropping (at the polyline level for vpype, at the bezier level for the Rust prototype), and linearising at the default 0.1mm tolerance:


vpype takes ~140ms after discounting the loading time (import vpype takes 570ms!), vs. 4ms for the Rust prototype.

The comparison is somewhat unfair, because there is some additional metadata stuff in vpype's case. The way the tolerance is handled is also different: vpype uses it as a max segment length regardless of curvature, whereas my Rust prototype uses it as the max distance between the polyline and the true curve, which is obviously much better.
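For reference, the distance-based criterion is straightforward to implement by recursive subdivision: split the curve until the control polygon stays within tolerance of the chord, which bounds the true curve-to-polyline distance by the convex hull property. A rough Python sketch (complex control points; obviously not the actual Rust code):

```python
def flatten_cubic(p0, p1, p2, p3, tol, out):
    """Append vertices approximating a cubic bezier to `out`, keeping
    the polyline within ~tol of the true curve.
    usage: out = [p0]; flatten_cubic(p0, p1, p2, p3, 0.1, out)"""
    chord = p3 - p0
    if abs(chord) < 1e-12:
        flat = max(abs(p1 - p0), abs(p2 - p0)) < tol
    else:
        n = chord / abs(chord)  # unit chord direction
        # perpendicular distance of both control points from the chord
        d1 = abs(((p1 - p0) / n).imag)
        d2 = abs(((p2 - p0) / n).imag)
        flat = max(d1, d2) < tol
    if flat:
        out.append(p3)
    else:
        # de Casteljau split at t = 0.5, recurse on both halves
        p01, p12, p23 = (p0 + p1) / 2, (p1 + p2) / 2, (p2 + p3) / 2
        p012, p123 = (p01 + p12) / 2, (p12 + p23) / 2
        pm = (p012 + p123) / 2
        flatten_cubic(p0, p01, p012, pm, tol, out)
        flatten_cubic(pm, p123, p23, p3, tol, out)
```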


Anyways, these numbers obviously need to be taken with lots of grains of salt, but they clearly highlight the potential...

I plan to polish the display a little bit (white background, stroke and stroke-width support, etc.) and push the code online.

