Coder Social home page Coder Social logo

Comments (16)

dbuenzli avatar dbuenzli commented on June 6, 2024

I'd rather not have the "Possible" solution, this has been discussed in this PR #13. I don't think it's a good idea.

I think we should have something like #11 and/or something like #1 and/or Fpath.to_segments : t -> string list

Note I won't have time to work on this anytime soon if you'd like something soon please do a PR.

from fpath.

mjambon avatar mjambon commented on June 6, 2024

Yes, I can make a PR.

I'm wondering why Fpath.t is currently a plain string rather than say a record like

type t = {
  path: string;
  volume: string; (* not sure about this *)
  segments: string list;
}

I'd probably add a syntax field too that would indicate the syntax (POSIX or Windows) of the path field. Do you have an objection against using a record like this?

from fpath.

dbuenzli avatar dbuenzli commented on June 6, 2024

Do you have an objection against using a record like this?

Yes. Please keep the representation as it is. Converting and keeping such a representation incurs significant memory and conversion costs that most of the time you don't need to pay.

(I vaguely remember a discussion this with opam's devs a long time ago where they had a complex type for paths but reverted to simple strings because it registered performance-wise).

The functions I mentioned above should be simple rendering functions.

from fpath.

mjambon avatar mjambon commented on June 6, 2024

ok, sounds good. I'm not sure how to deal with Posix/Windows syntaxes that can now coexist under the same type. Maybe use a single syntax internally? (this would incur a conversion cost when running to_string if this requires a different syntax). I need to look into this closer.

from fpath.

dbuenzli avatar dbuenzli commented on June 6, 2024

Devise the functions on windows and unix separately and choose depending on the windows variable. See e.g. of_string.

from fpath.

mjambon avatar mjambon commented on June 6, 2024

I don't think we can keep the representation as is and keep a single path type unless we're willing to store all paths in Windows format. Here are the options:

Option 1: store all paths in the Windows format

We would need to use the Windows format rather than the native format because of the possibility of having a volume that specifies the root. This would incur a conversion cost each time we call Fpath.to_string on POSIX systems.

This isn't possible because we can't represent the POSIX path a\b in Windows format.

Option 2: keep all paths in the native format

We can no longer do that since we need a place to store the volume coming from Windows paths if we parse a Windows path on a POSIX system.

Option 3: add a tag and keep paths in whichever format works or is cheaper

This is the solution I'm favoring now:

(*
   The main type, representing a path. The internal representation is
   hidden from the user.

   We store paths in string form for performance reasons
   (https://github.com/dbuenzli/fpath/issues/18#issuecomment-1434414621).
   However, some Posix/Windows tag must be available at runtime to specify
   the format of the path since:

   (1) paths in both formats can coexist;
   (2) the default to_string function is assumed to be fast and not require any
       string creation in the default case.

   The default (and legacy) functions that parse or print a path assume
   the OS-native format, which is Windows or POSIX. Extra conversion costs
   can be expected when using a foreign format at parsing or printing time.

   N.B. a path is never "" or something is wrooong.
*)
type t =
| P of string
| W of string

(* The syntax of a path specified as a string. *)
type format =
| Posix
| Windows

Option 4: offer 3 types (native, POSIX, Windows) and explicit conversion between them

Conversion in both directions can fail for different reasons:

  • from POSIX to Windows: segments containing \ can't be converted to Windows
  • from Windows to POSIX: volume names must be dropped

@dbuenzli expressed earlier he'd rather not do this.

Option 5: option 3 + option 4

We'd provide 4 types:

  • native
  • POSIX
  • Windows
  • mixed and user-visible: type t = private Posix of string | Windows of string

I think this is getting a little crazy. I'm mentioning it for completeness.

Option 6: store the path in URI format

This is a concrete syntax that can accommodate both POSIX and Windows paths.

A drawback is that to_string becomes an expensive operation in all cases. I find myself calling Fpath.to_string a lot due to various Unix.* calls that take a string. It's possible that my fears aren't justified since Fpath.to_string gets called a lot too.

I note that supporting the URI format is useful and should probably be supported by Fpath. We can add support for it later within a solution like (3).

Summary

Options 3 and 6 seem viable to me. I'd rather not implement 6 because it's a lot more work than 3 and I'm uncertain about the performance requirements, whereas 3 is more straightforward and I'm confident it won't make performance worse for existing users.

from fpath.

mjambon avatar mjambon commented on June 6, 2024

Regarding performance, I expect this to be acceptable:

type t =
| P of string
| W of string

I suspect the representations that are too expensive in some use cases are those that store the path as a list of segments:

type t = {
  segments: string list;
  ...
}

(increases memory usage and pressure on the garbage collector when a lot of paths are kept in memory)

Benchmarks on realistic code would be good. If someone reads this message and has an application that is really sensitive to how paths are represented, please chime in.

from fpath.

dbuenzli avatar dbuenzli commented on June 6, 2024

I'm afraid but the current scheme has worked well so far there's no need to change it. Could you please say exactly what the problem is ?

from fpath.

mjambon avatar mjambon commented on June 6, 2024

Sorry, I should have started with examples. Here's one:

Fpath.({|C:\a\b|} |> of_windows_string |> to_windows_string)

should return {|C:\a\b|} on all platforms.

The following, however, should be platform-dependent:

Fpath.({|C:\a\b|} |> of_windows_string |> to_string)

On Windows, this should give {|C:\a\b|} but on Linux, it should be an error because we can't represent the C: part. Or maybe we drop it silently, in which case we'd get {|/a/b|}. The result should not be {|C:\a\b|} because, on POSIX, it's a simple file name, not a folder a containing a file b.

The question is how to implement the desired behavior.

from fpath.

dbuenzli avatar dbuenzli commented on June 6, 2024

Thanks for the examples that make things clear. I don't think Fpath will ever provide what you want (more precisely it can support the second point of your first message but not the first).

What you are asking for goes against the abstraction of Fpath which represents your platform dependent file system path in a platform independent manner. It changes its behaviour depending on your platform.

In particular:

On Windows, this should give {|C:\a\b|} but on Linux, it should be an error because we can't represent the C: part.

This is undesirable behaviour, I don't want to discover my paths are wrong when I convert them to use them with a system function. Once they are injected in the type they should be, by definition, good to go.

from fpath.

dbuenzli avatar dbuenzli commented on June 6, 2024

(more precisely it can support the second point of your first message but not the first).

Also I suspect that the need for your first point is rather weak. At a certain point you need to test on the platform you are going to support. Unless you have special needs there's little you will be able to test, execept for path syntactic manipulation operations which are precisely those Fpath is abstracting for you.

OTOH there's a lot to be lost by weakening Fpath.t value invariant to have them stand for file paths of any platform on any platform. It introduces more error paths in client code.

from fpath.

mjambon avatar mjambon commented on June 6, 2024

On Windows, this should give {|C:\a\b|} but on Linux, it should be an error because we can't represent the C: part.

This is undesirable behaviour, I don't want to discover my paths are wrong when I convert them to use them with a system function. Once they are injected in the type they should be, by definition, good to go.

This would be solved by having distinct types for Windows and POSIX (option 4 or 5). There would be 3 modules, each providing its own path type:

  • Fpath.t: a valid native path (exactly the same as today)
  • Fpath_posix.t: a valid POSIX path
  • Fpath_windows.t: a valid Windows path
  1. All the to_string functions would be fail-safe.
  2. This would allow testing both syntaxes on the same platform. Right now, we need to be on Windows to test the Windows syntax. I find this quite impractical.

Alternatively, if we don't want multiple Fpath* modules with similar interfaces, we could still have different types in the same module. This would be type-safe, with no runtime cost:

(* Empty types representing the format of the path string *)
type native
type posix
type windows

(* The type of a path, parametrized by the format. This is a phantom type. *)
type 'a t

val v : string -> native t
val posix_v : string -> posix t
val windows_v : string -> windows t

(* Return the path string in the same syntax as the original. *)
val to_string : 'a t -> string

val parent : 'a t -> 'a t
...

A major issue with this approach is that it's not backward-compatible with previous versions of Fpath because all the functions that used to operate on type t now operate on type 'a t. We would have to introduce a new Fpath module for this, possibly as a submodule of the main Fpath module:

module Poly : sig
  type native
  type posix
  type windows
  type 'a t
  val v : string -> native t
  val posix_v : string -> posix t
  val windows_v : string -> windows t
  val to_string : 'a t -> string
  ...
end

type t = Poly.(native t)
val v : string -> t
val to_string : t -> string
...

In order to use the new (sub)module, a user could write:

module Fpath = Fpath.Poly

... (Fpath.posix_v "/src")...

So in the end, we end up with two modules anyway because of backward compatibility requirements.

from fpath.

dbuenzli avatar dbuenzli commented on June 6, 2024

So far the only problem you have seems to be:

  • We'd like to be able to test the handling of Windows file paths on Unix (so that we don't have too much catch-up to do when we decide our application needs to work on Windows).

But as I wrote, I doubt you can test anything beyond trivialities.

So I want to reiterate again that I'm not going to make Fpath significantly more complicated than it is now for what I think is niche usage (if needed at all) and lures users into doing the wrong things by importing a problem in their code they likely don't have.

I'm happy to provide parsing and rendering functions for best-effort and lossless translations of the other platform syntax in the current platform syntax but not more. That is something of the form:

type volume = string
val of_portable_string : volume option * Fpath.t
val to_posix_path : Fpath.t -> volume option * string
val to_windows_path : ?default_volume:volume -> Fpath.t -> string

If your app really needs cross platform syntax handling on a single platform. You are better of devising your own type possibly based on Fpath.t itself:

type portable_path = volume option * Fpath.t

and work with that.

from fpath.

mjambon avatar mjambon commented on June 6, 2024

I don't understand your proposal. The model I'm pushing for is:

  1. there are multiple popular syntaxes to express file paths
  2. a path library provides validators, parsers, and printers for each syntax
  3. the path library is independent of system calls used for file access
  4. the path library can be tested entirely from any single platform
  5. the path library offers one or more types that allow an application to pass around path values without confusing them with strings that don't represent paths
  6. the path library offers an interface for modifying or combining paths that's independent of path syntax

I understand that you're objecting to point 4 and perhaps point 2. I, on the other hand, think they're essential.

I would like to open the door to supporting other formats such as URIs that are not tied to a particular host OS. Right now, our application is dealing with git paths which are abstract except when writing tests because writing "/src/foo.ml" is quicker and more readable than create [""; "src"; "foo.ml"]. I created a custom path module for it. I'll continue experimenting.

from fpath.

dbuenzli avatar dbuenzli commented on June 6, 2024

I don't understand your proposal

Which part don't you understand ?

I understand that you're objecting to point 4

I'm not objecting to point 4. I'm saying that Fpath's behaviour is platform dependent. As such the point doesn't make sense. And except for introducing conceptual mess, I don't think there's any value to expose internals of the API to make that possible.

I would like to open the door to supporting other formats such as URIs that are not tied to a particular host OS.

There's nothing particularly spectacular about this, #1 should be sufficient.

Again, these things can and should be treated as parsing and rendering problems.

from fpath.

mjambon avatar mjambon commented on June 6, 2024

I'm closing this issue because what I want is not within the scope of Fpath.

from fpath.

Related Issues (15)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.