Coder Social home page Coder Social logo

Comments (13)

StephanTLavavej avatar StephanTLavavej commented on August 16, 2024 2

@frederick-vs-ja I think you missed this paragraph, WG14-N3220 7.29.3.5/6:

If a conversion specifier is not one of the ones previously specified, the behavior is undefined.

I think that the second part of your analysis is correct - while C says that unrecognized conversion specifiers are UB, C++ says that unrecognized conversion specifiers are copied unchanged ([locale.time.put.members]/1 "Each character that is not part of a format sequence is written to s immediately").

It seems that no UCRT changes are necessary here - instead we should alter the STL's wrapper layer to recognize unknown conversion specifiers and ensure that they're copied unchanged (after 5 seconds of thought, escaping them with %% would be minimally intrusive but I don't know how easy/viable that is).

from stl.

StephanTLavavej avatar StephanTLavavej commented on August 16, 2024 2

Improved performance and resistance to UCRT/STL specification divergence would be worthwhile, but that has to be balanced against the complexity and maintenance burden of such a change. Our immediate reactions were "rarrrrrgh!" (@barcharcraz) and "uhhhhhhhh?" (@StephanTLavavej) 😹.

We wouldn't want to encourage anyone to work on that, but we won't reject the idea out of hand.

from stl.

TheStormN avatar TheStormN commented on August 16, 2024 1

@StephanTLavavej I've submitted a PR with simple fix: #4840

However, by looking at the code there I see significant effort to avoid _Strftime crashing. I still do think that if UCRT can't be changed, an new STL version of that function should be created which will not even require hacks to expand the string if more space is needed. Also strftime on Windows is really slow as it is doing char->wchar->char conversion and because of the current way std::put_time is using it by literally invoking it for each format specifier.

So is there any interest in doing STL native version of time_put to avoid calling _Strftime is general? This will also greatly benefit chrono related formatting.

from stl.

frederick-vs-ja avatar frederick-vs-ja commented on August 16, 2024

While undefined behavior is allowed for strftime on invalid input

See WG14-N3220 7.29.3.5/2:

The strftime function places characters into the array pointed to by s as controlled by the string pointed to by format. The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format string consists of zero or more conversion specifiers and ordinary multibyte characters. A conversion specifier consists of a % character, possibly followed by an E or O modifier character (described later), followed by a character that determines the behavior of the conversion specifier. All ordinary multibyte characters (including the terminating null character) are copied unchanged into the array. If copying takes place between objects that overlap, the behavior is undefined. No more than maxsize characters are placed into the array.

IIUC "%E%J%P" is just a valid input for strftime. The %J part is just not a conversion specifier, and thus should be copied to the output as-is.

but the cplusplus.com docs on the function says if anything wrong happens, it should set the failbit on the stream and not crash the program

The standard wording for put_time is specified in [ext.manip]/10 and [locale.time.put.members]/1. Quoting the latter:

Effects: The first form steps through the sequence from pattern to pat_end, identifying characters that are part of a format sequence. Each character that is not part of a format sequence is written to s immediately, and each format sequence, as it is identified, results in a call to do_put; thus, format elements and other characters are interleaved in the output in the order in which they appear in the pattern. Format sequences are identified by converting each character c to a char value as if by ct.narrow(c, 0), where ct is a reference to ctype<charT> obtained from str.getloc(). The first character of each sequence is equal to '%', followed by an optional modifier character mod and a format specifier character spec as defined for the function strftime. If no modifier character is present, mod is zero. For each valid format sequence identified, calls do_put(s, str, fill, t, spec, mod).

Ditto, there's nothing wrong with put_time. As a result, the program should neither crash nor set failbit. (But you posted the program with a wrong parameter order - the output operand should be std::put_time(now, "%E%J%P").)

I believe it's UCRT's strftime that is buggy, and we can still rely on it if it gets fixed. However, a workaround would be needed if we want put_time to behave correctly as soon as possible.


Personal thought on workaround:

Don't call strftime/_Strftime in UCRT at this moment. Write a "corrected strftime" instead.

from stl.

TheStormN avatar TheStormN commented on August 16, 2024

@frederick-vs-ja Thanks a lot for the input. Yes my example got a bit wrong while I was trying to do a minimal version of it. I've edited my post to correct it.

Personal thought on workaround:

Don't call strftime/_Strftime in UCRT at this moment. Write a "corrected strftime" instead.

Unfortunately this would not solve all the cases. For example std::format does also call std::put_time which calls strftime. I guess there might be more similar cases. Going to chase a huge codebase for all variants which lead to strftime is a bit an overkill.

From past experiences I think requesting any kind of behavioral changes to the UCRT usually gets ignored. Perhaps the best approach here would be for the STL team to create their own version and not use the UCRT provided one. This way not only they will avoid crashes but will also better conform to the C++ standard.

from stl.

frederick-vs-ja avatar frederick-vs-ja commented on August 16, 2024

Going to chase a huge codebase for all variants which lead to strftime is a bit an overkill.

Fortunately, MSVC STL is not so that huge. I think this is the only concentrated relavent entry to UCRT. std::format eventually enters here when formatting chrono types.

_Count = _Strftime(&_Str[0], _Str.size(), _Fmt, _Pt, _Tnames._Getptr());

AFAIK _Strftime doesn't call strftime, but both of them call _Wcsftime_l which raises non-conforming errors.

Perhaps it's still doable to add a _Strftime_v2 function which behaves as corrected _Strftime/strftime.

from stl.

TheStormN avatar TheStormN commented on August 16, 2024

Ah, as a workaround you meant the STL codebase. I thought you implied the application code on which I'm working on.

Yep, I think this is doable and not too complicated. Also it should not affect binary compatibility.

@StephanTLavavej Any thoughts on this one?

from stl.

TheStormN avatar TheStormN commented on August 16, 2024

@StephanTLavavej Will that also cover the cases where some more broken string is passed? For example, lets say I just forward user input to the std::put_time and in all cases I would not expect a crash(which strftime() will do) but some error which can be handled. In general I think in C++ no matter what is the format string, the app should not crash.

P.S. I think a much better solution would be for the UCRT team to alter the UB of strftime() to be more C++ friendly. :)

from stl.

StephanTLavavej avatar StephanTLavavej commented on August 16, 2024

Can you provide a specific example of a "broken string" that does not involve an unknown conversion specifier?

from stl.

TheStormN avatar TheStormN commented on August 16, 2024

Hey, for example "%Y-%-m-%-d" but I guess that will also be taken care of by the escaping? Anyway, whichever workaround you choose, still the checks which strftime is already doing will have to be replicated. I think if the UCRT team alters the UB to conform to C++ we will have a big win here.

from stl.

StephanTLavavej avatar StephanTLavavej commented on August 16, 2024

Yeah, %- is an unknown conversion specifier, so I think that's the same issue.

Getting UCRT changes is extremely difficult - I think we should pursue STL changes here.

from stl.

TheStormN avatar TheStormN commented on August 16, 2024

In that case I would recommend to not use strftime at all and create a complete STL rewrite.

I guess you should have access to UCRT source code, so coping most stuff from there would make this task much easier.

P.S. This will also solve the issues with std::format crashes when used with Chrono types.

from stl.

TheStormN avatar TheStormN commented on August 16, 2024

Well, the fmt library on which the std::format implementation is based already did it and as you already have an agreement with the author, we can take some code from there. :))

from stl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.