Comments (13)
@frederick-vs-ja I think you missed this paragraph, WG14-N3220 7.29.3.5/6:
If a conversion specifier is not one of the ones previously specified, the behavior is undefined.
I think that the second part of your analysis is correct - while C says that unrecognized conversion specifiers are UB, C++ says that unrecognized conversion specifiers are copied unchanged ([locale.time.put.members]/1 "Each character that is not part of a format sequence is written to s
immediately").
It seems that no UCRT changes are necessary here - instead we should alter the STL's wrapper layer to recognize unknown conversion specifiers and ensure that they're copied unchanged (after 5 seconds of thought, escaping them with %%
would be minimally intrusive but I don't know how easy/viable that is).
from stl.
Improved performance and resistance to UCRT/STL specification divergence would be worthwhile, but that has to be balanced against the complexity and maintenance burden of such a change. Our immediate reactions were "rarrrrrgh!" (@barcharcraz) and "uhhhhhhhh?" (@StephanTLavavej) 😹.
We wouldn't want to encourage anyone to work on that, but we won't reject the idea out of hand.
from stl.
@StephanTLavavej I've submitted a PR with simple fix: #4840
However, by looking at the code there I see significant effort to avoid _Strftime
crashing. I still do think that if UCRT can't be changed, an new STL version of that function should be created which will not even require hacks to expand the string if more space is needed. Also strftime
on Windows is really slow as it is doing char->wchar->char conversion and because of the current way std::put_time
is using it by literally invoking it for each format specifier.
So is there any interest in doing STL native version of time_put
to avoid calling _Strftime
is general? This will also greatly benefit chrono
related formatting.
from stl.
While undefined behavior is allowed for
strftime
on invalid input
See WG14-N3220 7.29.3.5/2:
The
strftime
function places characters into the array pointed to by s as controlled by the string pointed to byformat
. Theformat
shall be a multibyte character sequence, beginning and ending in its initial shift state. Theformat
string consists of zero or more conversion specifiers and ordinary multibyte characters. A conversion specifier consists of a%
character, possibly followed by anE
orO
modifier character (described later), followed by a character that determines the behavior of the conversion specifier. All ordinary multibyte characters (including the terminating null character) are copied unchanged into the array. If copying takes place between objects that overlap, the behavior is undefined. No more thanmaxsize
characters are placed into the array.
IIUC "%E%J%P"
is just a valid input for strftime
. The %J
part is just not a conversion specifier, and thus should be copied to the output as-is.
but the cplusplus.com docs on the function says if anything wrong happens, it should set the
failbit
on the stream and not crash the program
The standard wording for put_time
is specified in [ext.manip]/10 and [locale.time.put.members]/1. Quoting the latter:
Effects: The first form steps through the sequence from pattern to
pat_end
, identifying characters that are part of a format sequence. Each character that is not part of a format sequence is written to s immediately, and each format sequence, as it is identified, results in a call todo_put
; thus, format elements and other characters are interleaved in the output in the order in which they appear in the pattern. Format sequences are identified by converting each characterc
to achar
value as if byct.narrow(c, 0)
, wherect
is a reference toctype<charT>
obtained fromstr.getloc()
. The first character of each sequence is equal to'%'
, followed by an optional modifier charactermod
and a format specifier characterspec
as defined for the functionstrftime
. If no modifier character is present,mod
is zero. For each valid format sequence identified, callsdo_put(s, str, fill, t, spec, mod)
.
Ditto, there's nothing wrong with put_time
. As a result, the program should neither crash nor set failbit. (But you posted the program with a wrong parameter order - the output operand should be std::put_time(now, "%E%J%P")
.)
I believe it's UCRT's strftime
that is buggy, and we can still rely on it if it gets fixed. However, a workaround would be needed if we want put_time
to behave correctly as soon as possible.
Personal thought on workaround:
Don't call strftime
/_Strftime
in UCRT at this moment. Write a "corrected strftime
" instead.
from stl.
@frederick-vs-ja Thanks a lot for the input. Yes my example got a bit wrong while I was trying to do a minimal version of it. I've edited my post to correct it.
Personal thought on workaround:
Don't call strftime/_Strftime in UCRT at this moment. Write a "corrected strftime" instead.
Unfortunately this would not solve all the cases. For example std::format
does also call std::put_time
which calls strftime
. I guess there might be more similar cases. Going to chase a huge codebase for all variants which lead to strftime
is a bit an overkill.
From past experiences I think requesting any kind of behavioral changes to the UCRT usually gets ignored. Perhaps the best approach here would be for the STL
team to create their own version and not use the UCRT provided one. This way not only they will avoid crashes but will also better conform to the C++ standard.
from stl.
Going to chase a huge codebase for all variants which lead to
strftime
is a bit an overkill.
Fortunately, MSVC STL is not so that huge. I think this is the only concentrated relavent entry to UCRT. std::format
eventually enters here when formatting chrono
types.
Line 749 in ef1d621
AFAIK _Strftime
doesn't call strftime
, but both of them call _Wcsftime_l
which raises non-conforming errors.
Perhaps it's still doable to add a _Strftime_v2
function which behaves as corrected _Strftime
/strftime
.
from stl.
Ah, as a workaround you meant the STL codebase. I thought you implied the application code on which I'm working on.
Yep, I think this is doable and not too complicated. Also it should not affect binary compatibility.
@StephanTLavavej Any thoughts on this one?
from stl.
@StephanTLavavej Will that also cover the cases where some more broken string is passed? For example, lets say I just forward user input to the std::put_time
and in all cases I would not expect a crash(which strftime()
will do) but some error which can be handled. In general I think in C++ no matter what is the format string, the app should not crash.
P.S. I think a much better solution would be for the UCRT team to alter the UB of strftime()
to be more C++ friendly. :)
from stl.
Can you provide a specific example of a "broken string" that does not involve an unknown conversion specifier?
from stl.
Hey, for example "%Y-%-m-%-d" but I guess that will also be taken care of by the escaping? Anyway, whichever workaround you choose, still the checks which strftime
is already doing will have to be replicated. I think if the UCRT team alters the UB to conform to C++ we will have a big win here.
from stl.
Yeah, %-
is an unknown conversion specifier, so I think that's the same issue.
Getting UCRT changes is extremely difficult - I think we should pursue STL changes here.
from stl.
In that case I would recommend to not use strftime at all and create a complete STL rewrite.
I guess you should have access to UCRT source code, so coping most stuff from there would make this task much easier.
P.S. This will also solve the issues with std::format crashes when used with Chrono types.
from stl.
Well, the fmt
library on which the std::format
implementation is based already did it and as you already have an agreement with the author, we can take some code from there. :))
from stl.
Related Issues (20)
- `STL.natvis`: `ranges::view_interface` doesn't always have `size()`
- `STL.natvis`: The VS copy is outdated
- `README.md`: Update working draft revision to N4986
- xlocnum error C2065: 'PTRDIFF_MAX': undeclared identifier HOT 10
- `<type_traits>`: Logical operator traits with non-`bool_constant` arguments emit truncation warnings
- STL: `expected<any, T>` and its friends can break container iterator comparison HOT 2
- Shouldn't `std::exception::what()` be declared as `noexcept`? HOT 2
- <iostream>: std::cout with very low speed compared with printf HOT 3
- <stacktrace>: unloading fails of a dll that first obtained stacktrace
- Support for Win7 / Server 2008 R2 is DOOMED HOT 8
- <format>:How do I specialize the std::formatter with module HOT 1
- `<algorithm>`: `ranges::inplace_merge` doesn't seem to be able to utilize `ranges::upper_bound` HOT 8
- <execution>: Silent data loss in nested parallel file operations HOT 10
- `<yvals_core.h>`: Update `_MSVC_STL_UPDATE` to August 2024
- <mutex>: Problem using mutex with pybind. HOT 3
- C++23 official flag(not /std:c++latest) HOT 16
- `<algorithm>`: `lexicographical_compare_three_way` missing _Mandates_ check
- `<spanstream>`: The `span` constructed by `basic_ispanstream`'s range constructor may be ill-formed HOT 3
- `README.md`: Update working draft revision to N4988
- <iomanip>: std::put_time should not crash on invalid/out-of-range tm struct values HOT 6
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from stl.