Comments (7)
@mwichmann Isn't the implementation behavior for None correct and the test expectation incorrect?
The docstring contains:
Flatten a sequence to a non-nested list.
Converts either a single scalar or a nested sequence to a non-nested list.
Note that :func:`flatten` considers strings
to be scalars instead of sequences like pure Python would.
"""
None
is a scalar data type and value.
from scons.
The behavior for sets is probably also correct as a set is not a sequence (i.e., sets cannot be indexed and are not ordered).
A namedtuple is a sequence and would be expanded/flattened which in most use cases is likely undesirable. A namedtuple is probably better handled as a scalar rather than a sequence. Food for thought.
Edit: namedtuple does serialize to json as a list though.
from scons.
Interesting thoughts. Will have to chew on it some more. Don't trust the docstring too much, as I've adjusted it already trying to convey what I think it does. Old manpage used to say:
Takes a sequence (that is, a Python list or tuple) that may contain nested sequences and returns a flattened list containing all of the individual elements in any sequence. This can be helpful for collecting the lists returned by calls to Builders; other Builders will automatically flatten lists specified as input, but direct Python manipulation of these lists does not.
And old docstring:
Flatten() converts either a single scalar or a nested sequence to a non-nested list. Note that flatten() considers strings to be scalars instead of sequences like Python would.
The implication of those in combination is unclear: one says "takes a sequence" and the other scalar or sequence.
from scons.
In the general case, a scalar or sequence argument is useful as the return value is always is a list which in certain circumstances could eliminate type-specific testing (i.e., sequence/string vs scalar) prior to invocation.
For example, calling flatten to get a list of system paths. A scalar string representing a path would be converted to a list within flatten and a lists of paths would be flattened as expected. Either way the result is a list without having to check if the argument passed in is a scalar/string/sequence.
Whether or not the scalar None is meaningful is context dependent. In some contexts, None might be a valid member of nested lists or as a scalar. In other cases, None might imply bad input in the code around the flatten call (which might be guarded by a test for evaluating as true: if scalar_or_sequence: flat_list = flatten(scalar_or_sequence)
).
from scons.
Not sure I agree that sets shouldn't be flattened.
There's no explicit guarantee that Flatten() retains order, so the fact that sets don't have an order or can be indexed seems not to matter.
That said, do any of our internal datastructures use or return sets currently?
Or is this only a concern if a user were to use a set instead of a list or a dictionary, or other container of items?
from scons.
Keep in mind what follows is intended to be helpful.
Not sure I agree that sets shouldn't be flattened.
Sets probably should be flattened and/or have a boolean keyword argument indicating how sets should be processed.
There's no explicit guarantee that Flatten() retains order, so the fact that sets don't have an order or can be indexed seems not to matter.
I was just pointing out that the existing code type checks are for sequences. From a type perspective, sets and frozensets are not sequences which are ordered and indexible.
That said, do any of our internal datastructures use or return sets currently?
Above my pay grade.
Or is this only a concern if a user were to use a set instead of a list or a dictionary, or other container of items?
Sets are useful proxies for lists as they guarantee uniqueness. A dictionary is not going to be flattened.
The existing do_flatten
, flatten
, and flatten_sequence
implementations have bug(s) lurking but they are not exposed due to their current invocation usage in SCons.
For example (annotated):
def do_flatten( # pylint: disable=redefined-outer-name,redefined-builtin
sequence,
result,
isinstance=isinstance, # <-- possibly user-defined kwarg
StringTypes=StringTypes, # <-- possibly user-defined kwarg
SequenceTypes=SequenceTypes, # <-- possibly user-defined kwarg
) -> None:
for item in sequence:
if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
result.append(item)
else:
do_flatten(item, result) # <-- any user-defined kwargs are discarded
The same problem appears in flatten
and flatten_sequence
.
At a minimum, the flatten
and flatten_sequence
methods should probably accept None
as the default keywords in some cases and explicitly set the desired type tuples as necessary. If this is done, then it is trivial to extend including sets/frozen sets as "sequences" for purposes of flattening.
The script below is also attached in a zip file. The current flatten routines were renamed with names ending in _original
. The revised implementations make use of the existing keyword arguments with the bugs fixed.
Notes:
- This was done quickly so there may be code issues.
- namedtuple was not addressed.
flatten.py
source file in attached flatten.zip
:
from collections import UserString, UserList, deque
from collections.abc import MappingView
# proxy for import
StringTypes = (str, UserString)
SequenceTypes = (list, tuple, deque, UserList, MappingView)
SetTypes = (set, frozenset)
# shadowed in functions: save using internal names
_StringTypes = StringTypes
_SequenceTypes = SequenceTypes
_SetTypes = (set, frozenset)
# synthetic types
_FlattenTypes = _SequenceTypes + _SetTypes
##### ORIGINAL CODE
def do_flatten_original( # pylint: disable=redefined-outer-name,redefined-builtin
sequence,
result,
isinstance=isinstance, # <-- possibly user-defined kwarg
StringTypes=StringTypes, # <-- possibly user-defined kwarg
SequenceTypes=SequenceTypes, # <-- possibly user-defined kwarg
) -> None:
for item in sequence:
if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
result.append(item)
else:
do_flatten_original(item, result) # <- any user-defined kwargs are discarded
def flatten_original( # pylint: disable=redefined-outer-name,redefined-builtin
obj,
isinstance=isinstance, # <-- possibly user-defined kwarg
StringTypes=StringTypes, # <-- possibly user-defined kwarg
SequenceTypes=SequenceTypes, # <-- possibly user-defined kwarg
do_flatten=do_flatten_original,
) -> list:
"""Flatten a sequence to a non-nested list.
Converts either a single scalar or a nested sequence to a non-nested list.
Note that :func:`flatten` considers strings
to be scalars instead of sequences like pure Python would.
"""
if isinstance(obj, StringTypes) or not isinstance(obj, SequenceTypes):
return [obj]
result = []
# effectively do_flatten code: replace with do_flatten call
for item in obj:
if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
result.append(item)
else:
do_flatten(item, result) # <- any user-defined kwargs are discarded
return result
def flatten_sequence_original( # pylint: disable=redefined-outer-name,redefined-builtin
sequence,
isinstance=isinstance, # <-- possibly user-defined kwarg
StringTypes=StringTypes, # <-- possibly user-defined kwarg
SequenceTypes=SequenceTypes, # <-- possibly user-defined kwarg
do_flatten=do_flatten_original,
) -> list:
"""Flatten a sequence to a non-nested list.
Same as :func:`flatten`, but it does not handle the single scalar case.
This is slightly more efficient when one knows that the sequence
to flatten can not be a scalar.
"""
result = []
# effectively do_flatten code: replace with do_flatten call
for item in sequence:
if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
result.append(item)
else:
do_flatten(item, result) # <- any user-defined kwargs are discarded
return result
##### MODIFIED CODE
def do_flatten( # pylint: disable=redefined-outer-name,redefined-builtin
sequence,
result,
isinstance=isinstance,
StringTypes=StringTypes,
SequenceTypes=SequenceTypes,
) -> None:
for item in sequence:
if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
result.append(item)
else:
do_flatten(
item,
result,
isinstance=isinstance,
StringTypes=StringTypes,
SequenceTypes=SequenceTypes,
)
def flatten( # pylint: disable=redefined-outer-name,redefined-builtin
obj,
isinstance=isinstance,
StringTypes=None,
SequenceTypes=None,
do_flatten=do_flatten,
) -> list:
"""Flatten a sequence to a non-nested list.
Converts either a single scalar or a nested sequence to a non-nested list.
Note that :func:`flatten` considers strings
to be scalars instead of sequences like pure Python would.
"""
if StringTypes is None:
StringTypes = _StringTypes
if SequenceTypes is None:
SequenceTypes = _FlattenTypes
result = []
if isinstance(obj, StringTypes) or not isinstance(obj, SequenceTypes):
result.append(obj)
else:
do_flatten(
obj,
result,
isinstance=isinstance,
StringTypes=StringTypes,
SequenceTypes=SequenceTypes,
)
return result
def flatten_sequence( # pylint: disable=redefined-outer-name,redefined-builtin
sequence,
isinstance=isinstance,
StringTypes=None,
SequenceTypes=None,
do_flatten=do_flatten,
) -> list:
"""Flatten a sequence to a non-nested list.
Same as :func:`flatten`, but it does not handle the single scalar case.
This is slightly more efficient when one knows that the sequence
to flatten can not be a scalar.
"""
if StringTypes is None:
StringTypes = _StringTypes
if SequenceTypes is None:
SequenceTypes = _FlattenTypes
result = []
do_flatten(
sequence,
result,
isinstance=isinstance,
StringTypes=StringTypes,
SequenceTypes=SequenceTypes,
)
return result
if __name__ == '__main__':
# just lists, just one level
result = list(flatten_original([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result
result = list(flatten_sequence_original([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result
result = list(flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result
result = list(flatten_sequence([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result
# mix list, tuple, set, str and multi-level
# illustrate bug in original code: kwarg type discarded during recursion
try:
result = list(flatten_original([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
except AssertionError:
pass
else:
raise RuntimeError('expected AssertionError')
try:
result = list(flatten_sequence_original([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
except AssertionError:
pass
else:
raise RuntimeError('expected AssertionError')
# illustrate modified code preserves kwarg which treats sets as scalars
try:
result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_SequenceTypes))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
except AssertionError:
pass
else:
raise RuntimeError('expected AssertionError')
try:
result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_SequenceTypes))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
except AssertionError:
pass
else:
raise RuntimeError('expected AssertionError')
# illustrate modified code handles sets as expected
result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
# mix list, tuple, set containing tuple of frozen sets, str and multi-level
result = list(flatten([[1, [2]], (3, 4, {(frozenset([5]), frozenset([6]))}, 7), 8, "99"]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
result = list(flatten_sequence([[1, [2]], (3, 4, {(frozenset([5]), frozenset([6]))}, 7), 8, "99"]))
assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
from scons.
I have long had a rewritten set of flatten util routines (although the name may say Flatten, that's really just a wrapper to expose it in the Environment), so I was aware of the current buggy nature. They've just never risen to the point where it seemed right to push the PR....
The way the usage is pitched this needn't be terribly comprehensive - there are places where you may call things acting on nodes, which produce a list of answers, and do so in a loop, so you get a list-of-lists. My impression is that's what Flatten was written for and everything else is just stretching it, for completeness, etc.
from scons.
Related Issues (20)
- run document test failed HOT 3
- Construction of DefaultEnvironment can be racy HOT 4
- Build as administrator is required on windows (sometimes?) HOT 18
- Make the new parallel scheduler the default
- Fortran scanner misses comments in some cases HOT 1
- 4.6.0 distfile: contains pycache files for Python 3.11 HOT 7
- Typo in user documentation HOT 10
- DependencyWarning is somewhat misused HOT 1
- Scanners need to be smarter about text they're scanning
- User Guide: scanner examples could be improved HOT 4
- Pseudo() global function missing
- lxml 5.x breaks doc validation
- Possible race in CacheDir config HOT 5
- Improve json output of Dump HOT 1
- Second Value node example in manpage doesn't work
- testing: implicit cache issue as of Python 3.13a5 HOT 2
- Python 3.13 alpha breaks explain() HOT 3
- TeX scanner does not recognize glossary entries file
- Windows: test directory tempfile scheme makes it hard to avoid virus scanner issues HOT 1
- Scanners getting added even when tool is not present, LaTeXScanner error when `AllowSubstExceptions()` defined
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from scons.