Coder Social home page Coder Social logo

Comments (7)

jcbrill avatar jcbrill commented on May 27, 2024

@mwichmann Isn't the implementation behavior for None correct and the test expectation incorrect?

The docstring contains:

    Flatten a sequence to a non-nested list.

    Converts either a single scalar or a nested sequence to a non-nested list.
    Note that :func:`flatten` considers strings
    to be scalars instead of sequences like pure Python would.
    """

None is a scalar data type and value.

from scons.

jcbrill avatar jcbrill commented on May 27, 2024

The behavior for sets is probably also correct as a set is not a sequence (i.e., sets cannot be indexed and are not ordered).

A namedtuple is a sequence and would be expanded/flattened which in most use cases is likely undesirable. A namedtuple is probably better handled as a scalar rather than a sequence. Food for thought.

Edit: namedtuple does serialize to json as a list though.

from scons.

mwichmann avatar mwichmann commented on May 27, 2024

Interesting thoughts. Will have to chew on it some more. Don't trust the docstring too much, as I've adjusted it already trying to convey what I think it does. Old manpage used to say:

Takes a sequence (that is, a Python list or tuple) that may contain nested sequences and returns a flattened list containing all of the individual elements in any sequence. This can be helpful for collecting the lists returned by calls to Builders; other Builders will automatically flatten lists specified as input, but direct Python manipulation of these lists does not.

And old docstring:

Flatten() converts either a single scalar or a nested sequence to a non-nested list. Note that flatten() considers strings to be scalars instead of sequences like Python would.

The implication of those in combination is unclear: one says "takes a sequence" and the other scalar or sequence.

from scons.

jcbrill avatar jcbrill commented on May 27, 2024

In the general case, a scalar or sequence argument is useful as the return value is always is a list which in certain circumstances could eliminate type-specific testing (i.e., sequence/string vs scalar) prior to invocation.

For example, calling flatten to get a list of system paths. A scalar string representing a path would be converted to a list within flatten and a lists of paths would be flattened as expected. Either way the result is a list without having to check if the argument passed in is a scalar/string/sequence.

Whether or not the scalar None is meaningful is context dependent. In some contexts, None might be a valid member of nested lists or as a scalar. In other cases, None might imply bad input in the code around the flatten call (which might be guarded by a test for evaluating as true: if scalar_or_sequence: flat_list = flatten(scalar_or_sequence)).

from scons.

bdbaddog avatar bdbaddog commented on May 27, 2024

Not sure I agree that sets shouldn't be flattened.
There's no explicit guarantee that Flatten() retains order, so the fact that sets don't have an order or can be indexed seems not to matter.

That said, do any of our internal datastructures use or return sets currently?
Or is this only a concern if a user were to use a set instead of a list or a dictionary, or other container of items?

from scons.

jcbrill avatar jcbrill commented on May 27, 2024

Keep in mind what follows is intended to be helpful.

Not sure I agree that sets shouldn't be flattened.

Sets probably should be flattened and/or have a boolean keyword argument indicating how sets should be processed.

There's no explicit guarantee that Flatten() retains order, so the fact that sets don't have an order or can be indexed seems not to matter.

I was just pointing out that the existing code type checks are for sequences. From a type perspective, sets and frozensets are not sequences which are ordered and indexible.

That said, do any of our internal datastructures use or return sets currently?

Above my pay grade.

Or is this only a concern if a user were to use a set instead of a list or a dictionary, or other container of items?

Sets are useful proxies for lists as they guarantee uniqueness. A dictionary is not going to be flattened.

The existing do_flatten, flatten, and flatten_sequence implementations have bug(s) lurking but they are not exposed due to their current invocation usage in SCons.

For example (annotated):

def do_flatten(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    result,
    isinstance=isinstance,  # <-- possibly user-defined kwarg
    StringTypes=StringTypes,  # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
) -> None:
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(item, result)  # <-- any user-defined kwargs are discarded

The same problem appears in flatten and flatten_sequence.

At a minimum, the flatten and flatten_sequence methods should probably accept None as the default keywords in some cases and explicitly set the desired type tuples as necessary. If this is done, then it is trivial to extend including sets/frozen sets as "sequences" for purposes of flattening.

The script below is also attached in a zip file. The current flatten routines were renamed with names ending in _original. The revised implementations make use of the existing keyword arguments with the bugs fixed.

Notes:

  • This was done quickly so there may be code issues.
  • namedtuple was not addressed.

flatten.py source file in attached flatten.zip:

from collections import UserString, UserList, deque
from collections.abc import MappingView

# proxy for import
StringTypes = (str, UserString)
SequenceTypes = (list, tuple, deque, UserList, MappingView)
SetTypes = (set, frozenset)

# shadowed in functions: save using internal names
_StringTypes = StringTypes
_SequenceTypes = SequenceTypes
_SetTypes = (set, frozenset)

# synthetic types
_FlattenTypes = _SequenceTypes + _SetTypes

##### ORIGINAL CODE

def do_flatten_original(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    result,
    isinstance=isinstance,        # <-- possibly user-defined kwarg
    StringTypes=StringTypes,      # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
) -> None:
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten_original(item, result) # <- any user-defined kwargs are discarded

def flatten_original(  # pylint: disable=redefined-outer-name,redefined-builtin
    obj,
    isinstance=isinstance,        # <-- possibly user-defined kwarg
    StringTypes=StringTypes,      # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
    do_flatten=do_flatten_original,
) -> list:
    """Flatten a sequence to a non-nested list.

    Converts either a single scalar or a nested sequence to a non-nested list.
    Note that :func:`flatten` considers strings
    to be scalars instead of sequences like pure Python would.
    """
    if isinstance(obj, StringTypes) or not isinstance(obj, SequenceTypes):
        return [obj]
    result = []
    # effectively do_flatten code: replace with do_flatten call
    for item in obj:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(item, result)  # <- any user-defined kwargs are discarded
    return result

def flatten_sequence_original(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    isinstance=isinstance,        # <-- possibly user-defined kwarg
    StringTypes=StringTypes,      # <-- possibly user-defined kwarg
    SequenceTypes=SequenceTypes,  # <-- possibly user-defined kwarg
    do_flatten=do_flatten_original,
) -> list:
    """Flatten a sequence to a non-nested list.

    Same as :func:`flatten`, but it does not handle the single scalar case.
    This is slightly more efficient when one knows that the sequence
    to flatten can not be a scalar.
    """
    result = []
    # effectively do_flatten code: replace with do_flatten call
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(item, result)  # <- any user-defined kwargs are discarded
    return result

##### MODIFIED CODE

def do_flatten(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    result,
    isinstance=isinstance,
    StringTypes=StringTypes,
    SequenceTypes=SequenceTypes,
) -> None:
    for item in sequence:
        if isinstance(item, StringTypes) or not isinstance(item, SequenceTypes):
            result.append(item)
        else:
            do_flatten(
                item,
                result,
                isinstance=isinstance,
                StringTypes=StringTypes,
                SequenceTypes=SequenceTypes,
            )

def flatten(  # pylint: disable=redefined-outer-name,redefined-builtin
    obj,
    isinstance=isinstance,
    StringTypes=None,
    SequenceTypes=None,
    do_flatten=do_flatten,
) -> list:
    """Flatten a sequence to a non-nested list.

    Converts either a single scalar or a nested sequence to a non-nested list.
    Note that :func:`flatten` considers strings
    to be scalars instead of sequences like pure Python would.
    """
    if StringTypes is None:
        StringTypes = _StringTypes

    if SequenceTypes is None:
        SequenceTypes = _FlattenTypes

    result = []
    if isinstance(obj, StringTypes) or not isinstance(obj, SequenceTypes):
        result.append(obj)
    else:
        do_flatten(
            obj,
            result,
            isinstance=isinstance,
            StringTypes=StringTypes,
            SequenceTypes=SequenceTypes,
        )
    return result


def flatten_sequence(  # pylint: disable=redefined-outer-name,redefined-builtin
    sequence,
    isinstance=isinstance,
    StringTypes=None,
    SequenceTypes=None,
    do_flatten=do_flatten,
) -> list:
    """Flatten a sequence to a non-nested list.

    Same as :func:`flatten`, but it does not handle the single scalar case.
    This is slightly more efficient when one knows that the sequence
    to flatten can not be a scalar.
    """
    if StringTypes is None:
        StringTypes = _StringTypes

    if SequenceTypes is None:
        SequenceTypes = _FlattenTypes

    result = []
    do_flatten(
        sequence,
        result,
        isinstance=isinstance,
        StringTypes=StringTypes,
        SequenceTypes=SequenceTypes,
    )
    return result

if __name__ == '__main__':

# just lists, just one level

    result = list(flatten_original([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

    result = list(flatten_sequence_original([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

    result = list(flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

    result = list(flatten_sequence([[1, 2, 3], [4, 5, 6], [7], [8, 9]]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, 9], result

# mix list, tuple, set, str and multi-level

    # illustrate bug in original code: kwarg type discarded during recursion

    try:
        result = list(flatten_original([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    try:
        result = list(flatten_sequence_original([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    # illustrate modified code preserves kwarg which treats sets as scalars

    try:
        result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_SequenceTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    try:
        result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_SequenceTypes))
        assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result
    except AssertionError:
        pass
    else:
        raise RuntimeError('expected AssertionError')

    # illustrate modified code handles sets as expected

    result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten_sequence([[1, [2]], (3, 4, {5, 6}, 7), 8, "99"], SequenceTypes=_FlattenTypes))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

# mix list, tuple, set containing tuple of frozen sets, str and multi-level

    result = list(flatten([[1, [2]], (3, 4, {(frozenset([5]), frozenset([6]))}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

    result = list(flatten_sequence([[1, [2]], (3, 4, {(frozenset([5]), frozenset([6]))}, 7), 8, "99"]))
    assert result == [1, 2, 3, 4, 5, 6, 7, 8, "99"], result

flatten.zip

from scons.

mwichmann avatar mwichmann commented on May 27, 2024

I have long had a rewritten set of flatten util routines (although the name may say Flatten, that's really just a wrapper to expose it in the Environment), so I was aware of the current buggy nature. They've just never risen to the point where it seemed right to push the PR....

The way the usage is pitched this needn't be terribly comprehensive - there are places where you may call things acting on nodes, which produce a list of answers, and do so in a loop, so you get a list-of-lists. My impression is that's what Flatten was written for and everything else is just stretching it, for completeness, etc.

from scons.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.