Add new string functions about zeek HOT 10 CLOSED

zeek avatar zeek commented on May 18, 2024
Add new string functions

from zeek.

Comments (10)

moshekaplan avatar moshekaplan commented on May 18, 2024 1

You're welcome to take it over. I've already implemented find and count, in case that helps: string_funcs.diff

spitfire55 avatar spitfire55 commented on May 18, 2024

If nobody is currently working on this, I'm going to write a bunch of string BiFs this week. Here is the list of functions I'll start tackling (inspired by Python string functions):

  • str.rfind
  • str.rstrip
  • str.lstrip
    • I see the notes above about sub being able to do lstrip and rstrip. I think dedicated functions will be clearer and more intuitive, even if they're potentially redundant.
  • str.endswith
  • str.isnum
  • str.isalpha
  • str.isalnum
  • str.ljust
  • str.rjust
  • str.swapcase
  • str.to_title
  • str.zfill

timwoj avatar timwoj commented on May 18, 2024

As noted above, the two strip methods are done. If we're going to mimic the Python methods, here's what the signatures would look like:

  • function count%(str: string, sub: string%): count: Takes a string and a substring to search for, returns the number of times that substring is seen.
  • function find%(str: string, sub: string, start: count &default=0, end: count &default=0%): count: Takes a string and a substring to search for, returns the index of the start of the substring within the string. Also can take an index to start searching from and an index to stop searching at. start should always be less than end.
  • function rfind%(str: string, sub: string, start: count &default=0, end: count &default=0%): count: The same as find but searches in reverse. Takes a string and a substring to search for, returns the index of the start of the substring within the string. Also can take an index to start searching from and an index to stop searching at. start should always be greater than end.
  • function startswith%(str: string, sub: string%): bool: Returns whether a string starts with a substring. This is easily implemented with find.
  • function endswith%(str: string, sub: string%): bool: Returns whether a string ends with a substring. This is easily implemented with rfind.
  • function isnum%(str: string%): bool: Returns whether the entire string represents a number.
  • function isalpha%(str: string%): bool: Returns whether the entire string is alphabetic characters.
  • function isalnum%(str: string%): bool: Returns whether the entire string is alphanumeric characters.
  • function ljust%(str: string, width: count, fill: string%): string: Returns a copy of a string, left-justified within a number of characters defined by width. The extra characters are filled in with fill. If the string passed for fill is more than one character in length, an error is thrown.
  • function rjust%(str: string, width: count, fill: string%): string: Returns a copy of a string, right-justified within a number of characters defined by width. The extra characters are filled in with fill. If the string passed for fill is more than one character in length, an error is thrown.
  • function swapcase%(str: string%): string: Returns a copy of the string with the case of every character swapped. For example, the string aBc would be returned as AbC.
  • function to_title%(str: string%): string: Returns a copy of the string in titlecase. This means that the first letter of each word in the string will be capitalized. For more info, see https://docs.python.org/2/library/stdtypes.html#str.title
  • function zfill%(str: string, width: count%): string: Returns a copy of the string filled on the left side with zeroes. This is effectively rjust(str, width, "0").
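
For reference, the Python behaviour these signatures are modelled on can be checked directly. The sketch below is Python, not Zeek script, and the helper names startswith_via_find and endswith_via_rfind are hypothetical. It illustrates two details that may matter for the BiFs: an rfind-based endswith needs a length guard, and Python's zfill is sign-aware, so it only matches rjust(str, width, "0") for unsigned input.

```python
def startswith_via_find(s: str, sub: str) -> bool:
    # find returns the lowest index of sub, so a prefix match is index 0.
    return s.find(sub) == 0


def endswith_via_rfind(s: str, sub: str) -> bool:
    # Guard first: without it, s="a" and sub="ab" would compare -1 == -1
    # (rfind's "not found" value against len(s) - len(sub)).
    if len(sub) > len(s):
        return False
    return s.rfind(sub) == len(s) - len(sub)


print(startswith_via_find("zeek", "ze"))    # True
print(endswith_via_rfind("zeek", "ek"))     # True
print(endswith_via_rfind("a", "ab"))        # False
print("aBc".swapcase())                     # AbC
print("42".zfill(5), "42".rjust(5, "0"))    # 00042 00042
print("-42".zfill(5), "-42".rjust(5, "0"))  # -0042 00-42
```

Whether the Zeek zfill should copy the sign handling is a design choice the signature list leaves open.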

Some questions:

  • We already have strstr, which is effectively the same thing as find.
  • I could definitely see other versions of find and rfind that take patterns, but there's a question of whether those versions should return the position or the string that matches. We already have find_last, but it takes a pattern and not a string. It returns the matched string and not a count.
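
As a point of comparison for the position-versus-matched-string question, Python's re module sidesteps it by returning a match object that carries both. This is a sketch of the Python side only; it makes no claim about how Zeek's pattern type or find_last behave beyond what is stated above.

```python
import re

text = "abc123abc456"

# Plain substring search returns positions only.
first_pos = text.find("abc")
last_pos = text.rfind("abc")

# A pattern search returns a match object with both position and text.
first = re.search(r"[0-9]+", text)
last = list(re.finditer(r"[0-9]+", text))[-1]

print(first_pos, last_pos)          # 0 6
print(first.start(), first.group()) # 3 123
print(last.start(), last.group())   # 9 456
```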

JustinAzoff avatar JustinAzoff commented on May 18, 2024

if you are copying python, also see https://www.python.org/dev/peps/pep-0616/

add two new methods, removeprefix() and removesuffix(), to the APIs of Python's various string objects. These methods would remove a prefix or suffix (respectively) from a string, if present

The reason is that the current strip/lstrip methods are often misused in the belief that they remove only the literal argument:

print rstrip("banana", "na");

outputs b not bana.
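
The same difference can be reproduced in Python. In this sketch, remove_suffix is a hypothetical helper written to mirror what PEP 616 describes, so that it also runs on Python versions older than 3.9, where str.removesuffix is not available.

```python
def remove_suffix(s: str, suffix: str) -> str:
    # Remove the literal suffix once, if present; otherwise return s unchanged.
    if suffix and s.endswith(suffix):
        return s[: -len(suffix)]
    return s


# rstrip treats its argument as a set of characters to strip repeatedly.
print("banana".rstrip("na"))          # b
# The suffix helper removes only one literal trailing "na".
print(remove_suffix("banana", "na"))  # bana
```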

vpax avatar vpax commented on May 18, 2024

Do these have some sort of module scoping to avoid name collisions? count and find seem ripe for such. (In fact, I'd think count would get confused with the type.)

timwoj avatar timwoj commented on May 18, 2024

Do these have some sort of module scoping to avoid name collisions?

The existing string functions don't at the moment, but there's nothing that would stop us from moving them and leaving deprecated versions in the global namespace.

timwoj avatar timwoj commented on May 18, 2024

Python's isnumeric method simply checks to see if every character in the string is a number. Is that sufficient or should it actually check to see if the string is a double, negative, etc? How about things like other bases? Scientific notation?
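
For concreteness, this is what Python's str.isnumeric returns for the cases raised in the question. The sample strings are chosen here for illustration only.

```python
samples = ["123", "-1", "1.5", "1e5", "0x1f", ""]

# isnumeric is a per-character check, so signs, decimal points, exponents
# and base prefixes all make it return False, as does the empty string.
results = {s: s.isnumeric() for s in samples}

for s, ok in results.items():
    print(repr(s), ok)
```

Only "123" passes; copying Python would therefore mean isnum rejects negative numbers, doubles, hexadecimal literals and scientific notation.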

moshekaplan avatar moshekaplan commented on May 18, 2024

I'd recommend making it the same as Python: less cognitive load for devs who code in both languages.

timwoj avatar timwoj commented on May 18, 2024

find and rfind technically work in Python by taking the substring between the indexes and then searching within that substring. This effectively means that if you pass a range smaller than the search string, it'll always return a failure. I currently have it implemented differently, where the start and end positions apply only to the position where the match starts. This means that if the end of the match is past the requested end position, it'll still return success. Does that sound good or should I swap it to the Python version?
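
The two behaviours can be compared side by side in Python. The function find_match_start_in_range below is a hypothetical model of the alternative described above (only the start of the match has to fall inside the range), written under the assumption that start and end are non-negative; it is not the actual BiF code.

```python
def find_match_start_in_range(s: str, sub: str, start: int, end: int) -> int:
    # Hypothetical alternative: accept a match whose first character lies
    # in [start, end), even if the match itself runs past end.
    pos = s.find(sub, start)
    if pos != -1 and pos < end:
        return pos
    return -1


text = "hello world"

# Python semantics: the search happens inside text[6:8] == "wo",
# which is too short to contain "world".
print(text.find("world", 6, 8))                          # -1
# Alternative semantics: the match starts at 6, which is inside [6, 8).
print(find_match_start_in_range(text, "world", 6, 8))    # 6
# When the range covers the whole match, both agree.
print(text.find("world", 6, 11))                         # 6
print(find_match_start_in_range(text, "world", 6, 11))   # 6
```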

moshekaplan avatar moshekaplan commented on May 18, 2024

Unless there is a significant benefit from deviating, I think consistency is best
