Cleanup about postcode HOT 8 CLOSED

jamiethompson commented on August 22, 2024
Cleanup

from postcode.

Comments (8)

BenMorel commented on August 22, 2024

Hi, I understand where you're coming from. It may make sense to just remove all non-alphanumeric chars, and I just pushed a commit (66f46b0) in this direction.

However, this opens the door to UTF-8 chars being just silently removed from the output, instead of considering the input invalid.

Example: WC2E 9RZ is a valid postcode in the UK.

What about WC2E-9RZ? This should definitely be accepted.
What about WC2E.9RZ or WC2E~9RZ? Possibly, as you suggested.

But what about WC2Eé9RZ? IMO this should probably be flagged as invalid, as nobody would ever use a UTF-8 char as a separator, and I wouldn't expect this char to be silently dropped.

Maybe we could just broaden the replacement to all non-alphanumeric printable ASCII (32-126) chars. But then, using a UTF-8 no-break space would flag the postcode as invalid. Oops.
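To make the trade-off concrete, here is a minimal Python sketch of the two stripping strategies discussed above. It is an illustration only, not the library's code; the function names are hypothetical.

```python
import re

def strip_non_alnum(s: str) -> str:
    """Hypothetical helper: drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", s)

def strip_printable_ascii_separators(s: str) -> str:
    """Hypothetical helper: drop only non-alphanumeric printable ASCII (codes 32-126)."""
    # Ranges: space to '/', ':' to '@', '[' to '`', '{' to '~'
    return re.sub(r"[ -/:-@\[-`{-~]", "", s)

def is_ascii_alnum(s: str) -> bool:
    """True if the string consists solely of ASCII letters and digits."""
    return re.fullmatch(r"[A-Za-z0-9]+", s) is not None

# Strategy 1 silently drops the accented character, so the result looks valid.
print(is_ascii_alnum(strip_non_alnum("WC2E\u00e99RZ")))

# Strategy 2 keeps it, so the input can be flagged as invalid...
print(is_ascii_alnum(strip_printable_ascii_separators("WC2E\u00e99RZ")))

# ...but a no-break space (U+00A0) is also kept, so that input is rejected too.
print(is_ascii_alnum(strip_printable_ascii_separators("WC2E\u00a09RZ")))
```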

Not sure what's the best solution here. What do you think?

jamiethompson commented on August 22, 2024

It's an interesting point you raise in relation to UTF-8. It's an entirely valid point, although I can't say I've ever run into it in real-world usage. It's worth some further thought though.

In my day job, I see a huge amount of real-world UK postcode data, and people - for some reason - regularly provide their postcodes formatted with hyphens, periods and all manner of other characters. There is another common problem of users regularly transposing digits for characters and vice versa. For example Y01 1AA instead of YO1 1AA. This has a lot to do with people in the UK commonly referring to the digit 0 as "Oh" rather than "Zero". That however is far outside the scope of what we're talking about here!

I think that perhaps it could be argued that what characters should be considered separators is use-case specific. In this sense, perhaps your more limited solution is actually a better one and an application can perform other more specific cleanup prior to formatting.

I am sorry. I realise that I have argued for a change and am now immediately questioning it!

jamiethompson commented on August 22, 2024

I've been thinking about this a lot recently. My opinion has changed. I think that you were indeed right to make the change to strip only a limited set of separators.

Any more aggressive cleansing, if required in a specific use case, can be done prior to passing the value in.

What do you think?

BenMorel commented on August 22, 2024

I quite agree with this, although I'm thinking about pushing it a bit farther: what about leaving the entire cleansing to the application?

Because removing only spaces and dashes means that "WC2E 9RZ" would pass, but the same string with a no-break space wouldn't, which might be surprising to someone relying on the default behaviour.

Leaving the entire cleansing to the application means that "WC2E 9RZ" would be considered invalid, forcing you to pass "WC2E9RZ". This might be even more confusing, I don't know.

Thoughts?

jamiethompson commented on August 22, 2024

I understand your reasoning, but at the same time I think it would be confusing for the user if correctly formatted postcodes were not accepted. This would be true not just of UK postcodes but of others too, such as Polish ones, where the format is NN-NNN.

BenMorel commented on August 22, 2024

I guess we should indeed revert to the original behaviour: only strip spaces and hyphens, as these are the only 2 separators valid across all countries. Maybe the documentation can be made clearer about this, in particular the fact that other separators, including no-break space, will be considered invalid, and that it's up to the application to cleanse other characters as needed.
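As a rough illustration of that split of responsibilities, here is a Python sketch. Both functions are hypothetical stand-ins: the first mimics the "spaces and hyphens only" default described above, and the second is one example of cleansing an application might choose to do before passing the value in.

```python
def strip_default_separators(postcode: str) -> str:
    """Stand-in for the default behaviour: remove ASCII spaces and hyphens only."""
    return postcode.replace(" ", "").replace("-", "")

def app_cleanse(raw: str) -> str:
    """Application-specific pre-cleansing (an assumption, chosen per use case):
    turn no-break spaces into regular spaces and drop periods."""
    return raw.replace("\u00a0", " ").replace(".", "")

# Correctly formatted input works out of the box.
print(strip_default_separators("WC2E 9RZ"))   # UK
print(strip_default_separators("00-950"))     # Polish NN-NNN format

# A no-break space survives the default stripping, so validation would reject it...
print(repr(strip_default_separators("WC2E\u00a09RZ")))

# ...unless the application cleanses the value first.
print(strip_default_separators(app_cleanse("WC2E\u00a09RZ")))
```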

I've been thinking about more advanced solutions, including using preg's Unicode character properties to detect and accept all kinds of UTF-8 separators / punctuation chars while leaving out alphanumeric characters, but I feel like this would be pushing it too far, and that we should delegate this kind of cleansing to the application. One reason is that we can't assume that the whole world is UTF-8.
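For reference, the Unicode-property idea could look roughly like the following Python sketch, which uses the standard `unicodedata` module to drop characters whose general category is a separator (Z*) or punctuation (P*). This is only loosely analogous to PCRE's `\p{Z}` and `\p{P}` classes; the function name is hypothetical and this is not something the library does.

```python
import unicodedata

def strip_unicode_separators(s: str) -> str:
    """Hypothetical helper: drop separator (Z*) and punctuation (P*) characters."""
    return "".join(
        ch for ch in s
        if unicodedata.category(ch)[0] not in ("Z", "P")
    )

print(strip_unicode_separators("WC2E\u00a09RZ"))        # no-break space is category Zs: removed
print(strip_unicode_separators("WC2E-9RZ"))             # hyphen is category Pd: removed
print(repr(strip_unicode_separators("WC2E\u00e99RZ")))  # accented letter is category Ll: kept
print(strip_unicode_separators("WC2E~9RZ"))             # tilde is category Sm (symbol): kept
```

Note that this operates on already-decoded text, so the caller has to know the input encoding up front, which is exactly the assumption questioned above. It also shows that the category boundaries are not always intuitive: the tilde counts as a math symbol, not punctuation.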

If you're OK with this, I'll revert my commit above and tag a release for #5 alone.

jamiethompson commented on August 22, 2024

I like that your library is simple with clear responsibilities. I should have thought longer before opening this issue. I've enjoyed the discussion though.

Looking forward to seeing #5 released. Thanks again.

BenMorel commented on August 22, 2024

Commit reverted. Thanks for the discussion!
