
GB Formatter (brick/postcode issue, closed)

brick commented on August 22, 2024
GB Formatter

from postcode.

Comments (17)

BenMorel commented on August 22, 2024

I pushed a few commits to check the letters in each position (just like your code did) and to check the area code against the Wikipedia list:

https://github.com/brick/postcode/blob/master/src/Formatter/GBFormatter.php

Could you please test it and report any issue?

BenMorel commented on August 22, 2024

OK, let's add XX then. Done in f5ba3a5. Are we good here?

BenMorel commented on August 22, 2024

Glad you like it! Closing this one. I'll tag a release as soon as we resolve #4.

BenMorel commented on August 22, 2024

Released as 0.2.3!

BenMorel commented on August 22, 2024

Interesting, thanks for sharing this! Do you have a formal source for this, or is it just empirical testing performed over the years?

A couple additional questions:

  • Is GIR still in use?

  • Is your list complete? The existing code accepts ANA NAA and AANA NAA, and this is corroborated by Wikipedia.

  • What do you think of the following regexp, found in the same Wikipedia article? Would you have a chance to test it against your database?

    ^(([A-Z]{1,2}[0-9][A-Z0-9]?|ASCN|STHL|TDCU|BBND|[BFS]IQQ|PCRN|TKCA) ?[0-9][A-Z]{2}|BFPO ?[0-9]{1,4}|(KY[0-9]|MSR|VG|AI)[ -]?[0-9]{4}|[A-Z]{2} ?[0-9]{2}|GE ?CX|GIR ?0A{2}|SAN ?TA1)$
    

jamiethompson commented on August 22, 2024

GIR is technically still in use, yeah, and unfortunately has to be supported for completeness.

The list is complete. It's based on the actual rules set out by Royal Mail and corroborated by real-world use. There is nothing wrong with your regexes in terms of basic pattern matching, but they do not take into account that, in certain positions in the various formats, only certain letters are allowed, as per the bullet points in the validation section of the Wikipedia article.

Here's an example that currently passes despite being an invalid postcode: XZ1 1AA

X is not allowed in position one, ever. The same goes for Z in position two. Secondly, there is no XZ postcode area, both because of the character restrictions and because there is only a fairly short list of actual postcode areas, as listed in https://en.wikipedia.org/wiki/List_of_postcode_areas_in_the_United_Kingdom

Edit: I initially used ZX as an example by mistake, rather than XZ. Ironically, ZX is allowed by the rules, but there is also no ZX postcode area in existence.

This isn't an uncommon problem. Most systems that attempt UK postcode validation exhibit this flaw.
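A quick way to see the problem, sketched in Python for illustration (the library itself is PHP): the broad Wikipedia pattern quoted above accepts XZ1 1AA, while a check of just the first two positions, using the Wikipedia rules that Q, V and X never appear first and I, J and Z never appear second, rejects it. `first_two_positions_ok` is a hypothetical helper, not part of any library.

```python
import re

# Broad pattern from the Wikipedia article, as quoted earlier in the thread.
BROAD = re.compile(
    r"^(([A-Z]{1,2}[0-9][A-Z0-9]?|ASCN|STHL|TDCU|BBND|[BFS]IQQ|PCRN|TKCA) ?[0-9][A-Z]{2}"
    r"|BFPO ?[0-9]{1,4}|(KY[0-9]|MSR|VG|AI)[ -]?[0-9]{4}|[A-Z]{2} ?[0-9]{2}"
    r"|GE ?CX|GIR ?0A{2}|SAN ?TA1)$"
)


def first_two_positions_ok(postcode: str) -> bool:
    """Hypothetical helper: checks only the position 1 and 2 letter rules."""
    pc = postcode.replace(" ", "").upper()
    if len(pc) < 2:
        return False
    if pc[0] in "QVX":  # Q, V and X are never used in position 1
        return False
    if pc[1] in "IJZ":  # I, J and Z are never used in position 2
        return False
    return True


print(bool(BROAD.match("XZ1 1AA")))       # True: the broad pattern accepts it
print(first_two_positions_ok("XZ1 1AA"))  # False: X is not allowed in position 1
print(first_two_positions_ok("ZX1 1AA"))  # True: allowed by the rules
```

ZX1 1AA passes the position check even though no ZX area exists, which is why the letter rules alone are not enough and the area also has to be compared with the list of real postcode areas.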

I've also more recently been working on a longer but more accurate regex that limits the first part of the postcode to the actual real-world postcode areas. Here's a work in progress:

^(([BEGLMNSW])[0-9][0-9A-HJKPSTU]{0,1}|(A[BL]|B[ABDHLNRST]|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]|F[KY]|G[LUY]|H[ADGPRSUX]|I[GPV]|K[ATWY]|L[ADELNSU]|M[EKL]|N[EGNPRW]|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]|T[ADFNQRSW]|UB|W[ACDFNRSV]|YO|ZE)[0-9][0-9ABEHMNPRV-Y]{0,1})\s*(([0-9])([ABDEFGHJLNPQRSTUWXYZ]{2}))$

I'm not sure that it's quite complete yet, but you can perhaps see the direction I'm going with this. The nice thing about this regex is that it extracts the various functional parts of the postcode:

  1. Full postcode (SW1A 1AA)
  2. Outward Code (SW1A)
  3. Area (SW)
  4. Inward Code (1AA)
  5. Sector (1)
  6. Unit (AA)
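Here is a sketch, in Python for illustration, of the same pattern written with `re.VERBOSE` and named groups, so that each of the parts listed above can be read off by name. `parse_postcode` and the group names are hypothetical. Two deliberate differences from the WIP pattern, both assumptions of this sketch: EC is included among the areas, as in the later revision of the pattern in this thread, and W is added to the third-position class for A9A codes, following the Wikipedia rule quoted later in the thread.

```python
import re
from typing import Optional

# WIP pattern from the comment above, with EC added and W added to the
# A9A third-position class (assumptions of this sketch, see lead-in).
GB_POSTCODE = re.compile(
    r"""
    ^
    (?P<outward>
        (?P<area1>[BEGLMNSW])[0-9][0-9A-HJKPSTUW]?
      | (?P<area2>A[BL]|B[ABDHLNRST]|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]|F[KY]
          |G[LUY]|H[ADGPRSUX]|I[GPV]|K[ATWY]|L[ADELNSU]|M[EKL]|N[EGNPRW]|O[LX]
          |P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]|T[ADFNQRSW]|UB|W[ACDFNRSV]|YO|ZE)
        [0-9][0-9ABEHMNPRV-Y]?
    )
    \s*
    (?P<inward>(?P<sector>[0-9])(?P<unit>[ABDEFGHJLNPQRSTUWXYZ]{2}))
    $
    """,
    re.VERBOSE,
)


def parse_postcode(raw: str) -> Optional[dict]:
    """Split a GB postcode into its functional parts, or return None."""
    m = GB_POSTCODE.match(raw.strip().upper())
    if m is None:
        return None
    return {
        "full": f"{m['outward']} {m['inward']}",  # normalised with one space
        "outward": m["outward"],
        "area": m["area1"] or m["area2"],  # exactly one area branch matched
        "inward": m["inward"],
        "sector": m["sector"],
        "unit": m["unit"],
    }


print(parse_postcode("sw1a1aa")["outward"])  # SW1A
print(parse_postcode("XZ1 1AA"))             # None
```

Python does not allow two groups with the same name, hence the separate `area1` and `area2` groups, merged in code.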

I'm familiar with the regex you mention:

^(([A-Z]{1,2}[0-9][A-Z0-9]?|ASCN|STHL|TDCU|BBND|[BFS]IQQ|PCRN|TKCA) ?[0-9][A-Z]{2}|BFPO ?[0-9]{1,4}|(KY[0-9]|MSR|VG|AI)[ -]?[0-9]{4}|[A-Z]{2} ?[0-9]{2}|GE ?CX|GIR ?0A{2}|SAN ?TA1)$

The problem with this one is that it also doesn't enforce the alpha character restrictions at various positions in the code. It also incorrectly includes the special cases for various overseas territories (Ascension, Tristan da Cunha, etc.). In real-world use, none of these places are considered part of the UK or covered by Royal Mail, not least because they all have their own ISO country codes.

BenMorel commented on August 22, 2024

Thanks. TBH I'm against overly complicated patterns like these: they're pretty much unmaintainable, and I will almost always prefer several lines of code over this.

If I'm not mistaken, you didn't answer the question: what about ANA NAA and AANA NAA?

jamiethompson commented on August 22, 2024

Sorry, yes, my list includes ANA NAA and AANA NAA.
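With those two included, every standard postcode fits one of six shapes. A minimal Python sketch (names hypothetical) that masks letters as A and digits as 9 to classify an input; the postcodes in the loop are just samples, and GIR 0AA is included to show that its three-letter outward code fits none of the shapes:

```python
# The six standard shapes (A = letter, 9 = digit), outward and inward parts.
FORMATS = {"A9 9AA", "A99 9AA", "A9A 9AA", "AA9 9AA", "AA99 9AA", "AA9A 9AA"}


def shape(postcode: str) -> str:
    """Mask letters as 'A' and digits as '9'; the last three chars are the inward code."""
    pc = postcode.replace(" ", "").upper()
    masked = "".join("A" if c.isalpha() else "9" if c.isdigit() else "?" for c in pc)
    return f"{masked[:-3]} {masked[-3:]}"


for pc in ["M1 1AE", "B33 8TH", "W1A 0AX", "CR2 6XH", "DN55 1PT", "EC1A 1BB", "GIR 0AA"]:
    print(pc, "->", shape(pc), shape(pc) in FORMATS)
```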

jamiethompson commented on August 22, 2024

I understand your reluctance to include a regex as complicated as that. I notice that for some countries, albeit with simpler schemes than the UK's, you check for valid prefixes after a basic regex match. Maybe that approach would be simpler and more easily maintained here.

BenMorel commented on August 22, 2024

Sorry, yes my list includes ANA NAA and AANA NAA

My bad, I misread the comments in your code! Your code does indeed match the notes in the Wikipedia article, so I'd be happy to integrate it into the formatter.

Do you think the formatter should also check the area code against the list of known codes?

BenMorel commented on August 22, 2024

I'm actually giving it a shot, and noticed that for alpha3, your list of chars differs from the list in the Wikipedia article:

The only letters to appear in the third position are A, B, C, D, E, F, G, H, J, K, P, S, T, U and W when the structure starts with A9A.

And your list is abcdefghjkstuw, missing the P?
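The per-position rules can be expressed as a few lines of code rather than one large pattern. The Python sketch below (names hypothetical; this is not the brick/postcode implementation) encodes the Wikipedia validation bullets, with P included in the third-position set, and checks letters only, without any area lookup:

```python
# Letter rules from the Wikipedia "Validation" bullets (assumed source).
NOT_FIRST = set("QVX")              # never in position 1
NOT_SECOND = set("IJZ")             # never in position 2
THIRD_A9A = set("ABCDEFGHJKPSTUW")  # position 3 of A9A codes, P included
FOURTH_AA9A = set("ABEHMNPRVWXY")   # position 4 of AA9A codes
NOT_IN_UNIT = set("CIKMOV")         # never in the last two letters
OUTWARD_SHAPES = {"A9", "A99", "A9A", "AA9", "AA99", "AA9A"}


def letters_ok(postcode: str) -> bool:
    """Check the format and position-specific letters; no area lookup."""
    pc = postcode.replace(" ", "").upper()
    # Mask letters as "A" and digits as "9" to find the format.
    mask = "".join("A" if c.isalpha() else "9" if c.isdigit() else "?" for c in pc)
    outward = mask[:-3]
    if mask[-3:] != "9AA" or outward not in OUTWARD_SHAPES:
        return False
    if pc[0] in NOT_FIRST:
        return False
    if outward.startswith("AA") and pc[1] in NOT_SECOND:
        return False
    if outward == "A9A" and pc[2] not in THIRD_A9A:
        return False
    if outward == "AA9A" and pc[3] not in FOURTH_AA9A:
        return False
    return not (set(pc[-2:]) & NOT_IN_UNIT)


# The third-position list from the code under review, compared with the rule:
print(THIRD_A9A - set("ABCDEFGHJKSTUW"))               # {'P'}
print(letters_ok("SW1A 1AA"), letters_ok("XZ1 1AA"))  # True False
```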

jamiethompson commented on August 22, 2024

Do you think the formatter should also check the area code against the list of known codes?

I think it should, because even when you take the allowed characters into account, you can still form endless technically correct yet invalid postcodes. This is precisely what I've been planning on doing with my own regex pattern:

^(([BEGLMNSW])[0-9][0-9A-HJKPSTU]{0,1}|(A[BL]|B[ABDHLNRST]|C[ABFHMORTVW]|D[ADEGHLNTY]|E[CHNX]|F[KY]|G[LUY]|H[ADGPRSUX]|I[GPV]|K[ATWY]|L[ADELNSU]|M[EKL]|N[EGNPRW]|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTWY]|T[ADFNQRSW]|UB|W[ACDFNRSV]|YO|ZE)[0-9][0-9ABEHMNPRV-Y]{0,1})\s*(([0-9])([ABDEFGHJLNPQRSTUWXYZ]{2}))$

⬆️ https://gist.github.com/jamiethompson/fb39c4ddfd9f7b74e447a98ecb52841e

My concern is that I've minified it to the point where maintenance might be difficult in future. Another approach I've considered, which might be more readable, would be to extract the area and then test it against an array of valid areas. Areas are very rarely added or removed; they're considered to be pretty much static.
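A sketch of that alternative in Python (names hypothetical): a loose structural regex captures the leading letters, which are then looked up in a set. The set below is transcribed from the area alternatives in the regex above.

```python
import re

# Loose structural match; group 1 captures the leading letters (the area).
LOOSE = re.compile(r"^([A-Z]{1,2})[0-9][A-Z0-9]? ?[0-9][A-Z]{2}$")

# Areas transcribed from the regex above: single letters, then pairs.
AREAS = set("B E G L M N S W".split()) | set("""
    AB AL BA BB BD BH BL BN BR BS BT CA CB CF CH CM CO CR CT CV CW
    DA DD DE DG DH DL DN DT DY EC EH EN EX FK FY GL GU GY
    HA HD HG HP HR HS HU HX IG IP IV KA KT KW KY LA LD LE LL LN LS LU
    ME MK ML NE NG NN NP NR NW OL OX PA PE PH PL PO PR RG RH RM
    SA SE SG SK SL SM SN SO SP SR SS ST SW SY TA TD TF TN TQ TR TS TW
    UB WA WC WD WF WN WR WS WV YO ZE
""".split())


def area_ok(postcode: str) -> bool:
    """Basic pattern match first, then look the area up in AREAS."""
    m = LOOSE.match(postcode.strip().upper())
    return m is not None and m.group(1) in AREAS


print(area_ok("SW1A 1AA"))  # True
print(area_ok("ZX1 1AA"))   # False: no ZX area exists
```

Adding an area, such as the non-geographic BF or BX, then becomes a one-word change to the set rather than an edit to the pattern.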

And your list is abcdefghjkstuw, missing the P?

You know what, I think you might be right. This code has been in production for years and that's never been spotted, but it is a bug. I think it's because, although P is a valid character for position 3 in an A9A-style postcode, there aren't actually any real-world postcodes (yet) that use P.

jamiethompson commented on August 22, 2024

I really like the way you've broken that down. It's much more readable the way you've laid it out. I'll give this a go later on today.

jamiethompson commented on August 22, 2024

This looks good to me. I have uncovered a couple of edge cases relating to non-geographic postcode areas which should be supported (as with GIR 0AA).

This is supported by
https://en.wikipedia.org/wiki/List_of_postcode_areas_in_the_United_Kingdom#Non-geographic_postcodes

These are simpler: you just need to add BF and BX to the array of areas.

I don't believe that the XX code should be supported. This is only used by online retailers to produce return labels and (I think deliberately) breaks the specification for the allowed first characters.
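Since GIR 0AA has a three-letter outward code and an I in the second position, it fits none of the standard formats, so one option is to treat it as an exact-match exception checked before the general rules. A minimal Python sketch with a hypothetical `SPECIAL_CASES` table; BF and BX, by contrast, follow the normal formats and need no special handling beyond being added to the area list.

```python
from typing import Optional

# Hypothetical exact-match exceptions, looked up before the general rules.
# Keys are normalised (no spaces, upper case); values are canonical forms.
SPECIAL_CASES = {"GIR0AA": "GIR 0AA"}


def format_special(postcode: str) -> Optional[str]:
    """Return the canonical form if the input is a listed exception, else None."""
    return SPECIAL_CASES.get(postcode.replace(" ", "").upper())


print(format_special("gir0aa"))    # GIR 0AA
print(format_special("SW1A 1AA"))  # None
```

Any input not found in the table would then fall through to the normal format, letter and area checks.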

BenMorel commented on August 22, 2024

Great, thanks for your feedback! I'll add these two. I'm not sure about XX though: this may not be valid for your use case, but may be valid for an application that does process returns?

jamiethompson commented on August 22, 2024

You might be right. XX is the only thing I'm not absolutely sure about. My understanding is that it's only used by large retailers to generate their own return labels, like when you return something to Amazon. I don't think it would hurt to support the XX area though. An application can of course perform its own additional validation if these need to be excluded for any reason.
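Layering such an exclusion on top of a formatter that accepts XX could look like the Python sketch below. `base_format` is a simplified, hypothetical stand-in for the library's formatter (it checks the basic shape only), and `format_excluding` is the application's own extra rule; `XX1 1AA` is an illustrative string, not a known real postcode.

```python
import re
from typing import Optional


def base_format(postcode: str) -> Optional[str]:
    """Hypothetical stand-in for a formatter that accepts XX (shape check only)."""
    pc = postcode.replace(" ", "").upper()
    if not re.fullmatch(r"[A-Z]{1,2}[0-9][A-Z0-9]?[0-9][A-Z]{2}", pc):
        return None
    return f"{pc[:-3]} {pc[-3:]}"


def format_excluding(postcode: str, excluded_areas: frozenset = frozenset({"XX"})) -> Optional[str]:
    """Application-level rule on top of the formatter: reject selected areas."""
    formatted = base_format(postcode)
    if formatted is None:
        return None
    area = re.match(r"[A-Z]+", formatted).group()  # leading letters = area
    return None if area in excluded_areas else formatted


print(format_excluding("sw1a1aa"))  # SW1A 1AA
print(format_excluding("XX1 1AA"))  # None
```

Keeping the exclusion outside the formatter means the library can stay permissive while each application decides which areas it accepts.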

jamiethompson commented on August 22, 2024

This looks good to me. I really appreciate the work you've done here. I originally thought I'd make these changes myself and hit you with a PR, but what you've done here is much nicer than anything I could have envisioned.
