Comments (11)

JoshData commented on June 8, 2024

The quotes are hard to fix because the YAML library handles that. They're quoted because we're storing THOMAS IDs as strings, and the library adds quotes only in the cases where the value could otherwise be parsed as a valid integer. I believe a leading zero triggers octal notation, which means the quotes are omitted only when the number contains an '8' or a '9' (and so can't be read as octal). When we decided on YAML, I had no idea it would be this strange.

The types are consistent within each ID type: all THOMAS IDs are strings. I wanted to preserve the leading zeros because that's how they appear on THOMAS, but Congress.gov removes leading zeros. So I'd be OK with changing them all to integers (and thus removing all the quotes).
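To make the quoting rule concrete, here is a minimal sketch of the check a YAML 1.1 emitter effectively performs before writing a string without quotes. The regular expression is a simplified copy of the YAML 1.1 implicit-int patterns (binary, leading-zero octal, decimal, hex, base 60); as far as I know PyYAML resolves plain scalars with essentially these patterns, but it also checks floats, booleans, nulls and more, which this sketch ignores. `needs_quotes` is a hypothetical helper, not a PyYAML function.

```python
import re

# Simplified YAML 1.1 implicit-int patterns. A plain (unquoted) scalar that
# matches any of them is loaded back as an integer, so a string with this
# shape has to be quoted to survive a dump/load round trip.
YAML11_INT = re.compile(r"""^(?:[-+]?0b[0-1_]+                  # binary
                            |[-+]?0[0-7_]+                      # octal: leading zero, digits 0-7
                            |[-+]?(?:0|[1-9][0-9_]*)            # decimal
                            |[-+]?0x[0-9a-fA-F_]+               # hexadecimal
                            |[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+ # base 60
                            )$""", re.X)


def needs_quotes(value: str) -> bool:
    """True if `value`, written unquoted, would be read back as an integer."""
    return YAML11_INT.match(value) is not None


# Hypothetical zero-padded THOMAS-style IDs.
for thomas_id in ["01234", "01809", "00010"]:
    print(thomas_id, needs_quotes(thomas_id))
```

Only the IDs made up entirely of the digits 0 through 7 come out needing quotes; "01809" cannot be octal, so it is safe to leave bare, which is exactly the inconsistency described above.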

from congress-legislators.

GPHemsley commented on June 8, 2024

Based on my experience having to deal with them to generate people.xml, I would be against changing them all to integers without a good reason. It was already somewhat annoying to figure out when I had to explicitly convert integer to string and when I didn't. (And I couldn't blindly force everything to strings, either, because apparently str() can't handle the accent marks in names.) I'd be more comfortable if every field was a string.
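One way to get "every field is a string" without a blanket `str()` call is to convert only the values that actually are integers. In Python 2, `str()` on a `unicode` value containing accented characters raises `UnicodeEncodeError`, which is presumably what happened with the names; in Python 3, `str()` returns text unchanged. A minimal sketch, where `stringify_ids` and the sample record are hypothetical:

```python
def stringify_ids(record: dict) -> dict:
    """Return a copy of `record` with integer values converted to strings.

    Text values (such as names with accents) pass through untouched, and
    booleans are left alone even though bool is a subclass of int.
    """
    return {
        key: str(value) if isinstance(value, int) and not isinstance(value, bool) else value
        for key, value in record.items()
    }


# Hypothetical record mixing integer IDs, an accented name and a flag.
person = {"thomas": 1234, "govtrack": 400001, "name": "Ana Pérez", "current": True}
print(stringify_ids(person))
```

Checking the type explicitly keeps the conversion in one place, so callers never have to remember which fields arrive as integers.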

from congress-legislators.

konklone commented on June 8, 2024

I think it is probably best if all unique identifiers are strings, and for THOMAS IDs to ditch the leading zeroes. Gordon, is there an easy way to handle this in the serialization logic?

Developer | sunlightfoundation.com

JoshData commented on June 8, 2024

Why serialization? Just load/update/dump?

konklone commented on June 8, 2024

I just mean: is it possible to deal with this inside the utils.py functions, without making every script remember to make exceptions for specific fields that need special treatment?

JoshData commented on June 8, 2024

If we convert thomas/govtrack IDs to strings w/o zero-padding, they'll all be uniformly quoted because all such strings would be parsed as integers if they had no quote. If we convert thomas to integers, they'll all be uniformly not quoted.
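The "uniformly quoted vs. uniformly unquoted" point can be seen with a toy scalar renderer: an unpadded digit string always looks like a decimal integer, so it always needs quotes, while a real integer never does. `render_scalar` is a simplified, hypothetical stand-in for the decision a YAML emitter makes, and it deliberately handles only these two cases.

```python
import re

# An unpadded digit string ("0", or no leading zero) is what a YAML 1.1
# loader would read back as a decimal integer.
DECIMAL = re.compile(r"0|[1-9][0-9]*")


def render_scalar(value) -> str:
    """Toy emitter for the two options discussed: ints, or unpadded digit strings."""
    if isinstance(value, int):
        return str(value)            # integers are written bare
    if DECIMAL.fullmatch(value):
        return f"'{value}'"          # would load as an int, so it must be quoted
    raise ValueError(f"outside this sketch: {value!r}")


ids = [1234, 400001]  # hypothetical IDs
print([render_scalar(str(i)) for i in ids])  # stored as unpadded strings
print([render_scalar(i) for i in ids])       # stored as integers
```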

konklone commented on June 8, 2024

Oh, then this sounds easy - we just make them strings without zero-padding. Problem solved.

GPHemsley commented on June 8, 2024

Well, hold on a moment. Surely there is some benefit to having all IDs of a particular type have the same length? I would be surprised if there weren't some way to force them to always be a string, but I'd have to investigate it. (I'm not familiar with how the Python YAML library works just yet.)

GPHemsley commented on June 8, 2024

Well, I suppose it all depends on how your scripts are generating the YAML. If you're feeding it data from native Python format, then it could be as simple as wrapping the value with str().

konklone commented on June 8, 2024

Any benefits I can imagine to them being the same length are outweighed by the hassle of always zero-padding and un-padding them as necessary in different contexts. Clients of the data would also feel this burden.

-- Eric
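For comparison, this is the round trip that fixed-width IDs would ask of every consumer: short, but easy to forget in one of many scripts. The sketch assumes a fixed width of 5 digits for THOMAS IDs; that width and the helper names `pad_thomas` / `unpad_thomas` are assumptions for illustration.

```python
THOMAS_WIDTH = 5  # assumed fixed width of a zero-padded THOMAS ID


def pad_thomas(thomas_id) -> str:
    """Zero-pad an ID (int or unpadded string) to the assumed fixed width."""
    return str(int(thomas_id)).zfill(THOMAS_WIDTH)


def unpad_thomas(thomas_id: str) -> str:
    """Strip leading zeros, e.g. to match an ID written without padding (as on Congress.gov)."""
    return str(int(thomas_id))


print(pad_thomas(1234), pad_thomas("136"), unpad_thomas("00136"))
```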

JoshData commented on June 8, 2024

The inconsistency is fixed in 2de1484.

I'm going to close the issue, hoping we can side-step some of the questions about removing leading zeroes and changing datatypes for now. Of the choices, I'd rather see the thomas ID turned into an integer, though that'll undo all of the quotes I just added. :-)
