Comments (11)
The quotes are hard to fix because the YAML library decides when to emit them. THOMAS IDs are stored as strings, and the library quotes a string only when it would otherwise be parsed as a valid integer. I believe a leading zero triggers octal notation, which means quotes are omitted only when the number contains an '8' or '9' digit and so isn't valid octal. When we decided on YAML, I had no idea it would be this strange.
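The quoting rule described here can be approximated without the YAML library. The sketch below is a simplification under stated assumptions: it covers only the octal and decimal branches of the YAML 1.1 integer pattern (binary, hex, and base-60 forms are left out), and `reads_as_yaml_int` is a hypothetical helper name, not part of any library. A string that matches would be read back as an integer if left bare, so a dumper has to quote it.

```python
import re

# Simplified from the YAML 1.1 integer type: octal and decimal branches only.
_YAML11_INT = re.compile(r"[-+]?(?:0[0-7_]+|0|[1-9][0-9_]*)")


def reads_as_yaml_int(text: str) -> bool:
    """Return True if a bare scalar would load as an integer (octal or decimal)."""
    return _YAML11_INT.fullmatch(text) is not None


# Hypothetical THOMAS-style IDs, chosen only to show the three cases.
for thomas_id in ["01234", "01289", "1289"]:
    verdict = "needs quotes" if reads_as_yaml_int(thomas_id) else "safe unquoted"
    print(thomas_id, verdict)
```

Under this rule a zero-padded ID made only of digits 0 to 7 must be quoted, while one containing an 8 or 9 is not valid octal and can be written bare, which would explain the mix of quoted and unquoted values in the file.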
The types are consistent within each ID type: all THOMAS IDs are strings. I wanted to preserve the leading zeros because that's how they appear on THOMAS, but Congress.gov removes leading zeros. So I'd be OK with changing them all to integers (and thus removing all quotes).
from congress-legislators.
Based on my experience having to deal with them to generate people.xml, I would be against changing them all to integers without a good reason. It was already somewhat annoying to figure out when I had to explicitly convert integer to string and when I didn't. (And I couldn't blindly force everything to strings, either, because apparently str() can't handle the accent marks in names.) I'd be more comfortable if every field was a string.
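The str() failure mentioned here is typical of Python 2, where calling str() on a unicode value containing non-ASCII characters raises UnicodeEncodeError; that this was the cause is an assumption. Below is a minimal sketch of a coercion helper in Python 3, where text and bytes are separate types. `as_text` is a hypothetical name, and the sample name is illustrative only.

```python
def as_text(value) -> str:
    """Coerce an ID or name to text without mangling non-ASCII characters."""
    if isinstance(value, str):
        return value                  # already text; leave accents alone
    if isinstance(value, bytes):
        return value.decode("utf-8")  # assumes the bytes are UTF-8 encoded
    return str(value)                 # ints and other scalars


print(as_text(1234))         # integer ID converted to text
print(as_text("01234"))      # string ID returned unchanged
print(as_text("Gutiérrez"))  # accented name passes through untouched
```

Routing every field through one helper like this removes the need to remember which fields are integers and which are strings.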
I think it is probably best if all unique identifiers are strings, and for
THOMAS IDs to ditch the leading zeroes. Gordon, is there an easy way to
handle this in the serialization logic?
(In reply to Gordon P. Hemsley's comment of Wed, Jan 9, 2013 at 12:28 PM.)
Developer | sunlightfoundation.com
Why serialization? Just load/update/dump?
I just mean, is it possible to deal with this inside the utils.py
functions, without making every script have to remember to make exceptions
for specific fields that need special treatment.
(In reply to Joshua Tauberer's comment of Wed, Jan 9, 2013 at 12:52 PM.)
If we convert thomas/govtrack IDs to strings w/o zero-padding, they'll all be uniformly quoted because all such strings would be parsed as integers if they had no quote. If we convert thomas to integers, they'll all be uniformly not quoted.
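The first option could be applied in one shared place, so individual scripts do not need per-field special cases. The following is only a sketch: `normalize_ids`, `strip_zero_padding`, the `ID_FIELDS` tuple, and the record layout (an `id` mapping per legislator) are assumptions for illustration, not the repository's actual utils.py code.

```python
ID_FIELDS = ("thomas", "govtrack")  # assumed field names, taken from the discussion


def strip_zero_padding(value) -> str:
    """Return the ID as a digit string with leading zeros removed."""
    text = str(value).lstrip("0")
    return text or "0"  # keep a lone zero rather than an empty string


def normalize_ids(legislators):
    """Rewrite the chosen ID fields in place as unpadded strings."""
    for legislator in legislators:
        ids = legislator.get("id", {})
        for field in ID_FIELDS:
            if field in ids:
                ids[field] = strip_zero_padding(ids[field])
    return legislators


# Made-up records mixing padded strings and integers.
records = [
    {"id": {"thomas": "01234", "govtrack": 400001}},
    {"id": {"thomas": 1289}},
]
print(normalize_ids(records))
```

Every value produced this way is a plain decimal digit string, so a dumper that quotes integer-like strings would quote all of them uniformly.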
Oh, then this sounds easy - we just make them strings without zero-padding.
Problem solved.
(In reply to Joshua Tauberer's comment of Wed, Jan 9, 2013 at 1:13 PM.)
Well, hold on a moment. Surely there is some benefit to having all IDs of a particular type have the same length? I would be surprised if there weren't some way to force them to always be a string, but I'd have to investigate it. (I'm not familiar with how the Python YAML library works just yet.)
Well, I suppose it all depends on how your scripts are generating the YAML. If you're feeding it data from native Python format, then it could be as simple as wrapping the value with str().
Any benefits to them being the same length that I can imagine are
outweighed by the hassle in always zero prefixing and unzero prefixing them
as necessary in different contexts. Clients of the data would also feel
this burden.
-- Eric
(In reply to Gordon P. Hemsley's comment of Jan 9, 2013 at 1:39 PM.)
The inconsistency is fixed in 2de1484.
I'm going to close the issue, hoping we can side-step some of the questions about removing leading zeroes and changing datatypes for now. (Of the choices, I'd rather see the thomas ID turned into an integer. Though that'll undo all of the quotes I just added. :-)