TigerLine 2011? (geocoder issue, open)

geocommons commented on July 30, 2024
TigerLine 2011?

Comments (15)

hexatridecimal commented on July 30, 2024

It seems this is a problem that's been active since at least the 2010 data came out:

https://groups.google.com/d/msg/geocommons-geocode/PH6g20m7kaU/Z_W065lbyjkJ

It looks like the 2011 distribution keeps all the data files for a layer in a single directory instead of spreading them across per-state and per-county directories. The principal directories are:

EDGES
ADDR
FEATNAMES

These just contain the zip files directly. The script tiger2009_import seems to do this:

# Foreach ZIP in FEATNAMES ADDR EDGES
unzip -q $ZIP -d $TMP
# We're building SQL here so create the tables
cat ${SQL}/setup.sql > output.sql
# Foreach file in EDGES do
shp2sqlite -aS ${TMP}/*_${file}.shp tiger_${file} >> output.sql
# Foreach file in FEATNAMES and ADDR
shp2sqlite -an ${TMP}/*_${file}.dbf tiger_${file} >> output.sql
# Now do transformations using temporary tables
cat ${SQL}/convert.sql >> output.sql
# Now run the SQL
cat output.sql | sqlite3 $DATABASE
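
If the per-county grouping does turn out to matter, the flat 2011 layout can be regrouped by county code before running a per-county loop like the one above. This is a minimal sketch, assuming the county-based files follow the tl_2011_SSCCC_layer.zip naming (2-digit state FIPS plus 3-digit county FIPS); county_codes and unzip_county are hypothetical helpers, not part of geocoder:

```bash
#!/usr/bin/env bash
# Sketch: regroup the flat TIGER 2011 layout (EDGES/, ADDR/, FEATNAMES/)
# by county. ASSUMPTION: files are named like tl_2011_01001_edges.zip,
# where 01001 is the 5-digit state+county FIPS code.

# Print each distinct county code found under ROOT, one per line, sorted.
county_codes() {
  local root="$1"
  find "$root/EDGES" "$root/ADDR" "$root/FEATNAMES" -name 'tl_*.zip' 2>/dev/null |
    sed -n 's|.*/tl_[0-9]*_\([0-9]\{5\}\)_[a-z]*\.zip$|\1|p' |
    sort -u
}

# Unzip the edges, addr and featnames archives for one county into DEST,
# skipping any layer that has no file for that county.
unzip_county() {
  local root="$1" code="$2" dest="$3" zip
  for zip in "$root"/EDGES/tl_*_"$code"_edges.zip \
             "$root"/ADDR/tl_*_"$code"_addr.zip \
             "$root"/FEATNAMES/tl_*_"$code"_featnames.zip; do
    if [ -e "$zip" ]; then
      unzip -q -o "$zip" -d "$dest"
    fi
  done
}
```

Usage would be along the lines of: for code in $(county_codes data); do tmp=$(mktemp -d); unzip_county data "$code" "$tmp"; ...; rm -rf "$tmp"; done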

Would anyone be offended if I rewrote this using Ruby for Tiger2011?

hexatridecimal commented on July 30, 2024

I get why it's written as a single long pipe command: it's an elegant solution to the sheer size of the data. I've written the following script in Ruby:

https://gist.github.com/1631758

It took roughly two hours on my quad-core Mac to create the loading.sql file, which came to roughly 99 GB. Unfortunately it seems to get stuck at the "cat loading.sql | sqlite3 #{database}" step: after 16 hours it was still stuck, using 1% of the CPU. Very strange. I probably need to rewrite it to use a single long pipe.
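
A minimal sketch of the single-pipe idea: instead of writing a ~99 GB file and then feeding it to sqlite3, the SQL is written to stdout and consumed as it is produced, so no intermediate file ever hits the disk. generate_sql, its demo table and stream_into are hypothetical stand-ins; in the real import the generator would emit setup.sql, the shp2sqlite output and convert.sql in turn.

```bash
#!/usr/bin/env bash
# Sketch: stream generated SQL directly into a consumer (e.g. sqlite3)
# instead of materializing a huge intermediate .sql file.
set -o pipefail   # make the pipeline fail if the generator fails

# Hypothetical generator: writes SQL statements to stdout.
# A real import would cat setup.sql, run shp2sqlite for each layer,
# then cat convert.sql here.
generate_sql() {
  echo "BEGIN;"
  echo "CREATE TABLE demo (n INTEGER);"
  local n
  for n in 1 2 3; do
    echo "INSERT INTO demo VALUES ($n);"
  done
  echo "COMMIT;"
}

# Run the generator straight into the given consumer command.
# Usage: stream_into sqlite3 "$DATABASE"
stream_into() {
  generate_sql | "$@"
}
```

Wrapping the inserts in a single BEGIN/COMMIT, as above, also tends to make a large difference to sqlite3 insert speed.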

alexsenxu commented on July 30, 2024

I could be wrong, but the state/county organization from TIGER/Line 2009 might be used in later steps after the import.

hexatridecimal commented on July 30, 2024

I just double-checked and I don't see it used anywhere. The script seems to simply import the .shp and .dbf files into the database without regard to folder names or placement. Of course, this shell script is pretty dense stuff for me. Here's my attempt to rewrite the above script while keeping the whole pipe mechanism:

https://gist.github.com/1694885

I haven't run it, since I ended up using a commercial product for geocoding instead. But I hope we can get to the bottom of this and update geocoder to the new database. I'm going to work on it this weekend.

alexsenxu commented on July 30, 2024

Good call. Just out of curiosity, what commercial geocoding software (or service) are you using?

I am working on porting TIGER/Line 2011 onto HDFS instead of a database. I'll post an update once there's progress.

hexatridecimal commented on July 30, 2024

Well, 90% of the data I was working on was just city/state/zip, so for those I used:
http://www.zipcodedownload.com/

Then I used the geocoder gem with Bing Maps for the remaining 10%:
https://github.com/alexreisner/geocoder

This isn't ideal, but I think I ended up with pretty high-quality results. I'm hoping to get this geocoder database fixed: it doesn't have any usage limits and it isn't locked down by any corporation or government.

hexatridecimal commented on July 30, 2024

I can't believe it didn't occur to me, but all you need to do is use the tiger_import script. The 2011 import goes like this:

First, follow the Prerequisites section of the Geocoder README (https://github.com/geocommons/geocoder), but skip "Additionally, you will need a custom build of the ‘sqlite3-ruby’ gem"; it's no longer needed. Next, build the geocoder gem:

git clone git://github.com/geocommons/geocoder.git
cd geocoder
make install
gem install Geocoder-US-2.0.2pre.gem

On Mac OS X it will fail at "make install" with "ld: symbol(s) not found for architecture x86_64". Here's the fix:

cd src/shp2sqlite
make -f Makefile.macosx
cd ../..
make install
gem install Geocoder-US-2.0.2pre.gem
ruby -rgeocoder/us -e ''
# This last command will fail with a nasty error like:
# /Users/jjeffus/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/geocoder/us/database.rb:96:in `load_extension': dlopen(/Users/jjeffus/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/geocoder/us/sqlite3.so, 10): image not found (RuntimeError)
# To get a working geocoder/us, take the path after "dlopen(" and copy the built lib/geocoder/us/sqlite3.so there. In this case
# the file is: /Users/jjeffus/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/geocoder/us/sqlite3.so
# so: cp lib/geocoder/us/sqlite3.so /Users/jjeffus/.rvm/rubies/ruby-1.9.2-p290/lib/ruby/site_ruby/1.9.1/geocoder/us/sqlite3.so
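
The copy step above can be automated by pulling the path out of the error message. A minimal sketch; missing_so_path and fix_geocoder_so are hypothetical helpers, and it assumes the error keeps the dlopen(PATH, FLAGS) shape shown above and that it is run from the geocoder checkout where lib/geocoder/us/sqlite3.so was built.

```bash
#!/usr/bin/env bash
# Sketch: automate the "copy sqlite3.so to the path after dlopen(" fix.
# ASSUMPTION: the load error contains "dlopen(PATH, FLAGS)" as shown above.

# Extract PATH from an error message; prints nothing if there is no match.
missing_so_path() {
  sed -n 's/.*dlopen(\([^,]*\),.*/\1/p' <<< "$1"
}

# Try to load geocoder/us; on failure, copy the locally built
# lib/geocoder/us/sqlite3.so to the path Ruby tried to load.
fix_geocoder_so() {
  local err dest
  if err=$(ruby -rgeocoder/us -e '' 2>&1); then
    return 0   # already loads fine, nothing to do
  fi
  dest=$(missing_so_path "$err")
  if [ -z "$dest" ]; then
    echo "no dlopen(...) path found in error:" >&2
    echo "$err" >&2
    return 1
  fi
  cp lib/geocoder/us/sqlite3.so "$dest"
}
```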

After you have successfully built Geocoder::US, run the following from the geocoder root directory:

mkdir data
mkdir database
cd data
wget -nd -r -A.zip ftp://ftp2.census.gov/geo/tiger/TIGER2011/ADDR/
wget -nd -r -A.zip ftp://ftp2.census.gov/geo/tiger/TIGER2011/FEATNAMES/
wget -nd -r -A.zip ftp://ftp2.census.gov/geo/tiger/TIGER2011/EDGES/
cd ..
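
The three downloads differ only in the layer name, so they can be looped, with the year as a parameter (the same steps are later used on TIGER2012 in this thread). A small sketch; tiger_layer_url and download_tiger are hypothetical names, and it assumes the server keeps the TIGER<year>/<LAYER>/ layout of the URLs above.

```bash
#!/usr/bin/env bash
# Sketch: download the ADDR, FEATNAMES and EDGES zips for one TIGER year.
# ASSUMPTION: the server keeps the TIGER<year>/<LAYER>/ layout used above.
TIGER_BASE="${TIGER_BASE:-ftp://ftp2.census.gov/geo/tiger}"

# Print the directory URL for a given year and layer.
tiger_layer_url() {
  local year="$1" layer="$2"
  echo "${TIGER_BASE}/TIGER${year}/${layer}/"
}

# Fetch every .zip for the three layers into the current directory.
download_tiger() {
  local year="${1:-2011}" layer
  for layer in ADDR FEATNAMES EDGES; do
    wget -nd -r -A.zip "$(tiger_layer_url "$year" "$layer")" || return 1
  done
}
```

Run it from inside data/, e.g. download_tiger 2011.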

Now open "build/tiger_import" in the text editor of your choice and change:

SHP2SQLITE=../src/shp2sqlite/shp2sqlite 
# to
SHP2SQLITE="$BASE/shp2sqlite"
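
The same edit can be scripted. A minimal sketch using sed with an attached backup suffix (the -i.bak form is accepted by both GNU sed and the BSD sed on Mac OS X); patch_shp2sqlite is a hypothetical name, and it assumes the assignment starts its own line, as shown above.

```bash
#!/usr/bin/env bash
# Sketch: point SHP2SQLITE in a tiger_import script at "$BASE/shp2sqlite".
# The original file is kept as <script>.bak.
patch_shp2sqlite() {
  local script="${1:-build/tiger_import}"
  # Single quotes keep $BASE literal, so the script expands it at run time.
  sed -i.bak 's|^SHP2SQLITE=.*|SHP2SQLITE="$BASE/shp2sqlite"|' "$script"
}
```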

Now we can finally do the import:

build/tiger_import database/geocoder.sqlite3 data
chmod +x build/build_indexes
build/build_indexes database/geocoder.sqlite3
sudo gem install text --no-rdoc --no-ri
bin/rebuild_metaphones database/geocoder.sqlite3
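
Since the import runs for hours, it can help to chain these steps so the sequence stops at the first failure. A minimal sketch; import_all, run, DB, DATA_DIR and the DRY_RUN switch are hypothetical additions, and the one-time gem install is left out. Like the commands above, it must be run from the geocoder root.

```bash
#!/usr/bin/env bash
# Sketch: run the import steps in order, stopping at the first failure.
DB="${DB:-database/geocoder.sqlite3}"
DATA_DIR="${DATA_DIR:-data}"

# Execute a command, or only print it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

import_all() {
  run build/tiger_import "$DB" "$DATA_DIR" &&
    run chmod +x build/build_indexes &&
    run build/build_indexes "$DB" &&
    run bin/rebuild_metaphones "$DB"
}
```

DRY_RUN=1 import_all prints the four commands without running them.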

It took my Amazon EC2 extra-large instance about 8 hours to do the import. I'm going to put up a torrent of the finished SQLite database, and also upload it to RapidShare or somewhere similar. I'll post the links here.

Also, I'm going to fork the codebase and update the docs. This is one of the coolest libraries out there. I hope we can come together as a community and keep this thing working.

hexatridecimal commented on July 30, 2024

I've uploaded a torrent of the full data here:
http://assuredwebdevelopment.com/geocoder_us_tigerline_2011.7z.torrent

Backup here:
http://www.mybtfiles.com/torrents/65950942/

hekaldama commented on July 30, 2024

Can someone just upload their sqlite db file with 2011 loaded so that we can just use that? Are there problems with this approach?

campgurus commented on July 30, 2024

hekaldama: I did; it's in my last post. I uploaded it as a torrent file. Let me know how that works out.

hekaldama commented on July 30, 2024

Trying to download now. I am not sure if my firewall is blocking me or not, but it currently isn't downloading...

mattyb commented on July 30, 2024

I used this method on the TIGER2012 data. I was able to import and pass the tests. However, there are several lines like this in the log:
/tmp/tiger-import.9161/*_addr.dbf: dbf file (.dbf) can not be opened.
Is that something I should be worried about?
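
One plausible reading of that message: the literal * in the path suggests the *_addr.dbf glob matched nothing (bash leaves an unmatched glob as the literal pattern), so shp2sqlite received the pattern itself, i.e. that extract had no ADDR .dbf. Whether those counties simply lack address data is not established here. A minimal guard, as a sketch; find_layer_dbf is a hypothetical helper:

```bash
#!/usr/bin/env bash
# Sketch: only hand shp2sqlite a .dbf that actually exists.
# An unmatched glob stays as the literal pattern in bash, which is
# plausibly what produced ".../*_addr.dbf: ... can not be opened".

# Print the first DIR/*_LAYER.dbf that exists; return 1 if none does.
find_layer_dbf() {
  local dir="$1" layer="$2" f
  for f in "$dir"/*_"$layer".dbf; do
    if [ -e "$f" ]; then
      echo "$f"
      return 0
    fi
  done
  return 1
}

# Usage sketch: skip missing layers instead of passing a bare pattern.
#   if dbf=$(find_layer_dbf "$TMP" addr); then
#     shp2sqlite -an "$dbf" tiger_addr
#   else
#     echo "no addr layer in $TMP, skipping" >&2
#   fi
```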

hexatridecimal commented on July 30, 2024

Here you go guys: https://www.dropbox.com/s/7so3ivq2npxcndy/geocoder_us_tigerline_2011.7z

DonFuego commented on July 30, 2024

Has anyone uploaded a prebuilt 2012 SQLite database? This 2011 7z file throws an error when I try to decompress it :/

Shelnutt2 commented on July 30, 2024

Here is the 2014 raw sqlite db.
http://downloads.codefi.re/shelnutt2/geocoder_tiger_2014.db
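
Given the download and decompression trouble above, a cheap sanity check before pointing geocoder at a downloaded file: every SQLite 3 database file begins with the 16-byte header string "SQLite format 3" followed by a NUL byte. A minimal sketch (looks_like_sqlite is a hypothetical helper); it only catches truncated or wrong-type downloads, not internal corruption, for which sqlite3's PRAGMA integrity_check is the usual tool.

```bash
#!/usr/bin/env bash
# Sketch: check that a file starts with the SQLite 3 header.
# Only the first 15 bytes are compared, because bash command
# substitution cannot carry the 16th (NUL) byte.
looks_like_sqlite() {
  local f="$1"
  [ -f "$f" ] && [ "$(head -c 15 "$f")" = "SQLite format 3" ]
}
```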
