Coder Social home page Coder Social logo

myscrapy's Introduction

Scrapy

This project and readme its in github: https://github.com/guilhermetavares/myscrapy In the url https://www.fara.gov/quick-search.html, click on "Active Foreign Principals".

This url "https://efile.fara.gov/pls/apex/f?p=171:130:0::NO:RP,130:P130_DATERANGE:N" is the starts url in Scrapy.

With the startup url set, i inspect the page and she loads a POST in javascript for pagination the results, and this POST is the navigations pages for Scrapy.

The POST url "https://efile.fara.gov/pls/apex/wwv_flow.show" navigates on all data pages avaible.

The item has the structure:

{
        'address': '150 Broomielaw 5 Atlantic Quay',
        'country_name': 'UNITED KINGDOM',
        'date': '05/01/2012',
        'exhibit_url': 'http://www.fara.gov/docs/6334-Exhibit-AB-20160106-1.pdf',
        'foreign_principal': 'Scottish Development International',
        'registration': 'Development Counsellors International',
        'registration_date': '03/12/1993',
        'registration_number': '4777',
        'state': '',
        'url': 'https://efile.fara.gov/pls/apex/f?p=171:200:6624565563159::NO:RP,200:P200_REG_NUMBER,P200_DOC_TYPE,P200_COUNTRY:4777,Exhibit%20AB,UNITED%20KINGDOM'
}

First, clone the project:

git clone [email protected]:guilhermetavares/myscrapy.git

Install the requirements, on enviroment with Python >= 3.4.3:

pip install requirements.txt

To run the tests, cd /faragov/faragov/:

python3 tests.py

The response, running with coverage:

Name                          Stmts   Miss Branch BrPart  Cover
    ---------------------------------------------------------------
    faragov/__init__.py               0      0      0      0   100%
    faragov/middlewares.py           18     18      4      0     0%
    faragov/settings.py               4      4      0      0     0%
    faragov/spiders/__init__.py       0      0      0      0   100%
    faragov/spiders/fara.py         100      0     46      1    99%
    faragov/tests.py                 63      0      6      1    99%
    ---------------------------------------------------------------
    TOTAL                           185     22     56      2    88%

For running the spider:

scrapy crawl fara -o faragov.json

myscrapy's People

Contributors

guilhermetavares avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.