
scrayping-lecture's Introduction

Crawling & Scraping study session (internal)

Preparation

# First-time startup
(scrayping-lecture) kazu0716 MacBook-Pro-4 $ git clone https://github.com/kazu0716/scrayping-lecture.git
(scrayping-lecture) kazu0716 MacBook-Pro-4 $ cd scrayping-lecture/
(scrayping-lecture) kazu0716 MacBook-Pro-4 $ vagrant up
(scrayping-lecture) kazu0716 MacBook-Pro-4 $ vagrant ssh

# Reference: how to use Vagrant: https://qiita.com/skinoshita/items/57ac059ff8b1008f5e1d
  • Starting victim_apps
01021657 CA2842 $ vagrant ssh
ubuntu@ubuntu-xenial:~$ cd scrayping-lecture/victim_apps/
# Create the tables in the DB and apply the various settings automatically
ubuntu@ubuntu-xenial:~/scrayping-lecture/victim_apps$ python3 manage.py createdb
# (output omitted) #
Hit enter to use the default (127.0.0.1:8000):

Creating default site record: 127.0.0.1:8000 ...


Creating default account ...

Username (leave blank to use 'ubuntu'):
Email address: [email protected]
Password:
Password (again):
Superuser created successfully.
Installed 2 object(s) from 1 fixture(s)

Would you like to install some initial demo pages?
Eg: About us, Contact form, Gallery. (yes/no): yes

Creating demo pages: About us, Contact form, Gallery ...

Installed 16 object(s) from 3 fixture(s)
# Run the script that creates the user accounts automatically
ubuntu@ubuntu-xenial:~/scrayping-lecture/victim_apps$ python3 manage.py runscript create_users
# Start the server (runs in the background)
ubuntu@ubuntu-xenial:~/scrayping-lecture/victim_apps$ nohup python3 manage.py runserver &
[1] 5332
ubuntu@ubuntu-xenial:~/scrayping-lecture/victim_apps$ nohup: ignoring input and appending output to 'nohup.out'
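## If startup fails, an error is printed after this line. If nothing appears, the server is running.
  • Running the answer scripts
ubuntu@ubuntu-xenial:~/scrayping-lecture/answer$ cd /home/ubuntu/scrayping-lecture/answer
# How to use the requests lecture script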
ubuntu@ubuntu-xenial:~/scrayping-lecture/answer$ python3 request_crawler.py -h
usage: request_crawler.py [-h] -u URL -f FILE_NAME

requests crawler for internal lecture

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     access url to want to get web-page as htlm file.
  -f FILE_NAME, --filename FILE_NAME
                        filename of output file which got by requetsts access
# How to use the scraper
ubuntu@ubuntu-xenial:~/scrayping-lecture/answer$ python3 scraper.py -h
usage: scraper.py [-h] -f FILE_NAME

scrayping script for internal lecture

optional arguments:
  -h, --help            show this help message and exit
  -f FILE_NAME, --filename FILE_NAME
                        filename of input file to get some tags
# How to use the Selenium script
ubuntu@ubuntu-xenial:~/scrayping-lecture/answer$ python3 selenium_crawler.py

Agenda (session 1)

Goal: be able to retrieve HTML using the requests module

  • Basic lecture with PowerPoint slides
  • Starting victim_apps
  • Writing a crawling script with the requests module
  • Using requests to fetch the HTML of http://127.0.0.1:8000/ and save it as an HTML file under ./html
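
The last step of this session can be sketched roughly as follows. This is a minimal sketch, not the repository's request_crawler.py: the names fetch_html, save_html, and crawl are hypothetical, and only the target URL and the ./html output directory come from the agenda. It assumes the requests package is installed and that victim_apps is listening on http://127.0.0.1:8000/.

```python
from pathlib import Path


def fetch_html(url, timeout=10.0):
    """Return the body of `url` as text, raising on HTTP error statuses."""
    # Imported here so the file-saving helper stays usable without requests.
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into exceptions
    return response.text


def save_html(html, file_name, out_dir="html"):
    """Write `html` to out_dir/file_name and return the resulting path."""
    directory = Path(out_dir)
    directory.mkdir(parents=True, exist_ok=True)  # create ./html on first run
    path = directory / file_name
    path.write_text(html, encoding="utf-8")
    return path


def crawl(url, file_name, out_dir="html", fetch=fetch_html):
    """Fetch `url` and store the HTML; `fetch` can be swapped for offline tests."""
    return save_html(fetch(url), file_name, out_dir)
```

With the server running, calling crawl("http://127.0.0.1:8000/", "index.html") should leave the page in ./html/index.html. The file name index.html is only an example.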

Agenda (session 2)

Goal: be able to extract HTML elements using Beautiful Soup
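
As an illustration of this goal, the sketch below pulls the page title and all links out of an HTML string. It is not the repository's scraper.py: the function name extract_elements and the choice of tags are assumptions made for the example, and it assumes the beautifulsoup4 package is installed.

```python
def extract_elements(html):
    """Return the page title and a list of (text, href) pairs for every link."""
    # Imported here so the module can be loaded even where bs4 is missing.
    from bs4 import BeautifulSoup

    # "html.parser" is the parser bundled with Python, so lxml is not needed.
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.get_text(strip=True) if soup.title else None
    links = [
        (a.get_text(strip=True), a["href"])
        for a in soup.find_all("a", href=True)  # skip anchors without href
    ]
    return title, links
```

The input can be a file saved during session 1, read with Path("html/index.html").read_text(encoding="utf-8"), where the file name is again only an example.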

Agenda (session 3)

Goal: log in and retrieve the HTML served after login
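
One possible way to reach this goal without a browser is sketched below. It is not the repository's selenium_crawler.py. It assumes that victim_apps is a Django application (as manage.py runserver suggests) whose login form carries Django's hidden csrfmiddlewaretoken field. The login path /accounts/login/, the form field names username and password, and all function names are hypothetical and must be checked against the real login page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _CsrfTokenParser(HTMLParser):
    """Remember the value of the hidden csrfmiddlewaretoken input, if any."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("name") == "csrfmiddlewaretoken":
            self.token = attributes.get("value")


def extract_csrf_token(html):
    """Return the CSRF token embedded in `html`, or None when there is none."""
    parser = _CsrfTokenParser()
    parser.feed(html)
    return parser.token


def login_and_fetch(session, base_url, username, password,
                    login_path="/accounts/login/", target_path="/"):
    """Log in through `session` and return the HTML of `target_path`.

    `session` is expected to behave like requests.Session, so cookies set
    by the login response are sent again on the following request.
    """
    login_url = urljoin(base_url, login_path)
    # Load the form first: this sets the CSRF cookie and exposes the token.
    token = extract_csrf_token(session.get(login_url).text)
    form = {"username": username, "password": password}  # assumed field names
    if token is not None:
        form["csrfmiddlewaretoken"] = token
    session.post(login_url, data=form).raise_for_status()
    page = session.get(urljoin(base_url, target_path))
    page.raise_for_status()
    return page.text
```

In practice the session argument would be a requests.Session() and the credentials would be those of an account created during preparation. A plain requests.get per page would not work here, because the session cookie from the login response has to be carried over to the next request.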
