This project uses the Scrapy web crawling framework to crawl Thai websites and extract the headline and content of their pages for academic use.
Currently, the spiders focus on clickbait and non-clickbait websites:

Website | Description |
---|---|
blognone.com | Tech blog |
matichon.co.th | News |
manager.co.th | News |
khaosod.co.th | News |
thairath.co.th | News |
dailynews.co.th | News |
thematter.co | Online magazine |
themomentum.co | Online magazine |
kanchanapisek.or.th | Thai wiki |
bangkokbiznews.com | News |
baabin.com | - |
btsstation.com | - |
kaijeaw.com | - |
liekr.com | - |
meekhao.com | - |
tsood.com | - |
topicza.com | - |

Name | Description |
---|---|
clickbait100k | 100k headlines, clickbait and non-clickbait in a 1:1 ratio |