🏁 API client for Kaggle

Home Page: https://github.com/mkearney/kaggler

License: Other


kaggler's Introduction

kaggler

🏁 An R client for accessing Kaggle’s API

Installation

You can install the dev version of {kaggler} from GitHub with:

## install kaggler package from github
devtools::install_github("mkearney/kaggler")

API authorization

1. Go to https://www.kaggle.com/ and sign in

2. Click Account or navigate to https://www.kaggle.com/{username}/account

3. Scroll down to the API section and click Create New API Token (which should cause you to download a kaggle.json file with your username and API key)

4. There are a few different ways to store your credentials

  • Save/move the kaggle.json file as ~/.kaggle/kaggle.json
  • Save/move the kaggle.json file to your current working directory
  • Enter your username and key and use the kgl_auth() function like in the example below
kgl_auth(username = "mkearney", key = "9as87f6faf9a8sfd76a9fsd89asdf6dsa9f8")
#> Your Kaggle key has been recorded for this session and saved as `KAGGLE_PAT` environment variable for future sessions.
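
The first two storage options can be checked from R before authenticating. Below is a minimal base-R sketch; `find_kaggle_json()` is a hypothetical helper, not part of {kaggler}, and the search order (home directory first, then working directory) is an assumption.

```r
## hypothetical helper (not part of kaggler): look for kaggle.json in the
## two locations listed above and return the first path that exists
find_kaggle_json <- function(home_dir = path.expand("~"), work_dir = getwd()) {
  candidates <- c(
    file.path(home_dir, ".kaggle", "kaggle.json"),
    file.path(work_dir, "kaggle.json")
  )
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    ## neither location holds a credentials file
    return(NA_character_)
  }
  found[1]
}

## usage: NA means you need to move the file or call kgl_auth() by hand
creds_path <- find_kaggle_json()
```

If this returns `NA`, entering the username and key through `kgl_auth()` as shown above remains the fallback.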

kgl_competitions_list_.*()

Browse or search for Kaggle competitions.

## look through all competitions (paginated)
comps1 <- kgl_competitions_list()
comps1
#> # A tibble: 20 x 23
#>   ref     description       id title   url     deadline            category reward organizationName
#> * <chr>   <chr>          <int> <chr>   <chr>   <dttm>              <chr>    <chr>  <chr>           
#> 1 house-~ Predict sales~  5407 House ~ https:~ 2030-01-01 00:00:00 Getting~ Knowl~ Kaggle          
#> 2 digit-~ Learn compute~  3004 Digit ~ https:~ 2030-01-01 00:00:00 Getting~ Knowl~ Kaggle          
#> 3 titanic Start here! P~  3136 Titani~ https:~ 2030-01-01 00:00:00 Getting~ Knowl~ Kaggle          
#> 4 imagen~ Identify and ~  6796 ImageN~ https:~ 2029-12-31 07:00:00 Research Knowl~ ImageNet        
#> 5 imagen~ Identify and ~  6800 ImageN~ https:~ 2029-12-31 07:00:00 Research Knowl~ ImageNet        
#> # ... with 15 more rows, and 14 more variables: organizationRef <chr>, kernelCount <int>,
#> #   teamCount <int>, userHasEntered <lgl>, userRank <lgl>, mergerDeadline <dttm>,
#> #   newEntrantDeadline <dttm>, enabledDate <dttm>, maxDailySubmissions <int>, maxTeamSize <int>,
#> #   evaluationMetric <chr>, awardsPoints <lgl>, isKernelsSubmissionsOnly <lgl>,
#> #   submissionsDisabled <lgl>

## it's paginated, so to see page two:
comps2 <- kgl_competitions_list(page = 2)
comps2
#> # A tibble: 20 x 23
#>   ref     description       id title   url     deadline            category reward organizationName
#> * <chr>   <chr>          <int> <chr>   <chr>   <dttm>              <chr>    <chr>  <chr>           
#> 1 cvpr-2~ Can you segme~  8899 CVPR 2~ https:~ 2018-06-11 23:59:00 Research $2,500 CVPR 2018 WAD   
#> 2 inatur~ Long tailed c~  8243 " iNat~ https:~ 2018-06-04 23:59:00 Research Kudos  <NA>            
#> 3 imater~ Image classif~  8219 iMater~ https:~ 2018-05-30 23:59:00 Research $2,500 <NA>            
#> 4 imater~ Image Classif~  8220 iMater~ https:~ 2018-05-30 23:59:00 Research $2,500 <NA>            
#> 5 landma~ Given an imag~  8396 Google~ https:~ 2018-05-29 23:59:00 Research $2,500 Google          
#> # ... with 15 more rows, and 14 more variables: organizationRef <chr>, kernelCount <int>,
#> #   teamCount <int>, userHasEntered <lgl>, userRank <lgl>, mergerDeadline <dttm>,
#> #   newEntrantDeadline <dttm>, enabledDate <dttm>, maxDailySubmissions <int>, maxTeamSize <lgl>,
#> #   evaluationMetric <chr>, awardsPoints <lgl>, isKernelsSubmissionsOnly <lgl>,
#> #   submissionsDisabled <lgl>

## search by keyword for competitions
imagecomps <- kgl_competitions_list(search = "image")
imagecomps
#> # A tibble: 3 x 23
#>   ref     description       id title   url     deadline            category reward organizationName
#> * <chr>   <chr>          <int> <chr>   <chr>   <dttm>              <chr>    <chr>  <chr>           
#> 1 draper~ "Can you put ~  5229 Draper~ https:~ 2016-06-27 23:59:00 Featured $75,0~ <NA>            
#> 2 carvan~ Automatically~  6927 Carvan~ https:~ 2017-09-27 23:59:00 Featured $25,0~ Carvana         
#> 3 cdisco~ Categorize e-~  7115 "Cdisc~ https:~ 2017-12-14 23:59:00 Featured $35,0~ Cdiscount       
#> # ... with 14 more variables: organizationRef <chr>, kernelCount <int>, teamCount <int>,
#> #   userHasEntered <lgl>, userRank <lgl>, mergerDeadline <dttm>, newEntrantDeadline <dttm>,
#> #   enabledDate <dttm>, maxDailySubmissions <int>, maxTeamSize <int>, evaluationMetric <chr>,
#> #   awardsPoints <lgl>, isKernelsSubmissionsOnly <lgl>, submissionsDisabled <lgl>
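
Because results come back one page at a time, collecting several pages takes a loop. The sketch below is one way to do it; `collect_pages()` is a hypothetical helper, not part of {kaggler}, and it assumes that an empty result marks the last page. It returns a list of pages rather than one combined table, since the output above shows column types differing between pages (`maxTeamSize` is `<int>` on page one and `<lgl>` on page two).

```r
## hypothetical helper (not part of kaggler): call `fetch_page(page = i)`
## for i = 1, 2, ... and stop at the first empty page or at `max_pages`
collect_pages <- function(fetch_page, max_pages = 5) {
  pages <- list()
  for (i in seq_len(max_pages)) {
    res <- fetch_page(page = i)
    ## assumption: an empty result means there are no more pages
    if (is.null(res) || NROW(res) == 0) {
      break
    }
    pages[[length(pages) + 1]] <- res
  }
  pages
}

## usage (needs kaggler and valid credentials, so left commented out):
## comp_pages <- collect_pages(kgl_competitions_list, max_pages = 3)
## vapply(comp_pages, nrow, integer(1))
```

Capping the loop with `max_pages` keeps the number of API requests bounded if the empty-page assumption turns out not to hold.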

kgl_competitions_data_.*()

Look up the data list for a given Kaggle competition. If you’ve already accepted the competition rules, you should be able to download the dataset too (I haven’t gotten far enough to test it).

## data list for a given competition
c1_datalist <- kgl_competitions_data_list(comps1$id[1])
c1_datalist
#> # A tibble: 7 x 6
#>   ref                  description name                 totalBytes url          creationDate       
#> * <chr>                <lgl>       <chr>                     <int> <chr>        <dttm>             
#> 1 data_description.txt NA          data_description.txt      13370 https://www~ 2016-08-25 20:29:24
#> 2 train.csv.gz         NA          train.csv.gz              91387 https://www~ 2016-08-29 20:43:35
#> 3 train.csv            NA          train.csv                460676 https://www~ 2016-08-29 20:43:54
#> 4 test.csv.gz          NA          test.csv.gz               83948 https://www~ 2016-08-29 20:44:10
#> 5 test.csv             NA          test.csv                 451405 https://www~ 2016-08-29 20:44:14
#> # ... with 2 more rows

## download data sets (IF YOU HAVE ACCEPTED COMPETITION RULES)
c1_data <- kgl_competitions_data_download(
  comps1$id[1], c1_datalist$name[1])
#> Warning in kgl_api_get(glue::glue("competitions/data/download/{id}/{fileName}")): Forbidden (HTTP
#> 403).
#> You must accept this competition's rules before you can continue
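
As the output above shows, a refused download surfaces as a warning rather than an error, so a script will carry on afterwards. A minimal base-R sketch for catching that case follows; `try_download()` is a hypothetical wrapper, not part of {kaggler}.

```r
## hypothetical wrapper (not part of kaggler): evaluate a download call and
## return NULL, with a short message, if it raises a warning or an error
try_download <- function(expr) {
  tryCatch(
    expr,
    warning = function(w) {
      message("Download did not succeed: ", conditionMessage(w))
      NULL
    },
    error = function(e) {
      message("Download failed: ", conditionMessage(e))
      NULL
    }
  )
}

## usage (needs kaggler and valid credentials, so left commented out):
## c1_data <- try_download(
##   kgl_competitions_data_download(comps1$id[1], c1_datalist$name[1]))
## if (is.null(c1_data)) message("Accept the competition rules, then retry.")
```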

kgl_datasets_.*()

Get a list of all of the datasets.

## get datasets list
datasets <- kgl_datasets_list()
datasets
#> # A tibble: 20 x 20
#>   ref       creatorName creatorUrl totalBytes url       lastUpdated         downloadCount isPrivate
#> * <chr>     <chr>       <chr>           <int> <chr>     <dttm>                      <int> <lgl>    
#> 1 passnyc/~ Chris Craw~ crawford       167711 https://~ NA                           2789 FALSE    
#> 2 ramamet4~ Ramanathan  ramamet4      5904947 https://~ NA                            955 FALSE    
#> 3 shrutime~ Shruti Meh~ shrutimeh~    5732263 https://~ NA                           5934 FALSE    
#> 4 heesoo37~ Randi H Gr~ heesoo37      5690692 https://~ NA                            655 FALSE    
#> 5 abecklas~ Andre Beck~ abecklas       357590 https://~ NA                          12143 FALSE    
#> # ... with 15 more rows, and 12 more variables: isReviewed <lgl>, isFeatured <lgl>,
#> #   licenseName <chr>, description <chr>, ownerName <chr>, ownerRef <chr>, kernelCount <int>,
#> #   title <chr>, topicCount <int>, viewCount <int>, voteCount <int>, currentVersionNumber <int>

kgl_competitions_leaderboard_.*()

View the leaderboard for a given competition.

## get competition leaderboard
c1_leaderboard <- kgl_competitions_leaderboard_view(comps1$id[1])
c1_leaderboard
#> # A tibble: 50 x 4
#>    teamId teamName           submissionDate      score  
#> *   <int> <chr>              <dttm>              <chr>  
#> 1 1780632 GroundTruth        NA                  0.00000
#> 2  439244 DSXL               NA                  0.06628
#> 3 1752010 chi7moveon         NA                  0.10677
#> 4  365763 Paulo Pinto        NA                  0.10910
#> 5 1363349 Dmitry Storozhenko NA                  0.10915
#> # ... with 45 more rows
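
Note that `score` is returned as a character column (`<chr>`), so it needs converting before any arithmetic or numeric sorting. A small sketch using the five rows printed above; the column names `score_num` and `gap_to_first` are made up for illustration.

```r
## rows copied from the leaderboard output shown above
lb <- data.frame(
  teamName = c("GroundTruth", "DSXL", "chi7moveon", "Paulo Pinto",
               "Dmitry Storozhenko"),
  score = c("0.00000", "0.06628", "0.10677", "0.10910", "0.10915"),
  stringsAsFactors = FALSE
)

## convert the character scores to numeric
lb$score_num <- as.numeric(lb$score)

## difference between each row's score and the first row's score
lb$gap_to_first <- lb$score_num - lb$score_num[1]

## sort numerically rather than alphabetically
lb[order(lb$score_num), c("teamName", "score_num", "gap_to_first")]
```

Whether a lower or higher score is better depends on the competition's evaluation metric, so the sort direction may need flipping.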

Note(s)

  • The author is in no way affiliated with Kaggle.com, and, as such, makes no assurances that there won’t be breaking changes to the API at any time.

  • Although I am not affiliated, it’s good practice to be informed, so here is the link to Kaggle’s terms of service: https://www.kaggle.com/terms

kaggler's People

Contributors

mkearney

kaggler's Issues

: Unauthorized (HTTP 401)

input:
data <- kgl_datasets_download(owner_dataset = "secareanualin/football-events",
fileName = "events.csv")

output:
Warning message: In kgl_api_get(glue::glue("datasets/download/{ownerSlug}/{datasetSlug}/{fileName}"), : Unauthorized (HTTP 401)

I get this message for every type of data pull... I think it's the API... but I reset it like 3 times... any ideas?

Confusion on submissions

There look to be three different functions here for submissions, but I'm not very clear on how they are to be used? So far some basic trial and error hasn't worked. Any suggestions?

Dataset zip download

Hi, I was trying to download Kaggle data from borismarjanovic/price-volume-data-for-all-us-stocks-etfs
using kaggler::kgl_datasets_download(owner_dataset = "borismarjanovic/price-volume-data-for-all-us-stocks-etfs",
fileName = XXX)

Is it possible to download a zip file using this function?

I tried several things, like downloading the files separately:
kaggler::kgl_datasets_download(owner_dataset = "borismarjanovic/price-volume-data-for-all-us-stocks-etfs",
fileName = "acwi.us.txt")
or
kaggler::kgl_datasets_download (owner_dataset = "borismarjanovic/price-volume-data-for-all-us-stocks-etfs",
fileName = "Stocks/acwi.us.txt")
or
kaggler::kgl_datasets_download (owner_dataset = "borismarjanovic/price-volume-data-for-all-us-stocks-etfs",
fileName = "4538_7213_bundle_archive.zip")

but each attempt failed and I got
In kgl_api_get(glue::glue("datasets/download/{ownerSlug}/{datasetSlug}/{fileName}"), :
Not Found (HTTP 404).

download data issue

library(kaggler)

kgl_auth(username = "spirosparaskevas", key = "5c4359b2493e0d4c73fef611273f09e8")

KAGGLE_USERNAME={"spirosparaskevas"}
KAGGLE_KEY={"************************"}

comps <- kgl_competitions_list(search = "kkbox")
comps

c1_datalist <- kgl_competitions_data_list(comps$id[2])
c1_datalist
c1_data <- kgl_competitions_data_download(id = comps$id[2], fileName = c1_datalist$name[1])

final command gives an error: ->

Error in if (is.character(txt) && length(txt) == 1 && nchar(txt, type = "bytes") < : missing value where TRUE/FALSE needed

Any suggestion???

does not produce required data

Hi, I used the following commands, but the required train.csv file does not show up:

comps1 <- kgl_competitions_list(search = 'santander')

c1_datalist <- kgl_competitions_data_list(10385)

url = c1_datalist$url[1]

df <- read.csv(url)

head(df)

Any idea what is going on?

kgl_datasets_download error.

First of all, thank you for this amazing project!

I'm getting some issues when downloading a dataset from Kaggle using the code below:

kgl_auth(creds_file = '~/.kaggle/kaggle.json')
kgl_auth()
data <- kgl_datasets_download(owner_dataset = "secareanualin/football-events", 
                               fileName = "events.csv")

I get the following error:

Error: lexical error: invalid char in json text.
                                       PK-
                     (right here) ------^

Any ideas?
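
A note on the error text: `PK` matches the first two bytes of the ZIP file signature, so one plausible reading is that the response body was a zip archive handed to a JSON parser. This has not been confirmed against the package source. The base-R sketch below checks whether a file on disk starts with those bytes; `looks_like_zip()` is a hypothetical helper, not part of {kaggler}.

```r
## hypothetical helper (not part of kaggler): TRUE if the file begins with
## the two bytes "PK" (0x50 0x4b), which open the ZIP file signature
looks_like_zip <- function(path) {
  sig <- readBin(path, what = "raw", n = 2)
  identical(sig, as.raw(c(0x50, 0x4b)))
}

## usage with a small file written for the demonstration
demo_file <- tempfile()
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04)), demo_file)
looks_like_zip(demo_file)
```

A file that passes this check can usually be unpacked with `utils::unzip()` instead of being read as JSON or CSV.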

can't install package

I used the following:
install.packages("devtools")
library("devtools")
install_github("mkearney/kaggler")

then i saw following:

install_github("mkearney/kaggler")
Downloading GitHub repo mkearney/kaggler@master
These packages have more recent versions available.
Which would you like to update?

1: curl (3.0 -> 3.3 ) [CRAN]
2: glue (1.1.1 -> 1.3.0) [CRAN]
3: httr (1.3.1 -> 1.4.0) [CRAN]
4: jsonlite (1.5 -> 1.6 ) [CRAN]
5: mime (0.5 -> 0.6 ) [CRAN]
6: openssl (0.9.7 -> 1.2.1) [CRAN]
7: pkgconfig (2.0.1 -> 2.0.2) [CRAN]
8: R6 (2.2.2 -> 2.3.0) [CRAN]
9: rlang (0.1.2 -> 0.3.1) [CRAN]
10: tibble (1.3.4 -> 2.0.1) [CRAN]
11: CRAN packages only
12: All
13: None
Enter one or more numbers separated by spaces, or an empty line to cancel

I entered 12:

1: 12

when i use:
library("kaggler")

i see:

library("kaggler")
Error in library("kaggler") : no package called ‘kaggler’
