
conda env support (reticulate issue, closed)

rstudio avatar rstudio commented on July 23, 2024
conda env support

from reticulate.

Comments (24)

earino avatar earino commented on July 23, 2024 1

I should have known this was already something you would consider! My only bit of advice may be that conda itself comes with a way of listing what environments it knows about:

conda info --envs

So it may be valuable to tie into that plumbing. Aside from adding an explicit call, something like reticulate::conda_env("tensorflow") that "configures" the environment, nothing else really comes to mind. I will definitely think on it further. Thanks for the prompt response!
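For anyone curious what tying into that plumbing might look like, here is a minimal Python sketch that turns text in the style of `conda info --envs` into a name-to-path mapping. The helper name `parse_conda_envs` and the sample output are assumptions made up for illustration; they are not part of reticulate, and the real output format may differ between conda versions.

```python
def parse_conda_envs(output):
    """Parse text in the style of `conda info --envs` into {name: path}."""
    envs = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment header.
        if not line or line.startswith("#"):
            continue
        # Split into the environment name and the rest of the line.
        fields = line.split(None, 1)
        if len(fields) != 2:
            continue  # e.g. an unnamed environment listed by path only
        name, rest = fields
        # The active environment is marked with '*'; drop the marker.
        envs[name] = rest.lstrip("* ").strip()
    return envs


# Hypothetical sample output, made up for illustration.
SAMPLE = """\
# conda environments:
#
root                  *  /usr/local/anaconda
pymc_tutorial            /usr/local/anaconda/envs/pymc_tutorial
spacy_environment        /usr/local/anaconda/envs/spacy_environment
"""

envs = parse_conda_envs(SAMPLE)
print(envs["spacy_environment"])
```

In practice the text would come from running the conda binary itself; the parsing is kept separate here so it can be tried without conda installed.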


jjallaire avatar jjallaire commented on July 23, 2024 1

We've added a $call() method for slightly nicer syntax (d16d51b).

So the code would now read:

parser <- spacy$English()
multiSentence <- "...."
parsedData <- parser$call(multiSentence)


terrytangyuan avatar terrytangyuan commented on July 23, 2024 1


jjallaire avatar jjallaire commented on July 23, 2024

It does work with conda. We actually scan for conda environments here:

https://github.com/rstudio/reticulate/blob/master/R/python_config.R#L66-L76

So by default if the first package you import is named "tensorflow" then we'll look in e.g. ~/anaconda/envs/tensorflow automatically. Outside of that you can use either the RETICULATE_PYTHON environment variable to point to the python binary in your conda env or just make sure the env bin directory is first on the path as described here:

https://github.com/rstudio/reticulate#locating-python

Open to any and all suggestions for how we can make binding to conda environments more natural and straightforward.


jjallaire avatar jjallaire commented on July 23, 2024

Okay, thanks. I'll look into that.


jjallaire avatar jjallaire commented on July 23, 2024

@earino I've implemented a family of use functions that simplify configuration of conda envs, virtual envs, etc. Documentation is here: https://github.com/rstudio/reticulate/#locating-python


earino avatar earino commented on July 23, 2024

@jjallaire cool am gonna test, will let you know any results on my family of environments.


earino avatar earino commented on July 23, 2024

@jjallaire I got the following error:

> library(reticulate)
> use_condaenv("pymc_tutorial")
Error in use_condaenv("pymc_tutorial") : 
  object 'conda_locations' not found

I stepped through with the debugger, and even though "conda" is set with the correct path, conda_locations isn't set:

> debugonce(use_condaenv)
> use_condaenv("pymc_tutorial")
debugging in: use_condaenv("pymc_tutorial")
debug: {
    if (identical(conda, "auto")) {
        conda <- Sys.which("conda")
        if (!nzchar(conda)) {
            conda_locations <- c(path.expand("~/anaconda/bin/conda"), 
                path.expand("~/anaconda3/bin/conda"))
            if (is_windows()) {
                anaconda_versions <- read_anaconda_versions_from_registry()
                if (length(anaconda_versions) > 0) {
                  conda_scripts <- file.path(dirname(anaconda_versions), 
                    "Scripts", "conda.exe")
                  conda_locations <- c(conda_locations, conda_scripts)
                }
            }
        }
        conda_locations <- conda_locations[file.exists(conda_locations)]
        if (length(conda_locations) > 0) 
            conda <- conda_locations[[1]]
        else if (required) 
            stop("Unable to locate conda binary, please specify 'conda' argument explicitly.")
        else return(invisible(NULL))
    }
    else if (!file.exists(conda)) {
        stop("Specified conda binary '", conda, "' does not exist.")
    }
    conda_envs <- system2(conda, args = c("info", "--envs"), 
        stdout = TRUE)
    matches <- regexec(paste0("^", condaenv, "[ \\*]+(.*)$"), 
        conda_envs)
    matches <- regmatches(conda_envs, matches)
    for (match in matches) {
        if (length(match) == 2) {
            conda_env_dir <- match[[2]]
            if (!is_windows()) 
                conda_env_dir <- file.path(conda_env_dir, "bin")
            conda_env_python <- file.path(conda_env_dir, "python")
            if (is_windows()) 
                conda_env_python <- paste0(conda_env_python, 
                  ".exe")
            conda_env_python <- normalizePath(conda_env_python)
            use_python(conda_env_python)
            return(invisible(NULL))
        }
    }
    if (required) 
        stop("Unable to locate conda environment '", condaenv, 
            "'.")
    invisible(NULL)
}
Browse[2]> n
debug: if (identical(conda, "auto")) {
    conda <- Sys.which("conda")
    if (!nzchar(conda)) {
        conda_locations <- c(path.expand("~/anaconda/bin/conda"), 
            path.expand("~/anaconda3/bin/conda"))
        if (is_windows()) {
            anaconda_versions <- read_anaconda_versions_from_registry()
            if (length(anaconda_versions) > 0) {
                conda_scripts <- file.path(dirname(anaconda_versions), 
                  "Scripts", "conda.exe")
                conda_locations <- c(conda_locations, conda_scripts)
            }
        }
    }
    conda_locations <- conda_locations[file.exists(conda_locations)]
    if (length(conda_locations) > 0) 
        conda <- conda_locations[[1]]
    else if (required) 
        stop("Unable to locate conda binary, please specify 'conda' argument explicitly.")
    else return(invisible(NULL))
} else if (!file.exists(conda)) {
    stop("Specified conda binary '", conda, "' does not exist.")
}
Browse[2]> n
debug: conda <- Sys.which("conda")
Browse[2]> n
debug: if (!nzchar(conda)) {
    conda_locations <- c(path.expand("~/anaconda/bin/conda"), 
        path.expand("~/anaconda3/bin/conda"))
    if (is_windows()) {
        anaconda_versions <- read_anaconda_versions_from_registry()
        if (length(anaconda_versions) > 0) {
            conda_scripts <- file.path(dirname(anaconda_versions), 
                "Scripts", "conda.exe")
            conda_locations <- c(conda_locations, conda_scripts)
        }
    }
}
Browse[2]> conda
                          conda 
"/usr/local/anaconda/bin/conda" 
Browse[2]> n
debug: conda_locations <- conda_locations[file.exists(conda_locations)]
Browse[2]> n
Error in use_condaenv("pymc_tutorial") : 
  object 'conda_locations' not found
> 

Note that my conda is in /usr/local/anaconda, and your path.expand calls don't look there:

conda_locations <- c(path.expand("~/anaconda/bin/conda"), 
				path.expand("~/anaconda3/bin/conda"))

This seems like where the bug is.
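Another reading of the trace is that conda_locations is only assigned inside the if (!nzchar(conda)) branch, so it never exists when conda is already found on the PATH, which is the case here. Below is a Python sketch of the same lookup written so that the fallback list is only touched on the branch that defines it. The function `find_conda` and its `which`/`exists` parameters are hypothetical names for illustration, not reticulate's actual fix.

```python
import os
import shutil


def find_conda(which=shutil.which, exists=os.path.exists):
    """Return a path to a conda binary, or None if none is found."""
    # First choice: whatever 'conda' resolves to on the PATH.
    conda = which("conda")
    if conda:
        return conda
    # Fallback candidates are defined and used only on this path,
    # so nothing below can reference an unassigned variable.
    candidates = [
        os.path.expanduser("~/anaconda/bin/conda"),
        os.path.expanduser("~/anaconda3/bin/conda"),
    ]
    for candidate in candidates:
        if exists(candidate):
            return candidate
    return None


# Simulate the situation in the trace: conda is found on the PATH.
print(find_conda(which=lambda name: "/usr/local/anaconda/bin/conda"))
```

The `which` and `exists` arguments are injectable only so the behavior can be checked without a real conda install.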


jjallaire avatar jjallaire commented on July 23, 2024

Found the issue and just pushed a fix. Try again and let me know if it is now working. Thanks!


earino avatar earino commented on July 23, 2024

I have good news and I have bad news. :)

The good news is that use_condaenv absolutely positively works. It found all the conda environments I tested. The bad news is that when I tried to use a Python package for a nontrivial thing, I couldn't get it to work:

> library(reticulate)
> 
> use_condaenv("spacy_environment")
> 
> spacy <- import("spacy.en")
> parser <- spacy$English()
> 
> multiSentence = "There is an art, it says, or rather, a knack to flying.
+   The knack lies in learning how to throw yourself at the ground and miss.
+   In the beginning the Universe was created. This has made a lot of people
+   very angry and been widely regarded as a bad move."
> 
> parsedData <- parser$parser(multiSentence)
Error in py_call(attrib, args, keywords) : 
  TypeError: Argument 'tokens' has incorrect type (expected spacy.tokens.doc.Doc, got str)

(Note that to get this to work, you need to install spacy and then execute python -m spacy.en.download all to download the corpus.) Would you like me to file a separate issue so we can keep these apart?

This is the environment.yml I used to define this conda environment:

name: spacy_environment

dependencies:
- spacy

On a wholly unrelated note, if you're taking requests for C tools to wrap in R, may I suggest Dan Bloomberg's incredible leptonica library (http://www.leptonica.com/)? It is by far the best library I have ever used for morphological image processing.


terrytangyuan avatar terrytangyuan commented on July 23, 2024

It seems the argument to parser$parser needs to be a spacy.tokens.doc.Doc object instead of a string like multiSentence.


jjallaire avatar jjallaire commented on July 23, 2024

Yes, I noticed that as well (multiSentence is an R character vector, so it will be passed as a string).

Let's close this issue and then file a new one if the problem isn't just the type of multiSentence but something deeper.


earino avatar earino commented on July 23, 2024

@terrytangyuan thanks for your observation. I assure you I try to read error messages :) I don't believe this is a simple case, as this code was taken from this example: https://nicschrading.com/project/Intro-to-NLP-with-spaCy/

I am attaching a session of me running that code, including enabling the conda environment and downloading the corpus. You will notice that multiSentence is a simple Python string, and that Python "does the smart thing" to turn it into a spacy.tokens.doc.Doc. That magic behavior works in the Python code but doesn't seem to work in @jjallaire's reticulate.

ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ source activate spacy_environment
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ python -m spacy.en.download all
Downloading...
Downloaded 532.28MB 100.00% 15.72MB/s eta 0s
archive.gz checksum/md5 OK
Model successfully installed.
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ cat test.py
# Set up spaCy
from spacy.en import English
parser = English()

# Test Data
multiSentence = "There is an art, it says, or rather, a knack to flying. " \
                 "The knack lies in learning how to throw yourself at the ground and miss. " \
                 "In the beginning the Universe was created. This has made a lot of people " \
                 "very angry and been widely regarded as a bad move."

# all you have to do to parse text is this:
#note: the first time you run spaCy in a file it takes a little while to load up its modules
parsedData = parser(multiSentence)

# Let's look at the tokens
# All you have to do is iterate through the parsedData
# Each token is an object with lots of different properties
# A property with an underscore at the end returns the string representation
# while a property without the underscore returns an index (int) into spaCy's vocabulary
# The probability estimate is based on counts from a 3 billion word
# corpus, smoothed using the Simple Good-Turing method.
for i, token in enumerate(parsedData):
    print("original:", token.orth, token.orth_)
    print("lowercased:", token.lower, token.lower_)
    print("lemma:", token.lemma, token.lemma_)
    print("shape:", token.shape, token.shape_)
    print("prefix:", token.prefix, token.prefix_)
    print("suffix:", token.suffix, token.suffix_)
    print("log probability:", token.prob)
    print("Brown cluster id:", token.cluster)
    print("----------------------------------------")
    if i > 1:
        break
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ python test.py
original: 640 There
lowercased: 530 there
lemma: 530 there
shape: 489815 Xxxxx
prefix: 2907 T
suffix: 48458 ere
log probability: -7.347356796264648
Brown cluster id: 1918
----------------------------------------
original: 474 is
lowercased: 474 is
lemma: 488 be
shape: 21581 xx
prefix: 570 i
suffix: 474 is
log probability: -4.457748889923096
Brown cluster id: 762
----------------------------------------
original: 523 an
lowercased: 523 an
lemma: 523 an
shape: 21581 xx
prefix: 469 a
suffix: 523 an
log probability: -6.014852046966553
Brown cluster id: 3
----------------------------------------
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$

@jjallaire if you want, I will open up another issue.


earino avatar earino commented on July 23, 2024

@terrytangyuan I did some further looking (thanks for the encouragement!) I can see that spaCy calls parseC with a Doc object here:

https://github.com/explosion/spaCy/blob/c9fdd9917c0273c71d00d41b7b1ccfc2006da74a/spacy/syntax/parser.pyx#L142

and when I look I see that this is the Doc class, and how to instantiate it:

https://github.com/explosion/spaCy/blob/master/spacy/tokens/doc.pyx#L60

Code: Construction 1
        doc = nlp.tokenizer(u'Some text')

Code: Construction 2
        doc = Doc(nlp.vocab, orths_and_spaces=[(u'Some', True), (u'text', True)])

This implied to me that I should be able to explicitly construct a Doc object (though sadly it's not needed in the native python implementation.) I was able to get past the error message with the following change:

doc <- parser$tokenizer(multiSentence)
parsedData <- parser$parser(doc)

This no longer blows up with an error message. Unfortunately it just returns a NULL :) I will keep hunting this down. Cheers and thanks for this module!


jjallaire avatar jjallaire commented on July 23, 2024

In the example code it looks like they are using Unicode string literals. In Python 2 we marshal R strings as 8-bit strings, whereas in Python 3 we marshal them as Unicode strings.

I wonder if the API explicitly requires Unicode strings and if in the case that you are running Python 2 that would be the problem. We could overcome this by providing an explicit unicode function which would force Unicode strings in Python 2. Let me know if this theory holds any water.
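To make the theory concrete, here is a small Python 3 sketch of an API that insists on text rather than an 8-bit string. The function `require_text` is a hypothetical stand-in, not spaCy's code; it only illustrates the kind of type check that would reject a byte string while accepting the same content once it is decoded.

```python
def require_text(value):
    """Accept only text (a Python 3 str); reject anything else."""
    if not isinstance(value, str):
        raise TypeError("expected str, got %s" % type(value).__name__)
    return value


raw = b"Some text"           # 8-bit string, comparable to a Python 2 str
text = raw.decode("utf-8")   # explicit conversion to text

print(require_text(text))
try:
    require_text(raw)
except TypeError as err:
    print("rejected:", err)
```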


jjallaire avatar jjallaire commented on July 23, 2024

I managed to reproduce the case under Python 2.7 and found two things:

  1. I was correct about the Unicode text. As a result I added a py_unicode function, which forces a character vector to be converted to a Unicode object rather than a String object. (This shouldn't be a problem under Python 3, since all strings are Unicode objects there.)

  2. The parser is a "callable" Python object, which means that its fields can be accessed AND it can be called like a function. We don't have such a beast in R, so it's just treated as an object rather than a function. There is a workaround for this but it's ugly (see below).

Here's the code that behaves as expected working around both issues:

multiSentence = py_unicode("There is an art, it says, or rather, a knack to flying.
 The knack lies in learning how to throw yourself at the ground and miss.
 In the beginning the Universe was created. This has made a lot of people
 very angry and been widely regarded as a bad move.")

parsedData <- parser$`__call__`(multiSentence)

I'll have to think about what the best way to handle callable objects in R is.
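For readers less familiar with the Python side, this is roughly what a callable object looks like. The `Parser` class below is a toy written for illustration, not spaCy's parser; it only shows that one object can expose ordinary attributes and also be invoked, and that calling it is the same as calling its `__call__` method.

```python
class Parser:
    """Toy callable object: it has fields and can also be called."""

    def __init__(self, name):
        self.name = name  # an ordinary attribute

    def __call__(self, text):
        # Invoked by both parser(text) and parser.__call__(text).
        return text.split()


parser = Parser("toy")
print(parser.name)
print(callable(parser))
print(parser("a knack to flying") == parser.__call__("a knack to flying"))
```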


jjallaire avatar jjallaire commented on July 23, 2024

Okay, I added an as.function generic for Python objects, so you can now ask for the callable interface explicitly. This code should now work under Python 3 (under Python 2 you'd need the py_unicode call as well):

library(reticulate)

use_condaenv("spacy_environment")
 
spacy <- import("spacy.en")
parser <- as.function(spacy$English())
 
multiSentence = "There is an art, it says, or rather, a knack to flying.
   The knack lies in learning how to throw yourself at the ground and miss.
   In the beginning the Universe was created. This has made a lot of people
   very angry and been widely regarded as a bad move."
 
parsedData <- parser(multiSentence)


earino avatar earino commented on July 23, 2024

@jjallaire seems reasonable. From an API standpoint, my mind was moving towards dplyr's do verb.

Something like:

library(reticulate)

use_condaenv("spacy_environment")
 
spacy <- import("spacy.en")
parser <- spacy$English()
 
multiSentence = "There is an art, it says, or rather, a knack to flying.
   The knack lies in learning how to throw yourself at the ground and miss.
   In the beginning the Universe was created. This has made a lot of people
   very angry and been widely regarded as a bad move."
 
parsedData <- do(parser, multiSentence)

With the idea that maybe someday you could have these callable objects in dplyr pipelines? I definitely hadn't thought it out all the way through. Regardless, implemented code beats ideas any day.


jjallaire avatar jjallaire commented on July 23, 2024

The as.function isn't quite as elegant but is very explicit. I could see it either way. One nice thing about creating a proper R function is that you can then pass that to other R code that doesn't know anything about Python / callables.


terrytangyuan avatar terrytangyuan commented on July 23, 2024

I had an issue when I was trying to build a wrapper around custom_fn in TF.Learn. The function being returned by the R function turns out to be an environment instead of a function. Would that be related to this?


jjallaire avatar jjallaire commented on July 23, 2024

It could be that it's returning a callable object rather than a function. Try as.function to see if you get an R function back.

BTW we now represent Python objects as environments rather than raw externalptrs (this is so that we can delay load modules until after the use functions have run to indicate which version of Python to load).
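As a loose analogy for the delay-load idea, here is a Python sketch of a proxy that holds only a module name and performs the import on first use. `LazyModule` is a hypothetical class for illustration; reticulate's own implementation is in R and is not shown in this thread.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only called for attributes not found normally, so _name and
        # _module (set in __init__) never reach this method.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


json_mod = LazyModule("json")       # the proxy has not loaded anything yet
print(json_mod._module is None)
print(json_mod.dumps({"a": 1}))     # first use triggers the import
```

Because the import happens on first access, any configuration step that decides which interpreter or environment to use can run between creating the proxy and using it.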


terrytangyuan avatar terrytangyuan commented on July 23, 2024

Will try it out. I might need to set up Travis soon to test tensorflow and reticulate changes regularly.


jjallaire avatar jjallaire commented on July 23, 2024

An update: as of the latest development version of reticulate, you can now call Python callable objects with the standard R function call syntax. You don't need the special $call syntax; in fact, it has been removed since it's no longer necessary. So the code now reads:

parser <- spacy$English()
multiSentence <- "...."
parsedData <- parser(multiSentence)

While in the above example the parser is a proper R function, it also still carries its object interface, so you can access all of its properties and methods via $ just as you could before.


terrytangyuan avatar terrytangyuan commented on July 23, 2024

👍
