Comments (24)
I should have known this was already something you would consider! My only bit of advice may be that conda itself comes with a way of listing what environments it knows about:
conda info --envs
So it may be valuable to tie into that plumbing. Aside from adding an explicit call, something like reticulate::conda_env("tensorflow") which "configures" the environment, nothing really comes to mind. I will definitely think on it further. Thanks for the prompt response!
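As a quick sketch of what tying into that plumbing could look like, here is one way to parse the output of conda info --envs from R. The helper name and the sample output format are assumptions for illustration, not reticulate API:

```r
# Hypothetical helper: turn `conda info --envs` output into a data frame.
# Comment lines start with "#"; each remaining line is "<name> [*] <path>".
parse_conda_envs <- function(lines) {
  lines <- lines[!grepl("^#", lines) & nzchar(trimws(lines))]
  m <- regmatches(lines, regexec("^(\\S+)\\s+\\*?\\s*(\\S.*)$", lines))
  data.frame(
    name = vapply(m, `[[`, character(1), 2),
    path = vapply(m, `[[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

# In practice the lines would come from:
#   system2("conda", c("info", "--envs"), stdout = TRUE)
sample_output <- c(
  "# conda environments:",
  "#",
  "tensorflow               /home/user/anaconda/envs/tensorflow",
  "root                  *  /home/user/anaconda"
)
parse_conda_envs(sample_output)
```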
from reticulate.
We've added a $call() method for slightly nicer syntax (d16d51b). So the code would now read:
parser <- spacy$English()
multiSentence <- "...."
parsedData <- parser$call(multiSentence)
It does work with conda. We actually scan for conda environments here:
https://github.com/rstudio/reticulate/blob/master/R/python_config.R#L66-L76
So by default if the first package you import is named "tensorflow" then we'll look in e.g. ~/anaconda/envs/tensorflow automatically. Outside of that you can use either the RETICULATE_PYTHON environment variable to point to the python binary in your conda env or just make sure the env bin directory is first on the path as described here:
https://github.com/rstudio/reticulate#locating-python
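For example, the environment-variable route might look like the following sketch. The env path below is an assumed example, not a path the package guarantees:

```r
# Point reticulate at the python binary inside a conda env via the
# RETICULATE_PYTHON environment variable. This must happen before
# reticulate initializes Python. The path below is an assumed example.
Sys.setenv(RETICULATE_PYTHON = "~/anaconda/envs/tensorflow/bin/python")

# Then, in the same session:
#   library(reticulate)
#   py_config()  # should report the env's python
```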
Open to any and all suggestions for how we can make binding to conda environments more natural and straightforward.
Okay, thanks. I'll look into that.
@earino I've implemented a family of use functions that simplify configuration of conda envs, virtual envs, etc. Documentation is here: https://github.com/rstudio/reticulate/#locating-python
@jjallaire cool am gonna test, will let you know any results on my family of environments.
@jjallaire I got the following error:
> library(reticulate)
> use_condaenv("pymc_tutorial")
Error in use_condaenv("pymc_tutorial") :
object 'conda_locations' not found
I stepped through with the debugger, and even though "conda" is set with the correct path, conda_locations isn't set:
> debugonce(use_condaenv)
> use_condaenv("pymc_tutorial")
debugging in: use_condaenv("pymc_tutorial")
debug: {
if (identical(conda, "auto")) {
conda <- Sys.which("conda")
if (!nzchar(conda)) {
conda_locations <- c(path.expand("~/anaconda/bin/conda"),
path.expand("~/anaconda3/bin/conda"))
if (is_windows()) {
anaconda_versions <- read_anaconda_versions_from_registry()
if (length(anaconda_versions) > 0) {
conda_scripts <- file.path(dirname(anaconda_versions),
"Scripts", "conda.exe")
conda_locations <- c(conda_locations, conda_scripts)
}
}
}
conda_locations <- conda_locations[file.exists(conda_locations)]
if (length(conda_locations) > 0)
conda <- conda_locations[[1]]
else if (required)
stop("Unable to locate conda binary, please specify 'conda' argument explicitly.")
else return(invisible(NULL))
}
else if (!file.exists(conda)) {
stop("Specified conda binary '", conda, "' does not exist.")
}
conda_envs <- system2(conda, args = c("info", "--envs"),
stdout = TRUE)
matches <- regexec(paste0("^", condaenv, "[ \\*]+(.*)$"),
conda_envs)
matches <- regmatches(conda_envs, matches)
for (match in matches) {
if (length(match) == 2) {
conda_env_dir <- match[[2]]
if (!is_windows())
conda_env_dir <- file.path(conda_env_dir, "bin")
conda_env_python <- file.path(conda_env_dir, "python")
if (is_windows())
conda_env_python <- paste0(conda_env_python,
".exe")
conda_env_python <- normalizePath(conda_env_python)
use_python(conda_env_python)
return(invisible(NULL))
}
}
if (required)
stop("Unable to locate conda environment '", condaenv,
"'.")
invisible(NULL)
}
Browse[2]> n
debug: if (identical(conda, "auto")) {
conda <- Sys.which("conda")
if (!nzchar(conda)) {
conda_locations <- c(path.expand("~/anaconda/bin/conda"),
path.expand("~/anaconda3/bin/conda"))
if (is_windows()) {
anaconda_versions <- read_anaconda_versions_from_registry()
if (length(anaconda_versions) > 0) {
conda_scripts <- file.path(dirname(anaconda_versions),
"Scripts", "conda.exe")
conda_locations <- c(conda_locations, conda_scripts)
}
}
}
conda_locations <- conda_locations[file.exists(conda_locations)]
if (length(conda_locations) > 0)
conda <- conda_locations[[1]]
else if (required)
stop("Unable to locate conda binary, please specify 'conda' argument explicitly.")
else return(invisible(NULL))
} else if (!file.exists(conda)) {
stop("Specified conda binary '", conda, "' does not exist.")
}
Browse[2]> n
debug: conda <- Sys.which("conda")
Browse[2]> n
debug: if (!nzchar(conda)) {
conda_locations <- c(path.expand("~/anaconda/bin/conda"),
path.expand("~/anaconda3/bin/conda"))
if (is_windows()) {
anaconda_versions <- read_anaconda_versions_from_registry()
if (length(anaconda_versions) > 0) {
conda_scripts <- file.path(dirname(anaconda_versions),
"Scripts", "conda.exe")
conda_locations <- c(conda_locations, conda_scripts)
}
}
}
Browse[2]> conda
conda
"/usr/local/anaconda/bin/conda"
Browse[2]> n
debug: conda_locations <- conda_locations[file.exists(conda_locations)]
Browse[2]> n
Error in use_condaenv("pymc_tutorial") :
object 'conda_locations' not found
>
Note that my conda is in /usr/local/anaconda, and your path.expand calls are not looking there:
conda_locations <- c(path.expand("~/anaconda/bin/conda"),
path.expand("~/anaconda3/bin/conda"))
This seems like where the bug is.
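Reading the trace, another possibility is scoping rather than the search paths themselves: Sys.which("conda") succeeds here, so the branch that defines conda_locations is skipped, yet the filtering line still references it. A minimal sketch of guarding the filter (the actual fix pushed may differ):

```r
# Minimal sketch: only build and filter candidate locations when
# Sys.which() did not already find a conda binary on the PATH.
conda <- Sys.which("conda")
if (!nzchar(conda)) {
  conda_locations <- path.expand(c("~/anaconda/bin/conda",
                                   "~/anaconda3/bin/conda"))
  conda_locations <- conda_locations[file.exists(conda_locations)]
  if (length(conda_locations) > 0)
    conda <- conda_locations[[1]]
}
```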
Found the issue and just pushed a fix. Try again and let me know if it is now working. Thanks!
I have good news and I have bad news. :)
The good news is that use_condaenv absolutely positively works. It found all the conda environments I tested. The bad news is that when I tried to use a Python package for a nontrivial thing, I couldn't get it to work:
> library(reticulate)
>
> use_condaenv("spacy_environment")
>
> spacy <- import("spacy.en")
> parser <- spacy$English()
>
> multiSentence = "There is an art, it says, or rather, a knack to flying.
+ The knack lies in learning how to throw yourself at the ground and miss.
+ In the beginning the Universe was created. This has made a lot of people
+ very angry and been widely regarded as a bad move."
>
> parsedData <- parser$parser(multiSentence)
Error in py_call(attrib, args, keywords) :
TypeError: Argument 'tokens' has incorrect type (expected spacy.tokens.doc.Doc, got str)
(Note that to get this to work, you need to install spacy and then execute python -m spacy.en.download all to download the corpus.) Would you like for me to file a separate issue so we can keep these apart?
This is the environment.yml i used to define this conda environment:
name: spacy_environment
dependencies:
- spacy
On a wholly unrelated note, if you're taking requests for C tools to wrap in R, may I suggest Dan Bloomberg's incredible leptonica library? (http://www.leptonica.com/) by far the best image processing library I ever used for morphological image processing.
Seems like your argument to parser$parser needs to be a spacy.tokens.doc.Doc object instead of a string like multiSentence.
Yes, I noticed that as well (multiSentence is an R character vector so it will be passed as a string).
Let's close this issue and then file a new one if the problem isn't just the type of multiSentence but something deeper.
@terrytangyuan thanks for your observation. I assure you I try to read error messages :) I don't believe this was a simple case, as this code was taken from this example: https://nicschrading.com/project/Intro-to-NLP-with-spaCy/
I am attaching a session of me running that code, including enabling the conda environment and downloading the corpus. You will notice that multiSentence is a simple Python string that Python "does the smart thing to" in order to make it a spacy.tokens.doc.Doc. That magic behavior works in the Python code but doesn't seem to work in @jjallaire's reticulate:
ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ source activate spacy_environment
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ python -m spacy.en.download all
Downloading...
Downloaded 532.28MB 100.00% 15.72MB/s eta 0s
archive.gz checksum/md5 OK
Model successfully installed.
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ cat test.py
# Set up spaCy
from spacy.en import English
parser = English()
# Test Data
multiSentence = "There is an art, it says, or rather, a knack to flying." \
"The knack lies in learning how to throw yourself at the ground and miss." \
"In the beginning the Universe was created. This has made a lot of people "\
"very angry and been widely regarded as a bad move."
# all you have to do to parse text is this:
#note: the first time you run spaCy in a file it takes a little while to load up its modules
parsedData = parser(multiSentence)
# Let's look at the tokens
# All you have to do is iterate through the parsedData
# Each token is an object with lots of different properties
# A property with an underscore at the end returns the string representation
# while a property without the underscore returns an index (int) into spaCy's vocabulary
# The probability estimate is based on counts from a 3 billion word
# corpus, smoothed using the Simple Good-Turing method.
for i, token in enumerate(parsedData):
    print("original:", token.orth, token.orth_)
    print("lowercased:", token.lower, token.lower_)
    print("lemma:", token.lemma, token.lemma_)
    print("shape:", token.shape, token.shape_)
    print("prefix:", token.prefix, token.prefix_)
    print("suffix:", token.suffix, token.suffix_)
    print("log probability:", token.prob)
    print("Brown cluster id:", token.cluster)
    print("----------------------------------------")
    if i > 1:
        break
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$ python test.py
original: 640 There
lowercased: 530 there
lemma: 530 there
shape: 489815 Xxxxx
prefix: 2907 T
suffix: 48458 ere
log probability: -7.347356796264648
Brown cluster id: 1918
----------------------------------------
original: 474 is
lowercased: 474 is
lemma: 488 be
shape: 21581 xx
prefix: 570 i
suffix: 474 is
log probability: -4.457748889923096
Brown cluster id: 762
----------------------------------------
original: 523 an
lowercased: 523 an
lemma: 523 an
shape: 21581 xx
prefix: 469 a
suffix: 523 an
log probability: -6.014852046966553
Brown cluster id: 3
----------------------------------------
(spacy_environment) ubuntu@domino-run-589bc3e005f419a2f32b498f:/mnt$
@jjallaire if you want, i will open up another issue.
@terrytangyuan I did some further looking (thanks for the encouragement!). I can see that spaCy calls parseC with a Doc object here:
and when I look I see that this is the Doc class, and how to instantiate it:
https://github.com/explosion/spaCy/blob/master/spacy/tokens/doc.pyx#L60
Construction 1:
doc = nlp.tokenizer(u'Some text')
Construction 2:
doc = Doc(nlp.vocab, orths_and_spaces=[(u'Some', True), (u'text', True)])
This implied to me that I should be able to explicitly construct a Doc object (though sadly it's not needed in the native python implementation.) I was able to get past the error message with the following change:
doc <- parser$tokenizer(multiSentence)
parseData <- parser$parser(doc)
This no longer blows up with an error message. Unfortunately it just returns a NULL :) I will keep hunting this down. Cheers and thanks for this module!
In the example code it looks like they are using Unicode string literals. In Python 2 we marshal R strings as 8-bit strings, whereas in Python 3 we marshal them as Unicode strings.
I wonder if the API explicitly requires Unicode strings and, in the case that you are running Python 2, whether that would be the problem. We could overcome this by providing an explicit unicode function which would force Unicode strings in Python 2. Let me know if this theory holds any water.
I managed to reproduce the case under Python 2.7 and found two things:
1. I was correct about the Unicode text. As a result I added a py_unicode function which can force a character vector to be converted to a Unicode object rather than a String object (note that this shouldn't be a problem under Python 3, since all strings are Unicode objects there).
2. The parser is a "callable" Python object, which means that its fields can be accessed AND it can be called like a function. We don't have such a beast in R, so it's just treated as an object rather than a function. There is a workaround for this but it's ugly (see below).
Here's the code that behaves as expected working around both issues:
multiSentence = py_unicode("There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
In the beginning the Universe was created. This has made a lot of people
very angry and been widely regarded as a bad move.")
parsedData <- parser$`__call__`(multiSentence)
I'll have to think about what the best way to handle callable objects in R is.
Okay, I added an as.function generic for Python objects, so you can now ask for the callable interface explicitly. This code should now work under Python 3 (under Python 2 you'd need the py_unicode call as well):
library(reticulate)
use_condaenv("spacy_environment")
spacy <- import("spacy.en")
parser <- as.function(spacy$English())
multiSentence = "There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
In the beginning the Universe was created. This has made a lot of people
very angry and been widely regarded as a bad move."
parsedData <- parser(multiSentence)
@jjallaire seems reasonable. From an API standpoint, my mind was moving towards dplyr's do verb. Something like:
library(reticulate)
use_condaenv("spacy_environment")
spacy <- import("spacy.en")
parser <- spacy$English()
multiSentence = "There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
In the beginning the Universe was created. This has made a lot of people
very angry and been widely regarded as a bad move."
parsedData <- do(parser, multiSentence)
With the idea that maybe someday you could have these callable objects in dplyr pipelines? I definitely hadn't thought it out all the way through. Regardless, implemented code beats ideas any day.
The as.function approach isn't quite as elegant but is very explicit. I could see it either way. One nice thing about creating a proper R function is that you can then pass it to other R code that doesn't know anything about Python / callables.
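For instance, the wrapped function composes with ordinary higher-order R code. In this sketch, parser is a stand-in closure rather than the real spaCy object, just to illustrate the design point:

```r
# Stand-in for parser <- as.function(spacy$English()); any code that
# accepts an R function can use it without knowing about Python.
parser <- function(x) nchar(x)

sentences <- c("There is an art.", "The knack lies in learning.")
# Apply the wrapped "parser" over a vector of inputs.
lapply(sentences, parser)
```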
I had an issue when I was trying to build a wrapper around custom_fn in TF.Learn. The function being returned by the R function turns out to be an environment instead of a function. Would that be related to this?
It could be that it's returning a callable object rather than a function. Try as.function to see if you get an R function back.
BTW, we now represent Python objects as environments rather than raw externalptrs (this is so that we can delay-load modules until after the use functions have run to indicate which version of Python to load).
Will try it out. I might need to setup Travis soon to test tensorflow and reticulate changes regularly.
An update: as of the latest development version of reticulate you can now call Python callable objects with the standard R function-call syntax (you don't need the special $call syntax; in fact it has now been removed since it's no longer necessary). So the code now reads:
parser <- spacy$English()
multiSentence <- "...."
parsedData <- parser(multiSentence)
While in the above example parser is a proper R function, it also still carries its object interface, so you can access all of its properties and methods via $ just as you could before.
👍