Comments (16)
@jjallaire pandas.DataFrame.as_matrix is the trick
from reticulate.
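A note for readers on current pandas: DataFrame.as_matrix was deprecated in pandas 0.23 and removed in 1.0, and DataFrame.to_numpy() is its replacement. The following is a minimal sketch of the same trick, assuming an all-numeric frame; the column names are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame; the column names are illustrative only.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]})

# to_numpy() replaces the removed as_matrix(); it returns a 2-D ndarray
# with one row per record and one column per DataFrame column.
mat = df.to_numpy()

print(type(mat).__name__, mat.shape, mat.dtype)
```

With mixed column types the result would fall back to an object array, which is why the sketch sticks to numeric columns.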
Feather / Apache Arrow, being a data frame serialization framework that supports both R and Python, could be useful here. https://github.com/wesm/feather
I took a stab. Any suggestions are welcome. I have not extensively tested it yet. https://github.com/saurfang/reticulate.df
Correct, these are all very fast memory to memory copies of contiguous vectors.
If Pandas data frames can be decomposed into NumPy arrays then it should be possible to do this without too much more invention.
Well ... when I once needed to get NumPy into R, the only way to do it was to wrap an external (C) library as part of RcppCNPy. Maybe there are better ways now; I'd be eager to learn about them either way.
As has been pointed out, right now this is possible by decomposing the R and/or Pandas data frame into vectors / matrices. I think this is all we will do for the foreseeable future, as fully handling data frames will involve dealing with character vectors, dates, list columns, etc. and end up being too large of a project.
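The decomposition described above can be sketched on the Python side roughly as follows. This is only an illustration under stated assumptions, not reticulate's implementation: the helper names decompose and recompose are hypothetical, and the frame holds only numeric columns, which sidesteps the character, date, and list-column cases mentioned in the comment.

```python
import numpy as np
import pandas as pd


def decompose(df):
    """Split a DataFrame into a dict of 1-D NumPy arrays, one per column."""
    return {name: df[name].to_numpy() for name in df.columns}


def recompose(columns):
    """Rebuild a DataFrame from a dict of equal-length 1-D arrays."""
    return pd.DataFrame(columns)


# Hypothetical numeric-only frame used for the round trip.
df = pd.DataFrame({"id": np.arange(3), "value": np.array([0.5, 1.5, 2.5])})

cols = decompose(df)         # plain arrays that can cross the R boundary
roundtrip = recompose(cols)  # frame rebuilt from those arrays

print(roundtrip.equals(df))
```

Each array in cols is an ordinary NumPy vector, which is the kind of object the thread says can already be passed between R and Python.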
Makes sense. The handling for those would be very hard to maintain.
Yes, feather definitely has all of the bits required to do this sorted out. Our plan is to add data frame support to reticulate using the same techniques as feather (sharing code if possible), but not to require a full serialize/deserialize to disk to do the conversion.
That's very exciting. Specifically, are you talking about converting a pandas DataFrame to Apache Arrow format (or something similar) in a memory buffer, and reading that into R via Rcpp (to avoid disk serialization/compression and memory copy)? Or would this be a more ambitious implementation of a data.frame backend that lives in external memory entirely?
Is there a rough timeline for when you might work on and release this?
Did this end up working? I'm interested if the R-Python communication can be done with large dataframes without expensive ser/de
Yes, this is now available: https://rstudio.github.io/reticulate/articles/calling_python.html#data-frames
Neat! Am I to understand that the discussion around arrays (no copies needed) also applies to dataframes?
No, Pandas data frames created from NumPy arrays automatically make copies of the arrays. So there is "no copy" going from R vector to NumPy array, but there is ultimately a copy made by Pandas.
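One way to check this on a given pandas version is to compare buffers directly. The sketch below assumes the frame is built from a dict of arrays, which recent pandas documents as copying by default; other construction paths may behave differently, so the printed result on your own installation is the thing to trust.

```python
import numpy as np
import pandas as pd

x = np.array([0.0, 1.0, 2.0])

# Build a frame from a dict of arrays, the kind of input discussed above.
df = pd.DataFrame({"x": x})

# True would mean the column still points at x's buffer (no copy was made).
shares = np.shares_memory(x, df["x"].to_numpy())
print("column shares memory with source array:", shares)

# Mutating the source afterwards shows whether the frame is independent of it.
x[0] = 99.0
print(df["x"].tolist())
```

If the frame holds its own copy, the change to x does not show up in the printed column.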
But that Pandas copy is memory-to-memory, so at least the disk is never involved?
Hello,
In my case I was reading the data from an S3 bucket and had the same issue. What helped was adding the following snippet to the Python function:
for c in df.columns:
    df[c] = np.array(df[c].values)
So the Python function would look like this:
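import awswrangler as wr  # assumed: "wr" is the usual alias for awswrangler
import numpy as np

def get_data_from_db(db_name, query):
    # Run the Athena query and load the result into a pandas DataFrame.
    df = wr.athena.read_sql_query(
        sql=query,
        database=db_name,
        ctas_approach=False
    )
    # Rebuild each column as a plain NumPy array before returning the frame.
    for c in df.columns:
        df[c] = np.array(df[c].values)
    return df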
Hope this helps,
Regards