Comments (2)
Hello boscoraju,
I'm sorry for this stupid mistake. Use index = recordlinkage.Pairs(df_a, df_b) instead.
Index was the name used in an old version. It was easily confused with pandas.Index in the pandas API and was therefore renamed.
For more docs: http://recordlinkage.readthedocs.io/en/latest/notebooks/link_two_dataframes.html
Good luck.
from recordlinkage.
I've tried changing it to Pairs, but now have a problem with the index line from the example:
politie_sample = politie_sel.sample(frac = 0.001)
lbz_sample = lbz_sel.sample(frac = 0.001)
indexer = recordlinkage.Pairs(politie_sample, lbz_sample)
indexer.block('Gender')
candidate_links = indexer.index(politie_sample, lbz_sample)
Traceback (most recent call last):
File "", line 6, in
candidate_links = indexer.index(politie_sample, lbz_sample)
File "C:\Anaconda3\lib\site-packages\recordlinkage\indexing.py", line 330, in index
d = next(self._iterindex(index_func, *args, **kwargs))
File "C:\Anaconda3\lib\site-packages\recordlinkage\indexing.py", line 483, in _iterindex
*args, **kwargs
TypeError: 'DataFrame' object is not callable
I've then tried to call the last line without data-frames as arguments, but then can't compute the length:
Traceback (most recent call last):
File "", line 1, in
len(candidate_links)
TypeError: object of type 'method' has no len()
When I then try to use the Compare function (it asks for an argument, so I supplied the indexer object), I get the following error message:
compare_cl = recordlinkage.Compare(indexer)
compare_cl.exact('Gender', 'Gender', label='Gender')
Traceback (most recent call last):
File "", line 1, in
compare_cl = recordlinkage.Compare(indexer)
File "C:\Anaconda3\lib\site-packages\recordlinkage\comparing.py", line 86, in __init__
self.vectors = pandas.DataFrame(index=pairs)
File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 266, in __init__
mgr = self._init_dict(data, index, columns, dtype=dtype)
File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 402, in _init_dict
return _arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
File "C:\Anaconda3\lib\site-packages\pandas\core\frame.py", line 5411, in _arrays_to_mgr
index = _ensure_index(index)
File "C:\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 3665, in _ensure_index
return Index(index_like)
File "C:\Anaconda3\lib\site-packages\pandas\indexes\base.py", line 312, in __new__
subarr = _asarray_tuplesafe(data, dtype=object)
File "C:\Anaconda3\lib\site-packages\pandas\core\common.py", line 369, in _asarray_tuplesafe
values = list(values)
TypeError: 'Pairs' object is not iterable
Is there a way to go back to the original Index function?
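A note on the last two errors, which follow from plain Python behaviour rather than anything specific to the library. The message "object of type 'method' has no len()" appears when a method is referenced without being called, and "'Pairs' object is not iterable" appears because the traceback shows Compare passing its first argument straight to pandas.DataFrame(index=pairs), which needs the candidate pairs themselves rather than the indexer object. The sketch below reproduces both messages with a hypothetical ToyIndexer class; it is a stand-in written for illustration and is not the recordlinkage API.

```python
from itertools import product


class ToyIndexer:
    """Hypothetical stand-in for an indexer object; NOT the recordlinkage API."""

    def __init__(self, records_a, records_b):
        self.records_a = records_a
        self.records_b = records_b

    def block(self, key):
        # Return candidate pairs: positions (i, j) whose records agree on `key`.
        return [
            (i, j)
            for (i, a), (j, b) in product(
                enumerate(self.records_a), enumerate(self.records_b)
            )
            if a[key] == b[key]
        ]


# Made-up toy records, only to have something to block on.
records_a = [{"Gender": "M"}, {"Gender": "F"}]
records_b = [{"Gender": "F"}, {"Gender": "M"}, {"Gender": "F"}]
indexer = ToyIndexer(records_a, records_b)

# Referencing a method without calling it gives a bound method, which has no length.
not_called = indexer.block
try:
    len(not_called)
except TypeError as exc:
    method_error = str(exc)

# The indexer object itself is not a collection of pairs either.
try:
    list(indexer)
except TypeError as exc:
    iter_error = str(exc)

# The value returned by the call is what has a length and can be iterated.
candidate_links = indexer.block("Gender")
print(method_error)
print(iter_error)
print(len(candidate_links), candidate_links)
```

If indexer.block('Gender') in the installed version returns the candidate pairs (an assumption here; the notebook linked above is the reference for the actual signatures), then keeping that return value and handing it to Compare, instead of the indexer, would be the analogous change.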