Comments (5)
Pushed fixes for 1 and 2 in master. 3 is left for later as it ties into mapping.
The handling of URL (between Rest/BufferedRest client) is not settled down yet but once the mapping feature comes into play, this will be addressed as well.
Thanks for the feedback and it would be great to get feedback on the ESTap in particular with regards to [1]. This was added through [2] since without it only one split was used instead of one-per-shard (default of 5) and I'm still not sure why it occurs (only with Cascading).
Cheers,
[1] https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/src/main/java/org/elasticsearch/hadoop/cascading/ESHadoopTap.java#L52
[2] e04a5ce
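To make the expectation above concrete: with the default of 5 shards, planning one split per shard should give 5 splits rather than 1. The following is only a minimal sketch of that idea. `ShardSplitSketch`, `ShardSplit` and `planSplits` are hypothetical names for illustration and are not taken from elasticsearch-hadoop or Cascading.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the elasticsearch-hadoop implementation.
final class ShardSplitSketch {

    // Describes the slice of work assigned to a single shard.
    static final class ShardSplit {
        final String index;
        final int shardId;

        ShardSplit(String index, int shardId) {
            this.index = index;
            this.shardId = shardId;
        }

        @Override
        public String toString() {
            return index + "[shard " + shardId + "]";
        }
    }

    // Plans exactly one split per shard, so each shard can be read in parallel.
    static List<ShardSplit> planSplits(String index, int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        List<ShardSplit> splits = new ArrayList<>();
        for (int shard = 0; shard < shardCount; shard++) {
            splits.add(new ShardSplit(index, shard));
        }
        return splits;
    }

    public static void main(String[] args) {
        // 5 is the default shard count mentioned in the comment above.
        for (ShardSplit split : planSplits("artists", 5)) {
            System.out.println(split);
        }
    }
}
```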
from elasticsearch-hadoop.
Thanks man, that was fast :-)
On 06/11/2013 02:05 PM, Costin Leau wrote:
Pushed fixes for 1 and 2 in master. 3 is left for later as it ties into mapping.
The handling of URL (between Rest/BufferedRest client) is not settled down yet but once the mapping feature comes into play, this will be addressed as well.
Thanks for the feedback and it would be great to get feedback on the ESTap in particular with regards to [1]. This was added through [2] since without it only one split was used instead of one-per-shard (default of 5) and I'm still not sure why it occurs (only with Cascading).
Cheers,
[1] https://github.com/elasticsearch/elasticsearch-hadoop/blob/master/src/main/java/org/elasticsearch/hadoop/cascading/ESHadoopTap.java#L52
[2] e04a5ce
from elasticsearch-hadoop.
Hi Costin,
We at Cascading (http://cascading.org) are about to deliver a new
feature: (data) provider plugins for Cascading and Lingual.
As an example, I've created an ES plugin.
Essentially, it has this contract:
    public class ElasticsearchProviderFactory
        public String description()
        public Scheme createScheme(Fields fields, Properties properties)
            // {new ESTap(,"dummy-resource",), esTap.sourceConfInit(FlowProcess.NULL,), esTap.getScheme()}
        public Tap createTap(Scheme scheme, String path, Properties properties)
            // {new ESTap(,scheme.getSourceFields())}
plus cascading/lingual/catalog/provider.properties with at least
    factory.class.name = cascading.lingual.catalog.ElasticsearchProviderFactory
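As a rough illustration of the provider.properties part of the contract, a loader could read the factory class name roughly like this. This is only a sketch: `ProviderPropertiesSketch` and `readFactoryClassName` are hypothetical names, and how Lingual actually discovers and loads the file is not shown in this thread.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of Cascading or Lingual.
final class ProviderPropertiesSketch {

    // The one key the message above says must be present.
    static final String FACTORY_KEY = "factory.class.name";

    // Parses the text of a provider.properties file and returns the factory class name.
    static String readFactoryClassName(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String name = props.getProperty(FACTORY_KEY);
        if (name == null || name.trim().isEmpty()) {
            throw new IllegalStateException("missing required key: " + FACTORY_KEY);
        }
        return name.trim();
    }

    public static void main(String[] args) {
        String text =
            "factory.class.name = cascading.lingual.catalog.ElasticsearchProviderFactory\n";
        System.out.println(readFactoryClassName(text));
    }
}
```

A real loader would presumably go on to instantiate that class, for example through reflection, but that step depends on the plugin class loader and is left out here.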
There are 2 implementations:
- static dependencies via gradle maven (fatjar) and
- dynamic dependencies from remote maven repos via jcabi-aether
http://www.jcabi.com/jcabi-aether/.
Both work fine.
Loading 1000 records from a tab-delimited file (your 'artists') into ES:
hits: {
  total: 994
  max_score: 1
  hits: [
    {
      _index: artists
      _type: artist
      _id: Mexa6CRkQTSm8k7JS6OBsg
      _score: 1
      _source: {
        Id: 16
        Name: London After Midnight
        PageUrl: http://www.last.fm/music/London+After+Midnight
        PictureUrl: http://userserve-ak.last.fm/serve/252/5364091.jpg
      }
    }
    {
      _index: artists
      _type: artist
      _id: ML1lBWjYT0usmmaEBOP7-w
      _score: 1
      _source: {
        Id: 18
        Name: The Crüxshadows
        PageUrl: http://www.last.fm/music/The+Cr%C3%BCxshadows
        PictureUrl: http://userserve-ak.last.fm/serve/252/10323129.jpg
      }
    }, ........
and then doing search "artists/artist/_search?q=me*":
artists artist N1S_cm9SQ1WjyecLgEXS2Q 0.0 {Id=86, Name=Katie Melua, PageUrl=http://www.last.fm/music/Katie+Melua, PictureUrl=http://userserve-ak.last.fm/serve/252/38702721.png}
artists artist wOzfEndsSVa2VI0hxT9tUA 0.0 {Id=471, Name=Metro Station, PageUrl=http://www.last.fm/music/Metro+Station, PictureUrl=http://userserve-ak.last.fm/serve/252/8127003.jpg}
artists artist tXH8osdPQYaLefAN-FED5Q 0.0 {Id=707, Name=Metallica, PageUrl=http://www.last.fm/music/Metallica, PictureUrl=http://userserve-ak.last.fm/serve/252/7560709.jpg}
artists artist Ely-7TbWSlmRh6xqm0mRjA 0.0 {Id=914, Name=Medina, PageUrl=http://www.last.fm/music/Medina, PictureUrl=http://userserve-ak.last.fm/serve/252/60964027.png}
artists artist KbtwPj3PQH2HAchnkO9Hyg 0.0 {Id=996, Name=Mike & The Mechanics, PageUrl=http://www.last.fm/music/Mike%2B%2526%2BThe%2BMechanics, PictureUrl=http://userserve-ak.last.fm/serve/252/57142699.png}
artists artist Kr_oQ7bXRMyGJtfvjO6krw 0.0 {Id=779, Name=Bring Me The Horizon, PageUrl=http://www.last.fm/music/Bring+Me+The+Horizon, PictureUrl=http://userserve-ak.last.fm/serve/252/51720179.jpg}
artists artist gd_FBH9RToa4xHyEL6LijQ 0.0 {Id=918, Name=Megadeth, PageUrl=http://www.last.fm/music/Megadeth, PictureUrl=http://userserve-ak.last.fm/serve/252/8129787.jpg}
artists artist d_9Gj_GLR-CWBJkgjRt4Xg 0.0 {Id=657, Name=Paolo Meneguzzi, PageUrl=http://www.last.fm/music/Paolo+Meneguzzi, PictureUrl=http://userserve-ak.last.fm/serve/252/8575439.jpg}
artists artist Q52O_Jj5QxaiMZrrINpFhQ 0.0 {Id=721, Name=Wim Mertens, PageUrl=http://www.last.fm/music/Wim+Mertens, PictureUrl=http://userserve-ak.last.fm/serve/252/35625237.png}
artists artist xz4iayREQvysM4V3nYkzmQ 0.0 {Id=847, Name=The Mercury Arc, PageUrl=http://www.last.fm/music/The+Mercury+Arc, PictureUrl=http://userserve-ak.last.fm/serve/252/39053993.jpg}
artists artist 15vP6TBhSi6-UxOeuK9Qrg 0.0 {Id=477, Name=Daniel Merriweather, PageUrl=http://www.last.fm/music/Daniel+Merriweather, PictureUrl=http://userserve-ak.last.fm/serve/252/53480041.png}
artists artist 7lDFkbFUTmCEYp2dH7J13g 0.0 {Id=777, Name=The Crystal Method, PageUrl=http://www.last.fm/music/The+Crystal+Method, PictureUrl=http://userserve-ak.last.fm/serve/252/26115391.jpg}
artists artist OghYAvWhRlWYknoOH58LFA 0.0 {Id=170, Name=Mew, PageUrl=http://www.last.fm/music/Mew, PictureUrl=http://userserve-ak.last.fm/serve/252/42247291.jpg}
artists artist o_d8GDDSSrSKmnREVz71eA 0.0 {Id=359, Name=Maria Mena, PageUrl=http://www.last.fm/music/Maria+Mena, PictureUrl=http://userserve-ak.last.fm/serve/252/13556587.jpg}
artists artist gKC8UA62S3Sgzd6OCK3lUA 0.0 {Id=643, Name=Nikolas Metaxas, PageUrl=http://www.last.fm/music/Nikolas+Metaxas, PictureUrl=http://userserve-ak.last.fm/serve/252/61486893.png}
flows:
2013-06-07 16:19:17,829 INFO [main] provider.TestCatalogProviderUtil (TestCatalogProviderUtil.java:testElasticsearchProvider(341)) - loading data into elasticsearch
2013-06-07 16:19:18,266 INFO [flow] flow.Flow (BaseFlow.java:logInfo(1300)) - [] source: FileTap["TextDelimited[['Id', 'Name', 'PageUrl', 'PictureUrl']]"]["/home/oleg/dev/git/lingual/lingual-core/src/test/resources/artists.tab"]
2013-06-07 16:19:18,266 INFO [flow] flow.Flow (BaseFlow.java:logInfo(1300)) - [] sink: ESLocalTap["ESLocalScheme[['Id', 'Name', 'PageUrl', 'PictureUrl']]"]["artists"]
2013-06-07 16:19:18,893 INFO [main] provider.TestCatalogProviderUtil (TestCatalogProviderUtil.java:testElasticsearchProvider(375)) - searching elasticsearch
2013-06-07 16:19:19,004 INFO [flow] flow.Flow (BaseFlow.java:logInfo(1300)) - [] source: ESLocalTap["ESLocalScheme[['Id', 'Name', 'PageUrl', 'PictureUrl']]"]["artists/artist/_search?q=me*"]
2013-06-07 16:19:19,005 INFO [flow] flow.Flow (BaseFlow.java:logInfo(1300)) - [] sink: StdOutTap["TextLine[['num', 'line']->[ALL]]"]["stdOut"]
Best,
Oleg
On 06/11/2013 02:05 PM, Costin Leau wrote:
..would be great to get feedback on the ESTap
from elasticsearch-hadoop.
Moving this to milestone 1.3 M2 to address the last bit, namely being id aware.
from elasticsearch-hadoop.
I know this is an old bug, but I want to check whether the type is still required. The bulk already supports the various meta-data options, and the type and index are built in (it's the es.resource).
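To illustrate what "built in" means here: a resource string such as "artists/artist" (or the "artists/artist/_search?q=me*" form seen in the logs earlier in this thread) already carries the index and the type. Below is a minimal sketch of pulling them apart. `EsResourceSketch` is a hypothetical class written for this comment, and it is an assumption that the resource always has the index/type[/...][?query] shape; it does not reproduce how elasticsearch-hadoop itself parses es.resource.

```java
// Hypothetical illustration; not the parsing code used by elasticsearch-hadoop.
final class EsResourceSketch {
    final String index;
    final String type;   // null when the resource names only an index
    final String query;  // null when the resource has no "?..." part

    private EsResourceSketch(String index, String type, String query) {
        this.index = index;
        this.type = type;
        this.query = query;
    }

    // Splits "index/type[/...][?query]" into its parts.
    static EsResourceSketch parse(String resource) {
        String path = resource;
        String query = null;
        int q = resource.indexOf('?');
        if (q >= 0) {
            path = resource.substring(0, q);
            query = resource.substring(q + 1);
        }
        String[] parts = path.split("/");
        if (parts.length == 0 || parts[0].isEmpty()) {
            throw new IllegalArgumentException("resource has no index: " + resource);
        }
        String type = parts.length > 1 ? parts[1] : null;
        return new EsResourceSketch(parts[0], type, query);
    }

    public static void main(String[] args) {
        EsResourceSketch r = parse("artists/artist/_search?q=me*");
        System.out.println("index=" + r.index + " type=" + r.type + " query=" + r.query);
    }
}
```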
from elasticsearch-hadoop.