linkedinattic / cleo

A flexible, partial, out-of-order and real-time typeahead search library

Home Page: http://sna-projects.com/cleo

License: Apache License 2.0

PHP 0.01% Groovy 0.08% Java 99.91%

cleo's Issues

Quick Start - "main" suggestion

At http://sna-projects.com/cleo/quickstart.php there is:

 Download cleo-primer from Github.

 git clone --depth 1 git@github.com:jingwei/cleo-primer.git cleo-primer


  Launch the cleo-primer web application from the main folder.

 MAVEN_OPTS="-Xms1g -Xmx1g" mvn jetty:run \
 -Dcleo.instance.name=Company \
 -Dcleo.instance.type=cleo.primer.GenericTypeaheadInstance \
 -Dcleo.instance.conf=src/main/resources/config/generic-typeahead

For someone new to Maven (as I am) who simply wants to evaluate your project's demo and not undertake learning the niceties of Maven, the reference to "main folder" above may cause one to go into the folder "cleo-primer/src/main" and then attempt to run the proffered command there. Doing so, of course, will cause Maven to complain that it cannot find the jetty plugin (see below).

Since you have specified the project folder as "cleo-primer", why not then have the following instruction be in accord and state:

 Launch the cleo-primer web application from the folder "cleo-primer".

This will remove any ambiguity about "main" meaning ".../cleo-primer" in the generic sense as you intend or ".../cleo-primer/src/main" and possibly save people posting error messages such as this:

jlpoole@themis /usr/local/src/cleo-primer/src/main $ MAVEN_OPTS="-Xms1g -Xmx1g" mvn jetty:run -Dcleo.instance.name=Company -Dcleo.instance.type=cleo.primer.GenericTypeaheadInstance -Dcleo.instance.conf=src/main/resources/config/generic-typeahead
[INFO] Scanning for projects...
[INFO] Searching repository for plugin with prefix: 'jetty'.
[INFO] org.apache.maven.plugins: checking for updates from central
[INFO] org.codehaus.mojo: checking for updates from central
[INFO] artifact org.apache.maven.plugins:maven-jetty-plugin: checking for updates from central
[INFO] ------------------------------------------------------------------------
[ERROR] BUILD ERROR
[INFO] ------------------------------------------------------------------------
[INFO] The plugin 'org.apache.maven.plugins:maven-jetty-plugin' does not exist or no valid version could be found
[INFO] ------------------------------------------------------------------------
[INFO] For more information, run Maven with the -e switch
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1 second
[INFO] Finished at: Sun Mar 04 05:07:06 PST 2012
[INFO] Final Memory: 6M/981M
[INFO] ------------------------------------------------------------------------
jlpoole@themis /usr/local/src/cleo-primer/src/main $ 

want to change ElementID datatype from int to long

I want to use cleo to set up our typeahead system, but the native element ID in cleo is an int, and I want to change it from int to long. Is there anything important I should pay attention to? Thanks very much for any reply ^.^

Running from within a firewall

This is really a Maven issue; however, others may find themselves in the same position, and addressing this issue could prove helpful.

I was able to successfully run the cleo-primer from behind a less-restrictive firewall; the remote fetching worked and the server started. Now I am trying to run the sample from behind a restrictive firewall and the startup fails because remote files cannot be downloaded.

I checked Maven's site http://maven.apache.org/guides/mini/guide-proxies.html and duly created a file ~jlpoole/.m2/repository/settings.xml as follows:

jlpoole@themis /usr/local/src/cleo-primer $ cat ~/.m2/repository/settings.xml
--settings--
--proxies--
--proxy--
--active--true--/active--
--protocol--http--/protocol--
--host--[the proxy server]--/host--
--port--[the proxy port]--/port--
--/proxy--
--/proxies--
--/settings--

Note: I had to substitute "--" for opening and closing markup since my pasting them in caused them to disappear.
jlpoole@themis /usr/local/src/cleo-primer $

Yet, when I try to run the launching command, it's showing the proxy is not working:

    jlpoole@themis /usr/local/src/cleo-primer $ MAVEN_OPTS="-Xms1g -Xmx1g" mvn jetty:run -Dcleo.instance.name=Company -Dcleo.instance.type=cleo.primer.GenericTypeaheadInstance -Dcleo.instance.conf=src/main/resources/config/generic-typeahead
[INFO] Scanning for projects...
[INFO]                                                                         
[INFO] ------------------------------------------------------------------------
[INFO] Building cleo-primer 1.0
[INFO] ------------------------------------------------------------------------
[INFO] 
[INFO] >>> maven-jetty-plugin:6.1.25:run (default-cli) @ cleo-primer >>>
[WARNING] The POM for org.codehaus.jackson:jackson-lgpl:jar:0.9.4 is invalid, transitive dependencies (if any) will not be available, enable debug logging for more details
[WARNING] Could not transfer metadata org.springframework:spring-core/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Could not transfer metadata org.springframework:spring/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Could not transfer metadata org.springframework:spring-beans/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Could not transfer metadata org.springframework:spring-context/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Could not transfer metadata org.springframework:spring-web/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[INFO] 
[INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ cleo-primer ---
[debug] execute contextualize
[WARNING] Using platform encoding (ANSI_X3.4-1968 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] Copying 4 resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ cleo-primer ---
[INFO] Nothing to compile - all classes are up to date
[INFO] 
[INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ cleo-primer ---
[debug] execute contextualize
[WARNING] Using platform encoding (ANSI_X3.4-1968 actually) to copy filtered resources, i.e. build is platform dependent!
[INFO] skip non existing resourceDirectory /usr/local/src/cleo-primer/src/test/resources
[INFO] 
[INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ cleo-primer ---
[INFO] No sources to compile
[INFO] 
[INFO] <<< maven-jetty-plugin:6.1.25:run (default-cli) @ cleo-primer <<<
[WARNING] Failure to transfer org.springframework:spring-core/maven-metadata.xml from http://download.java.net/maven/1 was cached in the local repository, resolution will not be reattempted until the update interval of m1.dev.java.net has elapsed or updates are forced. Original error: Could not transfer metadata org.springframework:spring-core/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Failure to transfer org.springframework:spring/maven-metadata.xml from http://download.java.net/maven/1 was cached in the local repository, resolution will not be reattempted until the update interval of m1.dev.java.net has elapsed or updates are forced. Original error: Could not transfer metadata org.springframework:spring/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Failure to transfer org.springframework:spring-beans/maven-metadata.xml from http://download.java.net/maven/1 was cached in the local repository, resolution will not be reattempted until the update interval of m1.dev.java.net has elapsed or updates are forced. Original error: Could not transfer metadata org.springframework:spring-beans/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Failure to transfer org.springframework:spring-context/maven-metadata.xml from http://download.java.net/maven/1 was cached in the local repository, resolution will not be reattempted until the update interval of m1.dev.java.net has elapsed or updates are forced. Original error: Could not transfer metadata org.springframework:spring-context/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[WARNING] Failure to transfer org.springframework:spring-web/maven-metadata.xml from http://download.java.net/maven/1 was cached in the local repository, resolution will not be reattempted until the update interval of m1.dev.java.net has elapsed or updates are forced. Original error: Could not transfer metadata org.springframework:spring-web/maven-metadata.xml from/to m1.dev.java.net (http://download.java.net/maven/1): No connector available to access repository m1.dev.java.net (http://download.java.net/maven/1) of type legacy using the available factories WagonRepositoryConnectorFactory
[INFO] 
[INFO] --- maven-jetty-plugin:6.1.25:run (default-cli) @ cleo-primer ---
[INFO] Configuring Jetty for project: cleo-primer
[INFO] Webapp source directory = /usr/local/src/cleo-primer/src/main/webapp
[INFO] Reload Mechanic: manual
[INFO] Classes = /usr/local/src/cleo-primer/target/classes
2012-03-06 14:48:47.802:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
[INFO] Jetty server exiting.
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 3.554s
[INFO] Finished at: Tue Mar 06 14:48:47 PST 2012
[INFO] Final Memory: 10M/981M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.mortbay.jetty:maven-jetty-plugin:6.1.25:run (default-cli) on project cleo-primer: Failure: Bad temp directory: /usr/local/src/cleo-primer/target/work -> [Help 1]
[ERROR] 
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR] 
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
2012-03-06 14:48:48.074:INFO::Shutdown hook executing
2012-03-06 14:48:48.074:INFO::Shutdown hook complete
jlpoole@themis /usr/local/src/cleo-primer $

Is there something that can be tweaked to bypass refreshing from remote sources so when the example is run from within a restrictive firewall it is ready to run and/or aborts trying to get the latest files? Or have I missed some basic Maven configuration making my attempt to create a proxy entry useless?

Parameter Substitution For Properties Is Broken By Backslashes

If you write something like

rootDir=C:\some\where\
anotherDir=${rootDir}\cleoDir

The parameter substitution fails because it doesn't correctly escape backslashes. This causes problems on Windows, particularly if you load the system properties and want to use something like user.home.
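
The report does not show how cleo performs the substitution, so the following is only a sketch of one common cause in Java and a way around it. `Matcher.appendReplacement` treats backslashes and `$` in the replacement string as special characters, so a value such as `C:\some\where\` has to be passed through `Matcher.quoteReplacement` before it is spliced in. The class `PropertySubstitution` and its `substitute` method are hypothetical names used for illustration; they are not part of cleo.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertySubstitution {

    // Matches ${name} and captures the name.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");

    // Hypothetical helper: expands ${name} placeholders using the given map.
    static String substitute(String value, Map<String, String> props) {
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.get(m.group(1));
            if (replacement == null) {
                replacement = m.group(0); // leave unknown placeholders untouched
            }
            // quoteReplacement keeps '\' and '$' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("rootDir", "C:\\some\\where\\");
        // Prints C:\some\where\\cleoDir
        System.out.println(substitute("${rootDir}\\cleoDir", props));
    }
}
```

Separately, if the file is read with `java.util.Properties.load`, backslashes in the file are escape characters in their own right (and a trailing backslash continues the line), so Windows paths in such a file need doubled backslashes or forward slashes regardless of how the substitution is done.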

Original scores were overridden in GenericTypeaheadInitializer code

I noticed that this code actually ignores the original scores when reading a serialized index, so I was forced to comment it out.

    protected GenericTypeahead<E> createTypeahead(GenericTypeaheadConfig<E> config) throws Exception {
        // SOME CODE

        // create scoreScanner
        ScoreScanner scoreScanner = new ElementScoreScanner(config.getElementScoreFile());

        // SOME CODE
    }

Cleo needs a lot of memory -- using SSD instead of memory

We have been using Cleo in production for a year now. It works flawlessly. We are indexing 9m documents, which needs a hefty amount of memory.

In the world of SSDs, is there a way I can use some kind of DiskStore to keep elements on disk instead of in memory? I am not sure how to make this happen. Any help would be greatly appreciated.

Cleo Score -- Why is it needed?

Can someone shed some light on where to find details about the Score that is set along with an Element, and on how to define a scoring strategy? Also, is this used for sorting the results inside Cleo?

Need Boosting For Exact Matches

Sometimes a topic whose name is a prefix for a lot of other topics has a lower score than the other topics and thus doesn't appear in the search. The trouble is that you can't narrow the competition by typing more stuff because you've already typed all the stuff so the topic is not findable at all via typeahead.

To some extent this can be addressed by playing with the score, but I think the real answer is that an exact hit ought to get a boost in its rank.
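
As a rough illustration of the boost being asked for (a sketch only, not cleo's ranking code), the example below adds a fixed bonus to a candidate whose full name equals the query before the top results are cut off. `Candidate`, `effectiveScore`, `rank`, and the `boost` value are all hypothetical names and parameters chosen for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class ExactMatchBoost {

    // Hypothetical stand-in for a typeahead element: a display name and a base score.
    static final class Candidate {
        final String name;
        final double score;

        Candidate(String name, double score) {
            this.name = name;
            this.score = score;
        }
    }

    // Base score plus a fixed bonus when the whole name equals the query (case-insensitive).
    static double effectiveScore(Candidate c, String query, double boost) {
        return c.name.equalsIgnoreCase(query.trim()) ? c.score + boost : c.score;
    }

    // Returns the top `limit` candidates ordered by boosted score, highest first.
    static List<Candidate> rank(String query, List<Candidate> candidates, double boost, int limit) {
        List<Candidate> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble(
                (Candidate c) -> effectiveScore(c, query, boost)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    public static void main(String[] args) {
        List<Candidate> hits = new ArrayList<>();
        hits.add(new Candidate("Java", 1.0));
        hits.add(new Candidate("JavaScript", 5.0));
        hits.add(new Candidate("Java Beans", 4.0));
        // With a limit of 2 and no boost, "Java" would be cut off by the two longer names.
        for (Candidate c : rank("java", hits, 10.0, 2)) {
            System.out.println(c.name);
        }
    }
}
```

With the boost set to 0.0 the exact match "Java" falls outside the top two, which is the situation described above; any boost larger than the score gap brings it back to the top.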

Word repetition issue

PrefixSelector doesn't work properly with word repetition, i.e. you can type the same word (or even a short prefix) several times and still get the same suggestion.

You can use something like this (Scala code) to solve the problem:

  class MySelector(terms: Array[String]) extends PrefixSelector[MyElement](terms: _*) {

    // True if every query prefix in `child` matches a distinct word of `parent`.
    def startWith(child: Array[String], parent: Array[String]): Boolean = {
      // used(i) marks parent words already claimed by an earlier query term;
      // sized per call so elements with any number of terms are handled
      val used = new Array[Boolean](parent.length)
      child.forall { cw =>
        // scan all parent words for each query term (greedy first match)
        parent.indices.exists { i =>
          if (!used(i) && parent(i).startsWith(cw)) {
            used(i) = true
            true
          } else false
        }
      }
    }

    override def select(element: MyElement, ctx: SelectorContext): Boolean =
      startWith(terms, element.getTerms)
  }

Quick Start - git fails

Quick Start Guide has:

You can follow the steps below to set up a typeahead service for public companies listed at Nasdaq.

   Download cleo-primer from Github.

     git clone --depth 1 git@github.com:jingwei/cleo-primer.git cleo-primer

When I copied and pasted the command line, I was denied access:

jlpoole@themis /usr/local/src $ git clone --depth 1 git@github.com:jingwei/cleo-primer.git cleo-primer
Cloning into cleo-primer...
The authenticity of host 'github.com (207.97.227.239)' can't be established.
RSA key fingerprint is 16:27:ac:a5:76:28:2d:36:63:1b:56:4d:eb:df:a6:48.
Are you sure you want to continue connecting (yes/no)? y
Please type 'yes' or 'no': yes
Warning: Permanently added 'github.com,207.97.227.239' (RSA) to the list of known hosts.
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
jlpoole@themis /usr/local/src $ git clone --depth 1 git@github.com:jingwei/cleo-primer.git cleo-primer
Cloning into cleo-primer...
Permission denied (publickey).
fatal: The remote end hung up unexpectedly
jlpoole@themis /usr/local/src $

Heapdump: Shows objects are not GC'd from Krati

[image: heap dump snapshot]

I took this snapshot 20 minutes after the program had indexed all elements. I am indexing 400000 elements into cleo with 10 partitions, each owning 50000 elements. The program did not crash and all elements are indexed properly. I was wondering whether there is any reason why objects are still alive even after the GC has had 20 minutes to do its work. I am running on an 8-core Mac with -Xms and -Xmx both set to 4 GB.

Let me know if you want me to conduct any other tests for further findings. I am not sure whether the problem is in Cleo, in Krati, or in my program. However, the Element object shows up nowhere in the heap, which suggests my program is doing fine.

GenericTypeaheadInitializer overrides the scores with a default min value

The GenericTypeaheadInitializer creates a default ElementScoreScanner when creating a GenericTypeahead, whether or not the parameter elementScoreFile is set.

As a result, when the typeahead instance initializes, it overrides the scores with a default MIN_FLOAT_VALUE, whether or not the element has a defined score.
