Solr Streams DSL

Overview

This project aims to provide a safe and ergonomic way to build Solr streaming expressions.

Why this project?

At the time of putting this together, SolrJ does not provide a clean and easy way to build out stream expressions. The current mechanism involves newing up a series of classes for each stream and adding a bunch of parameters to them. Ideally, a developer would be able to build out a Solr streams expression in a way that looked very similar to the expression itself, making it clear to see what the expression was doing at a glance.

The recommended approach at this time seems to be building out the expression as a string. This approach, although much less verbose than using the various constructors provided by SolrJ, is very prone to human error. One can easily leave off a parenthesis or comma, or pass the wrong type or format of input into one of the stream expressions.

How does this project solve that?

This project provides a rich DSL, in the form of static methods, that allows a developer to build out stream expressions using method signatures that look very similar to the stream functions themselves. Since these are method calls, rather than strings, the compiler will check your parentheses/commas for you. Since the parameters are statically typed, you cannot invoke them with the wrong types of inputs.
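As a rough illustration of the idea, here is a minimal sketch of a static-method DSL. It is not this library's actual implementation: the class name StreamDslSketch, the Param type, and the helper signatures are all hypothetical, and values are not escaped. Only the shape of the output, a streaming expression string, is taken from the text above.

```java
import java.util.StringJoiner;

// Hypothetical sketch of a static-method DSL; not the library's real API.
final class StreamDslSketch {

    // A named parameter such as q="..." or rows="25".
    static final class Param {
        private final String name;
        private final String value;

        Param(String name, String value) {
            this.name = name;
            this.value = value;
        }

        @Override
        public String toString() {
            // Note: this sketch does not escape quotes inside the value.
            return name + "=\"" + value + "\"";
        }
    }

    static Param q(String query) {
        return new Param("q", query);
    }

    // rows takes an int, so a call like rows("lots") does not compile.
    static Param rows(int count) {
        return new Param("rows", Integer.toString(count));
    }

    // Parentheses and commas are emitted here, once, instead of being typed by hand.
    static String search(String collection, Param... params) {
        StringJoiner joiner = new StringJoiner(",", "search(", ")");
        joiner.add(collection);
        for (Param param : params) {
            joiner.add(param.toString());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // Prints: search(techproducts,q="popularity:[6 TO 10]",rows="25")
        System.out.println(search("techproducts", q("popularity:[6 TO 10]"), rows(25)));
    }
}
```

The call site reads much like the expression it produces, while the punctuation lives in one place and the argument types are checked at compile time.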

Example

That all sounds great, but what does it look like in practice? Here is a simple example using the Solr techproducts tutorial collection. For this example, we will perform a simple search that selects items with a popularity between 6 and 10, grouped by manufacturer. We will then use select to trim down the group key to just the manufacturer name and id.

Here is the streaming expression we will be building.

select(
    reduce(
        search(
            techproducts,
            q="popularity:[6 TO 10]",
            rows="25",
            sort="manu asc, name asc, popularity desc"
        ),
        by="manu",
        group(sort="name asc", n="10")
    ),
    manu,
    manu_id_s,
    group
)

Using SolrJ Classes

First, let's try to build this stream using the various TupleStream classes that SolrJ provides:

// If I'm not going across Solr clusters, can't Solr just figure the zkHost out?
String zkHost = "localhost:9983";
StreamFactory factory = new StreamFactory()
    .withCollectionZkHost("techproducts", zkHost);
StreamContext context = new StreamContext();

// Build up our inner search()
// Hopefully these parameters are valid, but there's currently no other way to build a SearchStream
StreamExpression searchExpression = new StreamExpression("search")
    .withParameter("techproducts")
    .withParameter(new StreamExpressionNamedParameter("q", "popularity:[6 TO 10]"))
    .withParameter(new StreamExpressionNamedParameter("rows", "25"))
    .withParameter(new StreamExpressionNamedParameter("sort", "manu asc, name asc, popularity desc"));
SearchStream searchStream = new SearchStream(searchExpression, factory);

// Now wrap it with reduce()
// Yay, we have strongly typed parameters this time!
ReducerStream reduceStream = new ReducerStream(
    searchStream,
    new FieldEqualitor("manu"),
    new GroupOperation(new FieldComparator("name", ComparatorOrder.ASCENDING), 10));

// And finally wrap that with our select()
Map<String, String> fields = new HashMap<>();
// Either nothing is aliased, or everything is aliased...
fields.put("manu", "manufacturer");
fields.put("manu_id_s", "manufacturerId");
fields.put("group", "group");
SelectStream selectStream = new SelectStream(reduceStream, fields);
selectStream.setStreamContext(context);
try {
    selectStream.open();
    readUntilEof(selectStream);
} finally {
    selectStream.close();
}

That's a lot of code, and it's not very easy to tell what's going on at first glance. Let's see if we can refactor it a bit to draw out the shape of the stream expression.

Using SolrJ Classes - take 2

We can nest stream constructors to create something that more closely resembles our streaming expression. Here's the updated code. We now have a select that wraps a reduce that wraps a search. It's still a bit messy though and hard to process at a glance.

// If I'm not going across Solr clusters, can't Solr just figure the zkHost out?
String zkHost = "localhost:9983";
StreamFactory factory = new StreamFactory()
    .withCollectionZkHost("techproducts", zkHost);
StreamContext context = new StreamContext();

SelectStream selectStream =
    new SelectStream(
        new ReducerStream(
            new SearchStream(
                // Still hoping these parameters are valid
                new StreamExpression("search")
                    .withParameter("techproducts")
                    .withParameter(new StreamExpressionNamedParameter("q", "popularity:[6 TO 10]"))
                    .withParameter(new StreamExpressionNamedParameter("rows", "25"))
                    .withParameter(new StreamExpressionNamedParameter("sort", "manu asc, name asc, popularity desc")),
                // What's this have to do with a searchStream?
                factory
            ),
            new FieldEqualitor("manu"),
            new GroupOperation(new FieldComparator("name", ComparatorOrder.ASCENDING), 10)
        ),
        Map.of(
            "manu", "manufacturer",
            "manu_id_s", "manufacturerId",
            "group", "group"
        )
    );
selectStream.setStreamContext(context);
try {
    selectStream.open();
    readUntilEof(selectStream);
} finally {
    selectStream.close();
}

Using Strings

As an alternative approach, we can simply craft a string that represents our streaming expression.

// Hopefully I don't miss a parenthesis or comma here...
// I also have no idea if I'm passing the right parameters to everything...
String cexpr =  "select(" +
                    "reduce(" +
                        "search(" +
                            "techproducts," +
                            "q=\"popularity:[6 TO 10]\"," +
                            "rows=\"25\"," +
                            "sort=\"manu asc, name asc, popularity desc\"" +
                        ")," +
                        "by=\"manu\"," +
                        "group(sort=\"name asc\", n=\"10\")" +
                    ")," +
                    "manu as manufacturer," +
                    "manu_id_s as manufacturerId," +
                    "group" +
                ")";

// Now let's pass our expression to Solr and hope it all works...
ModifiableSolrParams paramsLoc = new ModifiableSolrParams();
paramsLoc.set("expr", cexpr);
paramsLoc.set("qt", "/stream");
TupleStream solrStream = new SolrStream(solrUrl, paramsLoc);
StreamContext context = new StreamContext();
solrStream.setStreamContext(context);
try {
    solrStream.open();
    readUntilEof(solrStream);
} finally {
    solrStream.close();
}

This looks a lot closer to the stream we're building; it's very easy to see the shape and know that we're building the right expression. We also no longer need to specify the zkHost. This looks like a huge improvement.

Unfortunately, it's not all good. We've now lost all parameter validation. Not all of the TupleStream classes allowed us to provide strongly typed parameters, but at least many of them did. With strings, we've lost all of it. Additionally, we've now opened ourselves up to a whole new class of issues: we now have the possibility of forgetting or misplacing something as simple as a ) or , character, something that the compiler would have warned us of with TupleStreams. We've gained readability, but at the cost of safety.
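To make that failure mode concrete, here is a small sketch. Both strings below are perfectly valid Java, so the compiler accepts them; the missing parenthesis in the second one is only caught by a check that runs at runtime (or by Solr rejecting the request). The ParenCheck class and its isBalanced helper are hypothetical and exist only for this illustration.

```java
// Hypothetical helper showing that string expressions can only be checked at runtime.
final class ParenCheck {

    // Returns true when every '(' outside double quotes has a matching ')'.
    static boolean isBalanced(String expr) {
        int depth = 0;
        boolean inQuotes = false;
        for (int i = 0; i < expr.length(); i++) {
            char c = expr.charAt(i);
            if (inQuotes && c == '\\') {
                i++; // skip the escaped character inside a quoted value
            } else if (c == '"') {
                inQuotes = !inQuotes;
            } else if (!inQuotes && c == '(') {
                depth++;
            } else if (!inQuotes && c == ')') {
                depth--;
                if (depth < 0) {
                    return false; // closing parenthesis with no opener
                }
            }
        }
        return depth == 0 && !inQuotes;
    }

    public static void main(String[] args) {
        String good = "reduce(search(techproducts,q=\"*:*\",sort=\"manu asc\"),"
                + "by=\"manu\",group(sort=\"name asc\",n=\"10\"))";
        // Same expression, but the ')' that closes search(...) is missing.
        String bad = "reduce(search(techproducts,q=\"*:*\",sort=\"manu asc\","
                + "by=\"manu\",group(sort=\"name asc\",n=\"10\"))";

        System.out.println(isBalanced(good)); // prints true
        System.out.println(isBalanced(bad));  // prints false
    }
}
```

A check like this catches only one class of mistake. It says nothing about whether the parameters are valid for the stream function they were passed to.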

Solr Streams DSL

This is where this library fills the gaps. Here's how this same stream would be built using this library:

// We could use Strings as well, but declaring Fields means we
// don't rely on "magic strings" and makes the fields easier
// to rename in the future. Additionally, it allows us
// to use the Field APIs for sorting/aliasing.
// We can now also Find References on a field and see every
// place we use it, including sorts/aliases.
Field manu = new Field("manu");
Field name = new Field("name");
Field popularity = new Field("popularity");
Field manu_id_s = new Field("manu_id_s");
Field group = new Field("group");

StreamExpression streamExpression =
    select(
        reduce(
            search(
                "techproducts",
                q("popularity:[6 TO 10]"),
                rows(25),
                sort(manu.asc(), name.asc(), popularity.desc())
            ),
            by(manu),
            group(sort(name.asc()), n(10))
        ),
        // No need to care whether we use "field as alias" or "alias:field" here.
        // We just create an Alias, and choosing the right format is handled for us.
        manu.as("manufacturer"),
        manu_id_s.as("manufacturerId"),
        group
    );

ModifiableSolrParams paramsLoc = new ModifiableSolrParams();
paramsLoc.set("expr", streamExpression.toString());
paramsLoc.set("qt", "/stream");
TupleStream solrStream = new SolrStream(solrUrl, paramsLoc);
solrStream.setStreamContext(new StreamContext());
try {
    solrStream.open();
    readUntilEof(solrStream);
} finally {
    solrStream.close();
}

This gives us the best of both worlds. Our expression is still clearly readable, like the String implementation. We even gain a bit, in that we get syntax coloring and no longer need to escape " characters everywhere. Additionally, the stream methods such as select, reduce, and group are all strongly typed. When building a search, we must pass in a collection name, followed by zero or more search parameters. If we try to pass an on clause, for instance, the compiler will complain that it's not a valid parameter. The reduce decorator takes a stream, a by clause, and a reducer (group or distinct). Those are our only options, and they're all required.
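One way such compile-time restrictions can be achieved is by giving each kind of argument its own type. The sketch below is an assumption about how this might be done, not the library's real code: TypedDslSketch, Node, StreamExpr, SearchParam, By, Reducer, and On are hypothetical names, and only group is implemented as a reducer.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical sketch; these type names are illustrative, not the library's API.
final class TypedDslSketch {

    // Every DSL node simply wraps its rendered text.
    static class Node {
        private final String text;
        Node(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    // One type per kind of argument, so the compiler can tell them apart.
    static final class StreamExpr extends Node { StreamExpr(String t) { super(t); } }
    static final class SearchParam extends Node { SearchParam(String t) { super(t); } }
    static final class By extends Node { By(String t) { super(t); } }
    static final class Reducer extends Node { Reducer(String t) { super(t); } }
    static final class On extends Node { On(String t) { super(t); } }

    static SearchParam q(String query) { return new SearchParam("q=\"" + query + "\""); }
    static On on(String field) { return new On("on=\"" + field + "\""); }
    static By by(String field) { return new By("by=\"" + field + "\""); }

    static Reducer group(String sort, int n) {
        return new Reducer("group(sort=\"" + sort + "\",n=\"" + n + "\")");
    }

    // A collection name followed by zero or more search parameters.
    static StreamExpr search(String collection, SearchParam... params) {
        String args = Arrays.stream(params)
                .map(Node::toString)
                .collect(Collectors.joining(","));
        return new StreamExpr("search(" + collection + (args.isEmpty() ? "" : "," + args) + ")");
    }

    // A stream, a by clause, and a reducer: all three are required.
    static StreamExpr reduce(StreamExpr inner, By by, Reducer reducer) {
        return new StreamExpr("reduce(" + inner + "," + by + "," + reducer + ")");
    }

    public static void main(String[] args) {
        StreamExpr expr = reduce(
                search("techproducts", q("popularity:[6 TO 10]")),
                by("manu"),
                group("name asc", 10));
        System.out.println(expr);

        // The line below would not compile, because On is not a SearchParam:
        // search("techproducts", on("manu"));
    }
}
```

In this sketch, a missing by clause or a misplaced on clause shows up as a compile error at the call site instead of an error response from Solr.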

Contributors

thesench