kreeben / resin

Vector space search engine. Available as a HTTP service or as an embedded library.

License: MIT License

C# 95.53% Batchfile 0.38% HTML 3.74% CSS 0.35%
information-retrieval search-engine vector-space-model machine-learning nlu-engine nlu search search-algorithms resin vector-space

resin's People

Contributors

abdullah2993, alexanderpersson, jhashemi, kburman, kreeben, shanselman

resin's Issues

Parse query into doubly-chained linked list

Regarding Resin's QL: a plus sign means "AND", a space means "OR", and a minus sign means "NOT".

The QL currently doesn't allow for grouping/nesting. We need nesting to be able to rewrite this fuzzy query of two terms:

+title:religion +body:jesus~

into these three terms:

+title:religion +(body:jesus body:jesuz)

Let each term (query clause) be a node in a doubly-chained linked list, let left be down and let right be forward. The depth of a node will represent the nesting level.
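
A minimal sketch of the idea, in Python rather than Resin's C#. Every name here (`Clause`, `parse`, `expand_fuzzy`, `render`) is hypothetical, and the fuzzy variants are passed in by the caller instead of being looked up in an index. Each node links forward and backward along the chain, and down/up into a nested group, so the depth of a node is its nesting level.

```python
import re


class Clause:
    """A query clause node. Illustrative only, not Resin's actual type."""

    def __init__(self, op, key=None, value=None, fuzzy=False):
        self.op = op          # "+" = AND, " " = OR, "-" = NOT
        self.key = key        # None for a pure grouping node
        self.value = value
        self.fuzzy = fuzzy
        self.prev = None      # back along the chain
        self.next = None      # forward to the next clause
        self.up = None        # back to the enclosing group node
        self.down = None      # first clause of a nested group

    def depth(self):
        """Nesting level: the number of groups that enclose this node."""
        node, level = self, 0
        while node.up is not None:
            node, level = node.up, level + 1
        return level


def link(nodes, up=None):
    """Chain nodes in both directions and return the head (or None)."""
    for left, right in zip(nodes, nodes[1:]):
        left.next, right.prev = right, left
    for node in nodes:
        node.up = up
    return nodes[0] if nodes else None


def parse(query):
    """Parse a flat query such as '+title:religion +body:jesus~'."""
    pattern = re.compile(r"([+-]?)(\w+):(\w+)(~?)")
    nodes = [Clause(op or " ", key, value, bool(tilde))
             for op, key, value, tilde in pattern.findall(query)]
    return link(nodes)


def expand_fuzzy(head, variants):
    """Turn each fuzzy clause into a group node whose children are its variants."""
    node = head
    while node is not None:
        if node.fuzzy:
            words = variants.get(node.value, [node.value])
            children = [Clause(" ", node.key, word) for word in words]
            node.down = link(children, up=node)
            node.key = node.value = None
            node.fuzzy = False
        node = node.next
    return head


def render(head):
    """Serialize the chain back into query syntax."""
    parts = []
    node = head
    while node is not None:
        if node.down is not None:
            body = f"({render(node.down)})"
        else:
            body = f"{node.key}:{node.value}{'~' if node.fuzzy else ''}"
        parts.append(node.op.strip() + body)
        node = node.next
    return " ".join(parts)
```

Calling `render(expand_fuzzy(parse("+title:religion +body:jesus~"), {"jesus": ["jesus", "jesuz"]}))` reproduces the grouped form shown in this issue, with the two variants sitting one level deeper than the clauses around them.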

Application startup exception: System.PlatformNotSupportedException: The named version of this synchronization primitive is not supported on this platform.

I'm getting the following error while trying to run the HTTP server on macOS.

Application startup exception: System.PlatformNotSupportedException: The named version of this synchronization primitive is not supported on this platform.
   at System.Threading.Semaphore.CreateSemaphore(Int32 initialCount, Int32 maximumCount, String name)
   at System.Threading.Semaphore..ctor(Int32 initialCount, Int32 maximumCount, String name, Boolean& createdNew)
   at Sir.Store.SessionFactory..ctor(ITokenizer tokenizer, IConfigurationProvider config) in /Users/kshitij/github/resin/src/Sir.Store/Session/SessionFactory.cs:line 36
   at Sir.Store.Start.OnApplicationStartup(IServiceCollection services, ServiceProvider serviceProvider, IConfigurationProvider config) in /Users/kshitij/github/resin/src/Sir.Store/Start.cs:line 16
   at Sir.HttpServer.ServiceConfiguration.Configure(IServiceCollection services) in /Users/kshitij/github/resin/src/Sir.HttpServer/ServiceConfiguration.cs:line 66
   at Sir.HttpServer.Startup.ConfigureServices(IServiceCollection services) in /Users/kshitij/github/resin/src/Sir.HttpServer/Startup.cs:line 26
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.AspNetCore.Hosting.ConventionBasedStartup.ConfigureServices(IServiceCollection services)
   at Microsoft.AspNetCore.Hosting.Internal.WebHost.EnsureApplicationServices()
   at Microsoft.AspNetCore.Hosting.Internal.WebHost.Initialize()
--- End of stack trace from previous location where exception was thrown ---
   at Microsoft.AspNetCore.Hosting.Internal.WebHost.BuildApplication()

Dotnet version

01:13 $ dotnet --info
.NET Core SDK (reflecting any global.json):
 Version:   2.2.107
 Commit:    2212cac826

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  10.14
 OS Platform: Darwin
 RID:         osx.10.14-x64
 Base Path:   /usr/local/share/dotnet/sdk/2.2.107/

Host (useful for support):
  Version: 2.2.5
  Commit:  0a3c9209c0

.NET Core SDKs installed:
  2.2.106 [/usr/local/share/dotnet/sdk]
  2.2.107 [/usr/local/share/dotnet/sdk]

.NET Core runtimes installed:
  Microsoft.AspNetCore.All 2.2.4 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.All 2.2.5 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
  Microsoft.AspNetCore.App 2.2.4 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.AspNetCore.App 2.2.5 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
  Microsoft.NETCore.App 2.2.4 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
  Microsoft.NETCore.App 2.2.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]

To install additional .NET Core runtimes or SDKs:
  https://aka.ms/dotnet-download

I have tried to run the server on tag v0.3a and it worked completely fine.

@kreeben - Can you please take a look at it?
I have some experience working with C# and I'm ready to help if you could guide me.

And I want to thank you for making this open source.

Implement collation

Lexicographical ordering of keys is currently achieved by adhering to the Unicode ordering of characters. This will not work for all cultures.
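
To illustrate the problem, here is a small Python sketch (not Resin code). It sorts a few words by raw Unicode code point and then by a deliberately crude key that strips accents and case. The `naive_collation_key` function is a hypothetical stand-in for a real, culture-aware collation, which would come from the platform's locale or an ICU-style library.

```python
import unicodedata

words = ["zebra", "Äpfel", "apple", "Zoo"]

# Plain sorting compares code points: 'Z' (U+005A) < 'a' (U+0061) < 'Ä' (U+00C4).
by_code_point = sorted(words)


def naive_collation_key(text):
    """Crude approximation of a collation key: drop accents, ignore case."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


by_collation = sorted(words, key=naive_collation_key)

print(by_code_point)   # ['Zoo', 'apple', 'zebra', 'Äpfel']
print(by_collation)    # ['Äpfel', 'apple', 'zebra', 'Zoo']
```

Code point order puts every uppercase ASCII letter before every lowercase one and pushes accented letters to the end, which is rarely what a reader of any language expects.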

Migrate to Core

  1. Create Core sln
  2. Add all files from v4 sln
  3. Add JSON.NET and log4net Core Nuget packages
  4. Unit testing for Core? Right now I'm using nunit 2.

Let v4 and Core solutions live in same repo.

Should be it.

Create abstraction layer for System.IO

Today we want to be able to promise that a commit is actually committed. We can do this by making the same promise to our clients as the file system makes to us. In other words, we can write a file to disk, verify that it's really there, committed and in a readable state, and then tell our client that the scope of our promise has been fulfilled.

Later, we want to make other types of promises that the file system cannot subscribe to, e.g. we want to promise that data has been persisted not only on one machine but on two, and that both machines have the data in a readable state.

Thus the need for a file system abstraction layer.
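
A rough sketch of what such a layer could look like, written in Python for brevity. The names `Store`, `LocalStore` and `ReplicatedStore` are hypothetical, and two local directories stand in for two machines. The point is only that the caller sees a single `commit` promise whose scope depends on the implementation behind it.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class Store(ABC):
    """Hypothetical abstraction over the file system."""

    @abstractmethod
    def commit(self, name: str, data: bytes) -> bool:
        """Return True only once this store's promise has been fulfilled."""


class LocalStore(Store):
    """Promises that the data is on local disk and readable."""

    def __init__(self, directory):
        self.directory = directory

    def commit(self, name, data):
        path = os.path.join(self.directory, name)
        with open(path, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())      # ask the OS to push the bytes to disk
        with open(path, "rb") as f:   # verify the file is readable and intact
            return f.read() == data


class ReplicatedStore(Store):
    """Promises that every replica holds the data in a readable state."""

    def __init__(self, replicas):
        self.replicas = replicas

    def commit(self, name, data):
        # Stops at the first replica that fails to confirm.
        return all(replica.commit(name, data) for replica in self.replicas)


def demo():
    """Commit one file to two temporary directories acting as replicas."""
    with tempfile.TemporaryDirectory() as a, tempfile.TemporaryDirectory() as b:
        store = ReplicatedStore([LocalStore(a), LocalStore(b)])
        ok = store.commit("segment.bin", b"hello")
        present = all(os.path.exists(os.path.join(d, "segment.bin")) for d in (a, b))
        return ok and present
```

Client code written against `Store` would not change when the promise is widened from one machine to two.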

[Questions] Architecture documentation

Hi

Very interesting project.

We are also looking for a good search engine for location-based data (mostly addresses). We are currently working on an Elastic-based version, but it would be great to create another prototype with this. I'm a fan of .NET Core, so I'm happy to see a project like this implementing good tech on .NET Core.
I have some questions that came to mind.

Is there a project or company behind this implementation, or is it just your free-time project?
Do you have architecture documentation? I'm very interested in how this system works, but reading the code is not the best starting point. I'm interested in the high-level architecture: how the docs and trie are scanned, how the docs and trie are held in memory or scanned on disk, etc.

Thanks

Benchmarks

Hi,

I am following Ayende's review and I am motivated to run some tests with ANTS Profiler.

What do I have to do to run some benchmarks?

Btw: I found this free profiler: http://www.getcodetrack.com/

Grouping of query statements

The query

+label:golden +label:age of porn~ +genre:documentaries

should be rewritten by the query parser to

+label:golden +(label:age~ label:of~ label:porn~) +genre:documentaries
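
One way the rewrite could be sketched, in Python and under the assumption that a clause value runs until the next `key:` token. The function name `group_phrases` and the regular expression are illustrative only, not Resin's parser.

```python
import re

# operator, key, then a value that runs until the next "key:" token or end of input
CLAUSE = re.compile(r"([+-]?)(\w+):(.*?)(?=\s+[+-]?\w+:|$)")


def group_phrases(query):
    """Rewrite multi-word clause values into a nested group of single-word clauses."""
    rewritten = []
    for op, key, value in CLAUSE.findall(query):
        fuzzy = value.endswith("~")
        words = value.rstrip("~").split()
        if len(words) > 1:
            suffix = "~" if fuzzy else ""
            inner = " ".join(f"{key}:{word}{suffix}" for word in words)
            rewritten.append(f"{op}({inner})")
        else:
            rewritten.append(f"{op}{key}:{value}")
    return " ".join(rewritten)


print(group_phrases("+title:rain man~ -genre:drama"))
# +(title:rain~ title:man~) -genre:drama
```

Single-word clauses pass through unchanged, and the fuzzy marker on a phrase is distributed to each word inside the group.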

GetTicks/GetNextChronologicalFileId question

I was looking over your commits and noticed your changes to GetTicks() in 139da03.

Since GetNextChronologicalFileId() is the sole consumer of GetTicks(), I am trying to understand what you are trying to do with it. Before, it was just a wrapper for DateTime.Now.Ticks, but now it is an incrementing number. The implementation of GetNext() is also not thread-safe, since Random isn't a thread-safe type and Ticks++ is not guaranteed to be atomic. In fact, I'm not sure what the use of Random is besides introducing a delay in the function.

Based on the name of GetNextChronologicalFileId() and the commit message I am assuming it is intended to be:

  • Strictly increasing in return value
  • Safe to call simultaneously from multiple threads

Are there any other rules that this function must follow?
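
For reference, the two assumed rules can be met with very little code. The sketch below is in Python, and `ChronologicalIdGenerator` is a hypothetical name, not the project's implementation. A lock guards one shared value. Each call takes the clock reading when the clock has advanced, and otherwise steps one past the last id handed out.

```python
import threading
import time


class ChronologicalIdGenerator:
    """Hypothetical stand-in for GetNextChronologicalFileId()."""

    def __init__(self, clock=time.time_ns):
        self._clock = clock           # any callable returning an integer timestamp
        self._last = 0
        self._lock = threading.Lock()

    def next_id(self):
        with self._lock:
            # Never return the same or a smaller value, even if the clock
            # stalls or two threads read the same timestamp.
            self._last = max(self._clock(), self._last + 1)
            return self._last
```

In C#, a `lock` statement around the same two lines would give the same guarantees, and no Random or artificial delay is needed.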

Version for .NETFramework,Version=v4.5.2

Severity: Error
Description: Could not install package 'ResinDB 2.0.3'. You are trying to install this package into a project that targets '.NETFramework,Version=v4.5.2', but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author.
Line: 0

Implement "concept" as a first-class citizen along-side Term.

The purpose of a concept is to give meaning to a word or cluster of words so that an aggregated concept can be built that describes either a paragraph or the document in its entirety.

In a corpus there are always fewer concepts than there are terms. Therefore, if you could compare concepts in vector space instead of terms, you would gain in querying speed.

In order to give new meaning to a word or cluster of words, more information has to be added to the equation than just the words.

It's a good thing, then, that concepts may be extracted from the context in which a word or sentence lives, the context being the words or sentences that surround it.

Sounds like fun, right?

Implement term count as DocumentPosting<int>

Let each trie node carry a Data<T> field where T can be any class or struct. Return this via a Word to the Collector and include it in the scorable DocumentPosting. Data.TermCount is where the data that supports the tf-idf scoring model goes. Data.Value is where your custom data goes. Only EndOfWord nodes carry data.
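
A small Python sketch of the shape described here. The class and member names mirror the issue text loosely (`Data`, `term_count`, `value`), but the code is illustrative and leaves out the Word, Collector and DocumentPosting parts. It also assumes that adding the same word again increments its count.

```python
from dataclasses import dataclass, field
from typing import Dict, Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass
class Data(Generic[T]):
    term_count: int = 0          # supports tf-idf style scoring
    value: Optional[T] = None    # caller-defined payload


@dataclass
class TrieNode(Generic[T]):
    children: Dict[str, "TrieNode[T]"] = field(default_factory=dict)
    data: Optional[Data[T]] = None   # set only on end-of-word nodes

    @property
    def end_of_word(self):
        return self.data is not None


class Trie(Generic[T]):
    def __init__(self):
        self.root = TrieNode()

    def add(self, word, value=None):
        """Insert a word, counting repeats and optionally attaching a payload."""
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        if node.data is None:
            node.data = Data()
        node.data.term_count += 1
        if value is not None:
            node.data.value = value

    def find(self, word):
        """Return the Data of an end-of-word node, or None for prefixes and misses."""
        node = self.root
        for ch in word:
            node = node.children.get(ch)
            if node is None:
                return None
        return node.data
```

Interior nodes stay lightweight because they hold no `Data` at all, so a lookup for a bare prefix such as "res" returns nothing even when "resin" and "rest" are stored.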

About LazyTrie Object

Hi,
I can't build it because I can't find any LazyTrie object or reference in the solution.
