kreeben / resin
Vector space search engine. Available as an HTTP service or as an embedded library.
License: MIT License
Regarding Resin's QL: a plus sign means "AND", a space means "OR", and a minus sign means "NOT".
The QL currently doesn't allow for grouping/nesting. We need nesting to be able to rewrite this fuzzy query of two terms:
+title:religion +body:jesus~
into these three terms:
+title:religion +(body:jesus body:jesuz)
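A minimal Python sketch of that expansion (the C# parser would do the equivalent); `near_matches` is a hypothetical lookup standing in for the index's fuzzy neighborhood scan:

```python
def expand_fuzzy(field, term, near_matches):
    """Rewrite field:term~ into a nested OR group of the term's
    near matches, e.g. +body:jesus~ -> +(body:jesus body:jesuz)."""
    variants = [term] + [m for m in near_matches(term) if m != term]
    return "+(" + " ".join(f"{field}:{v}" for v in variants) + ")"

# Hypothetical neighborhood lookup standing in for the index scan:
print(expand_fuzzy("body", "jesus", lambda t: ["jesus", "jesuz"]))
# +(body:jesus body:jesuz)
```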
Let each term (query clause) be a node in a doubly-linked list, where left means down (one nesting level deeper) and right means forward (the next clause on the same level). The depth of a node then represents its nesting level.
Re-use Query when mapping across multiple ReadSessions.
The root of the tree should sit in the center so that the tree is split in half on the first traversal step.
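A Python sketch of that node shape (names are illustrative, not Resin's actual types): left points one nesting level down, right points to the next sibling clause, and depth falls out of how many left links you follow.

```python
class QueryNode:
    """One query clause. left = down (first clause of a nested
    group); right = forward (next clause on the same level)."""
    def __init__(self, field=None, term=None):
        self.field, self.term = field, term
        self.left = None
        self.right = None

def max_depth(node, level=1):
    """Deepest nesting level reachable from this node."""
    if node is None:
        return level - 1
    return max(max_depth(node.left, level + 1),
               max_depth(node.right, level))

# Build +title:religion +(body:jesus body:jesuz):
root = QueryNode("title", "religion")
group = QueryNode()                      # anonymous grouping node
group.left = QueryNode("body", "jesus")  # down into the group
group.left.right = QueryNode("body", "jesuz")
root.right = group                       # forward to the next clause
```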
I'm getting the following error while trying to run the HTTP server on macOS.
Application startup exception: System.PlatformNotSupportedException: The named version of this synchronization primitive is not supported on this platform.
at System.Threading.Semaphore.CreateSemaphore(Int32 initialCount, Int32 maximumCount, String name)
at System.Threading.Semaphore..ctor(Int32 initialCount, Int32 maximumCount, String name, Boolean& createdNew)
at Sir.Store.SessionFactory..ctor(ITokenizer tokenizer, IConfigurationProvider config) in /Users/kshitij/github/resin/src/Sir.Store/Session/SessionFactory.cs:line 36
at Sir.Store.Start.OnApplicationStartup(IServiceCollection services, ServiceProvider serviceProvider, IConfigurationProvider config) in /Users/kshitij/github/resin/src/Sir.Store/Start.cs:line 16
at Sir.HttpServer.ServiceConfiguration.Configure(IServiceCollection services) in /Users/kshitij/github/resin/src/Sir.HttpServer/ServiceConfiguration.cs:line 66
at Sir.HttpServer.Startup.ConfigureServices(IServiceCollection services) in /Users/kshitij/github/resin/src/Sir.HttpServer/Startup.cs:line 26
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.AspNetCore.Hosting.ConventionBasedStartup.ConfigureServices(IServiceCollection services)
at Microsoft.AspNetCore.Hosting.Internal.WebHost.EnsureApplicationServices()
at Microsoft.AspNetCore.Hosting.Internal.WebHost.Initialize()
--- End of stack trace from previous location where exception was thrown ---
at Microsoft.AspNetCore.Hosting.Internal.WebHost.BuildApplication()
Dotnet version
01:13 $ dotnet --info
.NET Core SDK (reflecting any global.json):
Version: 2.2.107
Commit: 2212cac826
Runtime Environment:
OS Name: Mac OS X
OS Version: 10.14
OS Platform: Darwin
RID: osx.10.14-x64
Base Path: /usr/local/share/dotnet/sdk/2.2.107/
Host (useful for support):
Version: 2.2.5
Commit: 0a3c9209c0
.NET Core SDKs installed:
2.2.106 [/usr/local/share/dotnet/sdk]
2.2.107 [/usr/local/share/dotnet/sdk]
.NET Core runtimes installed:
Microsoft.AspNetCore.All 2.2.4 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
Microsoft.AspNetCore.All 2.2.5 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.All]
Microsoft.AspNetCore.App 2.2.4 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.AspNetCore.App 2.2.5 [/usr/local/share/dotnet/shared/Microsoft.AspNetCore.App]
Microsoft.NETCore.App 2.2.4 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
Microsoft.NETCore.App 2.2.5 [/usr/local/share/dotnet/shared/Microsoft.NETCore.App]
To install additional .NET Core runtimes or SDKs:
https://aka.ms/dotnet-download
I tried running the server on tag v0.3a and it worked fine.
@kreeben - can you please take a look?
I have some experience with C# and I'm ready to help if you can guide me.
And I want to thank you for making this open source.
This includes readers, writers and tests.
see https://github.com/kreeben/resin/blob/master/src/ResinCore/Field.cs#L30
Shouldn't the DateTime value be stored as UTC?
P.S. Numeric values are also stored using the culture-specific ToString().
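For illustration, a Python sketch of the culture-invariant behaviour the issue is asking for: normalize timestamps to UTC before writing, and use an invariant numeric format rather than a culture-specific ToString() (which would emit a decimal comma in some locales).

```python
from datetime import datetime, timezone

def serialize_timestamp(dt):
    """Store timestamps normalized to UTC in ISO 8601, so readers
    don't depend on the writer's local offset."""
    return dt.astimezone(timezone.utc).isoformat()

def serialize_number(value):
    # repr() is culture-invariant: always '.' as the decimal
    # separator, regardless of the host locale.
    return repr(value)

print(serialize_timestamp(datetime(2020, 1, 1, tzinfo=timezone.utc)))
# 2020-01-01T00:00:00+00:00
print(serialize_number(3.14))  # 3.14
```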
So that readers never have to know beforehand whether an index was compressed or not.
Lexicographical ordering of keys is currently achieved by adhering to the Unicode ordering of characters. This will not work for all cultures.
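A toy Python illustration of the mismatch: raw code-point order misplaces accented letters for some cultures, while even a greatly simplified collation key (fold accents for the primary comparison, raw string as tiebreaker) fixes this German-style example. Real collation would use the platform's culture tables.

```python
# Plain code-point order puts 'ä' (U+00E4) after 'z' (U+007A),
# which is wrong for e.g. German, where 'ä' collates next to 'a'.
words = ["zebra", "äpfel", "apfel"]
print(sorted(words))                     # ['apfel', 'zebra', 'äpfel']

# Toy two-level collation key: fold accented letters for the
# primary comparison, fall back to the raw string as a tiebreaker.
FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def collation_key(word):
    return (word.translate(FOLD), word)

print(sorted(words, key=collation_key))  # ['apfel', 'äpfel', 'zebra']
```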
Let there be an IDocumentReader and an IDocumentWriter for all document operations so that the storage engine becomes pluggable.
Let the v4 and Core solutions live in the same repo.
Should be it.
Can we expect faceting features like in Solr?
The GET request seems to fail with a timeout.
You'll find ' ', ':', '~' and more hard-coded in the code. Mostly the query parsing does this.
Let the analyzer split long strings to make tries shallow.
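One way an analyzer might do this, sketched in Python: chunk any token longer than a cap so that no single insertion drives the trie deeper than the cap. (The cap value here is an arbitrary assumption, not a value from the codebase.)

```python
MAX_TOKEN_LEN = 8  # assumed cap; a real value would be tuned

def split_long(token, limit=MAX_TOKEN_LEN):
    """Break an over-long token into fixed-size chunks so the
    trie's depth is bounded by `limit`."""
    return [token[i:i + limit] for i in range(0, len(token), limit)]

print(split_long("internationalization"))
# ['internat', 'ionaliza', 'tion']
```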
Today we want to be able to promise that a commit is actually committed. We can do this by making the same promise to our clients as the file system makes to us. In other words, we can write a file to disk, verify it is really there, committed and in a readable state, and then tell our client that the scope of our promise has been fulfilled.
Later, we want to make other types of promises that the file system cannot subscribe to, e.g. we want to promise that data has been persisted not only on one machine but on two, and that both machines have the data in a readable state.
Thus the need for a file system abstraction layer.
.doc is already used by Microsoft Word. Maybe use a different extension? .rsin?
There is a delete by pk operation that might be interesting to have a look at. If one resolves a term into primary keys, one could reuse the existing delete operation.
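The idea, sketched in Python with plain dicts standing in for the index and the document store; the real implementation would route through Resin's existing delete-by-pk operation rather than a `dict.pop`.

```python
def delete_by_term(index, documents, field, term):
    """Resolve a term into primary keys via the index, then reuse
    the existing delete-by-pk path (here: dict removal) per hit."""
    pks = index.get((field, term), set())
    for pk in list(pks):
        documents.pop(pk, None)  # stands in for delete-by-pk
    return pks

index = {("title", "religion"): {1, 3}}
docs = {1: "doc one", 2: "doc two", 3: "doc three"}
print(delete_by_term(index, docs, "title", "religion"))  # {1, 3}
print(docs)  # {2: 'doc two'}
```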
Please replace log4net as a dependency with https://www.nuget.org/packages/Microsoft.Extensions.Logging.Abstractions
That allows you to use the logger of your application instead of hard-depending on a specific logger. It's up to the application to configure logging, not the library.
Hi
Very interesting project.
We are also looking for a good search engine for location-based data (mostly addresses). We are currently working on an Elastic-based version, but it would be great to create another prototype with this. I'm a fan of .NET Core, so I'm happy to see a project like this implementing good tech on .NET Core.
I have some questions that came to mind.
Is there any project or company behind this implementation, or is it just your free-time project?
Do you have architecture documentation? I'm very interested in how this system works, but reading the code is not the best starting point. I'm interested in the high-level architecture: how the docs and the trie are scanned, and whether they are held in memory or scanned from disk, etc.
Thanks
Hi,
I am following Ayende's review and I am motivated to run some tests with ANTS Profiler.
What do I have to do to run some benchmarks?
Btw: I found this free profiler: http://www.getcodetrack.com/
It's just messy now. Make it pretty wherever there is an analyzer.
The query
+label:golden +label:age of porn~ +genre:documentaries
should be rewritten by the query parser to
+label:golden +(label:age~ label:of~ label:porn~) +genre:documentaries
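A Python sketch of that parser rewrite: strip the trailing fuzzy marker from the phrase, then emit a nested OR group where every word carries its own marker.

```python
def rewrite_fuzzy_phrase(field, phrase):
    """Turn 'field:w1 w2 w3~' into a nested OR group where each
    word is fuzzy: '+(field:w1~ field:w2~ field:w3~)'."""
    words = phrase.rstrip("~").split()
    return "+(" + " ".join(f"{field}:{w}~" for w in words) + ")"

print(rewrite_fuzzy_phrase("label", "age of porn~"))
# +(label:age~ label:of~ label:porn~)
```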
Is it planned to make the Levenshtein pluggable and for example replace it with Trigram?
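Not a maintainer answer, but for reference, trigram similarity is small enough to sketch in a few lines of Python; a pluggable similarity seam could swap something like this in for Levenshtein. (The padding scheme and Jaccard scoring here are one common choice, not anything from Resin.)

```python
def trigrams(word):
    padded = f"  {word} "  # pad so word edges form trigrams too
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def trigram_similarity(a, b):
    """Jaccard overlap of character trigrams: 1.0 for identical
    strings, 0.0 for strings sharing no trigrams."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

print(trigram_similarity("jesus", "jesuz"))  # 0.5
print(trigram_similarity("jesus", "moses"))  # 0.0
```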
Prohibit the first statement in a clause from being a "not" statement. "Or" and "and" statements are allowed.
Remove old documents. Re-write indices.
I was looking over your commits and noticed your changes to GetTicks() in 139da03.
Since GetNextChronologicalFileId() is the sole consumer of GetTicks(), I am trying to understand what you are trying to do with it. Before, it was just a wrapper for DateTime.Now.Ticks, but now it is an incrementing number.
The implementation of GetNext() is also not thread-safe, since Random isn't a thread-safe type and the Ticks++ is not guaranteed to be atomic. In fact, I'm not sure what the use of Random is besides introducing a delay into the function.
Based on the name of GetNextChronologicalFileId() and the commit message, I am assuming it is intended to produce a unique, chronologically increasing file id.
Are there any other rules that this function must follow?
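For comparison, here is a Python sketch of what a thread-safe version of that contract could look like: ids seeded from the clock, strictly increasing, with a lock making the read-increment-write atomic. (This is my reading of the intent, not the actual implementation.)

```python
import threading
import time

class ChronologicalId:
    """Monotonically increasing id seeded from the clock. The lock
    makes read-increment-write atomic, which a bare `Ticks++` on a
    shared field is not."""
    def __init__(self):
        self._lock = threading.Lock()
        self._last = time.monotonic_ns()

    def next(self):
        with self._lock:
            now = time.monotonic_ns()
            # never go backwards, even if the clock stalls
            self._last = max(self._last + 1, now)
            return self._last
```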
Severity: Error
Description: Could not install package 'ResinDB 2.0.3'. You are trying to install this package into a project that targets '.NETFramework,Version=v4.5.2', but the package does not contain any assembly references or content files that are compatible with that framework. For more information, contact the package author.
The purpose of a concept is to give meaning to a word or cluster of words so that an aggregated concept can be built that describe either a paragraph or the document in its entirety.
In a corpus there are always fewer concepts than there are terms. Therefore, if you could compare concepts in vector space instead of terms, you would gain in querying speed.
In order to give new meaning to a word or cluster of words, more information has to be added to the equation than just the words.
It's a good thing, then, that concepts may be extracted from the context in which a word or sentence lives, the context being the words or sentences that surround it.
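As a toy illustration (the issue doesn't prescribe an aggregation; plain averaging is assumed here), a concept vector can be built by averaging the vectors of the words in a cluster, and a query can then be compared against the fewer concept vectors instead of every term vector:

```python
def concept_vector(vectors):
    """Average a cluster of term vectors into one concept vector.
    Comparing against the (fewer) concepts is cheaper than
    comparing against every term vector."""
    dims = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors)
            for i in range(dims)]

# Two toy term vectors collapsing into one concept:
print(concept_vector([[1, 0], [0, 1]]))  # [0.5, 0.5]
```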
Sounds like fun, right?
Store term positions at indexing time. At scoring time, multiply the weight of a term in a phrase query by a factor inversely proportional to its distance from the predecessor term.
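A Python sketch of one way to apply that rule at scoring time, assuming nearer terms should score higher: damp each term's weight by its positional distance to the previous phrase term, so adjacent terms keep full weight.

```python
def proximity_score(weights, positions):
    """Sum term weights, scaling each term after the first by
    1/distance to its predecessor: adjacent terms keep full
    weight, distant ones are damped."""
    score = weights[0]
    for i in range(1, len(weights)):
        distance = abs(positions[i] - positions[i - 1])
        score += weights[i] / max(distance, 1)
    return score

print(proximity_score([1.0, 1.0], [5, 6]))   # adjacent -> 2.0
print(proximity_score([1.0, 1.0], [5, 10]))  # 5 apart  -> 1.2
```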
Using these terms: def con 26 badge ama
the web page I wanted was this: https://www.reddit.com/r/Defcon/comments/973jik/dc26_official_badge_hardware_ama/
(This is just a real life example...)
Google returns that page for those terms as the second result. DuckDuckGo returns the parent page, but not the desired page.
DidYouGoGo.com doesn't return anything related.
Let each trie node carry a Data&lt;T&gt; field where T can be any class or struct. Return this via a Word to the Collector and include it in the scorable DocumentPosting. Data.TermCount holds the data that supports the tf-idf scoring model; Data.Value holds your custom data. Only EndOfWord nodes carry data.
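A Python sketch of the shape (dicts for child pointers; the payload dict stands in for the generic Data field with its TermCount/Value members):

```python
class TrieNode:
    """Trie node with an optional payload. Only end-of-word nodes
    populate `data`, mirroring a per-term Data field whose
    term_count feeds tf-idf and whose value holds custom data."""
    def __init__(self):
        self.children = {}
        self.end_of_word = False
        self.data = None  # e.g. {"term_count": n, "value": ...}

def insert(root, word, data):
    node = root
    for ch in word:
        node = node.children.setdefault(ch, TrieNode())
    node.end_of_word = True
    node.data = data

def lookup(root, word):
    """Return the payload for a complete word, else None."""
    node = root
    for ch in word:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.data if node.end_of_word else None

root = TrieNode()
insert(root, "jesus", {"term_count": 3})
print(lookup(root, "jesus"))  # {'term_count': 3}
print(lookup(root, "jes"))    # None (prefix, not end-of-word)
```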
Unable to inject plugged in store into MergeOperation ctor otherwise.
Development has stopped?
Hi,
I can't build it, because I don't find any LazyTrie object or reference in the solution.