slamby / slamby-api

Slamby API under AGPL-3.0 license

Home Page: https://www.slamby.com/api

License: GNU Affero General Public License v3.0

C# 94.94% PowerShell 3.36% Shell 1.10% Nginx 0.30% CSS 0.03% JavaScript 0.01% XSLT 0.27%
slamby-api docker redis nginx elasticsearch text-classification matchmaker search search-engine nlp

slamby-api's Introduction

Slamby API

Slamby introduces Slamby Server (API). Build a powerful data management service to store and analyze your data.

Product Documentation

Check out our API documentation.

Installation with Docker

Slamby API can be found on Docker Hub.

With Docker Compose (recommended)

Slamby API depends on Elasticsearch, Redis and (recommended) Nginx, so the easiest way to run it is with Docker Compose.

We provide a ready-made Docker Compose file for easy installation.

Steps

  1. Install Docker on your machine: Official Docker installation guide

  2. Install Docker Compose on your machine (minimum 1.9.0 required): Official Docker Compose installation guide

  3. Download our Docker Compose file

$ curl -L "https://github.com/slamby/slamby-api/releases/download/v1.7.2/docker-compose.yml" > docker-compose.yml

  4. Compose the containers (run next to the compose file)

$ docker-compose -p slamby up -d

  5. Your server is installed. Check that it is working correctly

$ curl localhost

{
  "Name": "Slamby.API",
  "Version": "1.7.2",
  "InstanceId": "817021ac-cc23-4473-b203-5083c3e7e00e",
  "Information": "https://developers.slamby.com"
}

  6. Open the setup page in a browser (http://localhost/setup) and follow the instructions

    During the setup you need to:

    • Request a Slamby License,
    • Copy your Slamby API License,
    • Set the secret (password) for your Slamby API
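
The `curl localhost` check in the steps above can also be scripted. The following is a minimal sketch, not part of Slamby: `check_info` is a hypothetical helper that only inspects the `Name` and `Version` fields shown in the sample response.

```python
import json


def check_info(body, expected_version=None):
    """Return True if body looks like the Slamby API info document."""
    info = json.loads(body)
    if info.get("Name") != "Slamby.API":
        return False
    if expected_version is not None and info.get("Version") != expected_version:
        return False
    return True


# Sample response copied from the installation steps.
sample = (
    '{"Name": "Slamby.API", "Version": "1.7.2", '
    '"InstanceId": "817021ac-cc23-4473-b203-5083c3e7e00e", '
    '"Information": "https://developers.slamby.com"}'
)
print(check_info(sample, "1.7.2"))
```

To check a live server, fetch the body of http://localhost/ (for example with `urllib.request.urlopen`) and pass the decoded text to `check_info`.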

With Docker (advanced)

You can run the Slamby API server without Docker Compose, but you must provide its prerequisites yourself. Settings are passed to the Slamby API server through environment variables (named like SlambyApi__...). Note that if you run it in a container, you have to set the environment variables on the container, not on the host. If your operating system allows : in environment variable names, then you have to use : instead of __.
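
As an illustration of the naming rule, here is a small sketch that builds a variable name from the nested setting path. `env_name` and its `colon_allowed` flag are hypothetical names for this example, not part of Slamby.

```python
def env_name(*segments, colon_allowed=False):
    """Join setting path segments with '__', or ':' where the OS allows it."""
    separator = ":" if colon_allowed else "__"
    return separator.join(segments)


print(env_name("SlambyApi", "Redis", "Enabled"))
print(env_name("SlambyApi", "Redis", "Enabled", colon_allowed=True))
```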

Prerequisites

Elasticsearch

Slamby API uses Elasticsearch as its data storage system. You can use your own instance or cluster. The recommended version is 2.3. It has to be empty (no indices), and installing the mapper-attachments plugin is recommended. Set the Elasticsearch URL in SlambyApi__ElasticSearch__Uris__0 (e.g. http://elasticsearchserver:9200/). If you have a cluster with multiple endpoints, set each endpoint in the SlambyApi__ElasticSearch__Uris__0, SlambyApi__ElasticSearch__Uris__1, SlambyApi__ElasticSearch__Uris__2, etc. environment variables.
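
For a cluster, the indexed variable names can be generated rather than typed by hand. This is a sketch only; `elasticsearch_env` is a hypothetical helper and the host names `es1` and `es2` are placeholders.

```python
def elasticsearch_env(uris):
    """Map each endpoint to SlambyApi__ElasticSearch__Uris__0, __1, __2, ..."""
    return {
        f"SlambyApi__ElasticSearch__Uris__{index}": uri
        for index, uri in enumerate(uris)
    }


# Placeholder endpoints, replace with your own cluster nodes.
for name, value in elasticsearch_env(["http://es1:9200/", "http://es2:9200/"]).items():
    print(f"{name}={value}")
```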

Redis

Slamby API uses Redis for preindexing and for saving some metrics. Set the Redis connection string in SlambyApi__Redis__Configuration.
You can also disable Redis by setting SlambyApi__Redis__Enabled to false (note that in that case you can't use some features, such as PRC preindexing).

Nginx

Slamby API uses .NET Core and Kestrel under the hood. It is recommended to run Nginx in front of it. We have a preconfigured Nginx image on Docker Hub. Using it is recommended, but you can use your own Nginx server.

Slamby directory

Create a directory on the host computer for the persistent Slamby API files.

Installation

Pull the image from Docker Hub

docker pull slamby/slamby.api:1.7.2

Run the container with settings

docker run -d \
  --name slamby_api \
  -p 5000:5000 \
  -v /yourDataDirectory:/Slamby \
  slamby/slamby.api:1.7.2

Slamby API uses port 5000 by default, but you can bind it to any port on your Docker host.
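
The following sketch assembles the same `docker run` command with a different host port and with settings passed as `-e` flags. `docker_run_args` is a hypothetical helper, and the secret and data directory values are placeholders.

```python
def docker_run_args(host_port, data_dir, env=None, image="slamby/slamby.api:1.7.2"):
    """Build the argument list for the docker run command shown above."""
    args = [
        "docker", "run", "-d",
        "--name", "slamby_api",
        "-p", f"{host_port}:5000",   # host port -> container port 5000
        "-v", f"{data_dir}:/Slamby",  # persistent Slamby directory
    ]
    for name, value in (env or {}).items():
        args += ["-e", f"{name}={value}"]  # settings go on the container
    args.append(image)
    return args


# Placeholder values for illustration only.
print(" ".join(docker_run_args(8080, "/yourDataDirectory", {"SlambyApi__ApiSecret": "change-me"})))
```

The returned list can be handed to `subprocess.run(args, check=True)` on a machine where Docker is installed.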

Settings

You can override the settings with environment variables. Please note that if your operating system allows : in environment variable names, then you have to use : instead of __.

Here is a list of the most important settings. You can find all the settings in the appsettings.json file.

SlambyApi__ApiSecret

Default value: s3cr3t

This is the secret for your API. You have to use this to authenticate your requests.

SlambyApi__BaseUrlPrefix

It's empty by default.

If you are using the API behind a reverse proxy, then you have to set this value, because in that case the hostname won't be accurate. The API will put the HTTP host of the request after it.

SlambyApi__ElasticSearch__Uris__NUMBER

Note that this is an array configuration value, so you have to put 0, 1, 2... in place of NUMBER.

There is one default entry, SlambyApi__ElasticSearch__Uris__0, with the default value http://elasticsearch:9200/

SlambyApi__Serilog__Output

Default value: /Slamby/Logs

The output directory of the log files.

SlambyApi__Serilog__MinimumLevel

Default value: Information

The minimum log level.

SlambyApi__Redis__Configuration

Default value: redis,abortConnect=false,ssl=false,syncTimeout=30000

The connection string for the Redis server.

SlambyApi__Parallel__ConcurrentTasksLimit

Default value: 0

The maximum number of threads used in each operation. If it's 0, the API uses the number of cores * 2 for the best performance. Tip: you can also limit it per request in a request header. See the API documentation.
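
The rule for the value 0 can be written out as a short sketch. This only restates the description above and is not the API's actual code; `effective_task_limit` is a hypothetical name.

```python
import os


def effective_task_limit(configured, cores=None):
    """Return the thread limit: cores * 2 when configured is 0, else configured."""
    if cores is None:
        cores = os.cpu_count() or 1  # cpu_count() may return None
    return cores * 2 if configured == 0 else configured


print(effective_task_limit(0, cores=4))
print(effective_task_limit(3, cores=4))
```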

SlambyApi__RequestsLimiting__MaxConcurrentRequests

Default value: 50

With this setting you can set the maximum number of concurrent requests. If there are more concurrent requests than this number, the API will respond with HTTP status code 503 (Service Unavailable).
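
A client can treat the 503 response as a signal to back off and retry. The sketch below is one possible client-side approach, not something the API prescribes; `call_with_retry` and its parameters are hypothetical, and `send` stands for any function that performs the request and returns the HTTP status code.

```python
import time


def call_with_retry(send, retries=3, delay=1.0, sleep=time.sleep):
    """Call send() again while it returns 503, with exponential backoff."""
    status = send()
    for attempt in range(retries):
        if status != 503:
            break
        sleep(delay * (2 ** attempt))  # wait delay, 2*delay, 4*delay, ...
        status = send()
    return status


# Simulated server that is busy twice and then accepts the request.
responses = iter([503, 503, 200])
print(call_with_retry(lambda: next(responses), delay=0.01))
```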

Issues

We use GitHub issues to track public bugs. Please ensure your description is clear and has sufficient instructions to be able to reproduce the issue.

Contributing

Please check our contribution guide here

License

This project is licensed under the GNU Affero General Public License version 3.0.

For commercial use please contact us at [email protected] and purchase a commercial license.

Contact

If you have any questions please visit our community group or write an email to us at [email protected]

slamby-api's People

Contributors

attilaersek

slamby-api's Issues

SearchFieldList validation problem

The API validates whether a given field is a valid field in the dataset.
However, the validation fails for boosted fields, e.g. title^2, even though (in this case) title is a valid field.
Validation should also allow boosted fields.
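
One way to get the requested behaviour is to strip the boost suffix before looking the field up. This is an illustrative sketch in Python, not the API's implementation; `base_field` and `invalid_fields` are hypothetical names.

```python
def base_field(field):
    """Strip an optional ^boost suffix, e.g. 'title^2' -> 'title'."""
    name, caret, boost = field.partition("^")
    if caret:
        float(boost)  # raises ValueError if the boost is not a number
    return name


def invalid_fields(search_fields, dataset_fields):
    """Return the search fields whose base name is not in the dataset."""
    return [f for f in search_fields if base_field(f) not in dataset_fields]


print(invalid_fields(["title^2", "body", "unknown^3"], {"title", "body"}))
```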

Service prepare double call

If a service prepare (or activate) is called twice within a short time, the process can start twice.

Dataset groups

Grouping datasets by labels. Create labels and assign datasets to labels. Display datasets grouping by labels via Insight. Visual help for better dataset management.

concurrent requests limit

Basic research about the maximum number of connections. What happens when 1000 requests hit the API server? Are there any settings?

Storage capacity check

Make a middleware or action filter: if the HDD does not have enough capacity, then some requests (document add, document bulk add, tag add, service prepare, etc.) should return a human-readable error message.
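
The core of such a check is a free-space comparison. The sketch below shows the idea in Python with the standard library; the API itself is written in C#, so this is only an illustration, and `has_capacity` and the 1 GiB threshold are assumptions.

```python
import shutil


def has_capacity(path, required_bytes):
    """Return True if the filesystem holding path has enough free space."""
    return shutil.disk_usage(path).free >= required_bytes


# Example: require 1 GiB of free space before accepting a write request.
if not has_capacity(".", 1024 ** 3):
    print("Not enough disk space to process the request.")
```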

Filters with invalid Query can cause 500

Wherever the Query property is used, the API passes the query as is to Elasticsearch, so it can fail. But the API does not return the error message from Elasticsearch (which is usually human readable); it just returns a 500.

It might be better to return something like "QueryParserException" together with the error message from Elasticsearch.

Endpoint statistics

The API stores the endpoint statistics (hit numbers) in Redis.
There could be an endpoint that shows these statistics to the user.

Processes 30 days list + all time

The process list returns all the processes from the last 30 days by default. There is a new settings option to set the process time interval.

Search engine service (MVP)

Smart search engine as a service. Optional service pairing + smart result. Elasticsearch with improved settings + smarter search using our technology + related services such as PRC for keyword extraction + related products + classifier service suggesting categories.

Features:

  • Search
  • Classifier Service Integration
  • Service with create, prepare and activate functions
  • Typo auto fix
  • Activation settings as default
  • Settings: autoCompleteSettings, SearchSettings, ClassifierSettings
  • SearchSettings: filter, weights, responseFieldList, searchFieldList
  • Classifier result validation flag: searchResultMatch field: true, false.
  • Search history save
  • useDefaultFilter for get search
  • useDefaultWeights
  • Order settings (by field, asc, desc)
  • Total search count
  • Highlight
  • Auto Complete

High Availability (HA) feature

Let the user create multiple API instances (mainly 2) and make them work together in a cluster. The main goal is high availability.

Log management

Manage logs into files, and make them available via file manager. + log settings with log levels.

Test

Document / Copy

Document / Bulk add

Document / Filter

Document / Sample

  • Sample test with percentage for status code
  • Sample test with percentage for size check
  • Sample test with fixed size for status code
  • Sample test with fixed size for size check
  • Sample test for different fields variation
  • Sample test for tag field test. Status code and size check

Statistics / Get

Process / Get / Get inactive

Resources / Get

Services / Get all services

File Manager (MVP)

Storing and managing files on the Slamby API server. List files and folders, create folders, delete files and folders, move and copy files, download files.

Twister QC

Quality measurement during/after the training process. Top result measurement, + preciseness measurement. Integrating preciseness value in classifier service recommend. Visualize result data.

PRC keywords tag id is not required

When there is a new PRC request, the tagId field is not required anymore. When tagId is empty, PRC predicts the most suitable tagId and uses it.

Search Service activation error.

Server: Skye

Api version: 1.4.17030.02
Insight version: 1.3.17026.01

I created two new services, one classifier and one search service.

After activating the classifier service, I tried to activate the search service. The search service settings contain the related classifier service settings as well. A fatal error occurs.

Classifier Service parent tag filter problem

If the ParentIdList contains a tag that is not prepared (which is a usual case) then it says:
"There is at least one parent tag in the ParentTagIdList which is not a parent of any activated tags!"

delete processes?

A user can have hundreds or thousands of processes after a while, and getting all of them can be a big (and slow) response.
Maybe a "delete all processes" endpoint would be useful.

PRC recommendation across datasets

When we create a PRC service on dataset A, the service can only recommend documents from dataset A. It would be useful if the service could recommend documents across datasets.

HA (High Availability) research

Basic market research about the available SLA solutions. Preparation for SLA capability built into the Slamby API server.

Estimated time need: 3 days.
