Coder Social home page Coder Social logo

ptt-search-engine's Introduction

Ptt_SearchEngineWeb

Introduction

Build a search engine include all articles and commands crawl from Ptt(web forum), Searching content by keyword or by user’s id observe what articles they replied or post. This Repo only conclude Wep UI part. ptt: https://en.wikipedia.org/wiki/PTT_Bulletin_Board_System Demo-Link

Build a Web application using Flask and Vuejs

Use the vue front-end framework to develop a single-page application (SPA), and then hand over the built front-end interface to Flask as the server to process Request and the API for search commands under Elasticsearch. This note will be divided into two parts: Vue.js and Flask.

see VueJS README in vueptt directory.

Architecture diagram

For basic concepts of web application and routing, please refer to: SPA & Router

Flask

Installing

Install Flask and the Flask-CORS extension

(env)$ pip install Flask Flask-Cors

Under background processing (save Session, you can useTmux),Execute main.py to start the service on the server.

//Flask directory
tmux
<Ctrl+b> + s//select Session 
python app.py

Query API

Handle how to process the search conditions sent by the front end into Query and send it to ElasticSearch.

Introduce the Elasticsearch suite in main.py, and send a request to the database to return the data.

The environment parameters are used here to have more flexibility in changing the database domain and password. It is brought in by the Docker command when deploying, or you can directly enter the connection format of your ES service (as noted below).

Link to ES(database) settings:

from elasticsearch import Elasticsearch, RequestsHttpConnection
es = Elasticsearch(os.environ['ES'],connection_class=RequestsHttpConnection, use_ssl=True,verify_certs=False,send_get_body_as='POST' )
//ES is the variabe of ['http://elastic:<key>@<Domain>']

Search processing example (retrieved by id)

The API format is as follows. After the "?" symbol is the parameter used to set the DSL syntax search. For how to query in Elastisearch, please refer to:Link1Link2

<yourDomain>/api/GetByUserId?user_id=String&start=timeStamp&end=timeStamp?size=25?from=1
 //
<yourDomain>/api/GetByContent?Content=String&start=timeStamp&end=timeStamp?size=25?from=1

The following variables are used:

  1. user_id:author id
  2. content:In-text search (keywords)
  3. start:Date start point (in Timestamp format)
  4. end:Date end point (in Timestamp format)
  5. size:Show several data
  6. from:From the first few data

After the data is processed and packaged into a DSL, the search function of ES will be used to send it to the DB, and finally it will be sent back to the front end in JSON format:

dsl = {
 "size": size,
 "from": page,
 "query": {
   "bool": {
     "must": [
       {
         "match": {
           "content": content
         }
       },
       {
         "match": {
           "article_title": content
         }
       },
       {
         "range": {
           "date": {
             "gt": start,
             "lt": end
           }
         }
       }
     ],
     "must_not": [],
     "should": []
   }
 },
 "sort": [
   {
     "date": {
       "order": "desc"
     }
   }
 ]
}
   res=es.search(index="article2", body= dsl)
   return json.dumps(res['hits'], indent=2, ensure_ascii=False)

See the code comments for the rest of the details

Wrap the whole set of front and back into Docker image

Dockerfile

Write a Dockerfile to package the built front-end project and Flask into an image.

official documentation

Dockerfile

FROM python:3.6.9 #Base Image Environment version

WORKDIR /PTTapp #The working directory to be created

ADD . /PTTapp  #Copy files and folders to the specified location

RUN pip install -r requirements.txt
# Each RUN command adds a new layer on top of the existing image and is the command that is executed during the build process of the image.
CMD python main.py # instructions executed at runtime in container

requirements.txt:The version of the kit used.

After completion, use the command Build to generate a docker image, under the Flask folder, Terminal input

docker image build -t <Account>/<Image Name>:<Version Tag> .
//docker Account can be omitted unless you need to push to DockerHub

When finished, you can type docker images in Terminal or see the file you just created on the application.

Execute

Use the docker run command to put the service on the Container.

docker run -d -e"ES=http://elastic:[email protected] port of DB" -p 80:9527 --name newcontainer kenny2330/pttwebapp:ver_1.0

Instruction parameters:
-e Parameter setting, the ES mentioned earlier is used to set the Domain and password of the database
-p Bind the host's port to the container's port
-name Name your Container

ptt-search-engine's People

Contributors

kenny75557 avatar hackmd-deploy avatar whjuan avatar

Watchers

James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.