logvision's Introduction

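LogVision / A distributed real-time log analysis and intrusion detection system built on big data technologies

Development documentation

Version history (current: 2.0)

  • 2018.12.8 v1.0: prototype version, has bugs.
  • 2020.5.9 v2.0: first refined version; achieves the intended results.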

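Questions and discussion

Introduction

LogVision is a log analysis solution that integrates web log aggregation, distribution, real-time analysis, intrusion detection, data storage, and visualization. Aggregation uses Apache Flume, distribution uses Apache Kafka, real-time processing uses Spark Streaming, intrusion detection uses Spark MLlib, data storage uses HDFS and Redis, and visualization uses Flask, SocketIO, Echarts, and Bootstrap.

The usage instructions below all target a single-machine pseudo-distributed environment. You can adjust the configuration as needed for a fully distributed deployment.

Every module of this system was developed independently by one person, drawing on some valuable literature and reference material along the way. The system is also the author's undergraduate graduation project.

Awards: second prize for Anhui Province in the 2019 National College Student Computer Design Competition, and second prize in the 2019 Anhui Province Information Security Works Competition.

Introduction video for the prototype version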

System architecture

[Image: arch]

Data flow

(Numbers indicate processing steps.)
[Image: dataflow]

Intrusion detection workflow

[Image: idsflow]

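Project structure

  • flask: Flask web backend
  • spark: implementation of log analysis and intrusion detection
  • flume: Flume configuration files
  • log_gen: simulated log generator
  • datasets: test log datasets
  • images: images used in the README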

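Dependencies and versions

  • Required for building and for the web side:
    • Java 8, Scala 2.11.12, Python 3.8 (package dependencies are listed in requirements.txt), sbt 1.3.8
  • Required in the compute environment:
    • Java 8, Apache Flume 1.9.0, Kafka 2.4, Spark 2.4.5, ZooKeeper 3.5.7, Hadoop 2.9.2, Redis 5.0.8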

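Usage

Before you begin, replace the IP addresses in the source code and configuration files with your own. This affects the Flume configuration file, the Spark main program, and the Flask web backend.

Building the Spark application

With Java 8 and Scala 2.11 installed, initialize sbt from the spark directory: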

sbt

Exit the sbt shell, then compile and package the Spark project with sbt-assembly:

sbt assembly

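Then rename the generated jar to logvision.jar.

Environment setup

You need a pseudo-distributed environment (tested on CentOS 7) in which all the components listed above are installed at the matching versions, configured, and running.
Start a Flume agent using standalone.conf from the flume directory.
Copy learning-datasets from the datasets folder to the following path: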

/home/logv/learning-datasets

Copy access_log from the datasets folder to the following path:

/home/logv/access_log
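
Before submitting any Spark job, it may be worth confirming that both dataset paths are in place. The following is a minimal sketch, not part of the project: the helper name missing_paths is hypothetical, and the two paths are simply the ones given above.

```python
from pathlib import Path

# Paths taken from the instructions above; adjust them if your layout differs.
REQUIRED_PATHS = [
    "/home/logv/learning-datasets",
    "/home/logv/access_log",
]


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on this machine."""
    return [p for p in paths if not Path(p).exists()]


if __name__ == "__main__":
    missing = missing_paths(REQUIRED_PATHS)
    if missing:
        for p in missing:
            print(f"missing: {p}")
    else:
        print("all dataset paths are in place")
```

The check only tests for existence; it does not validate the contents of the datasets.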

Training and testing the intrusion detection model

Submit the jar to the Spark cluster to build and test the intrusion detection model:

spark-submit --class learning logvision.jar

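You should see output like the following:
[Image: idoutput]
The two tables show the intrusion detection results for the normal and abnormal datasets, respectively; the four tables below them can be used to assess detection accuracy. As shown in the figure, all 250 normal test records were classified as normal (a 100% detection rate), while of the 250 abnormal test records, 240 were classified as abnormal and 10 as normal (96% accuracy).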
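
The percentages quoted above follow directly from the counts. As a small worked example (the function name detection_rate is chosen here for illustration, and the overall figure is not stated in the original but derived from the same counts):

```python
def detection_rate(correct, total):
    """Fraction of test records that received the expected label."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


# Counts quoted in the text above.
normal_total, normal_as_normal = 250, 250
abnormal_total, abnormal_as_abnormal = 250, 240

normal_rate = detection_rate(normal_as_normal, normal_total)
abnormal_rate = detection_rate(abnormal_as_abnormal, abnormal_total)
# Derived here, not reported in the original: both test sets combined.
overall_rate = detection_rate(
    normal_as_normal + abnormal_as_abnormal,
    normal_total + abnormal_total,
)

print(f"normal:   {normal_rate:.0%}")    # 100%
print(f"abnormal: {abnormal_rate:.0%}")  # 96%
print(f"overall:  {overall_rate:.0%}")   # 98%
```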

Starting the visualization backend

From the flask directory, run the following command to install the dependencies:

pip3 install -r requirements.txt

Start the Flask web application:

python3 app.py

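Starting the real-time log generator

The real-time log generator in log_gen takes two parameters, the number of lines per write and the interval between writes, and appends successive blocks of lines from a sample log to a target log. This simulates real-time log generation for the downstream real-time processing.

java log_gen [source log] [target file] [lines per append] [interval (seconds)]

Copy it to the environment, then compile and run it. The following appends 5 lines from /home/logv/access_log to /home/logSrc every 2 seconds: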

javac log_gen.java
java log_gen /home/logv/access_log /home/logSrc 5 2

Starting the analysis job

Submit the jar to the Spark cluster to run the real-time analysis job:

spark-submit --class streaming logvision.jar

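Viewing the visualization

The backend components are now fully configured. Open port 5000 on the web host in a browser to view the real-time log analysis visualization:
Welcome page:
[Image: welcome]
Real-time log analysis page:
[Image: analysis]
Real-time intrusion detection page:
[Image: id]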
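
If the page does not load, one quick thing to rule out is whether anything is listening on port 5000 at all. The sketch below is a generic TCP check using only the standard library; the helper name port_is_open and the address 127.0.0.1 are placeholders rather than anything defined by the project.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # "127.0.0.1" is a placeholder; use the address of your web host.
    print(port_is_open("127.0.0.1", 5000))
```

A successful connection only shows that some process accepts connections on that port; it does not confirm that the process is the Flask backend.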

logvision's People

Contributors

xander-wang

logvision's Issues

blocked web

When multiple users access the dashboard at the same time, or while the web framework is running, web requests may get blocked.

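flask_socketio keeps disconnecting

Hello, I followed your implementation and wrote an infinite while loop, but after a short while the connection drops and then reconnects. I am using a request I wrote myself, one that passes parameters. Why does it disconnect? Below is the log printed after the disconnect: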

127.0.0.1 - - [13/Nov/2020 16:34:32] "GET /socket.io/?EIO=3&transport=websocket&sid=077dbe01e0574c94915a00cf01af67b3 HTTP/1.1" 200 0 84.709405
2020-11-13 16:34:33,150-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet OPEN data {'sid': 'e2bac2e578e34e9383506d9be7e138a6', 'upgrades': ['websocket'], 'pingTimeout': 60000, 'pingInterval': 25000}
2020-11-13 16:34:33,150-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet MESSAGE data 0
127.0.0.1 - - [13/Nov/2020 16:34:33] "GET /socket.io/?EIO=3&transport=polling&t=NN0lUet HTTP/1.1" 200 349 0.001994
client connected.
2020-11-13 16:34:33,164-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet MESSAGE data 0/dcenter,
2020-11-13 16:34:33,166-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet MESSAGE data 0/dcenter
127.0.0.1 - - [13/Nov/2020 16:34:33] "POST /socket.io/?EIO=3&transport=polling&t=NN0lUf4&sid=e2bac2e578e34e9383506d9be7e138a6 HTTP/1.1" 200 219 0.001954
127.0.0.1 - - [13/Nov/2020 16:34:33] "GET /socket.io/?EIO=3&transport=polling&t=NN0lUfC&sid=e2bac2e578e34e9383506d9be7e138a6 HTTP/1.1" 200 194 0.000000
(14528) accepted ('127.0.0.1', 60614)
2020-11-13 16:34:33,473-INFO: e2bac2e578e34e9383506d9be7e138a6: Received request to upgrade to websocket
127.0.0.1 - - [13/Nov/2020 16:34:33] "GET /socket.io/?EIO=3&transport=polling&t=NN0lUfM&sid=e2bac2e578e34e9383506d9be7e138a6 HTTP/1.1" 200 183 0.297349
2020-11-13 16:34:33,476-INFO: e2bac2e578e34e9383506d9be7e138a6: Upgrade to websocket successful
2020-11-13 16:34:58,180-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:34:58,180-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:35:23,194-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:35:23,194-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:35:48,200-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:35:48,200-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:36:13,204-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:36:13,204-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:36:38,215-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:36:38,215-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:37:03,228-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:37:03,228-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:37:28,246-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:37:28,246-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:37:53,250-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:37:53,250-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:38:18,255-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:38:18,255-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:38:43,268-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:38:43,268-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:39:08,274-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:39:08,274-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:39:33,291-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
2020-11-13 16:39:33,291-INFO: e2bac2e578e34e9383506d9be7e138a6: Sending packet PONG data None
2020-11-13 16:39:58,307-INFO: e2bac2e578e34e9383506d9be7e138a6: Received packet PING data None
