Coder Social home page Coder Social logo

webstat's Introduction

数据库

CREATE DATABASE `webstat` DEFAULT CHARACTER SET utf8 COLLATE utf8_general_ci;

CREATE TABLE `event_stats` (
`appid` int(11) NOT NULL default '0',
`name` varchar(32) NOT NULL default '',
`total` int(11) NOT NULL default '0',
`count` int(11) NOT NULL default '0',
`created_at` datetime default NULL,
PRIMARY KEY (`created_at`,`appid`)
);

insert into event_stats(appid, name, total, count, created_at)
values(0, 'button_click', 1, 1, now())
on duplicate key update total=total+1, count=count+1;

时间归一化成5分钟粒度

x = 18
x - (x % 5) // 15
x = 14
x - (x % 5) // 10

容量预估

假设有1000个用户,每个用户200个指标,5分钟粒度保留1天,1小时粒度保留30天,1天粒度保留3年

1分钟粒度:(60 / 5) * 24 * 200 * 1000 = 5.74亿 1小时粒度:24 * 30 * 200 * 1000 = 1.44亿 1天粒度:365 * 3 * 200 * 1000 = 2.19亿 共:5.74 + 1.44 + 2.19 = 9.37亿

如果一条记录100字节(名称32,日期8,appid 4,值4,次数4),约需93G磁盘存储

测试环境启动:

supervisor app.js

正式环境运行:

export NODE_ENV=production
forever start app.js

redis队列设计

key webstat:events为一个list,item格式是appid/name/created_at/value, web应用处理用户请求,用户请求来了直接向webstat:events里lpush数据, 然后有个独立的worker进程,不断去webstat:events里rpop数据,每100条数据一批, 合并相同的"appid,name,created_at"的value,然后写库, 如果rpop数据为空, 则sleep 10秒。

如果worker取出100条数据后,自己挂了,那么丢100条数据,不管了。

LPUSH webstat:events "0/btn_click/2015-01-01/15"
LPUSH webstat:events "0/btn_click/2015-01-01/15"
LRANGE webstat:events 1 10

使用express路由

app.get('/add_event/:appid([0-9]+)/:name([0-9a-zA-Z_-]+)/:value([0-9]+)', function(req, res){                                            
curl localhost:1337/add_event/100/btn_click/100

nginx配置

server {
    listen       80;
    server_name  webstat.ihuhao.com;

    location / {
        proxy_pass   http://127.0.0.1:8004;
        proxy_set_header    Host             $host;
        proxy_set_header    X-Real-IP        $remote_addr;
        proxy_set_header    X-Forwarded-For  $proxy_add_x_forwarded_for;
    }


    location /static/ {
        alias /root/src/webstat/src/static/;
        expires 1h;
    }
}

产品借鉴

mixpanel: profile, event, notification

相关链接

require

async moment config winston mocha express

新表结构

events
    - app_id
    - distinct_id 
    - event_name
    - created_on
    - properties // json, key: value

event_names:
    - app_id
    - event_name

properties
    - app_id
    - event_name
    - property_name

property_values
    - app_id
    - event_name
    - property_name
    - value

示例:

获取某时段某事件的次数
select count(*) from events where app_id = ? and created_on >= ? 
    and created_on <= ?

获取某时段某事件的人数 
select count(distinct(distinct_id)) from events where app_id = ? 
    and created_on >= ? and created_on <= ?

获得不同城市的访客3月份的下载量
events = select properties from events where app_id = ? 
    and created_on >= ? and created_on <= ?

cities = defaultdict(int)
for event in events:
    if 'city' in  event.properties:
        city = event.properties['city']
        cities[city] += 1

return cities 

获得不同城市的访客3月份里每天的下载量
// not imp

获得访客3月份下载资源消耗的总积分
events = select properties from events where app_id = ? 
    and created_on >= ? and created_on <= ?

type = 'sum'
metrics = 'score' 
ret = 0

for event in events:
    if metrics in  event.properties and isdigit(event.properties[metrics]):
        value = int(event.properties[metrics])
        ret += value

return ret

获得北、上、广、深的用户3月份的总计下载量
 events = select properties from events where app_id = ? 
    and created_on >= ? and created_on <= ?

where = "city in ('a', 'a', 'c')
ret = 0

for event in events:
    if 'city' in  event.properties:
        city = event.properties['city']
        if city in (a, b, c):
            ret += 1

return ret


获取某app_id的事件列表
select event_name from event_names where app_id = ?

获取某app_id某事件的属性列表
select property_name from properties where app_id = ? and event_name = ?

获取某app_id某事件的某属性的值列表 
select value from property_values where app_id = ? and event_name = ? and 
    property_name = ?

收集事件
app_id, event_name, properties, created_on = inputs()

if not exists_event_name(app_id, event_name):
    add_event_name(app_id, event_name)

for key, value in properties:
    if not exists_property(app_id, event_name, key):
        add_property_name(app_id, event_name, key)
    if not exists_property_value(app_id, event_name, key, value):
        add_property_value(app_id, event_name, key, value)

add_event(app_id, event_name, properties, created_on)

参考项目

测试,覆盖率 https://github.com/expressjs/cookie-parser mocha --reporter spec --bail --check-leaks test/

== todo

  • 防止单IP请求量过大
  • 防止单请求返回过大数据
  • 存储层单点
  • 计算单元如何利用多核

webstat's People

Contributors

onlytiancai avatar

Stargazers

Ken Xu avatar

Watchers

 avatar James Cloos avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.