alibaba / jstorm Goto Github PK
View Code? Open in Web Editor NEWEnterprise Stream Process Engine
Home Page: http://jstorm.io
License: Apache License 2.0
Enterprise Stream Process Engine
Home Page: http://jstorm.io
License: Apache License 2.0
如题
现在的JStorm Console中,如果是一个Component Emit出Tuple到同进程的Component,在Console中是不做统计的。能否加入Local Send/Recv TPS的统计(可以另开一列统计和跨进程TPS统计分开)。
在虚拟机环境下,
当CPU 核小于等于4时, 设置cpu slot数为4
当物理内存小于等于4G是, 则memory slot数为4G/memory_unit_size, 假设memory_unit_size为512M时,则为8, 如果memory_unity_size 为1g时,则为4
显示备用nimbus
环境:JDK1.7u51(alijdk)
在topology.worker.childopts中增加XMN配置项时Worker只有部分启动正常
重连后,zookeeper
/jstorm/nimbus_host
中没有子结点,
nimbus进程仍然正常,且没有进一步的日志
相关日志如下:
[INFO 2014-04-01 14:57:42 NimbusServer:287 main] Successfully init Follower thread
[INFO 2014-04-01 14:57:42 FollowerRunnable:63 Thread-6] Follower Thread starts!
[INFO 2014-04-01 14:58:45 ClientCnxn:1184 main-SendThread(NYSJHL99-85.opi.com:2181)] Client session timed out, have not heard from server in 13335ms for sessionid 0x1451839f3553622, closing socket connection and attempting reconnect
[INFO 2014-04-01 14:58:45 ConnectionStateManager:142 main-EventThread] State change: SUSPENDED
[INFO 2014-04-01 14:58:45 NimbusServer:112 ConnectionStateManager-0] state change to SUSPENDED
[WARN 2014-04-01 14:58:45 DistributedClusterState:63 main-EventThread] Received event Disconnected:None:null with disconnected Zookeeper.
[INFO 2014-04-01 14:58:47 ClientCnxn:1061 main-SendThread(NYSJHL99-85.opi.com:2181)] Opening socket connection to server NYSJHL99-85.opi.com/10.8.64.85:2181
[INFO 2014-04-01 14:58:47 ClientCnxn:950 main-SendThread(NYSJHL99-85.opi.com:2181)] Socket connection established to NYSJHL99-85.opi.com/10.8.64.85:2181, initiating session
[ERROR 2014-04-01 14:59:01 ConnectionState:97 pool-5-thread-1] Connection timed out for connection string (10.8.64.85:2181/jstorm) and timeout (15000) / elapsed (16004)
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
at com.netflix.curator.ConnectionState.getZooKeeper(ConnectionState.java:94)
at com.netflix.curator.CuratorZookeeperClient.getZooKeeper(CuratorZookeeperClient.java:106)
at com.netflix.curator.framework.imps.CuratorFrameworkImpl.getZooKeeper(CuratorFrameworkImpl.java:401)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:213)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl$3.call(GetChildrenBuilderImpl.java:202)
at com.netflix.curator.RetryLoop.callWithRetry(RetryLoop.java:106)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.pathInForeground(GetChildrenBuilderImpl.java:198)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:190)
at com.netflix.curator.framework.imps.GetChildrenBuilderImpl.forPath(GetChildrenBuilderImpl.java:37)
at com.alibaba.jstorm.zk.Zookeeper.getChildren(Zookeeper.java:172)
at com.alibaba.jstorm.cluster.DistributedClusterState.get_children(DistributedClusterState.java:121)
at com.alibaba.jstorm.cluster.StormZkClusterState.assignments(StormZkClusterState.java:149)
at com.alibaba.jstorm.schedule.MonitorRunnable.run(MonitorRunnable.java:41)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:895)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:918)
at java.lang.Thread.run(Thread.java:662)
[INFO 2014-04-01 14:59:06 ConnectionStateManager:142 main-EventThread] State change: LOST
[WARN 2014-04-01 14:59:06 DistributedClusterState:63 main-EventThread] Received event Expired:None:null with disconnected Zookeeper.
[INFO 2014-04-01 14:59:06 ClientCnxn:1182 main-SendThread(NYSJHL99-85.opi.com:2181)] Unable to reconnect to ZooKeeper service, session 0x1451839f3553622 has expired, closing socket connection
[WARN 2014-04-01 14:59:06 ConnectionState:254 main-EventThread] Session expired event received
[INFO 2014-04-01 14:59:06 ZooKeeper:379 main-EventThread] Initiating client connection, connectString=10.8.64.85:2181/jstorm sessionTimeout=20000 watcher=com.netflix.curator.ConnectionState@7bcd107f
[INFO 2014-04-01 14:59:06 NimbusServer:112 ConnectionStateManager-0] state change to LOST
[INFO 2014-04-01 14:59:06 ClientCnxn:1061 main-SendThread()] Opening socket connection to server /10.8.64.85:2181
[INFO 2014-04-01 14:59:06 ClientCnxn:521 main-EventThread] EventThread shut down
[INFO 2014-04-01 14:59:06 ClientCnxn:950 main-SendThread(NYSJHL99-85.opi.com:2181)] Socket connection established to NYSJHL99-85.opi.com/10.8.64.85:2181, initiating session
[INFO 2014-04-01 14:59:06 ClientCnxn:739 main-SendThread(NYSJHL99-85.opi.com:2181)] Session establishment complete on server NYSJHL99-85.opi.com/10.8.64.85:2181, sessionid = 0x1451839f3553664, negotiated timeout = 20000
[INFO 2014-04-01 14:59:06 ConnectionStateManager:142 main-EventThread] State change: RECONNECTED
[INFO 2014-04-01 14:59:06 DistributedClusterState:66 main-EventThread] Received event SyncConnected:None:null
[INFO 2014-04-01 14:59:06 NimbusServer:112 ConnectionStateManager-0] state change to RECONNECTED
当Bolt占用内存较大发生OutOfMemory后,所在Worker将会被不断的shutdown/start,但是内存没有释放,最终导致SPOUT无法消息,整个集群进入僵尸状态
注释描述与事实不符,修改。
git clone后导入maven project,很多jar包都下不来,可否考虑增加repository声明
加了
clojars.org
http://clojars.org/repo
才把依赖的包下来的
当发生意外时, 比如错误清理本地目录, 会导致后面supervisor无法杀死worker。
因此,当启动worker时, 查找并杀死之前分配在本端口上的老的worker
jstorm依赖的jar太多, 最好能去掉一些不是必须的或可选的
对于排查问题非常重要
storm 中TopologyContext.codeDir ,名字叫codeDir,但实际使用resourceDir
而JStorm codeDir依旧使用codeDir,导致无法支持shellBolt
在sequence-split-merge的例子中,在SequenceTopology类的SetBuilder方法中,由于重复声明了id为Total的Bolt,导致报错。需修改一下id,使其唯一,或删除重复的id。
建议日志的生成最好是根据topology name而不是和topology id拼接,因为topo name已经是唯一的了,这样每次submit topo不会导致生成很多的日志文件。如果怕复写问题:
1.可以以append方式打开追加写
2.将上次的日志文件合并到名称为今天的日志文件中,并创建一个新文件
如题,方便大伙协同开发。
aloha kill XxxTopology 之后,各个 task 的 close/cleanup 没有被调用。过了大概1分钟,才调用 close/cleanup。
正确逻辑应该是 kill 后马上调用 close/cleanup,让task可以优雅关闭。等超时截止如果 task 未结束,再强制杀死。
可以提供与 storm 类似的 list 命令,列出当前 topology、检查 topology 是否存在。
例如:XxxTopology-15-1395993428-worker-6803.log。
建议简化为 XxxTopology-worker-6803.log。定位日志会更方便。
-rw-rw-r-- 1 admin admin 106723 Mar 28 16:46 XxxTopology-15-1395993428-worker-6803.log
-rw-rw-r-- 1 admin admin 65415 Mar 28 17:16 XxxTopology-16-1395996768-worker-6803.log
-rw-rw-r-- 1 admin admin 2338088 Mar 28 22:21 XxxTopology-17-1395998252-worker-6803.log
-rw-rw-r-- 1 admin admin 3122193 Mar 27 10:54 XxxTopology-31-1395834859-worker-6803.log
线程中的线程标识state
比如:
class:com.alibaba.jstorm.daemon.nimbus.TopologyAssign
code:
public void cleanup() {
runFlag = false;
}
boolean runFlag = false;
问题描述:这里的线程标识runFlag添加volatile关键字
解决方法:volatile boolean runFlag = false;
源码中只看见在system topology中增加 system stream怎么没有增加systemBolt?谢谢
如题
既然在fail时已经提供了IFailValueSpout,那么同时也应该提供IAckValueSpout。
对消息做一些业务统计时非常有用
RT
return JStormUtils.mk_list(index)
应该改为return JStormUtils.mk_list(outTasks.get(index));
有一些storm的配置项在JSTORM中会报错,请支持兼容
建议:
cpu不够,就指明CPU 不够, memory不够就指明memory不够
class com.alibaba.jstorm.daemon.nimbus.NimbusUtils
public static IScheduler mkScheduler(Map conf, INimbus inimbus) {
if (inimbus != null && inimbus.getForcedScheduler() != null)
return inimbus.getForcedScheduler();
else if (conf.get(Config.STORM_SCHEDULER) != null) {
IScheduler result = null;
try {
result = (IScheduler) Class.forName(Config.STORM_SCHEDULER).newInstance();
} catch (Exception e) {
// TODO Auto-generated catch block
LOG.error(Config.STORM_SCHEDULER + " : Scheduler initialize error!");
throw new RuntimeException("Fail to new Scheduler");
}
return result;
} else {
return new DefaultScheduler();
}
}
target code:Class.forName(Config.STORM_SCHEDULER)
to code:Class.forName(conf.get(Config.STORM_SCHEDULER) )
WEBUI中LOGVIEW目前是通过每个节点开放7621端口来获取,但是很多时候STORM的处理节点的访问是物理隔离的。因此这里最好通过WEBUI服务器来通过代理访问的方式实现LOG VIEW
当域名解析错误时,可以直接使用ip
jstorm.py退出时,可否返回java子进程的返回值,方便脚本调用判断是否有异常。
类似这样:
--- a/jstorm-server/bin/jstorm.py
+++ b/jstorm-server/bin/jstorm.py
@@ -91,7 +91,8 @@ def exec_storm_class(klass, jvmtype="-server", childopts="", extrajars=[], args=
args_str = " ".join(map(lambda s: "\"" + s + "\"", args))
command = "java " + jvmtype + " -Djstorm.home=" + JSTORM_DIR + " " + get_config_opts() + " -Djava.library.path=" + nativepath + " " + childopts + " -cp " +
print "Running: " + command
- os.system(command)
+ ret = os.system(command)
+ sys.exit(ret)
def jar(jarfile, klass, *args):
"""Syntax: [jstorm jar topology-jar-path class ...]
当心跳线程检测出worker 死去后,增加打印出该worker所在的supervisor和port
FindBugs 找到一些bug
如题,有时候受限于工作环境,没有充足的linux机器,能否让jstorm支持Windows平台,以更方便进行相关技术研究.
restart命令会强制所有worker 重启一下,然后还可以动态调整并发
增加限定worker最大memory slot 参数
限定worker最大Cpu slot参数
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.