tencentcloud-exporter's People

Contributors

himer, juexiaolin, leoquote, shitoumomo, sophie-okk, zero3233

tencentcloud-exporter's Issues

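Periodic reload of the instance list does not take effect

Symptom: metrics with a specific label cannot be found in Prometheus

Timeline:

  • 2021-07-20 Started the exporter
  • 2021-07-21 Purchased a Redis cluster
  • 2021-08-12 Found no metrics for the new Redis cluster; counting the metrics showed that one instance was missing.
    The startup log shows that some namespaces never printed reload ${namespace} instances every 300 minutes,
    but after a restart every configured namespace prints that line

Logs:

Initial startup log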
level=info ts=2021-07-20T08:57:08.074Z caller=qcloud_exporter.go:86 msg="Starting qcloud_exporter" version="(version=, branch=, revision=)"
level=info ts=2021-07-20T08:57:08.074Z caller=qcloud_exporter.go:87 msg="Build context" build_context="(go=go1.16.5, user=, date=)"
level=info ts=2021-07-20T08:57:08.080Z caller=qcloud_exporter.go:94 msg="Load config ok"
level=info ts=2021-07-20T08:57:08.634Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/CMONGO num=34
level=info ts=2021-07-20T08:57:09.521Z caller=cache.go:104 msg="Reload instance cache" num=4 changed=4
level=info ts=2021-07-20T08:57:09.522Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/CMONGO numMetric=34 numSeries=160
level=info ts=2021-07-20T08:57:09.522Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/CMONGO
level=info ts=2021-07-20T08:57:09.522Z caller=collector.go:124 msg="reload QCE/CMONGO instances every 300 minutes"
level=info ts=2021-07-20T08:57:09.765Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/REDIS_MEM num=75
level=info ts=2021-07-20T08:57:10.187Z caller=cache.go:104 msg="Reload instance cache" num=16 changed=16
level=info ts=2021-07-20T08:57:11.838Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/REDIS_MEM numMetric=26 numSeries=3896
level=info ts=2021-07-20T08:57:11.838Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/REDIS_MEM
level=info ts=2021-07-20T08:57:12.339Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/CDB num=302
level=info ts=2021-07-20T08:57:12.748Z caller=cache.go:104 msg="Reload instance cache" num=135 changed=135
level=info ts=2021-07-20T08:57:12.754Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/CDB numMetric=11 numSeries=1485
level=info ts=2021-07-20T08:57:12.754Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/CDB
level=info ts=2021-07-20T08:57:12.988Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/NAT_GATEWAY num=6
level=info ts=2021-07-20T08:57:13.405Z caller=cache.go:104 msg="Reload instance cache" num=1 changed=1
level=info ts=2021-07-20T08:57:13.405Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/NAT_GATEWAY numMetric=6 numSeries=6
level=info ts=2021-07-20T08:57:13.405Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/NAT_GATEWAY
level=info ts=2021-07-20T08:57:13.405Z caller=collector.go:124 msg="reload QCE/NAT_GATEWAY instances every 300 minutes"
level=info ts=2021-07-20T08:57:13.405Z caller=collector.go:131 msg="Create all product collecter ok" num=4
level=info ts=2021-07-20T08:57:13.405Z caller=qcloud_exporter.go:114 msg="Listening on" address=:9123
Everything after this is Start collect ...... Collect done, with nothing useful.
Log after restart
level=info ts=2021-08-12T11:34:35.968Z caller=qcloud_exporter.go:86 msg="Starting qcloud_exporter" version="(version=, branch=, revision=)"
level=info ts=2021-08-12T11:34:36.044Z caller=qcloud_exporter.go:87 msg="Build context" build_context="(go=go1.16.5, user=, date=)"
level=info ts=2021-08-12T11:34:36.091Z caller=qcloud_exporter.go:94 msg="Load config ok"
level=info ts=2021-08-12T11:34:36.734Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/CMONGO num=34
level=info ts=2021-08-12T11:34:37.634Z caller=cache.go:104 msg="Reload instance cache" num=4 changed=4
level=info ts=2021-08-12T11:34:37.635Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/CMONGO numMetric=34 numSeries=160
level=info ts=2021-08-12T11:34:37.635Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/CMONGO
level=info ts=2021-08-12T11:34:37.635Z caller=collector.go:124 msg="reload QCE/CMONGO instances every 300 minutes"
level=info ts=2021-08-12T11:34:37.882Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/REDIS_MEM num=77
level=info ts=2021-08-12T11:34:38.328Z caller=cache.go:104 msg="Reload instance cache" num=16 changed=16
level=info ts=2021-08-12T11:34:39.985Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/REDIS_MEM numMetric=26 numSeries=2720
level=info ts=2021-08-12T11:34:39.985Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/REDIS_MEM
level=info ts=2021-08-12T11:34:40.379Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/CDB num=302
level=info ts=2021-08-12T11:34:40.736Z caller=cache.go:104 msg="Reload instance cache" num=134 changed=134
level=info ts=2021-08-12T11:34:40.744Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/CDB numMetric=11 numSeries=1474
level=info ts=2021-08-12T11:34:40.744Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/CDB
level=info ts=2021-08-12T11:34:40.992Z caller=cache.go:76 msg="Reload metric meta cache" namespace=QCE/NAT_GATEWAY num=8
level=info ts=2021-08-12T11:34:41.195Z caller=cache.go:104 msg="Reload instance cache" num=1 changed=1
level=info ts=2021-08-12T11:34:41.195Z caller=product.go:227 msg="Init all query ok" Namespace=QCE/NAT_GATEWAY numMetric=8 numSeries=8
level=info ts=2021-08-12T11:34:41.195Z caller=collector.go:117 msg="Create product collecter ok" Namespace=QCE/NAT_GATEWAY
level=info ts=2021-08-12T11:34:41.195Z caller=collector.go:124 msg="reload QCE/NAT_GATEWAY instances every 300 minutes"
level=info ts=2021-08-12T11:34:41.195Z caller=collector.go:131 msg="Create all product collecter ok" num=4
level=info ts=2021-08-12T11:34:41.195Z caller=qcloud_exporter.go:114 msg="Listening on" address=:9123

Image used: boringcat/qcloud-exporter:v2.3.0

Dockerfile
FROM golang:alpine as builder

ARG VERSION

# If VERSION is not passed as a build arg, look up the latest release name.
RUN set -xe \
    ; [ -z "${VERSION}" ] && apk add --update curl jq \
    && VERSION=$(curl -s https://api.github.com/repos/tencentyun/tencentcloud-exporter/releases/latest | jq -r .name) \
    ; VERSION=${VERSION##*v} \
    && wget https://github.com/tencentyun/tencentcloud-exporter/archive/refs/tags/v${VERSION}.tar.gz -O /tmp/v${VERSION}.tar.gz \
    && tar xf /tmp/v${VERSION}.tar.gz tencentcloud-exporter-${VERSION} \
    && cd tencentcloud-exporter-${VERSION} \
    && go build -o /qcloud-exporter cmd/qcloud-exporter/qcloud_exporter.go

FROM alpine
COPY --from=builder /qcloud-exporter /usr/local/bin/qcloud-exporter
ENTRYPOINT [ "/usr/local/bin/qcloud-exporter" ]
EXPOSE 9123

Part of the configuration:

products:
  - namespace: QCE/REDIS_MEM
    all_instances: true
    extra_labels: [InstanceName, WanIp]
    instance_filters:
        Status: 2
    relod_interval_minutes: 300

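Question about the RateLimit parameter

Hello, I have recently been testing this project. We imported 66 CDB (MySQL) metrics and have six CDB instances. With ratelimit set to 10, we get an error saying the per-second rate limit was exceeded.

I then looked at the code. As I understand it, ratelimit should be global and should limit calls to getMonitorDataByMultipleKeys, but it does not seem to work well. I also saw a check in the rateLimitCheck function: if sleepCount > some value, it no longer sleeps. Does that mean ratelimit stops taking effect? After I raised this parameter to 1000 seconds, data could be fetched normally.

I do not know Go well, so I would like to ask what this parameter means and whether the rate limit is global or only applies within a single goroutine. Thanks.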
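
To make the "global versus per-goroutine" distinction concrete, here is a minimal sketch of a limiter that is shared by every goroutine. The limiter type, newLimiter and runWorkers are hypothetical names for illustration only; this is not the exporter's rateLimitCheck, and it does not say how the exporter actually behaves. It assumes perSecond is greater than zero.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// limiter hands out one permit per tick. Hypothetical stand-in,
// not the exporter's rateLimitCheck.
type limiter struct {
	ticker *time.Ticker
}

// newLimiter allows roughly perSecond calls per second in total.
// perSecond must be greater than zero.
func newLimiter(perSecond int) *limiter {
	return &limiter{ticker: time.NewTicker(time.Second / time.Duration(perSecond))}
}

// wait blocks until the next permit. Every goroutine receives from the
// same ticker channel, so the limit applies to all of them together.
func (l *limiter) wait() { <-l.ticker.C }

// runWorkers starts `workers` goroutines that each make `calls`
// rate-limited calls and returns the total number of calls made.
func runWorkers(l *limiter, workers, calls int) int {
	var total int64
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < calls; i++ {
				l.wait()
				atomic.AddInt64(&total, 1) // the API request would go here
			}
		}()
	}
	wg.Wait()
	return int(total)
}

func main() {
	l := newLimiter(10)
	defer l.ticker.Stop()
	start := time.Now()
	n := runWorkers(l, 4, 5)
	fmt.Printf("%d calls in %v\n", n, time.Since(start).Round(100*time.Millisecond))
}
```

If each goroutine created its own limiter instead of sharing one, four workers would together send about four times the configured rate, which is one way a per-second API limit could still be exceeded.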

Can ClickHouse be supported?

Could you support Tencent Cloud's ClickHouse product, and list how the exported metrics map to the metrics in the ClickHouse documentation?

Error with the cloud database MySQL

According to the documentation at https://mc.qcloudimg.com/static/qc_doc/ef1ccf096001bd855aac0cc56d30a9a2/6140.v20161103185912.pdf the MySQL namespace is now called qce/cdb, yet using tc_namespace: QCE/CDB fails immediately with:

FATA[0000] not support product [cdb]  yet, need monitor api code.  source="qcloud_exporter.go:56"

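After changing it to QCE/mysql it works.
According to the README:

tc_namespace: xxx/CVM # namespace (xxx is an arbitrary a-z name; the cvm after it is fixed and is the name of each product)

So it seems QCE can be replaced with any name?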
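
One reading that is consistent with both the README comment and the error message is that only the part after the slash is looked up, case-insensitively. The sketch below illustrates that reading only. The supported table and productOf are hypothetical and are not taken from the exporter's source.

```go
package main

import (
	"fmt"
	"strings"
)

// supported is a hypothetical product table; the real list lives in the exporter.
var supported = map[string]bool{"cvm": true, "mysql": true}

// productOf returns the lower-cased part after the slash and whether
// that product is in the (hypothetical) supported table.
func productOf(namespace string) (string, bool) {
	parts := strings.SplitN(namespace, "/", 2)
	if len(parts) != 2 {
		return "", false
	}
	p := strings.ToLower(parts[1])
	return p, supported[p]
}

func main() {
	for _, ns := range []string{"QCE/CDB", "QCE/mysql", "anything/mysql"} {
		p, ok := productOf(ns)
		fmt.Printf("%s -> product=%s supported=%v\n", ns, p, ok)
	}
}
```

Under this assumption the prefix never influences the lookup, which would explain why QCE/CDB is rejected as product [cdb] while any prefix followed by mysql is accepted.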

Overseas CDN data cannot be retrieved

See the log:

❯ go run cmd/qcloud-exporter/qcloud_exporter.go --config.file="/Users/chaim/Work/own/tencent_exporter/cdn.yml"
level=info ts=2020-09-09T07:23:49.424Z caller=qcloud_exporter.go:85 msg="Starting qcloud_exporter" version="(version=, branch=, revision=)"
level=info ts=2020-09-09T07:23:49.424Z caller=qcloud_exporter.go:86 msg="Build context" build_context="(go=go1.14.2, user=, date=)"
level=info ts=2020-09-09T07:23:49.425Z caller=qcloud_exporter.go:93 msg="Load config ok"
level=info ts=2020-09-09T07:23:52.878Z caller=cache.go:65 msg="Reload metric meta cache" namespace=QCE/CDN num=8
level=info ts=2020-09-09T07:23:52.878Z caller=product.go:176 msg="Init all query ok" Namespace=QCE/CDN numMetric=8 numSeries=16
level=info ts=2020-09-09T07:23:52.878Z caller=collector.go:109 msg="Create product collecter ok" Namespace=QCE/CDN
level=info ts=2020-09-09T07:23:52.878Z caller=collector.go:112 msg="Create all product collecter ok" num=1
level=info ts=2020-09-09T07:23:52.878Z caller=qcloud_exporter.go:113 msg="Listening on" address=:9123
level=info ts=2020-09-09T07:24:03.121Z caller=collector.go:69 msg="Start collect......" name=QCE/CDN
level=info ts=2020-09-09T07:24:03.121Z caller=collector.go:70 msg="test <--->" QCE/CDN=(MISSING)
level=warn ts=2020-09-09T07:24:05.673Z caller=repository.go:200 msg="Instance has not monitor data" metric=BackOriginFailRate dimension="map[domain:xxx.com projectId:0]"

Add a NodeRole tag when collecting Redis node metrics at 5s granularity

func (h *redisMemHandler) getNodeSeries(m *metric.TcmMetric, ins instance.TcInstance) ([]*metric.TcmSeries, error) {
	var series []*metric.TcmSeries

	resp, err := h.nodeRepo.GetNodeInfo(ins.GetInstanceId())
	if err != nil {
		return nil, err
	}

	for _, node := range resp.Response.Redis {
		ql := map[string]string{
			h.monitorQueryKey: ins.GetMonitorQueryKey(),
			"rnodeid":         *node.NodeId,
			"rnoderole":       *node.NodeRole,
		}
		s, err := metric.NewTcmSeries(m, ql, ins)
		if err != nil {
			return nil, err
		}
		series = append(series, s)
	}

	return series, nil
}

public_clb load balancer data collection issue

- tc_namespace: tc/public_clb
  tc_metric_name: Outtraffic
  tc_metric_rename: out_traffic
  tc_labels: [LoadBalancerVip,ProjectId,LoadBalancerName]
  tc_statistics: [max]
  period_seconds: 60
  delay_seconds: 300
  range_seconds: 120
- tc_namespace: tc/public_clb
  tc_metric_name: Intraffic
  tc_metric_rename: in_traffic
  tc_labels: [LoadBalancerVip,ProjectId,LoadBalancerName]
  tc_statistics: [max]
  period_seconds: 60
  delay_seconds: 300
  range_seconds: 120
- tc_namespace: tc/public_clb
  tc_metric_name: Connum
  tc_metric_rename: connum
  tc_labels: [LoadBalancerVip,ProjectId,LoadBalancerName]
  tc_statistics: [max]
  period_seconds: 60
  delay_seconds: 300
  range_seconds: 120

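For the public network load balancer, the collected data is inconsistent with the data shown in the console and with the result of the online API call at https://console.cloud.tencent.com/api/explorer?Product=monitor&Version=2018-07-24&Action=GetMonitorData&SignVersion=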

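Data for some Tencent Cloud Monitor metrics cannot be retrieved

Data for the Redis cluster latency metrics cannot be retrieved; the log reports: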
level=warn ts=2020-09-09T08:30:54.080Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencySetMin dimension=map[instanceid:crs-8xx]
level=warn ts=2020-09-09T08:30:54.080Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencySetMin dimension=map[instanceid:crs-hxx]
level=warn ts=2020-09-09T08:30:54.080Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencySetMin dimension=map[instanceid:crs-6xx]
level=warn ts=2020-09-09T08:30:54.080Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencySetMin dimension=map[instanceid:crs-2xx]
level=warn ts=2020-09-09T08:30:54.991Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyOtherMin dimension=map[instanceid:crs-8xx]
level=warn ts=2020-09-09T08:30:54.991Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyOtherMin dimension=map[instanceid:crs-hxx]
level=warn ts=2020-09-09T08:30:54.991Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyOtherMin dimension=map[instanceid:crs-6xx]
level=warn ts=2020-09-09T08:30:54.991Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyOtherMin dimension=map[instanceid:crs-2xx]
level=warn ts=2020-09-09T08:30:54.993Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyGetMin dimension=map[instanceid:crs-8xx]
level=warn ts=2020-09-09T08:30:54.993Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyGetMin dimension=map[instanceid:crs-hxx]
level=warn ts=2020-09-09T08:30:54.993Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyGetMin dimension=map[instanceid:crs-6xx]
level=warn ts=2020-09-09T08:30:54.993Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyGetMin dimension=map[instanceid:crs-2xx]
level=warn ts=2020-09-09T08:30:55.249Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyMin dimension=map[instanceid:crs-2xx]
level=warn ts=2020-09-09T08:30:55.249Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyMin dimension=map[instanceid:crs-8xx]
level=warn ts=2020-09-09T08:30:55.249Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyMin dimension=map[instanceid:crs-hxx]
level=warn ts=2020-09-09T08:30:55.249Z caller=repository.go:200 msg="Instance has not monitor data" metric=LatencyMin dimension=map[instanceid:crs-6xx]
level=warn ts=2020-09-09T08:30:55.575Z caller=repository.go:200 msg="Instance has not monitor data" metric=CacheHitRatioMin dimension=map[instanceid:crs-2xx]

In addition, data for the CLB layer-7 load balancer metrics cannot be retrieved either: "Instance has not monitor data"

TencentCloudSDKError when fetching CDB monitoring data

level=info ts=2021-09-09T05:14:45.639Z caller=collector.go:72 msg="Start collect......" name=QCE/CDB
level=error ts=2021-09-09T05:14:55.089Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=d4b2c621-762a-41e8-9870-e96e8978b089"
level=error ts=2021-09-09T05:15:00.024Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=892c85aa-deab-46f6-aa28-146e25840046"
level=error ts=2021-09-09T05:15:03.204Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=f87b7462-1c5d-4fd5-8fdd-dd0dd150a861"
level=error ts=2021-09-09T05:15:03.556Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=f06ba41a-c2e7-4dae-89d2-32eefd95fd90"
level=error ts=2021-09-09T05:15:03.953Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=0a824971-f5dc-40af-ba07-4076114b610b"
level=error ts=2021-09-09T05:15:15.008Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=f45a2991-e93b-4b28-b50f-5dcda295e6fe"
level=error ts=2021-09-09T05:15:19.961Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=0c091660-0fb1-4ef1-a813-a0c289f3cf8d"
level=error ts=2021-09-09T05:15:22.941Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=072e7f69-b63b-4fd4-928f-8689ca08eb91"
level=error ts=2021-09-09T05:15:23.351Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=4585bd22-1f45-49c3-b1f0-f8eaef4fc4e9"
level=error ts=2021-09-09T05:15:23.676Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=5830313c-0613-424f-8886-6147d45527f9"
level=error ts=2021-09-09T05:15:34.810Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=fe9562fb-d8b7-496f-8ee6-6c7439e49fe9"
level=error ts=2021-09-09T05:15:39.680Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=41bad52c-cbca-48b4-8029-42c13b37a9ce"
level=error ts=2021-09-09T05:15:42.682Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=9861bd54-4fec-49fa-8014-a9559c730f58"
level=error ts=2021-09-09T05:15:43.253Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=4167bc28-a65f-4537-9995-2aae1ce2bf75"
level=error ts=2021-09-09T05:15:43.540Z caller=repository.go:148 msg="[TencentCloudSDKError] Code=InvalidParameterValue, Message=there are no valid statistics type, RequestId=9cc0f77e-3a10-4b9d-bfa0-4426ff3e1b9b"
level=info ts=2021-09-09T05:15:46.414Z caller=collector.go:82 msg="Collect done" name=QCE/CDB duration_seconds=60.774686349

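Request: support filtering the instance list by tag

For a few metrics, Cloud Monitor itself does not support this dimension. However, the exporter could request the instance list from the tag API at startup and generate the corresponding dimensions from it. Could filtering the instance list by tag be implemented this way?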

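How to distinguish InstanceType for QCE/CDB

For monitoring metrics whose dimensions are both InstanceId and InstanceType, the master's InstanceType is currently used by default. How can the slave's be obtained?

For example, for the slaveIoRunning metric, a different InstanceType leads to different results.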

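instance_filters configuration error

While using the exporter I found that instance_filters is defined as map[string]string,
but the example configuration and the configuration details in README.md show
instance_filters: // optional; when all_instances is enabled, filter by the fields of each instance
  - ProjectId: 1
    Status: 1
which should parse as []map[string]string.
In my tests, a configuration written as []map[string]string cannot be parsed,
and one written as map[string]string does not filter. I hope this can be fixed.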
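
The type mismatch described above can be reproduced with the standard library alone. This is a minimal sketch under stated assumptions: JSON stands in for the YAML config because encoding/json ships with Go, and parseFilters and matches are hypothetical helpers, not functions from the exporter.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// parseFilters decodes a filter document into map[string]string.
// JSON is used only as a stdlib stand-in for the YAML config.
func parseFilters(doc string) (map[string]string, error) {
	var f map[string]string
	err := json.Unmarshal([]byte(doc), &f)
	return f, err
}

// matches reports whether every filter key/value equals the instance field.
// The fields map is a hypothetical flattened view of an instance.
func matches(fields, filters map[string]string) bool {
	for k, want := range filters {
		if got, ok := fields[k]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// List form, as in the README example: does not fit a map type.
	_, err := parseFilters(`[{"ProjectId": "1", "Status": "1"}]`)
	fmt.Println("list form fails:", err != nil) // prints "list form fails: true"

	// Map form: decodes into map[string]string.
	filters, err := parseFilters(`{"ProjectId": "1", "Status": "1"}`)
	fmt.Println("map form fails:", err != nil) // prints "map form fails: false"

	ins := map[string]string{"ProjectId": "1", "Status": "2"}
	fmt.Println("instance kept:", matches(ins, filters)) // prints "instance kept: false"
}
```

The sketch only covers the parsing half of the report. Why a map-form configuration fails to filter in the exporter is not something it can show.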

After upgrading to the latest version V2.19.0, CDN metric scraping fails with msg="this instance may not have metric data"

ts=2022-12-07T14:30:53.855Z caller=repository.go:200 level=debug msg="this instance may not have metric data" metric=Bandwidth dimension="map[domain:xxx projectId:xxx]"

Configuration file

credential:
  region: ap-guangzhou
products:
- namespace: QCE/CDN
  only_include_metrics:
    - RequestsHitRate
    - FluxHitRate
    - HttpStatus4xxRate
    - HttpStatus403Rate
    - HttpStatus5xxRate
    - Bandwidth
    - BackOriginBandwidth
    - BackOriginHttp4xx
    - BackOriginHttp5xx
    - BackOriginHttp403
    - BackOriginHttp404
    - BackOriginRequests
  custom_query_dimensions:
  - projectId: xxx
    domain: xxx
  - projectId: xxx
    domain: xxx
  - projectId: xxx
    domain: xxx
  - projectId: xxx
    domain: xxx
rate_limit: 10

Unable to fetch CDN data

  credential:
    region: ap-beijing

  products:
  - namespace: QCE/CDN
    all_metrics: true
    all_instances: true

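With this configuration nothing is scraped. The metrics page shows the scrape as successful, but the corresponding metrics are absent, not a single one.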

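[Open-source self-recommendation] SolidUI: generate any graphic from one sentence

About me

I have worked in big data for ten years, on user growth, BI, big data platforms, knowledge graphs, and AI platforms, and I specialize in the big data and AI technology stack. I have published many columns on CSDN and am a CSDN blog expert and a CSDN top creator in the big data field. In 2018 I joined the effort to build the WeDataSphere open-source community, a comprehensive data-related community, where I contributed to DataSphereStudio (a development and management integration framework), Exchangis (a data exchange tool), Streamis (a streaming application development and management system), and Apache Linkis (computation middleware). I founded the SolidUI data visualization community myself. Apache Asia 2022 speaker, Hadoop Meetup 2022 speaker, WeDataSphere Meetup 2022 speaker. Apache Linkis Committer, EXIN DPO (Data Protection Officer).

I started a business in February 2023 and now run SolidUI full time.

About SolidUI

Generate any graphic from one sentence.

With the rise of text-to-image language models, SolidUI aims to help people build visualization tools quickly. The visual content covers 2D, 3D, and 3D scenes, so that 3D data demonstration scenes can be built quickly. SolidUI is an innovative project that combines natural language processing (NLP) with computer graphics to implement text-to-image generation. By building its own text-to-image language model, SolidUI uses an RLHF (Reinforcement Learning Human Feedback) pipeline to go from a text description to a generated graphic.

SolidUI Gitee https://gitee.com/CloudOrc/SolidUI
SolidUI GitHub https://github.com/CloudOrc/SolidUI
SolidUI website https://cloudorc.github.io/SolidUI-Website/
Discord https://discord.gg/NGRNu2mGeQ
SolidUI v0.3.0 release article https://mp.weixin.qq.com/s/KEFseiQJgK87zvpslhAAXw
SolidUI v0.3.0 concept video https://www.bilibili.com/video/BV1GV411A7Wn/
SolidUI v0.3.0 tutorial video https://www.bilibili.com/video/BV1xh4y1e7j6/
SolidUI demo environment http://www.solidui.top/ admin/admin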

QCE/COS metrics fetch fail

ts=2022-08-28T14:25:53.928Z caller=product.go:99 level=error msg="create metric series err" err="Get \"http://cos.ap-guangzhou.myqcloud.com/\": invalid character '<' looking for beginning of value" Namespace=QCE/COS name=InternalTrafficUp

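Scrapes often time out; how should this be troubleshot?

I deployed an exporter inside k8s and scrape it with Prometheus. Occasionally a scrape times out, and scraping it manually also times out occasionally. The response times are very regular, all close to multiples of 30: 30s, 60s, and so on.

Could it be that a request to Tencent Cloud timed out and was retried after 30s? Or is it caused by gateway retries? If so, does the SDK print a corresponding log? How should I troubleshoot this?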
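
One way to gather evidence for the "multiples of 30" pattern is to time scrapes from outside the exporter and flag the suspicious ones. The sketch below is only a measurement aid under stated assumptions: timeScrape and nearMultiple are hypothetical helpers, the port comes from the "Listening on" log lines elsewhere on this page, and the /metrics path is the usual Prometheus convention rather than something confirmed here. It says nothing about whether the SDK retries or logs retries.

```go
package main

import (
	"fmt"
	"io"
	"math"
	"net/http"
	"time"
)

// timeScrape fetches url with an explicit client timeout and returns
// how many seconds the full request (including the body) took.
func timeScrape(url string, timeoutSeconds int) (float64, error) {
	client := &http.Client{Timeout: time.Duration(timeoutSeconds) * time.Second}
	start := time.Now()
	resp, err := client.Get(url)
	if err != nil {
		return time.Since(start).Seconds(), err
	}
	defer resp.Body.Close()
	// Read the whole body so the measurement covers the full response.
	_, err = io.Copy(io.Discard, resp.Body)
	return time.Since(start).Seconds(), err
}

// nearMultiple reports whether seconds lies within tol of a positive
// multiple of base (for example 30, 60, 90 for base 30).
func nearMultiple(seconds, base, tol float64) bool {
	n := math.Round(seconds / base)
	return n >= 1 && math.Abs(seconds-n*base) <= tol
}

func main() {
	// Assumed target: port 9123 from the logs, /metrics by convention.
	const target = "http://localhost:9123/metrics"
	for i := 0; i < 3; i++ {
		secs, err := timeScrape(target, 120)
		fmt.Printf("scrape %d: %.1fs err=%v near30=%v\n", i, secs, err, nearMultiple(secs, 30, 1))
	}
}
```

Comparing the timestamps of slow scrapes against the exporter's own "Start collect" and "Collect done" log lines would then show whether the delay sits inside the collection step or in front of it.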

Unable to fetch monitoring data for CDB replicas

Error:
level=debug ts=2022-01-26T08:59:54.420Z caller=repository.go:181 msg="response data point not match series" metric=BytesSent dimension="map[InstanceId:cdbro-k2xxxxxx InstanceType:3]"
Investigation:
At line 27 of pkg/metric/series.go, when the TcmSeries id is generated, the ql Labels used contain only InstanceId.

func NewTcmSeries(m *TcmMetric, ql Labels, ins instance.TcInstance) (*TcmSeries, error) {
	id, err := GetTcmSeriesId(m, ql)
	if err != nil {
		return nil, err
	}

	s := &TcmSeries{
		Id:          id,
		Metric:      m,
		QueryLabels: ql,
		Instance:    ins,
	}
	return s, nil
}

In the buildSamples method of pkg/metric/repository.go, however, ql is taken from points.Dimensions returned by the API.

func (repo *TcmMetricRepositoryImpl) buildSamples(
	m *TcmMetric,
	points *monitor.DataPoint,
) (*TcmSamples, map[string]string, error) {
	ql := map[string]string{}
	for _, dimension := range points.Dimensions {
		if *dimension.Value != "" {
			ql[*dimension.Name] = *dimension.Value
		}
	}
	sid, e := GetTcmSeriesId(m, ql)
	if e != nil {
		return nil, ql, fmt.Errorf("get series id fail")
	}
	s, ok := m.Series[sid]
	if !ok {
		return nil, ql, fmt.Errorf("response data point not match series")
	}
	samples, e := NewTcmSamples(s, points)
	if e != nil {
		return nil, ql, fmt.Errorf("this instance may not have metric data")
	}
	return samples, ql, nil
}

The data returned by the API for a read-only replica is as follows

  "DataPoints": [
      {
        "Dimensions": [
          {
            "Name": "InstanceType",
            "Value": "3"
          },
          {
            "Name": "InstanceId",
            "Value": "cdbro-k2xxxxxx"
          }
        ]

As you can see, there is an extra InstanceType dimension, so the TcmSeries ids generated in the two places do not match.
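
The mismatch can be shown in isolation. In this minimal sketch, seriesID is a hypothetical stand-in that derives an id from the metric name and sorted labels; the real GetTcmSeriesId is not shown in this issue and may work differently. restrict is likewise a hypothetical helper showing one possible way to make the two ids agree, not the project's fix.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// seriesID builds an id from the metric name and sorted labels.
// Hypothetical stand-in for GetTcmSeriesId.
func seriesID(metric string, labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := []string{metric}
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, "|")
}

// restrict keeps only the label keys that were used when the series was created.
func restrict(labels map[string]string, keys ...string) map[string]string {
	out := make(map[string]string, len(keys))
	for _, k := range keys {
		if v, ok := labels[k]; ok {
			out[k] = v
		}
	}
	return out
}

func main() {
	// Labels used when the series was created.
	query := map[string]string{"InstanceId": "cdbro-k2xxxxxx"}
	// Dimensions returned by the API for a read-only replica.
	resp := map[string]string{"InstanceId": "cdbro-k2xxxxxx", "InstanceType": "3"}

	// The extra dimension changes the id, so the lookup misses.
	fmt.Println(seriesID("BytesSent", query) == seriesID("BytesSent", resp)) // prints "false"

	// Dropping dimensions that were not part of the query makes the ids agree.
	fmt.Println(seriesID("BytesSent", query) == seriesID("BytesSent", restrict(resp, "InstanceId"))) // prints "true"
}
```

The alternative direction, adding InstanceType to the query labels when the series is created, would also make the ids agree; which side to change depends on how the exporter builds its queries.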

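Support exporting all metrics of a single instance?

At present this exporter seems to support monitoring only a single metric, according to the tc_metric_name we define. More often, though, as with other exporters in the Prometheus community, an exporter targets a single instance but exports many kinds of metrics for that instance.

For example, the virtual machine exporter emits CPU/memory, network, and IO metrics at the same time, and the MySQL exporter emits slow queries, QPS, and other metrics at the same time.
Would you consider providing an exporter that covers multiple metrics of a single Tencent Cloud instance?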

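I sincerely think this project should continue

All the major cloud vendors tout open source, so this project should continue. The Prometheus monitoring service hosted by Tencent Cloud is what gets recommended... but it does not serve a traditional user like me well. A while ago I tried connecting Prometheus mongo, redis, elastic, and other metrics myself, and they were incomplete too. I even asked support whether an exporter like this existed.... I hope it keeps being updated. I would now also like to try integrating it.....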
