Mirror of Apache Helix
License: Apache License 2.0
Issues:
CurrentStateCache, when updating its snapshot, would miss all the existing partitions that have state changes.
RoutingTableProvider callbacks run on the main event thread, and the time spent is not accounted for in the log.
Based on the existing components (or the interfaces), we can now implement the initial version of the WAGED rebalancer.
AC:
MIN_ACTIVE replica is only applied in DelayAutoRebalancer and is not respected by the throttling logic in Helix. Thus, if the preference list for a partition is Master, Slave, Slave and the current state is Master, Slave, Offline, Helix will generate a state transition (ST) Offline -> Slave. Even if the MIN_ACTIVE replica number is 2, Helix will treat this Offline -> Slave as a Recovery rebalance. We should fix this.
We also need logic for more fine-grained state transition classification, instead of marking all state transitions of a partition with the same type. For example, if the preference list is Master, Slave, Slave and the current state is Slave, Slave, Offline, there will be two STs: Slave -> Master and Offline -> Slave. In the current computation logic, both are Recovery STs because the partition is marked as a Recovery partition. With the new logic, Slave -> Master is a Recovery ST and Offline -> Slave is a Load rebalance when MIN_ACTIVE is set to 2.
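A minimal sketch of such per-transition classification, with illustrative names (this is not Helix's actual API): a transition counts as Recovery if it restores a top-state replica or if the partition is below MIN_ACTIVE; otherwise it is a Load rebalance.

```java
import java.util.List;

class StClassifier {
    enum RebalanceType { RECOVERY, LOAD }

    // Classify one transition individually, instead of tagging the whole partition.
    // A transition is Recovery if it brings up the top state, or if the partition
    // currently has fewer than minActive replicas in an active state.
    static RebalanceType classify(List<String> currentStates, int minActive,
                                  String fromState, String toState) {
        long active = currentStates.stream()
                .filter(s -> s.equals("Master") || s.equals("Slave"))
                .count();
        if (toState.equals("Master") || active < minActive) {
            return RebalanceType.RECOVERY;
        }
        return RebalanceType.LOAD;
    }
}
```

With current state Slave, Slave, Offline and MIN_ACTIVE = 2, this classifies Slave -> Master as Recovery and Offline -> Slave as Load, matching the desired behavior above.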
If you read a ResourceConfig through ConfigAccessor, the stats are not pulled from ZK, so the stats data on these objects is incorrect.
We need to add this logic to ConfigAccessor, not only for ResourceConfig but also for the other objects.
Upon a Participant disconnect, the Participant carries over from the last session. This copies all previous task states to the current session and sets their requested states to DROPPED (for INIT and RUNNING states).
It came to our attention that sometimes these Participants experience connection issues while tasks happen to be in TASK_ERROR or COMPLETED states. These tasks get stuck on the Participant and are never dropped. This issue proposes adding logic that immediately drops all tasks whose requested state is DROPPED.
See: JobDispatcher.java line 441.
Ex.:
try {
  WorkflowConfig config = WorkflowConfig.fromHelixProperty(resource.getValue());
  workflowConfigMap.put(resource.getKey(), config);
} catch (IllegalArgumentException ex) {
  // ignore if it is not a workflow config.
}
If you catch an exception, you usually should not ignore it silently. Adding a log message helps engineers understand what happened.
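A minimal sketch of that suggestion, using java.util.logging for illustration (Helix itself uses slf4j; the class and method names below are placeholders):

```java
import java.util.logging.Logger;

class WorkflowConfigParser {
    private static final Logger LOG = Logger.getLogger(WorkflowConfigParser.class.getName());

    // Stand-in for the loop body above: parse stands in for
    // WorkflowConfig.fromHelixProperty(resource.getValue()).
    static boolean tryParse(String resourceName, Runnable parse) {
        try {
            parse.run();
            return true;
        } catch (IllegalArgumentException ex) {
            // No longer silent: record which resource was skipped and why.
            LOG.fine("Resource " + resourceName
                + " is not a workflow config, skipping: " + ex.getMessage());
            return false;
        }
    }
}
```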
Please find the design doc here:
https://github.com/apache/helix/wiki/Design-Proposal---Weight-Aware-Globally-Even-Distribute-Rebalancer
This data provider will help convert Helix cluster status data into the rebalancer-readable Cluster Model object. The result will be used to support rebalance calculation.
AC:
We observed that some jobs whose start time (with an execution delay (DelayTime) set) is greater than the current time get scheduled anyway. Investigate the following two issues (they may be related):
Investigate the behavior described in the title. E.g., if a workflow is stopped, do all its tasks ultimately get updated to the STOPPED state?
The same log message should not be logged right before an exception carrying it is also thrown. Ex.:
The principle of "don't log and throw" is good practice for handling exceptions in Java. We would like to keep the logs clean and helpful, without redundancy. An exception message is enough to describe the exceptional event, and the stack trace tells you in which class, method, and line the exception was thrown.
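A small sketch contrasting the two styles (names are illustrative):

```java
class QuotaChecker {
    static boolean isOverQuota(int used, int limit) {
        return used > limit;
    }

    // Anti-pattern ("log and throw") -- the same message ends up in the log AND in
    // the exception, so every caller up the stack that also logs produces duplicates:
    //
    //   LOG.error("Quota exceeded for " + user);
    //   throw new IllegalStateException("Quota exceeded for " + user);
    //
    // Recommended: throw once. The exception message describes the event, and the
    // stack trace identifies the class, method, and line. Log only where the
    // exception is finally handled.
    static void checkQuota(String user, int used, int limit) {
        if (isOverQuota(used, limit)) {
            throw new IllegalStateException(
                "Quota exceeded for " + user + ": " + used + " > " + limit);
        }
    }
}
```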
In the ZooKeeper client, when a znode doesn't exist (NoNodeException), a readFailure is reported, which shouldn't happen.
The current implementation of the Task Framework checks whether a job is ready for scheduling. If not, the job is inserted into the inflight jobs without any delay-time restrictions, which is not desirable.
This issue tracks the work of refining the POC: a constraint-based rebalance algorithm.
POC code:
During discussions with our potential customers, we noticed that a default instance capacity could be very helpful.
The default capacity would be applied to an instance when the instance-level capacity map is empty or not configured.
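A minimal sketch of the fallback rule, with assumed names (CapacityResolver and its parameters are illustrative, not Helix's actual API):

```java
import java.util.Map;

class CapacityResolver {
    // Use the cluster-level default capacity map whenever the instance-level
    // map is missing or empty; otherwise the instance-level map wins.
    static Map<String, Integer> resolve(Map<String, Integer> instanceCapacity,
                                        Map<String, Integer> clusterDefaultCapacity) {
        if (instanceCapacity == null || instanceCapacity.isEmpty()) {
            return clusterDefaultCapacity;
        }
        return instanceCapacity;
    }
}
```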
When there are frequent disconnects from Participants, it is possible to have a discrepancy between what's in CurrentState and what's in task contexts. This could leave the task metadata in a bad state where jobs could get stuck. Using CurrentState from all LiveInstances would solve this problem and give the controller the most up-to-date view of all tasks. No extra read would be required because CurrentState is cached.
This issue is the same as https://issues.apache.org/jira/browse/HELIX-747 (it's a bit confusing which issue list should be used for this project as the documentation on https://cwiki.apache.org/confluence/display/HELIX/Contributor+Workflow doesn't seem to be up to date):
The current json lib Helix uses is out of date. We should consider replacing it with a well-maintained lib.
FasterXML/jackson is compatible with the current lib we use, so it could be a good candidate.
The old version of Jackson that Helix currently uses contains multiple CVEs:
CVE-2017-17485, CVE-2017-7525, CVE-2017-15095
Currently this test is disabled because it is unstable. It would be better to understand why the MBean does not always show up in the default monitoring, and then fix the test.
Currently, there is no endpoint for users to read PropertyStore content directly via helix-rest. Users have to rely on tools to view the data on ZooKeeper directly.
Go through the logs and change some debug logs to info logs since they might help with debugging.
Problem
The issue was raised by an internal client who set instance_weight = 0 in the instance config. Instead of zero partitions being assigned to it, as expected, under our Helix controller algorithm there is still a good chance that some partitions are assigned to the instance.
Code of issue
After investigation, the issue was traced to the current CardDealingAlgorithm. It treats instance weight as a comparison factor, or preference, rather than strictly respecting weight = 0.
Solution
There is currently no way to strictly follow all instance weight settings, because the algorithm was not designed for that purpose. But for the extreme case of instance weight = 0, the current workaround is to add a simple check that skips assignment to the instance.
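The workaround could look like the following sketch, with illustrative names (this is not the actual CardDealingAlgorithm code): drop weight-0 instances from the candidate set before assignment runs.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ZeroWeightFilter {
    // Respect weight = 0 strictly: such an instance never receives a partition.
    // Instances with no configured weight default to weight 1 here.
    static List<String> eligibleInstances(List<String> instances,
                                          Map<String, Integer> weights) {
        return instances.stream()
                .filter(i -> weights.getOrDefault(i, 1) > 0)
                .collect(Collectors.toList());
    }
}
```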
This links to #320
The current Helix HealthCheck API checks ANY_INSTANCE resources, which is unnecessary. Since ANY_INSTANCE resources have only a single partition with one replica, there is no need to check sibling health status.
The current format file helix-style.xml has very few configured items, which leaves too much flexibility to Helix developers. We should add more items to it.
The known missing ones:
Please comment on this issue and we will modify the format file accordingly.
The current implementation of REST 2.0 does not support stateModelDef modification. Here, we will implement:
delete -- remove the stateModelDef with the input id.
put -- create a new stateModelDef if none exists with the same input id.
set -- replace the content of the node with the input id.
Currently it is hardcoded as a SlidingTimeWindowArrayReservoir, which uses a default interval of 1 hour. While the reservoir choice is fine (though it should still be configurable for upgrades), a 1-hour window occupies too much heap space for a busy production Helix client. More importantly, one does not need a 1-hour window to compute histogram stats; typically 1 minute is used.
E.g., our heap dump here at Uber shows that as much as 4 GB or more of heap can be used for these metrics (initiated through ZkHelixPropertyStore). This creates too much memory and GC pressure. As noted in the manual for SlidingTimeWindowArrayReservoir, even a 1-minute window can use as much as 9 MB of memory. So we believe a much lower interval length should be passed in. Even better, the reservoir should be made configurable.
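To illustrate why window length drives heap usage, here is a toy sliding-time-window reservoir (a sketch, not the Dropwizard implementation Helix uses): every sample inside the window is retained, so a 1-hour window on a busy client holds roughly 60x the samples of a 1-minute window.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ToyTimeWindowReservoir {
    private final long windowNanos;
    // Each element is [timestampNanos, value]; all samples in the window are kept.
    private final Deque<long[]> samples = new ArrayDeque<>();

    ToyTimeWindowReservoir(long windowNanos) {
        this.windowNanos = windowNanos;
    }

    void update(long nowNanos, long value) {
        samples.addLast(new long[]{nowNanos, value});
        trim(nowNanos);
    }

    int size(long nowNanos) {
        trim(nowNanos);
        return samples.size();
    }

    // Evict samples older than the window; memory is proportional to
    // (update rate) x (window length), which is why the window matters.
    private void trim(long nowNanos) {
        while (!samples.isEmpty() && samples.peekFirst()[0] < nowNanos - windowNanos) {
            samples.removeFirst();
        }
    }
}
```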
Problems
The implementation of the method
has several performance issues: InstanceServiceImpl won't memoize the response of each query. In production, performance will degrade very badly. The duplicate requests are known to come mostly from partition stoppable checks.
Solutions
The current implementation of the Task Framework involves the use of IdealState, which is no longer necessary. IdealState can be removed.
The current HelixDataAccessor updateProperty uses ZNRecordUpdater. Its merge logic simply adds all elements when merging ZNRecords, which can cause a lot of duplication in listFields.
This impacts invokeRebalanceForResourceConfig. The fix is to implement a customized updater. For example:
"id" : "Test",
"simpleFields" : {
},
"mapFields" : {
},
"listFields" : {
"0" : [ "1", "2", "3", "1", "2", "3", "1", "2", "3", "1", "2", "3", "1", "2", "3", "1", "2", "3", "1", "2", "3", "1", "2", "3" ]
}
}
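A sketch of what the customized updater's merge could do instead, using plain maps to stay self-contained (a stand-in for a DataUpdater over ZNRecord list fields; names are illustrative): merge with set semantics so repeated updates do not duplicate entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class DedupListMerger {
    // Merge the update's list fields into the current record's list fields,
    // dropping duplicates while preserving insertion order.
    static void merge(Map<String, List<String>> current,
                      Map<String, List<String>> update) {
        for (Map.Entry<String, List<String>> e : update.entrySet()) {
            LinkedHashSet<String> merged =
                new LinkedHashSet<>(current.getOrDefault(e.getKey(), List.of()));
            merged.addAll(e.getValue());
            current.put(e.getKey(), new ArrayList<>(merged));
        }
    }
}
```

Applying the same update twice then yields ["1", "2", "3"] instead of the duplicated list shown above.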
ClusterConfig, InstanceConfig, and ResourceConfig's equals() methods were only comparing IDs, which is undesirable because two instances can have the same ID but different contents/fields. This diff improves the equals() methods for these configs so that the actual contents are compared as well.
Currently, we use ResourceConfigs to trigger a rebalance. But this causes Helix to write a ResourceConfig back to ZK, which is undesirable when a mixed-mode rebalance is happening (a user may create a non-task ResourceConfig for custom preferenceLists).
AC:
See the code below:
if (expiredJobs.size() > 0) {
  // Update workflow context will be in main pipeline not here. Otherwise, it will cause
  // concurrent write issue. It is possible that jobs got purged but there is no event to
  // trigger the pipeline to clean context.
  HelixDataAccessor accessor = manager.getHelixDataAccessor();
  List<String> resourceConfigs =
      accessor.getChildNames(accessor.keyBuilder().resourceConfigs());
  if (resourceConfigs.size() > 0) {
    RebalanceScheduler.invokeRebalanceForResourceConfig(manager.getHelixDataAccessor(),
        resourceConfigs.get(0));
  } else {
    LOG.warn(
        "No resource config to trigger rebalance for clean up contexts for" + expiredJobs);
  }
We have an alternative way to trigger a rebalance instead of having to "touch" a resource config. @lei-xia coded this up as part of his P2P messaging fix. Let's use that instead.
The current implementation has a loophole when
The root cause is that the implementation relies on the counters in the hashmap, where each type, including ANY, is a separate entry.
In the REST partition read, we read partition-level health status one by one, which increases the number of calls to ZK. We should improve it with a batch call.
When we delete a workflow, the following log is printed repeatedly (reproducible):
2019/08/06 23:46:58.152 WARN [ZkClient] [HelixController-pipeline-task-XXX] [helix] [] Failed to delete path /XXX/PROPERTYSTORE/TaskRebalancer/WORKFLOW_NAME! org.I0Itec.zkclient.exception.ZkException: org.apache.zookeeper.KeeperException$NotEmptyException: KeeperErrorCode = Directory not empty for /XXX/PROPERTYSTORE/TaskRebalancer/WORKFLOW_NAME
We see a reproducible deletion problem with this use case, so this log might be a clue to what is exactly going on.
A problem was found that the leader controller may lose all the callback handlers after leadership switch.
To reproduce the issue, the cluster must be using leader election mode (DistributedLeaderElection). Frequent leadership switches caused by ZK session expiry may then trigger the problem.
The symptom is that although the leader controller exists, it does not process any ZK notifications, so the cluster is not managed.
Since the controller will be notified when a ResourceConfig changes, it is not necessary to perform the touch logic once the ResourceConfig is updated. Hence, some of the places issuing the touch logic can be removed.
Problem:
fromHelixProperty() throws IllegalArgumentException if the HelixProperty is not a WorkflowConfig.
helix/helix-core/src/main/java/org/apache/helix/task/WorkflowConfig.java
Lines 318 to 324 in 1063e7a
And getWorkflows() relies on catching that exception to check whether or not the property is a WorkflowConfig.
helix/helix-core/src/main/java/org/apache/helix/task/TaskDriver.java
Lines 770 to 780 in 1063e7a
Solutions:
Add boolean isWorkflowConfig(HelixProperty property) to check first, so catching an exception is avoided. Alternatively, have fromHelixProperty(HelixProperty property) return Optional<WorkflowConfig>:
public static Optional<WorkflowConfig> parseWorkflowConfig(HelixProperty property) {
  Map<String, String> configs = property.getRecord().getSimpleFields();
  if (!configs.containsKey(WorkflowConfigProperty.Dag.name())) {
    return Optional.empty();
  }
  return Optional.of(Builder.fromMap(configs).setWorkflowId(property.getId()).build());
}
for (Map.Entry<String, ResourceConfig> resource : resourceConfigMap.entrySet()) {
  String key = resource.getKey();
  WorkflowConfig.parseWorkflowConfig(resource.getValue())
      .ifPresent(w -> workflowConfigMap.put(key, w));
}
Previous discussions in #357:
narendly: I don't think using Optional just for the sake of removing null is appropriate. I wouldn't think it would help with performance either. You could call isWorkflowConfig and parse if it is true, and skip if false, thereby doing minimal amount of parsing needed overall.
narendly: I think Optional would be beneficial if you want to have fluid chained-calls (similar to functional programming). Here, it is possible to avoid optionality/returning null by checking for the presence of a field. So using Optional is not necessary because you do not need to introduce null in the first place. It would also incur performance/memory overhead. -Hunter
narendly: Optional is a wrapper for its underlying objects, so it uses more memory and adds the work of boxing/unboxing. I am not claiming that this is significant, but it is a cost we do not have to incur.
Comments and discussion are welcome.
We will soon start the development of the new rebalancer. https://github.com/apache/helix/wiki/Design-Proposal---Weight-Aware-Globally-Even-Distribute-Rebalancer
This issue is created for tracking the preparation.
The current AutoFallbackPropertyStore instantiation marks the zkCachePath as null, so the caching functionality does not work for this type of PropertyStore.
When the CRUSH-based rebalance strategy is used, if the related resource isn't enabled yet, Helix will throw an IllegalStateException. Below is an example from one of the integration tests in the Pinot code base:
2019/06/22 09:05:16.929 WARN [BestPossibleStateCalcStage] [HelixController-async_tasks-OfflineClusterIntegrationTest] Event aaadf5d1_DEFAULT : Failed to calculate best possible states for 1 resources.
2019/06/22 09:05:21.954 WARN [AutoRebalancer] [HelixController-pipeline-default-OfflineClusterIntegrationTest] Resource leadControllerResource has tag controller but no configured participants have this tag
2019/06/22 09:05:21.954 ERROR [CRUSHPlacementAlgorithm] [HelixController-pipeline-default-OfflineClusterIntegrationTest] 1 nodes of type INSTANCE were requested but the tree has only 0 nodes!
2019/06/22 09:05:21.955 ERROR [BestPossibleStateCalcStage] [HelixController-pipeline-default-OfflineClusterIntegrationTest] Event 512cdf47_DEFAULT : Error computing assignment for resource leadControllerResource. Skipping.
java.lang.IllegalStateException: null
at org.apache.helix.controller.rebalancer.strategy.crushMapping.CRUSHPlacementAlgorithm$Selector.select(CRUSHPlacementAlgorithm.java:308) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.strategy.crushMapping.CRUSHPlacementAlgorithm.select(CRUSHPlacementAlgorithm.java:119) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy.doSelect(CrushRebalanceStrategy.java:174) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy.select(CrushRebalanceStrategy.java:140) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy.computePartitionAssignment(CrushRebalanceStrategy.java:92) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.strategy.CrushRebalanceStrategy.computePartitionAssignment(CrushRebalanceStrategy.java:48) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.strategy.AbstractEvenDistributionRebalanceStrategy.computePartitionAssignment(AbstractEvenDistributionRebalanceStrategy.java:89) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.strategy.AbstractEvenDistributionRebalanceStrategy.computePartitionAssignment(AbstractEvenDistributionRebalanceStrategy.java:49) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.AutoRebalancer.computeNewIdealState(AutoRebalancer.java:129) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.rebalancer.AutoRebalancer.computeNewIdealState(AutoRebalancer.java:51) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.stages.BestPossibleStateCalcStage.computeResourceBestPossibleState(BestPossibleStateCalcStage.java:245) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.stages.BestPossibleStateCalcStage.compute(BestPossibleStateCalcStage.java:121) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.stages.BestPossibleStateCalcStage.process(BestPossibleStateCalcStage.java:77) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.pipeline.Pipeline.handle(Pipeline.java:68) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.GenericHelixController.handleEvent(GenericHelixController.java:640) ~[helix-core-0.9.0.jar:0.9.0]
at org.apache.helix.controller.GenericHelixController.access$400(GenericHelixController.java:117) ~[helix-core-0.9.0.jar:0.9.0]
This exception should be cleaned up and the behavior should just be the same as the default rebalance strategy.
Related Pinot issue: apache/pinot#4355
Currently we can set only one instance group tag for a job.
jobCfg.setInstanceGroupTag("INSTANCEGROUPTAG");
It would be really helpful to support multiple instance group tags for one job, so that a job can run either in group A or in group B. Through this, we can support more advanced scheduling use cases like node affinity in Kubernetes [1].
[1] https://kubernetes.io/docs/concepts/configuration/assign-pod-node/
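One possible matching rule, sketched with illustrative names (today's JobConfig only offers the single-tag setInstanceGroupTag): a job carrying several tags is eligible to run on an instance if any of its tags match, mirroring OR-style node affinity.

```java
import java.util.Collections;
import java.util.Set;

class MultiTagMatcher {
    // An instance is eligible for a job if it carries ANY of the job's tags.
    // An untagged job can run anywhere.
    static boolean isEligible(Set<String> instanceTags, Set<String> jobTags) {
        if (jobTags.isEmpty()) {
            return true;
        }
        return !Collections.disjoint(instanceTags, jobTags);
    }
}
```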
Create a stage after ReadClusterDataStage for cluster change detection.
There are many unused imports in the project; we should remove them to keep the code clean.
In CallbackHandler, the re-subscription of the watcher for CALLBACK happens asynchronously. If the re-subscription happens after the change of the path, then we will miss the change until the next change happens.
Let's do something like this:
thread.setName("TaskStateModelFactory-task_thread" + thread.getId())
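Applied through a ThreadFactory, the naming suggestion might look like this sketch (the class name is illustrative), so every task thread is identifiable in thread dumps:

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class NamedTaskThreadFactory implements ThreadFactory {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public Thread newThread(Runnable r) {
        Thread t = new Thread(r);
        // Each thread created by the task executor gets a recognizable name.
        t.setName("TaskStateModelFactory-task_thread-" + counter.getAndIncrement());
        return t;
    }
}
```

Such a factory can be passed to Executors.newFixedThreadPool(n, factory) so the pool's threads are all named consistently.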
I find that some code still uses LinkedList instead of ArrayList for List objects. Ex.:
I believe ArrayList performs better than LinkedList in most cases, especially when the dataset is big and we don't need to randomly delete data while using an iterator.
If a LinkedList is used as a queue, we might want to look at Deque/ArrayDeque, which performs better than LinkedList. Ex.:
In summary, we should avoid using LinkedList as far as possible. For small datasets, LinkedList and ArrayList may have similar performance, and if we only target small datasets in this framework it should be fine. But how about using ArrayDeque over LinkedList for queue implementations?
References:
https://stackoverflow.com/questions/6163166/why-is-arraydeque-better-than-linkedlist
http://brianandstuff.com/2016/12/12/java-arraydeque-vs-linkedlist/
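A quick illustration of the suggested swap: ArrayDeque offers the same queue operations as LinkedList with a contiguous backing array (better locality, no per-node allocation).

```java
import java.util.ArrayDeque;
import java.util.Deque;

class QueueExample {
    // Enqueue 1..n, then drain FIFO and sum the values.
    static int drainSum(int n) {
        Deque<Integer> queue = new ArrayDeque<>();   // was: new LinkedList<>()
        for (int i = 1; i <= n; i++) {
            queue.addLast(i);                        // enqueue
        }
        int sum = 0;
        while (!queue.isEmpty()) {
            sum += queue.pollFirst();                // dequeue
        }
        return sum;
    }
}
```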
If debugging the NPE is complete, some of the logs for the CRUSH-based algorithm (in the computeResourceBestPossibleState function) can be removed.
We see the following issues in Pinot deployment:
Sometimes ZK hangs in the getWorkflow() call, and we could provide an API with a timeout to handle this at the application level.
AC: