The Task System

A task system is a framework for running multi-step work automatically. Its goal is to let developers focus on the business logic rather than on scheduling. Consider an experiment with multiple steps: starting the servers, uploading data, and then running the clients. The developer declares these sub-steps in a task definition and then concentrates on how to start the servers, upload the data and run the clients, while the task system takes care of executing and scheduling the sub-steps of the declared task.

The task system is implemented as an independent Java library. It exposes five interfaces to developers: ICommand, IContext, IExecution, IExecutionResult and ITaskManager. A task is declared in a single XML file, so developers only need to learn these five interfaces and the schema of the task definition.

To give a first impression of the task system before discussing its design, here is a short but complete example showing how to define a task, submit it, and finally retrieve information about the submitted task.

The task in this example stops the Java applications on a list of virtual machines and then saves the logs of those applications. Within each virtual machine, the Java application must be stopped before its logs are saved. Across the virtual machines in the list, however, it is reasonable to do this work in parallel, provided the applications and virtual machines are isolated from each other.
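The ordering constraint described above (sequential within one virtual machine, parallel across virtual machines) can be sketched in plain Java. This is only an illustration of the scheduling the task system is expected to provide, not its implementation: the class name ParallelPerVmSketch, the method stopAndSaveLogs and the recorded event strings are made up for this sketch, and the two steps are placeholders that only record what they would do.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration; none of these names come from the task system.
final class ParallelPerVmSketch {

    /**
     * Runs "stop" and then "saveLogs" for every VM. Each VM gets its own job,
     * so different VMs may interleave, but the two steps of one VM never swap.
     */
    static List<String> stopAndSaveLogs(List<String> vms) {
        List<String> events = new CopyOnWriteArrayList<>(); // thread-safe event record
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, vms.size()));
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (String vm : vms) {
                pending.add(pool.submit(() -> {
                    events.add(vm + ":stop");     // placeholder for stopping the application
                    events.add(vm + ":saveLogs"); // placeholder for saving its logs
                }));
            }
            for (Future<?> job : pending) {
                job.get(); // wait until every per-VM job has finished
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for VMs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a per-VM job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
        return new ArrayList<>(events);
    }
}
```

Calling ParallelPerVmSketch.stopAndSaveLogs with three VM names returns six events. Their overall order can differ between runs, but for each VM the stop event always precedes the saveLogs event.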

Following this analysis, the task, called StopJavaApp, can be declared as shown in Fig.1. This task depends on another task called CopyFiles.

Fig.1. Definition of task StopJavaApp

As shown in Fig.1, the task first validates the parameters and then iterates over the virtual machines in parallel, stopping the Java application on each one before saving its logs.

This task contains two sub-steps, each declared with the tag step. The attribute executor of each step refers to a Java class that implements the exposed interface ICommand. Through the executor, developers can write Java code to do whatever they need, on the condition that the code eventually terminates.

Below is an implementation, called RunRemoteShell (Fig.2), of the executor for the step KillingJavaProc (Fig.1). (The implementation of the step ValidateParameters is quite similar.) Through an SSH connection to the virtual machine, this implementation executes a Linux shell command to kill the process of the Java application. If the Java application listens on a socket port for control commands, it would be more reliable to send a stop command to the application over a socket connection instead of killing the Java process directly. How to stop the Java application is the developers' concern and their decision.

As shown in Fig.2, the normal procedure in a Java executor is to retrieve parameters from the context, run the business logic, write logs, and return an exit code. The interfaces ICommand and IContext are involved here.

Fig.2. Java executor RunRemoteShell
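The executor procedure (read parameters, run the logic, log, return an exit code) might look roughly like the sketch below. It is not the code from Fig.2. The real method signatures of ICommand and IContext are not given in this text, so the two interfaces here are hypothetical stand-ins. The parameter names host and command, the ShellRunner seam, the MapContext helper and the exit codes 1 and 2 are likewise assumptions made for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Everything in this class is a hypothetical sketch, not the library's API.
final class ExecutorSketch {

    /** Assumed shape of the context: parameter lookup plus logging. */
    interface IContext {
        String getParameter(String name);
        void log(String message);
    }

    /** Assumed shape of an executor: run once, return an exit code. */
    interface ICommand {
        int execute(IContext context);
    }

    /** Seam so the remote call can be replaced when trying the sketch out. */
    interface ShellRunner {
        int run(List<String> command) throws IOException, InterruptedException;
    }

    static final class RunRemoteShell implements ICommand {
        private final ShellRunner runner;

        /** Default: start a local ssh client process and wait for it to exit. */
        RunRemoteShell() {
            this(command -> new ProcessBuilder(command).inheritIO().start().waitFor());
        }

        RunRemoteShell(ShellRunner runner) {
            this.runner = runner;
        }

        @Override
        public int execute(IContext context) {
            // 1. Retrieve parameters from the context.
            String host = context.getParameter("host");
            String shell = context.getParameter("command");
            if (host == null || shell == null) {
                context.log("missing parameter 'host' or 'command'");
                return 2;
            }
            try {
                // 2. Business logic: run the shell command on the remote host.
                int exitCode = runner.run(Arrays.asList("ssh", host, shell));
                // 3. Write a log entry, then 4. return the exit code.
                context.log("ssh " + host + " exited with " + exitCode);
                return exitCode;
            } catch (IOException e) {
                context.log("could not run ssh: " + e.getMessage());
                return 1;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                context.log("interrupted while waiting for ssh");
                return 1;
            }
        }
    }

    /** Map-backed context for experimenting with the executor. */
    static final class MapContext implements IContext {
        final Map<String, String> params = new HashMap<>();
        final List<String> logs = new ArrayList<>();

        @Override
        public String getParameter(String name) {
            return params.get(name);
        }

        @Override
        public void log(String message) {
            logs.add(message);
        }
    }
}
```

Passing a fake ShellRunner to the second constructor lets the executor be exercised without a reachable virtual machine. The no-argument constructor is there on the assumption that the task system instantiates executors by class name.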

This completes the definition of the task StopJavaApp. To define a task, developers need to learn the XML schema of the task definition and the interfaces ICommand and IContext.

Fig.3 below shows how to submit such a task to the task system. The normal procedure is to prepare the parameters, create a map of parameters, and finally submit the task. Developers must make sure that the map of parameters is consistent with the parameter definitions (the valueRef attributes) in the task definition XML file. To submit a task to the task system, developers need to know the definition of the task and the interface ITaskManager.

Fig.3. Submitting a StopJavaApp task
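The submission procedure (prepare parameters, build the map, submit) can be sketched as follows. This is not the code from Fig.3. The submit method and its signature are assumptions about ITaskManager, the parameter keys vmList and logDir are hypothetical placeholders for whatever the valueRef attributes in the real task definition name, and RecordingTaskManager is an in-memory stub that only records what was submitted.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Everything in this class is a hypothetical sketch, not the library's API.
final class SubmitSketch {

    /** Assumed shape of the manager: submit a task by name, get a UUID back. */
    interface ITaskManager {
        String submit(String taskName, Map<String, Object> parameters);
    }

    /** In-memory stub that records submissions instead of executing them. */
    static final class RecordingTaskManager implements ITaskManager {
        final Map<String, String> taskNames = new LinkedHashMap<>();
        final Map<String, Map<String, Object>> parameters = new LinkedHashMap<>();

        @Override
        public String submit(String taskName, Map<String, Object> params) {
            String uuid = UUID.randomUUID().toString();
            taskNames.put(uuid, taskName);
            parameters.put(uuid, new HashMap<>(params)); // defensive copy
            return uuid;
        }
    }

    /** Prepares the parameter map and submits a StopJavaApp task. */
    static String submitStopJavaApp(ITaskManager manager, List<String> vmList, String logDir) {
        Map<String, Object> parameters = new HashMap<>();
        // The keys must match the names referenced by valueRef in the task XML;
        // "vmList" and "logDir" are made-up names for this sketch.
        parameters.put("vmList", vmList);
        parameters.put("logDir", logDir);
        return manager.submit("StopJavaApp", parameters);
    }
}
```

Keeping the map construction in one small method makes it easier to compare the keys against the task definition when either side changes.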

After a task is submitted, the returned UUIDs can be used to retrieve information about the execution progress of the task. To retrieve or display the execution progress or result of a task, developers need to learn the interfaces IExecution, IExecutionResult and ITaskManager.
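One way to use a returned UUID is to poll the execution until it leaves the RUNNING state. The sketch below assumes that ITaskManager can look up an IExecution by UUID and that IExecution reports a status. Both method names (getExecution, getStatus) are hypothetical, as is the Status enum, whose values are taken from the statuses shown on the tracing web page. IExecutionResult is not modelled here.

```java
// Everything in this class is a hypothetical sketch, not the library's API.
final class ProgressSketch {

    /** Status values as displayed on the tracing web page. */
    enum Status { RUNNING, SUCCESS, FAILURE, ERROR }

    /** Assumed shape of an execution handle. */
    interface IExecution {
        Status getStatus();
    }

    /** Assumed lookup of an execution by the UUID returned at submission. */
    interface ITaskManager {
        IExecution getExecution(String uuid);
    }

    /**
     * Polls the status at most maxPolls times, pausing between polls, and
     * returns the last status seen (still RUNNING if the task did not finish).
     */
    static Status awaitCompletion(ITaskManager manager, String uuid, int maxPolls, long pauseMillis) {
        Status status = manager.getExecution(uuid).getStatus();
        for (int poll = 1; poll < maxPolls && status == Status.RUNNING; poll++) {
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the interrupt visible to callers
                return status;
            }
            status = manager.getExecution(uuid).getStatus();
        }
        return status;
    }
}
```

Bounding the number of polls keeps the caller from waiting forever on a task that never completes. A caller that receives RUNNING can decide whether to keep waiting or to report a timeout.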

The following figures show how a complex task can be traced on a web page in a real experiment with a Chord-based distributed datastore. The task is called DynamicChordTest, and it triggers another task called StartChordApp.

Fig.4. Webpage for one instance of task DynamicChordTest

In Fig.4 above, the status (RUNNING, SUCCESS, FAILURE or ERROR), start time and duration of the task and of its sub-steps are presented in two tables. The layout is simple but clear enough to show the progress of the task.

Clicking the link for step CleanUpStatusDir in Fig.4 opens the detailed information for that step, shown in Fig.5 below. If the step has completed, this information includes the exit code and the logs (both standard output and standard error).

Fig.5. Webpage for step CleanUpStatusDir

Similarly, the web page shows detailed information about the subTask StartChordApp and the loop RunChordServerApps when the corresponding links in Fig.4 are clicked.

Fig.6. Webpage for one instance of task StartChordApp

Fig.7. Webpage for loop RunChordServerApps

Having read this far, you should understand what the task system does and how to use it for task definition, submission and tracing.