Task queues are widely used in software development. A task queue is a mechanism for distributing work across threads or machines, and for coordinating the workers that perform it. Typical scenarios include a web application that has to handle a large number of requests, a data pipeline that has to process a large amount of data, or a scheduled job that has to run periodically. I have used task queues in several languages. In this post I will recall my experience with task queues implemented in Python, Java, Node.js and Golang.

Python

I have used Celery for asynchronous tasks in a real-time web application built with FastAPI. It’s fairly easy to set up and use. I deployed Celery with Redis as both the broker and the result backend, since we already had Redis in our infrastructure. The task logic lives in the worker, which runs in a separate process or container. In the web request handler we call task.delay() to enqueue a task in Redis. The worker picks the task up from the Redis queue and executes it. The result is stored in Redis and can be retrieved by calling AsyncResult.get(). It looks simple and straightforward, but after working with other languages and frameworks I found it a bit heavyweight considering the tasks we were handling. In a relatively efficient language like Golang or Java, you would probably just start a goroutine or a thread to handle such a task.
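As a rough illustration of the "just start a goroutine" alternative, here is a minimal Go sketch. The names Result, processTask and Submit are hypothetical and exist only for this example. It also assumes the task can live and die with the web process: unlike a broker-based setup, nothing is persisted, so pending work is lost if the process exits.

```go
package main

import (
	"fmt"
	"strings"
)

// Result carries the outcome of a background task. It plays the role that
// the result backend plays in a broker-based setup.
type Result struct {
	Value string
	Err   error
}

// processTask is a hypothetical stand-in for the real task logic.
func processTask(input string) (string, error) {
	if input == "" {
		return "", fmt.Errorf("empty input")
	}
	return strings.ToUpper(input), nil
}

// Submit starts the task in its own goroutine and returns immediately.
// The channel is buffered so the goroutine can finish even if the caller
// never reads the result.
func Submit(input string) <-chan Result {
	out := make(chan Result, 1)
	go func() {
		v, err := processTask(input)
		out <- Result{Value: v, Err: err}
	}()
	return out
}

func main() {
	pending := Submit("resize avatar") // roughly comparable to task.delay()
	// ... the handler could do other work here ...
	res := <-pending // roughly comparable to AsyncResult.get()
	if res.Err != nil {
		fmt.Println("task failed:", res.Err)
		return
	}
	fmt.Println(res.Value)
}
```

The trade-off is that retries, persistence and distribution across machines, which a broker gives you, would have to be built by hand if they are needed.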

That said, Celery is also useful for scheduled or periodic jobs. Python is still the dominant language in data science and machine learning, which have plenty of use cases for scheduled jobs, so Celery remains quite popular in these areas.

Celery’s competitors include Airflow and RQ. Airflow is more of a workflow management system and is better suited to data pipelines. It is relatively complicated to set up and use, but its UI is quite handy for workflows with many steps. RQ is, as its name suggests, a Redis-based task queue. It is simple and easy to use. If you are already using Redis as your broker and backend, RQ should be a good choice.

Java

Everything about Java seems boring nowadays, but the truth is that boring stuff lasts longer. We use a framework called xxl-job, a distributed task scheduling framework that was open-sourced 6 years ago. xxl-job relies only on MySQL for task scheduling, so it’s quite simple to set up. It is used by numerous companies, including the one I work for. If you check its GitHub issues, there are more than 1k outstanding, and the author seldom replies to them. You might wonder whether it is a stable library, but in reality hundreds of companies use it in production. My company uses it too, on very critical tasks, and I know of some companies that run hundreds of worker nodes with this framework.

I guess this partly reflects why Java is still perhaps the most important language in the industry. A well-designed framework like xxl-job can last for years and years. I don’t believe Java is outdated, and I would still recommend it for serious enterprise software development.

Node.js

In my previous post I introduced BullMQ, a Redis-based queue for Node.js. I would say it’s a bit like RQ in Python: both rely on Redis as broker and backend. I run a scheduled job with BullMQ because the library I have to use is written in Node.js. It is quite easy to set up, with PM2 as the process manager starting both the main application and the worker. However, I found that in some cases the worker process would hang with CPU usage at 100%. I’m not sure whether that is an issue with BullMQ or with Node.js. As with Python, in terms of runtime efficiency I don’t think Node.js is the best choice for a task queue unless you have to use it for some reason.

Golang

Finally comes my recent favorite language for small projects: Golang. I used to think Golang was best for k8s-related tools or projects, but I found it can also be a good choice for task queues. The idea comes from my research on miniflux, a self-hosted RSS reader written in Golang. It can handle dozens, if not hundreds, of feed refreshes in a short time. If you check the source code, all it uses is a goroutine pool with Postgres as the database. The goroutine pool in miniflux looks like this:

package worker // import "miniflux.app/v2/internal/worker"

import (
	"miniflux.app/v2/internal/model"
	"miniflux.app/v2/internal/storage"
)

// Pool handles a pool of workers.
type Pool struct {
	queue chan model.Job
}

// Push send a list of jobs to the queue.
func (p *Pool) Push(jobs model.JobList) {
	for _, job := range jobs {
		p.queue <- job
	}
}

// NewPool creates a pool of background workers.
func NewPool(store *storage.Storage, nbWorkers int) *Pool {
	workerPool := &Pool{
		queue: make(chan model.Job),
	}

	for i := 0; i < nbWorkers; i++ {
		worker := &Worker{id: i, store: store}
		go worker.Run(workerPool.queue)
	}

	return workerPool
}

As you can see, the pool is a struct that holds a channel of jobs. Creating a new pool starts a number of workers that run in the background. When a new job comes in, a free worker picks it up and processes it. Perhaps thanks to the intuitive design of Golang, I found it elegant and easy to understand.

In terms of performance, you can check out the image below.

[Image: go vs node]

The second-to-last line is the miniflux service refreshing feeds every minute, and the last line is a Node service running BullMQ with only one task (not running at the moment). The red rectangle marks the current CPU and memory usage of the two. It’s a tenfold difference. I can see no reason to use Node.js instead of Go in such a scenario.

Wrap up

In this post I recalled my experience with task queues in different languages. If I may draw a conclusion from my limited experience: if you have to use Python or Node.js for async tasks, such as machine learning/AI with Python or Chrome driver related ones with JS, go with a broker-based framework like Celery, RQ or BullMQ. It helps strike an acceptable balance between convenience and efficiency. For general-purpose tasks that need both efficiency and easy maintenance, Java is always a safe choice. If you want a delightful coding experience without sacrificing performance, just try Golang with goroutines.