Overview
The taskqueue package uses PostgreSQL to manage tasks,
projects, and workers. This vignette shows how to install and configure
PostgreSQL on Ubuntu for HPC environments.
PostgreSQL should be installed on a server that all worker nodes can
access.
Why PostgreSQL?
PostgreSQL is chosen for taskqueue because:
-
Concurrent Access: Handles large numbers of
concurrent requests from multiple workers
-
ACID Compliance: Ensures data integrity for task
status updates
-
Reliability: Proven track record for production
workloads
-
Performance: Efficient handling of read/write
operations
-
Open Source: Free and widely supported
Installation on Ubuntu
# Update and install PostgreSQL
sudo apt update
sudo apt install postgresql postgresql-contrib
# Start PostgreSQL
sudo systemctl start postgresql
sudo systemctl enable postgresql
Create Database and User
# Switch to postgres user and create database
sudo -u postgres psql
# Run these commands in the PostgreSQL prompt:
CREATE USER taskqueue_user WITH PASSWORD 'your_password';
CREATE DATABASE taskqueue_db OWNER taskqueue_user;
GRANT ALL PRIVILEGES ON DATABASE taskqueue_db TO taskqueue_user;
\q
Allow worker nodes to connect to PostgreSQL.
1. Edit postgresql.conf
# Edit configuration file (adjust version number if needed)
sudo nano /etc/postgresql/14/main/postgresql.conf
# Find and change:
listen_addresses = '*'
2. Edit pg_hba.conf
# Edit authentication file
sudo nano /etc/postgresql/14/main/pg_hba.conf
# Add this line:
# For specific HPC network (recommended):
host taskqueue_db taskqueue_user 10.0.0.0/8 md5
# Or for all IPs (less secure):
host taskqueue_db taskqueue_user 0.0.0.0/0 md5
3. Restart PostgreSQL
sudo systemctl restart postgresql
4. Open Firewall if Needed
# Allow PostgreSQL port from HPC network
sudo ufw allow from 10.0.0.0/8 to any port 5432
# Or allow from all IPs (less secure):
sudo ufw allow 5432/tcp
R Configuration
On all machines (daily working machines, login nodes and compute
nodes), add these environment variables to ~/.Renviron:
PGHOST=your.database.server.com
PGPORT=5432
PGUSER=taskqueue_user
PGPASSWORD=your_password
PGDATABASE=taskqueue_db
Edit .Renviron:
nano ~/.Renviron
# Add the variables above, then save and exit
Restart R after editing .Renviron.
Install R Packages
# or install the latest development version from GitHub
remotes::install_github("byzheng/taskqueue")
Security Notes
- Use strong passwords
- Restrict IP ranges in
pg_hba.conf to your HPC network
only
- Protect
.Renviron:
chmod 600 ~/.Renviron
Clean Database
If needed, remove all taskqueue data:
The package will recreate tables automatically when needed.