Getting Started

Floodwater uses Anaconda as its environment manager. This allows multiple Floodwater installations to coexist on one system without interfering with each other.

Installation

Floodwater is installed with a single Python script that manages the creation of the Anaconda environment. The script is called install.py and is located in the root directory of the repository.

Pre-requisites

The following are pre-requisites for installing Floodwater:

A system with a Linux distribution (e.g. Ubuntu, Debian, Rocky, etc.)
Anaconda (https://www.anaconda.com/distribution/)
Git (https://git-scm.com/downloads)
A job manager (e.g. SLURM, PBS, etc.)
Access to a MetGet instance.

Note

If you do not have access to a MetGet instance, you can contact zcobell@thewaterinstitute.org or begin running your own instance by following the instructions on the metget-server GitHub page.

If your system is based on ARM processors, you will need to build the ecFlow package from source. This is recommended only for experienced users. We have shown that the system works well on both ARM and x86_64 architectures; however, ECMWF only provides pre-built binaries for x86_64.
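The command-line prerequisites listed above can be checked before starting the installation. The following is a minimal sketch, not part of Floodwater: the function name missing_tools is hypothetical, and it assumes that SLURM provides the sbatch command and PBS provides the qsub command on your system.

```shell
# Print each command name that is not found on PATH, one per line.
# Returns 0 only if every name was found.
missing_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "$tool"
            status=1
        fi
    done
    return "$status"
}

# conda and git are required by the installation steps.
if missing=$(missing_tools conda git); then
    echo "conda and git found"
else
    echo "missing: $missing"
fi

# Either scheduler is acceptable (assumed submit commands: sbatch, qsub).
missing_tools sbatch >/dev/null || missing_tools qsub >/dev/null ||
    echo "no SLURM or PBS submit command found"
```

Access to a MetGet instance cannot be verified this way and still needs to be confirmed separately.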

Installation Steps

The first step to installing Floodwater is to clone the repository:

git clone https://github.com/waterinstitute/floodwater.git

This will create a directory called floodwater in the current working directory. You should then change into the floodwater directory:

cd floodwater

Run the installation using the install.py script. The script allows you to enable or disable certain features of Floodwater. For example, you may want only the Floodwater code installed on your local system so that you can run the GUI.

Full Installation

python install.py --conda-name my-floodwater-env

Warning

For all the good things about Anaconda, the default solver is extraordinarily slow. We highly recommend using the Mamba solver. This can be done by following the instructions here.
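As a sketch of one way to switch solvers, the conda documentation describes enabling the libmamba solver with the two commands below. This is not necessarily the procedure the warning above links to, and recent conda releases may already use libmamba by default, in which case no change is needed. The function name enable_libmamba is hypothetical; the commands are wrapped in a function so that nothing runs until you call it.

```shell
# Install the libmamba solver plugin into the base environment,
# then make it the default solver for all environments.
enable_libmamba() {
    conda install -n base conda-libmamba-solver &&
    conda config --set solver libmamba
}
```

Call enable_libmamba once before running install.py if your conda installation still uses the classic solver.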

Partial Installation

A common use case is to install the full package on a remote system and only the essential packages on the local system. The local system can then access the remote system over SSH, which gives a much more responsive experience.

python install.py --conda-name my-floodwater-env --minimal

Warning

It is also possible to forward an X11 connection over SSH so that the GUI can be used remotely. However, this will be slow, particularly if the remote system is not on the same network.

Note

Developers will need to re-install the Floodwater package (not the full Anaconda environment) if they make changes to the code. This can be done by running pip install . from the root directory of the repository. This helps to maintain environment isolation during development and allows the user to maintain multiple active Floodwater environments.

Checking the Installation

To confirm that the installation was successful, activate the newly created conda environment:

conda activate my-floodwater-env

You can then check that the Floodwater package is installed by running:

floodwater --help

ecFlow Server Setup

Systems based on the ecFlow workflow manager require that a server daemon is running. The server is responsible for managing the workflows and the jobs that are submitted. Note that many of the jobs within the system which are not submitted to the job scheduler (e.g. SLURM or PBS) are run directly on the machine which is running the daemon. This means that users should be considerate on shared systems and work with system administrators in the event that there are too many concurrent processes on the system. In the past, we have worked with system administrators on large NSF-funded resources to provide a virtual machine which is dedicated to running the server daemon.

Environment Variables

To avoid manually specifying the server host and port number during each Floodwater command, we recommend setting the following environment variables:

export ECF_HOST=hostname
export ECF_PORT=port_number

The ECF_HOST variable should be set to the hostname of the system which is running the server daemon. The ECF_PORT variable should be set to the port number on which the server listens. Note that the ECF_HOST variable should be a fully qualified domain name (FQDN) which can be resolved by the client system.
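A quick sanity check of these two variables can catch typos before any Floodwater command is run. The sketch below is illustrative only and is not part of Floodwater; the function name check_ecf_env is hypothetical, and it checks only that the host is non-empty and the port is a valid TCP port number, not that the host actually resolves.

```shell
# Return 0 if ECF_HOST is non-empty and ECF_PORT is a valid TCP port number.
check_ecf_env() {
    if [ -z "${ECF_HOST:-}" ]; then
        echo "ECF_HOST is not set" >&2
        return 1
    fi
    # Reject empty values, non-digits, and anything longer than 5 digits.
    case "${ECF_PORT:-}" in
        ''|*[!0-9]*|??????*)
            echo "ECF_PORT must be a number between 1 and 65535" >&2
            return 1
            ;;
    esac
    if [ "$ECF_PORT" -lt 1 ] || [ "$ECF_PORT" -gt 65535 ]; then
        echo "ECF_PORT must be a number between 1 and 65535" >&2
        return 1
    fi
    echo "ecFlow server: ${ECF_HOST}:${ECF_PORT}"
}

# Example values only; run in a subshell so your own settings are untouched.
( ECF_HOST=myhost.local; ECF_PORT=3121; check_ecf_env )
```

Placing the two export lines in your shell startup file, followed by a call to a check like this, is one way to keep the settings consistent across sessions.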

Starting the Server

The server can be started by running the following command:

floodwater server start [--port xxxx]

The port number is optional. If it is not supplied on the command line, the system will search for an environment variable named ECF_PORT. If this variable is not set, you will receive an error.
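The lookup order described above (command-line value first, then ECF_PORT, otherwise an error) can be sketched as a small shell function. This only illustrates the documented behavior and is not Floodwater's actual implementation; the function name resolve_port is hypothetical.

```shell
# Print the port to use: the explicit argument if given, otherwise ECF_PORT.
# Fails with an error message when neither is available.
resolve_port() {
    if [ -n "${1:-}" ]; then
        echo "$1"
    elif [ -n "${ECF_PORT:-}" ]; then
        echo "$ECF_PORT"
    else
        echo "error: no port given and ECF_PORT is not set" >&2
        return 1
    fi
}
```

In a wrapper script, the result could be passed on explicitly, for example with port=$(resolve_port "$1") followed by floodwater server start --port "$port".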

You can check that the server has become active by running:

floodwater ping

This will ping the server and return the time it took to respond. This command uses the environment variables above to establish the connection to the server.

The system will return something that looks like the following:

2023-10-13T08:56:07CDT :: INFO :: floodwater_cli_funcs.py :: floodwater_ping_ecflow :: Success! Ping ecFlow server (myhost.local:3121) succeeded in ~4 ms
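In scripts, it can be useful to wait until the server answers before loading any jobs. The following is a generic retry helper, offered as a sketch: the function name wait_for is hypothetical, and using it with floodwater ping assumes that the command exits with a non-zero status when the server cannot be reached, which this guide does not state.

```shell
# Run a command repeatedly until it succeeds or the attempts run out.
# Usage: wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_for() {
    attempts=$1
    delay=$2
    shift 2
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Discard the command's output; only its exit status matters here.
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        # Sleep only if another attempt will follow.
        if [ "$i" -le "$attempts" ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Intended use (assumption: ping fails with a non-zero exit status):
#   wait_for 10 2 floodwater ping || echo "server did not respond"
```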

Stopping the Server

The server can be stopped by running the following command:

floodwater server stop

Running Your First Job

The system comes packaged with a simple example which runs the ec95d ADCIRC model. This is a very coarse model and should only be used for testing.

The example files are contained within: floodwater/examples/ec95d

You will need to make minor modifications to these files; however, they are designed to be as generic as possible for the purposes of getting started.

Make sure an ecFlow server instance is started on your system. Then, the first step is to copy the example files to a new directory:

mkdir my_first_job
cp -r floodwater/examples/ec95d/* my_first_job

Then, modify the yaml and .h files as necessary to fit your system's specifications.

Then, to load the job, you can run:

floodwater load adcirc_gfs.yaml

This will load the job into the server. You can check that the job has been loaded by running:

floodwater status

Then, to start the job, you can run:

floodwater start adcirc_gfs

Note that you can load and start the job in a single command by running:

floodwater run adcirc_gfs.yaml

The job will appear in the GUI and you can monitor the progress of the job using the GUI or the status command on the command line.

Tips and Tricks

The ecFlow client can show running instances from a remote system. The best performance is usually achieved by running the GUI on your local machine and allowing the client to connect to the remote server over SSH. This is only possible for Mac and Linux systems. For Windows, you will need to use X11 forwarding to view the GUI. This can be done using MobaXterm or PuTTY coupled with an X11 server such as Xming. Note that you will need to be on the same network (virtual or physical) as the remote system, which may require running a VPN client.

To use a local ecFlow GUI, which is optional, you can install the Anaconda package for Floodwater and use the command:

floodwater gui

Note

The GUI is not required to run Floodwater. It is only used for monitoring the status of the jobs.

Sometimes, it is necessary to connect to a machine which is behind a load balancer or distribution framework of some kind. This is often the case with HPC systems which have offered to run a virtual machine to manage our ecFlow server instances. The login node of the HPC system will be accessible from the outside world, but the virtual machine will only be accessible from the login node. In this situation, SSH can be used to forward the connection to the virtual machine.

To open an SSH tunnel to the virtual machine, you can run the following command:

ssh -J username@login.hpc.edu username@vmname -C -N -L 4121:vmname:3121

This forwards port 4121 on your local machine to port 3121 on the virtual machine, using the login node as a jump host. You can then use the GUI to connect to the server by setting the host to localhost and the port to 4121. Note that by choosing a different local port for each tunnel, in this case 4121, you can connect to multiple servers at the same time.
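When several servers are tunnelled at once, it can help to generate the commands from one template so that each server gets its own local port. The sketch below only prints the commands and does not open any connection; the function name tunnel_cmd and the second virtual machine name othervm are hypothetical.

```shell
# Print the ssh command that forwards LOCAL_PORT on this machine to
# REMOTE_PORT on VM, jumping through LOGIN.
# Usage: tunnel_cmd USER LOGIN VM LOCAL_PORT REMOTE_PORT
tunnel_cmd() {
    echo "ssh -J $1@$2 $1@$3 -C -N -L $4:$3:$5"
}

# Two servers on the same remote port, each mapped to its own local port.
tunnel_cmd username login.hpc.edu vmname 4121 3121
tunnel_cmd username login.hpc.edu othervm 4122 3121
```

The first line printed matches the command shown above; the GUI would then reach the two servers at localhost:4121 and localhost:4122.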

Similar techniques can be used for servers that require an SSH key to connect. You can similarly open an SSH tunnel to the server and then connect to the server using the GUI:

ssh -i /path/to/key.pem -N -L 3122:login.hpc.edu:3121 username@login.hpc.edu