Storage

User Guide for the BeeGFS-Filesystem

The BeeGFS-Filesystem is mounted at /work. Users store data on it in so-called workspaces. Each workspace belongs to one user and has an expiration date that lies at most 100 days in the future. Once the expiration date is reached, the workspace and all its data are deleted; the admins can still recover it for 28 days after deletion. This keeps unused data from filling up the storage space. Starting 30 days before the expiration date, notifications are sent to the user weekly, and daily during the last 7 days, so that the user can extend the workspace in time. The expiration date can be extended up to 99 times, so data can be kept in the same workspace for a very long time.

The official documentation of the workspace tool can be found here: https://github.com/holgerBerger/hpc-workspace/blob/master/user-guide.md and is summarized below.

Creating a new workspace

A new workspace can be created with

[UID@ui ~]$ ws_allocate <ws-name> <days> -m <email>

where <ws-name> is the name of the new workspace, <days> is the number of days until the expiration date, and <email> is the address to which notifications about workspaces that are about to expire are sent. The address must belong to the University of Freiburg, i.e. it must end with "@physik.uni-freiburg.de" or, for Master and Bachelor students, "@<planet>.uni-freiburg.de". The maximum value for <days> is 100; the default is 1. After the workspace has been created, some information about it is printed:

[UID@ui ~]$ ws_allocate <ws-name> 100 -m <my-mail>
Info: creating workspace.
/work/ws/atlas/<UID>-<ws-name>
remaining extensions  : 99
remaining time in days: 100

You can now access the workspace at /work/ws/atlas/<UID>-<ws-name>

If you don't want to specify the email for notifications every time with the -m flag, you can also create a file at ~/.ws_user.conf with the following content:

mail: <my-mail>
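
The file can also be written from the command line. The following is a minimal sketch, not part of the workspace tools: write_ws_user_conf is a hypothetical helper name, and the address check only mirrors the rule stated above (an address ending in a subdomain of uni-freiburg.de).

```shell
# Sketch: store the notification address in the workspace config file.
# write_ws_user_conf is a hypothetical helper, not part of the workspace tools.
write_ws_user_conf() {
    local email="$1"
    local conf="${2:-$HOME/.ws_user.conf}"   # default location named in this guide

    # Accept only addresses of the form <name>@<subdomain>.uni-freiburg.de
    case "$email" in
        *@*.uni-freiburg.de) ;;
        *) echo "error: '$email' does not end with .uni-freiburg.de" >&2
           return 1 ;;
    esac

    # The file consists of a single "mail: <address>" line.
    printf 'mail: %s\n' "$email" > "$conf"
}
```

Calling write_ws_user_conf <my-mail> overwrites an existing ~/.ws_user.conf, so check the file first if it already holds other settings.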

Listing all workspaces

You can list all your workspaces with

[UID@ui ~]$ ws_list

Extending the expiration date

The expiration date can be extended 99 times by a maximum of 100 days per extension.
To extend the workspace, use the following command:

[UID@ui ~]$ ws_extend <ws-name> <days>

Since the remaining time of a workspace can never exceed 100 days, extending it several times in a row does not add up. The workspace therefore has to be extended again at least every 100 days. E-mail notifications about an expiring workspace are sent to the address given with the -m flag or in ~/.ws_user.conf.
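
To plan the next extension, it can help to know the date on which a freshly extended workspace will expire. The sketch below is an illustration only: next_expiration is a hypothetical helper, it relies on GNU date (the -d option), and it caps the request at the 100-day limit described above.

```shell
# Sketch: date on which a workspace extended for <days> days will expire.
# next_expiration is a hypothetical helper; it requires GNU date.
next_expiration() {
    local days="${1:-100}"     # 100 is the maximum remaining time
    local from="${2:-today}"   # optional start date, e.g. 2024-01-01

    if [ "$days" -gt 100 ]; then
        days=100               # more than 100 days cannot be requested
    fi
    date -d "$from + $days days" +%F
}
```

For example, next_expiration 100 prints the date 100 days from today in YYYY-MM-DD format; the workspace has to be extended again before that date.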

Deleting a workspace

A workspace can be deleted with

[UID@ui ~]$ ws_release <ws-name>

If you are sure that you don't need the data stored in the workspace, you can delete data beforehand with

[UID@ui ~]$ rm -r /work/ws/atlas/<UID>-<ws-name>/*

Otherwise, the admins can recover the deleted data within the next four weeks, but during that time the data still counts against your quota.

Automatic deletion of a workspace

When a workspace has 30 days or less remaining, notifications are sent to the user (weekly at first, daily during the last 7 days) until the workspace is extended. If the user takes no action, the workspace is deleted automatically. During a grace period of four weeks the admins can still recover the data. After that, the data is lost permanently.
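
The notification schedule can be summarized as a small function. This is a sketch of the rules described above, not code from the workspace tools; notification_phase is a hypothetical name, and treating exactly 0 remaining days as expired is an assumption.

```shell
# Sketch: map the remaining days of a workspace to its notification phase.
# notification_phase is a hypothetical helper.
notification_phase() {
    local remaining="$1"       # remaining days until the expiration date

    if [ "$remaining" -le 0 ]; then
        echo "expired"         # assumption: 0 days left counts as expired
    elif [ "$remaining" -le 7 ]; then
        echo "daily"           # last 7 days: one notification per day
    elif [ "$remaining" -le 30 ]; then
        echo "weekly"          # 30 days or less: one notification per week
    else
        echo "none"            # more than 30 days: no notifications yet
    fi
}
```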

Sharing data with other users

Data in workspaces can be shared with colleagues. The recommended way to do this is with ACLs (Access Control Lists).

Best practices with respect to ACL usage:

Take into account that ACLs take precedence over standard Unix access rights
Use a single set of rules at the level of a workspace
Make the entire workspace either read-only or read-write for individual co-workers
Optional: Make the entire workspace read-only for your group, e.g. for large input data
If a more granular set of rules is necessary, consider using additional workspaces
The owner of a workspace is responsible for its content and management

Please note that ls (list directory contents) shows ACLs on directories and files only when run as ls -l (long format), as a "plus" sign after the standard Unix access rights.

Examples with regard to "my_workspace":

Command	Action
`getfacl $(ws_find my_workspace)`	List access rights on the workspace named "my_workspace"
`setfacl -Rm u:xy1001:rX,d:u:xy1001:rX $(ws_find my_workspace)`	Grant user "xy1001" read-only access to the workspace named "my_workspace"
`setfacl -Rm u:xy1001:rwX,d:u:xy1001:rwX $(ws_find my_workspace)`	Grant user "xy1001" read and write access to the workspace named "my_workspace"
`setfacl -Rm g:atlXXX:rX,d:g:atlXXX:rX $(ws_find my_workspace)`	Grant group "atlXXX" read-only access to the workspace named "my_workspace"
`setfacl -Rb $(ws_find my_workspace)`	Remove all ACL rights. Standard Unix access rights apply again.
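
The rule strings in the table all follow the same pattern: an access entry followed by a matching default entry (prefix d:), which new files and directories inside the workspace inherit. The sketch below builds such a string; acl_spec is a hypothetical helper, and the mode names ro and rw are chosen here for illustration.

```shell
# Sketch: build the ACL rule string used with "setfacl -Rm" in the table above.
# acl_spec is a hypothetical helper; "ro" and "rw" are illustrative mode names.
acl_spec() {
    local kind="$1"   # u (user) or g (group)
    local name="$2"   # account or group name, e.g. xy1001
    local mode="$3"   # ro (read-only) or rw (read and write)
    local perms

    case "$kind" in
        u|g) ;;
        *) echo "error: kind must be u or g" >&2; return 1 ;;
    esac
    case "$mode" in
        ro) perms="rX" ;;
        rw) perms="rwX" ;;
        *)  echo "error: mode must be ro or rw" >&2; return 1 ;;
    esac

    # Access entry plus default entry (d:) for newly created content.
    printf '%s:%s:%s,d:%s:%s:%s\n' "$kind" "$name" "$perms" "$kind" "$name" "$perms"
}
```

It could be used as setfacl -Rm "$(acl_spec u xy1001 ro)" "$(ws_find my_workspace)", which is equivalent to the read-only example in the table.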

You can also use the ACL to make your /work/ws/atlas workspace readable to user accounts on NEMO. For this, you need to find out the user ID of your NEMO user account and share the workspace with that user.

# 1. Find out your NEMO uid
# do this on NEMO
[fr_<username>@login1 ~]$ getent passwd $USER
fr_<username>:*:<nemo_uid>:<nemo_gid>:<username>@uni-freiburg.de:/home/...

# 2. set ACL to allow RW access to your /work/ws/atlas/... directory
# do this on the BFG
[<username>@ui2 ~]$ setfacl -Rm u:<nemo_uid>:rwX,d:u:<nemo_uid>:rwX $(ws_find <your_atlas_beegfs_work_share>)

Retrieving your current usage

You can obtain your current usage with

beegfs-ctl --getquota --uid <uid>

LOCALGROUPDISK

For large data sets and long-term archiving of valuable data, ATLAS users should use the UNI-FREIBURG_LOCALGROUPDISK, which is part of the dCache storage at the ATLAS-BFG.

Storing data on the LOCALGROUPDISK

The easiest way to replicate (copy) an existing dataset to the LOCALGROUPDISK is using the Rucio Web Interface at https://rucio-ui.cern.ch/.

To upload new files to the LOCALGROUPDISK, follow the instructions at CERN TWiki: Getting Datasets.

Checking your current usage

Use the following command to check how much space you use on the LOCALGROUPDISK:

rucio list-account-usage <cern-username> --rse UNI-FREIBURG_LOCALGROUPDISK

Deleting data from the LOCALGROUPDISK

You can list all datasets that are stored on the LOCALGROUPDISK with the following command:

rucio list-rules --account <cern-username> | grep UNI-FREIBURG_LOCALGROUPDISK > usage.txt

This creates a file "usage.txt" that contains all rules. To delete datasets from the LOCALGROUPDISK you need the rule ID, which is the 32-digit hexadecimal number in the first column. If you have many datasets stored, you may want to filter the "usage.txt" file generated in the first step. This can be done, for example, with

grep <your-expression> usage.txt | cut -d ' ' -f 1 > rucio_rules_final.txt

First, the rules matching "<your-expression>" are selected. Then only the rule IDs of these rules are extracted and written to "rucio_rules_final.txt".
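
Because every line of the resulting file is later passed to a delete command, it can be worth checking that each entry really looks like a rule ID. The following sketch is one possible variant of the filter step: extract_rule_ids is a hypothetical helper that reads the rule list on standard input and keeps only first-column values consisting of exactly 32 lowercase hexadecimal characters.

```shell
# Sketch: select rules by expression and keep only well-formed rule IDs.
# extract_rule_ids is a hypothetical helper; it reads the rule list on stdin.
extract_rule_ids() {
    local expression="$1"

    grep -- "$expression" \
        | awk '{ print $1 }' \
        | grep -E '^[0-9a-f]{32}$'   # drop anything that is not a 32-digit hex ID
}
```

It could be called as extract_rule_ids <your-expression> < usage.txt > rucio_rules_final.txt. Using awk instead of cut makes the first column independent of the number of spaces between columns.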

Now you can start deleting the rules with:

for i in $(sort rucio_rules_final.txt); do echo "$i"; rucio delete-rule "$i"; done

Calling "rucio delete-rule" essentially sets the expiration date of the rule to now + 1 hour. After this time the datasets are deleted from the LOCALGROUPDISK. If the datasets are available on another storage endpoint, they can still be accessed from there. However, if this was the only replica of a dataset, the dataset is lost irrecoverably.
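
Since deleting the last replica cannot be undone, a dry run before the real deletion is a cheap safeguard. The sketch below is an illustration under assumptions: delete_rules and the DRY_RUN variable are hypothetical names, and only the "rucio delete-rule" command itself comes from this guide. By default the function only prints the commands it would run.

```shell
# Sketch: delete the rules listed in a file, with a dry-run mode as default.
# delete_rules and DRY_RUN are hypothetical names chosen for this example.
delete_rules() {
    local file="$1"
    local rule

    # sort -u removes duplicate rule IDs before processing.
    sort -u "$file" | while read -r rule; do
        [ -n "$rule" ] || continue                 # skip empty lines
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "rucio delete-rule $rule"         # print only, delete nothing
        else
            rucio delete-rule "$rule" < /dev/null  # keep rucio from reading the ID list
        fi
    done
}
```

Running delete_rules rucio_rules_final.txt shows what would be deleted; DRY_RUN=0 delete_rules rucio_rules_final.txt performs the deletion.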

You can find more information about Rucio in the ATLAS documentation at https://twiki.cern.ch/twiki/bin/viewauth/AtlasComputing/SoftwareTutorialGettingDatasets.