Synchronizing local and remote directories
As deep learning models have grown, the compute needed to train them has grown rapidly too. Most ML developers use remote systems to build, test, and deploy these large models, and building and debugging remotely can be time-consuming. There are alternatives, such as remote development in VSCode, or making changes locally and deploying them by hand every so often to test and run your models.
When I was using PyCharm, it had an incredible feature where any change on the local system was automatically reflected on the remote server. It also let me set a remote interpreter as the default, so every run executed on the remote system. I loved this: I no longer had to run custom models on my own machine. There was just one problem: PyCharm is not a lightweight IDE. I switched to VSCode and Sublime, but neither had this feature.
In this article, I describe a bash function that automatically deploys changes in your local directory to a remote server, and it doesn't need any fancy IDE.
Prerequisites
- Bash shell
- rsync on the remote server (and on the local system, since rsync runs on both ends)
- inotify-tools on the local system
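Before going further, it can help to confirm the tools are actually installed. The snippet below is a minimal sketch, not part of the original setup: missing_tools is a hypothetical helper name, and it only checks whether each command can be found on your PATH.

```bash
# Hypothetical helper: prints every command name from its arguments
# that cannot be found on PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Local machine: rsync does the copying, inotifywait (shipped with
# inotify-tools) watches the directory for changes.
missing=$(missing_tools rsync inotifywait)
if [ -n "$missing" ]; then
    echo "Missing locally:" $missing
fi
```

Once SSH access to the server is working, the same `command -v rsync` check can be run there over ssh.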
1. Set up your ~/.ssh/config file
Host aws_gpu
    User random
    IdentityFile /path/to/your/file/ec2_instance.pem
    LocalForward 8888 localhost:8888
    HostName ec1-23-456-789.eu-west-2.compute.amazonaws.com
2. Create the remote_sync.sh file
You can give any name to the file.
This file will have our bash function.
# Make sure rsync is installed on the remote server
# and inotify-tools (inotifywait) on the local system.
syncRemote() {
    clear_vars() {
        unset SYNC_DESTINATION SYNC_SOURCE SYNC_IGNORE TEMP_VAR
    }
    usage() {
        echo 'usage: syncRemote --src /your/path --dst remote:/your/dest/path --ignore "*.pt" --max-size 200m'
    }
    clear_vars
    sync_MAX_SIZE=200m
    # $(command) is command substitution: it captures the output of
    # command so it can be assigned to a variable. Backticks do the same.
    TEMP_VAR=$(getopt -o d:s: --long dst:,src:,max-size:,ignore: -- "$@") || { usage; return 1; }
    eval set -- "$TEMP_VAR"
    while true; do
        case "$1" in
            -s|--src)
                case "$2" in
                    ""|"--"*) ;;   # no usable value: fall back to $(pwd) below
                    *) SYNC_SOURCE=$2 ;;
                esac
                shift 2 ;;
            -d|--dst)
                case "$2" in
                    ""|"--"*)
                        echo "Destination cannot be empty"
                        usage
                        clear_vars
                        return 1 ;;
                    *) SYNC_DESTINATION=$2 ;;
                esac
                shift 2 ;;
            --max-size)
                case "$2" in
                    ""|"--"*) echo "Using default max-size: $sync_MAX_SIZE" ;;
                    *) sync_MAX_SIZE=$2 ;;
                esac
                shift 2 ;;
            --ignore)
                # Forwarded to rsync as an --exclude pattern
                SYNC_IGNORE=$2
                shift 2 ;;
            --) shift; break ;;
            *)
                echo "Unknown options: $*"
                usage
                return 1 ;;
        esac
    done
    if [ -z "$SYNC_DESTINATION" ]; then
        echo "Destination cannot be empty!!"
        usage
        clear_vars
        return 1
    fi
    if [ -z "$SYNC_SOURCE" ]; then
        echo "Source is empty, using $(pwd) as source."
        SYNC_SOURCE=$(pwd)
    fi
    echo 'Watches established in folder -->' "$SYNC_SOURCE"
    echo "Maximum size : $sync_MAX_SIZE"
    # rsync options must come before "--", which ends option parsing
    while inotifywait -r -e modify,create,delete,move "$SYNC_SOURCE"; do
        rsync -avz --max-size="$sync_MAX_SIZE" ${SYNC_IGNORE:+--exclude="$SYNC_IGNORE"} -- "$SYNC_SOURCE/" "$SYNC_DESTINATION"
    done
}
3. Source our file
source /path/to/our/file/remote_sync.sh
# or, to load the function in every new shell without sourcing it by hand:
echo "source /path/to/remote_sync.sh" >> ~/.bashrc
# Be extra careful here: >> appends to the file, > overwrites it.
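Running the echo line above more than once appends a duplicate entry to ~/.bashrc each time. A minimal sketch of a safer variant follows; append_once is a hypothetical helper name, and it simply skips the append when the exact line is already present.

```bash
# Hypothetical helper: append a line to a file only if that exact line
# is not already in it. The ( ) body runs in a subshell, so the
# variables below do not leak into your interactive shell.
append_once() (
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
)

# Example (paths are placeholders, as in the rest of the article):
# append_once "source /path/to/remote_sync.sh" ~/.bashrc
```

Because the helper always uses >>, it cannot overwrite the file by accident.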
Demo
Sample Usage
syncRemote --src /your/source/directory --dst aws_gpu:/your/destination/directory
# Replace aws_gpu and the source and destination directories with your own.
Limitations
- There's no proper way to stop the process, so I use Ctrl+C to kill the foreground process. It is advisable not to use nohup with this command because it could be painful to kill the process later.
- Old files on the remote system persist even if the original file is renamed or deleted in the local directory.
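The stale-file limitation can be softened with rsync's --delete option, which removes files on the destination that no longer exist in the source. The sketch below is an assumption about how that could be wired in, not part of the original function: build_sync_cmd is a hypothetical helper that only prints the rsync command, one argument per line, so you can inspect it before running anything.

```bash
# Hypothetical helper: print the rsync command line, one argument per line.
# Arguments: source dir, destination, max size (default 200m),
# mirror flag (1 adds --delete; default 0).
build_sync_cmd() (
    src=$1
    dst=$2
    max_size=${3:-200m}
    mirror=${4:-0}
    set -- rsync -avz --max-size="$max_size"
    if [ "$mirror" = 1 ]; then
        # Remove destination files that are gone from the source
        set -- "$@" --delete
    fi
    # A single trailing slash on the source copies its contents,
    # not the directory itself
    set -- "$@" -- "${src%/}/" "$dst"
    printf '%s\n' "$@"
)

# Inspect the command that would run in mirror mode:
build_sync_cmd /your/source/directory aws_gpu:/your/destination/directory 200m 1
```

Since --delete removes remote files, double-check the destination path first; adding rsync's -n (dry run) option is a cheap way to preview what would change.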