Data Handling in AWS
Computer vision engineers and data scientists work with considerable computational resources: large files that demand a lot of disk space and models that demand powerful GPUs. When we work on AI projects, we often offload compute and data to cloud resources such as AWS.
A crucial part of working with AWS is uploading and managing the data on the instance. Most of the time, the instance's own disk does not have enough capacity. To handle large amounts of data we can use Elastic Block Store (EBS). An EBS volume is an independent drive that can be attached to the instance.
In Part 2 of our five-part series, we walked you through the steps to establish a graphical interface and remote desktop connection while using AWS.
In this post, we will create an EBS volume, attach it to the instance, and upload data to it. AWS offers other storage solutions such as S3, but since we are going to use this data for training deep-learning models, we need quick access to it, and EBS is the better option.
Part 3: Data Handling in AWS
Step 1. Under the menu item ‘Elastic Block Store’ choose ‘Volumes’ and click ‘Create Volume.’
Step 2. Choose the storage size and the same availability zone as the instance you want to attach the volume to. Click ‘Create Volume.’
Step 3. Return to the list of volumes, highlight the new volume, open the drop-down menu ‘Actions,’ and select ‘Attach volume.’ Then choose the instance to which you want to attach the volume.
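If you prefer the command line, Steps 1-3 can also be done with the AWS CLI (a minimal sketch, assuming the CLI is installed and configured; the availability zone, size, volume ID, instance ID, and device name below are placeholders to replace with your own values):

# Create a 100 GiB volume in the instance's availability zone.
aws ec2 create-volume --availability-zone us-east-1a --size 100 --volume-type gp3
# Attach it to the instance, using the VolumeId returned by the previous command.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 --instance-id i-0123456789abcdef0 --device /dev/sdf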
AWS runs a volume-status check every five minutes. While a check is in progress you might see an "insufficient data" notification under the volume status. Once the check completes, the volume status should return to "okay." If it does not, you might need to stop and restart the instance the volume is attached to.
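You can also query the status checks from the command line (again assuming a configured AWS CLI; the volume ID is a placeholder):

aws ec2 describe-volume-status --volume-ids vol-0123456789abcdef0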
Step 4. Connect to the instance via SSH and list the available block devices:
lsblk
Among the list of devices, you should also see the EBS volume.
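The output will look roughly like the following (sizes and names here are illustrative; on newer Nitro-based instances the EBS volume appears as an NVMe device such as /dev/nvme1n1 rather than /dev/xvdf):

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0    8G  0 disk
└─xvda1 202:1    0    8G  0 part /
xvdf    202:80   0  100G  0 disk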
1. Create a file system for the volume.
sudo mkfs -t xfs /dev/xvdf
2. Create a folder to use as a mount point.
sudo mkdir /aws_example_data
3. Mount the EBS volume to that folder.
sudo mount /dev/xvdf /aws_example_data
4. Add write permissions to the folder.
sudo chmod ugo+w /aws_example_data/
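Two optional checks are worth adding around these steps (a sketch, assuming the same device and mount point as above; the UUID is a placeholder, use the one blkid prints for your volume). Before formatting, confirm the device holds no existing file system; after mounting, verify the mount and make it survive reboots:

# 'data' in the output means the device is empty and safe to format.
sudo file -s /dev/xvdf
# Verify the volume is mounted where expected.
df -h /aws_example_data
# Find the volume's UUID ...
sudo blkid /dev/xvdf
# ... and append an fstab entry; 'nofail' lets the instance boot even if the volume is detached.
echo 'UUID=aa11bb22-cc33-dd44-ee55-ff6677889900  /aws_example_data  xfs  defaults,nofail  0  2' | sudo tee -a /etc/fstab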
Step 5. Copy data from your local machine (Linux only). On your local machine, type:
chmod 400 path_to_private_key
rsync -av -e "ssh -i path_to_private_key" path_to_folder ubuntu@xx.xxx.xx.xx:/aws_example_data
This will copy the data in the folder ‘path_to_folder’ from your local machine to the EBS volume on AWS.
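Note that rsync treats a trailing slash specially: ‘path_to_folder’ copies the folder itself into the destination, while ‘path_to_folder/’ copies only its contents. If rsync is not available, a one-off copy with scp works as well (same placeholder key and address):

scp -i path_to_private_key -r path_to_folder ubuntu@xx.xxx.xx.xx:/aws_example_data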
In Part 4 of this blog series, we will cover how our data scientists utilize the computational resources on the AWS instance.