Welcome to AWS S3 Tools documentation!¶
Introduction¶
AWS S3 Tools is a Python package that makes it easier to work with S3 objects. With it you can:
List S3 buckets’ content
Check if S3 objects exist
Read from S3 objects to Python variables
Write from Python variables to S3 objects
Upload from local files to S3
Download from S3 to local files
Delete S3 objects
Move S3 objects
AWS authentication is handled by the boto3 package; see the boto3 documentation for how to configure credentials.
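Since the package delegates authentication to boto3, any of boto3's standard credential sources work. A minimal sketch, assuming you want to configure credentials through environment variables (the values below are placeholders, not real credentials):

```python
import os

# boto3 resolves credentials from, in order: explicit client arguments,
# environment variables, the shared credentials file (~/.aws/credentials),
# and instance/container metadata. Environment variables are the simplest
# option for scripts; set them before importing/calling s3_tools:
os.environ["AWS_ACCESS_KEY_ID"] = "AKIA-EXAMPLE"        # placeholder
os.environ["AWS_SECRET_ACCESS_KEY"] = "example-secret"  # placeholder
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
```

If you already use a `~/.aws/credentials` profile, no code is needed at all; boto3 picks it up automatically.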
Installation¶
You can install AWS S3 Tools from PyPI with pip or your favorite package manager:
pip install aws-s3-tools
Add the -U switch to update to the current version, if AWS S3 Tools is already installed.
Usage¶
from s3_tools import object_exists

if object_exists("my-bucket", "s3-prefix/object.data"):
    # Do magic
    pass
else:
    print("Object not found")
AWS S3 Modules¶
Delete Module¶
Delete objects from an S3 bucket.
s3_tools.delete.delete_keys(bucket: str, keys: List[str], dry_run: bool = True) → None¶
Delete all objects in the keys list from the S3 bucket.
- Parameters
bucket (str) – AWS S3 bucket where the objects are stored.
keys (List[str]) – List of object keys.
dry_run (bool) – If True, the objects will not be deleted, by default True.
Examples
>>> delete_keys(
...     bucket="myBucket",
...     keys=[
...         "myData/myMusic/awesome.mp3",
...         "myData/myDocs/paper.doc"
...     ],
...     dry_run=False
... )
s3_tools.delete.delete_object(bucket: str, key: str) → None¶
Delete a given object from the S3 bucket.
- Parameters
bucket (str) – AWS S3 bucket where the object is stored.
key (str) – Key for the object that will be deleted.
Examples
>>> delete_object(bucket="myBucket", key="myData/myFile.data")
s3_tools.delete.delete_prefix(bucket: str, prefix: str, dry_run: bool = True) → Optional[List[str]]¶
Delete all objects under the given prefix from the S3 bucket.
- Parameters
bucket (str) – AWS S3 bucket where the objects are stored.
prefix (str) – Prefix under which the objects are stored.
dry_run (bool) – If True, the objects will not be deleted, by default True.
- Returns
List of S3 keys that would be deleted if dry_run is True, else None.
- Return type
Optional[List[str]]
Examples
>>> delete_prefix(bucket="myBucket", prefix="myData")
[
    "myData/myMusic/awesome.mp3",
    "myData/myDocs/paper.doc"
]
>>> delete_prefix(bucket="myBucket", prefix="myData", dry_run=False)
Download Module¶
Download S3 objects to files.
s3_tools.download.download_key_to_file(bucket: str, key: str, local_filename: str) → bool¶
Retrieve one object from an AWS S3 bucket and store it on local disk.
- Parameters
bucket (str) – AWS S3 bucket where the object is stored.
key (str) – Key where the object is stored.
local_filename (str) – Local file where the data will be downloaded to.
- Returns
True if the local file exists.
- Return type
bool
Examples
>>> download_key_to_file(
...     bucket="myBucket",
...     key="myData/myFile.data",
...     local_filename="theFile.data"
... )
True
s3_tools.download.download_keys_to_files(bucket: str, keys_paths: List[Tuple[str, str]], threads: int = 5) → List[Tuple[str, str, Any]]¶
Download a list of objects to specific paths.
- Parameters
bucket (str) – AWS S3 bucket where the objects are stored.
keys_paths (List[Tuple[str, str]]) – List of tuples pairing the S3 key to download with the local path to store it, e.g. [("S3_Key", "Local_Path"), ("S3_Key", "Local_Path")]
threads (int) – Number of parallel downloads, by default 5.
- Returns
A list of tuples of ("S3_Key", "Local_Path", result). The result is True when the download succeeded, otherwise it contains the error message. Note that the output list may not follow the input order.
- Return type
list of tuples
Examples
>>> download_keys_to_files(
...     bucket="myBucket",
...     keys_paths=[
...         ("myData/myFile.data", "MyFiles/myFile.data"),
...         ("myData/myMusic/awesome.mp3", "MyFiles/myMusic/awesome.mp3"),
...         ("myData/myDocs/paper.doc", "MyFiles/myDocs/paper.doc")
...     ]
... )
[
    ("myData/myMusic/awesome.mp3", "MyFiles/myMusic/awesome.mp3", True),
    ("myData/myDocs/paper.doc", "MyFiles/myDocs/paper.doc", True),
    ("myData/myFile.data", "MyFiles/myFile.data", True)
]
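Because the third slot of each result tuple is either True or an error message, failed downloads can be collected with a simple comparison. A minimal sketch, using a hypothetical results list in the documented shape:

```python
# Hypothetical results in the shape returned by download_keys_to_files:
# (S3_Key, Local_Path, result), where result is True on success or an
# error message string on failure.
results = [
    ("myData/myMusic/awesome.mp3", "MyFiles/myMusic/awesome.mp3", True),
    ("myData/myDocs/paper.doc", "MyFiles/myDocs/paper.doc", "Not Found"),
]

# Compare with `is not True` so any non-True value (the error message)
# counts as a failure:
failed = [(key, result) for key, _, result in results if result is not True]
```

Remember the output order may differ from the input order, so match entries by key rather than by position.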
s3_tools.download.download_prefix_to_folder(bucket: str, prefix: str, folder: str, search_str: Optional[str] = None, remove_prefix: bool = True, threads: int = 5) → List[Tuple[str, str, Any]]¶
Download objects to a local folder.
Retrieves all objects under a prefix on S3 and stores them in a local folder.
- Parameters
bucket (str) – AWS S3 bucket where the objects are stored.
prefix (str) – Prefix under which the objects are stored.
folder (str) – Local folder path where files will be stored.
search_str (str) – Search string to filter the resulting keys (uses Unix shell-style wildcards), by default None. See the fnmatch package for details.
remove_prefix (bool) – If True, removes the prefix when writing to the local folder. The remaining "folders" in the key are created inside the local folder.
threads (int) – Number of parallel downloads, by default 5.
- Returns
A list of tuples of ("S3_Key", "Local_Path", result). The result is True when the download succeeded, otherwise it contains the error message.
- Return type
list of tuples
Examples
>>> download_prefix_to_folder(
...     bucket="myBucket",
...     prefix="myData",
...     folder="myFiles"
... )
[
    ("myData/myFile.data", "myFiles/myFile.data", True),
    ("myData/myMusic/awesome.mp3", "myFiles/myMusic/awesome.mp3", True),
    ("myData/myDocs/paper.doc", "myFiles/myDocs/paper.doc", True)
]
List Module¶
List S3 bucket objects.
s3_tools.list.list_objects(bucket: str, prefix: str = '', search_str: Optional[str] = None, max_keys: int = 1000) → list¶
Retrieve the list of objects from an AWS S3 bucket under a given prefix and search string.
- Parameters
bucket (str) – AWS S3 bucket where the objects are stored.
prefix (str) – Prefix under which the objects are stored.
search_str (str) – Search string to filter the resulting keys (uses Unix shell-style wildcards), by default None. See the fnmatch package for details.
max_keys (int) – Maximum number of keys returned per pagination request, by default 1000.
- Returns
List of keys inside the bucket, under the prefix, and filtered by the search string.
- Return type
list
Examples
>>> list_objects(bucket="myBucket", prefix="myData")
[
    "myData/myFile.data",
    "myData/myMusic/awesome.mp3",
    "myData/myDocs/paper.doc"
]
>>> list_objects(bucket="myBucket", prefix="myData", search_str="*paper*")
["myData/myDocs/paper.doc"]
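The search_str filter uses Unix shell-style wildcards via the standard fnmatch module, so its behavior can be checked locally without touching S3. A minimal sketch reproducing the "*paper*" example above:

```python
from fnmatch import fnmatch

# Keys as they would come back from an unfiltered list_objects call:
keys = [
    "myData/myFile.data",
    "myData/myMusic/awesome.mp3",
    "myData/myDocs/paper.doc",
]

# fnmatch applies the same Unix shell-style wildcard rules that
# search_str uses: "*" matches any run of characters.
matches = [k for k in keys if fnmatch(k, "*paper*")]
```

Patterns like "*.mp3" or "myData/myDocs/*" work the same way.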
Move Module¶
Move S3 objects.
s3_tools.move.move_keys(source_bucket: str, source_keys: List[str], destination_bucket: str, destination_keys: List[str], threads: int = 5) → None¶
Move a list of S3 objects from a source bucket to a destination.
- Parameters
source_bucket (str) – S3 bucket where the objects are stored.
source_keys (List[str]) – S3 keys where the objects are referenced.
destination_bucket (str) – S3 destination bucket.
destination_keys (List[str]) – S3 destination keys.
threads (int, optional) – Number of parallel moves, by default 5.
- Raises
IndexError – When source_keys and destination_keys have different lengths.
ValueError – When the keys list is empty.
Examples
>>> move_keys(
...     source_bucket='bucket',
...     source_keys=[
...         'myFiles/song.mp3',
...         'myFiles/photo.jpg'
...     ],
...     destination_bucket='bucket',
...     destination_keys=[
...         'myMusic/song.mp3',
...         'myPhotos/photo.jpg'
...     ]
... )
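Since move_keys requires source_keys and destination_keys to have the same length, it is often convenient to derive the destination list from the source list. A minimal sketch, assuming each key starts with a single prefix segment followed by "/" (the "archive" prefix is a hypothetical example):

```python
source_keys = ["myFiles/song.mp3", "myFiles/photo.jpg"]

# Swap the leading prefix segment for a new one; split on the first "/"
# so the rest of the key is preserved. Both lists stay aligned, which
# is what move_keys requires.
destination_keys = ["archive/" + key.split("/", 1)[1] for key in source_keys]
```

Building the two lists from one source this way avoids the IndexError raised when their lengths differ.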
s3_tools.move.move_object(source_bucket: str, source_key: str, destination_bucket: str, destination_key: str) → None¶
Move an S3 object from a source bucket and key to a destination.
- Parameters
source_bucket (str) – S3 bucket where the object is stored.
source_key (str) – S3 key where the object is referenced.
destination_bucket (str) – S3 destination bucket.
destination_key (str) – S3 destination key.
Examples
>>> move_object(
...     source_bucket='bucket',
...     source_key='myFiles/song.mp3',
...     destination_bucket='bucket',
...     destination_key='myMusic/song.mp3'
... )
Read Module¶
Read S3 objects into variables.
s3_tools.read.read_object_to_bytes(bucket: str, key: str) → bytes¶
Retrieve one object from an AWS S3 bucket as a byte array.
- Parameters
bucket (str) – AWS S3 bucket where the object is stored.
key (str) – Key where the object is stored.
- Returns
Object content as bytes.
- Return type
bytes
- Raises
KeyError – When the key 'Body' is missing from the response.
Examples
>>> read_object_to_bytes(
...     bucket="myBucket",
...     key="myData/myFile.data"
... )
b"The file content"
s3_tools.read.read_object_to_dict(bucket: str, key: str) → Dict¶
Retrieve one object from an AWS S3 bucket as a dictionary.
- Parameters
bucket (str) – AWS S3 bucket where the object is stored.
key (str) – Key where the object is stored.
- Returns
Object content as dictionary.
- Return type
dict
Examples
>>> read_object_to_dict(
...     bucket="myBucket",
...     key="myData/myFile.json"
... )
{"key": "value", "1": "text"}
s3_tools.read.read_object_to_text(bucket: str, key: str) → str¶
Retrieve one object from an AWS S3 bucket as a string.
- Parameters
bucket (str) – AWS S3 bucket where the object is stored.
key (str) – Key where the object is stored.
- Returns
Object content as string.
- Return type
str
Examples
>>> read_object_to_text(
...     bucket="myBucket",
...     key="myData/myFile.data"
... )
"The file content"
Upload Module¶
Upload files to an S3 bucket.
s3_tools.upload.upload_file_to_key(bucket: str, key: str, local_filename: str) → str¶
Upload one file from local disk and store it in an AWS S3 bucket.
- Parameters
bucket (str) – AWS S3 bucket where the object will be stored.
key (str) – Key where the object will be stored.
local_filename (str) – Local file from where the data will be uploaded.
- Returns
The S3 full URL to the file.
- Return type
str
Examples
>>> upload_file_to_key(
...     bucket="myBucket",
...     key="myFiles/music.mp3",
...     local_filename="files/music.mp3"
... )
"http://s3.amazonaws.com/myBucket/myFiles/music.mp3"
s3_tools.upload.upload_files_to_keys(bucket: str, paths_keys: List[Tuple[str, str]], threads: int = 5) → List[Tuple[str, str, Any]]¶
Upload a list of files to specific objects.
- Parameters
bucket (str) – AWS S3 bucket where the objects will be stored.
paths_keys (List[Tuple[str, str]]) – List of tuples pairing the local path to upload with the S3 destination key, e.g. [("Local_Path", "S3_Key"), ("Local_Path", "S3_Key")]
threads (int, optional) – Number of parallel uploads, by default 5.
- Returns
A list of tuples of ("Local_Path", "S3_Key", result). The result is True when the upload succeeded, otherwise it contains the error message. Note that the output list may not follow the input order.
- Return type
List[Tuple[str, str, Any]]
Examples
>>> upload_files_to_keys(
...     bucket="myBucket",
...     paths_keys=[
...         ("MyFiles/myFile.data", "myData/myFile.data"),
...         ("MyFiles/myMusic/awesome.mp3", "myData/myMusic/awesome.mp3"),
...         ("MyFiles/myDocs/paper.doc", "myData/myDocs/paper.doc")
...     ]
... )
[
    ("MyFiles/myMusic/awesome.mp3", "myData/myMusic/awesome.mp3", True),
    ("MyFiles/myDocs/paper.doc", "myData/myDocs/paper.doc", True),
    ("MyFiles/myFile.data", "myData/myFile.data", True)
]
s3_tools.upload.upload_folder_to_prefix(bucket: str, prefix: str, folder: str, search_str: str = '*', threads: int = 5) → List[Tuple[str, str, Any]]¶
Upload a local folder to an S3 prefix.
Uploads all files in a given folder (recursively) and stores them in an S3 bucket under a prefix. The local folder structure is replicated on S3.
- Parameters
bucket (str) – AWS S3 bucket where the object will be stored.
prefix (str) – Prefix under which the objects will be stored.
folder (str) – Local folder path where files are stored. Prefer to use the full path for the folder.
search_str (str) – A match string to select the files to upload, by default "*". The pattern follows the rglob function from the pathlib package.
threads (int, optional) – Number of parallel uploads, by default 5.
- Returns
A list of tuples of ("Local_Path", "S3_Key", result). The result is True when the upload succeeded, otherwise it contains the error message.
- Return type
List[Tuple[str, str, Any]]
Examples
>>> upload_folder_to_prefix(
...     bucket="myBucket",
...     prefix="myFiles",
...     folder="/usr/files",
... )
[
    ("/usr/files/music.mp3", "myFiles/music.mp3", True),
    ("/usr/files/awesome.wav", "myFiles/awesome.wav", True),
    ("/usr/files/data/metadata.json", "myFiles/data/metadata.json", True)
]
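The search_str parameter follows pathlib's rglob patterns, which match recursively through subfolders. A minimal sketch demonstrating that behavior locally with a throwaway temporary folder (no S3 involved):

```python
import tempfile
from pathlib import Path

# Build a small folder tree to match against:
root = Path(tempfile.mkdtemp())
(root / "sub").mkdir()
(root / "song.mp3").write_text("data")
(root / "sub" / "awesome.mp3").write_text("data")
(root / "notes.txt").write_text("data")

# rglob("*.mp3") walks the tree recursively, so the file inside "sub"
# is found as well; this is the matching upload_folder_to_prefix uses
# for search_str.
mp3_names = sorted(p.name for p in root.rglob("*.mp3"))
```

With the default search_str of "*", every file under the folder is selected.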
Utils Module¶
General utilities.
s3_tools.utils.object_exists(bucket: str, key: str) → bool¶
Check if an object exists for a given bucket and key.
- Parameters
bucket (str) – Bucket name where the object is stored.
key (str) – Full key for the object.
- Returns
True if the object exists, otherwise False.
- Return type
bool
Example
>>> object_exists("myBucket", "myFiles/music.mp3")
True
Write Module¶
Write variables into S3 objects.
s3_tools.write.write_object_from_bytes(bucket: str, key: str, data: bytes) → str¶
Upload bytes data to an object in an AWS S3 bucket.
- Parameters
bucket (str) – AWS S3 bucket where the object will be stored.
key (str) – Key where the object will be stored.
data (bytes) – The object data to be uploaded to AWS S3.
- Returns
The S3 full URL to the file.
- Return type
str
- Raises
TypeError – If data is not a bytes type.
Examples
>>> data = bytes("String to bytes", "utf-8")
>>> write_object_from_bytes(
...     bucket="myBucket",
...     key="myFiles/file.data",
...     data=data
... )
"http://s3.amazonaws.com/myBucket/myFiles/file.data"
s3_tools.write.write_object_from_dict(bucket: str, key: str, data: Dict) → str¶
Upload a dictionary to an object in an AWS S3 bucket.
- Parameters
bucket (str) – AWS S3 bucket where the object will be stored.
key (str) – Key where the object will be stored.
data (dict) – The object data to be uploaded to AWS S3.
- Returns
The S3 full URL to the file.
- Return type
str
- Raises
TypeError – If data is not a dict type.
Examples
>>> data = {"key": "value", "1": "text"}
>>> write_object_from_dict(
...     bucket="myBucket",
...     key="myFiles/file.json",
...     data=data
... )
"http://s3.amazonaws.com/myBucket/myFiles/file.json"
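A dictionary destined for write_object_from_dict must be serializable; assuming the package stores dicts as JSON (consistent with the .json key in the example), a local round-trip check catches unserializable values before any upload is attempted:

```python
import json

data = {"key": "value", "1": "text"}

# json.dumps raises TypeError for values JSON cannot represent (sets,
# datetimes, custom objects), so this fails fast without touching S3.
roundtrip = json.loads(json.dumps(data))
```

If the round-trip value equals the original, the dict is safe to upload as JSON.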
s3_tools.write.write_object_from_text(bucket: str, key: str, data: str) → str¶
Upload a string to an object in an AWS S3 bucket.
- Parameters
bucket (str) – AWS S3 bucket where the object will be stored.
key (str) – Key where the object will be stored.
data (str) – The object data to be uploaded to AWS S3.
- Returns
The S3 full URL to the file.
- Return type
str
- Raises
TypeError – If data is not a str type.
Examples
>>> data = "A very very not so long text"
>>> write_object_from_text(
...     bucket="myBucket",
...     key="myFiles/file.txt",
...     data=data
... )
"http://s3.amazonaws.com/myBucket/myFiles/file.txt"