File System¶
DataRobot’s file system uses containers or “buckets” to store one or more files using a key-value storage approach, where the file’s path is the key and its contents the value. Each container is listed as an item under Data Assets (Data Catalog). We refer to the container as a catalog item.
The following should be kept in mind when working with the DataRobot file system:
- Permissions are attached to the catalog item containing the files. All files inside a catalog item share the same permissions.
- Since the DR file system uses key-value pairs to store files inside containers, directory structures are simulated and may change due to their contents. Most operations in the DataRobot file system support directory paths.
- The DR file system does not support empty directories.
  - To create directory X, simply upload a file to a path that contains the directory name, e.g. X/file.txt.
  - A directory is deleted once all files inside it are deleted.
- While the DR file system does not support empty directories, a catalog item may be empty.
- The DR file system simulates a top-level directory structure by giving each catalog item its own directory named according to its id. Files inside the catalog item will appear as paths inside its directory.
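The directory rules above can be illustrated with a small in-memory model. This is only a sketch of how a key-value store can simulate directories, not DataRobot's implementation; the `KeyValueStore` class and its methods are hypothetical names introduced for illustration.

```python
class KeyValueStore:
    """Toy model of one catalog item: file paths are keys, contents are values."""

    def __init__(self):
        self._files = {}  # path -> bytes

    def put(self, path, data):
        # Uploading "X/file.txt" is what makes directory "X/" appear.
        self._files[path] = data

    def delete(self, path):
        self._files.pop(path, None)

    def directories(self):
        # Directories are never stored; they are derived from the key prefixes.
        dirs = set()
        for path in self._files:
            parts = path.split("/")[:-1]
            for depth in range(1, len(parts) + 1):
                dirs.add("/".join(parts[:depth]) + "/")
        return sorted(dirs)


store = KeyValueStore()
store.put("X/file.txt", b"hello")
print(store.directories())  # ['X/']
store.delete("X/file.txt")
print(store.directories())  # []
```

Because directories exist only as prefixes of stored keys, deleting the last file under a prefix makes the directory disappear without any explicit cleanup step.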
class datarobot.fs.file_system.DataRobotFileSystem¶
Bases: AbstractFileSystem
fsspec implementation of DataRobot’s file system.
File paths are of the form: dr://<catalog_item_id>/path/to/file.txt or <catalog_item_id>/path/to/file.txt
- Variables:
  - protocol (str) – The protocol prefix for the DataRobot file system. Can be removed with _strip_protocol().
  - root_marker (str) – The root path of the DataRobot file system.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
List all catalog items in the file system:
>>> fs.ls("")
['696935d6d5a04a752419cf6d/', '69691fc3d5a04a752419cf5c/']
Create a new catalog item to hold your files:
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.put_file("local/path/to/file.txt", f"dr://{catalog_id}/file.txt")
>>> fs.ls(f"dr://{catalog_id}/")
['file.txt']
Find all PDF files you’ve uploaded to your catalog item:
>>> fs.glob(f"dr://{catalog_id}/**/*.pdf")
['696935d6d5a04a752419cf6d/file.pdf', '696935d6d5a04a752419cf6d/finance/fy-2024/budgets/Q2_budget_2024.pdf']
Copy, move or delete your files:
>>> fs.copy(f"dr://{catalog_id}/file.txt", f"dr://{catalog_id}/file_copy.txt")
>>> fs.move(f"dr://{catalog_id}/file_copy.txt", f"dr://{catalog_id}/file_moved.txt")
>>> fs.rm(f"dr://{catalog_id}/file_moved.txt")
Open files for reading or writing:
>>> with fs.open(f"dr://{catalog_id}/new_file.txt", mode="w") as f:
...     f.write("Hello, world!")
>>> with fs.open(f"dr://{catalog_id}/new_file.txt", mode="r") as f:
...     data = f.read()
...     print(data)
Hello, world!
classmethod _strip_protocol(path)¶
Turn path from fully-qualified to DR file system specific.
- Parameters: path (str) – File path in the DataRobot file system.
- Returns: Validated file path without the protocol prefix.
- Return type: str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> DataRobotFileSystem._strip_protocol("dr://12345/path/to/file.txt")
'12345/path/to/file.txt'
>>> DataRobotFileSystem._strip_protocol("dr://12345/path/")
'12345/path/'
>>> DataRobotFileSystem._strip_protocol("dr:///12345/")
'12345/'
>>> DataRobotFileSystem._strip_protocol("dr://")
''
_split_path(path)¶
Split the given path into catalog ID and internal file path. Internal paths can be empty.
- Parameters: path (str) – File path in the DataRobot file system.
- Returns: A tuple of catalog ID and the internal file path.
- Return type: Tuple[str, str]
- Raises: ValueError – If the path format is invalid.
Examples
>>> fs = DataRobotFileSystem()
>>> fs._split_path("dr://12345/path/to/file.txt")
('12345', 'path/to/file.txt')
>>> fs._split_path("dr:///12345/")
('12345', '')
>>> fs._split_path("12345/folder/")
('12345', 'folder/')
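The splitting behavior shown in these examples can be approximated in a few lines. The following is a minimal sketch, not the library's code: `split_dr_path` is a hypothetical helper, and treating a missing catalog ID as a `ValueError` is an assumption about what "invalid" means.

```python
PROTOCOL = "dr://"


def split_dr_path(path):
    """Return (catalog_id, internal_path) for a DataRobot-style path."""
    if path.startswith(PROTOCOL):
        path = path[len(PROTOCOL):]
    path = path.lstrip("/")  # tolerate an extra slash, as in "dr:///12345/"
    catalog_id, _, internal_path = path.partition("/")
    if not catalog_id:
        # Assumption: a path without a catalog ID is treated as invalid.
        raise ValueError(f"No catalog ID in path: {path!r}")
    return catalog_id, internal_path


print(split_dr_path("dr://12345/path/to/file.txt"))  # ('12345', 'path/to/file.txt')
print(split_dr_path("12345/folder/"))                # ('12345', 'folder/')
```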
ls(path, detail=True, **kwargs)¶
List files and folders at the given directory path. Use info() for information about a specific file.
If detail is True, returns a list of dictionaries with file details including name (path), size and type.
If detail is False, returns a list of file and folder paths as strings.
- Parameters:
  - path (str) – Path in the DataRobot file system to list.
  - detail (bool) – Whether to return detailed information.
  - kwargs (Any) – Additional keyword arguments for future proofing.
  - version_id (str) – Version ID of the catalog item to target. If not provided, the latest version is used.
- Returns: paths – List of dicts with file and folder details if detail is True, otherwise list of paths.
- Return type: List[FileInfo] or List[str]
- Raises: FileNotFoundError – If the specified path does not exist.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.ls("dr://", detail=False)
['696935d6d5a04a752419cf6d/', 'abcdef1234567890abcdef12/']
>>> fs.ls("dr://696935d6d5a04a752419cf6d/finance/")
[
{
'name': '696935d6d5a04a752419cf6d/finance/fy-2024/',
'size': 0,
'type': 'directory',
'format': None
},
{
'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
'size': 2048,
'type': 'file',
'format': 'csv'
},
]
info(path, **kwargs)¶
Get details about a file or directory.
For info about a directory path append a forward slash (/) at the end of the path. Paths without a trailing slash can return info about files or directories. If both a file and directory share the same path, the file info is returned.
- Parameters:
  - path (str) – Path in the DataRobot file system to get information about.
  - version_id – Optional version ID of the catalog item to target. If not provided, the latest version is used.
  - kwargs (Any) – Additional keyword arguments passed to ls().
- Returns: info – A dictionary with file or directory details including name (path), size and type.
- Return type: FileInfo
- Raises:
  - FileNotFoundError – If the specified path does not exist.
  - ValueError – If the path is invalid. Root path is not allowed.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.info("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
{
'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
'size': 2048,
'type': 'file',
'format': 'csv',
'created_at': datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
}
>>> fs.info("dr://696935d6d5a04a752419cf6d/finance/")
{
'name': '696935d6d5a04a752419cf6d/finance/',
'size': 0,
'type': 'directory',
'format': None,
'created_at': None
}
>>> fs.info("dr://696935d6d5a04a752419cf6d/my_folder")
{
'name': '696935d6d5a04a752419cf6d/my_folder/',
'size': 0,
'type': 'directory',
'format': None,
'created_at': None
}
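The trailing-slash rule described for info() can be pictured with a short sketch over a set of stored file keys. The `resolve` function is hypothetical and only models the file-versus-directory decision, not the returned details.

```python
def resolve(path, file_keys):
    """Classify `path` as 'file' or 'directory' given the stored file keys."""
    has_children = any(key.startswith(path.rstrip("/") + "/") for key in file_keys)
    if not path.endswith("/") and path in file_keys:
        # Without a trailing slash, a file wins over a directory of the same name.
        return "file"
    if has_children:
        return "directory"
    raise FileNotFoundError(path)


keys = {"docs", "docs/a.txt", "img/b.png"}
print(resolve("docs", keys))   # file
print(resolve("docs/", keys))  # directory
print(resolve("img", keys))    # directory
```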
created(path)¶
Return the created timestamp of a file as a datetime.datetime object.
- Parameters: path (str) – Path in the DataRobot file system to get information about.
- Returns: The timestamp of when the file was created or None if a directory.
- Return type: datetime.datetime or None
- Raises:
  - FileNotFoundError – If the specified path does not exist.
  - ValueError – If the path is invalid. Root path is not allowed.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.created("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
du(path, total=True, maxdepth=None, withdirs=False, **kwargs)¶
Retrieve space used by files and optionally directories at a path.
Notes
Directory size does not include the size of its contents and is set to zero.
- Parameters:
  - path (str) – The path to retrieve file space usage for.
  - total (bool) – Whether to sum all file sizes.
  - maxdepth (Optional[int]) – Maximum number of directory levels to descend when searching for files. Use None for unlimited.
  - withdirs (bool) – Whether to include directory paths in the output.
  - kwargs (Any) – Additional keyword arguments passed to find().
- Returns: If total is True, the number of bytes of all files in the path. If total is False, a dictionary mapping paths to their size.
- Return type: int or Dict[str, int]
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.du("dr://696935d6d5a04a752419cf6d/finance/yellow.txt")
2048
>>> fs.du("dr://696935d6d5a04a752419cf6d/", total=False)
{'696935d6d5a04a752419cf6d/file.txt': 102, '696935d6d5a04a752419cf6d/finance/yellow.txt': 2048}
>>> fs.du("dr://696935d6d5a04a752419cf6d/", total=False, maxdepth=1, withdirs=True)
{'696935d6d5a04a752419cf6d/file.txt': 102, '696935d6d5a04a752419cf6d/finance/': 0}
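As a rough illustration of the semantics described above (files contribute their byte size, directories contribute zero), the following sketch computes usage over a plain dictionary of file sizes. The `disk_usage` helper and the `cat/` paths are hypothetical, and the sketch does not model maxdepth or withdirs.

```python
def disk_usage(sizes, path, total=True):
    """Sum file sizes under `path`; `sizes` maps file paths to byte counts."""
    prefix = path.rstrip("/") + "/"
    matched = {p: s for p, s in sizes.items() if p == path or p.startswith(prefix)}
    # Directories never appear in `sizes`, so they add nothing to the total.
    return sum(matched.values()) if total else matched


sizes = {"cat/file.txt": 102, "cat/finance/yellow.txt": 2048}
print(disk_usage(sizes, "cat/"))                    # 2150
print(disk_usage(sizes, "cat/finance/yellow.txt"))  # 2048
```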
find(path, maxdepth=None, withdirs=False, detail=False, **kwargs)¶
List all files below path. If withdirs is True, include directories as well.
Similar to the POSIX find command without conditions.
- Parameters:
  - path (str) – The path to search from. Note that unlike the glob method, this method does not support glob patterns and treats the path as a literal directory path to search under or a filename to match.
  - maxdepth (Optional[int]) – If not None, the maximum number of levels to descend.
  - withdirs (bool) – Whether to include directory paths in the output.
  - kwargs (Any) – Passed to ls.
- Returns: If detail is False, a list of file (and optionally directory) paths. If detail is True, a dictionary mapping paths to their info dictionaries.
- Return type: List[str] or Dict[str, Dict[str, FileInfo]]
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.find("dr://696935d6d5a04a752419cf6d/", withdirs=True)
[
'696935d6d5a04a752419cf6d/',
'696935d6d5a04a752419cf6d/finance/',
'696935d6d5a04a752419cf6d/finance/budgets/',
'696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.pdf',
'696935d6d5a04a752419cf6d/finance/employee-list.csv'
]
>>> fs.find("dr://696935d6d5a04a752419cf6d/finance/", maxdepth=1)
['696935d6d5a04a752419cf6d/finance/employee-list.csv']
>>> fs.find("dr://696935d6d5a04a752419cf6d/finance", maxdepth=1, withdirs=True, detail=True)
{
'696935d6d5a04a752419cf6d/finance/': {
'name': '696935d6d5a04a752419cf6d/finance/',
'size': 0,
'type': 'directory',
'format': None,
'created_at': None
},
'696935d6d5a04a752419cf6d/finance/employee-list.csv': {
'name': '696935d6d5a04a752419cf6d/finance/employee-list.csv',
'size': 2048,
'type': 'file',
'format': 'csv',
'created_at': datetime.datetime(2026, 3, 6, 10, 5, 16, 805655)
},
'696935d6d5a04a752419cf6d/finance/budgets/': {
'name': '696935d6d5a04a752419cf6d/finance/budgets/',
'size': 0,
'type': 'directory',
'format': None,
'created_at': None
},
}
glob(path, maxdepth=None, detail=False, **kwargs)¶
Find files by glob-matching.
Pattern matching capabilities for finding files that match the given pattern.
- Parameters:
  - path (str) – The glob pattern to match against.
  - maxdepth (Optional[int]) – Maximum depth for ‘**’ patterns. Applied on the first ‘**’ found. Must be at least 1 if provided.
  - detail (bool) – Whether to return detailed information.
  - kwargs (Any) – Additional arguments passed to find.
- Returns: If detail is False, a list of file and directory paths. If detail is True, a dictionary mapping paths to their info dictionaries.
- Return type: List[str] or Dict[str, FileInfo]
Notes
Supported patterns:
- ‘*’: Matches any sequence of characters within a single directory level
- ‘**’: Matches any number of directory levels (must be an entire path component)
- ‘?’: Matches exactly one character
- ‘[abc]’: Matches any character in the set
- ‘[a-z]’: Matches any character in the range
- ‘[!cat]’: Matches any character NOT in the set {c, a, t}
Special behaviors:
- If the path ends with ‘/’, only folders are returned
- Consecutive ‘*’ characters are compressed into a single ‘*’
- Empty brackets ‘[]’ never match anything
- Negated empty brackets ‘[!]’ match any single character
- Special characters in character classes are escaped properly
Limitations:
- ‘**’ must be a complete path component (e.g., ‘a/**/b’, not ‘a**b’)
- No brace expansion (‘{a, b}.txt’)
- No extended glob patterns (‘+(pattern)’, ‘!(pattern)’)
Examples
Find all files and directories directly under the specified path.
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/*", detail=False)
[
'696935d6d5a04a752419cf6d/finance/budgets/',
'696935d6d5a04a752419cf6d/finance/employee-list.csv'
]
Find only directories directly under the specified path.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/*/", detail=False)
['696935d6d5a04a752419cf6d/finance/budgets/']
Find any budget directories with a 4-digit year in their name.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/finance/budgets/*-202[0-9]/", detail=False)
[
'696935d6d5a04a752419cf6d/finance/budgets/fy-2024/',
'696935d6d5a04a752419cf6d/finance/budgets/fy-2023/'
]
Find all .csv files at a maximum depth of 2 levels.
>>> fs.glob("dr://696935d6d5a04a752419cf6d/**/*.csv", maxdepth=2, detail=False)
[
'696935d6d5a04a752419cf6d/finance/employee-list.csv',
'696935d6d5a04a752419cf6d/sales/data.csv'
]
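To make the pattern rules concrete, here is a minimal sketch of component-wise glob matching built on the standard library's `fnmatch`. It is not how DataRobot implements glob(): `glob_match` is a hypothetical helper, it assumes ‘**’ may match zero directory levels, and it does not model maxdepth or the special handling of empty brackets.

```python
from fnmatch import fnmatchcase


def glob_match(pattern, path):
    """Return True if `path` matches `pattern`, comparing one path component at a time."""

    def match(pat_parts, path_parts):
        if not pat_parts:
            return not path_parts
        head, rest = pat_parts[0], pat_parts[1:]
        if head == "**":
            # '**' absorbs zero or more whole components (zero is an assumption here).
            return any(match(rest, path_parts[i:]) for i in range(len(path_parts) + 1))
        if not path_parts:
            return False
        # fnmatchcase handles '*', '?', '[abc]', '[a-z]' and '[!abc]' within a component.
        return fnmatchcase(path_parts[0], head) and match(rest, path_parts[1:])

    return match(pattern.split("/"), path.split("/"))


print(glob_match("cat/**/*.csv", "cat/finance/employee-list.csv"))  # True
print(glob_match("cat/*.csv", "cat/finance/employee-list.csv"))     # False
```

Matching one component at a time is what keeps ‘*’ from crossing a directory boundary, while ‘**’ is handled separately as a whole path component.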
tree(path='', recursion_limit=2, max_display=25, display_size=False, prefix='', is_last=True, first=True, indent_size=4)¶
Return a tree-like structure string of the DataRobot file system from the given path.
- Parameters:
  - path (str) – Path in the DataRobot file system to display the tree from.
  - recursion_limit (int) – Maximum depth of directory traversal.
  - max_display (int) – Maximum number of items to display per directory.
  - display_size (bool) – Whether to display file sizes.
  - prefix (str) – Current line prefix for visual tree structure.
  - is_last (bool) – Whether the current item is last in its level.
  - first (bool) – Whether this is the first call (displays root path).
  - indent_size (int) – Number of spaces per indent level.
- Returns: tree_str – A string representing the tree structure of the file system.
- Return type: str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> print(fs.tree("dr://696935d6d5a04a752419cf6d/", recursion_limit=5))
696935d6d5a04a752419cf6d/
└── finance/
    ├── fy-2024/
    │   └── budgets/
    │       └── Q2_budget_2024.pdf
    └── employee-list.csv
cat_file(path, start=None, end=None, **kwargs)¶
Fetch a single file’s contents.
- Parameters:
  - path (str) – File path in the DataRobot file system to read.
  - start (Optional[int]) – Optional starting byte position to read from. If negative, counts from the end of the file.
  - end (Optional[int]) – Optional ending byte position to read to. If negative, counts from the end of the file.
  - kwargs (Any) – Keyword arguments passed to DataRobotFileSystem.open().
- Returns: The contents of the file as bytes.
- Return type: bytes
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.cat_file("dr://696935d6d5a04a752419cf6d/finance/report.txt")
b'Q2 Financial Report...'
Read a range of bytes from a file:
>>> fs.cat_file("dr://696935d6d5a04a752419cf6d/finance/report.txt", start=3, end=19)
b'Financial Report'
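The start and end semantics can be previewed on an in-memory byte string. This sketch assumes the range behaves like a Python slice, with an exclusive end position (the usual fsspec convention); `read_range` is a hypothetical helper, not part of the DataRobot API.

```python
def read_range(data, start=None, end=None):
    """Return data[start:end]; negative positions count from the end of the data."""
    return data[start:end]


content = b"Q2 Financial Report"
print(read_range(content, start=3, end=12))  # b'Financial'
print(read_range(content, start=-6))         # b'Report'
```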
cat(path, recursive=False, on_error='raise', **kwargs)¶
Fetch (potentially multiple) path’s contents.
- Parameters:
  - path (Union[str, List[str]]) – File or directory path(s) in the DataRobot file system to read. Can include glob patterns.
  - recursive (bool) – If True, assume the path(s) are directories, and get contents of all contained files.
  - on_error (Union[Literal['raise'], Literal['omit'], Literal['return']]) – If "raise", an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if "omit", keys with an exception will simply not be included in the output; if "return", all keys are included in the output, but the value will be bytes or an exception instance.
  - kwargs (Any) – Additional keyword arguments passed to cat_file().
- Returns: If a single file path is provided, returns the file contents as bytes. If multiple paths are provided or the path is otherwise expanded, returns a dictionary mapping each path to its contents as bytes or an exception instance if on_error is set to "return".
- Return type: bytes or Dict[str, bytes] or Dict[str, Union[bytes, Exception]]
Examples
Read a single file:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.cat("dr://696935d6d5a04a752419cf6d/finance/report.txt")
b'Q2 Financial Report...'
Read multiple files and all files in a directory:
>>> fs.cat(
... ["dr://696935d6d5a04a752419cf6d/finance/summary.txt", "dr://696935d6d5a04a752419cf6d/reports/"],
... recursive=True
... )
{
'696935d6d5a04a752419cf6d/finance/summary.txt': b'Summary...',
'696935d6d5a04a752419cf6d/reports/report_2024.txt': b'2024 Report...',
'696935d6d5a04a752419cf6d/reports/report_2025.txt': b'2025 Report...'
}
Read all CSV files matching a glob pattern:
>>> fs.cat("dr://696935d6d5a04a752419cf6d/data/**/*.csv")
{
'696935d6d5a04a752419cf6d/data/sales.csv': b'date,amount\n2024-01-01,1000\n...',
'696935d6d5a04a752419cf6d/data/archive/old_sales.csv': b'date,amount\n2023-01-01,950\n...'
}
sign(path, expiration=100, version_id=None, **kwargs)¶
Create a signed URL for the given file path. Optionally specify a version ID to retrieve a signed URL for an earlier version of the file from that version of the catalog directory.
- Parameters:
  - path (str) – File path in the DataRobot file system to sign.
  - expiration (int) – Number of seconds until the signed URL expires.
  - version_id (Optional[str]) – Version ID of the catalog directory to target. If not provided, the latest version is used.
  - kwargs (Any) – Additional keyword arguments for future proofing.
- Returns: A signed URL granting temporary access to the file.
- Return type: str
- Raises:
  - FileNotFoundError – If the specified file does not exist.
  - IsADirectoryError – If the specified path is a directory.
  - ValueError – If the path format is invalid.
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> signed_url = fs.sign(
... "dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.pdf",
... expiration=300,
... )
cp_file(path1, path2, overwrite_strategy=FilesOverwriteStrategy.RENAME, max_wait=600, wait_for_completion=True, **kwargs)¶
Copy a file or directory from path1 to path2.
Copies directories recursively. Specify an overwrite strategy to handle file naming conflicts at the target location. Note that copying between catalog item directories is an asynchronous operation. Copying files into a non-existent catalog item directory does not create it; the target catalog item must already exist.
- Parameters:
  - path1 (str) – Source file or directory path. Directory paths should end with a forward slash (/).
  - path2 (str) – Target file or directory path. Directory paths should end with a forward slash (/).
  - overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location.
  - max_wait (int) – Maximum time in seconds to wait for the copy operation to complete when copying between catalog items.
  - wait_for_completion (bool) – Whether to wait for the copy operation to complete before returning when copying between catalog items.
  - kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
  - FileNotFoundError – If the source path does not exist or either catalog item directory does not exist.
  - ValueError – If attempting to copy a directory to a file path.
  - FileExistsError – If the target file or directory already exists and overwrite strategy is set to ERROR.
- Return type: None
Examples
Copy a file to a new file path:
>>> from datarobot.fs import DataRobotFileSystem
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs = DataRobotFileSystem()
>>> fs.cp_file(
... "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/Q2_budget_2024.pdf",
... "dr://69691fc3d5a04a752419cf5c/fy-2024/budgets-copy.pdf",
... )
Copy file into a directory, replace existing file if present:
>>> fs.cp_file(
... "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/Q2_budget_2024.pdf",
... "dr://69691fc3d5a04a752419cf5c/fy-2024/budgets/",
... overwrite_strategy=FilesOverwriteStrategy.REPLACE,
... )
Copy the contents of a directory into another directory:
>>> fs.cp_file(
... "dr://696935d6d5a04a752419cf6d/fy-2024/budgets/",
... "dr://69691fc3d5a04a752419cf5c/archive/budgets-2024/",
... )
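The effect of the overwrite strategies on a naming conflict can be sketched as follows. This is an illustration only: `resolve_conflict` is a hypothetical helper, the lowercase strategy strings stand in for the FilesOverwriteStrategy members RENAME, REPLACE, SKIP and ERROR named in this reference, and the exact renaming scheme ("name (n).ext") is an assumption.

```python
import posixpath


def resolve_conflict(target, existing, strategy="rename"):
    """Return the path to write to, or None when the write should be skipped."""
    if target not in existing or strategy == "replace":
        return target
    if strategy == "skip":
        return None
    if strategy == "error":
        raise FileExistsError(target)
    if strategy != "rename":
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # Assumed naming scheme: "report.pdf" -> "report (1).pdf", "report (2).pdf", ...
    stem, ext = posixpath.splitext(target)
    n = 1
    while f"{stem} ({n}){ext}" in existing:
        n += 1
    return f"{stem} ({n}){ext}"


existing = {"a/report.pdf", "a/report (1).pdf"}
print(resolve_conflict("a/report.pdf", existing))             # a/report (2).pdf
print(resolve_conflict("a/report.pdf", existing, "replace"))  # a/report.pdf
```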
cp_directory(path1, path2, overwrite_strategy=FilesOverwriteStrategy.RENAME, max_wait=600, wait_for_completion=True, **kwargs)¶
Copy a directory recursively from path1 to path2.
Validates that both paths are directories by checking for trailing slashes (/).
Calls cp_file() internally.
- Parameters:
  - path1 (str) – Source directory path. Must end with a forward slash (/).
  - path2 (str) – Target directory path. Must end with a forward slash (/).
  - overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location.
  - max_wait (int) – Maximum time in seconds to wait for the copy operation to complete when copying between catalog items.
  - wait_for_completion (bool) – Whether to wait for the copy operation to complete before returning when copying between catalog items.
  - kwargs (Any) – Additional keyword arguments passed to cp_file().
- Return type: None
copy(path1, path2, recursive=False, maxdepth=None, on_error=None, **kwargs)¶
Copy files or directories between two locations in the DataRobot file system.
- Parameters:
  - path1 (Union[str, List[str]]) – Source file or directory path(s). Supports glob patterns. If specifying a directory, recursive should be True.
  - path2 (Union[str, List[str]]) – Target file or directory path(s).
  - recursive (bool) – Whether to copy directory contents recursively.
  - maxdepth (Optional[int]) – Maximum depth to recurse when finding files to copy.
  - on_error (Optional[Literal['raise', 'ignore']]) – If "raise", any file not found exceptions will be raised. If "ignore", any file not found exceptions will be skipped and ignored. Defaults to "raise" unless recursive is True, where the default is "ignore".
  - kwargs (Any) – Additional keyword arguments passed to cp_file().
  - overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts at the target location. Passed to cp_file().
- Raises: FileNotFoundError – If any of the source paths do not exist or no files can be found, and on_error is "raise".
- Return type: None
Examples
Copy a single file to a new path:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.copy(
... "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
... "dr://696935d6d5a04a752419cf6d/finance/employee-list-backup.csv",
... )
Copy more than one file or directory:
>>> fs.copy(
... [
... "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
... "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy.csv",
... ],
... [
... "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy.csv",
... "dr://696935d6d5a04a752419cf6d/finance/employee-list-copy-2.csv",
... ],
... )
Copy a single file into a directory:
>>> fs.copy(
... "dr://696935d6d5a04a752419cf6d/finance/report.pdf",
... "dr://696935d6d5a04a752419cf6d/archive/",
... )
Recursively copy the contents of a directory to another directory:
>>> fs.copy(
... "dr://696935d6d5a04a752419cf6d/budgets/",
... "dr://696935d6d5a04a752419cf6d/archive/budgets-2024/",
... recursive=True,
... )
Copy all CSV files in a directory and its subdirectories up to a maximum depth of 2:
>>> fs.copy(
... "dr://696935d6d5a04a752419cf6d/data/**/*.csv",
... "dr://696935d6d5a04a752419cf6d/archive/data-2024/",
... recursive=True,
... maxdepth=2,
... )
Copy all text files in a directory into a new directory:
>>> fs.copy(
... "dr://696935d6d5a04a752419cf6d/data/*.txt",
... "dr://696935d6d5a04a752419cf6d/archive/",
... recursive=True,
... )
Copy a directory recursively, skipping files that already exist at the target:
>>> from datarobot.enums import FilesOverwriteStrategy
>>> fs.copy(
... "dr://696935d6d5a04a752419cf6d/budgets/",
... "dr://696935d6d5a04a752419cf6d/archive/",
... recursive=True,
... overwrite_strategy=FilesOverwriteStrategy.SKIP,
... )
rm_file(path, **kwargs)¶
Delete a file or directory at the given path(s). Completes silently if the file does not exist.
- Parameters:
  - path (Union[str, List[str]]) – Path(s) of the file(s) to delete. Paths ending with a forward slash (/) are treated as directories and deleted recursively.
  - kwargs (Any) – Additional keyword arguments for future proofing.
- Return type: None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm_file("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
>>> fs.rm_file([
... "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
... "dr://696935d6d5a04a752419cf6d/finance/fy-2024/budgets/Q2_budget_2024.pdf"
... ])
rm_directory(path, **kwargs)¶
Recursively delete a directory at the given path(s). Completes silently if the directory does not exist.
Uses rm_file() internally.
Soft-deletes catalog item directory when requested. Use Files.un_delete() if you need to restore a deleted catalog item.
- Parameters:
  - path (Union[str, List[str]]) – One or more directory paths to delete recursively. Paths must end with a forward slash (/) to be treated as directories and deleted recursively.
  - kwargs (Any) – Additional keyword arguments for future proofing.
- Raises: ValueError – If any of the provided paths do not end with a forward slash (/).
- Return type: None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm_directory("dr://696935d6d5a04a752419cf6d/finance/fy-2024/")
>>> fs.rm_directory([
... "dr://696935d6d5a04a752419cf6d/finance/fy-2024/",
... "dr://696935d6d5a04a752419cf6d/"
... ])
rm(path, recursive=False, maxdepth=None, **kwargs)¶
Delete files or directories. Completes silently if the file or directory does not exist.
Soft-deletes catalog item directory when requested. Use Files.un_delete() if you need to restore a deleted catalog item.
If all files in a directory are deleted, the directory itself is also deleted implicitly, because the DataRobot file system does not support empty directories.
- Parameters:
  - path (Union[str, List[str]]) – One or more file or directory paths to delete. Paths ending with a forward slash (/) are treated as directories.
  - recursive (bool) – Whether to recurse into directories when targeting files to delete. If False, only deletes the files targeted.
  - maxdepth (Optional[int]) – Depth to pass to find() and glob() when targeting files for deletion. Used to limit recursion in directories when finding files to delete. If None, no limit is applied.
  - kwargs (Any) – Additional keyword arguments for future proofing. Passed to rm_file().
- Return type: None
Examples
Delete file:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/employee-list.csv")
Delete directory recursively:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/fy-2024/", recursive=True)
Delete contents of catalog item folder recursively up to a maximum depth of 2:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/", recursive=True, maxdepth=2)
Delete catalog item folder:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/")
Delete .csv files in a directory and its subdirectories up to a maximum depth of 3:
>>> fs.rm("dr://696935d6d5a04a752419cf6d/finance/**/*.csv", recursive=True, maxdepth=3)
create_catalog_item_dir(**kwargs)¶
Create a new empty catalog item directory and return its id.
- Parameters: kwargs (Any) – Additional keyword arguments for future proofing.
- Returns: The id of the newly created catalog item.
- Return type: str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.ls(f"dr://{catalog_id}/")
[]
mv_file(path1, path2, *, overwrite_strategy=FilesOverwriteStrategy.REPLACE, **kwargs)¶
Move a single file or directory from path1 to path2.
- Parameters:
  - path1 (str) – Source path. Format: dr://<catalog_id>/path. Directories should end with /.
  - path2 (str) – Destination path. Format: dr://<catalog_id>/path. Directories should end with /.
  - overwrite_strategy (FilesOverwriteStrategy) – Strategy for overwriting existing paths. Defaults to REPLACE, in line with fsspec.
  - kwargs (Any) – Additional keyword arguments passed to cp_file() and rm_file() when moving across catalogs.
- Return type: None
mv(path1, path2, recursive=False, maxdepth=None, *, overwrite_strategy=FilesOverwriteStrategy.REPLACE, **kwargs)¶
Move files or directories from path1 to path2. path1 may contain glob patterns.
- Parameters:
  - path1 (Union[str, List[str]]) – Source path(s). Format: dr://<catalog_id>/path. A string (file, directory, or glob pattern) or a list of explicit paths.
  - path2 (Union[str, List[str]]) – Destination path(s). Format: dr://<catalog_id>/path. A single path when path1 is a string. When path1 is a list, either a single directory (ending with /; each source maps to path2/basename) or a list of paths. When both are lists, truncates to the shorter length (matches fsspec).
  - recursive (bool) – If True, move directories recursively.
  - maxdepth (Optional[int]) – If not None, maximum directory depth when resolving path1. None means no limit.
  - overwrite_strategy (FilesOverwriteStrategy) – Strategy for overwriting existing paths. Defaults to REPLACE, in line with fsspec.
  - kwargs (Any) – Additional keyword arguments passed to expand_path when resolving paths and to mv_file() when performing the move.
- Raises: ValueError – If multiple sources are moved to a single file destination (not a directory).
- Return type: None
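The source-to-destination pairing rules described for path1 and path2 can be sketched as a pure function. `plan_moves` is a hypothetical helper written for illustration; it does not expand glob patterns or touch any file system.

```python
import posixpath


def plan_moves(path1, path2):
    """Pair each source with its destination, following the rules described for mv()."""
    if isinstance(path1, str):
        return [(path1, path2)]
    sources = list(path1)
    if isinstance(path2, str):
        if path2.endswith("/"):
            # Single directory target: each source maps to path2/basename.
            return [(src, path2 + posixpath.basename(src.rstrip("/"))) for src in sources]
        if len(sources) > 1:
            raise ValueError("Cannot move multiple sources to a single file destination.")
        return [(src, path2) for src in sources]
    # Both are lists: pair element-wise, truncating to the shorter list.
    return list(zip(sources, path2))


print(plan_moves(["c/a.txt", "c/d/b.txt"], "c/out/"))
# [('c/a.txt', 'c/out/a.txt'), ('c/d/b.txt', 'c/out/b.txt')]
```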
clone_catalog_item_dir(path_or_id, files_to_omit=None, **kwargs)¶
Clone a catalog item directory (copy all contents) and return the ID of the cloned catalog item.
- Parameters:
  - path_or_id (str) – Path or ID of the catalog item directory to clone.
  - files_to_omit (Optional[List[str]]) – List of files to omit when cloning. Provide paths relative to the root of the catalog item directory.
  - kwargs (Any) – Additional keyword arguments passed to Files.clone().
- Returns: The ID of the cloned catalog item.
- Return type: str
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.ls("dr://696935d6d5a04a752419cf6d/", detail=False)
['696935d6d5a04a752419cf6d/folder/', '696935d6d5a04a752419cf6d/file.txt']
>>> clone_id = fs.clone_catalog_item_dir("dr://696935d6d5a04a752419cf6d/")
>>> clone_id
"696935d6d5a04a752419cf6d-clone"
>>> fs.ls(f"dr://{clone_id}/", detail=False)
['696935d6d5a04a752419cf6d-clone/folder/', '696935d6d5a04a752419cf6d-clone/file.txt']
Clone a catalog item directory and omit a file:
>>> fs.clone_catalog_item_dir("dr://696935d6d5a04a752419cf6d/", files_to_omit=["file.txt"])
"696935d6d5a04a752419cf6d-clone"
>>> fs.ls(f"dr://696935d6d5a04a752419cf6d-clone/", detail=False)
['696935d6d5a04a752419cf6d-clone/folder/']
put_from_url(path, url, unpack_archive_files=True, overwrite_strategy=FilesOverwriteStrategy.RENAME, *, upload_timeout=600, wait_for_completion=True, **kwargs)¶
Load file(s) from a URL into a directory in the DataRobot file system.
- Parameters:
  - path (str) – DataRobot path to the directory (catalog root or a folder inside it).
  - url (str) – The URL of the file or archive to load. Must be accessible by the DataRobot server.
  - unpack_archive_files (bool) – If True, extract archive contents into the directory. If False, upload the file as-is. Defaults to True.
  - upload_timeout (int) – Maximum time in seconds to wait for the upload to complete.
  - wait_for_completion (bool) – If True, block until the upload completes. Defaults to True.
  - overwrite_strategy (FilesOverwriteStrategy) – How to handle name conflicts with existing files. Defaults to FilesOverwriteStrategy.RENAME.
  - kwargs (Any) – Additional keyword arguments for future proofing.
- Return type: None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> catalog_id = fs.create_catalog_item_dir()
>>> fs.put_from_url(f"dr://{catalog_id}/data/", "https://example.com/file.png")
>>> fs.ls(f"dr://{catalog_id}/data/")
[{'name': 'file.png', 'size': 12345, 'type': 'file', ...}]
- Raises:
  - AsyncTimeoutError – If wait_for_completion is True and the upload takes longer than upload_timeout seconds.
  - FileExistsError – If overwrite_strategy is FilesOverwriteStrategy.ERROR and a file with the same name already exists.
put_from_data_source(path, data_source_id, credential_id=None, credential_data=None, unpack_archive_files=True, overwrite_strategy=FilesOverwriteStrategy.RENAME, *, upload_timeout=600, wait_for_completion=True, **kwargs)¶
Upload one or more files from a data source into a directory in the DataRobot file system.
- Parameters:
  - path (str) – Directory path to upload files under. Cannot be the root directory.
  - data_source_id (str) – The ID of the DataSource to use as the source of data.
  - credential_id (Optional[str]) – The ID of the Credential to use for authentication.
  - credential_data (Optional[Dict[str, str]]) – The credentials to authenticate with the database, to use instead of credential ID.
  - unpack_archive_files (bool) – Whether to unpack archive files (zip, tar, tar.gz, tgz) upon upload.
  - overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts when writing to a path where a file already exists. Use FilesOverwriteStrategy.RENAME to rename the uploaded file using the “(n).ext” pattern. Use FilesOverwriteStrategy.REPLACE to overwrite the existing file. Use FilesOverwriteStrategy.SKIP to skip uploading if a file already exists at the target path. Use FilesOverwriteStrategy.ERROR to raise FileExistsError if a file already exists at the target path.
  - upload_timeout (int) – Maximum time in seconds to wait for the upload to complete.
  - wait_for_completion (bool) – If True, block until the upload completes. If False, return after starting the upload.
  - kwargs (Any) – Additional keyword arguments for future proofing.
- Raises:
  - ValueError – If the directory path is invalid.
  - FileNotFoundError – If the directory path does not exist.
  - AsyncTimeoutError – If wait_for_completion is True and the upload takes longer than upload_timeout seconds.
- Return type:
None
Examples
Upload file or folder from Google Drive.
Note: GDrive paths must use drive, folder and file IDs.
Example: /<drive_id>/<folder_id>/<file_id> or /<drive_id>/<folder_id> if folder.
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> gcp_cred = dr.Credential.create_gcp(
... name='GDrive Credentials',
... gcp_key={ # Or load from keyfile
... "type": "service_account",
... "private_key_id": "...",
... "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
... "client_email": "user@project.iam.gserviceaccount.com",
... "client_id": "...",
... },
... )
>>> gdrive_connector = next(
... c for c in dr.Connector.list() if c.connector_type == "gdrive"
... )
>>> gdrive_datastore = dr.DataStore.create(
... data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
... canonical_name='GDrive DataStore',
... fields=[{'id': 'gdrive.drive_name', 'name': 'Drive Name', 'value': 'My Drive'}],
... connector_id=gdrive_connector.id,
... )
>>> path = "/<drive_id>/<folder_id>/<file_id>" # or "/<drive_id>/<folder_id>" for a folder
>>> gdrive_datasource = dr.DataSource.create(
... data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
... canonical_name='GDrive DataSource for my documents',
... params=dr.DataSourceParameters(data_store_id=gdrive_datastore.id, path=path),
... )
>>> fs.put_from_data_source(
... "dr://<catalog-id>/my_gdrive_documents/",
... gdrive_datasource.id,
... credential_id=gcp_cred.credential_id, # Can omit if default credentials are set up with the DataStore
... )
>>> print(fs.ls(f"dr://<catalog-id>/my_gdrive_documents/", detail=False))
['<catalog-id>/my_gdrive_documents/file.txt']
Upload file or folder from AWS S3 bucket:
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> cred = dr.Credential.create_s3(
... name="AWS S3 Credentials",
... aws_access_key_id="...",
... aws_secret_access_key="...",
... aws_session_token="...",
... )
>>> s3_connector = next(
... c for c in dr.Connector.list() if c.connector_type == "s3"
... )
>>> s3_datastore = dr.DataStore.create(
... data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
... canonical_name='S3 DataStore',
... fields=[
... {"id": "fs.defaultFS", "name": "Bucket Name", "value": "my-bucket-name"},
... {"id": "fs.rootDirectory", "name": "Prefix", "value": "/"},
... {"id": "fs.s3.awsRegion", "name": "S3 Bucket Region", "value": "us-east-1"},
... ],
... connector_id=s3_connector.id,
... )
>>> s3_datasource = dr.DataSource.create(
... data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
... canonical_name='S3 DataSource for my files',
... params=dr.DataSourceParameters(
... data_store_id=s3_datastore.id,
... path="path/to/my/file.txt", # or "path/to/my/folder/"
... ),
... )
>>> fs.put_from_data_source(
... "dr://<catalog-id>/my_s3_files/",
... s3_datasource.id,
... credential_id=cred.credential_id, # Can omit if default credentials are set up with the DataStore
... )
>>> print(fs.ls(f"dr://<catalog-id>/my_s3_files/", detail=False))
['<catalog-id>/my_s3_files/file.txt']
Upload file or folder from SharePoint:
Note: SharePoint paths must use the following format:
/<HOSTNAME>,<SITE_COLLECTION_ID>,<SITE_ID/WEB_ID>/<DRIVE_ID>/<FILE_OR_FOLDER_ITEM_ID>.
Example: /mydomain.sharepoint.com,4732d...8b01b0,eb0d3...e42f/b!8tQyRyn.....TowMA13__nTU/01MAJ...EYJTAOR6/
>>> import datarobot as dr
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> cred = dr.Credential.create_azure_service_principal(
... name="Azure Service Principal Credential for Sharepoint",
... client_id="...",
... client_secret="...",
... azure_tenant_id="...",
... )
>>> sharepoint_connector = next(
... c for c in dr.Connector.list() if c.connector_type == "sharepoint"
... )
>>> sharepoint_datastore = dr.DataStore.create(
... data_store_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
... canonical_name='Sharepoint DataStore',
... fields=[],
... connector_id=sharepoint_connector.id,
... )
>>> path = "/<HOSTNAME>,<SITE_COLLECTION_ID>,<SITE_ID/WEB_ID>/<DRIVE_ID>/<FILE_OR_FOLDER_ITEM_ID>"
>>> sharepoint_datasource = dr.DataSource.create(
... data_source_type=dr.enums.DataStoreTypes.DR_CONNECTOR_V1,
... canonical_name='Sharepoint DataSource',
... params=dr.DataSourceParameters(
... data_store_id=sharepoint_datastore.id,
... path=path,
... ),
... )
>>> fs.put_from_data_source(
... "dr://<catalog-id>/my_sharepoint_files/",
... sharepoint_datasource.id,
... credential_id=cred.credential_id,
... )
>>> print(fs.ls(f"dr://<catalog-id>/my_sharepoint_files/", detail=False))
['<catalog-id>/my_sharepoint_files/my_file.txt']
open(path, mode='rb', block_size=None, cache_options=None, compression=None, overwrite_strategy=FilesOverwriteStrategy.REPLACE, unpack_archive_files=False, upload_timeout=600, **kwargs)¶
Open a file in the DataRobot file system. Supports read modes ‘r’, ‘rb’ and write modes ‘w’, ‘wb’, ‘xb’.
- Parameters:
  - path (str) – Path in the DataRobot file system to open.
  - mode (str) – Mode to open the file in. ‘r’ or ‘rb’ for reading, ‘w’, ‘wb’ or ‘xb’ for writing.
  - block_size (Optional[int]) – Buffer size in bytes for reading and writing.
  - cache_options (Optional[Dict[str, Any]]) – Extra arguments to pass through the cache.
  - compression (Optional[str]) – If given, open file using compression codec. Can either be a compression name (a key in fsspec.compression.compr) or “infer” to guess the compression from the filename suffix.
  - overwrite_strategy (FilesOverwriteStrategy) – Strategy to handle naming conflicts when writing to a path where a file already exists. Use FilesOverwriteStrategy.RENAME to rename the uploaded file using the “(n).ext” pattern. Use FilesOverwriteStrategy.REPLACE to overwrite the existing file. Use FilesOverwriteStrategy.SKIP to skip uploading if a file already exists at the target path. Use FilesOverwriteStrategy.ERROR to raise FileExistsError if a file already exists at the target path.
  - unpack_archive_files (bool) – If True, automatically unpack archive files (zip, tar, tar.gz, tgz) upon upload.
  - upload_timeout (int) – Maximum time in seconds to wait for file upload to complete.
  - kwargs (Any) – Additional keyword arguments passed to DataRobotFile or TextFileWrapper.
- Raises:
  - IsADirectoryError – If attempting to open a directory for reading.
  - FileNotFoundError – If attempting to open a non-existent file for reading.
  - ValueError – If an unsupported file mode is provided, an invalid path is passed, or the file is too big to download.
  - FileExistsError – If attempting to write to a path where a file already exists and the overwrite strategy is set to FilesOverwriteStrategy.ERROR or mode is set to ‘xb’.
- Returns: A file-like object for reading or writing.
- Return type:
DataRobotFile
Examples
Open a file for reading:
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> with fs.open("dr://696935d6d5a04a752419cf6d/notes/agenda.txt", mode="r") as f:
... data = f.read()
Read first 20 bytes from a file then skip to byte 100 and read the next 30 bytes:
>>> with fs.open("dr://696935d6d5a04a752419cf6d/figures/plot.png", mode="rb") as f:
... first_20_bytes = f.read(20)
... f.seek(100)
... next_30_bytes = f.read(30)
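The four overwrite_strategy values described above can be pictured with a small in-memory model. This is only a sketch: `Strategy` and `resolve_key` are hypothetical names, the set of existing keys stands in for a catalog item, and the exact spelling of the renamed file ("stem (n).ext") is an assumption, since the real conflict handling is performed by DataRobot.

```python
import posixpath
from enum import Enum


class Strategy(Enum):
    """Hypothetical stand-in for FilesOverwriteStrategy."""
    RENAME = "rename"
    REPLACE = "replace"
    SKIP = "skip"
    ERROR = "error"


def resolve_key(existing, key, strategy):
    """Return the key to write to, or None when the write is skipped."""
    if key not in existing or strategy is Strategy.REPLACE:
        return key
    if strategy is Strategy.SKIP:
        return None
    if strategy is Strategy.ERROR:
        raise FileExistsError(key)
    # RENAME: try "stem (1).ext", "stem (2).ext", ... until a free key is found
    # (the spacing of the suffix is assumed for illustration).
    stem, ext = posixpath.splitext(key)
    n = 1
    while f"{stem} ({n}){ext}" in existing:
        n += 1
    return f"{stem} ({n}){ext}"


existing = {"notes/agenda.txt", "notes/agenda (1).txt"}
renamed = resolve_key(existing, "notes/agenda.txt", Strategy.RENAME)
replaced = resolve_key(existing, "notes/agenda.txt", Strategy.REPLACE)
skipped = resolve_key(existing, "notes/agenda.txt", Strategy.SKIP)
```

Under these assumptions, opening with mode ‘xb’ corresponds to the ERROR branch: the write is refused when the key already exists.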
touch(path, truncate=True, **kwargs)¶
Create an empty file at the given path.
DataRobotFileSystem does not support updating timestamps of existing files.
- Parameters:
  - path (str) – Path to the file to create.
  - truncate (bool) – Whether to replace the existing file with an empty one. This must always be set to True.
  - kwargs (Any) – Additional keyword arguments passed to open().
- Raises: NotImplementedError – If attempting to update the timestamp of an existing file with truncate set to False.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.touch("dr://696935d6d5a04a752419cf6d/notes/agenda.txt")
read_block(fn, offset, length, delimiter=None)¶
Read a block of bytes from a file.
Starting at offset of the file, read length bytes. If
delimiter is set then we ensure that the read starts and stops at
delimiter boundaries that follow the locations offset and offset
+ length. If offset is zero then we start at zero. The
bytestring returned WILL include the end delimiter string.
If offset+length is beyond the eof, reads to eof.
- Parameters:
  - fn (str) – Filepath to read from.
  - offset (int) – Byte offset to start read from.
  - length (Optional[int]) – Number of bytes to read. If None, read to end of file.
  - delimiter (Optional[bytes]) – Ensure reading starts and stops at delimiter bytestring.
- Return type:
bytes
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, 13)
b'Alice, 100\nBo'
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, 13, delimiter=b'\n')
b'Alice, 100\nBob, 200\n'
Use length=None to read to the end of the file.
>>> fs.read_block("dr://696935d6d5a04a752419cf6d/data/file.txt", 0, None, delimiter=b'\n')
b'Alice, 100\nBob, 200\nCharlie, 300'
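The delimiter rule is easier to follow on an in-memory byte string. The sketch below is not the library's implementation; `read_block_sketch` and `_next_boundary` are hypothetical helpers that follow the behavior described above: the start moves forward to the first delimiter boundary after offset (unless offset is zero), and the end moves forward to the first boundary after offset + length.

```python
def _next_boundary(data, pos, delimiter):
    """Position just past the first delimiter found at or after pos."""
    if pos == 0:
        return 0  # the start of the file always counts as a boundary
    idx = data.find(delimiter, pos)
    if idx == -1:
        return len(data)  # no further delimiter: run to end of file
    return idx + len(delimiter)


def read_block_sketch(data, offset, length, delimiter=None):
    """Illustrative model of read_block() on a bytes object."""
    stop = len(data) if length is None else min(offset + length, len(data))
    if delimiter is None:
        return data[offset:stop]
    start = _next_boundary(data, offset, delimiter)
    end = _next_boundary(data, stop, delimiter)
    return data[start:end]


DATA = b"Alice, 100\nBob, 200\nCharlie, 300"
plain = read_block_sketch(DATA, 0, 13)
aligned = read_block_sketch(DATA, 0, 13, delimiter=b"\n")
second_row = read_block_sketch(DATA, 5, 10, delimiter=b"\n")
```

With a non-zero offset the partial first record is skipped, which lets several readers split a file into blocks without cutting a record in half.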
put_file(lpath, rpath, callback=, mode='overwrite', raise_error_on_directory=True, **kwargs)¶
Upload a single file from local to DataRobot file system.
- Parameters:
  - lpath (str) – Local file path.
  - rpath (str) – DataRobot file system path.
  - callback (Callback) – Callback to track progress of the file transfer. Not supported as DataRobotFileSystem does not support buffered uploads.
  - mode (str) – Mode to open the file in: ‘overwrite’ or ‘create’.
  - raise_error_on_directory (bool) – Whether to raise an exception if the local path is a directory. DataRobot file system does not support creating empty directories. If False, the function does nothing and returns silently.
  - kwargs (Any) – Keyword arguments passed to open().
- Raises:
- FileExistsError – If the file already exists and mode is set to ‘create’.
- NotImplementedError – If attempting to upload a directory and raise_error_on_directory is True.
- ValueError – If attempting to upload a file to an invalid path.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.put_file(
... "/Users/username/local/path/to/file.txt",
... "dr://696935d6d5a04a752419cf6d/my/new/file_copy.txt",
... )
put(lpath, rpath, recursive=False, callback=, maxdepth=None, **kwargs)¶
Upload local file(s) to DataRobot file system.
Copies a specific file or tree of files (if recursive=True). If rpath ends with a “/”, it will be assumed to be a directory, and target files will go within. If lpath ends with a “/”, it will be assumed to be a directory and will target files inside the directory. Calls put_file() for each source path, or uses FilesStage to upload multiple files at once if the upload can be optimized.
- Parameters:
  - lpath (Union[str, List[str]]) – Local file path or list of local file paths to upload.
  - rpath (Union[str, List[str]]) – DataRobot file system path or list of DataRobot file system paths to upload to.
  - recursive (bool) – Whether to recursively target local files to upload.
  - callback (Callback) – Callback to track progress of the file transfer. Not supported as DataRobotFileSystem does not support buffered uploads.
  - maxdepth (Optional[int]) – Maximum depth to recurse when targeting local files to upload.
  - kwargs (Any) – Additional keyword arguments passed to put_file().
  - raise_error_on_directory – Whether to raise an exception for local directory paths. DataRobot file system does not support creating empty directories. Defaults to False so invocations of put_file() for local directory paths do nothing and return silently.
  - overwrite_strategy – How to handle name conflicts with existing files. Defaults to FilesOverwriteStrategy.RENAME.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.put(
... "/Users/username/local/path/to/file.txt",
... "dr://696935d6d5a04a752419cf6d/my/new/file_copy.txt",
... )
Upload a directory recursively:
>>> fs.put(
... "/Users/username/local/path/to/directory",
... "dr://696935d6d5a04a752419cf6d/my/new/directory/",
... recursive=True,
... )
Upload all PDF files in a directory:
>>> fs.put(
... "/Users/username/local/my/documents/**/*.pdf",
... "dr://696935d6d5a04a752419cf6d/my-pdf-documents/",
... recursive=True,
... )
Upload multiple files at once:
>>> fs.put(
... ["/Users/username/local/path/to/file1.txt", "/Users/username/local/path/to/file2.txt"],
... ["dr://696935d6d5a04a752419cf6d/my/new/file1.txt", "dr://696935d6d5a04a752419cf6d/my/new/file2.txt"],
... )
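The trailing-slash rule for rpath can be sketched for the single-file case. `target_for` is a hypothetical helper written only to illustrate the rule stated in the description of put(); it does not cover recursive uploads or lists of paths.

```python
import posixpath


def target_for(lpath, rpath):
    """Resolve the remote path for one local file (single-file case only)."""
    if rpath.endswith("/"):
        # rpath is treated as a directory: the file keeps its own name inside it.
        return rpath + posixpath.basename(lpath)
    # Otherwise rpath is taken as the full target file path.
    return rpath


into_dir = target_for("/Users/username/report.pdf", "dr://696935d6d5a04a752419cf6d/docs/")
as_file = target_for("/Users/username/report.pdf", "dr://696935d6d5a04a752419cf6d/docs/q1.pdf")
```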
get_mapper(root='', missing_exceptions=None)¶
Create a key/value mutable store based on this file-system.
Creates a MutableMapping interface to the DataRobot file system at the given root path.
- Parameters:
  - root (str) – Path in the DataRobot file system to use as the root for the map.
  - missing_exceptions (Optional[Tuple[Type[Exception], ...]]) – Exceptions to convert to KeyError if raised when working with the file system.
- Returns: A key/value mutable store based on this file-system.
- Return type:
DataRobotFSMap
Examples
>>> from datarobot.fs import DataRobotFileSystem, DataRobotFSMap
>>> fs = DataRobotFileSystem()
>>> root_map = fs.get_mapper()
>>> map = fs.get_mapper("dr://696935d6d5a04a752419cf6d/")
Retrieve file contents from file system using map:
>>> map["file.txt"]
b"Hello, world!"
>>> "folder/path/file.txt" in map
True
>>> file_count = len(map)
>>> file_count
3
>>> [file for file in map]
["file.txt", "folder/path/file.txt", "another/folder/file.txt"]
>>> map.getitems(["file.txt", "folder/path/file.txt", "another/folder/file.txt"])
{
"file.txt": b"Hello, world!",
"folder/path/file.txt": b"Hello, world!",
"another/folder/file.txt": b"Hello, world!",
}
Set file contents in file system using map:
>>> map["file.txt"] = b"Hello, world!"
>>> map["folder/path/new_file.txt"] = b"This is a new file!"
>>> map.setitems({
... "another/folder/file.txt": b"Hello, world!",
... "folder/path/new_file.txt": b"This is a new file!",
... })
Delete files from file system using map:
>>> del map["file.txt"]
>>> map.delitems(["folder/path/new_file.txt", "another/folder/file.txt"])
>>> map.pop("file.txt", "default_value_if_file_does_not_exist")
b'Hello, world!'
>>> map.pop("folder/path/non_existent_file.txt", "default_value_if_file_does_not_exist")
'default_value_if_file_does_not_exist'
Clear all files under the map root. This may have unintended consequences as DataRobot file system does not support empty directories:
>>> map.clear()
>>> len(map)
0
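The role of missing_exceptions can be illustrated with a read-only mapping over an in-memory dict. `ReadOnlyMapSketch` is a hypothetical class, not DataRobotFSMap; it only shows how keys are resolved relative to the root and how the listed exception types are converted to KeyError so that the usual mapping idioms (`in`, `get`) work.

```python
from collections.abc import Mapping


class ReadOnlyMapSketch(Mapping):
    """Hypothetical mapping view over a dict of {full_path: bytes}."""

    def __init__(self, files, root="", missing_exceptions=(FileNotFoundError,)):
        self._files = files
        self._root = root.rstrip("/")
        self._missing = missing_exceptions

    def _read(self, path):
        # Stand-in for a file system read that fails with FileNotFoundError.
        if path not in self._files:
            raise FileNotFoundError(path)
        return self._files[path]

    def __getitem__(self, key):
        full = f"{self._root}/{key}" if self._root else key
        try:
            return self._read(full)
        except self._missing as exc:
            raise KeyError(key) from exc

    def __iter__(self):
        prefix = self._root + "/" if self._root else ""
        return (p[len(prefix):] for p in self._files if p.startswith(prefix))

    def __len__(self):
        return sum(1 for _ in self)


files = {
    "696935d6d5a04a752419cf6d/file.txt": b"Hello, world!",
    "696935d6d5a04a752419cf6d/folder/path/file.txt": b"Hello, world!",
}
m = ReadOnlyMapSketch(files, root="696935d6d5a04a752419cf6d/")
found = m["file.txt"]
has_missing = "missing.txt" in m
fallback = m.get("missing.txt", b"")
```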
pipe_file(path, value, mode='overwrite', **kwargs)¶
Set the bytes of a given file.
- Parameters:
  - path (str) – Path to the file to set the bytes of.
  - value (bytes) – Bytes to set the file to.
  - mode (str) – Mode to use when writing to the file. Defaults to “overwrite”. Use “create” to write only if the file does not exist.
  - kwargs (Any) – Additional keyword arguments passed to open().
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.pipe_file("dr://696935d6d5a04a752419cf6d/my/new/file.txt", b"Hello, world!")
pipe(path, value=None, **kwargs)¶
Put value into path.
Counterpart to cat().
Calls put_file().
- Parameters:
  - path (Union[str, Dict[str, bytes]]) – Path to write the value to. If a string, a single remote location to put value bytes. If a dict, a mapping of {path: bytesvalue}.
  - value (Optional[bytes]) – Value to put into the path. If using a single path, these are bytes to put there. Ignored if path is a dict.
  - kwargs (Any) – Additional keyword arguments passed to put_file().
- Raises: ValueError – If path is not a string or dict.
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.pipe("dr://696935d6d5a04a752419cf6d/my/new/file.txt", b"Hello, world!")
>>> fs.pipe({"dr://696935d6d5a04a752419cf6d/my/new/file.txt": b"Hello, world!"})
>>> fs.pipe({
... "dr://696935d6d5a04a752419cf6d/my/new/file.txt": b"Hello, world!",
... "dr://696935d6d5a04a752419cf6d/my/new/file2.txt": b"Hello, world2!",
... })
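The two accepted shapes of the path argument can be reduced to one mapping before any bytes are written. `normalize_pipe_args` is a hypothetical helper that sketches this dispatch, including the ValueError documented for any other type; it is not the library's code.

```python
def normalize_pipe_args(path, value=None):
    """Return a {path: bytes} mapping for either calling convention of pipe()."""
    if isinstance(path, str):
        return {path: value}
    if isinstance(path, dict):
        return dict(path)  # value is ignored when a mapping is given
    raise ValueError("path must be a str or a dict of {path: bytes}")


single = normalize_pipe_args("dr://696935d6d5a04a752419cf6d/my/new/file.txt", b"Hello, world!")
many = normalize_pipe_args({
    "dr://696935d6d5a04a752419cf6d/my/new/file.txt": b"Hello, world!",
    "dr://696935d6d5a04a752419cf6d/my/new/file2.txt": b"Hello, world2!",
})
```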
checksum(path)¶
Unique value for the content of a file at the given path.
If the checksum is the same from one moment to another, the contents are guaranteed to be the same. If the checksum changes, the contents might have changed.
- Parameters:
  - path (str) – Path in the DataRobot file system to get the checksum of.
- Returns: The checksum of the file at the given path.
- Return type:
int
expand_path(path, recursive=False, maxdepth=None, **kwargs)¶
Turn one or more paths (can be globs or directory paths) into a list of all matching paths to files and directories.
- Parameters:
  - path (Union[str, List[str]]) – Path or list of paths to expand.
  - recursive (bool) – Whether to search recursively when expanding paths.
  - maxdepth (Optional[int]) – Maximum depth to search when expanding paths.
  - kwargs (Any) – Additional keyword arguments passed to find() or glob(), which may in turn call ls().
- Returns: List of all matching paths.
- Return type:
List[str]
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.expand_path("dr://696935d6d5a04a752419cf6d/finance/", recursive=True, maxdepth=1)
[
'dr://696935d6d5a04a752419cf6d/finance/',
'dr://696935d6d5a04a752419cf6d/finance/budgets/',
'dr://696935d6d5a04a752419cf6d/finance/employee-list.csv',
]
Expand a glob pattern with no max depth:
>>> fs.expand_path("dr://696935d6d5a04a752419cf6d/finance/**/*.csv", recursive=True)
[
'dr://696935d6d5a04a752419cf6d/finance/employee-list.csv',
'dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.csv',
'dr://696935d6d5a04a752419cf6d/finance/budgets/archive/Q3_budget_2000.csv',
]
Expand a list of paths:
>>> fs.expand_path([
... "dr://696935d6d5a04a752419cf6d/finance/budgets/*.csv",
... "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv",
... ])
[
'dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.csv',
'dr://696935d6d5a04a752419cf6d/finance/employee-list.csv',
]
get(rpath, lpath, recursive=False, callback=, maxdepth=None, **kwargs)¶
Download file(s) from the DataRobot file system to the local file system.
Copies a specific file or tree of files (if recursive=True). If lpath ends with a “/”, it will be assumed to be a directory, and target files will go within. Can submit a list of paths, which may be glob-patterns and will be expanded.
Calls get_file() for each file.
- Parameters:
  - rpath (Union[str, List[str]]) – Path or list of paths to download from the DataRobot file system.
  - lpath (Union[str, List[str]]) – Path or list of paths to download to the local file system.
  - recursive (bool) – Whether to recursively target files to download inside directories.
  - callback (Callback) – Callback to track progress of the file transfer.
  - maxdepth (Optional[int]) – Maximum depth to recurse when targeting files to download inside directories.
  - kwargs (Any) – Additional keyword arguments passed to get_file().
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.get(
... "dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.csv",
... "/Users/username/local/path/to/download/Q2_budget_2024.csv",
... )
Download a directory recursively:
>>> fs.get(
... "dr://696935d6d5a04a752419cf6d/finance/budgets/",
... "/Users/username/local/path/to/download/budgets/",
... recursive=True,
... )
Download all PDF files in a directory:
>>> fs.get(
... "dr://696935d6d5a04a752419cf6d/finance/budgets/**/*.pdf",
... "/Users/username/local/path/to/download/budgets/",
... recursive=True,
... )
Download multiple files at once:
>>> fs.get(
... [
... "dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.csv",
... "dr://696935d6d5a04a752419cf6d/finance/employee-list.csv"
... ],
... [
... "/Users/username/local/path/to/download/Q2_budget_2024.csv",
... "/Users/username/local/path/to/download/employee-list.csv"
... ],
... )
get_file(rpath, lpath, callback=, outfile=None, **kwargs)¶
Download a single file from the DataRobot file system to the local file system.
- Parameters:
  - rpath (str) – Path to download from the DataRobot file system.
  - lpath (str) – Path to download to the local file system.
  - callback (Callback) – Callback to track progress of the file transfer.
  - outfile (Optional[IOBase]) – File-like object to write to. The user is responsible for closing it when they are done.
  - kwargs (Any) – Additional keyword arguments passed to open().
- Return type:
None
Examples
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> fs.get_file(
... "dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.csv",
... "/Users/username/local/path/to/download/Q2_budget_2024.csv",
... )
>>> from datarobot.fs import DataRobotFileSystem
>>> fs = DataRobotFileSystem()
>>> with open("/Users/username/local/path/to/download/Q2_budget_2024.csv", "wb") as f:
... fs.get_file("dr://696935d6d5a04a752419cf6d/finance/budgets/Q2_budget_2024.csv", f)
mkdir(*args, **kwargs)¶
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
makedirs(*args, **kwargs)¶
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
rmdir(*args, **kwargs)¶
Not supported as DataRobotFileSystem does not support empty directories.
- Return type:
None
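Why these three methods have nothing to do becomes clearer when directories are derived from file keys instead of being stored. The sketch below models a catalog item as a plain set of keys; `list_dir` is a hypothetical helper, not part of the library.

```python
def list_dir(keys, directory):
    """List the immediate children of a simulated directory."""
    name = directory.strip("/")
    prefix = name + "/" if name else ""
    entries = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition("/")
        # A remaining "/" means the child is itself a simulated directory.
        entries.add(prefix + head + ("/" if sep else ""))
    return sorted(entries)


keys = {"X/file.txt", "X/sub/a.txt", "top.txt"}
root_listing = list_dir(keys, "")
before = list_dir(keys, "X")
keys.discard("X/sub/a.txt")  # deleting the last file removes the directory
after = list_dir(keys, "X")
```

In this model a directory appears as soon as a key containing its name exists and disappears with its last file, so there is no separate state for mkdir(), makedirs(), or rmdir() to manage.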
modified(*args, **kwargs)¶
DataRobotFileSystem does not currently expose file modification timestamp.
- Return type:
datetime
cat_ranges(paths, starts, ends, max_gap=None, on_error='return', **kwargs)¶
Get the contents of byte ranges from one or more files
- Parameters:
  - paths (list) – A list of file paths on this filesystem.
  - starts (int or list) – Byte limits of the read. If using a single int, the same value will be used to read all the specified files.
  - ends (int or list) – Byte limits of the read. If using a single int, the same value will be used to read all the specified files.
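How a single int is applied to every path can be sketched over an in-memory dict of file contents. `cat_ranges_sketch` is a hypothetical function for illustration; it ignores max_gap and on_error and simply slices each file.

```python
def cat_ranges_sketch(files, paths, starts, ends):
    """Return the bytes in [start, end) for each path."""
    # A single int is broadcast to every path.
    if not isinstance(starts, list):
        starts = [starts] * len(paths)
    if not isinstance(ends, list):
        ends = [ends] * len(paths)
    if len(starts) != len(paths) or len(ends) != len(paths):
        raise ValueError("paths, starts and ends must have the same length")
    return [files[p][s:e] for p, s, e in zip(paths, starts, ends)]


files = {"a.txt": b"Alice, 100\n", "b.txt": b"Bob, 200\n"}
same_range = cat_ranges_sketch(files, ["a.txt", "b.txt"], 0, 5)
per_file = cat_ranges_sketch(files, ["a.txt", "b.txt"], [0, 5], [5, 8])
```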
cp(path1, path2, **kwargs)¶
Alias of AbstractFileSystem.copy.
classmethod current()¶
Return the most recently instantiated FileSystem
If no instance has been created, then create one with defaults
delete(path, recursive=False, maxdepth=None)¶
Alias of AbstractFileSystem.rm.
disk_usage(path, total=True, maxdepth=None, **kwargs)¶
Alias of AbstractFileSystem.du.
download(rpath, lpath, recursive=False, **kwargs)¶
Alias of AbstractFileSystem.get.
exists(path, **kwargs)¶
Is there a file at the given path
static from_dict(dct)¶
Recreate a filesystem instance from dictionary representation.
See .to_dict() for the expected structure of the input.
- Parameters:
  - dct (Dict[str, Any])
- Return type: file system instance, not necessarily of this particular class.
WARNING¶
This can import arbitrary modules (as determined by the cls key).
Make sure you haven’t installed any modules that may execute malicious code
at import time.
static from_json(blob)¶
Recreate a filesystem instance from JSON representation.
See .to_json() for the expected structure of the input.
- Parameters:
  - blob (str)
- Return type: file system instance, not necessarily of this particular class.
WARNING¶
This can import arbitrary modules (as determined by the cls key).
Make sure you haven’t installed any modules that may execute malicious code
at import time.
head(path, size=1024)¶
Get the first size bytes from file
isdir(path)¶
Is this entry directory-like?
isfile(path)¶
Is this entry file-like?
lexists(path, **kwargs)¶
If there is a file at the given path (including broken links)
listdir(path, detail=True, **kwargs)¶
Alias of AbstractFileSystem.ls.
makedir(path, create_parents=True, **kwargs)¶
Alias of AbstractFileSystem.mkdir.
mkdirs(path, exist_ok=False)¶
Alias of AbstractFileSystem.makedirs.
move(path1, path2, **kwargs)¶
Alias of AbstractFileSystem.mv.
read_bytes(path, start=None, end=None, **kwargs)¶
Alias of AbstractFileSystem.cat_file.
read_text(path, encoding=None, errors=None, newline=None, **kwargs)¶
Get the contents of the file as a string.
- Parameters:
  - path (str) – URL of file on this filesystem
  - encoding (same as open)
  - errors (same as open)
  - newline (same as open)
rename(path1, path2, **kwargs)¶
Alias of AbstractFileSystem.mv.
size(path)¶
Size in bytes of file
sizes(paths)¶
Size in bytes of each file in a list of paths
stat(path, **kwargs)¶
Alias of AbstractFileSystem.info.
tail(path, size=1024)¶
Get the last size bytes from file
to_dict(*, include_password=True)¶
JSON-serializable dictionary representation of this filesystem instance.
- Parameters:
  - include_password (bool, default True) – Whether to include the password (if any) in the output.
- Return type: dict[str, Any]
- Returns: Dictionary with keys cls (the Python location of this class), protocol (text name of this class's protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
WARNING¶
Serialized filesystems may contain sensitive information which have been passed to the constructor, such as passwords and tokens. Make sure you store and send them in a secure environment!
to_json(*, include_password=True)¶
JSON representation of this filesystem instance.
- Parameters:
  - include_password (bool, default True) – Whether to include the password (if any) in the output.
- Return type: str
- Returns: JSON string with keys cls (the Python location of this class), protocol (text name of this class's protocol, first one in case of multiple), args (positional args, usually empty), and all other keyword arguments as their own keys.
WARNING¶
Serialized filesystems may contain sensitive information which have been passed to the constructor, such as passwords and tokens. Make sure you store and send them in a secure environment!
ukey(path)¶
Hash of file properties, to tell if it has changed
unstrip_protocol(name)¶
Format FS-specific path to generic, including protocol
- Return type:
str
upload(lpath, rpath, recursive=False, **kwargs)¶
Alias of AbstractFileSystem.put.
walk(path, maxdepth=None, topdown=True, on_error='omit', **kwargs)¶
Return all files under the given path.
List all files, recursing into subdirectories; output is iterator-style,
like os.walk(). For a simple list of files, find() is available.
When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames; this can be used to prune the search, impose a specific order of visiting, or even to inform walk() about directories the caller creates or renames before it resumes walk() again. Modifying dirnames when topdown is False has no effect. (see os.walk)
Note that the “files” outputted will include anything that is not a directory, such as links.
- Parameters:
  - path (str) – Root to recurse into
  - maxdepth (int) – Maximum recursion depth. None means limitless, but not recommended on link-based file-systems.
  - topdown (bool (True)) – Whether to walk the directory tree from the top downwards or from the bottom upwards.
  - on_error ("omit", "raise", a callable) – if omit (default), path with exception will simply be empty; if raise, an underlying exception will be raised; if callable, it will be called with a single OSError instance as argument
  - kwargs (passed to ls)
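The in-place pruning of dirnames described above follows the os.walk() contract, so it can be tried locally with the standard library. The sketch below uses os.walk() on a temporary directory; that walk() on this file system behaves the same way is taken from the description above, not verified here, and the file names are made up for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Build a small tree: keep/a.txt, keep/deep/c.txt and skip/b.txt.
    for rel in ("keep/a.txt", "skip/b.txt", "keep/deep/c.txt"):
        full = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(full), exist_ok=True)
        open(full, "w").close()

    visited = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=True):
        # Slice assignment edits the list in place, so "skip" is never entered.
        dirnames[:] = sorted(d for d in dirnames if d != "skip")
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            visited.append(rel.replace(os.sep, "/"))
```

Rebinding the name (`dirnames = [...]`) would not prune anything, because the walker keeps a reference to the original list.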
write_bytes(path, value, **kwargs)¶
Alias of AbstractFileSystem.pipe_file.
write_text(path, value, encoding=None, errors=None, newline=None, **kwargs)¶
Write the text to the given file.
An existing file will be overwritten.
- Parameters:
  - path (str) – URL of file on this filesystem
  - value (str) – Text to write.
  - encoding (same as open)
  - errors (same as open)
  - newline (same as open)
class datarobot.fs.file_system.DataRobotFile¶
Bases: AbstractBufferedFile
File-like object for reading and writing files in the DataRobot file system.
Supports read modes ‘r’, ‘rb’ and write modes ‘w’, ‘wb’, ‘xb’. DataRobot file system buffers writes in memory only before uploading on close.
- Variables:
  - path (str) – File path in the DataRobot file system.
  - mode (str) – File mode, either ‘rb’, ‘wb’, or ‘xb’.
  - fs (DataRobotFileSystem) – The DataRobot file system instance.
  - blocksize (int) – Block size for reading files.
  - autocommit (bool) – Whether to automatically commit changes on close.
  - loc (int) – Current position in the file.
  - closed (bool) – Whether the file is closed.
  - forced (bool) – Whether the file is in forced mode.
  - offset (Optional[int]) – Content length of the file.
  - buffer (io.BytesIO) – In-memory buffer when writing.
  - overwrite_strategy – Strategy to handle file naming conflicts when writing files.
  - unpack_archive_files – Whether to unpack archive files (zip, tar, tar.gz, tgz) upon upload.
  - upload_timeout – Maximum time in seconds to wait for file upload to complete.
write(data)¶
Write data to buffer.
- Parameters:
  - data (bytes) – Data to write as bytes.
- Returns: Number of bytes written.
- Return type: int
- Raises: ValueError – If the file is not in write mode, is closed, or has been force-flushed.
flush(force=False)¶
Write the buffered data to the DataRobot file system if force is True.
Notes
Since DataRobot file system does not support multipart uploads, calling flush without force does not upload any data.
- Parameters: force (bool) – Whether to force flush and upload data. Disallows further writing to this file.
- Raises: ValueError – If the file is closed or if force flush has already been called.
- Return type:
None
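The interaction between write, flush, and close described above can be sketched with a toy class. BufferedUploadFile below is a stand-in written for illustration, not DataRobotFile itself: a plain dict plays the role of remote storage, and the class assumes a single upload that happens either on flush(force=True) or on close.

```python
import io


class BufferedUploadFile:
    """Toy stand-in (not DataRobotFile): buffer writes, upload exactly once."""

    def __init__(self, store, path):
        self.store = store            # dict acting as the remote key-value storage
        self.path = path
        self.buffer = io.BytesIO()    # in-memory write buffer
        self.forced = False
        self.closed = False

    def write(self, data):
        if self.closed or self.forced:
            raise ValueError("file is closed or has been force-flushed")
        return self.buffer.write(data)

    def flush(self, force=False):
        if self.closed:
            raise ValueError("file is closed")
        if force and self.forced:
            raise ValueError("force flush has already been called")
        if force:
            # One upload of the whole buffer; no multipart upload is modeled.
            self.store[self.path] = self.buffer.getvalue()
            self.forced = True
        # Without force nothing is uploaded.

    def close(self):
        if self.closed:
            return
        if not self.forced:
            self.flush(force=True)    # finalize pending writes
        self.closed = True


store = {}
f = BufferedUploadFile(store, "X/file.txt")
f.write(b"hello ")
f.flush()                                            # no force: nothing uploaded yet
uploaded_after_plain_flush = "X/file.txt" in store
f.write(b"world")
f.close()                                            # upload happens here
```

After close, store holds the full contents under "X/file.txt", and any further write raises ValueError.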
upload()¶
Alias of flush(force=True).
- Return type:
None
close()¶
Close file. Finalizes writes, discards cache.
- Return type:
None
property url : str¶
A signed URL for the file.
commit()¶
Move from temp to final destination
discard()¶
Throw away temporary file
fileno()¶
Returns underlying file descriptor if one exists.
OSError is raised if the IO object does not use a file descriptor.
info()¶
File information about this path
isatty()¶
Return whether this is an ‘interactive’ stream.
Return False if it can’t be determined.
read(length=-1)¶
Return data from cache, or fetch pieces as necessary
- Parameters:
length (
int (-1)) – Number of bytes to read; if <0, all remaining bytes.
readable()¶
Whether opened for reading
readinto(b)¶
Mirrors the built-in file’s readinto method; see
https://docs.python.org/3/library/io.html#io.RawIOBase.readinto
readline()¶
Read until and including the first occurrence of newline character
Note that, because of character encoding, this is not necessarily a true line ending.
readlines()¶
Return all data, split by the newline character, including the newline character
readuntil(char=b'\n', blocks=None)¶
Return data between current position and first occurrence of char
char is included in the output, except if the end of the file is encountered first.
- Parameters:
  - char (bytes) – Byte sequence to find.
  - blocks (None or int) – How much to read in each go. Defaults to the file blocksize, which may mean a new read on every call.
seek(loc, whence=0)¶
Set current file location
- Parameters:
  - loc (int) – Byte location.
  - whence ({0, 1, 2}) – Reference point: start of file, current location, or end of file, respectively.
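The three whence values follow the usual Python file convention. As a quick illustration, the sketch below uses io.BytesIO from the standard library as a stand-in for a file opened in read mode; it does not touch the DataRobot file system.

```python
import io

f = io.BytesIO(b"0123456789")

f.seek(3)                   # whence=0 (default): absolute offset from the start
from_start = f.tell()

f.seek(2, io.SEEK_CUR)      # whence=1: offset relative to the current location
from_current = f.tell()

f.seek(-4, io.SEEK_END)     # whence=2: offset relative to the end of the file
from_end = f.tell()

tail = f.read()             # reads the last four bytes
```

The positions come out as 3, 5, and 6, and tail is b"6789".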
seekable()¶
Whether the file is seekable (only in read mode)
tell()¶
Current file location
truncate()¶
Truncate file to size bytes.
File pointer is left unchanged. Size defaults to the current IO position as reported by tell(). Returns the new size.
property use_range_headers : bool¶
Whether to use range headers when reading data from file URL.
writable()¶
Whether opened for writing
writelines(lines, /)¶
Write a list of lines to stream.
Line separators are not added, so it is usual for each of the lines provided to have a line separator at the end.
property is_datarobot_url_for_read : bool¶
Whether the file URL is a DataRobot URL.
property read_client : Session¶
Session client to use for reading data from file URL. Supports unauthenticated clients for URLs outside DataRobot with embedded authentication.
class datarobot.fs.file_system.DataRobotFSMap¶
Bases: FSMap
Wrap a DataRobotFileSystem instance as a mutable mapping.
The keys of the mapping become files under the given root, and the values (which must be bytes) the contents of those files.
- Parameters:
  - root (str) – The root path in the DataRobot file system to create the mapper for.
  - fs (DataRobotFileSystem) – The DataRobot file system instance.
  - missing_exceptions (Optional[Tuple[Type[Exception],]]) – Exceptions to convert to KeyError when accessing the file system.
Examples
>>> from datarobot.fs import DataRobotFileSystem, DataRobotFSMap
>>> fs = DataRobotFileSystem()
>>> map = DataRobotFSMap("dr://696935d6d5a04a752419cf6d/", fs)
Retrieve file contents from file system using map:
>>> map["file.txt"]
b"Hello, world!"
>>> "folder/path/file.txt" in map
True
>>> file_count = len(map)
>>> file_count
3
>>> [file for file in map]
["file.txt", "folder/path/file.txt", "another/folder/file.txt"]
>>> map.getitems(["file.txt", "folder/path/file.txt", "another/folder/file.txt"])
{
"file.txt": b"Hello, world!",
"folder/path/file.txt": b"Hello, world!",
"another/folder/file.txt": b"Hello, world!",
}
Set file contents in file system using map:
>>> map["file.txt"] = b"Hello, world!"
>>> map["folder/path/new_file.txt"] = b"This is a new file!"
>>> map.setitems({
...     "another/folder/file.txt": b"Hello, world!",
...     "folder/path/new_file.txt": b"This is a new file!",
... })
Delete files from file system using map:
>>> del map["file.txt"]
>>> map.delitems(["folder/path/new_file.txt", "another/folder/file.txt"])
>>> map.pop("folder/path/file.txt", "default_value_if_file_does_not_exist")
b'Hello, world!'
>>> map.pop("folder/path/non_existent_file.txt", "default_value_if_file_does_not_exist")
'default_value_if_file_does_not_exist'
Clear all files under the map root directory. Because the DataRobot file system does not support empty directories, this may remove more directories than expected:
>>> map.clear()
>>> len(map)
0
delitems(keys)¶
Remove multiple keys from the store
property dirfs¶
dirfs instance that can be used with the same keys as the mapper
get(k[, d]) → D[k] if k in D, else d. d defaults to None.¶
getitems(keys, on_error='raise')¶
Fetch multiple items from the store
If the backend is async-able, this might proceed concurrently
- Parameters:
  - keys (list(str)) – The keys to be fetched.
  - on_error ("raise", "omit", "return") – If "raise", an underlying exception will be raised (converted to KeyError if the type is in self.missing_exceptions); if "omit", keys with an exception will simply not be included in the output; if "return", all keys are included in the output, but the value will be bytes or an exception instance.
- Return type:
dict(key, bytes | exception)
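The three on_error modes can be illustrated with a small sketch over a plain dict. getitems_sketch is a hypothetical function written for this example, not the library implementation, and it only models the behavior described in the parameter list above.

```python
def getitems_sketch(store, keys, on_error="raise", missing_exceptions=(KeyError,)):
    """Illustrative only: fetch several keys with the documented on_error modes."""
    if on_error not in ("raise", "omit", "return"):
        raise ValueError(f"unknown on_error value: {on_error!r}")
    out = {}
    for key in keys:
        try:
            out[key] = store[key]
        except Exception as exc:
            if on_error == "raise":
                if isinstance(exc, missing_exceptions):
                    # Known "missing" errors surface as KeyError.
                    raise KeyError(key) from exc
                raise
            if on_error == "return":
                out[key] = exc        # the exception instance takes the value's place
            # on_error == "omit": the failing key is simply left out
    return out


store = {"file.txt": b"Hello, world!"}
keys = ["file.txt", "missing.txt"]

omitted = getitems_sketch(store, keys, on_error="omit")
returned = getitems_sketch(store, keys, on_error="return")
```

With "omit" the result contains only "file.txt"; with "return" it contains both keys, and the value for "missing.txt" is a KeyError instance.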
items() → a set-like object providing a view on D's items¶
keys() → a set-like object providing a view on D's keys¶
pop(key, default=None)¶
Pop data
popitem() → (k, v), remove and return some (key, value) pair¶
as a 2-tuple; but raise KeyError if D is empty.
setdefault(k[, d]) → D.get(k,d), also set D[k]=d if k not in D¶
setitems(values_dict)¶
Set the values of multiple items in the store
- Parameters: values_dict (dict(str, bytes))
update([E, ]**F) → None. Update D from mapping/iterable E and F.¶
- If E is present and has a .keys() method, does: for k in E: D[k] = E[k]
- If E is present and lacks a .keys() method, does: for (k, v) in E: D[k] = v
- In either case, this is followed by: for k, v in F.items(): D[k] = v
values() → an object providing a view on D's values¶
clear()¶
Remove all keys below root. Empties out the mapping.
Notes
May delete more directories than expected because the DataRobot file system does not support empty directories.
- Return type:
None
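Why clearing a root can remove more directories than expected follows from directories being simulated from file paths. The sketch below models that with a flat dict of keys; list_dirs and clear_under are hypothetical helpers, and the model deliberately ignores catalog items, which (unlike directories) may remain empty.

```python
def list_dirs(keys):
    """Derive the simulated directories implied by a flat collection of file keys."""
    dirs = set()
    for key in keys:
        parts = key.split("/")[:-1]               # drop the file name
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]) + "/")   # every ancestor is a directory
    return dirs


def clear_under(store, root):
    """Delete every key below `root`, as clearing a mapping rooted there would."""
    for key in [k for k in store if k.startswith(root)]:
        del store[key]


store = {
    "catalog/data/raw/a.csv": b"1",
    "catalog/data/raw/b.csv": b"2",
    "catalog/readme.txt": b"hi",
}

dirs_before = list_dirs(store)
clear_under(store, "catalog/data/raw/")
dirs_after = list_dirs(store)
```

Clearing "catalog/data/raw/" also makes "catalog/data/" disappear, because no remaining file path contains it.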
Enum and Helpers¶
class datarobot.fs.file_system.FileInfo¶
Information about a file or directory in DataRobot File System.
- Variables:
- name – The path of the file or directory. Does not include the protocol prefix.
- size – The size of the file in bytes. For directories, this is 0.
- type – The type of the item, either ‘file’ or ‘directory’.
- format – The file format (e.g., ‘csv’, ‘pdf’) if the item is a file; None for directories.
- created_at – The file creation timestamp if the item is a file; None for directories.
class datarobot.enums.FilesOverwriteStrategy¶
Strategy to handle naming conflicts when writing to a path where a file already exists.
RENAME = 'rename'¶
Rename an uploaded file using “
REPLACE = 'replace'¶
Overwrite the existing file.
SKIP = 'skip'¶
Skip uploading if a file already exists at the target path.
ERROR = 'error'¶
Raise FileExistsError if a file already exists at the target path.
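How the four strategies differ when a file already exists at the target path can be sketched as a dispatch over a dict. OverwriteStrategy and put are stand-ins written for this example, not the DataRobot client API. The renaming scheme is not spelled out here, so the sketch takes a caller-supplied rename function, and the ".new" suffix used below is an arbitrary placeholder.

```python
from enum import Enum


class OverwriteStrategy(Enum):
    """Stand-in mirroring the documented values of FilesOverwriteStrategy."""
    RENAME = "rename"
    REPLACE = "replace"
    SKIP = "skip"
    ERROR = "error"


def put(store, path, data, strategy, rename=None):
    """Write `data` to `path`; return the path written, or None if skipped."""
    if path not in store:
        store[path] = data                # no conflict: strategy is irrelevant
        return path
    if strategy is OverwriteStrategy.REPLACE:
        store[path] = data                # overwrite the existing file
        return path
    if strategy is OverwriteStrategy.SKIP:
        return None                       # keep the existing file untouched
    if strategy is OverwriteStrategy.ERROR:
        raise FileExistsError(path)
    if strategy is OverwriteStrategy.RENAME:
        if rename is None:
            raise ValueError("this sketch needs a rename function for RENAME")
        new_path = rename(path)           # hypothetical naming scheme
        store[new_path] = data
        return new_path
    raise ValueError(f"unknown strategy: {strategy!r}")


store = {"a.txt": b"old"}
skipped = put(store, "a.txt", b"new", OverwriteStrategy.SKIP)
replaced = put(store, "a.txt", b"new", OverwriteStrategy.REPLACE)
renamed = put(store, "a.txt", b"copy", OverwriteStrategy.RENAME,
              rename=lambda p: p + ".new")
```

In this run SKIP leaves b"old" in place, REPLACE then overwrites it with b"new", and RENAME stores the third payload under the placeholder name "a.txt.new".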