feat(agent/workspace): Add GCS and S3 FileWorkspace providers (#6485)

* refactor: Rename FileWorkspace to LocalFileWorkspace and create FileWorkspace abstract class
  - Rename `FileWorkspace` to `LocalFileWorkspace` to provide a more descriptive name for the class that represents a file workspace that works with local files.
  - Create a new base class `FileWorkspace` to serve as the parent class for `LocalFileWorkspace`. This allows for easier extension and customization of file workspaces in the future.
  - Update import statements and references to `FileWorkspace` throughout the codebase to use the new naming conventions.

* feat: Add S3FileWorkspace + tests + test setups for CI and Docker
  - Added S3FileWorkspace class to provide an interface for interacting with a file workspace and storing files in an S3 bucket.
  - Updated pyproject.toml to include dependencies for boto3 and boto3-stubs.
  - Implemented unit tests for S3FileWorkspace.
  - Added MinIO service to Docker CI to allow testing S3 features in CI.
  - Added autogpt-test service config to docker-compose.yml for local testing with MinIO.

* ci(docker): tee test output instead of capturing

* fix: Improve error handling in S3FileWorkspace.initialize()
  - Do not tolerate all `botocore.exceptions.ClientError`s
  - Raise the exception anyway if the error is not "NoSuchBucket"

* feat: Add S3 workspace backend support and S3Credentials
  - Added support for S3 workspace backend in the Autogpt configuration
  - Added a new sub-config `S3Credentials` to store S3 credentials
  - Modified the `.env.template` file to include variables related to S3 credentials
  - Added a new `s3_credentials` attribute on the `Config` class to store S3 credentials
  - Moved the `unmasked` method from `ModelProviderCredentials` to the parent `ProviderCredentials` class to handle unmasking for S3 credentials

* fix(agent/tests): Fix S3FileWorkspace initialization in test_s3_file_workspace.py
  - Update the S3FileWorkspace initialization in the test_s3_file_workspace.py file to include the required S3 Credentials.

* refactor: Remove S3Credentials and add get_workspace function
  - Remove `S3Credentials` as boto3 will fetch the config from the environment by itself
  - Add `get_workspace` function in `autogpt.file_workspace` module
  - Update `.env.template` and tests to reflect the changes

* feat(agent/workspace): Make agent workspace backend configurable
  - Modified the `autogpt.file_workspace.get_workspace` function to take either a workspace `id` or a `root_path`.
  - Modified `FileWorkspaceMixin` to use the `get_workspace` function to set up the workspace.
  - Updated the type hints and imports accordingly.

* feat(agent/workspace): Add GCSFileWorkspace for Google Cloud Storage
  - Added support for Google Cloud Storage as a storage backend option in the workspace.
  - Created the `GCSFileWorkspace` class to interface with a file workspace stored in a Google Cloud Storage bucket.
  - Implemented the `GCSFileWorkspaceConfiguration` class to handle the configuration for Google Cloud Storage workspaces.
  - Updated the `get_workspace` function to include the option to use Google Cloud Storage as a workspace backend.
  - Added unit tests for the new `GCSFileWorkspace` class.

* fix: Unbreak use of non-local workspaces in AgentProtocolServer
  - Modify the `_get_task_agent_file_workspace` method to handle both local and non-local workspaces correctly
This commit is contained in:
Reinier van der Leer
2023-12-07 14:46:08 +01:00
committed by GitHub
parent fdd7f8e5f9
commit 1f40d72081
20 changed files with 1426 additions and 153 deletions

View File

@@ -83,6 +83,15 @@ jobs:
matrix: matrix:
python-version: ["3.10"] python-version: ["3.10"]
services:
minio:
image: minio/minio:edge-cicd
ports:
- 9000:9000
options: >
--health-interval=10s --health-timeout=5s --health-retries=3
--health-cmd="curl -f http://localhost:9000/minio/health/live"
steps: steps:
- name: Checkout repository - name: Checkout repository
uses: actions/checkout@v3 uses: actions/checkout@v3
@@ -154,8 +163,11 @@ jobs:
tests/unit tests/integration tests/unit tests/integration
env: env:
CI: true CI: true
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
PLAIN_OUTPUT: True PLAIN_OUTPUT: True
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
S3_ENDPOINT_URL: http://localhost:9000
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
- name: Upload coverage reports to Codecov - name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v3 uses: codecov/codecov-action@v3

View File

@@ -89,6 +89,15 @@ jobs:
test: test:
runs-on: ubuntu-latest runs-on: ubuntu-latest
timeout-minutes: 10 timeout-minutes: 10
services:
minio:
image: minio/minio:edge-cicd
options: >
--name=minio
--health-interval=10s --health-timeout=5s --health-retries=3
--health-cmd="curl -f http://localhost:9000/minio/health/live"
steps: steps:
- name: Check out repository - name: Check out repository
uses: actions/checkout@v3 uses: actions/checkout@v3
@@ -124,23 +133,25 @@ jobs:
CI: true CI: true
PLAIN_OUTPUT: True PLAIN_OUTPUT: True
OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }} OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
S3_ENDPOINT_URL: http://minio:9000
AWS_ACCESS_KEY_ID: minioadmin
AWS_SECRET_ACCESS_KEY: minioadmin
run: | run: |
set +e set +e
test_output=$(
docker run --env CI --env OPENAI_API_KEY \ docker run --env CI --env OPENAI_API_KEY \
--network container:minio \
--env S3_ENDPOINT_URL --env AWS_ACCESS_KEY_ID --env AWS_SECRET_ACCESS_KEY \
--entrypoint poetry ${{ env.IMAGE_NAME }} run \ --entrypoint poetry ${{ env.IMAGE_NAME }} run \
pytest -v --cov=autogpt --cov-branch --cov-report term-missing \ pytest -v --cov=autogpt --cov-branch --cov-report term-missing \
--numprocesses=4 --durations=10 \ --numprocesses=4 --durations=10 \
tests/unit tests/integration 2>&1 tests/unit tests/integration 2>&1 | tee test_output.txt
)
test_failure=$?
echo "$test_output" test_failure=${PIPESTATUS[0]}
cat << $EOF >> $GITHUB_STEP_SUMMARY cat << $EOF >> $GITHUB_STEP_SUMMARY
# Tests $([ $test_failure = 0 ] && echo '✅' || echo '❌') # Tests $([ $test_failure = 0 ] && echo '✅' || echo '❌')
\`\`\` \`\`\`
$test_output $(cat test_output.txt)
\`\`\` \`\`\`
$EOF $EOF

View File

@@ -8,9 +8,32 @@ OPENAI_API_KEY=your-openai-api-key
## EXECUTE_LOCAL_COMMANDS - Allow local command execution (Default: False) ## EXECUTE_LOCAL_COMMANDS - Allow local command execution (Default: False)
# EXECUTE_LOCAL_COMMANDS=False # EXECUTE_LOCAL_COMMANDS=False
### Workspace ###
## RESTRICT_TO_WORKSPACE - Restrict file operations to workspace ./data/agents/<agent_id>/workspace (Default: True) ## RESTRICT_TO_WORKSPACE - Restrict file operations to workspace ./data/agents/<agent_id>/workspace (Default: True)
# RESTRICT_TO_WORKSPACE=True # RESTRICT_TO_WORKSPACE=True
## DISABLED_COMMAND_CATEGORIES - The list of categories of commands that are disabled (Default: None)
# DISABLED_COMMAND_CATEGORIES=
## WORKSPACE_BACKEND - Choose a storage backend for workspace contents
## Options: local, gcs, s3
# WORKSPACE_BACKEND=local
## WORKSPACE_STORAGE_BUCKET - GCS/S3 Bucket to store workspace contents in
# WORKSPACE_STORAGE_BUCKET=autogpt
## GCS Credentials
# see https://cloud.google.com/storage/docs/authentication#libauth
## AWS/S3 Credentials
# see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
## S3_ENDPOINT_URL - If you're using non-AWS S3, set your endpoint here.
# S3_ENDPOINT_URL=
### Miscellaneous ###
## USER_AGENT - Define the user-agent used by the requests library to browse website (string) ## USER_AGENT - Define the user-agent used by the requests library to browse website (string)
# USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36" # USER_AGENT="Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"
@@ -29,12 +52,6 @@ OPENAI_API_KEY=your-openai-api-key
## EXIT_KEY - Key to exit AutoGPT ## EXIT_KEY - Key to exit AutoGPT
# EXIT_KEY=n # EXIT_KEY=n
## PLAIN_OUTPUT - Plain output, which disables the spinner (Default: False)
# PLAIN_OUTPUT=False
## DISABLED_COMMAND_CATEGORIES - The list of categories of commands that are disabled (Default: None)
# DISABLED_COMMAND_CATEGORIES=
################################################################################ ################################################################################
### LLM PROVIDER ### LLM PROVIDER
################################################################################ ################################################################################
@@ -201,5 +218,5 @@ OPENAI_API_KEY=your-openai-api-key
## Note: Log file output is disabled if LOG_FORMAT=structured_google_cloud. ## Note: Log file output is disabled if LOG_FORMAT=structured_google_cloud.
# LOG_FILE_FORMAT=simple # LOG_FILE_FORMAT=simple
## PLAIN_OUTPUT - Disables animated typing in the console output. ## PLAIN_OUTPUT - Disables animated typing and the spinner in the console output. (Default: False)
# PLAIN_OUTPUT=False # PLAIN_OUTPUT=False

View File

@@ -5,11 +5,15 @@ from typing import TYPE_CHECKING
if TYPE_CHECKING: if TYPE_CHECKING:
from pathlib import Path from pathlib import Path
from ..base import BaseAgent from ..base import BaseAgent, Config
from autogpt.file_workspace import FileWorkspace from autogpt.file_workspace import (
FileWorkspace,
FileWorkspaceBackendName,
get_workspace,
)
from ..base import AgentFileManager, BaseAgentConfiguration from ..base import AgentFileManager, BaseAgentSettings
class FileWorkspaceMixin: class FileWorkspaceMixin:
@@ -22,32 +26,36 @@ class FileWorkspaceMixin:
# Initialize other bases first, because we need the config from BaseAgent # Initialize other bases first, because we need the config from BaseAgent
super(FileWorkspaceMixin, self).__init__(**kwargs) super(FileWorkspaceMixin, self).__init__(**kwargs)
config: BaseAgentConfiguration = getattr(self, "config")
if not isinstance(config, BaseAgentConfiguration):
raise ValueError(
"Cannot initialize Workspace for Agent without compatible .config"
)
file_manager: AgentFileManager = getattr(self, "file_manager") file_manager: AgentFileManager = getattr(self, "file_manager")
if not file_manager: if not file_manager:
return return
self.workspace = _setup_workspace(file_manager, config) self._setup_workspace()
def attach_fs(self, agent_dir: Path): def attach_fs(self, agent_dir: Path):
res = super(FileWorkspaceMixin, self).attach_fs(agent_dir) res = super(FileWorkspaceMixin, self).attach_fs(agent_dir)
self.workspace = _setup_workspace(self.file_manager, self.config) self._setup_workspace()
return res return res
def _setup_workspace(self) -> None:
settings: BaseAgentSettings = getattr(self, "state")
assert settings.agent_id, "Cannot attach workspace to anonymous agent"
app_config: Config = getattr(self, "legacy_config")
file_manager: AgentFileManager = getattr(self, "file_manager")
def _setup_workspace(file_manager: AgentFileManager, config: BaseAgentConfiguration): ws_backend = app_config.workspace_backend
workspace = FileWorkspace( local = ws_backend == FileWorkspaceBackendName.LOCAL
file_manager.root / "workspace", workspace = get_workspace(
restrict_to_root=not config.allow_fs_access, backend=ws_backend,
id=settings.agent_id if not local else "",
root_path=file_manager.root / "workspace" if local else None,
) )
if local and settings.config.allow_fs_access:
workspace._restrict_to_root = False # type: ignore
workspace.initialize() workspace.initialize()
return workspace self.workspace = workspace
def get_agent_workspace(agent: BaseAgent) -> FileWorkspace | None: def get_agent_workspace(agent: BaseAgent) -> FileWorkspace | None:

View File

@@ -33,7 +33,11 @@ from autogpt.commands.system import finish
from autogpt.commands.user_interaction import ask_user from autogpt.commands.user_interaction import ask_user
from autogpt.config import Config from autogpt.config import Config
from autogpt.core.resource.model_providers import ChatModelProvider from autogpt.core.resource.model_providers import ChatModelProvider
from autogpt.file_workspace import FileWorkspace from autogpt.file_workspace import (
FileWorkspace,
FileWorkspaceBackendName,
get_workspace,
)
from autogpt.models.action_history import ActionErrorResult, ActionSuccessResult from autogpt.models.action_history import ActionErrorResult, ActionSuccessResult
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
@@ -340,7 +344,7 @@ class AgentProtocolServer:
else: else:
file_path = os.path.join(relative_path, file_name) file_path = os.path.join(relative_path, file_name)
workspace = get_task_agent_file_workspace(task_id, self.agent_manager) workspace = self._get_task_agent_file_workspace(task_id, self.agent_manager)
await workspace.write_file(file_path, data) await workspace.write_file(file_path, data)
artifact = await self.db.create_artifact( artifact = await self.db.create_artifact(
@@ -361,7 +365,7 @@ class AgentProtocolServer:
file_path = os.path.join(artifact.relative_path, artifact.file_name) file_path = os.path.join(artifact.relative_path, artifact.file_name)
else: else:
file_path = artifact.relative_path file_path = artifact.relative_path
workspace = get_task_agent_file_workspace(task_id, self.agent_manager) workspace = self._get_task_agent_file_workspace(task_id, self.agent_manager)
retrieved_artifact = workspace.read_file(file_path, binary=True) retrieved_artifact = workspace.read_file(file_path, binary=True)
except NotFoundError: except NotFoundError:
raise raise
@@ -376,24 +380,33 @@ class AgentProtocolServer:
}, },
) )
def _get_task_agent_file_workspace(
self,
task_id: str | int,
agent_manager: AgentManager,
) -> FileWorkspace:
use_local_ws = (
self.app_config.workspace_backend == FileWorkspaceBackendName.LOCAL
)
agent_id = task_agent_id(task_id)
workspace = get_workspace(
backend=self.app_config.workspace_backend,
id=agent_id if not use_local_ws else "",
root_path=agent_manager.get_agent_dir(
agent_id=agent_id,
must_exist=True,
)
/ "workspace"
if use_local_ws
else None,
)
workspace.initialize()
return workspace
def task_agent_id(task_id: str | int) -> str: def task_agent_id(task_id: str | int) -> str:
return f"AutoGPT-{task_id}" return f"AutoGPT-{task_id}"
def get_task_agent_file_workspace(
task_id: str | int,
agent_manager: AgentManager,
) -> FileWorkspace:
return FileWorkspace(
root=agent_manager.get_agent_dir(
agent_id=task_agent_id(task_id),
must_exist=True,
)
/ "workspace",
restrict_to_root=True,
)
def fmt_kwargs(kwargs: dict) -> str: def fmt_kwargs(kwargs: dict) -> str:
return ", ".join(f"{n}={repr(v)}" for n, v in kwargs.items()) return ", ".join(f"{n}={repr(v)}" for n, v in kwargs.items())

View File

@@ -20,6 +20,7 @@ from autogpt.core.resource.model_providers.openai import (
OPEN_AI_CHAT_MODELS, OPEN_AI_CHAT_MODELS,
OpenAICredentials, OpenAICredentials,
) )
from autogpt.file_workspace import FileWorkspaceBackendName
from autogpt.logs.config import LoggingConfig from autogpt.logs.config import LoggingConfig
from autogpt.plugins.plugins_config import PluginsConfig from autogpt.plugins.plugins_config import PluginsConfig
from autogpt.speech import TTSConfig from autogpt.speech import TTSConfig
@@ -51,10 +52,19 @@ class Config(SystemSettings, arbitrary_types_allowed=True):
chat_messages_enabled: bool = UserConfigurable( chat_messages_enabled: bool = UserConfigurable(
default=True, from_env=lambda: os.getenv("CHAT_MESSAGES_ENABLED") == "True" default=True, from_env=lambda: os.getenv("CHAT_MESSAGES_ENABLED") == "True"
) )
# TTS configuration # TTS configuration
tts_config: TTSConfig = TTSConfig() tts_config: TTSConfig = TTSConfig()
logging: LoggingConfig = LoggingConfig() logging: LoggingConfig = LoggingConfig()
# Workspace
workspace_backend: FileWorkspaceBackendName = UserConfigurable(
default=FileWorkspaceBackendName.LOCAL,
from_env=lambda: FileWorkspaceBackendName(v)
if (v := os.getenv("WORKSPACE_BACKEND"))
else None,
)
########################## ##########################
# Agent Control Settings # # Agent Control Settings #
########################## ##########################

View File

@@ -172,24 +172,10 @@ class ModelProviderCredentials(ProviderCredentials):
api_version: SecretStr | None = UserConfigurable(default=None) api_version: SecretStr | None = UserConfigurable(default=None)
deployment_id: SecretStr | None = UserConfigurable(default=None) deployment_id: SecretStr | None = UserConfigurable(default=None)
def unmasked(self) -> dict:
return unmask(self)
class Config: class Config:
extra = "ignore" extra = "ignore"
def unmask(model: BaseModel):
unmasked_fields = {}
for field_name, field in model.__fields__.items():
value = getattr(model, field_name)
if isinstance(value, SecretStr):
unmasked_fields[field_name] = value.get_secret_value()
else:
unmasked_fields[field_name] = value
return unmasked_fields
class ModelProviderUsage(ProviderUsage): class ModelProviderUsage(ProviderUsage):
"""Usage for a particular model from a model provider.""" """Usage for a particular model from a model provider."""

View File

@@ -1,7 +1,7 @@
import abc import abc
import enum import enum
from pydantic import SecretBytes, SecretField, SecretStr from pydantic import BaseModel, SecretBytes, SecretField, SecretStr
from autogpt.core.configuration import ( from autogpt.core.configuration import (
SystemConfiguration, SystemConfiguration,
@@ -39,6 +39,9 @@ class ProviderBudget(SystemConfiguration):
class ProviderCredentials(SystemConfiguration): class ProviderCredentials(SystemConfiguration):
"""Struct for credentials.""" """Struct for credentials."""
def unmasked(self) -> dict:
return unmask(self)
class Config: class Config:
json_encoders = { json_encoders = {
SecretStr: lambda v: v.get_secret_value() if v else None, SecretStr: lambda v: v.get_secret_value() if v else None,
@@ -47,6 +50,17 @@ class ProviderCredentials(SystemConfiguration):
} }
def unmask(model: BaseModel):
unmasked_fields = {}
for field_name, _ in model.__fields__.items():
value = getattr(model, field_name)
if isinstance(value, SecretStr):
unmasked_fields[field_name] = value.get_secret_value()
else:
unmasked_fields[field_name] = value
return unmasked_fields
class ProviderSettings(SystemSettings): class ProviderSettings(SystemSettings):
resource_type: ResourceType resource_type: ResourceType
credentials: ProviderCredentials | None = None credentials: ProviderCredentials | None = None

View File

@@ -1,5 +1,46 @@
from .file_workspace import FileWorkspace import enum
from pathlib import Path
from typing import Optional
from .base import FileWorkspace
class FileWorkspaceBackendName(str, enum.Enum):
LOCAL = "local"
GCS = "gcs"
S3 = "s3"
def get_workspace(
backend: FileWorkspaceBackendName, *, id: str = "", root_path: Optional[Path] = None
) -> FileWorkspace:
assert bool(root_path) != bool(id), "Specify root_path or id to get workspace"
if root_path is None:
root_path = Path(f"workspaces/{id}")
match backend:
case FileWorkspaceBackendName.LOCAL:
from .local import FileWorkspaceConfiguration, LocalFileWorkspace
config = FileWorkspaceConfiguration.from_env()
config.root = root_path
return LocalFileWorkspace(config)
case FileWorkspaceBackendName.S3:
from .s3 import S3FileWorkspace, S3FileWorkspaceConfiguration
config = S3FileWorkspaceConfiguration.from_env()
config.root = root_path
return S3FileWorkspace(config)
case FileWorkspaceBackendName.GCS:
from .gcs import GCSFileWorkspace, GCSFileWorkspaceConfiguration
config = GCSFileWorkspaceConfiguration.from_env()
config.root = root_path
return GCSFileWorkspace(config)
__all__ = [ __all__ = [
"FileWorkspace", "FileWorkspace",
"FileWorkspaceBackendName",
"get_workspace",
] ]

View File

@@ -3,18 +3,23 @@ The FileWorkspace class provides an interface for interacting with a file worksp
""" """
from __future__ import annotations from __future__ import annotations
import inspect
import logging import logging
from abc import ABC, abstractmethod
from pathlib import Path from pathlib import Path
from typing import Any, Callable, Optional from typing import Any, Callable, Literal, Optional, overload
from autogpt.core.configuration.schema import SystemConfiguration
logger = logging.getLogger(__name__) logger = logging.getLogger(__name__)
class FileWorkspace: class FileWorkspaceConfiguration(SystemConfiguration):
"""A class that represents a file workspace.""" restrict_to_root: bool = True
root: Path = Path("/")
NULL_BYTES = ["\0", "\000", "\x00", "\u0000"]
class FileWorkspace(ABC):
"""A class that represents a file workspace."""
on_write_file: Callable[[Path], Any] | None = None on_write_file: Callable[[Path], Any] | None = None
""" """
@@ -24,22 +29,55 @@ class FileWorkspace:
Path: The path of the file that was written, relative to the workspace root. Path: The path of the file that was written, relative to the workspace root.
""" """
def __init__(self, root: str | Path, restrict_to_root: bool):
self._root = self._sanitize_path(root)
self._restrict_to_root = restrict_to_root
@property @property
@abstractmethod
def root(self) -> Path: def root(self) -> Path:
"""The root directory of the file workspace.""" """The root path of the file workspace."""
return self._root
@property @property
def restrict_to_root(self): @abstractmethod
"""Whether to restrict generated paths to the root.""" def restrict_to_root(self) -> bool:
return self._restrict_to_root """Whether to restrict file access to within the workspace's root path."""
@abstractmethod
def initialize(self) -> None: def initialize(self) -> None:
self.root.mkdir(exist_ok=True, parents=True) """
Calling `initialize()` should bring the workspace to a ready-to-use state.
For example, it can create the resource in which files will be stored, if it
doesn't exist yet. E.g. a folder on disk, or an S3 Bucket.
"""
@abstractmethod
def open_file(self, path: str | Path, mode: str = "r"):
"""Open a file in the workspace."""
@overload
@abstractmethod
def read_file(self, path: str | Path, binary: Literal[False] = False) -> str:
"""Read a file in the workspace as text."""
...
@overload
@abstractmethod
def read_file(self, path: str | Path, binary: Literal[True] = True) -> bytes:
"""Read a file in the workspace as binary."""
...
@abstractmethod
def read_file(self, path: str | Path, binary: bool = False) -> str | bytes:
"""Read a file in the workspace."""
@abstractmethod
async def write_file(self, path: str | Path, content: str | bytes) -> None:
"""Write to a file in the workspace."""
@abstractmethod
def list_files(self, path: str | Path = ".") -> list[Path]:
"""List all files in a directory in the workspace."""
@abstractmethod
def delete_file(self, path: str | Path) -> None:
"""Delete a file in the workspace."""
def get_path(self, relative_path: str | Path) -> Path: def get_path(self, relative_path: str | Path) -> Path:
"""Get the full path for an item in the workspace. """Get the full path for an item in the workspace.
@@ -50,44 +88,7 @@ class FileWorkspace:
Returns: Returns:
Path: The resolved path relative to the workspace. Path: The resolved path relative to the workspace.
""" """
return self._sanitize_path( return self._sanitize_path(relative_path, self.root)
relative_path,
root=self.root,
restrict_to_root=self.restrict_to_root,
)
def open_file(self, path: str | Path, mode: str = "r"):
"""Open a file in the workspace."""
full_path = self.get_path(path)
return open(full_path, mode)
def read_file(self, path: str | Path, binary: bool = False):
"""Read a file in the workspace."""
with self.open_file(path, "rb" if binary else "r") as file:
return file.read()
async def write_file(self, path: str | Path, content: str | bytes):
"""Write to a file in the workspace."""
with self.open_file(path, "wb" if type(content) is bytes else "w") as file:
file.write(content)
if self.on_write_file:
path = Path(path)
if path.is_absolute():
path = path.relative_to(self.root)
res = self.on_write_file(path)
if inspect.isawaitable(res):
await res
def list_files(self, path: str | Path = "."):
"""List all files in a directory in the workspace."""
full_path = self.get_path(path)
return [str(file) for file in full_path.glob("*") if file.is_file()]
def delete_file(self, path: str | Path):
"""Delete a file in the workspace."""
full_path = self.get_path(path)
full_path.unlink()
@staticmethod @staticmethod
def _sanitize_path( def _sanitize_path(
@@ -113,8 +114,7 @@ class FileWorkspace:
# Posix systems disallow null bytes in paths. Windows is agnostic about it. # Posix systems disallow null bytes in paths. Windows is agnostic about it.
# Do an explicit check here for all sorts of null byte representations. # Do an explicit check here for all sorts of null byte representations.
for null_byte in FileWorkspace.NULL_BYTES: if "\0" in str(relative_path) or "\0" in str(root):
if null_byte in str(relative_path) or null_byte in str(root):
raise ValueError("embedded null byte") raise ValueError("embedded null byte")
if root is None: if root is None:

View File

@@ -0,0 +1,91 @@
"""
The GCSWorkspace class provides an interface for interacting with a file workspace, and
stores the files in a Google Cloud Storage bucket.
"""
from __future__ import annotations
import inspect
import logging
from pathlib import Path
from google.cloud import storage
from autogpt.core.configuration.schema import UserConfigurable
from .base import FileWorkspace, FileWorkspaceConfiguration
logger = logging.getLogger(__name__)
class GCSFileWorkspaceConfiguration(FileWorkspaceConfiguration):
bucket: str = UserConfigurable("autogpt", from_env="WORKSPACE_STORAGE_BUCKET")
class GCSFileWorkspace(FileWorkspace):
"""A class that represents a Google Cloud Storage workspace."""
_bucket: storage.Bucket
def __init__(self, config: GCSFileWorkspaceConfiguration):
self._bucket_name = config.bucket
self._root = config.root
self._gcs = storage.Client()
super().__init__()
@property
def root(self) -> Path:
"""The root directory of the file workspace."""
return self._root
@property
def restrict_to_root(self):
"""Whether to restrict generated paths to the root."""
return True
def initialize(self) -> None:
self._bucket = self._gcs.get_bucket(self._bucket_name)
def get_path(self, relative_path: str | Path) -> Path:
return super().get_path(relative_path).relative_to(Path("/"))
def open_file(self, path: str | Path, mode: str = "r"):
"""Open a file in the workspace."""
path = self.get_path(path)
blob = self._bucket.blob(str(path))
return blob
def read_file(self, path: str | Path, binary: bool = False) -> str | bytes:
"""Read a file in the workspace."""
blob = self.open_file(path, "r")
file_content = (
blob.download_as_text() if not binary else blob.download_as_bytes()
)
return file_content
async def write_file(self, path: str | Path, content: str | bytes):
"""Write to a file in the workspace."""
blob = self.open_file(path, "w")
blob.upload_from_string(content) if isinstance(
content, str
) else blob.upload_from_file(content)
if self.on_write_file:
path = Path(path)
if path.is_absolute():
path = path.relative_to(self.root)
res = self.on_write_file(path)
if inspect.isawaitable(res):
await res
def list_files(self, path: str | Path = ".") -> list[Path]:
"""List all files in a directory in the workspace."""
path = self.get_path(path)
blobs = self._bucket.list_blobs(prefix=str(path))
return [Path(blob.name) for blob in blobs if not blob.name.endswith("/")]
def delete_file(self, path: str | Path) -> None:
"""Delete a file in the workspace."""
path = self.get_path(path)
blob = self._bucket.blob(str(path))
blob.delete()

View File

@@ -0,0 +1,67 @@
"""
The LocalFileWorkspace class implements a FileWorkspace that works with local files.
"""
from __future__ import annotations
import inspect
import logging
from pathlib import Path
from .base import FileWorkspace, FileWorkspaceConfiguration
logger = logging.getLogger(__name__)
class LocalFileWorkspace(FileWorkspace):
"""A class that represents a file workspace."""
def __init__(self, config: FileWorkspaceConfiguration):
self._root = self._sanitize_path(config.root)
self._restrict_to_root = config.restrict_to_root
super().__init__()
@property
def root(self) -> Path:
"""The root directory of the file workspace."""
return self._root
@property
def restrict_to_root(self):
"""Whether to restrict generated paths to the root."""
return self._restrict_to_root
def initialize(self) -> None:
self.root.mkdir(exist_ok=True, parents=True)
def open_file(self, path: str | Path, mode: str = "r"):
"""Open a file in the workspace."""
full_path = self.get_path(path)
return open(full_path, mode)
def read_file(self, path: str | Path, binary: bool = False):
"""Read a file in the workspace."""
with self.open_file(path, "rb" if binary else "r") as file:
return file.read()
async def write_file(self, path: str | Path, content: str | bytes):
"""Write to a file in the workspace."""
with self.open_file(path, "wb" if type(content) is bytes else "w") as file:
file.write(content)
if self.on_write_file:
path = Path(path)
if path.is_absolute():
path = path.relative_to(self.root)
res = self.on_write_file(path)
if inspect.isawaitable(res):
await res
def list_files(self, path: str | Path = "."):
"""List all files in a directory in the workspace."""
full_path = self.get_path(path)
return [str(file) for file in full_path.glob("*") if file.is_file()]
def delete_file(self, path: str | Path):
"""Delete a file in the workspace."""
full_path = self.get_path(path)
full_path.unlink()

View File

@@ -0,0 +1,122 @@
"""
The S3Workspace class provides an interface for interacting with a file workspace, and
stores the files in an S3 bucket.
"""
from __future__ import annotations
import contextlib
import inspect
import logging
import os
from pathlib import Path
from typing import TYPE_CHECKING, Optional
import boto3
import botocore.exceptions
from pydantic import SecretStr
from autogpt.core.configuration.schema import UserConfigurable
from .base import FileWorkspace, FileWorkspaceConfiguration
if TYPE_CHECKING:
import mypy_boto3_s3
logger = logging.getLogger(__name__)
class S3FileWorkspaceConfiguration(FileWorkspaceConfiguration):
bucket: str = UserConfigurable("autogpt", from_env="WORKSPACE_STORAGE_BUCKET")
s3_endpoint_url: Optional[SecretStr] = UserConfigurable(
from_env=lambda: SecretStr(v) if (v := os.getenv("S3_ENDPOINT_URL")) else None
)
class S3FileWorkspace(FileWorkspace):
    """A FileWorkspace that stores files as objects in an S3 bucket.

    Credentials, region and other client settings are resolved by boto3 from
    the environment; see
    https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html
    """

    # Set by initialize(); not available before that.
    _bucket: mypy_boto3_s3.service_resource.Bucket

    def __init__(self, config: S3FileWorkspaceConfiguration):
        self._bucket_name = config.bucket
        self._root = config.root
        self._s3 = boto3.resource(
            "s3",
            endpoint_url=config.s3_endpoint_url.get_secret_value()
            if config.s3_endpoint_url
            else None,
        )
        super().__init__()

    @property
    def root(self) -> Path:
        """The root directory of the file workspace."""
        return self._root

    @property
    def restrict_to_root(self):
        """Whether to restrict generated paths to the root."""
        return True

    def initialize(self) -> None:
        """Ensure the configured bucket exists, creating it if necessary.

        Raises:
            botocore.exceptions.ClientError: for any error other than a
                missing bucket (e.g. invalid credentials, access denied).
        """
        try:
            self._s3.meta.client.head_bucket(Bucket=self._bucket_name)
            self._bucket = self._s3.Bucket(self._bucket_name)
        except botocore.exceptions.ClientError as e:
            # Inspect the structured error code instead of substring-matching
            # "(404)" in str(e), which is brittle across botocore versions
            # and error message formats.
            error_code = e.response.get("Error", {}).get("Code", "")
            if error_code not in ("404", "NoSuchBucket"):
                raise
            self._bucket = self._s3.create_bucket(Bucket=self._bucket_name)

    def get_path(self, relative_path: str | Path) -> Path:
        # The sanitized path from the base class is absolute (rooted at "/");
        # strip the leading "/" to obtain a valid S3 object key.
        return super().get_path(relative_path).relative_to(Path("/"))

    def open_file(self, path: str | Path, mode: str = "r"):
        """Get an S3 Object handle for `path` in the workspace.

        `mode` is accepted for interface compatibility but has no effect here.
        """
        path = self.get_path(path)
        obj = self._bucket.Object(str(path))
        # Eagerly load object metadata when it exists; a missing object is
        # tolerated so callers can still create it via put().
        with contextlib.suppress(botocore.exceptions.ClientError):
            obj.load()
        return obj

    def read_file(self, path: str | Path, binary: bool = False) -> str | bytes:
        """Read a file in the workspace; decoded as UTF-8 unless `binary`."""
        file_content = self.open_file(path, "r").get()["Body"].read()
        return file_content if binary else file_content.decode()

    async def write_file(self, path: str | Path, content: str | bytes):
        """Write `content` to a file in the workspace and fire on_write_file."""
        obj = self.open_file(path, "w")
        obj.put(Body=content)

        if self.on_write_file:
            path = Path(path)
            if path.is_absolute():
                path = path.relative_to(self.root)
            res = self.on_write_file(path)
            if inspect.isawaitable(res):
                await res

    def list_files(self, path: str | Path = ".") -> list[Path]:
        """List all files (skipping "directory" placeholder keys) under `path`."""
        path = self.get_path(path)
        if path == Path("."):  # workspace root: no prefix filter needed
            return [
                Path(obj.key)
                for obj in self._bucket.objects.all()
                if not obj.key.endswith("/")
            ]
        # Terminate the prefix with "/" so listing "foo" does not also match
        # keys under e.g. "foobar/".
        return [
            Path(obj.key)
            for obj in self._bucket.objects.filter(Prefix=f"{path}/")
            if not obj.key.endswith("/")
        ]

    def delete_file(self, path: str | Path) -> None:
        """Delete a file in the workspace."""
        path = self.get_path(path)
        obj = self._s3.Object(self._bucket_name, str(path))
        obj.delete()

View File

@@ -16,3 +16,34 @@ services:
- ./docker-compose.yml:/app/docker-compose.yml:ro - ./docker-compose.yml:/app/docker-compose.yml:ro
- ./Dockerfile:/app/Dockerfile:ro - ./Dockerfile:/app/Dockerfile:ro
profiles: ["exclude-from-up"] profiles: ["exclude-from-up"]
# Only for TESTING purposes. Run with: docker compose run --build --rm autogpt-test
autogpt-test:
build: ./
env_file:
- .env
environment:
S3_ENDPOINT_URL: http://minio:9000
AWS_ACCESS_KEY_ID: minio
AWS_SECRET_ACCESS_KEY: minio123
entrypoint: ["poetry", "run"]
command: ["pytest", "-v"]
volumes:
- ./autogpt:/app/autogpt
- ./tests:/app/tests
depends_on:
- minio
profiles: ["exclude-from-up"]
minio:
image: minio/minio
environment:
MINIO_ACCESS_KEY: minio
MINIO_SECRET_KEY: minio123
ports:
- 9000:9000
volumes:
- minio-data:/data
command: server /data
profiles: ["exclude-from-up"]
volumes:
minio-data:

File diff suppressed because one or more lines are too long

View File

@@ -25,6 +25,7 @@ python = "^3.10"
# autogpt-forge = { path = "../forge" } # autogpt-forge = { path = "../forge" }
autogpt-forge = {git = "https://github.com/Significant-Gravitas/AutoGPT.git", subdirectory = "autogpts/forge"} autogpt-forge = {git = "https://github.com/Significant-Gravitas/AutoGPT.git", subdirectory = "autogpts/forge"}
beautifulsoup4 = "^4.12.2" beautifulsoup4 = "^4.12.2"
boto3 = "^1.33.6"
charset-normalizer = "^3.1.0" charset-normalizer = "^3.1.0"
click = "*" click = "*"
colorama = "^0.4.6" colorama = "^0.4.6"
@@ -68,6 +69,7 @@ openapi-python-client = "^0.14.0"
# agbenchmark = { path = "../../benchmark", optional = true } # agbenchmark = { path = "../../benchmark", optional = true }
agbenchmark = {git = "https://github.com/Significant-Gravitas/AutoGPT.git", subdirectory = "benchmark", optional = true} agbenchmark = {git = "https://github.com/Significant-Gravitas/AutoGPT.git", subdirectory = "benchmark", optional = true}
google-cloud-logging = "^3.8.0" google-cloud-logging = "^3.8.0"
google-cloud-storage = "^2.13.0"
[tool.poetry.extras] [tool.poetry.extras]
benchmark = ["agbenchmark"] benchmark = ["agbenchmark"]
@@ -75,6 +77,7 @@ benchmark = ["agbenchmark"]
[tool.poetry.group.dev.dependencies] [tool.poetry.group.dev.dependencies]
auto-gpt-plugin-template = {git = "https://github.com/Significant-Gravitas/Auto-GPT-Plugin-Template", rev = "0.1.0"} auto-gpt-plugin-template = {git = "https://github.com/Significant-Gravitas/Auto-GPT-Plugin-Template", rev = "0.1.0"}
black = "*" black = "*"
boto3-stubs = {extras = ["s3"], version = "^1.33.6"}
flake8 = "*" flake8 = "*"
gitpython = "^3.1.32" gitpython = "^3.1.32"
isort = "*" isort = "*"

View File

@@ -11,7 +11,11 @@ from autogpt.agents.agent import Agent, AgentConfiguration, AgentSettings
from autogpt.app.main import _configure_openai_provider from autogpt.app.main import _configure_openai_provider
from autogpt.config import AIProfile, Config, ConfigBuilder from autogpt.config import AIProfile, Config, ConfigBuilder
from autogpt.core.resource.model_providers import ChatModelProvider, OpenAIProvider from autogpt.core.resource.model_providers import ChatModelProvider, OpenAIProvider
from autogpt.file_workspace import FileWorkspace from autogpt.file_workspace.local import (
FileWorkspace,
FileWorkspaceConfiguration,
LocalFileWorkspace,
)
from autogpt.llm.api_manager import ApiManager from autogpt.llm.api_manager import ApiManager
from autogpt.logs.config import configure_logging from autogpt.logs.config import configure_logging
from autogpt.models.command_registry import CommandRegistry from autogpt.models.command_registry import CommandRegistry
@@ -47,7 +51,7 @@ def workspace_root(agent_data_dir: Path) -> Path:
@pytest.fixture() @pytest.fixture()
def workspace(workspace_root: Path) -> FileWorkspace: def workspace(workspace_root: Path) -> FileWorkspace:
workspace = FileWorkspace(workspace_root, restrict_to_root=True) workspace = LocalFileWorkspace(FileWorkspaceConfiguration(root=workspace_root))
workspace.initialize() workspace.initialize()
return workspace return workspace

View File

@@ -0,0 +1,108 @@
import os
import uuid
from pathlib import Path
import pytest
import pytest_asyncio
from google.cloud.exceptions import NotFound
from autogpt.file_workspace.gcs import GCSFileWorkspace, GCSFileWorkspaceConfiguration
# These tests require real GCS credentials; skip the entire module otherwise.
if not os.getenv("GOOGLE_APPLICATION_CREDENTIALS"):
    pytest.skip("GOOGLE_APPLICATION_CREDENTIALS are not set", allow_module_level=True)
@pytest.fixture
def gcs_bucket_name() -> str:
    """Produce a unique bucket name so parallel/repeated runs don't collide."""
    suffix = uuid.uuid4().hex[:8]
    return f"test-bucket-{suffix}"
@pytest.fixture
def gcs_workspace_uninitialized(gcs_bucket_name: str) -> GCSFileWorkspace:
    """Yield a GCSFileWorkspace configured for `gcs_bucket_name` without
    creating the bucket.

    WORKSPACE_STORAGE_BUCKET is set only for the duration of the fixture and
    restored afterwards, so a pre-existing value is not silently lost.
    """
    previous = os.environ.get("WORKSPACE_STORAGE_BUCKET")
    os.environ["WORKSPACE_STORAGE_BUCKET"] = gcs_bucket_name
    try:
        ws_config = GCSFileWorkspaceConfiguration.from_env()
        workspace = GCSFileWorkspace(ws_config)
        yield workspace  # type: ignore
    finally:
        if previous is None:
            del os.environ["WORKSPACE_STORAGE_BUCKET"]
        else:
            os.environ["WORKSPACE_STORAGE_BUCKET"] = previous
def test_initialize(
    gcs_bucket_name: str, gcs_workspace_uninitialized: GCSFileWorkspace
):
    """initialize() should create the bucket when it does not exist yet."""
    bucket = gcs_workspace_uninitialized._bucket

    # Before initialization, looking anything up in the bucket must fail.
    with pytest.raises(NotFound):
        bucket.get_blob(gcs_bucket_name)

    gcs_workspace_uninitialized.initialize()

    # After initialization the bucket exists, so the lookup no longer raises.
    bucket.get_blob(gcs_bucket_name)
def test_workspace_bucket_name(
    gcs_workspace: GCSFileWorkspace,
    gcs_bucket_name: str,
):
    """The initialized workspace should point at the expected bucket."""
    assert gcs_bucket_name == gcs_workspace._bucket.name
@pytest.fixture
def gcs_workspace(gcs_workspace_uninitialized: GCSFileWorkspace) -> GCSFileWorkspace:
    """Yield an initialized workspace; tear the test bucket down afterwards."""
    workspace = gcs_workspace_uninitialized
    workspace.initialize()
    yield workspace  # type: ignore

    # Teardown: remove every blob, then delete the (now empty) bucket.
    bucket = workspace._bucket
    bucket.delete_blobs(bucket.list_blobs())
    bucket.delete()
# Fixture data: (file name, content) pairs covering both str and Path names,
# including a nested path.
TEST_FILES: list[tuple[str | Path, str]] = [
    ("existing_test_file_1", "test content 1"),
    ("existing_test_file_2.txt", "test content 2"),
    (Path("existing_test_file_3"), "test content 3"),
    (Path("existing/test/file/4"), "test content 4"),
]
@pytest_asyncio.fixture
async def gcs_workspace_with_files(gcs_workspace: GCSFileWorkspace) -> GCSFileWorkspace:
    """Yield a workspace pre-populated with every entry in TEST_FILES."""
    for name, content in TEST_FILES:
        blob = gcs_workspace._bucket.blob(str(name))
        blob.upload_from_string(content)
    yield gcs_workspace  # type: ignore
@pytest.mark.asyncio
async def test_read_file(gcs_workspace_with_files: GCSFileWorkspace):
    """read_file() returns stored content and raises NotFound for missing files."""
    for name, expected in TEST_FILES:
        assert gcs_workspace_with_files.read_file(name) == expected

    with pytest.raises(NotFound):
        gcs_workspace_with_files.read_file("non_existent_file")
def test_list_files(gcs_workspace_with_files: GCSFileWorkspace):
    """list_files() returns the paths of all uploaded files, exactly once each."""
    listed = gcs_workspace_with_files.list_files()
    expected = {Path(name) for name, _ in TEST_FILES}
    assert set(listed) == expected
@pytest.mark.asyncio
async def test_write_read_file(gcs_workspace: GCSFileWorkspace):
    """A file written via write_file() can be read back unchanged."""
    await gcs_workspace.write_file("test_file", "test_content")
    assert "test_content" == gcs_workspace.read_file("test_file")
@pytest.mark.asyncio
async def test_overwrite_file(gcs_workspace_with_files: GCSFileWorkspace):
    """write_file() on an existing file replaces its content."""
    for name, _ in TEST_FILES:
        await gcs_workspace_with_files.write_file(name, "new content")
        assert gcs_workspace_with_files.read_file(name) == "new content"
def test_delete_file(gcs_workspace_with_files: GCSFileWorkspace):
    """After delete_file(), reading the deleted file raises NotFound."""
    for name, _ in TEST_FILES:
        gcs_workspace_with_files.delete_file(name)
        with pytest.raises(NotFound):
            gcs_workspace_with_files.read_file(name)

View File

@@ -1,9 +1,8 @@
import itertools
from pathlib import Path from pathlib import Path
import pytest import pytest
from autogpt.file_workspace import FileWorkspace from autogpt.file_workspace.local import FileWorkspaceConfiguration, LocalFileWorkspace
_WORKSPACE_ROOT = Path("home/users/monty/auto_gpt_workspace") _WORKSPACE_ROOT = Path("home/users/monty/auto_gpt_workspace")
@@ -31,17 +30,11 @@ _INACCESSIBLE_PATHS = (
Path("test_folder/../../not_auto_gpt_workspace/test_file.txt"), Path("test_folder/../../not_auto_gpt_workspace/test_file.txt"),
] ]
+ [ + [
# Contains null bytes # Contains null byte
Path(template.format(null_byte=null_byte)) Path("\0"),
for template, null_byte in itertools.product( Path("\0test_file.txt"),
[ Path("test_folder/\0"),
"{null_byte}", Path("test_folder/\0test_file.txt"),
"{null_byte}test_file.txt",
"test_folder/{null_byte}",
"test_folder/{null_byte}test_file.txt",
],
FileWorkspace.NULL_BYTES,
)
] ]
+ [ + [
# Absolute paths # Absolute paths
@@ -68,7 +61,7 @@ def inaccessible_path(request):
def test_sanitize_path_accessible(accessible_path, workspace_root): def test_sanitize_path_accessible(accessible_path, workspace_root):
full_path = FileWorkspace._sanitize_path( full_path = LocalFileWorkspace._sanitize_path(
accessible_path, accessible_path,
root=workspace_root, root=workspace_root,
restrict_to_root=True, restrict_to_root=True,
@@ -79,7 +72,7 @@ def test_sanitize_path_accessible(accessible_path, workspace_root):
def test_sanitize_path_inaccessible(inaccessible_path, workspace_root): def test_sanitize_path_inaccessible(inaccessible_path, workspace_root):
with pytest.raises(ValueError): with pytest.raises(ValueError):
FileWorkspace._sanitize_path( LocalFileWorkspace._sanitize_path(
inaccessible_path, inaccessible_path,
root=workspace_root, root=workspace_root,
restrict_to_root=True, restrict_to_root=True,
@@ -87,13 +80,13 @@ def test_sanitize_path_inaccessible(inaccessible_path, workspace_root):
def test_get_path_accessible(accessible_path, workspace_root): def test_get_path_accessible(accessible_path, workspace_root):
workspace = FileWorkspace(workspace_root, True) workspace = LocalFileWorkspace(FileWorkspaceConfiguration(root=workspace_root))
full_path = workspace.get_path(accessible_path) full_path = workspace.get_path(accessible_path)
assert full_path.is_absolute() assert full_path.is_absolute()
assert full_path.is_relative_to(workspace_root) assert full_path.is_relative_to(workspace_root)
def test_get_path_inaccessible(inaccessible_path, workspace_root): def test_get_path_inaccessible(inaccessible_path, workspace_root):
workspace = FileWorkspace(workspace_root, True) workspace = LocalFileWorkspace(FileWorkspaceConfiguration(root=workspace_root))
with pytest.raises(ValueError): with pytest.raises(ValueError):
workspace.get_path(inaccessible_path) workspace.get_path(inaccessible_path)

View File

@@ -0,0 +1,106 @@
import os
import uuid
from pathlib import Path
import pytest
import pytest_asyncio
from botocore.exceptions import ClientError
from autogpt.file_workspace.s3 import S3FileWorkspace, S3FileWorkspaceConfiguration
# These tests need either a custom S3 endpoint (e.g. MinIO) or real AWS
# credentials; otherwise skip the entire module.
if not os.getenv("S3_ENDPOINT_URL") and not os.getenv("AWS_ACCESS_KEY_ID"):
    pytest.skip("S3 environment variables are not set", allow_module_level=True)
@pytest.fixture
def s3_bucket_name() -> str:
    """Produce a unique bucket name so parallel/repeated runs don't collide."""
    suffix = uuid.uuid4().hex[:8]
    return f"test-bucket-{suffix}"
@pytest.fixture
def s3_workspace_uninitialized(s3_bucket_name: str) -> S3FileWorkspace:
    """Yield an S3FileWorkspace configured for `s3_bucket_name` without
    creating the bucket.

    WORKSPACE_STORAGE_BUCKET is set only for the duration of the fixture and
    restored afterwards, so a pre-existing value is not silently lost.
    """
    previous = os.environ.get("WORKSPACE_STORAGE_BUCKET")
    os.environ["WORKSPACE_STORAGE_BUCKET"] = s3_bucket_name
    try:
        ws_config = S3FileWorkspaceConfiguration.from_env()
        workspace = S3FileWorkspace(ws_config)
        yield workspace  # type: ignore
    finally:
        if previous is None:
            del os.environ["WORKSPACE_STORAGE_BUCKET"]
        else:
            os.environ["WORKSPACE_STORAGE_BUCKET"] = previous
def test_initialize(s3_bucket_name: str, s3_workspace_uninitialized: S3FileWorkspace):
    """initialize() should create the bucket when it does not exist yet."""
    s3 = s3_workspace_uninitialized._s3

    # Before initialization, a HEAD on the bucket must fail.
    with pytest.raises(ClientError):
        s3.meta.client.head_bucket(Bucket=s3_bucket_name)

    s3_workspace_uninitialized.initialize()

    # After initialization the bucket exists, so head_bucket succeeds.
    s3.meta.client.head_bucket(Bucket=s3_bucket_name)
def test_workspace_bucket_name(
    s3_workspace: S3FileWorkspace,
    s3_bucket_name: str,
):
    """The initialized workspace should point at the expected bucket."""
    assert s3_bucket_name == s3_workspace._bucket.name
@pytest.fixture
def s3_workspace(s3_workspace_uninitialized: S3FileWorkspace) -> S3FileWorkspace:
    """Yield an initialized workspace; tear the test bucket down afterwards."""
    workspace = s3_workspace_uninitialized
    workspace.initialize()
    yield workspace  # type: ignore

    # Teardown: delete every object, then the (now empty) bucket itself.
    bucket = workspace._bucket
    bucket.objects.all().delete()
    bucket.delete()
# Fixture data: (file name, content) pairs covering both str and Path names,
# including a nested path.
TEST_FILES: list[tuple[str | Path, str]] = [
    ("existing_test_file_1", "test content 1"),
    ("existing_test_file_2.txt", "test content 2"),
    (Path("existing_test_file_3"), "test content 3"),
    (Path("existing/test/file/4"), "test content 4"),
]
@pytest_asyncio.fixture
async def s3_workspace_with_files(s3_workspace: S3FileWorkspace) -> S3FileWorkspace:
    """Yield a workspace pre-populated with every entry in TEST_FILES."""
    for name, content in TEST_FILES:
        obj = s3_workspace._bucket.Object(str(name))
        obj.put(Body=content)
    yield s3_workspace  # type: ignore
@pytest.mark.asyncio
async def test_read_file(s3_workspace_with_files: S3FileWorkspace):
    """read_file() returns stored content and raises ClientError for missing files."""
    for name, expected in TEST_FILES:
        assert s3_workspace_with_files.read_file(name) == expected

    with pytest.raises(ClientError):
        s3_workspace_with_files.read_file("non_existent_file")
def test_list_files(s3_workspace_with_files: S3FileWorkspace):
    """list_files() returns the paths of all uploaded files, exactly once each."""
    listed = s3_workspace_with_files.list_files()
    expected = {Path(name) for name, _ in TEST_FILES}
    assert set(listed) == expected
@pytest.mark.asyncio
async def test_write_read_file(s3_workspace: S3FileWorkspace):
    """A file written via write_file() can be read back unchanged."""
    await s3_workspace.write_file("test_file", "test_content")
    assert "test_content" == s3_workspace.read_file("test_file")
@pytest.mark.asyncio
async def test_overwrite_file(s3_workspace_with_files: S3FileWorkspace):
    """write_file() on an existing file replaces its content."""
    for name, _ in TEST_FILES:
        await s3_workspace_with_files.write_file(name, "new content")
        assert s3_workspace_with_files.read_file(name) == "new content"
def test_delete_file(s3_workspace_with_files: S3FileWorkspace):
    """After delete_file(), reading the deleted file raises ClientError."""
    for name, _ in TEST_FILES:
        s3_workspace_with_files.delete_file(name)
        with pytest.raises(ClientError):
            s3_workspace_with_files.read_file(name)