Merge branch 'main' of https://github.com/jina-ai/gptdeploy into feat-gpt-turbo

# Conflicts:
#	src/executor_factory.py
#	src/gpt.py
#	src/prompt_system.py
This commit is contained in:
Joschka Braun
2023-04-14 11:38:51 +02:00
17 changed files with 220 additions and 190 deletions

5
.gitignore vendored

@@ -1,3 +1,6 @@
/executor_level2/
.env
.env
config.yml
executor
data

192
README.md

@@ -27,8 +27,12 @@ Your imagination is the limit!
<a href="https://github.com/tiangolo/gptdeploy/actions?query=workflow%3ATest+event%3Apush+branch%3Amaster" target="_blank">
<img src="https://img.shields.io/badge/platform-mac%20%7C%20linux%20%7C%20windows-blue" alt="Supported platforms">
</a>
<a href="https://discord.gg/ESn8ED6Fyn" target="_blank">
<img src="https://img.shields.io/badge/chat_on-Discord-7289DA?logo=discord&logoColor=white" alt="Discord Chat">
</a>
[//]: # ([![Watch the video]&#40;https://i.imgur.com/vKb2F1B.png&#41;]&#40;https://user-images.githubusercontent.com/11627845/226220484-17810f7c-b184-4a03-9af2-3a977fbb014b.mov&#41;)
[![Watch the video](res/thumbnail.png)](https://user-images.githubusercontent.com/11627845/231530421-272a66aa-4260-4e17-ab7a-ba66adca754c.mp4)
</p>
This project streamlines the creation and deployment of microservices.
@@ -36,6 +40,10 @@ Simply describe your task using natural language, and the system will automatica
To ensure the microservice accurately aligns with your intended task, a test scenario is required.
## Quickstart
### Requirements
- An OpenAI API key with access to GPT-4
- Create an account at [cloud.jina.ai](https://cloud.jina.ai) where your microservice will be deployed
### Installation
```bash
pip install gptdeploy
@@ -47,7 +55,7 @@ We are working on a way to use gpt-3.5-turbo as well.
### Create Microservice
```bash
gptdeploy create --description "Given a PDF, return it's text" --test "https://www.africau.edu/images/default/sample.pdf"
gptdeploy create --description "Given a PDF, return its text" --test "https://www.africau.edu/images/default/sample.pdf"
```
To create your personal microservice, two things are required:
- A `description` of the task you want to accomplish.
@@ -58,6 +66,7 @@ During this time, GPT iteratively builds your microservice until it finds a stra
Once the microservice is created and deployed, you can test it using the generated Streamlit playground.
The deployment runs on Jina's infrastructure.
When creating a Jina account, you get some free credits, which you can use to deploy your microservice ($0.025/hour).
Be aware that the OpenAI costs vary between $0.50 and $3.00 per microservice.
If you run out of credits, you can purchase more.
### Delete Microservice
@@ -67,62 +76,19 @@ jc list # get the microservice id
jc delete <microservice id>
```
## Overview
The graphic below illustrates the process of creating a microservice and deploying it to the cloud, elaborating two different implementation strategies.
```mermaid
graph TB
description[description: generate QR code from URL] --> make_strat{think a}
test[test: https://www.example.com] --> make_strat[generate strategies]
make_strat --> implement1[implement strategy 1]
implement1 --> build1{build image}
build1 -->|error message| implement1
build1 -->|failed 10 times| implement2[implement strategy 2]
build1 -->|success| registry[push docker image to registry]
implement2 --> build2{build image}
build2 -->|error message| implement2
build2 -->|failed 10 times| all_failed[all strategies failed]
build2 -->|success| registry[push docker image to registry]
registry --> deploy[deploy microservice]
deploy --> streamlit[create streamlit playground]
streamlit --> user_run[user tests microservice]
```
1. GPT Deploy identifies several strategies to implement your task.
2. It tests each strategy until it finds one that works.
3. For each strategy, it creates the following files:
- executor.py: This is the main implementation of the microservice.
- test_executor.py: These are test cases to ensure the microservice works as expected.
- requirements.txt: This file lists the packages needed by the microservice and its tests.
- Dockerfile: This file is used to run the microservice in a container and also runs the tests when building the image.
4. GPT Deploy attempts to build the image. If the build fails, it uses the error message to apply a fix and tries again to build the image.
5. Once it finds a successful strategy, it:
- Pushes the Docker image to the registry.
- Deploys the microservice.
- Creates a Streamlit playground where you can test the microservice.
6. If it fails 10 times in a row, it moves on to the next approach.
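The build-and-retry flow in the steps above can be sketched in plain Python. This is a hypothetical outline, not the project's actual API: `build_image` stands in for the Docker build step, and returning `None` from it means the build succeeded.

```python
# Hypothetical sketch of GPT Deploy's strategy loop: try each strategy in turn,
# rebuilding up to 10 times and feeding the error message back into the next fix.
MAX_ATTEMPTS = 10

def try_strategies(strategies, build_image):
    for strategy in strategies:
        attempt = strategy
        for _ in range(MAX_ATTEMPTS):
            error = build_image(attempt)
            if error is None:
                # success: push the image, deploy, create the playground
                return attempt
            # use the error message to produce the next, "fixed" attempt
            attempt = f"{attempt} [fix for: {error}]"
    return None  # all strategies failed MAX_ATTEMPTS times each
```

The real implementation generates and patches files rather than strings, but the control flow (per-strategy retry budget, error fed back into the next iteration) is the same.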
## Examples
<img src="res/teaser.png" alt="QR Code Generator" width="600" />
### Animal Detector
```bash
gptdeploy create --description "Given an image, return the image with bounding boxes of all animals (https://pjreddie.com/media/files/yolov3.weights, https://raw.githubusercontent.com/pjreddie/darknet/master/cfg/yolov3.cfg)" --test "https://images.unsplash.com/photo-1444212477490-ca407925329e contains animals"
```
<img src="res/animal_detector_example.png" alt="Animal Detector" width="600" />
### Meme Generator
```bash
gptdeploy create --description "Generate a meme from an image and a caption" --test "Surprised Pikachu: https://media.wired.com/photos/5f87340d114b38fa1f8339f9/master/w_1600%2Cc_limit/Ideas_Surprised_Pikachu_HD.jpg, TOP:When you discovered GPTDeploy"
@@ -150,7 +116,7 @@ gptdeploy create --description "Given a 3d object, return vertex count and face
### Table extraction
```bash
--description "Given a URL, extract all tables as csv" --test "http://www.ins.tn/statistiques/90"
gptdeploy create --description "Given a URL, extract all tables as csv" --test "http://www.ins.tn/statistiques/90"
```
<img src="res/table_extraction_example.png" alt="Table Extraction" width="600" />
@@ -183,22 +149,22 @@ gptdeploy create --description "Generate QR code from URL" --test "https://www.e
```
<img src="res/qr_example.png" alt="QR Code Generator" width="600" />
### Mandelbrot Set Visualizer
```bash
gptdeploy create --description "Visualize the Mandelbrot set with custom parameters" --test "center=-0+1i, zoom=1.0, size=800x800, iterations=1000"
```
<img src="res/mandelbrot_example.png" alt="Mandelbrot Set Visualizer" width="600" />
[//]: # (## TO BE TESTED)
[//]: # (### ASCII Art Generator)
[//]: # (```bash)
[//]: # (gptdeploy create --description "Convert image to ASCII art" --test "https://images.unsplash.com/photo-1602738328654-51ab2ae6c4ff")
[//]: # (```)
[//]: # (### Password Strength Checker)
[//]: # (```bash)
[//]: # (gptdeploy create --description "Given a password, return a score from 1 to 10 indicating the strength of the password" --test "Pa$$w0rd")
[//]: # (gptdeploy create --description "Given a password, return a score from 1 to 10 indicating the strength of the password" --test "Pa$$w0rd => 1/5, !Akfdh%.ytRadf => 5/5")
[//]: # (```)
@@ -344,15 +310,16 @@ gptdeploy create --description "Generate QR code from URL" --test "https://www.e
[//]: # ()
[//]: # ()
[//]: # (### Mandelbrot Set Visualizer)
[//]: # (### Sound Visualizer)
[//]: # ()
[//]: # (```bash)
[//]: # (gptdeploy create --description "Visualize the Mandelbrot set with custom parameters" --test "center=-0.5+0i, zoom=1.0, size=800x800, iterations=1000")
[//]: # (gptdeploy create --description "Visualize a sound file and output the visualization as video combined with the sound" --test "some/mp3/file.mp3")
[//]: # (```)
[//]: # (## Upcoming Challenges)
[//]: # (### Chemical Structure Drawing)
@@ -381,14 +348,69 @@ gptdeploy create --description "Generate QR code from URL" --test "https://www.e
[//]: # (```)
[//]: # ()
[//]: # (### Bounding Box)
[//]: # (### ASCII Art Generator)
[//]: # (```bash)
[//]: # (gptdeploy create --description "Given an image, return the bounding boxes of all animals" --test "...")
[//]: # (gptdeploy create --description "Convert image to ASCII art" --test "https://images.unsplash.com/photo-1602738328654-51ab2ae6c4ff")
[//]: # (```)
## Technical Insights
The graphic below illustrates the process of creating a microservice and deploying it to the cloud, elaborating two different implementation strategies.
```mermaid
graph TB
description[description: generate QR code from URL] --> make_strat{think a}
test[test: https://www.example.com] --> make_strat[generate strategies]
make_strat --> implement1[implement strategy 1]
implement1 --> build1{build image}
build1 -->|error message| implement1
build1 -->|failed 10 times| implement2[implement strategy 2]
build1 -->|success| registry[push docker image to registry]
implement2 --> build2{build image}
build2 -->|error message| implement2
build2 -->|failed 10 times| all_failed[all strategies failed]
build2 -->|success| registry[push docker image to registry]
registry --> deploy[deploy microservice]
deploy --> streamlit[create streamlit playground]
streamlit --> user_run[user tests microservice]
```
1. GPT Deploy identifies several strategies to implement your task.
2. It tests each strategy until it finds one that works.
3. For each strategy, it creates the following files:
- executor.py: This is the main implementation of the microservice.
- test_executor.py: These are test cases to ensure the microservice works as expected.
- requirements.txt: This file lists the packages needed by the microservice and its tests.
- Dockerfile: This file is used to run the microservice in a container and also runs the tests when building the image.
4. GPT Deploy attempts to build the image. If the build fails, it uses the error message to apply a fix and tries again to build the image.
5. Once it finds a successful strategy, it:
- Pushes the Docker image to the registry.
- Deploys the microservice.
- Creates a Streamlit playground where you can test the microservice.
6. If it fails 10 times in a row, it moves on to the next approach.
## 🔮 Vision
Use natural language interface to create, deploy and update your microservice infrastructure.
@@ -396,34 +418,46 @@ Use natural language interface to create, deploy and update your microservice in
If you want to contribute to this project, feel free to open a PR or an issue.
In the following, you can find a list of things that need to be done.
Critical:
next steps:
- [ ] check if windows and linux support works
- [ ] support gpt3.5-turbo
- [ ] add video to README.md
- [ ] bug: it can happen that the code generation is hanging forever - in this case abort and redo the generation
- [ ] new user has free credits but should be told to verify account
Nice to have:
- [ ] smooth rendering animation of the responses
- [ ] if the user runs gptdeploy without any arguments, show the help message
- [ ] don't show this message:
🔐 You are logged in to Jina AI as florian.hoenicke (username:auth0-unified-448f11965ce142b6).
To log out, use jina auth logout.
- [ ] REST endpoint instead of gRPC since it is more popular
- [ ] put the playground into the custom gateway (without rebuilding the custom gateway)
- [ ] hide prompts in normal mode and show them in verbose mode
- [ ] tests
- [ ] clean up duplicate code
- [ ] support popular cloud providers - lambda, cloud run, cloud functions, ...
- [ ] support local docker builds
- [ ] autoscaling enabled for cost saving
- [ ] don't show this message:
🔐 You are logged in to Jina AI as florian.hoenicke (username:auth0-unified-448f11965ce142b6).
To log out, use jina auth logout.
- [ ] add more examples to README.md
- [ ] support multiple endpoints - example: todolist microservice with endpoints for adding, deleting, and listing todos
- [ ] support stateful microservices
- [ ] The playground is currently printed twice even if it did not change.
Make sure it is only printed again if it changed.
- [ ] allow to update your microservice by providing feedback
- [ ] bug: it can happen that the code generation is hanging forever - in this case abort and redo the generation
- [ ] feat: make playground more stylish by adding attributes like: clean design, beautiful, like it was made by a professional designer, ...
- [ ] support for other large language models like ChatGLM
- [ ] support for other large language models like Open Assistant
- [ ] for cost savings, it should be possible to insert less context during the code generation of the main functionality - no jina knowledge is required
- [ ] use gptdeploy list to show all deployments
- [ ] gptdeploy delete to delete a deployment
- [ ] gptdeploy update to update a deployment
- [ ] if the user runs gptdeploy without any arguments, show the help message
- [ ] start streamlit playground automatically after the deployment
- [ ] test param optional - but how would you test the pdf extractor without a pdf?
- [ ] section for microservices built by the community
- [ ] test feedback for playground generation (could be part of the debugging)
- [ ] should we send everything via json in the text attribute for simplicity?
- [ ] fix release workflow
Proposal:
- [ ] just generate the non-jina related code and insert it into an executor template
- [ ] think about strategies after the first approach failed?


@@ -1,4 +1,7 @@
click==8.1.3
streamlit==1.21.0
openai==0.27.4
psutil==5.9.4
click
streamlit
openai
psutil
jina
jcloud
jina-hubble-sdk

Binary file not shown (new file, 746 KiB)

res/discord.png (new binary file, 26 KiB)

res/mandelbrot_example.png (new binary file, 236 KiB)

res/teaser.png (new binary file, 555 KiB)

res/thumbnail.png (new binary file, 280 KiB)


@@ -7,7 +7,7 @@ def read_requirements():
setup(
name='gptdeploy',
version='0.18.11',
version='0.18.15',
description='Use natural language interface to create, deploy and update your microservice infrastructure.',
long_description=open('README.md', 'r', encoding='utf-8').read(),
long_description_content_type='text/markdown',


@@ -1,2 +1,2 @@
__version__ = '0.18.11'
__version__ = '0.18.15'
from src.cli import main


@@ -7,7 +7,7 @@ from src.constants import FILE_AND_TAG_PAIRS
from src.jina_cloud import push_executor, process_error_message
from src.prompt_tasks import general_guidelines, executor_file_task, chain_of_thought_creation, test_executor_file_task, \
chain_of_thought_optimization, requirements_file_task, docker_file_task, not_allowed
from src.utils.io import recreate_folder, persist_file
from src.utils.io import persist_file
from src.utils.string_tools import print_colored
@@ -72,10 +72,11 @@ class ExecutorFactory:
output_path,
executor_name,
package,
num_approach,
is_chain_of_thought=False,
):
EXECUTOR_FOLDER_v1 = self.get_executor_path(output_path, package, 1)
recreate_folder(EXECUTOR_FOLDER_v1)
EXECUTOR_FOLDER_v1 = self.get_executor_path(output_path, executor_name, package, num_approach, 1)
os.makedirs(EXECUTOR_FOLDER_v1)
print_colored('', '############# Executor #############', 'red')
user_query = (
@@ -175,42 +176,32 @@ Please provide the complete file with the exact same syntax to wrap the code.
playground_content = self.extract_content_from_result(playground_content_raw, 'app.py', match_single_block=True)
persist_file(playground_content, os.path.join(executor_path, 'app.py'))
def get_executor_path(self, output_path, package, version):
def get_executor_path(self, output_path, executor_name, package, num_approach, version):
package_path = '_'.join(package)
return os.path.join(output_path, package_path, f'v{version}')
return os.path.join(output_path, executor_name, f'{num_approach}_{package_path}', f'v{version}')
def debug_executor(self, output_path, package, description, test):
def debug_executor(self, output_path, executor_name, num_approach, packages, description, test):
MAX_DEBUGGING_ITERATIONS = 10
error_before = ''
# conversation = self.gpt_session.get_conversation()
for i in range(1, MAX_DEBUGGING_ITERATIONS):
print('Debugging iteration', i)
previous_executor_path = self.get_executor_path(output_path, package, i)
next_executor_path = self.get_executor_path(output_path, package, i + 1)
print('Trying to build the microservice. Might take a while...')
previous_executor_path = self.get_executor_path(output_path, executor_name, packages, num_approach, i)
next_executor_path = self.get_executor_path(output_path, executor_name, packages, num_approach, i + 1)
log_hubble = push_executor(previous_executor_path)
error = process_error_message(log_hubble)
if error:
recreate_folder(next_executor_path)
os.makedirs(next_executor_path)
file_name_to_content = self.get_all_executor_files_with_content(previous_executor_path)
is_dependency_issue = self.is_dependency_issue(error, file_name_to_content['Dockerfile'])
print(f'Current error is a dependency issue: {is_dependency_issue}')
if is_dependency_issue:
all_files_string = self.files_to_string({
key: val for key, val in file_name_to_content.items() if key in ['requirements.txt', 'Dockerfile']
})
# user_query = (
# f'I have the following files:\n{all_files_string}\n\n'
# + f'This error happens during the docker build process:\n{error}\n\n'
# + 'First, think about what kind of error is this? Look at exactly at the stack trace and then '
# "suggest how to solve it. Output the files that need change. "
# "Don't output files that don't need change. If you output a file, then write the "
# "complete file. Use the exact same syntax to wrap the code:\n"
# f"**...**\n"
# f"```...\n"
# f"...code...\n"
# f"```"
# )
user_query = (
f"Your task is to provide guidance on how to solve an error that occurred during the Docker "
f"build process. The error message is:\n{error}\nTo solve this error, you should first "
@@ -221,7 +212,6 @@ Please provide the complete file with the exact same syntax to wrap the code.
f"You are given the following files:\n\n{all_files_string}"
)
else:
# if i == 1:
all_files_string = self.files_to_string(file_name_to_content)
user_query = (
f"General rules: " + not_allowed()
@@ -229,8 +219,9 @@ Please provide the complete file with the exact same syntax to wrap the code.
+ f'\n\nHere is the test scenario the executor must pass:\n{test}'
+ f'Here are all the files I use:\n{all_files_string}'
+ f'\n\nThis error happens during the docker build process:\n{error}\n\n'
+ 'First, think about what kind of error is this? Look at exactly at the stack trace and then '
"suggest how to solve it. Output the files that need change. "
+ 'Look at exactly at the stack trace. First, think about what kind of error is this? '
'Then think about possible reasons which might have caused it. Then suggest how to '
'solve it. Output the files that need change. '
"Don't output files that don't need change. If you output a file, then write the "
"complete file. Use the exact same syntax to wrap the code:\n"
f"**...**\n"
@@ -238,9 +229,6 @@ Please provide the complete file with the exact same syntax to wrap the code.
f"...code...\n"
f"```"
)
# else:
# conversation.set_system_definition()
# user_query = f'Now this error happens during the docker build process:\n{error}'
conversation = self.gpt_session.get_conversation()
returned_files_raw = conversation.query(user_query)
@@ -251,13 +239,13 @@ Please provide the complete file with the exact same syntax to wrap the code.
for file_name, content in file_name_to_content.items():
persist_file(content, os.path.join(next_executor_path, file_name))
error_before = error_before + '\n' + error
error_before = error_before
else:
break
if i == MAX_DEBUGGING_ITERATIONS - 1:
raise self.MaxDebugTimeReachedException('Could not debug the executor.')
return self.get_executor_path(output_path, package, i)
return self.get_executor_path(output_path, executor_name, packages, num_approach, i)
class MaxDebugTimeReachedException(BaseException):
pass
@@ -329,22 +317,22 @@ package2,package3,...
generated_name = self.generate_executor_name(description)
executor_name = f'{generated_name}{random.randint(0, 1000_000)}'
packages_list = self.get_possible_packages(description, num_approaches)
recreate_folder(output_path)
for packages in packages_list:
for num_approach, packages in enumerate(packages_list):
try:
self.create_executor(description, test, output_path, executor_name, packages)
executor_path = self.debug_executor(output_path, packages, description, test)
self.create_executor(description, test, output_path, executor_name, packages, num_approach)
executor_path = self.debug_executor(output_path, executor_name, num_approach, packages, description, test)
host = jina_cloud.deploy_flow(executor_name, executor_path)
self.create_playground(executor_name, executor_path, host)
except self.MaxDebugTimeReachedException:
print('Could not debug the executor.')
print('Could not debug the Executor.')
continue
print(f'''
Executor name: {executor_name}
Executor path: {executor_path}
Host: {host}
Playground: streamlit run {os.path.join(executor_path, "app.py")}
Run the following command to start the playground:
streamlit run {os.path.join(executor_path, "app.py")}
'''
)
break


@@ -19,7 +19,7 @@ class GPTSession:
self.supported_model = 'gpt-4'
self.pricing_prompt = PRICING_GPT4_PROMPT
self.pricing_generation = PRICING_GPT4_GENERATION
elif (model == 'gpt-4' and not self.is_gpt4_available()) or model == 'gpt-3.5-turbo':
else:
if model == 'gpt-4':
print_colored('GPT-4 is not available. Using GPT-3.5-turbo instead.', 'yellow')
model = 'gpt-3.5-turbo'
@@ -31,7 +31,11 @@ class GPTSession:
def get_openai_api_key(self):
if 'OPENAI_API_KEY' not in os.environ:
raise Exception('You need to set OPENAI_API_KEY in your environment')
raise Exception('''
You need to set OPENAI_API_KEY in your environment.
If you have updated it already, please restart your terminal.
'''
)
openai.api_key = os.environ['OPENAI_API_KEY']
def is_gpt4_available(self):
@@ -42,9 +46,10 @@ class GPTSession:
model="gpt-4",
messages=[{
"role": 'system',
"content": 'test'
"content": 'you respond nothing'
}]
)
break
except RateLimitError:
sleep(1)
continue
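The snippet above probes GPT-4 availability and sleeps for a second on `RateLimitError`. The general pattern can be factored into a retry helper; `call_with_retry` and `should_retry` are hypothetical names for this sketch, not part of the project's code:

```python
import time

def call_with_retry(fn, should_retry, max_tries=5, delay=1.0):
    # retry fn while should_retry(exc) says the failure is transient
    for attempt in range(max_tries):
        try:
            return fn()
        except Exception as exc:
            if not should_retry(exc) or attempt == max_tries - 1:
                raise
            time.sleep(delay)
```

In the GPT session's case, `should_retry` would check for `openai.error.RateLimitError` specifically rather than inspecting a generic exception.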
@@ -61,7 +66,7 @@ class GPTSession:
print('Estimated costs on openai.com:')
# print('money prompt:', f'${money_prompt}')
# print('money generation:', f'${money_generation}')
print('total money so far:', f'${money_prompt + money_generation}')
print('total money spent so far:', f'${money_prompt + money_generation}')
print('\n')
def get_conversation(self, system_definition_examples: List[str] = ['executor', 'docarray', 'client']):
@@ -100,7 +105,7 @@ class _GPTConversation:
delta = chunk['choices'][0]['delta']
if 'content' in delta:
content = delta['content']
print_colored('' if complete_string else 'assistent', content, 'green', end='')
print_colored('' if complete_string else 'assistant', content, 'green', end='')
complete_string += content
return complete_string


@@ -10,6 +10,9 @@ import hubble
from hubble.executor.helper import upload_file, archive_package, get_request_header
from jcloud.flow import CloudFlow
from src.utils.io import suppress_stdout
from src.utils.string_tools import print_colored
def redirect_callback(href):
print(
@@ -23,6 +26,12 @@ def jina_auth_login():
try:
hubble.Client(jsonify=True).get_user_info(log_error=False)
except hubble.AuthenticationRequiredError:
print('You need login to Jina first to use GPTDeploy')
print_colored('', '''
If you just created an account, it can happen that the login callback is not working.
In this case, please cancel this run, rerun your gptdeploy command and login into your account again.
''', 'green'
)
hubble.login(prompt='login', redirect_callback=redirect_callback)
@@ -41,7 +50,9 @@ def push_executor(dir_path):
'verbose': 'True',
'md5sum': md5_digest,
}
req_header = get_request_header()
with suppress_stdout():
req_header = get_request_header()
resp = upload_file(
'https://api.hubble.jina.ai/v2/rpc/executor.push',
'filename',


@@ -80,8 +80,12 @@ def set_env_variable(shell, key):
with open(config_file, "a") as file:
file.write(f"\n{export_line}\n")
click.echo(
f"✅ Success, OPENAI_API_KEY has been set in {config_file}\nPlease restart your shell to apply the changes.")
click.echo(f'''
✅ Success, OPENAI_API_KEY has been set in {config_file}.
Please restart your shell to apply the changes or run:
source {config_file}
'''
)
except FileNotFoundError:
click.echo(f"Error: {config_file} not found. Please set the environment variable manually.")
@@ -93,7 +97,12 @@ def set_api_key(key):
if system_platform == "windows":
set_env_variable_command = f'setx OPENAI_API_KEY "{key}"'
subprocess.call(set_env_variable_command, shell=True)
click.echo("✅ Success, OPENAI_API_KEY has been set.\nPlease restart your Command Prompt to apply the changes.")
click.echo('''
✅ Success, OPENAI_API_KEY has been set.
Please restart your Command Prompt to apply the changes.
'''
)
elif system_platform in ["linux", "darwin"]:
if "OPENAI_API_KEY" in os.environ or is_key_set_in_config_file(key):
if not click.confirm("OPENAI_API_KEY is already set. Do you want to overwrite it?"):


@@ -7,83 +7,52 @@ Here is an example of how an executor can be defined. It always starts with a co
```python
# this executor takes binary files as input and returns the length of each binary file as output
from jina import Executor, requests, DocumentArray, Document
import json
class MyInfoExecutor(Executor):
def __init__(self, **kwargs):
super().__init__()
@requests() # each executor must have exactly this decorator without parameters
@requests() # each Executor must have exactly this decorator without parameters
def foo(self, docs: DocumentArray, **kwargs) -> DocumentArray:
for d in docs:
d.load_uri_to_blob()
d.blob = None
content = json.loads(d.text)
...
d.text = json.dumps(modified_content)
return docs
```
An executor gets a DocumentArray as input and returns a DocumentArray as output.'''
An Executor gets a DocumentArray as input and returns a DocumentArray as output.
'''
docarray_example = f'''A DocumentArray is a python class that can be seen as a list of Documents.
A Document is a python class that represents a single document.
Here is the protobuf definition of a Document:
message DocumentProto {{
// A hexdigest that represents a unique document ID
string id = 1;
oneof content {{
// the raw binary content of this document, which often represents the original document when comes into jina
bytes blob = 2;
// the ndarray of the image/audio/video document
NdArrayProto tensor = 3;
// a text document
string text = 4;
}}
// a uri of the document is a remote url starts with http or https or data URI scheme
string uri = 5;
// list of the sub-documents of this document (recursive structure)
repeated DocumentProto chunks = 6;
// the matched documents on the same level (recursive structure)
repeated DocumentProto matches = 7;
// the embedding of this document
NdArrayProto embedding = 8;
// used to store json data the executor gets and returns
string text = 1;
}}
Here is an example of how a DocumentArray can be defined:
Here are examples of how a DocumentArray can be defined:
from jina import DocumentArray, Document
import json
d1 = Document(text='hello')
d1 = Document(text=json.dumps({{'he_says': 'hello'}}))
# you can load binary data into a document
url = 'https://...'
response = requests.get(url)
obj_data = response.content
d2 = Document(blob=obj_data) # blob is bytes like b'\\x89PNG\\r\\n\\x1a\\n\
base64_data = base64.b64encode(png_data).decode('utf-8')
d2 = Document(text=json.dumps({{'image': base64_data}}))
d3 = Document(tensor=numpy.array([1, 2, 3]), chunks=[Document(uri=/local/path/to/file)]
d4 = Document(
uri='https://docs.docarray.org/img/logo.png',
)
d5 = Document()
d5.tensor = np.ones((2,4))
d5.uri = 'https://audio.com/audio.mp3'
d6 = Document()
d6.blob # like b'RIFF\\x00\\x00\\x00\\x00WAVEfmt \\x10\\x00...'
docs = DocumentArray([
d1, d2, d3, d4
])
d7 = Document()
d7.text = 'test string'
d8 = Document()
d8.text = json.dumps([{{"id": "1", "text": ["hello", 'test']}}, {{"id": "2", "text": "world"}}])
# the document has a helper function load_uri_to_blob:
# For instance, d4.load_uri_to_blob() downloads the file from d4.uri and stores it in d4.blob.
# If d4.uri was something like 'https://website.web/img.jpg', then d4.blob would be something like b'\\xff\\xd8\\xff\\xe0\\x00\\x10JFIF\\x00\\x01\\x01... '''
array = numpy.array([1, 2, 3])
array_list = array.tolist()
d3 = Document(text=json.dumps(array_list))
d4 = Document()
d4.text = '{{"uri": "https://.../logo.png"}}'
'''
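The convention the examples above encode (every payload travels as a JSON string in `Document.text`, with binary data base64-encoded first) can be checked without Jina installed; this minimal sketch uses only the standard library:

```python
import base64
import json

# a dict payload round-trips through the text field
text = json.dumps({"he_says": "hello"})
assert json.loads(text) == {"he_says": "hello"}

# binary data is base64-encoded before being wrapped in JSON
png_data = b"\x89PNG\r\n\x1a\n"
encoded = json.dumps({"image": base64.b64encode(png_data).decode("utf-8")})
assert base64.b64decode(json.loads(encoded)["image"]) == png_data
```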
client_example = f'''After the executor is deployed, it can be called via Jina Client.


@@ -9,7 +9,7 @@ def general_guidelines():
"Every file starts with comments describing what the code is doing before the first import. "
"Comments can only be written within code blocks. "
"Then all imports are listed. "
"It is important to import all modules that could be needed in the executor code. "
"It is important to import all modules that could be needed in the Executor code. "
"Always import: "
"from jina import Executor, DocumentArray, Document, requests "
"Start from top-level and then fully implement all methods. "
@@ -143,5 +143,5 @@ The executor must not access external APIs unless it is explicitly mentioned in the description.
The executor must not load data from the local file system unless it was created by the executor itself.
The executor must not use a pre-trained model unless it is explicitly mentioned in the description.
The executor must not train a model.
The executor must not use Document.tags.
The executor must not use any attribute of Document except Document.text.
'''


@@ -3,14 +3,12 @@ import shutil
import concurrent.futures
import concurrent.futures
from typing import Generator
import sys
from contextlib import contextmanager
def recreate_folder(folder_path):
if os.path.exists(folder_path) and os.path.isdir(folder_path):
shutil.rmtree(folder_path)
os.makedirs(folder_path)
def persist_file(file_content, file_name):
with open(f'{file_name}', 'w') as f:
def persist_file(file_content, file_path):
with open(file_path, 'w') as f:
f.write(file_content)
@@ -34,4 +32,14 @@ def timeout_generator_wrapper(generator, timeout):
except concurrent.futures.TimeoutError:
raise GenerationTimeoutError(f"Generation took longer than {timeout} seconds")
return wrapper()
return wrapper()
@contextmanager
def suppress_stdout():
original_stdout = sys.stdout
sys.stdout = open(os.devnull, 'w')
try:
yield
finally:
sys.stdout.close()
sys.stdout = original_stdout
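A quick usage check of `suppress_stdout` above; the context manager is reproduced here so the sketch runs standalone:

```python
import os
import sys
from contextlib import contextmanager

@contextmanager
def suppress_stdout():
    # temporarily route stdout to the null device
    original_stdout = sys.stdout
    sys.stdout = open(os.devnull, 'w')
    try:
        yield
    finally:
        sys.stdout.close()
        sys.stdout = original_stdout

before = sys.stdout
with suppress_stdout():
    print("swallowed")  # written to /dev/null, never shown
assert sys.stdout is before  # original stream restored on exit
```

This is what lets `push_executor` wrap the noisy `get_request_header()` call without its output reaching the user.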