mirror of
https://github.com/aljazceru/Auto-GPT.git
synced 2026-01-07 08:14:25 +01:00
171 lines
5.1 KiB
Python
"""Text processing functions"""
from typing import Dict, Generator, Optional

import spacy
from selenium.webdriver.remote.webdriver import WebDriver

from autogpt.config import Config
from autogpt.llm import count_message_tokens, create_chat_completion
from autogpt.logs import logger
from autogpt.memory import get_memory

CFG = Config()


def split_text(
    text: str,
    max_length: int = CFG.browse_chunk_max_length,
    model: str = CFG.fast_llm_model,
    question: str = "",
) -> Generator[str, None, None]:
    """Split text into chunks that fit within a token budget

    The text is flattened onto one line and split into sentences with spaCy.
    Sentences are then packed greedily: a chunk is closed as soon as adding the
    next sentence would make the prompt built by `create_message` exceed
    `max_length` tokens.

    Args:
        text (str): The text to split
        max_length (int, optional): The maximum number of tokens per chunk,
            counted on the full prompt. Defaults to CFG.browse_chunk_max_length.
        model (str, optional): The model used for token counting.
            Defaults to CFG.fast_llm_model.
        question (str, optional): The question embedded in the prompt, which
            also counts toward the budget. Defaults to "".

    Yields:
        str: The next chunk of text

    Raises:
        ValueError: If a single sentence exceeds the maximum length on its own
    """
    flattened_paragraphs = " ".join(text.split("\n"))
    nlp = spacy.load(CFG.browse_spacy_language_model)
    nlp.add_pipe("sentencizer")
    doc = nlp(flattened_paragraphs)
    sentences = [sent.text.strip() for sent in doc.sents]

    current_chunk = []

    for sentence in sentences:
        message_with_additional_sentence = [
            create_message(" ".join(current_chunk) + " " + sentence, question)
        ]

        expected_token_usage = (
            count_message_tokens(messages=message_with_additional_sentence, model=model)
            + 1
        )
        if expected_token_usage <= max_length:
            current_chunk.append(sentence)
        else:
            # Close the open chunk; skip it when empty, which happens if the
            # very first sentence is already over budget.
            if current_chunk:
                yield " ".join(current_chunk)
            current_chunk = [sentence]
            message_this_sentence_only = [
                create_message(" ".join(current_chunk), question)
            ]
            expected_token_usage = (
                count_message_tokens(messages=message_this_sentence_only, model=model)
                + 1
            )
            if expected_token_usage > max_length:
                raise ValueError(
                    f"Sentence is too long in webpage: {expected_token_usage} tokens."
                )

    if current_chunk:
        yield " ".join(current_chunk)


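# Illustrative sketch, not used by this module: the greedy packing loop of
# `split_text` reduced to plain Python so it runs without spaCy or a model
# tokenizer. `_example_pack_sentences` and its `count_tokens` parameter are
# hypothetical; by default a whitespace word count stands in for real token
# counting, and the prompt wrapper from `create_message` is not counted.
# e.g. list(_example_pack_sentences(["a b c.", "d e.", "f g h i.", "j."], 5))
# yields ["a b c. d e.", "f g h i. j."].
def _example_pack_sentences(sentences, max_tokens, count_tokens=None):
    """Greedily pack sentences into chunks of at most `max_tokens` tokens."""
    # Assumption: one whitespace-separated word approximates one token.
    count = count_tokens if count_tokens is not None else (lambda s: len(s.split()))

    current = []
    for sentence in sentences:
        if count(" ".join(current + [sentence])) <= max_tokens:
            # The sentence fits: extend the open chunk.
            current.append(sentence)
            continue
        # The sentence does not fit: close the open chunk, if any, and restart.
        if current:
            yield " ".join(current)
        current = [sentence]
        if count(sentence) > max_tokens:
            # A sentence that is over budget on its own cannot be chunked.
            raise ValueError(f"Sentence is too long: {count(sentence)} tokens.")
    if current:
        yield " ".join(current)

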
def summarize_text(
    url: str, text: str, question: str, driver: Optional[WebDriver] = None
) -> str:
    """Summarize text using the OpenAI API

    The text is split into chunks with `split_text`. Each raw chunk and its
    summary are stored in memory, and the chunk summaries are then joined and
    summarized once more to produce the final answer.

    Args:
        url (str): The URL the text came from, recorded as the memory source
        text (str): The text to summarize
        question (str): The question to ask the model
        driver (WebDriver, optional): The webdriver used to scroll the page
            alongside the chunk being processed. Defaults to None.

    Returns:
        str: The summary of the text, or an error message if `text` is empty
    """
    if not text:
        return "Error: No text to summarize"

    model = CFG.fast_llm_model
    text_length = len(text)
    logger.info(f"Text length: {text_length} characters")

    summaries = []
    chunks = list(
        split_text(
            text, max_length=CFG.browse_chunk_max_length, model=model, question=question
        ),
    )
    scroll_ratio = 1 / len(chunks)

    for i, chunk in enumerate(chunks):
        if driver:
            scroll_to_percentage(driver, scroll_ratio * i)
        logger.info(f"Adding chunk {i + 1} / {len(chunks)} to memory")

        memory_to_add = f"Source: {url}\n" f"Raw content part#{i + 1}: {chunk}"

        memory = get_memory(CFG)
        memory.add(memory_to_add)

        messages = [create_message(chunk, question)]
        tokens_for_chunk = count_message_tokens(messages, model)
        logger.info(
            f"Summarizing chunk {i + 1} / {len(chunks)} of length {len(chunk)} characters, or {tokens_for_chunk} tokens"
        )

        summary = create_chat_completion(
            model=model,
            messages=messages,
        )
        summaries.append(summary)

        memory_to_add = f"Source: {url}\n" f"Content summary part#{i + 1}: {summary}"

        memory.add(memory_to_add)
        logger.info(
            f"Added chunk {i + 1} summary to memory, of length {len(summary)} characters"
        )

    logger.info(f"Summarized {len(chunks)} chunks.")

    combined_summary = "\n".join(summaries)
    messages = [create_message(combined_summary, question)]

    return create_chat_completion(
        model=model,
        messages=messages,
    )


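# Illustrative sketch, not used by this module: the control flow of
# `summarize_text` as a two-stage map-then-reduce pass over the chunks.
# `_example_map_reduce_summary` is hypothetical; `summarize(text, question)`
# stands in for `create_chat_completion` on a `create_message` prompt, and a
# plain list stands in for the memory backend returned by `get_memory`.
# The final call sees only the partial summaries, so its input is usually much
# smaller than the original text, at the cost of one extra model call.
def _example_map_reduce_summary(url, chunks, question, summarize, memory):
    """Summarize each chunk, then summarize the joined chunk summaries."""
    summaries = []
    for i, chunk in enumerate(chunks, start=1):
        # Map step: record the raw chunk, then summarize it in isolation.
        memory.append(f"Source: {url}\nRaw content part#{i}: {chunk}")
        summary = summarize(chunk, question)
        summaries.append(summary)
        memory.append(f"Source: {url}\nContent summary part#{i}: {summary}")
    # Reduce step: one final call over the newline-joined partial summaries.
    return summarize("\n".join(summaries), question)

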
def scroll_to_percentage(driver: WebDriver, ratio: float) -> None:
    """Scroll the page to a fraction of its height

    Args:
        driver (WebDriver): The webdriver to use
        ratio (float): The fraction of the page height to scroll to, from 0 to 1

    Raises:
        ValueError: If the ratio is not between 0 and 1
    """
    if ratio < 0 or ratio > 1:
        raise ValueError("Ratio should be between 0 and 1")
    driver.execute_script(f"window.scrollTo(0, document.body.scrollHeight * {ratio});")


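# Illustrative sketch, not used by this module: the vertical offsets that
# `summarize_text` scrolls to, assuming a page `page_height` pixels tall split
# into `n_chunks` chunks. Chunk i (counting from 0) is shown at ratio
# i / n_chunks, which `scroll_to_percentage` turns into
# `document.body.scrollHeight * ratio`. `_example_scroll_offsets` is a
# hypothetical helper. Note that the last chunk lands at (n - 1) / n, so the
# page is never scrolled all the way to the bottom.
def _example_scroll_offsets(page_height, n_chunks):
    """Return the scroll offset, in pixels, used for each chunk index."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    scroll_ratio = 1 / n_chunks
    return [page_height * (scroll_ratio * i) for i in range(n_chunks)]

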
def create_message(chunk: str, question: str) -> Dict[str, str]:
    """Create a message for the chat completion

    Args:
        chunk (str): The chunk of text to summarize
        question (str): The question to answer

    Returns:
        Dict[str, str]: The message to send to the chat completion
    """
    return {
        "role": "user",
        "content": f'"""{chunk}""" Using the above text, answer the following'
        f' question: "{question}" -- if the question cannot be answered using the text,'
        " summarize the text.",
    }