Update roadmap

This commit is contained in:
Anton Osika
2023-07-13 08:54:40 +02:00
parent d91384c2aa
commit e0c3b8056f

# Roadmap

There are three main milestones we believe will greatly increase gpt-engineer's reliability and capability:

- [x] Continuous evaluation of our progress 🎉
- [ ] Test code and fix errors with LLMs
- [ ] Make code generation become small, verifiable steps
## Our current focus:

- [x] Continuous evaluation of our progress 🎉
  - [x] Create a step that asks “did it run/work/perfect” in the end of each run [#240](https://github.com/AntonOsika/gpt-engineer/issues/240) 🎉
  - [x] Collect a dataset for gpt-engineer to learn from, by storing code generation runs 🎉
  - [ ] Run the benchmark multiple times, and document the results for the different "step configs" [#239](https://github.com/AntonOsika/gpt-engineer/issues/239)
  - [ ] Document the best performing configs
- [ ] Self healing code
  - [ ] Run the generated tests
  - [ ] Feed the results of failing tests back into the LLM and ask it to fix the code
- [ ] Let human give feedback
  - [ ] Ask human for what is not working as expected in a loop, and feed it into GPT4 to fix the code, until the human is happy
- [ ] Improve existing projects
  - [ ] Decide on the "flow" for the CLI commands and where the project files are created
  - [ ] Add an "improve code" command
- [ ] Design how gpt-engineer becomes a platform
  - [ ] Integrate Aider
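The self-healing item above can be sketched as a plain Python loop. This is only an illustration, not gpt-engineer's actual API: `run_tests` and `ask_llm_to_fix` are hypothetical callables standing in for a test runner and an LLM call.

```python
from typing import Callable, Tuple

def self_heal(code: str,
              run_tests: Callable[[str], Tuple[bool, str]],
              ask_llm_to_fix: Callable[[str, str], str],
              max_rounds: int = 3) -> Tuple[str, bool]:
    """Run tests; on failure, feed the failure log back to the LLM and retry.

    `run_tests` returns (passed, failure_log); `ask_llm_to_fix` takes the
    current code plus the failure log and returns a patched version.
    """
    for _ in range(max_rounds):
        passed, log = run_tests(code)
        if passed:
            return code, True
        code = ask_llm_to_fix(code, log)  # prompt = code + failing-test output
    passed, _ = run_tests(code)
    return code, passed
```

The loop is bounded by `max_rounds` so a model that never converges cannot spin forever; the human-feedback item is the same loop with a person supplying the failure description instead of a test runner.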
## Experimental research

This is not the current focus, but if you are interested in experimenting, please share your intentions in Discord's #general, and your findings as you go along.

- [ ] Make code generation become small, verifiable steps
  - [ ] Ask GPT4 to decide how to sequence the entire generation, and do one prompt for each subcomponent
  - [ ] For each small part, generate tests for that subpart, and do the loop of running the tests for each part, feeding results into GPT4, and let it edit the code until they pass
- [ ] Ad hoc experiments
  - [ ] Try Microsoft guidance, and benchmark if this helps improve performance
- [ ] Dynamic planning: Let gpt-engineer plan which "steps" to carry out itself, depending on the task, by giving it few-shot examples of what are usually "the right-sized steps" to carry out for such projects
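One way to read the dynamic-planning item: instead of running a fixed step sequence, the LLM returns an ordered list of step names, which are then executed against a registry. A minimal sketch, assuming a hypothetical registry (gpt-engineer's real step configs live in `steps.py`, but these names and lambdas are made up for illustration):

```python
from typing import Callable, Dict, List, Optional

# Hypothetical step registry: each step takes and returns the run's state.
STEP_REGISTRY: Dict[str, Callable[[List[str]], List[str]]] = {
    "clarify": lambda state: state + ["clarified spec"],
    "generate_code": lambda state: state + ["wrote code"],
    "generate_tests": lambda state: state + ["wrote tests"],
    "run_tests": lambda state: state + ["ran tests"],
}

def execute_plan(plan: List[str],
                 state: Optional[List[str]] = None) -> List[str]:
    """Run the LLM-chosen step names in order, skipping unknown names."""
    state = state or []
    for name in plan:
        step = STEP_REGISTRY.get(name)
        if step is not None:  # ignore hallucinated step names
            state = step(state)
    return state
```

The `plan` list would come from a planning prompt seeded with few-shot examples of well-sized step sequences; skipping unknown names keeps a hallucinated step from crashing the run.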
## Codebase improvements

By improving the codebase and developer ergonomics we accelerate development across the board. A lot can be done; here are some examples:

- [ ] Set up automatic PR review for all PRs with e.g. Codium pr-agent
- [ ] LLM tests in CI: Run super small tests with GPT3.5 in CI, that check that simple code generation still works
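A cheap building block for the "LLM tests in CI" item: before any behavioral check, assert that the generated source is at least syntactically valid Python. This is a sketch of one possible check, not existing gpt-engineer code; `looks_like_valid_python` is a hypothetical helper name.

```python
def looks_like_valid_python(source: str) -> bool:
    """Return True if `source` parses as Python (no execution happens)."""
    try:
        # compile() parses without running the code, so it is safe in CI.
        compile(source, "<generated>", "exec")
        return True
    except SyntaxError:
        return False
```

In CI this would wrap the output of a tiny GPT3.5 generation call for a trivial prompt, catching regressions where the pipeline starts emitting markdown fences or truncated files instead of code.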
# How you can help out

You can:

- Post a "design" as a Google Doc in our Discord and ask for feedback to address one of the items in the roadmap
- Submit PRs to address one of the items in the roadmap
- Do a review of someone else's PR and propose next steps (further review, merge, close)
- Sign up to help [measure the progress of gpt-engineer towards recursively coding itself](https://forms.gle/TMX68mScyxQUsE6Y9)

Volunteer work in any of these will get acknowledged.