Mirror of https://github.com/aljazceru/gpt-engineer.git, synced 2026-02-20 20:45:52 +01:00
Update roadmap
ROADMAP.md (32 lines changed)
@@ -1,16 +1,28 @@
# Capability improvement roadmap
- [ ] Continuous capability measurements
- [ ] Create a step that asks “did it run/work/perfect” at the end [#240](https://github.com/AntonOsika/gpt-engineer/issues/240)
- [ ] Run the benchmark repeatedly and document the results for the different "step configs" (`STEPS` in `steps.py`) [#239](https://github.com/AntonOsika/gpt-engineer/issues/239)
- [ ] Document the best performing configs, and feed this into our roadmap
- [ ] Collect a dataset for gpt engineer to learn from, by storing code generation runs and whether they fail or succeed, on an opt-out basis

# Roadmap

We are building AGI by first creating the code generation tooling of the future.

There are three main milestones we think can improve gpt-engineer's capability 2x:

- Continuous evaluation of our progress
- Turn code generation into small, verifiable steps
- Run tests and fix errors with GPT4

## Steps to achieve our roadmap

- [ ] Continuous evaluation of our progress
  - [ ] Create a step that asks “did it run/work/perfect” at the end of each run [#240](https://github.com/AntonOsika/gpt-engineer/issues/240) (a review-step sketch follows this list)
  - [ ] Run the benchmark multiple times, and document the results for the different "step configs" (`STEPS` in `steps.py`) [#239](https://github.com/AntonOsika/gpt-engineer/issues/239) (see the benchmark harness sketch after this list)
  - [ ] Document the best performing configs, and feed these learnings into our roadmap
  - [ ] Collect a dataset for gpt engineer to learn from, by storing code generation runs and whether they fail or succeed (on an opt-out basis)
- [ ] Self-healing code
  - [ ] Feed the results of failing tests back into GPT4 and ask it to fix the code (see the self-healing sketch below)
- [ ] Let the human give feedback
  - [ ] Ask the human what is not working as expected, in a loop, and feed it into GPT4 to fix the code until the human is happy or gives up (see the feedback-loop sketch below)
- [ ] Turn code generation into small, verifiable steps
  - [ ] Ask GPT4 to decide how to sequence the entire generation, and do one prompt for each subcomponent (see the decomposition sketch below)
  - [ ] For each small part, generate tests for that subpart, then loop: run the tests, feed the results into GPT4, and let it edit the code until the tests pass
- [ ] LLM tests in CI
  - [ ] Run very small tests with GPT3.5 in CI, to make sure we don't worsen performance (see the CI smoke-test sketch below)
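
To make the review-step item concrete, here is a minimal sketch of what a "did it run/work/perfect" step could look like. The function name, the exact questions, and the `review.json` location are illustrative assumptions, not gpt-engineer's actual API.

```python
# Hypothetical review step for issue #240: ask the user how the run
# went and persist the answers next to the generated code.
import json
from pathlib import Path


def collect_review(run_dir: Path) -> dict:
    review = {
        "ran": input("Did the generated code run? (y/n) ") == "y",
        "works": input("Did it do what you asked? (y/n) ") == "y",
        "perfect": input("Was it perfect, with no edits needed? (y/n) ") == "y",
        "comments": input("Anything else we should know? "),
    }
    # Store alongside the run so reviews can later be aggregated
    # into the opt-out learning dataset mentioned above.
    (run_dir / "review.json").write_text(json.dumps(review, indent=2))
    return review
```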
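
For the benchmark item, a harness along these lines could rerun the suite per step config and record the mean and spread, since generations vary between runs. The config names and `run_benchmark` are placeholders, not the real `STEPS` mapping in `steps.py`.

```python
# Hypothetical benchmark harness for issue #239.
import json
import statistics


def run_benchmark(config: str) -> float:
    """Run the benchmark suite with the given step config and return
    the fraction of tasks that succeed."""
    return 0.0  # dummy value; wire this to the real benchmark runner


STEP_CONFIGS = ["default", "benchmark", "tdd"]  # assumed config names
N_RUNS = 5  # average over several runs: LLM output is stochastic

results = {}
for config in STEP_CONFIGS:
    scores = [run_benchmark(config) for _ in range(N_RUNS)]
    results[config] = {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
    }

print(json.dumps(results, indent=2))  # paste into the results doc
```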
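
The self-healing item is essentially a run-tests/patch loop. A sketch, assuming pytest as the test runner and an `ask_llm` callable that wraps a GPT4 chat call and returns the patched file:

```python
# Hypothetical self-healing loop: run the tests, and while they fail,
# hand the failure output back to the model and apply its fix.
import subprocess
from pathlib import Path


def self_heal(code_path: Path, ask_llm, max_attempts: int = 3) -> bool:
    for _ in range(max_attempts):
        proc = subprocess.run(
            ["pytest", "--tb=short"], capture_output=True, text=True
        )
        if proc.returncode == 0:
            return True  # all tests pass, nothing left to heal
        prompt = (
            "These tests failed:\n" + proc.stdout + proc.stderr +
            "\nFix the code so the tests pass. Return the full file:\n" +
            code_path.read_text()
        )
        code_path.write_text(ask_llm(prompt))
    return False  # give up after max_attempts
```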
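
The human-feedback loop has the same shape, with the human's complaint as the error signal instead of test output. Again, `ask_llm` is an assumed GPT4 wrapper:

```python
# Hypothetical human-in-the-loop repair: keep asking what is wrong
# and feeding it to the model until the human is happy or gives up.
def human_feedback_loop(code: str, ask_llm, max_rounds: int = 10) -> str:
    for _ in range(max_rounds):
        feedback = input("What is not working as expected? (enter = done) ")
        if not feedback:
            break  # the human is happy, or gave up
        code = ask_llm(
            f"The user reports: {feedback}\n"
            f"Current code:\n{code}\n"
            "Return the fixed code."
        )
    return code
```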
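
For small, verifiable steps, the decomposition could look like the sketch below: one planning prompt, then one generation prompt per subcomponent, each followed by the test/repair loop from the self-healing sketch. The prompt wording and the JSON plan format are assumptions.

```python
# Hypothetical decomposition: GPT4 plans the build order, then each
# subcomponent is generated and tested on its own.
import json


def generate_in_small_steps(spec: str, ask_llm) -> dict:
    # 1. Let the model decide how to sequence the entire generation.
    plan = json.loads(ask_llm(
        f"Spec:\n{spec}\n"
        "List the subcomponents to build, in dependency order, "
        "as a JSON array of names. Return only the JSON."
    ))
    parts = {}
    for name in plan:
        # 2. One prompt per subcomponent, with the work so far as context.
        code = ask_llm(
            f"Spec:\n{spec}\nAlready built: {list(parts)}\n"
            f"Write the `{name}` component."
        )
        # 3. Generate tests for just this subpart; the run/repair loop
        #    would go here (see the self-healing sketch above).
        tests = ask_llm(f"Write pytest tests for `{name}`:\n{code}")
        parts[name] = {"code": code, "tests": tests}
    return parts
```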
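
Finally, a CI smoke test could be a single cheap GPT3.5 call with one crude assertion, small enough to run on every PR. This sketch uses the pre-1.0 `openai` client that was current when this roadmap was written; the prompt and assertion are placeholders.

```python
# Hypothetical LLM smoke test for CI: one tiny prompt, one assertion.
import os

import openai  # pip install "openai<1.0"


def test_tiny_generation_smoke():
    openai.api_key = os.environ["OPENAI_API_KEY"]
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Write a Python function add(a, b) that returns a + b.",
        }],
        temperature=0,
    )
    text = resp.choices[0].message.content
    assert "def add" in text  # crude, but catches total regressions
```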