Files
gpt-engineer/benchmark/RESULTS.md
Philipp Kohler e53d6f467f fix typos
2023-06-23 10:17:52 +00:00

2.5 KiB

Benchmarks

python scripts/benchmark.py

2023-06-21

Benchmark Ran Works Perfect
currency_converter
image_resizer
pomodoro_timer
url_shortener
file_explorer
markdown_editor
timer_app
weather_app
file_organizer
password_generator
todo_list

Notes on the errors

Most errors come from that the "generate entrypoint" are incorrect. Ignoring those, we get 8/11 fully correct.

All errors are very easy to fix.

One error was trying to modify a constant. One error was that the html template was not fully filled in. One error is that a dependency was used incorrectly and easy to fix

2023-06-19

Benchmark Ran Works Perfect
currency_converter
image_resizer
pomodoro_timer
url_shortener
file_explorer
markdown_editor
timer_app
weather_app
file_organizer
password_generator
todo_list

Notes on the errors

timer_app almost works with unit tests config

  • failure mode: undefined import/conflicting names

file_explorer works

file organiser: works

image_resizer almost works with unit tests config

  • failure mode: undefined import

todo_list runs. doesn't really work with unit tests config Uncaught ReferenceError: module is not defined

  • failure mode: placeholder text

url_shortener starts but gets the error: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 8636125824 and this is thread id 13021003776.

markdown_editor: failing tests, 'WebDriver' object has no attribute 'find_element_by_id'

pomodoro: doesn't run it only tests

currency_converter: backend doesnt return anything

weather_app only runs test, no code existed