{"nodes":[{"color":"#d62728","data":{"name":"TestAdaptSimpleTypoWithGuidance","category":["adaptability"],"task":"1- Run code.py.\n2- Read test.py.\n3- Modify test.py.\nRepeat step 1, 2 and 3 until code.py runs without errors.\n","dependencies":["TestDebugSimpleTypoWithGuidance"],"cutoff":75,"ground":{"answer":"Same as the original DebugWithGuidance test - [0, 1] [2, 5] [0, 3]","should_contain":["[0, 1]","[2, 5]","[0, 3]"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"intermediate","description":"A version of the debug challenge where the wrong files to test and change are given. The LLM must figure out the correct ones.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestAdaptSimpleTypoWithGuidance::test_method[challenge_data0]","label":"AdaptSimpleTypoWithGuidance","shape":"dot"},{"color":"#d62728","data":{"name":"TestAdaptTeslaRevenue","category":["adaptability"],"task":"I'm a financial planner, please help me write tesla's r in 2022 into a .txt file.","cutoff":60,"dependencies":["TestRevenueRetrieval_1.0"],"ground":{"answer":"It was $81.462 billion in 2022.","should_contain":["81"],"should_not_contain":[],"files":[".txt"],"eval":{"type":"file"}},"info":{"difficulty":"intermediate","description":"Given the context, adapt to the word 'revenue' being missing from the task.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestAdaptTeslaRevenue::test_method[challenge_data0]","label":"AdaptTeslaRevenue","shape":"dot"},{"color":"#d62728","data":{"name":"TestAdaptLink","category":["adaptability"],"task":"Write the price of the book in this url 'books.toscrape.com/catalogue/meditations_33/index.html' to a .txt file.","dependencies":["TestBasicRetrieval"],"cutoff":60,"ground":{"answer":"£25.89","should_contain":["25.89"],"should_not_contain":[],"files":[".txt"],"eval":{"type":"file"}},"info":{"difficulty":"novice","description":"An advanced version of this -> remove.html as well. 
Same as TestBasicRetrieval, but the link is slightly broken: it is missing the http:// at the start.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestAdaptLink::test_method[challenge_data0]","label":"AdaptLink","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestReturnCode_Tests","category":["code","iterate"],"task":"First, modify test.py to fill in the test case to be able to test the code in code.py. Next, modify the multiply_int function in code.py to be able to pass in a 'multiplier' argument to multiply the 'num' by 'multiplier'. Both arguments are integers. You can make sure you have correctly done this by running test.py that you previously modified.","dependencies":["TestReturnCode_Modify"],"cutoff":120,"ground":{"answer":"Just a simple multiply-by-2 function. Num is 4 so answer is 8","should_contain":["8","49","-12"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"advanced","description":"Small step up, just writing the function with a name as well as the return statement.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestReturnCode_Tests::test_method[challenge_data0]","label":"ReturnCode_Tests","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestReturnCode_Modify","category":["code","iterate"],"task":"Modify the multiply_int function in code.py to be able to pass in a 'multiplier' argument to multiply the 'num' by 'multiplier'. Both arguments are integers. 
You can make sure you have correctly done this by running test.py","dependencies":["TestReturnCode_Write"],"cutoff":120,"ground":{"answer":"def multiply_int(num, multiplier):\n return num * multiplier\n","should_contain":["8","49","-12"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"intermediate","description":"Builds on the previous function, which now also takes a multiplier.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestReturnCode_Modify::test_method[challenge_data0]","label":"ReturnCode_Modify","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestReturnCode_Write","category":["code","iterate"],"task":"Add a function called multiply_int in code.py that multiplies numbers by 2. You can make sure you have correctly done this by running test.py","dependencies":["TestReturnCode_Simple"],"cutoff":120,"ground":{"answer":"Just a simple multiply-by-2 function. Num is 4 so answer is 8","should_contain":["8"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"novice","description":"Small step up, just writing the function with a name as well as the return statement.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestReturnCode_Write::test_method[challenge_data0]","label":"ReturnCode_Write","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestReturnCode_Simple","category":["code","iterate"],"task":"Return the multiplied number in the function multiply_int in code.py. You can make sure you have correctly done this by running test.py","dependencies":["TestReadFile"],"cutoff":120,"ground":{"answer":"Just a simple multiply-by-2 function. 
Num is 4 so answer is 8","should_contain":["8"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"basic","description":"Simple test if a simple code instruction can be executed","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestReturnCode_Simple::test_method[challenge_data0]","label":"ReturnCode_Simple","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestDebugSimpleTypoWithGuidance","category":["code","iterate"],"task":"1- Run test.py.\n2- Read code.py.\n3- Modify code.py.\nRepeat step 1, 2 and 3 until test.py runs without errors.\n","dependencies":["TestReadFile"],"cutoff":75,"ground":{"answer":"[0, 1] [2, 5] [0, 3]","should_contain":["[0, 1]","[2, 5]","[0, 3]"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"novice","description":"Tests ability for the agent to debug python code with a simple typo in it.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestDebugSimpleTypoWithGuidance::test_method[challenge_data0]","label":"DebugSimpleTypoWithGuidance","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestDebugSimpleTypoWithoutGuidance","category":["code","iterate"],"task":"Make test.py run without errors.","dependencies":["TestDebugSimpleTypoWithGuidance"],"cutoff":75,"ground":{"answer":"[0, 1] [2, 5] [0, 3]","should_contain":["[0, 1]","[2, 5]","[0, 3]"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"intermediate","description":"Tests ability for the agent to debug python code with a simple typo in it, using a very broad prompt without guidance","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestDebugSimpleTypoWithoutGuidance::test_method[challenge_data0]","label":"DebugSimpleTypoWithoutGuidance","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestDebugMultipleTypo","category":["code","iterate"],"task":"Make test.py run without 
errors.","dependencies":["TestDebugSimpleTypoWithoutGuidance"],"cutoff":90,"ground":{"answer":"[0, 1] [2, 5] [0, 3]","should_contain":["[0, 1]","[2, 5]","[0, 3]"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"advanced","description":"Now it's not just the typo error, but also an incomplete import statement","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestDebugMultipleTypo::test_method[challenge_data0]","label":"DebugMultipleTypo","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestThreeSum","category":["code","iterate"],"task":"Create a three_sum function in a file called code.py. Given an array of integers, return indices of the three numbers such that they add up to a specific target. You may assume that each input would have exactly one solution, and you may not use the same element twice. Example: Given nums = [2, 7, 11, 15], target = 20, Because nums[0] + nums[1] + nums[2] = 2 + 7 + 11 = 20, return [0, 1, 2].","dependencies":["TestFunctionCodeGeneration"],"cutoff":60,"ground":{"answer":"The three_sum function coded properly.","should_contain":["[0, 1, 2]","[0, 2, 5]","[0, 2, 3]"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"advanced","description":"Tests ability for the agent to create the three_sum function.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestThreeSum::test_method[challenge_data0]","label":"ThreeSum","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestFunctionCodeGeneration","category":["code"],"task":"Create a two_sum function in a file called code.py. Given an array of integers, return indices of the two numbers such that they add up to a specific target. You may assume that each input would have exactly one solution, and you may not use the same element twice. 
Example: Given nums = [2, 7, 11, 15], target = 9, Because nums[0] + nums[1] = 2 + 7 = 9, return [0, 1].","dependencies":["TestReturnCode_Write"],"cutoff":90,"ground":{"answer":"The two_sum function coded properly.","should_contain":["[0, 1]","[2, 5]","[0, 3]"],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"advanced","description":"Tests ability for the agent to create the two_sum function.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestFunctionCodeGeneration::test_method[challenge_data0]","label":"FunctionCodeGeneration","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestPasswordGenerator_Easy","category":["code"],"task":"Create a random password generator. The password should have between 8 and 16 characters and should contain letters, numbers and symbols. The password should be printed to the console. The entry point will be a python file that can be run this way: python password_generator.py","dependencies":["TestWriteFile"],"cutoff":90,"ground":{"answer":"password_generator.py is created and satisfies the requirements.","should_contain":[],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"basic","description":"Tests ability for the agent to create a random password generator.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestPasswordGenerator_Easy::test_method[challenge_data0]","label":"PasswordGenerator_Easy","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestWritingCLI_FileOrganizer","category":["code"],"task":"Create a file organizer CLI tool in Python that sorts files in a directory based on their file types (e.g., images, documents, audio) and moves them into these corresponding folders: 'images', 'documents', 'audio'. 
The entry point will be a python file that can be run this way: python organize_files.py --directory_path=YOUR_DIRECTORY_PATH","dependencies":["TestPasswordGenerator_Easy"],"cutoff":90,"ground":{"answer":"The correct python file is written and organizes the files accordingly","should_contain":[],"should_not_contain":[],"files":["test.py"],"eval":{"type":"python"}},"info":{"difficulty":"basic","description":"Tests ability for the agent to code a file organizer.","side_effects":[]}},"id":"agbenchmark/generate_test.py::TestWritingCLI_FileOrganizer::test_method[challenge_data0]","label":"WritingCLI_FileOrganizer","shape":"dot"},{"color":"#1f77b4","data":{"name":"TestWebApp_ListAnimals","category":["code"],"task":"Build a web page with a list of animals. When someone clicks on the word 'Dog', a message should appear that says 'Dogs are known as man's best friend!'. You'll need to make a list with the name 'Dog' and then write a little bit of JavaScript to make the message appear when the name is clicked. Mark the div containing dog with the id 'dog'. Put the message inside a