Comparative Analysis of LLM Performance in Generating Unit Tests with Sourcegraph Cody

Hi,

We are working on an open-source (AGPL) Chrome extension: Digital Assistant Client.

We have a Cody Pro subscription and use it extensively in VS Code. While developing, we ran some comparisons of the LLMs provided with the Cody Pro subscription and wanted to share the metrics from our findings. They may help you decide which LLM to use based on performance.

Here is the link: Comparative Analysis of LLM Performance in Generating Unit Tests


Thank you very much for this benchmark; I appreciate your work.
Some follow-up questions come to mind.

I read in the document:

    1. Did not really get into analysing the quality of the test cases. Maybe in the future.

Does this mean that the generated tests generally worked without syntax errors? Or did you need to fix bugs in them later on?

That would also be a helpful key metric alongside the number of tests generated per LLM. It would help determine whether some LLMs produce fewer tests than others but make up for it with fewer errors and higher quality.

Thank you