Where is the code privacy (training on code) setting?

I’ve spent more than an hour trying to understand how Cody works (in VS Code and other IDEs), stitching together information from Sourcegraph web pages and FAQs.

The FAQ here says that Sourcegraph won’t train on my code (data) without my permission. Where is the setting, in VS Code or on sourcegraph.com, that lets me grant that permission or confirm that I don’t grant it?

I bought a Pro subscription and clicked the support/help link in VS Code. This page makes it clear:
Sourcegraph - Cody Enterprise Terms of Use

Sourcegraph and Sourcegraph Partner LLMs do not use code from Cody Enterprise or Cody Pro teams to train models. Sourcegraph may fine-tune a custom model solely for your proprietary use if you purchase that service.

Hmm, I think this statement conflicts with the one above:

For Enterprise customers, Sourcegraph will not train on your company’s data. For Free and Pro tier users, Sourcegraph will not train on your data without your permission.

Scroll down here: https://__sourcegraph.com/pricing?product=cody
to "Does Cody use my code to improve the models … "
I had to put __ into the link because the forum won’t let me post a link to that host (sourcegraph.com?).

Hello @serefarikan, welcome to the platform.

Firstly, your posts were flagged as spam because the same domain URL was posted repeatedly.

Secondly, in your Cody extension settings you can enable or disable telemetry logging. Telemetry lets Sourcegraph collect usage metrics and events from Cody to improve the product’s efficiency and reliability and to ensure the best user experience in upcoming releases.
Thirdly, as with any LLM-based application, it is advisable not to include confidential information in your prompts.

If you’re interested in the full privacy policy, please have a look at Sourcegraph - Privacy Policy

I hope this helps answer your questions.

Thanks @PriNova, I appreciate the help.

I posted the same domain twice because it hosted two conflicting statements. I was merely asking which of the two is the valid/current one.

Thanks for pointing out the telemetry setting. I’m not sure what telemetry data includes; I’m usually happy to provide telemetry as long as it does not include critical data.

The privacy policy you linked does not say exactly what “improving the services” means when it comes to data collected from users. I now have three URLs that bear on whether the proprietary code Cody runs on is used for training. One clearly says “we don’t train on your data”, another says “we only train on your data with your permission”, and the third says “we’ll use the collected data to improve services”.

I am afraid this situation is not ideal for someone who wants to go back to their employer and say that we can use this for product development because our code won’t be used for training. Similar offerings have clear settings you can turn on or off, with consistent T&Cs.

I don’t want to name those offerings, but I cannot find the same clarity here. As much as I want to believe that all these companies wouldn’t use Cody if it did not satisfy this criterion, I’m hesitant. I simply sat down and read the docs, and this is where I got :slight_smile:

For anyone who comes across this thread: I received a clear answer from Sourcegraph support confirming that they do not use customer data for training. The reason there is no opt-out setting is that they simply don’t train on customer data, so there’s no training to opt out of :slight_smile:

I hope this helps anyone who may come looking for the same clarification. Thanks for your help @PriNova

Indeed, Sourcegraph does not use your prompts and code to train LLMs. The telemetry is used to improve the user experience of their products.

I’m happy that your questions got answers directly from Sourcegraph support team.

Happy coding
