Where is the code privacy (training on code) setting?

serefarikan · April 16, 2024, 9:24am

I’ve been trying to understand how Cody works (in VS Code and other IDEs) for more than an hour now, trying to stitch together information from Sourcegraph web pages and FAQs.

The FAQ here says that Sourcegraph won’t train on my code (data) without my permission. Where is the setting in VS Code or on sourcegraph.com that allows me to grant permission, or to confirm that I don’t grant it?

serefarikan · April 16, 2024, 11:25am

I bought a Pro subscription and clicked the support/help link from VS Code. This page makes it clear:
Sourcegraph - Cody Enterprise Terms of Use

Sourcegraph and Sourcegraph Partner LLMs do not use code from Cody Enterprise or Cody Pro teams to train models. Sourcegraph may fine-tune a custom model solely for your proprietary use if you purchase that service.

serefarikan · April 16, 2024, 12:44pm

Hmm, this statement conflicts with the above I think:

For Enterprise customers, Sourcegraph will not train on your company’s data. For Free and Pro tier users, Sourcegraph will not train on your data without your permission.

scroll down here: https://__sourcegraph.com/pricing?product=cody
to "Does Cody use my code to improve the models … "
I had to put __ into the link because the forum won’t let me post a link to that host (Sourcegraph.com ??)

PriNova · April 16, 2024, 3:56pm

Hello @serefarikan, welcome to the platform.

Firstly, your posts were flagged as spam due to the repeated posting of the same domain URL address.

Secondly, within your Cody extension settings, you have the option to enable or disable telemetry logging. Telemetry allows Sourcegraph to collect user metrics and events on Cody’s usage to improve the product’s efficiency and reliability and to ensure the best user experience in upcoming releases.

Thirdly, as with all other LLM-based applications, it is advisable not to include confidential information in your prompt to ensure maximum safety.

If you’re interested in the full privacy policy please have a look at Sourcegraph - Privacy Policy

I hope this helps answer your questions.

serefarikan · April 17, 2024, 5:55am

Thanks @PriNova I appreciate the help.

I posted the same domain twice because the same domain had two differing statements. I was merely asking which one of the two URLs had the valid/current one.

Thanks for pointing out telemetry. I’m not sure what telemetry data includes; I’m usually happy to provide telemetry data as long as it does not include critical data.

The privacy policy you linked does not say exactly what is meant by improving the services when it comes to using data collected from users. I now have three URLs that are somewhat related to whether or not proprietary code that Cody runs on is used for training. One clearly says “we don’t train on your data”, another says “we only train on your data with your permission”, and the third says “we’ll use the collected data to improve services”.

I am afraid this situation is not ideal for someone who wants to go back to their employers and say that we can use this for product development, our code won’t be used for training. Similar offerings have very clear settings one can turn on/off, with consistent T&Cs.

I don’t want to name those offerings, but I cannot find the same clarity here. As much as I want to think that all these companies would not use it if it did not satisfy this criterion, I’m hesitant. I simply sat down and read the docs, and this is where I got.

serefarikan · April 17, 2024, 7:21am

For anyone who comes across this thread, I received a clear answer from Sourcegraph support clarifying that they do not use customer data for training. The reason they don’t have a setting to opt out is that they simply don’t train on customer data anyway, so there’s no training to opt out from.

I hope this helps anyone who may come looking for the same clarification. Thanks for your help @PriNova

PriNova · April 17, 2024, 7:39am

Indeed, Sourcegraph does not use your prompts and code to train LLMs. The telemetry is used to improve the user experience of their products.

I’m happy that your questions got answers directly from Sourcegraph support team.

Happy coding

dipanker · August 1, 2024, 1:42pm

Hi - I still find this unclear. I was subscribed to GitHub Copilot, and I had a setting that I could turn off so that MSFT would not use the code passed to their model for inference in future training. Here we have access to several model providers, who I presume also have similar permissions. When we use the models via Cody, has Sourcegraph turned these permissions off for each model provider? I know commercial contracts with MSFT for Copilot will also mention this explicitly. Am I misunderstanding how Cody accesses these models?
