What is GitHub Copilot?
GitHub Copilot is an AI-powered code completion tool that is designed to help developers write code more efficiently. It is developed by GitHub in collaboration with OpenAI and branded as an “AI pair programmer”. It uses OpenAI Codex to provide suggestions and autocompletions to developers as they write code, making it faster to complete programming tasks.
Copilot is more than an autocomplete tool: it offers a wider range of suggestions based on the context of the code being written. It can propose single lines or entire functions, giving developers a faster and simpler way to write code and reducing the need to search for solutions elsewhere.
How does Copilot work?
Copilot is powered by OpenAI Codex. Codex is designed specifically to generate code based on natural language inputs. It’s built on top of the GPT-3 architecture and trained on a massive dataset of code in various programming languages, as well as other text sources such as books, articles and web pages.
The GitHub Copilot editor extension sends your code context to the GitHub Copilot service, which then uses OpenAI Codex to generate code suggestions. It is worth noting that code context information (like the programming language, the text of the code written so far, the libraries or frameworks being used, the position of the cursor…) is encrypted and transmitted over a secure connection, ensuring that your code and data are protected.
Copilot can understand and generate code for a variety of programming languages. For each language, the quality of suggestions may depend on the volume and diversity of training data for that language. For example, Python and JavaScript are well represented in public repositories and are among GitHub Copilot’s best supported languages.
Currently, GitHub Copilot supports several programming languages, including C, C++, Ruby, Scala, Python, JavaScript / TypeScript, PHP, Go, Java, C# – basically all languages that appear in public repositories.
Copilot and Visual Studio
GitHub Copilot supports many IDEs, including Visual Studio Code and Visual Studio.
To set up Copilot in Visual Studio, follow these steps:
- Create a GitHub account (if you don’t have one already)
- Set up your subscription for a personal account. Go to GitHubCopilot and click on “Start my free trial”. You can then follow the sign-up steps and enable your subscription for individuals. Make sure to cancel the subscription before your 60-day trial is up (if you don’t want to continue using the tool)
- Make sure you have installed Visual Studio 2022 17.4.4 or later
- Install the GitHub Copilot extension in Visual Studio and you are ready to go
Getting started is pretty straightforward:
- Start coding
- When you get a suggestion, hit Tab to accept it or Esc to ignore it
- To see the next suggestion, use Alt + .
- To see the previous suggestion, use Alt + ,
- To trigger an inline suggestion, use Ctrl + Alt + \
Check the video below to see how Copilot generates multiple suggestions for a simple function that calculates the difference in days between two dates. It can also translate natural language into code: based on your comment, it tries to give adequate suggestions.
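To give an idea of what such a comment-driven suggestion can look like, here is a minimal sketch of a day-difference function. It is not the exact suggestion from the video; the names DateUtils and DaysBetween are made up for illustration.

```csharp
using System;

// Usage: 1 January 2023 to 1 March 2023 (2023 is not a leap year).
Console.WriteLine(DateUtils.DaysBetween(new DateTime(2023, 1, 1), new DateTime(2023, 3, 1))); // prints 59

// Hypothetical helper class, not taken from the original project.
public static class DateUtils
{
    // calculate the difference in days between two dates
    public static int DaysBetween(DateTime start, DateTime end)
    {
        // Compare calendar dates only (time of day is ignored) and
        // return an absolute value so the argument order does not matter.
        return Math.Abs((end.Date - start.Date).Days);
    }
}
```

A suggestion like this still needs a quick review: whether the result should be absolute or signed, for example, depends on what the caller expects.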
Copilot for individuals vs business
You also have the option to set up a GitHub Copilot Business subscription. You can find more about the Business subscription here.
Block suggestion matching public code feature
Copilot includes a filter that detects code suggestions matching public code on GitHub, and you can choose to enable or disable it. If the filter is ON, it checks code suggestions (with about 150 characters of surrounding code), and if it finds a match or near match, the suggestion is not shown to you.
So basically it’s designed to prevent the tool from suggesting code that is too similar to code that is publicly available on GitHub.
Turning this filter ON can be useful for developers or organizations who prefer not to use public code for some reason (avoiding legal issues because of the risk of copyright violation, improving security, encouraging originality).
While turning on this feature has benefits, it also has some drawbacks. It may limit the scope and quality of the suggestions Copilot provides, since the tool’s ability to offer relevant suggestions is based on its access to a wide range of training data, including public code on GitHub. That can increase development time because of the missing suggestions, and it can reduce code quality, since we may miss out on high-quality, well-tested solutions that could improve the quality and efficiency of our code.
Copyright issues
There have been some concerns about potential copyright issues with GitHub Copilot. Since the tool generates code based on ML models trained on publicly available code, there is a risk that it could produce code that violates someone else’s intellectual property rights.
To address these concerns, GitHub has taken steps to try to mitigate the risk of copyright infringement. For example, the tool is designed to filter out code snippets that match public code on GitHub, as discussed earlier. Additionally, they plan to add new capabilities to Copilot in 2023, according to this piece.
With these updates, developers should be able to locate licensing information for suggested code fragments and access an inventory of similar code found in GitHub public repositories.
As the GitHub Copilot FAQ states (see the image above), GitHub does not own the suggestions GitHub Copilot generates. The code you write with Copilot’s help belongs to you, and you are responsible for it. Developers should therefore be aware of copyright laws. Should they also do their due diligence, perhaps by pasting suggested code snippets into search engines to check whether copyright applies?
What data does Copilot collect?
Copilot collects user engagement data such as user edit actions, error data (errors or issues that occur when using Copilot), and usage data to improve its suggestions and overall user experience. For Copilot for Business, code snippets data is transmitted to GitHub only in real-time to return suggestions, and is discarded once a suggestion is returned. However, for Copilot for Individuals, code snippets data may be collected and retained depending on telemetry settings, and is used to train and enhance AI models.
You can find more about how this data is collected, used and protected here.
Incorporating Copilot into an existing project
The EmployeeRepository is a class that implements the IEmployeeRepository interface, which defines a set of methods for performing CRUD (Create, Read, Update, Delete) operations on employee entities. It uses the ApplicationDbContext class to interact with the database.
Example 1 – Adding simple LINQ query
Let’s try to add an additional method for retrieving all employees with a specific job title. The signature for this method was already defined in the IEmployeeRepository interface, so I first tried to ask Copilot a question in a comment: //Q:What interface member EmployeeRepository does not implement from IEmployeeRepository? Copilot wasn’t able to identify which methods were missing, and I got zero suggestions for that one. After that I provided a comment in natural language, //implement GetByJobTitle which returns a list of employees with a given job title, which in the current code context was clear enough. Copilot knew that the method should return a list, not a single employee, as well as which property to include in the filter. It also assumed, not from the comment but from the current code context, that deleted employees should be excluded, since the same condition was used in previous queries.
You need to provide descriptive and clear comments to Copilot. The better the comment you provide, the higher your chances of getting a better suggestion. For example, if we omit the part of the comment related to the return value, Copilot can’t conclude that there could be more than one Employee with the given job title (since job title is not a unique identifier).
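For illustration, here is a minimal sketch of the kind of method such a comment can produce. It is not the actual project code: an in-memory list stands in for ApplicationDbContext so the example runs on its own, and the IsDeleted flag is an assumed name for the soft-delete condition mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: two developers exist, but one of them is soft-deleted.
var repository = new EmployeeRepository(new List<Employee>
{
    new Employee { Id = 1, JobTitle = "Developer" },
    new Employee { Id = 2, JobTitle = "Developer", IsDeleted = true },
    new Employee { Id = 3, JobTitle = "Designer" },
});
Console.WriteLine(repository.GetByJobTitle("Developer").Count); // prints 1

public class Employee
{
    public int Id { get; set; }
    public string JobTitle { get; set; } = "";
    public bool IsDeleted { get; set; } // hypothetical soft-delete flag
}

public class EmployeeRepository
{
    // Assumption: an in-memory list replaces the ApplicationDbContext.
    private readonly List<Employee> _employees;

    public EmployeeRepository(List<Employee> employees)
    {
        _employees = employees;
    }

    //implement GetByJobTitle which returns a list of employees with a given job title
    public List<Employee> GetByJobTitle(string jobTitle)
    {
        // Filter on the job title and skip deleted employees,
        // mirroring the condition used by the other queries.
        return _employees
            .Where(e => e.JobTitle == jobTitle && !e.IsDeleted)
            .ToList();
    }
}
```

Against a real DbContext the same Where clause would be translated to SQL, but that part is outside what this sketch can show.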
Example 2 – Context matters
Let’s go one layer above, to EmployeeService. This class implements business logic. It contains methods that perform various operations related to employees. The class also uses the repository interface IEmployeeRepository to interact with the data layer and the mapping tool AutoMapper to map entities to DTOs (data transfer objects) and vice versa.
Let’s try to give Copilot a hint to implement retrieving employees at the service level. A simple comment, //implement GetByJobTitle without pagination, does the job. Again, based on the comment it knows that it should call the GetByJobTitle method, and based on the code context, it knows that it should use the repository for this call and that the result should be mapped to DTO objects. Also based on the code context (e.g. the GetAll method), it returns distinct employees based on their email addresses.
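A rough sketch of such a service method could look like the code below. The assumptions are mine: the mapping is written by hand instead of using AutoMapper so the example has no external dependencies, InMemoryEmployeeRepository is a made-up stand-in for the real repository, and the DTO shape is invented.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Usage: the duplicate email is returned only once.
var service = new EmployeeService(new InMemoryEmployeeRepository());
foreach (var dto in service.GetByJobTitle("Developer"))
{
    Console.WriteLine(dto.Email);
}

public class Employee
{
    public string Email { get; set; } = "";
    public string JobTitle { get; set; } = "";
}

// Hypothetical DTO; the real project maps to its own DTO types.
public class EmployeeDto
{
    public string Email { get; set; } = "";
    public string JobTitle { get; set; } = "";
}

public interface IEmployeeRepository
{
    List<Employee> GetByJobTitle(string jobTitle);
}

// Hypothetical in-memory repository used only to make the sketch runnable.
public class InMemoryEmployeeRepository : IEmployeeRepository
{
    private readonly List<Employee> _employees = new List<Employee>
    {
        new Employee { Email = "ana@example.com", JobTitle = "Developer" },
        new Employee { Email = "ana@example.com", JobTitle = "Developer" }, // duplicate email
        new Employee { Email = "marko@example.com", JobTitle = "Developer" },
        new Employee { Email = "iva@example.com", JobTitle = "Designer" },
    };

    public List<Employee> GetByJobTitle(string jobTitle)
    {
        return _employees.Where(e => e.JobTitle == jobTitle).ToList();
    }
}

public class EmployeeService
{
    private readonly IEmployeeRepository _repository;

    public EmployeeService(IEmployeeRepository repository)
    {
        _repository = repository;
    }

    //implement GetByJobTitle without pagination
    public List<EmployeeDto> GetByJobTitle(string jobTitle)
    {
        return _repository.GetByJobTitle(jobTitle)
            .GroupBy(e => e.Email)      // keep one employee per email address
            .Select(g => g.First())
            .Select(e => new EmployeeDto { Email = e.Email, JobTitle = e.JobTitle }) // manual mapping instead of AutoMapper
            .ToList();
    }
}
```

Depending on the interface rather than on a concrete repository is what lets the in-memory stand-in be swapped in here.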
Example 3 – Analysing codebase and multiple suggestions
Let’s try to add another, slightly more complicated method to EmployeeRepository. The method is supposed to retrieve all employees who made a desk reservation in the current year. For this purpose, we should use the Reservation property (a collection navigation property), which gives the employee access to the related “Reservation” entities.
First, I asked Copilot to help me better understand what the Reservation property is and what the relation between the Employee and Reservation tables is. In the image below you can see the questions I asked (comments starting with //q:) and the answers I got from Copilot, also as comments, starting with //a:
Some of the answers Copilot provided were not correct. For example, its conclusion that ICollection<Reservation> is the better choice is right. Copilot said that ICollection is more flexible than IList, which makes sense, since ICollection is a more general collection interface that allows more implementation freedom and versatility. But the reason Copilot provided (the last answer in the image) is not valid. ICollection<T> inherits from the IEnumerable<T> interface, which provides the functionality for iterating through a collection, but it does not inherit from IList<T>. It’s the other way around: IList<T> inherits from ICollection<T>. Copilot also stated that the Employee table holds a foreign key reference to the Reservation table, which is not correct.
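The direction of the inheritance between these interfaces is easy to check with a few lines of reflection. The helper name InterfaceChecks.Inherits is made up for this sketch; Type.IsAssignableFrom is the standard .NET call doing the work.

```csharp
using System;
using System.Collections.Generic;

// IList<T> inherits from ICollection<T>, not the other way around.
Console.WriteLine(InterfaceChecks.Inherits(typeof(IList<int>), typeof(ICollection<int>)));       // prints True
Console.WriteLine(InterfaceChecks.Inherits(typeof(ICollection<int>), typeof(IList<int>)));       // prints False
// ICollection<T> inherits from IEnumerable<T>.
Console.WriteLine(InterfaceChecks.Inherits(typeof(ICollection<int>), typeof(IEnumerable<int>))); // prints True

// Hypothetical helper, only here to make the checks readable.
public static class InterfaceChecks
{
    // Returns true when 'derived' can be used wherever 'parent' is expected,
    // which for two interfaces means 'derived' inherits from 'parent'.
    public static bool Inherits(Type derived, Type parent)
    {
        return parent.IsAssignableFrom(derived);
    }
}
```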
Let’s continue with retrieving employees who made a desk reservation in some office in the current year. We received two suggestions from Copilot.
In terms of performance, the second query is likely to be more efficient than the first. This is because the second query performs a single database query that filters the Employees table and loads only the relevant data into memory, while the first performs two separate database queries and then filters the data in memory using LINQ. Copilot can generate multiple suggestions for code, but it is up to you to evaluate them carefully and choose the one that best fits your needs.
I tried to ask Copilot what a potential bug in this code might be by putting a question in a comment (//q:What is a potential bug in this code?), but I didn’t get any suggestions. ChatGPT was more helpful in providing insights into potential bugs and issues with a piece of code.
The potential bug in this code is that it only checks whether an employee has a reservation with a start date in the current year. It does not check whether the reservation end date is in the current year. As a result, employees whose reservations started in the previous year and ended in the current year are not handled correctly.
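The sketch below puts a start-date-only check next to a variant that tests whether a reservation overlaps the year at all. It is not the code Copilot suggested: the StartDate and EndDate property names and the ReservationQueries class are assumptions, and the queries run over an in-memory list rather than the database. The year is passed in as a parameter so the result does not depend on the current date.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var employees = new List<Employee>
{
    // Reservation that starts in 2022 and ends in 2023.
    new Employee { Id = 1, Reservation = { new Reservation { StartDate = new DateTime(2022, 12, 28), EndDate = new DateTime(2023, 1, 5) } } },
    // Reservation that lies entirely within 2023.
    new Employee { Id = 2, Reservation = { new Reservation { StartDate = new DateTime(2023, 3, 1), EndDate = new DateTime(2023, 3, 2) } } },
};

Console.WriteLine(ReservationQueries.StartingIn(employees, 2023).Count);      // prints 1
Console.WriteLine(ReservationQueries.OverlappingYear(employees, 2023).Count); // prints 2

public class Reservation
{
    public DateTime StartDate { get; set; } // assumed property name
    public DateTime EndDate { get; set; }   // assumed property name
}

public class Employee
{
    public int Id { get; set; }
    // Collection navigation property, as described in the article.
    public ICollection<Reservation> Reservation { get; set; } = new List<Reservation>();
}

// Hypothetical helper class, not part of the original project.
public static class ReservationQueries
{
    // Only looks at the year in which a reservation starts.
    public static List<Employee> StartingIn(IEnumerable<Employee> source, int year)
    {
        return source
            .Where(e => e.Reservation.Any(r => r.StartDate.Year == year))
            .ToList();
    }

    // Matches any reservation whose date range touches the given year.
    public static List<Employee> OverlappingYear(IEnumerable<Employee> source, int year)
    {
        var yearStart = new DateTime(year, 1, 1);
        var nextYearStart = yearStart.AddYears(1);
        return source
            .Where(e => e.Reservation.Any(r => r.StartDate < nextYearStart && r.EndDate >= yearStart))
            .ToList();
    }
}
```

Which of the two behaviours is the right one depends on what "made a reservation in the current year" is supposed to mean in the project, so this is a question for the requirements rather than for Copilot.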
Copilot can produce code that contains bugs or is not completely correct, so it’s important to carefully review and test any code it generates to make sure it meets the required functionality and is free of bugs or issues.
Example 4 – Generating docstrings
A docstring is a special type of comment used to document a method, property, class, or other element of your code. In C#, docstrings are written as XML documentation comments, which provide a structured way to document your code.
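As a small illustration of the format, here is what an XML documentation comment can look like on a method that retrieves an employee by id. The repository is a simplified in-memory stand-in and the wording of the comment is mine, not a Copilot suggestion.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var repository = new EmployeeRepository();
Console.WriteLine(repository.GetById(1).Email); // prints ana@example.com

public class Employee
{
    public int Id { get; set; }
    public string Email { get; set; } = "";
}

public class EmployeeRepository
{
    // Assumption: an in-memory list replaces the database context.
    private readonly List<Employee> _employees = new List<Employee>
    {
        new Employee { Id = 1, Email = "ana@example.com" },
    };

    /// <summary>
    /// Retrieves an employee by its id.
    /// </summary>
    /// <param name="id">The id of the employee to retrieve.</param>
    /// <returns>The matching employee, or <c>null</c> if no employee has the given id.</returns>
    public Employee GetById(int id)
    {
        return _employees.FirstOrDefault(e => e.Id == id);
    }
}
```

The summary, param and returns tags are the ones used most often; Visual Studio shows their content in IntelliSense tooltips.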
Let’s test how helpful Copilot can be with this. First, I tried to make Copilot generate the whole docstring for a simple method that retrieves an employee by its id, providing just a comment, but I didn’t get any meaningful suggestions.
Since I didn’t get any relevant suggestions for multiple variations of this comment, I added docstrings for a few methods manually, expecting that Copilot would learn from the code context (the structure and content of the documentation, as well as the code itself) and give me more useful suggestions later.
Here is an example of some docstrings I added manually.
Now let’s go back to our method and see how Copilot behaves. In the video below, you can see that after some learning, Copilot is able to generate the docstring by itself, step by step. Each new row is a request to the Copilot service and a new suggestion, so Copilot is still unable to generate the whole docstring within one request.
To avoid unnecessary requests to the Copilot service (for opening and closing XML tags), since they can sometimes have higher latency, I used a combination of manual docstring writing and Copilot suggestions. One issue here is that Copilot doesn’t offer any suggestions until I put the start tag, the value and the end tag of the XML element on separate lines. The suggestions Copilot provided in this and other examples were useful and saved me some time, but some of them weren’t accurate or clear enough and needed corrections.
So, yes, Copilot can be a helpful tool for generating code and providing insights into programming concepts, but its answers may not be completely accurate or optimal. Therefore, it is important to carefully review and validate any comments or code generated by Copilot to ensure that it meets the requirements of the project and follows best coding practices.
Summary
- The quality of the suggestions provided by Copilot may depend on the size and complexity of your codebase, as well as the specific libraries and frameworks you are using. As you continue to use Copilot and provide feedback on its suggestions, it will improve over time and become more tailored to your specific use case.
- Don’t trust by default – read the suggestions and be sure they make sense, always validate and be especially suspicious of long suggestions.
- Context is very important – keep related files open and provide clear and concise inputs.
- Time saver – you definitely type less with smarter code completion than your IDE offers, especially when its suggestions improve over time and adapt to your coding styles.
I’m looking forward to exploring the new features that GitHub Copilot Chat offers. To apply for the private preview, you can visit this page.
Written by Aleksandra Kovacevic
Senior Back-end Engineer at IT Labs