What is GitHub Copilot?

GitHub Copilot is more than an autocomplete tool. It analyzes the context of the code being written and can propose single lines or entire functions, giving developers a faster and simpler way to write code and reducing the need to search for solutions elsewhere.

How does Copilot work?

Copilot is powered by OpenAI Codex. Codex is designed specifically to generate code from natural language inputs. It’s built on the GPT-3 architecture and trained on a massive dataset of code in various programming languages, as well as other text sources such as books, articles and web pages.

The GitHub Copilot editor extension sends your code context to the GitHub Copilot service, which then uses OpenAI Codex to generate code suggestions. It is worth noting that code context information (such as the programming language, the text of the code written so far, the libraries or frameworks being used, and the position of the cursor) is encrypted and transmitted over a secure connection, so your code and data are protected in transit.

Copilot can understand and generate code for a variety of programming languages. For each language, the quality of suggestions may depend on the volume and diversity of training data for that language. For example, Python and JavaScript are well represented in public repositories and are among GitHub Copilot’s best supported languages.

Currently, GitHub Copilot supports several programming languages, including C, C++, Ruby, Scala, Python, JavaScript / TypeScript, PHP, Go, Java, C# – basically all languages that appear in public repositories.

Copilot and Visual Studio

GitHub Copilot supports many IDEs, including Visual Studio Code and Visual Studio. To use it in Visual Studio:

  1. Create a GitHub account (if you don’t have one already)
  2. Set up your subscription for a personal account. Go to GitHubCopilot and click on “Start my free trial”. You can then follow the sign-up steps and enable your subscription for individuals. Make sure to cancel the subscription before your 60-day trial is up (if you don’t want to continue using the tool)
  3. Make sure you have installed Visual Studio 2022 version 17.4.4 or later
  4. Install the GitHub Copilot extension in Visual Studio and you are ready to go

Getting started is pretty straightforward:

  1. Start coding
  2. When you get a suggestion, hit Tab to accept or Esc to ignore
  3. To see the next suggestion use Alt + .
  4. To see the previous suggestion use Alt + ,
  5. To trigger an inline suggestion use Ctrl + Alt + \

Check the video below to see how Copilot generates multiple suggestions for a simple function that calculates the difference in days between two dates. Copilot can also translate natural language into code: based on your comment, it tries to give suitable suggestions.
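For a request like this, a one-line comment followed by a short function is usually all it takes. The sketch below shows one plausible shape of such a function, not the suggestion from the video; the name DaysBetween and the choice to ignore the time of day are assumptions.

```csharp
using System;

public static class DateHelper
{
    // calculate the number of days between two dates
    // Assumption: only calendar dates matter, so the time of day is dropped.
    // The result is negative when 'end' is earlier than 'start'.
    public static int DaysBetween(DateTime start, DateTime end)
    {
        return (end.Date - start.Date).Days;
    }
}
```

Calling DateHelper.DaysBetween(new DateTime(2023, 1, 1), new DateTime(2023, 1, 31)) returns 30.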

Copilot for individuals vs business

You also have the option to set up a GitHub Copilot Business subscription. You can find more about the business subscription here.

The “block suggestions matching public code” feature

Copilot includes a filter that detects code suggestions matching public code on GitHub, and you can choose to enable or disable it. When the filter is on, it checks each code suggestion (together with about 150 characters of surrounding code), and if it finds a match or near match, the suggestion is not shown to you.

In other words, it’s designed to prevent the tool from suggesting code that is too similar to code that is publicly available on GitHub.

Turning this filter on can be useful for developers or organizations who prefer not to use public code for some reason (avoiding legal issues arising from the risk of copyright violation, improving security, encouraging originality).

Turning the filter on also has drawbacks. It may limit the scope and quality of the suggestions Copilot shows, because suggestions that match or nearly match public code are hidden. Fewer suggestions can mean longer development time, and we may miss out on high-quality, well-tested solutions that could improve the quality and efficiency of our code.

Copyright issues

There have been some concerns about potential copyright issues with GitHub Copilot. Since the tool generates code based on ML models trained on publicly available code, there is a risk that it could produce code that violates someone else’s intellectual property rights.

With these updates, developers should be able to locate licensing information for suggested code fragments and access an inventory of similar code found in GitHub public repositories.

In the image above you can see what GitHub Copilot FAQ states. So GitHub does not own the suggestions GitHub Copilot generates. The code you write with Copilot’s help belongs to you, and you are responsible for it. Thus, developers should be aware of copyright laws. Should developers do their due diligence, perhaps by pasting suggested code snippets into search engines to ensure there’s no copyright attached?


What data does Copilot collect?

Copilot collects user engagement data such as user edit actions, error data (errors or issues that occur when using Copilot), and usage data to improve its suggestions and overall user experience. For Copilot for Business, code snippets data is transmitted to GitHub only in real-time to return suggestions, and is discarded once a suggestion is returned. However, for Copilot for Individuals, code snippets data may be collected and retained depending on telemetry settings, and is used to train and enhance AI models.


Incorporating Copilot into an existing project

The EmployeeRepository is a class that implements the IEmployeeRepository interface, which defines a set of methods for performing CRUD (Create, Read, Update, Delete) operations on employee entities. It uses the ApplicationDbContext class to interact with the database.

Example 1 – Adding simple LINQ query

Let’s try to add a method for retrieving all employees with a specific job title. The signature for this method was already defined in the IEmployeeRepository interface, so I first asked Copilot a question in a comment: //Q:What interface member EmployeeRepository does not implement from IEmployeeRepository? Copilot was not able to identify which interface methods were missing, and I got zero suggestions for that one. I then wrote a comment in natural language, //implement GetByJobTitle which returns a list of employees with a given job title, which was clear enough in the current code context. Copilot knew that the method should return a list rather than a single employee, and which property to use in the filter. From the surrounding code rather than from the comment, it also inferred that deleted employees should be excluded, since the same condition was used in previous queries.

You need to give Copilot clear, descriptive comments. The better the comment, the better your chances of getting a good suggestion. For example, if we omit the part of the comment that describes the return value, Copilot cannot infer that more than one Employee may have the given job title (since a job title is not a unique identifier).
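To make the result concrete, here is a minimal sketch of the kind of method such a comment leads to. It runs against an in-memory list instead of ApplicationDbContext so that it is self-contained, and the property names JobTitle and IsDeleted as well as the class InMemoryEmployeeRepository are assumptions, since the project's real entity is not shown here.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public int Id { get; set; }
    public string JobTitle { get; set; } = "";
    public bool IsDeleted { get; set; } // hypothetical soft-delete flag
}

// Stand-in for the database-backed repository (assumption for illustration).
public class InMemoryEmployeeRepository
{
    private readonly List<Employee> _employees;

    public InMemoryEmployeeRepository(IEnumerable<Employee> employees)
    {
        _employees = employees.ToList();
    }

    // implement GetByJobTitle which returns a list of employees with a given job title
    public List<Employee> GetByJobTitle(string jobTitle)
    {
        return _employees
            .Where(e => !e.IsDeleted && e.JobTitle == jobTitle) // skip deleted rows, as in the other queries
            .ToList();
    }
}
```

The return type is a list because several employees can share one job title, which is exactly the detail the comment has to spell out.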


Example 2 – Context matters

Let’s go one layer above, to EmployeeService. This class implements business logic. It contains methods that perform various operations related to employees. The class also uses a repository interface IEmployeeRepository to interact with the data layer and a mapping tool AutoMapper to map entities to DTOs (data transfer objects) and vice versa.

Let’s give Copilot a hint to implement retrieving employees at the service level. A simple comment, //implement GetByJobTitle without pagination, will do the work. Again, based on the comment it knows that it should call the GetByJobTitle method, and based on the code context it knows that it should use the repository for this call and that the result should be mapped to DTO objects. Also based on the code context (e.g. the GetAll method), it returns distinct employees based on their email addresses.
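The shape of such a service method can be sketched as follows. This is an illustration under stated assumptions, not the project's code: a hand-written projection stands in for AutoMapper, ListEmployeeRepository stands in for the real repository, and the Email and JobTitle properties are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Employee
{
    public string Email { get; set; } = "";
    public string JobTitle { get; set; } = "";
}

public class EmployeeDto
{
    public string Email { get; set; } = "";
    public string JobTitle { get; set; } = "";
}

public interface IEmployeeRepository
{
    List<Employee> GetByJobTitle(string jobTitle);
}

// In-memory stand-in for the data layer (assumption for illustration).
public class ListEmployeeRepository : IEmployeeRepository
{
    private readonly List<Employee> _employees;

    public ListEmployeeRepository(List<Employee> employees)
    {
        _employees = employees;
    }

    public List<Employee> GetByJobTitle(string jobTitle)
    {
        return _employees.Where(e => e.JobTitle == jobTitle).ToList();
    }
}

public class EmployeeService
{
    private readonly IEmployeeRepository _repository;

    public EmployeeService(IEmployeeRepository repository)
    {
        _repository = repository;
    }

    // implement GetByJobTitle without pagination
    public List<EmployeeDto> GetByJobTitle(string jobTitle)
    {
        return _repository.GetByJobTitle(jobTitle)
            .GroupBy(e => e.Email)          // keep one employee per email address
            .Select(g => g.First())
            .Select(e => new EmployeeDto { Email = e.Email, JobTitle = e.JobTitle }) // manual mapping in place of AutoMapper
            .ToList();
    }
}
```

The service only orchestrates: filtering stays in the repository, while de-duplication and mapping to DTOs happen one layer above.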

Example 3 – Analysing codebase and multiple suggestions

Let’s try to add another, slightly more complicated method to EmployeeRepository. The method is supposed to retrieve all employees who made a desk reservation in the current year. For this purpose, we should use the Reservation property (a collection navigation property), which gives the employee access to the related “Reservation” entities.

First, I asked Copilot for help in understanding what the Reservation property is and how the Employee and Reservation tables are related. In the image below you can see the questions I asked (comments starting with //q:) and the answers I got from Copilot, also as comments, starting with //a:

Some of the answers Copilot provided were only partly correct. For example, its answer to why ICollection<Reservation> is the better choice reaches the right conclusion: Copilot said that ICollection is more flexible than IList, which makes sense, since ICollection is a more general collection interface that allows more implementation freedom and versatility. But the reason Copilot gave (the last answer in the image) is not valid. ICollection<T> inherits from the IEnumerable<T> interface, which provides the functionality for iterating through a collection, but it does not inherit from IList<T>. It’s the other way around: IList<T> inherits from ICollection<T>. Copilot also stated that the Employee table holds a foreign key reference to the Reservation table, which is not correct.
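The inheritance relationship between these interfaces is easy to verify with reflection instead of taking anyone's word for it. A small sketch (the class and property names are made up for illustration):

```csharp
using System.Collections.Generic;

public static class CollectionInterfaceCheck
{
    // True: every IList<T> is also an ICollection<T>.
    public static bool ListIsCollection =>
        typeof(ICollection<int>).IsAssignableFrom(typeof(IList<int>));

    // False: ICollection<T> does not inherit from IList<T>.
    public static bool CollectionIsList =>
        typeof(IList<int>).IsAssignableFrom(typeof(ICollection<int>));

    // True: ICollection<T> inherits from IEnumerable<T>.
    public static bool CollectionIsEnumerable =>
        typeof(IEnumerable<int>).IsAssignableFrom(typeof(ICollection<int>));
}
```

A check like this takes a minute to write and is a cheap way to validate an explanation that sounds plausible.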

Let’s continue with retrieving employees who made a desk reservation in some office in the current year. We received two suggestions from Copilot.

In terms of performance, the second query is likely to be more efficient than the first. The second query performs a single database query that filters the Employees table and loads only the relevant data into memory, while the first performs two separate database queries and then filters the data in memory using LINQ. Copilot can generate multiple suggestions, but it is up to you to evaluate them carefully and choose the one that best fits your needs.

A potential bug in this code is that it only checks whether a reservation’s start date falls in the current year and never looks at the end date. As a result, it can miss employees whose reservations started in the previous year and ended in the current year.
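One way to avoid depending on the start date alone is an interval overlap check: a reservation counts for a given year if it starts before the next year begins and ends on or after January 1 of that year. The sketch below runs on in-memory lists so that it is self-contained; StartDate, EndDate, Email and the helper WithReservationInYear are assumed names, since the project's actual entities are not shown here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Reservation
{
    public DateTime StartDate { get; set; }
    public DateTime EndDate { get; set; }
}

public class Employee
{
    public string Email { get; set; } = "";

    // Collection navigation property to the related Reservation entities.
    public ICollection<Reservation> Reservation { get; set; } = new List<Reservation>();
}

public static class ReservationQueries
{
    // Returns employees with at least one reservation that overlaps the given year.
    public static List<Employee> WithReservationInYear(IEnumerable<Employee> employees, int year)
    {
        var yearStart = new DateTime(year, 1, 1);
        var nextYearStart = yearStart.AddYears(1);

        return employees
            .Where(e => e.Reservation.Any(r =>
                r.StartDate < nextYearStart && r.EndDate >= yearStart))
            .ToList();
    }
}
```

Written against a DbSet instead of a list, a Where with a nested Any of this shape is typically translated by Entity Framework Core into a single SQL query, so the fix does not have to cost the performance advantage of the second suggestion. Whether a reservation that merely overlaps the year should count is a business decision, so the condition should be adjusted to the actual requirement.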

Copilot can produce code that contains bugs or is not completely correct, so it’s important to carefully review and test any code it generates to ensure that it provides the required functionality and is free of bugs or issues.

Example 4 – Generating docstrings

Let’s test how helpful Copilot is at generating docstrings. First, I tried to make Copilot generate the whole docstring for a simple method that retrieves an employee by its id, providing just a comment, but I didn’t get any meaningful suggestions.

Since I didn’t get any relevant suggestions for multiple variations of this comment, I added docstrings for a few methods manually, expecting that Copilot would learn from the code context (the structure and content of the documentation, as well as the code itself) and give me more useful suggestions later.

Here is an example of some docstrings I added manually.

So now, let’s go back to our method and see how Copilot behaves now. In the video below, you can see that after some learning, Copilot is able to generate docstring by itself, step by step. Each new row is a request to Copilot service and a new suggestion, so Copilot is still unable to generate the whole docstring suggestion within one request.

To avoid some unnecessary requests to the Copilot service (for opening and closing XML tags), since they can sometimes have higher latency, I used a combination of manual docstring writing and Copilot suggestions. One issue here is that Copilot doesn’t offer any suggestions until I put the start tag, the value and the end tag of the XML element on separate lines. The suggestions provided by Copilot in this and other examples were useful and saved me some time, but some of them weren’t accurate or clear enough and needed corrections.
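For reference, a C# XML documentation comment for a method like this might look as follows. The class, its dictionary-based storage and the wording of the comments are illustrative assumptions, not the project's code or Copilot's actual output. In the summary and returns elements, the opening tag, the text and the closing tag each sit on their own line, which is the layout described above.

```csharp
using System.Collections.Generic;

public class Employee
{
    public int Id { get; set; }
    public string Email { get; set; } = "";
}

public class EmployeeLookup
{
    private readonly Dictionary<int, Employee> _employeesById = new Dictionary<int, Employee>();

    /// <summary>
    /// Adds an employee, or replaces the one stored under the same id.
    /// </summary>
    /// <param name="employee">The employee to store.</param>
    public void Save(Employee employee)
    {
        _employeesById[employee.Id] = employee;
    }

    /// <summary>
    /// Retrieves an employee by its id.
    /// </summary>
    /// <param name="id">The id of the employee to retrieve.</param>
    /// <returns>
    /// The employee with the given id, or null if no such employee exists.
    /// </returns>
    public Employee GetById(int id)
    {
        Employee employee;
        return _employeesById.TryGetValue(id, out employee) ? employee : null;
    }
}
```

Once one or two methods in a file are documented in this style, the remaining methods give Copilot a consistent pattern to continue.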

So, yes, Copilot can be a helpful tool for generating code and providing insights into programming concepts, but its answers may not be completely accurate or optimal. Therefore, it is important to carefully review and validate any comments or code generated by Copilot to ensure that it meets the requirements of the project and follows best coding practices.

Summary

  1. The quality of the suggestions provided by Copilot may depend on the size and complexity of your codebase, as well as the specific libraries and frameworks you are using. As you continue to use Copilot and provide feedback on its suggestions, it will improve over time and become more tailored to your specific use case.
  2. Don’t trust by default – read the suggestions and be sure they make sense, always validate and be especially suspicious of long suggestions.
  3. Context is very important – keep related files open and provide clear and concise inputs.
  4. Time saver – you definitely type less with smarter code completion than your IDE offers, especially when its suggestions improve over time and adapt to your coding styles.


Written by Aleksandra Kovacevic

Senior Back-end Engineer at IT Labs