Designing social platforms in an evolving landscape

There are many types of users on social networks, and we are not talking about small numbers here. They range from end-users like you and me, to administrators, and all the way through to advertisers. With millions of people using these platforms, you can easily imagine how demanding they become from both a software and a hardware perspective. Such a system is not easy to put in place and maintain, which makes social networks a particularly intriguing subject. Let’s dig a little deeper into the requirements.

1. Requirements

Constant change – Change is the only constant, and social networks are not immune to the innovation storm that we all operate in. If software doesn’t change and renew itself in some way, its users simply lose interest. Social networking platforms are very demanding in this area; they need to adapt and change all the time. They steal ideas from one another while also innovating with new ones. They constantly experiment, trying out ideas on small populations of users or user groups. When an experiment produces desirable results, the design is taken forward and rolled out to the whole world.

High performance - Ten years ago, if you made a request in an application and the results came back within five seconds, that would have been acceptable1. Granted, some users would get impatient and annoyed, but not everybody. Nowadays, if a request takes longer than a second, people lose interest in using the application2. Given the modern user’s attention span, good performance is crucial, especially for social networking platforms. Simply put, high performance is all-important.

Easy UX – The next important requirement is that the platform is easy to navigate. Being able to find your way around the application is an essential aspect of user experience3. Not being able to see where you are, how to continue, and how to do the things you want to do will push users away from the platform.

There are many types of users, each with their own requirements and ways of using the platforms. First, there is the end-user. Most of us reading this are end-users of social networks, and we expect everything discussed above to happen. Next, there are advertisers. Advertisers are a crucial part of these platforms: they are the main revenue engine for social networks, so the business cannot easily ignore them. Finally, there are the page admins. A case in point is Facebook, where you have pages that you administer. For this, you need special credentials, permissions, and rules. There are other usage-related features as well, but focusing on the above is enough for now.

2. Change and evolve

Social networking platforms may look fine as they are. Still, we, the architects, are often expected to define the architecture at the very beginning and have that same architecture remain for as long as the social network stays alive. Facebook, for example, has been live for more than 15 years; designing something at the outset that withstands that much change is practically impossible.

If we draw an analogy with the Waterfall and Agile project management approaches, we can say that under Waterfall, all the requirements are set at the beginning. Once you have all the requirements and all the documentation, it’s easy to follow the project and do what needs to be done4.

In the Agile world, it’s very different. You cannot simply require all those things upfront. You must adapt to never-ending changes every iteration. So, if the requirements are constantly changing, should the architecture also change? Can we make it possible for the architecture to evolve together with the requirements5?

Before we dive into that, let’s cover some background that will give the discussion context.

2.1 Architectural types

Let’s discuss what architectural types there are.

Technical architecture type – What does this mean? This is the realm of software engineers and developers. It includes frameworks, libraries, the connections between classes, and everything else technical in the platform.

Data architecture type - This architecture focuses on everything related to data: database schemas, table layouts, data redundancy, replica servers, and so on.

Security architecture type - Security architecture covers security measures, legal compliance, and any guidelines that must be followed.

Domain-Driven architecture - Probably one of the most discussed architecture types of recent years. Domain-Driven architecture focuses on the bounded context. A bounded context is a group of functionalities that work together, with only a few of them exposed to the outside world. The idea is that whatever happens inside the context stays inside it and doesn’t break the logic of anything outside that context.
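
To make the idea concrete, here is a minimal Python sketch of a hypothetical “feeds” bounded context. The names `FeedsContext`, `publish`, and `latest` are illustrative assumptions, not part of any real platform: the internal data model stays private, and only two operations are exposed to the outside world.

```python
from dataclasses import dataclass


@dataclass
class _Post:
    # Internal model: never handed to code outside the bounded context.
    author: str
    text: str


class FeedsContext:
    """Hypothetical "feeds" bounded context with a deliberately small public surface."""

    def __init__(self) -> None:
        # Internal storage; it can be restructured without affecting callers.
        self._posts: list[_Post] = []

    # Public contract: the only operations other contexts may use.
    def publish(self, author: str, text: str) -> None:
        self._posts.append(_Post(author, text))

    def latest(self, limit: int = 10) -> list[str]:
        # Return plain strings (newest first) rather than internal _Post objects.
        return [post.text for post in reversed(self._posts)][:limit]
```

Because callers only see `publish` and `latest`, the internal `_Post` model and its storage can change without breaking logic outside the context.
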
So we have covered the architectural types. Let’s now build up the picture further with architectural models.

2.2 Architectural models

N-tier monolith - The first well-known architectural model is the layered, or monolithic, architectural model. Some call it the n-tier architectural model.

In the layered architectural model illustrated above, we have five different layers. In an ideal world, you could change anything in any one of those layers, and everything would keep working without affecting the behaviour of the others. For example, changes in the service layer would have no impact outside of it, you wouldn’t need to touch any other layer, and the application would continue to work correctly. However, experienced developers of this type of architecture know that this is rarely true, if ever.

The second important thing about this architecture is that requests constantly flow up and down through the layers. Requests travel from the topmost layer to the bottommost layer, and responses travel back from the bottommost layer to the topmost. However, some of these layers are not relevant for every request6 (and in many cases are not needed at all). For example, in the illustration below, the service layer is not required and is therefore skipped.

This situation is known as the architecture sinkhole anti-pattern, where one of the layers is not required for most requests7. In this case, we simply skip that layer. However, once we accept skipping layers, we end up creating connections between layers wherever they seem necessary and coupling them as tightly as needed, which is generally very bad for the architecture.
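
The sinkhole situation can be sketched in a few lines of Python. The layer functions below are hypothetical stand-ins: the service layer simply forwards calls, and `is_sinkhole` applies the rule of thumb from footnote 7, flagging the layer when roughly 80 percent or more of requests are pure pass-through (the threshold is an assumption you would tune).

```python
# Persistence layer (an in-memory dict stands in for the database).
_USERS = {1: {"id": 1, "name": "Ana"}}


def repository_get_user(user_id: int):
    return _USERS.get(user_id)


# Service layer: pure pass-through, it adds no business logic (the "sinkhole").
def service_get_user(user_id: int):
    return repository_get_user(user_id)


# Presentation layer with closed layers: every request travels through the service layer.
def controller_get_user(user_id: int) -> dict:
    user = service_get_user(user_id)
    return {"status": 200, "body": user} if user else {"status": 404}


def is_sinkhole(passthrough_flags: list[bool], threshold: float = 0.8) -> bool:
    """True when the share of pure pass-through requests reaches the threshold."""
    if not passthrough_flags:
        return False
    share = sum(passthrough_flags) / len(passthrough_flags)
    return share >= threshold
```

A controller that called `repository_get_user` directly would be the “skipping” variant described above: it saves a hop, but it couples the presentation layer to persistence.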

The next consideration is how the architecture handles domains. Let’s assume that we would like to change one of the domains in the application, for example, the “feeds” feature that most social networks have nowadays. To change the “feeds” feature, we need to make changes in every one of these layers. In other words, the domain boundaries in this kind of application are not well defined, so this type of architecture is not what we want if we care about Domain-Driven Design. If that is not acceptable, the next step is to try to separate these functionalities into separate domain models.

Let’s illustrate this with an example. The first domain model holds the “feeds” functionality, the second holds the “ads” functionality, and the third holds everything else. This is fine as a concept. The only problem here is the database. Imagine a solution that is more than ten years old and is still alive, maintained, and expanded. Its database has a vast number of tables and relationships holding millions of records. It’s not easy to go to the database designer (or anybody who cares about the database and its data) and ask them to simply split the database the way the new architecture would prefer. In practice, this is very hard, and in many cases close to impossible once we account for the time and budget needed. So, even though Domain-Driven Design is a good approach, we still have tight coupling at the database level.

A similar approach is the microkernel architecture. In the context of this article, when we speak of microkernel architectures, we are referring to plugins8. An important point here is that these plugins are focused on services rather than domains. Browsers, for example, have microkernel architectures. You can install all kinds of plugins in a browser, but those plugins are service-based; they are rarely grouped by domain.

Next is the well-known microservices architecture, which many of us are already familiar with. It lets us separate the domains as much as we need, and with all domains separated, we can expand the architecture across multiple domains9. This makes it a good fit for Domain-Driven Design.

On top of the microservices approach, we can build something similar to a HATEOAS architecture. HATEOAS stands for “Hypermedia as the Engine of Application State”: each response carries hyperlinks that guide the client further through the application10. Such an application can be built on a microservices architecture, but HATEOAS is more of a conceptual architecture than a technical one.
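
As a rough illustration, the sketch below builds a HATEOAS-style response for a hypothetical post resource. It borrows the HAL convention of a `_links` object; the paths and the `method` hint are our own assumptions. The point is that the links offered depend on the resource’s current state, so the client follows links instead of hard-coding URLs.

```python
def post_representation(post_id: int, liked_by_me: bool) -> dict:
    """HAL-style representation of a hypothetical post resource."""
    base = f"/posts/{post_id}"
    links = {
        "self": {"href": base},
        "comments": {"href": f"{base}/comments"},
    }
    # The actions offered depend on the resource's current state, so the client
    # discovers what it can do next from the response instead of hard-coding URLs.
    if liked_by_me:
        links["unlike"] = {"href": f"{base}/likes/me", "method": "DELETE"}
    else:
        links["like"] = {"href": f"{base}/likes", "method": "POST"}
    return {"id": post_id, "_links": links}
```

If the URL structure changes later, only the server that builds these links needs updating; clients that follow `_links` keep working.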

3. Practical Aspects

Let’s now discuss the practical aspects of architecture. As mentioned earlier, social networks require us to change, adapt, and upgrade the architecture constantly. For this, we need to take care of several things.

3.1 Dimensions

We need to identify and prioritize the dimensions that matter for the outcome we are trying to achieve. There are many dimensions to any architecture, and it’s imperative that we prioritize what is most important to the platform. Examples of dimensions are auditability, performance, security, data, legality11, and scalability12. Determining these dimensions lets us focus on what our application actually needs. Once we have set priorities, we can focus on how to measure and continuously improve these dimensions. We do this with so-called “fitness functions”.

3.2 Fitness functions

Fitness functions are tests for the non-functional requirements of an application13. Implementing them also helps us determine the cost of maintenance. Let’s look at a few possible fitness functions.

If performance is important to us, we need to implement functions that, with each delivery, will test our application for its performance. If, for example, a page request takes longer than one second (and one second is our limit), we become aware that there is a problem. We do not push the release live; instead, we resolve the issue there and then.
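
A minimal sketch of such a performance fitness function, assuming the request under test can be wrapped in a Python callable (for example, a hypothetical HTTP call to the page). It takes the 95th percentile (nearest-rank) of several timed runs and compares it with the one-second limit; the percentile and sample count are our choices, not a standard.

```python
import math
import time
from typing import Callable


def p95_latency(request: Callable[[], object], samples: int = 20) -> float:
    """Time `request` several times and return the 95th percentile in seconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        request()
        timings.append(time.perf_counter() - start)
    timings.sort()
    # Nearest-rank percentile: the smallest value covering 95% of samples.
    return timings[math.ceil(0.95 * len(timings)) - 1]


def performance_fitness(request: Callable[[], object],
                        limit_s: float = 1.0, samples: int = 20) -> bool:
    """Pass only if the 95th-percentile latency stays within the limit."""
    return p95_latency(request, samples) <= limit_s
```

In a delivery pipeline, a `False` result would block the release until the regression is fixed.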

The second fitness function might be about environmental elasticity. Most of us use Cloud environments, and we know that in the Cloud, the fleet of servers can be extended elastically. Whenever there is a huge number of requests, the Cloud can, by itself, spin up new servers, new instances, or new Docker containers (whatever is necessary), and so manage to serve all requests. The question is: are those instances actually raised when we need them, and how can we test that? A very simple way to test the elasticity of a Cloud environment is to set up a test environment in the Cloud, prepare load tests, and run them as many times as we need. This lets us monitor and measure how the instances behave in specific scenarios and conditions. If, for example, we send 4,000 requests and expect a new instance to be raised, we can see whether things transpire as planned. If the application doesn’t behave correctly, we can act to remedy the issue accordingly. The same goes for bringing instances down: when we reduce the number of requests, we observe whether the application behaves as it should. If it doesn’t, we know we need to make changes.
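
The load-test check above can be expressed as a small fitness function. This sketch assumes a simple scaling policy (each instance serves a fixed number of requests per minute, within minimum and maximum fleet sizes) and that the load-test tooling reports the observed instance count once scaling has settled; `LoadSample` and the capacity figure are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class LoadSample:
    requests_per_minute: int
    observed_instances: int  # instance count reported once scaling has settled


def expected_instances(load: int, capacity_per_instance: int,
                       min_instances: int = 1, max_instances: int = 20) -> int:
    # Instances needed to serve the load, clamped to the fleet's configured bounds.
    needed = math.ceil(load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


def elasticity_violations(samples: list[LoadSample], capacity_per_instance: int,
                          tolerance: int = 0) -> list[LoadSample]:
    # A sample violates the fitness function when the fleet did not scale out
    # (too few instances) or did not scale back in (too many) within tolerance.
    return [
        s for s in samples
        if abs(s.observed_instances
               - expected_instances(s.requests_per_minute, capacity_per_instance)) > tolerance
    ]
```

With a capacity of 2,000 requests per minute per instance, a window of 4,000 requests per minute should settle at two instances, and a quiet window of 500 should scale back to one; a fleet still running two instances at 500 is reported as a scale-in failure.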

The next type of fitness function is a listener inside your application that records which functions and methods are actually used. If we have been building code for more than ten years, we have lots of legacy code that most probably doesn’t even need to be there anymore. With a fitness function that tracks which methods are executed and which are not, and reports on it, we can easily clean unnecessary items out of our code14.
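
One lightweight way to build such a listener is a decorator that records every call. The sketch below is a hypothetical in-process version (`tracked`, `unused_report`, and the two sample functions are our names); in a real system, the data would more likely come from coverage or telemetry tooling aggregated over a representative period.

```python
import functools
from typing import Callable

_registered: set[str] = set()  # every function wrapped by @tracked
_called: set[str] = set()      # functions executed at least once


def tracked(func: Callable) -> Callable:
    """Hypothetical listener: register the function and record each call."""
    name = func.__qualname__
    _registered.add(name)

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        _called.add(name)
        return func(*args, **kwargs)

    return wrapper


def unused_report() -> list[str]:
    """Names of tracked functions that were never executed."""
    return sorted(_registered - _called)


@tracked
def render_feed() -> str:
    return "feed"


@tracked
def legacy_export() -> str:  # suspected dead code
    return "csv"
```

After running the application for a while, `unused_report()` lists the candidates for removal.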

Another example concerns accessibility for people with disabilities. Each time we push a release to the public, we can run accessibility tests15. If any pages don’t fully support what our users need, we can simply roll back the deployment and correct those issues.
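
Full accessibility testing needs dedicated tooling, but a simple automated rule shows the shape of such a fitness function. This sketch uses Python’s standard `html.parser` to flag `<img>` elements with no `alt` attribute; it is one example check, assumed for illustration, not a complete accessibility audit.

```python
from html.parser import HTMLParser


class MissingAltChecker(HTMLParser):
    """Collects the src of every <img> element that lacks an alt attribute."""

    def __init__(self) -> None:
        super().__init__()
        self.violations: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <img ... /> tags also arrive here via handle_startendtag.
        if tag == "img":
            attr_map = dict(attrs)
            if "alt" not in attr_map:
                self.violations.append(attr_map.get("src") or "<no src>")


def accessibility_fitness(html: str) -> list[str]:
    """Return offending image sources; an empty list means the check passes."""
    checker = MissingAltChecker()
    checker.feed(html)
    checker.close()
    return checker.violations
```

A non-empty result would fail the release check. An empty `alt=""` is accepted, since that is how decorative images are marked.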

The last one we will discuss here is the ability of all services to fail gracefully. There are many services in a microservice application, and we must be aware of how they function, because some of them will most likely break16. When that happens, we need to be prepared, knowing what’s going on in our architecture and how our application behaves.
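
Graceful failure can be sketched as a wrapper that catches a dependency’s failure and serves a degraded response instead. In this sketch, the “recommendations” dependency, `call_with_fallback`, and `render_home` are hypothetical; a fitness function can then inject a failing dependency and assert that the page still renders.

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("graceful")


def call_with_fallback(call: Callable[[], T], fallback: T, name: str) -> T:
    """Call a dependency; if it fails, log it and return a degraded default."""
    try:
        return call()
    except Exception:
        # Record the failure so we know what is going on in the architecture.
        log.warning("dependency %s failed, serving fallback", name, exc_info=True)
        return fallback


def render_home(fetch_recommendations: Callable[[], list[str]]) -> dict:
    # Recommendations are optional: without them the page still renders.
    recs = call_with_fallback(fetch_recommendations, [], "recommendations")
    return {"status": 200, "recommendations": recs}
```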

3.3 Feature toggles

So far, this article has argued that our architecture will evolve and change constantly. So how do we prepare to welcome those changes? How do we deploy small changes to certain parts of the solution and test them?

This is where feature toggles come into play. A feature toggle is a mechanism that lets us switch different features on and off depending on when and how we need them. Feature toggles are useful for several things we need to do17.

One use of feature toggles is A/B testing. We have many services, and we should be able to run A/B tests on all of them. An A/B test lets us compare two or more implementations of a single service or functionality. Once we are satisfied with one of the implementations, we should be able to toggle off the others, or simply remove them if needed.
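
A common way to implement the A/B split behind a toggle is deterministic hashing, so each user always lands in the same variant. The sketch below assumes variants are identified by strings and that the toggle configuration supplies the list of currently enabled variants; the experiment name used with it is hypothetical.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, enabled_variants: list[str]) -> str:
    """Deterministically map a user to one of the currently enabled variants."""
    if not enabled_variants:
        raise ValueError("at least one variant must be enabled")
    # Hashing experiment + user keeps assignments stable across requests and
    # independent between experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return enabled_variants[int(digest, 16) % len(enabled_variants)]
```

Toggling off the losing implementation amounts to removing it from `enabled_variants`; with a single variant left, every user receives it. Note that changing the list while an experiment is still running reshuffles assignments, so it is best done once the comparison is finished.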

Another use is canary releases. A canary release rolls out a change to only a certain group of users. Suppose, for example, that we have new functionality in our application and we don’t know whether end-users will accept it. Using canary releases, we can create a new service and toggle the functionality on for only a specific set of users, such as employees in our office, or users in a single geographical state. That user group then becomes the testers for our new functionality. Since the functionality is tested in the production environment, with actual users as testers, this is arguably the best testing group an application can ever have, although I don’t recommend doing this too often18.
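
A canary toggle can be as simple as a rule that matches users by group. In this sketch, the feature name, the `example.com` employee domain, and the `WA` region code are hypothetical; a real system would typically load such rules from a feature-flag service or configuration store rather than hard-coding them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    email: str
    region: str


# Hypothetical canary rules: which user groups see which new feature.
CANARY_RULES: dict[str, dict[str, set[str]]] = {
    "new_feed_layout": {
        "email_domains": {"example.com"},  # e.g. our own employees
        "regions": {"WA"},                 # e.g. a single geographical state
    },
}


def is_enabled(feature: str, user: User) -> bool:
    """Return True if the feature is toggled on for this user's group."""
    rule = CANARY_RULES.get(feature)
    if rule is None:
        return False  # unknown features stay off by default
    domain = user.email.rsplit("@", 1)[-1].lower()
    return domain in rule["email_domains"] or user.region in rule["regions"]
```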

Feature toggles can also be used for experimentation. Again, if there is something new we would like to try out, to see how it is received or whether it works correctly, we can use feature toggles to switch that functionality on and off.

3.4 Maintenance

Besides monitoring the services, we must be able to track the routes between them. Why is this routing important? Tracking routes tells us when a service stops being used, that is, when no other services route to it anymore. Services that are no longer routed to should be removed from the ecosystem automatically, with business needs determining the exact conditions and the resulting response. This is another example of a fitness function.
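
The routing check can be sketched as a set difference: deployed services minus those that appear as a destination in observed traffic. This assumes route data (caller, callee pairs) is available from tracing or a service mesh over a meaningful time window, and that entry points such as a gateway are excluded because nothing inside the ecosystem routes to them; all service names are hypothetical.

```python
def unrouted_services(deployed: set[str],
                      observed_routes: list[tuple[str, str]],
                      entry_points: set[str]) -> set[str]:
    """Deployed services that no other service routed to in the observed window."""
    routed_to = {callee for _caller, callee in observed_routes}
    # Entry points (e.g. an API gateway) receive external traffic only, so skip them.
    return deployed - routed_to - entry_points
```

The result is a list of candidates; the business rules decide whether a flagged service is removed automatically or reviewed first.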

4. Conclusion

All of what we have discussed in this article helps us manage changing requirements better, allowing architects to modify and change their architectures in step with changing conditions. We should be able to evaluate the impact of any modification quickly and decide whether to push the change forward, fine-tune it, or roll it back. As you can see, all these approaches enable an evolutionary architecture for our software.

So, to recap: to embrace the future of our architecture so that it can change and evolve, we first need to design it in a Domain-Driven fashion. Second, we should identify our priorities, because we cannot test and monitor everything. Based on those priorities, we should create fitness functions, at least for the most important cases. Among other things, fitness functions also make it easier to measure the cost of maintenance, and the cost of maintenance in effort, time, and money is one of the most important concerns for a software architect. Feature toggles are a good tool for deploying small parts, testing them, seeing how the application behaves and how end-users receive them, and then, depending on needs, turning the functionality or service on or off. As for maintenance, knowing how services are performing and which are no longer used allows us to keep the solution clean and neat, enabling more efficient maintenance.

By: Ilija Mishov, Co-Founder of IT Labs
Co-author: Erin Traeger, Business Analyst at IT Labs


1 In 2011, perhaps 35% of users abandoned a page by the 10-second mark and 25% by 4 seconds, which indicates a higher tolerance.

2 In 2016, some sites had an INCREASE in load time whilst trying to offer a more immersive experience. This hurt their sales. Included is a statistic “Split second” for Nordstrom, which says, “Nordstrom saw online sales fall 11% when its website response time slowed by just half a second.” Note that at that time, they believed 2.5 second load time was ideal.
As of 2018, the bounce rate was 32% for 1-3 second load time – see infographic at

3 Overall, UX is very important. But interestingly enough, in 2018, research conducted amongst mobile users indicates millennials not only “are more likely to blame a slow learning curve on the app itself. They won’t keep using an app if it frustrates them or doesn’t fulfill their needs.”, but also “Suprisingly, millennials may be attracted to Snapchat because its design is more complicated and less intuitive than other apps.” Further within, it argues that Snapchat is an example of “shareable design”, which “requires users to learn by watching others.” -

4 Guru99: Waterfall can have phases, though, so changes do occur.

5 “On an agile project you assume that you cannot fix the requirements of the system up-front. As a result having a detailed design phase at the beginning of a project becomes impractical The architecture of the system has to evolve through the various iterations of the software. Agile methods, in particular extreme programming (XP), have a number of practices that make this evolutionary architecture practical.” -

6 “Challenges” section backs up Ile’s statements about layers that aren’t needed for every request, “It’s easy to end up with a middle tier that just does CRUD operations on the database, adding extra latency without doing any useful work.” -

7 What architecture sinkhole anti-pattern is – see 2nd and 3rd paras under “Considerations”. It recommends that if 80 percent (from the 80-20 rule) are simple pass-through processing, then you may want to consider making some of the architecture layers open.

Another mention of sinkhole antipattern -

8 The article  in the “Pros” and “cons” sections gives reasons this wouldn’t be ideal for a high-traffic social network. Explains that there is the core and then plugins, with core containing minimal functionality needed to run the system. Also explains that generally not the ideal pattern to be used in high-performance applications, not highly scalable, and requires a thorough analysis of the design before implementation.

9 “Microservices” section – Also discusses advantages that correlate with what you’d want on a social network: “ability to scale only microservices needing to be scaled.” “Easier to rewrite pieces of the application because they’re smaller and less coupled to other parts.” As well, it says it’s ideal for “applications that would become very complex if combined into one monolith.”, which is what would happen to social networks that constantly push out new features. -

10 Good explanation of what HATEOAS is, why you’d want it, such as under the section “Why do we need HATEOAS?”: “The single most important reason for HATEOAS is loose coupling. If a consumer of a REST service needs to hard-code all the resource URLs, then it is tightly coupled with your service implementation. Instead, if you return the URLs it could use for the actions, then it is loosely coupled. There is no tight dependency on the URI structure, as it is specified and used from the response.” -

11 “Chapter 4. Scalability and Performance” – “A production-ready microservice is scalable and performant.” “Efficiency is of the utmost importance in real-world, large-scale distributed systems architecture, and microservices ecosystems are no exception to this rule. Scalability and performance are uniquely intertwined because of the effects they have on the efficiency of each microservice and the ecosystem as a whole…So, while scalability is related to how we divide and conquer the processing of tasks, performance is the measure of how efficiently the application processes those tasks.” -

12 Mentions how to handle data management and questions you should ask before implementing a microservice solution. The questions touch on data, auditability, security, legality…(see very start of article) -

13 Section “What is a Fitness Function?” – “…real-world architecture consists of many different dimensions, including requirements around performance, reliability, security, operability, coding standards, and integration, to name a few. We want a fitness function to represent each requirement for the architecture….Performance requirements make good use of fitness functions…Performance testing should be conducted early and frequently, in particular to pick up inflection points when performance changes radically (usually in the wrong direction) because of an update to code.” -

14 Interesting article arguing that fitness functions can be used to determine best method of refactoring and save money -

15 Lists accessibility as a non-functional requirement that can be tested by reviewing the application code, so fitness functions can be used. (Note that there were some discussions elsewhere which indicated some consider this functional, others a combination of functional and non-functional.) -

16 “Design for Failure” (pg26 pdf) and “Support for failure” (pg 26 pdf) – “The more microservices there are, the higher the likelihood at least one is currently failing”, “Key: design every service assuming that at some point, everything it depends on might disappear – must fail “gracefully”, “Goal: Support graceful degradation with service failures” -

17 This article covers the toggles mentioned and backs up reasons for using them. -

18 On pp34-35, “By implementing new features hidden underneath feature toggles, developers can safely deploy the feature to production without worrying about users seeing it prematurely.”, “One beneficial side effect of habitually building new features using feature toggles is the ability to perform QA tasks in production.”

Fast transformation of work environments due to COVID-19 crisis

The Covid-19 crisis has created shock-waves in all aspects of our lives, in particular the workplace. It's safe to say that workplaces will never be the same again. Take the very word "workplace": it used to mean a place people traveled to for work, and now it blends with the places where we eat, sleep, exercise, play, and socialize. The crisis has affected how people learn, work, and socialize in so many ways.

On a positive note, this crisis has created an opportunity to design something that many organizations were only slowly trying to embrace, and that many were scared even to contemplate. From an organizational perspective, the crisis is a valuable opportunity to change the way we work, benefiting both the business and the precious people who work in it. This brings us to the subject of "Fast transformation of the work environment". By digitalizing our communications and putting tools for remote collaboration in place, we can redesign how our employees converse with each other, as well as with clients, customers, and vendors. Done well, this can be a significant driver of success.

Switch to fully digital working environments

Post-virus winners will be the organizations that quickly recognized the change and have the courage to jettison their old paradigms and start afresh. Many organizations have promptly figured out how to adapt and serve customers and clients remotely. From remote learning in public schools and streaming fitness classes to tele-medicine in hospitals, every industry has accelerated its own transformation to a digital working environment.

Leaders need to prepare, setting expectations for the ways of working that will benefit the organization down the road so that employees can focus on the strategic business priorities of the future.

Many organizations will see and sense the opportunity to capitalize on cost savings by switching to fully digital working environments, thus reducing their operational costs.

Switching to fully digital working environments doesn't have to be overwhelming. It's not something to check off a list, but instead a mindset that becomes part of the organization's culture and experience.

Let us now cover the key elements that bring about the best results as part of this digitalization.


Companies must endeavor to remove all roadblocks and provide more streamlined information flow across the organization. Effective communication is an integral part of a well-performing group of people.

Many organizations have already embraced technology by using off-the-shelf communication and collaboration tools, reinforced by quick training sessions for their employees. People are using video communication more often, and organizations should create a governance structure that removes barriers and empowers frequent interactions with local decision-making.


Leaders and employees must understand and support each other like never before, championing transparency and constructive criticism to bring about the best results. It doesn't have to be all about the negative side of the spectrum (i.e. problems and challenges), either. Positive transparency is equally, if not more, powerful and important. Public recognition of excellence within our teams will earn even greater trust and loyalty from employees.

Leaders who seize this mindset, sooner rather than later, will be better prepared to engage employees for the long term, regardless of what is happening in the external environment.


Let us be honest and clear: change is hard! Ideally, leaders need to champion "ways of working" that move the organization forward rather than let it stagnate and hold it back. Employees must focus on the strategic business priorities of the future.

Instead of worrying about how employees spend their time, how they do their tasks, and how long it takes to get the job done, organizations should focus more on the results and value delivered by their work, that is, the outcomes.

Motivating employees to perform will require modeling and measurement of their outcomes and being clear on those metrics. Companies must set expectations for what drives organizational priorities and goals rather than discrete tasks.


We will see the end of this Coronavirus episode, and many people will desire to get back to their old working habits in their original working environments. Still, many will want to continue working in "the new way", thus creating a beautiful mix of old and new, leveraging the best of both worlds.

We never know what the future will bring. We don't know when and where there will be another crisis. If we wish to succeed, we should always be prepared, knowing how to adapt fast. Learning from previous experiences, and the unique challenges that present themselves, we can be ready to take advantage of the changes that come.

At IT Labs

For us as a team, all this came naturally, and we adapted overnight; one could even say almost instantly. From day one, we already had our online digital tools and processes set up, working remotely and in mixed teams with our remote colleagues, clients, and partners. With our accumulated wealth of experience, knowledge, and skills, the palette of remote managed services we offer, and the range of clients we support, we even helped many of those clients transform and switch to a fully digital working environment.

We take pride in who we are, what we are about, and how we can adapt to change fast. The fast transformation of our work environment is part of what we are. It's in our bones. We will continue to learn and adapt and teach others to do it as well as we do. So reach out and find out more.

Vladimir Ilievski,

Co-Founder and Managing Partner of IT Labs

There is no need to reinvent the wheel for User Identity Management

In this new era of digitalisation, one of the most valuable assets is data. Yet most data breaches are caused by weak authorisation, compromised credentials, and poor implementation of access control. For this reason, data protection and security must be priority number one when building a web application.

One of the core components in any architecture is user management, in particular authentication and authorisation. What most applications have in common is the need to know who a user is and whether that user has permission to perform a given action. We refer to this as identity management.

The cost of a do-it-yourself (DIY) approach to identity management should not be underestimated: it is not free, and it spends resources on something that already exists in the market. Developing this functionality keeps you away from your core business of delivering value to your end customers, and you would agree that’s where an organisation’s effort and time should ideally be invested, right?! Nowadays, companies and organisations are looking for ways to outsource user management to a service provider.

The good news is that there are several identity solutions that exist off-the-shelf that focus on precisely the functionality you need.

Choosing the right identity solution is one of the essential steps in designing a system. Simple applications might handle identity management themselves, but for larger and more complex systems, that’s not a recommended approach.

Choosing the identity and access management provider depends mainly on the specific business needs and requirements.

Identity, by definition, enables the right people to access the right resources, so authentication is the central piece of any software product.

First things first: what is an IdP? The core element of any identity management solution is the identity provider (IdP), a centralised place for storing digital user identities. The number of available identity management solutions keeps growing. With such a variety of services available, one must choose wisely, both to satisfy the business needs and to make sure the solution is delivered on time and on budget.

Azure AD B2C

Azure AD B2C is a managed Customer Identity and Access Management (CIAM) system, providing business-to-customer identity as a service. It is a cloud-based service built on top of Azure Active Directory. While Azure Active Directory should be the choice for corporate scenarios such as providing SSO, Azure AD B2C is better suited for public-facing applications that deal with external users.

Azure AD B2C can serve as a direct replacement for maintaining your own user identity database and authentication logic.

Azure AD B2C builds its security on two standard protocols, OpenID Connect and OAuth 2.0, while also providing seamless integration with your SaaS or on-premises applications and 99.9% guaranteed availability. Note, however, that no Service Level Agreement is provided for the free tier. In case of issues, one can only expect action if a ticket is raised with the Microsoft team, with the response time based on the service plan you have in place.
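
To make the OpenID Connect side more concrete, the following TypeScript sketch builds the authorisation request that starts a B2C sign-in user flow. It assumes the authorisation code flow and the b2clogin.com endpoint format; the tenant name contoso, the user flow B2C_1_signupsignin, the client ID and the redirect URI are hypothetical placeholders, not values from a real setup.

```typescript
// Parameters of an Azure AD B2C sign-in request. All example values used
// below (tenant, user flow, client ID, redirect URI) are hypothetical.
interface B2cAuthRequest {
  tenant: string; // B2C tenant name, e.g. "contoso"
  policy: string; // user flow (policy) name, e.g. "B2C_1_signupsignin"
  clientId: string; // application (client) ID from the app registration
  redirectUri: string; // where B2C sends the authorisation code
  scopes: string[]; // "openid" requests an ID token
  state: string; // opaque value echoed back, used to detect CSRF
  nonce: string; // value bound into the ID token, used to detect replay
}

// Encodes key/value pairs as a URL query string.
function toQuery(params: Record<string, string>): string {
  return Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
}

// Builds the OpenID Connect authorisation URL for a B2C user flow using the
// authorisation code flow. PKCE parameters are omitted to keep the sketch short.
function buildB2cAuthorizeUrl(req: B2cAuthRequest): string {
  const endpoint =
    `https://${req.tenant}.b2clogin.com/${req.tenant}.onmicrosoft.com/` +
    `${req.policy}/oauth2/v2.0/authorize`;
  return `${endpoint}?${toQuery({
    client_id: req.clientId,
    response_type: "code",
    redirect_uri: req.redirectUri,
    response_mode: "query",
    scope: req.scopes.join(" "),
    state: req.state,
    nonce: req.nonce,
  })}`;
}

const url = buildB2cAuthorizeUrl({
  tenant: "contoso",
  policy: "B2C_1_signupsignin",
  clientId: "00000000-0000-0000-0000-000000000000",
  redirectUri: "https://localhost:3000/callback",
  scopes: ["openid", "offline_access"],
  state: "random-state",
  nonce: "random-nonce",
});
console.log(url);
```

In a real application, a library such as MSAL (Microsoft Authentication Library) would normally build this request, including PKCE, so the sketch only shows what travels over the wire.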

Data storage for Azure AD B2C is located in the United States, Europe or the Asia Pacific region.

Setting up Azure AD B2C through the Azure portal is a straightforward, user-friendly experience.

Azure AD B2C lets its pages (e.g. sign-in, sign-up and password reset) have the same look and feel as your application. All of this can be achieved through the UI, via user flows or custom policies. The recommended approach here is to define custom user flows through the Azure portal for processes such as password reset or sign-up.

User flows provide several built-in templates. They also offer the flexibility to use customised HTML and CSS. The customised UI content must be hosted on a publicly available HTTPS endpoint that supports CORS, such as AWS S3, a CDN or Azure Blob storage. There is also a brand-new feature named Company branding, which enables injecting a banner logo, a background image and even a background colour. Unfortunately, at the time of writing this article, it is still in public preview. In any case, additional customisations can be done with custom JavaScript code.

Multi-Factor Authentication

As an additional security step, multi-factor authentication can be enabled. By using custom policies, one can also configure password complexity (note: the default password complexity is set to Strong). Any policy requirement can be enforced as needed, with error messages that update dynamically as each password requirement is met (or not).
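
Dynamically updating error messages boil down to re-evaluating every rule on each keystroke and showing only the unmet ones. A minimal TypeScript sketch of that idea follows; the rule set (8 to 64 characters, at least three of four character classes) is an assumption modelled on a typical strong policy, not a copy of Azure AD B2C's configuration.

```typescript
// A rule pairs a check with the message shown while the rule is unmet.
interface PasswordRule {
  message: string;
  test: (password: string) => boolean;
}

// Character classes: lowercase, uppercase, digits, symbols. The "3 of 4"
// requirement below is an assumed example policy.
const characterClasses: RegExp[] = [/[a-z]/, /[A-Z]/, /[0-9]/, /[^A-Za-z0-9]/];

const rules: PasswordRule[] = [
  {
    message: "Use between 8 and 64 characters.",
    test: (p) => p.length >= 8 && p.length <= 64,
  },
  {
    message: "Use at least 3 of: lowercase, uppercase, digits, symbols.",
    test: (p) => characterClasses.filter((re) => re.test(p)).length >= 3,
  },
];

// Returns the message of every rule the password does not yet satisfy.
// Calling it on each keystroke lets the UI update the list as the user types.
function unmetRequirements(password: string): string[] {
  return rules.filter((rule) => !rule.test(password)).map((rule) => rule.message);
}

console.log(unmetRequirements("abcdefgh"));
```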

Azure AD B2C also provides language customisation, either by using the 36 languages supported by Microsoft or by supplying your own translations for languages that are not provided by default.

With Azure AD B2C, we can use either social identity providers like Google, Amazon, Facebook, LinkedIn and Twitter, or external identity providers that support standard identity protocols such as OAuth 2.0 and OpenID Connect.

Azure AD B2C emits audit logs for each issued token and each administrator access; these logs are available for seven days. It also provides activity reports for each admin sign-in, along with usage reports on the number of users and logins. These can be used to analyse the data and create alerts on specific events.

Pros

  • Secure, using OpenID Connect and OAuth 2.0 protocols
  • UI customisation: page look and feel can be customised
  • Localisation
  • MFA
  • 99.9% availability per SLA
  • SSO

Cons

  • Not cost-effective
  • The data can be accessed only through PowerShell to Azure AD

Identity Server 4

Among identity solutions, Identity Server 4 has long been the one many teams turn to first. It is open-source and free to use, and it provides a centralised login flow for all applications, whether web or mobile.

Identity Server 4 has built-in support for the OpenID Connect and OAuth 2.0 protocols. A SAML plugin is available in case one needs to support SAML-based IdPs.
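
As an illustration of the OAuth 2.0 side, the sketch below describes the request a back-end service might send to IdentityServer4's token endpoint (/connect/token) using the client credentials grant. The authority https://identity.example.com, the client reporting-service, its secret and the api1 scope are hypothetical and would have to match a client configured on the server.

```typescript
// Everything needed to send the token request with any HTTP client.
interface TokenRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Encodes key/value pairs as an application/x-www-form-urlencoded body.
function formEncode(params: Record<string, string>): string {
  return Object.keys(params)
    .map((key) => `${encodeURIComponent(key)}=${encodeURIComponent(params[key])}`)
    .join("&");
}

// Describes an OAuth 2.0 client credentials request against the token
// endpoint. The client secret is sent in the form body (client_secret_post).
function clientCredentialsRequest(
  authority: string,
  clientId: string,
  clientSecret: string,
  scopes: string[],
): TokenRequest {
  return {
    url: `${authority.replace(/\/+$/, "")}/connect/token`, // strip trailing slashes
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: formEncode({
      grant_type: "client_credentials",
      client_id: clientId,
      client_secret: clientSecret,
      scope: scopes.join(" "),
    }),
  };
}

// Hypothetical client registration.
const tokenRequest = clientCredentialsRequest(
  "https://identity.example.com/",
  "reporting-service",
  "s3cret",
  ["api1"],
);
console.log(tokenRequest.url, tokenRequest.body);
```

The request can then be sent with fetch; a successful response is JSON containing access_token, token_type and expires_in. HTTP Basic authentication of the client is the usual alternative to putting the secret in the body.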

Identity Server 4 also supports external identity providers such as Facebook, Azure AD and Google.

Identity Server 4 is middleware that turns an ASP.NET Core application into an authentication server, hosted on a separate instance.

From a scalability perspective, IdentityServer4 does not provide scaling out of the box. However, it can be scaled out by running several instances behind a load balancer.
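
The load-balancing idea itself is simple. The TypeScript sketch below shows plain round-robin selection that skips unhealthy instances; the instance URLs are hypothetical, and in practice a cloud load balancer or reverse proxy does this work.

```typescript
// An IdentityServer4 instance behind the balancer. URLs are hypothetical and
// the health flag would normally be updated by periodic health probes.
interface Instance {
  url: string;
  healthy: boolean;
}

// Minimal round-robin load balancer: requests are spread evenly across the
// instances, skipping any instance currently marked unhealthy.
class RoundRobinBalancer {
  private readonly instances: Instance[];
  private next = 0;

  constructor(instances: Instance[]) {
    this.instances = instances;
  }

  // Returns the next healthy instance, or undefined if none is available.
  pick(): Instance | undefined {
    for (let tried = 0; tried < this.instances.length; tried++) {
      const candidate = this.instances[this.next];
      this.next = (this.next + 1) % this.instances.length;
      if (candidate.healthy) return candidate;
    }
    return undefined;
  }
}

const balancer = new RoundRobinBalancer([
  { url: "https://idsrv-1.internal", healthy: true },
  { url: "https://idsrv-2.internal", healthy: false },
  { url: "https://idsrv-3.internal", healthy: true },
]);
```

Whatever balancer is used, all instances must share the same signing key and configuration and operational stores, so that a token or authorisation code issued by one instance is accepted by another.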

In addition to logging, IdentityServer4 emits events, which provide more useful information because they contain data in a structured form.

Direct access to the user identities in the database makes migration activities easier.

If you would like to try out this solution, there is a handy IdentityServer4 demo instance to play with.

The most significant advantage of IdentityServer4 is that it is open-source: the full code base is available on GitHub and can therefore be customised to the needs of a particular use case.

As for customisation, in a multi-tenant solution separate pages can be implemented for each tenant, and the internal navigation between them can be achieved by extending the AuthorizeInteractionResponseGenerator class and overriding its ProcessInteractionAsync method.
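
IdentityServer4 itself is written in C#, so the TypeScript below is only a language-neutral sketch of the decision such an override might make: read the tenant hint a client can pass in acr_values (IdentityServer4 recognises a tenant: prefix there) and choose that tenant's login page. The tenant names and page paths are hypothetical.

```typescript
// Hypothetical tenants and their dedicated login pages.
const tenantLoginPages = new Map<string, string>([
  ["acme", "/acme/login"],
  ["globex", "/globex/login"],
]);
const defaultLoginPage = "/login";

// Extracts the tenant hint from a space-separated acr_values string,
// e.g. "idp:local tenant:acme" yields "acme".
function tenantFromAcrValues(acrValues: string | undefined): string | undefined {
  const hint = (acrValues ?? "")
    .split(/\s+/)
    .find((value) => value.startsWith("tenant:"));
  return hint?.slice("tenant:".length);
}

// Chooses the login page to send the user to, falling back to the shared
// page when the tenant is missing or unknown.
function loginPageFor(acrValues: string | undefined): string {
  const tenant = tenantFromAcrValues(acrValues);
  const page = tenant === undefined ? undefined : tenantLoginPages.get(tenant);
  return page ?? defaultLoginPage;
}

console.log(loginPageFor("idp:local tenant:acme"));
```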

Since there is no user interface (not even for admin purposes), IdentityServer4 can only be configured by directly updating the database or changing the code itself. Luckily, if an out-of-the-box admin UI is needed, there is a paid admin plugin for precisely this purpose.

Pros

  • Core solution: free of charge
  • Good documentation
  • Easily extendable
  • Configuration as code
  • Since it’s a framework and not a hosted identity service (IDaaS), we can adapt it to our system by writing extension code

Cons

  • Multi-factor authentication is not built in; it needs a third-party solution
  • Localisation: needs to be developed
  • The server’s code template lacks:
    • user registration
    • ‘forgot password’ functionality
    • MFA or Google reCAPTCHA

Amazon Cognito

Amazon Cognito is a cloud service for user and identity management that lets you manage users in one place across multiple devices. It syncs user information securely and in a straightforward manner, and it can scale to hundreds of millions of users.

The two core components of Amazon Cognito are user pools and identity pools.

User pools act as an identity provider, storing user information and providing authentication. Identity pools, in turn, grant users temporary AWS credentials for accessing other AWS services.

After successful authentication, an Amazon Cognito user pool returns three tokens. As defined in the OpenID Connect open standard, the ID token contains basic information about the identity of the user. The access token contains the scopes and groups that determine which authorised resources the user may access. The refresh token contains the information needed to obtain new access and ID tokens.
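
The following TypeScript sketch shows how an API might use those claims once a Cognito access token has been decoded. It assumes the token's signature has already been verified against the user pool's published JSON Web Key Set; the orders/read scope and admins group are hypothetical, while token_use, exp, scope and cognito:groups are claims found in Cognito access tokens.

```typescript
// Subset of the claims in a decoded (and already verified) Cognito access token.
interface AccessTokenClaims {
  token_use: string; // "access" for access tokens, "id" for ID tokens
  exp: number; // expiry time, in seconds since the Unix epoch
  scope?: string; // space-separated OAuth scopes
  "cognito:groups"?: string[]; // user pool groups the user belongs to
}

type Decision = "allow" | "deny" | "refresh";

// Decides whether a request may proceed. "refresh" tells the client to use
// its refresh token to obtain new access and ID tokens, then retry.
function authorise(
  claims: AccessTokenClaims,
  requiredScope: string,
  requiredGroup: string,
  nowSeconds: number,
): Decision {
  if (claims.token_use !== "access") return "deny"; // ID tokens are not for APIs
  if (claims.exp <= nowSeconds) return "refresh";
  const scopes = (claims.scope ?? "").split(" ");
  const groups = claims["cognito:groups"] ?? [];
  return scopes.includes(requiredScope) && groups.includes(requiredGroup) ? "allow" : "deny";
}

const claims: AccessTokenClaims = {
  token_use: "access",
  exp: 2000,
  scope: "openid orders/read",
  "cognito:groups": ["admins"],
};
console.log(authorise(claims, "orders/read", "admins", 1000));
```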

Amazon Cognito allows customisation on multiple levels through AWS Lambda triggers, whether that is a custom welcome message after a successful sign-up or a trigger that migrates users from an existing user directory (such as AD) into a user pool. A Pre Token Generation trigger can modify the claims in the ID token before it is issued, and a Post Authentication trigger might be used to send logs to CloudWatch (e.g. when a user signs in from a new device).
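
For example, a Pre Token Generation trigger can add or suppress ID token claims through the claimsOverrideDetails part of the event's response. The TypeScript Lambda handler below is a minimal sketch under these assumptions: the event types are declared locally instead of coming from a type package, and the custom:tenant_id attribute, the tenant claim and the choice to suppress phone_number are hypothetical.

```typescript
// Minimal local types for the parts of the (version 1) Pre Token Generation
// event used here; a real project would normally use published type definitions.
interface PreTokenGenerationEvent {
  triggerSource: string;
  request: { userAttributes: Record<string, string> };
  response: {
    claimsOverrideDetails?: {
      claimsToAddOrOverride?: Record<string, string>;
      claimsToSuppress?: string[];
    };
  };
}

// Adds a hypothetical "tenant" claim taken from a hypothetical custom user
// attribute, and hides phone_number from the issued ID token.
export const handler = async (
  event: PreTokenGenerationEvent,
): Promise<PreTokenGenerationEvent> => {
  const tenantId = event.request.userAttributes["custom:tenant_id"];
  event.response.claimsOverrideDetails = {
    claimsToAddOrOverride: tenantId ? { tenant: tenantId } : {},
    claimsToSuppress: ["phone_number"],
  };
  return event; // Cognito reads the modified response from the returned event
};
```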

To strengthen security, multi-factor authentication can be enabled from the UI. The two options provided are SMS messages and time-based one-time passwords (TOTP). A common use case is to use TOTP as the second authentication step while keeping the SMS flow for the “forgot password” functionality.
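
For the curious, here is what the TOTP option computes. The self-contained TypeScript sketch below implements the RFC 6238 algorithm with its common defaults (30-second steps, HMAC-SHA1, six digits), with SHA-1 written out so that no libraries are needed. It is illustrative only: Cognito performs this verification itself, production code should rely on a vetted crypto library, and the base32 decoding of the shared secret used by authenticator apps is left out.

```typescript
// SHA-1 (FIPS 180-4), written out only so the sketch needs no libraries.
function sha1(message: Uint8Array): Uint8Array {
  // Pad: append 0x80, zero-fill, then the message length in bits (64-bit big-endian).
  const paddedLength = Math.ceil((message.length + 9) / 64) * 64;
  const data = new Uint8Array(paddedLength);
  data.set(message);
  data[message.length] = 0x80;
  const view = new DataView(data.buffer);
  const bitLength = message.length * 8;
  view.setUint32(paddedLength - 8, Math.floor(bitLength / 0x100000000));
  view.setUint32(paddedLength - 4, bitLength >>> 0);

  const rotl = (x: number, n: number): number => (x << n) | (x >>> (32 - n));
  const h = [0x67452301, 0xefcdab89, 0x98badcfe, 0x10325476, 0xc3d2e1f0];
  const w = new Uint32Array(80);

  for (let block = 0; block < paddedLength; block += 64) {
    for (let i = 0; i < 16; i++) w[i] = view.getUint32(block + i * 4);
    for (let i = 16; i < 80; i++) w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1);
    let [a, b, c, d, e] = h;
    for (let i = 0; i < 80; i++) {
      let f: number;
      let k: number;
      if (i < 20) { f = (b & c) | (~b & d); k = 0x5a827999; }
      else if (i < 40) { f = b ^ c ^ d; k = 0x6ed9eba1; }
      else if (i < 60) { f = (b & c) | (b & d) | (c & d); k = 0x8f1bbcdc; }
      else { f = b ^ c ^ d; k = 0xca62c1d6; }
      const temp = (rotl(a, 5) + f + e + k + w[i]) >>> 0;
      e = d; d = c; c = rotl(b, 30) >>> 0; b = a; a = temp;
    }
    h[0] = (h[0] + a) >>> 0; h[1] = (h[1] + b) >>> 0; h[2] = (h[2] + c) >>> 0;
    h[3] = (h[3] + d) >>> 0; h[4] = (h[4] + e) >>> 0;
  }
  const out = new Uint8Array(20);
  const outView = new DataView(out.buffer);
  h.forEach((word, i) => outView.setUint32(i * 4, word));
  return out;
}

// HMAC (RFC 2104) over SHA-1, whose block size is 64 bytes.
function hmacSha1(key: Uint8Array, message: Uint8Array): Uint8Array {
  const blockSize = 64;
  const paddedKey = new Uint8Array(blockSize);
  paddedKey.set(key.length > blockSize ? sha1(key) : key);
  const inner = new Uint8Array(blockSize + message.length);
  const outer = new Uint8Array(blockSize + 20);
  for (let i = 0; i < blockSize; i++) {
    inner[i] = paddedKey[i] ^ 0x36;
    outer[i] = paddedKey[i] ^ 0x5c;
  }
  inner.set(message, blockSize);
  outer.set(sha1(inner), blockSize);
  return sha1(outer);
}

// HOTP (RFC 4226): HMAC the 8-byte big-endian counter, then truncate dynamically.
function hotp(secret: Uint8Array, counter: number, digits = 6): string {
  const counterBytes = new Uint8Array(8);
  const view = new DataView(counterBytes.buffer);
  view.setUint32(0, Math.floor(counter / 0x100000000));
  view.setUint32(4, counter >>> 0);
  const mac = hmacSha1(secret, counterBytes);
  const offset = mac[19] & 0x0f;
  const binary =
    ((mac[offset] & 0x7f) << 24) | (mac[offset + 1] << 16) | (mac[offset + 2] << 8) | mac[offset + 3];
  return (binary % 10 ** digits).toString().padStart(digits, "0");
}

// TOTP (RFC 6238): the counter is the number of time steps since the Unix epoch.
function totp(secret: Uint8Array, unixSeconds: number, digits = 6, stepSeconds = 30): string {
  return hotp(secret, Math.floor(unixSeconds / stepSeconds), digits);
}
```

With the RFC test secret "12345678901234567890", the six-digit code for Unix time 59 is 287082 (94287082 with eight digits). Verifiers commonly also accept the code from an adjacent time step to tolerate clock drift between the phone and the server.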

Password policies can also be customised for particular use cases.

For applications that offer a trial, where users can play around with the product or service before purchasing, Amazon Cognito has a neat solution: guest login (unauthenticated identities in an identity pool), which enables restricted access.

Pros

  • User directory management and user profiles
  • Easy for sign-in and sign-up (resulting in faster development)
  • Sign-in using social network providers like Google, Facebook and Apple
  • MFA
  • User migration through AWS Lambda triggers
  • SSO
  • Supports access management via OAuth 2.0 (making authorisation easier)

Cons

  • Expensive security options
  • Less configuration control (compared to other options)
  • Poorly organised documentation




These days we rely on identity providers to securely connect our users to technologies and devices. The right identity solution must be chosen by weighing the business value and the available budget, without compromising security or the chosen security protocols. Also keep in mind the scalability and SLA of the solution.

Follow a “do not limit the user” approach by choosing a solution that provides various authentication methods layered with a user-friendly experience. The chosen IdP should protect user identities without making sign-in challenging or painful for the end user.

Aleksandra Gjinovska

Technical Lead at IT Labs

How Can You Get More Effective with DevOps?

The promise of DevOps is to provide a basis for collaboration between organisations and IT that produces superior customer value.

One of the most recent changes in IT operations and delivery has been a new awareness of how critical support for ongoing operations is, and of how the value chain should keep improving for the various businesses that IT supports. For large and mid-sized corporations whose core business is anything but IT, popular opinion holds that IT consumes company profits: it is essentially a cost centre. This has led to a high level of scrutiny in recent years, especially with tighter budgets and shrinking profits. The need for strict control of expenditure on IT development and support, while keeping the network running smoothly, is a growing concern for the CIO and CFO. In effect, this has turned into a struggle between maintaining tight operations with minimum expenditure and resources, and maximising support while continuing to produce results.

The industry has seen this trend growing over the past few years, and the introduction of lean and agile methodologies has helped management teams within these organisations understand how to get more value while spending less. Gone are the days of an overabundance of IT jobs, when IT departments were staffed with more personnel than needed in anticipation of unforeseen emergencies. Times have changed. Today, IT is leaner and forced to be more efficient through better software written specifically for enterprises, enhanced development tools and superior support structures.

Introducing DevOps

Information services teams within organisations have certainly matured over the past fifteen years. The bar has indeed been raised from even just a few years ago. IT departments have arrived at a point where the work they do is primarily for organisational self-consumption and self-sustainability.

Organisations are getting smarter, workplaces are getting more secure, and technologies are becoming more sophisticated with every six-month cycle. The concept of accomplishing more with less is driving organisations towards better, more strategic management of resources and people, being more efficient, and generating high business value while maximising profits.

DevOps (a blend of development and operations) is a term used to define a specialised set of resources and people who bring efficiency and agility to an organisation's processes. It is designed to make organisations and their IT departments smarter and more productive while reducing defects. DevOps assists in generating higher business value for the organisation while simultaneously lessening costs. This specialised grouping of resources and people wasn’t conceived yesterday; it has always existed within the IT realm of application management and support.

So, What Has Changed?

DevOps is simply the result of technology’s continual quest to find something new and refreshing to talk about year after year. Certainly, DevOps sounds trendy and exciting. On a serious note, though, DevOps is gaining a lot of ground within structured IT management and operations circles. DevOps is not a fad; it is here to stay. Although the terminology might change over the years, the underlying benefits will not, which makes creating a structure around DevOps essential. Organisations no longer see development costs merely as a benchmark for the product’s quality, value and profitability. Instead, they take the view that money saved is worth more when it is strategically invested in better technologies. This results in operational gains, which promote business success and help attract more customers while moving forward.

DevOps fosters a more cooperative, productive partnership between development and operations teams by improving communication and efficiency during critical planning and development stages. This reduces or eliminates the costs and problems commonly linked with unforeseen changes further down the road. Most personnel involved in DevOps apply Agile and enterprise principles, which help them deploy DevOps processes successfully.

DevOps focuses on typical key product development issues, such as testing and delivery, while stressing the business value of processes beyond release management, such as maintenance updates. This is accomplished through the adoption of iterative methods and incremental build models of development. Each milestone is carefully evaluated by the product development teams, analysed, and modified as needed. Only then does the team continue with the build and, ultimately, deployment. This continuous integration might seem tedious, but these frequent checks and balances make the entire deployment process smoother and more effective in the long run, as the need for backtracking and correcting mistakes is minimised.

The iterative approach of DevOps contrasts with more traditional methods. It emphasises strategic partnering among development team members, promoting communication among key personnel and making sure that every team member’s critical input is considered, which streamlines the development process even further. Communication and feedback are viewed as essential to reducing production costs and delivering business value, IT stability and efficiency. This path to more effective communication requires an excellent communication infrastructure to ensure that nothing gets missed and that no team member is out of the loop. All these activities can appear daunting, especially with geographically distributed teams that include a diverse range of resources. That is where the value of change management and release management is demonstrated. All team members must be aware of what is expected of them and agree to participate fully. Some typical communication challenges to be managed include:

  • “I didn’t know we were supposed to do that.”
  • “My team doesn’t have the expertise, time, or resources to complete this milestone.”
  • “Team A didn’t communicate to Team B what their requirements were.”

Such problems can be avoided by carefully managing and facilitating communication among team members, so that no surprises linger to be uncovered later.

Going Forward with DevOps

Ultimately, the broad definition of DevOps is simply a method to foster effective communications and collaboration amongst development and operations team members. It’s about:

  • delivering more with less
  • working smarter, not harder
  • getting things done quicker

The rise of social media and cloud computing necessitates the rapid, effective deployment of new IT systems. DevOps addresses the critical need for fewer maintenance releases while recognising that ‘downtime’ is unacceptable.

IT developers know the importance of business value, and DevOps helps them deliver it through faster product solutions, fewer problems, and added value from reduced costs and greater network and system stability. In addition to fostering communication and trust between departments, DevOps team members should also learn new skills, all of which has a positive trickle-down effect and ultimately leaves a significant, positive impact on the organisation.

Manoj Khanna

Chief Methodologist at IT Labs

E2E development challenges and overview of the E2E testing frameworks

E2E Development challenges

End-to-end (E2E) testing plays a significant role in the quality of any product, and it is highly desirable for it to be part of the software development process. If we look at the test pyramid, we can see where these tests fit into the bigger picture. As you can see, they are not additional or optional tests. So why do we rarely see them in projects?


Although E2E tests look like a small part of the testing process in this pyramid diagram, they are in fact a big part of the testing and quality strategy.

To use the iceberg analogy, they hold considerable value in terms of impact. Yet because they are less visible to development teams than other kinds of testing, they are often given lower priority and left for better times, when conditions allow. Which, as many of us know, is never.

The reasons for this missed opportunity vary: lack of funding, limited experience, or too little time to invest in it.

However, E2E testing has seen a resurgence in recent years, with new life breathed into both its popularity and its perceived value. The speed of delivering new software releases and features is increasingly crucial to whether a product succeeds or is overtaken by the competition. Significant innovation and integration with other systems also play an important role here. Manual testing of all substantial changes in short periods can no longer guarantee quality and the desired end-user experience. The image below shows a simple application and all its relevant environments.



So, to embrace this essential testing approach, it's necessary to add a new environment. Not surprisingly, this is called the E2E test environment, where tests can be executed continuously without false positives or false negatives. In other words, no matter how many times the tests are run, they should almost always give the same results. The more complex the system is, and the more it depends on other components, the more likely it is that a test will fail because of inconsistent behaviour in one of those components. For example, communication with the database may be slow or fail entirely, or the identity provider may be unavailable at a given moment.
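
One practical way to check whether an environment meets that bar is to rerun the suite several times against the same build and compare each test's outcomes. The TypeScript sketch below assumes a hypothetical history format that maps each test name to its results across those reruns.

```typescript
type Outcome = "pass" | "fail";

// Classifies tests by their outcomes across repeated runs of the same build.
// A test that both passed and failed without any code change is flaky: its
// result depends on the environment rather than on the system under test.
function classify(history: Record<string, Outcome[]>): { stable: string[]; flaky: string[] } {
  const stable: string[] = [];
  const flaky: string[] = [];
  for (const name of Object.keys(history)) {
    const distinctOutcomes = new Set(history[name]);
    (distinctOutcomes.size > 1 ? flaky : stable).push(name);
  }
  return { stable, flaky };
}

// Hypothetical results of three reruns against the same build.
const report = classify({
  login: ["pass", "pass", "pass"],
  checkout: ["pass", "fail", "pass"],
  search: ["fail", "fail", "fail"],
});
console.log(report);
```

A test that fails consistently, like search here, is not flaky: it points at a real defect or a broken test, whereas a mixed history usually points at the environment.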

The need for speedy software development has led to various tools and libraries that minimise large parts of the software's dependencies. So now we can separate out the development environment from the previous diagram, allowing each team member to develop, maintain and use E2E tests in isolation (in their local environment).


This means that all system configuration needs to live in the source code, i.e. become part of the software. That way, every change is automatically applied to each team member's environment (e.g. when the database version is upgraded: if that configuration is not in the code, developers have to upgrade the database manually).
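
As a minimal TypeScript illustration of configuration living in the code, the environment definition below pins the versions the E2E tests expect, and a small check reports drift from them. Service names and versions are hypothetical; in practice, a container definition such as a Docker Compose file often plays this role.

```typescript
// Hypothetical environment definition kept in the repository next to the
// tests, so a version bump reaches every developer through version control.
const e2eEnvironment = {
  database: { engine: "postgres", version: "16.2" },
  identityProvider: { image: "local-idp-stub", version: "1.4.0" },
} as const;

type ServiceName = keyof typeof e2eEnvironment;

// Compares the versions actually running locally with the pinned ones and
// returns readable mismatches; an empty array means the environment matches.
function findDrift(running: Record<ServiceName, string>): string[] {
  return (Object.keys(e2eEnvironment) as ServiceName[])
    .filter((name) => running[name] !== e2eEnvironment[name].version)
    .map((name) => `${name}: expected ${e2eEnvironment[name].version}, found ${running[name]}`);
}

console.log(findDrift({ database: "15.6", identityProvider: "1.4.0" }));
```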

This removes one of the biggest obstacles to developing and maintaining E2E tests. The following diagram shows what that environment would look like.


Like the local setup configuration, E2E tests are part of the code base and are maintained in parallel with system changes. Tests can therefore be created, changed and run independently of other developers, allowing rapid test development on each developer's local setup. If tests run for extended periods (a few hours, for example), there is a risk that the local machine (such as the developer's PC) will affect the performance of the system under test. In this case, a more stable environment is needed, one insulated from local interference such as CPU resources being used by other applications. There are also techniques for grouping tests: tests related to the current changes are executed in the local environment, while all other tests, not directly related to the change, are executed in another test environment.
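
The grouping technique can be sketched as follows, assuming each test is annotated with the application areas it exercises (the test names and areas are hypothetical). Tests touching a changed area run locally; the rest are deferred to the shared test environment.

```typescript
// Hypothetical annotation: which application areas each E2E test exercises.
interface E2ETest {
  name: string;
  areas: string[];
}

// Splits the suite: tests touching a changed area run locally for fast
// feedback, the remaining tests run later on the shared test environment.
function partitionByChange(
  tests: E2ETest[],
  changedAreas: string[],
): { local: string[]; remote: string[] } {
  const changed = new Set(changedAreas);
  const local: string[] = [];
  const remote: string[] = [];
  for (const test of tests) {
    (test.areas.some((area) => changed.has(area)) ? local : remote).push(test.name);
  }
  return { local, remote };
}

const plan = partitionByChange(
  [
    { name: "login", areas: ["auth"] },
    { name: "checkout", areas: ["cart", "payments"] },
    { name: "profile", areas: ["auth", "profile"] },
    { name: "search", areas: ["catalog"] },
  ],
  ["auth"],
);
console.log(plan);
```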

So it’s crucial to choose the right framework for developing the E2E tests based on the software requirements and what the frameworks offer.

E2E testing frameworks

There are several frameworks for executing automated E2E tests. By simulating the user flow from start to finish, these tests not only validate the system under test but also ensure that all other systems work and behave as expected.

It should also be noted that E2E tests don't need to check all possible scenarios, because much of the test coverage will already have been provided by unit tests. The idea is to check that those units work together as they should in an integrated user flow.

So, before developing E2E tests, we need to choose an appropriate framework that satisfies our needs. In the remainder of the article, let's look at JavaScript-based testing frameworks and wrappers. We focus on JavaScript because most companies nowadays are shifting testing left, i.e. moving it earlier in the project timeline and development lifecycle. As a result, developers can write E2E tests as part of their development practice, in a language they are very familiar with.

Nightwatch.js

Nightwatch is one of the most popular test automation frameworks, known for highly transparent and readable code. It uses the Selenium WebDriver API and allows the user to perform end-to-end testing, simplifying the process of writing automated tests and setting up CI in the development cycle.

Nightwatch's syntax is clean in that JavaScript is its only supported language, with CSS selectors and XPath used to locate elements. You can still locate elements by id, the most desirable locator, by converting it to a CSS id selector first.

Nightwatch has several features that make it a highly popular testing tool. Its built-in test runner has many useful options, such as running tests sequentially or in parallel. It also allows users to set implicit waits and to retry failed tests. Test suites can be grouped and tagged. Another powerful feature is its built-in command-line test runner, with Grunt support for executing the automated tests.
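
To illustrate what a retry option does, here is a generic TypeScript sketch, not Nightwatch's implementation: a failed test is re-run up to a fixed number of extra attempts, and the number of attempts is recorded.

```typescript
type TestFn = () => Promise<void>;

interface RetryResult {
  name: string;
  passed: boolean;
  attempts: number;
}

// Runs a test and re-runs it after a failure, up to `retries` extra attempts.
async function runWithRetries(name: string, test: TestFn, retries: number): Promise<RetryResult> {
  let attempts = 0;
  while (true) {
    attempts++;
    try {
      await test();
      return { name, passed: true, attempts };
    } catch {
      if (attempts > retries) return { name, passed: false, attempts };
    }
  }
}
```

Keeping the attempt count matters: a test that only passes on a retry is still a flakiness signal, and retries should not hide it.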

Compared to other frameworks, cloud support is one of its most significant benefits: tests can be run on a specific browser version on Sauce Labs and BrowserStack, so users don't need to install multiple versions of a browser on their machines.

As mentioned previously, Nightwatch supports CI, allowing tests to be created and integrated with systems such as TeamCity, Jenkins and Hudson.

The main advantages of using Nightwatch.js are the following:

  • Quick setup
  • Supports multiple browsers
  • A real browser is not required for test execution: supports headless execution, as well as cross-browser testing on the desired browser version
  • Supports good test organization
  • Supports 3rd party integration with Cucumber
  • Compatible with CI Integration

The main disadvantages of using Nightwatch.js are the following:

  • Weak documentation – you may not find all the information you need, compared to other frameworks on the market that have excellent resources.
  • Slow performance – test executions may have long periods with lots of waits
  • Slightly less available support compared to other frameworks

Cypress

Cypress is a relatively new product that has gained industry trust in a short time, with many satisfied users.

The major advantage of this framework is that it is an all-in-one solution, with a built-in assertion library, mocking and stubbing, and no dependency on Selenium. As a result, users don't need to install multiple tools to set up the environment where they write and run the tests. The image below illustrates this well: on the left-hand side, numerous tools need to be installed to run E2E tests, while on the right, Cypress does all of that on its own.

Cypress is also more developer-centric and focuses on making TDD part of its users' development process. Since it is not Selenium-based, like most other testing frameworks, its architecture is different. Selenium-based frameworks use WebDriver, which controls the browser remotely, from outside. Cypress, by contrast, runs inside the browser, which helps provide more consistent test results. This also means test execution is typically much faster than with Selenium-based tools.

Cypress also takes a screenshot of every step of the test, which makes the debugging process more straightforward. There is also comprehensive documentation to make the development of the tests easier for developers.

Cypress also supports visual regression testing, although its approach is more manual than that of other tools. Additionally, there is support for a cloud-based visual testing tool that integrates neatly into Cypress, allowing for visual regression testing.
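
At its core, visual regression compares a new screenshot with an approved baseline. The TypeScript sketch below assumes both screenshots are already decoded into raw RGBA buffers of the same size (PNG decoding is left out) and fails the check when the share of changed pixels exceeds a threshold; the default thresholds are arbitrary.

```typescript
// Returns the fraction of pixels whose channels differ by more than
// `tolerance` between two RGBA buffers (4 bytes per pixel) of equal size.
function diffRatio(baseline: Uint8Array, current: Uint8Array, tolerance = 0): number {
  if (baseline.length !== current.length || baseline.length % 4 !== 0) {
    throw new Error("Screenshots must be RGBA buffers of the same size");
  }
  const pixelCount = baseline.length / 4;
  let changedPixels = 0;
  for (let p = 0; p < pixelCount; p++) {
    for (let channel = 0; channel < 4; channel++) {
      const i = p * 4 + channel;
      if (Math.abs(baseline[i] - current[i]) > tolerance) {
        changedPixels++;
        break; // count each pixel at most once
      }
    }
  }
  return pixelCount === 0 ? 0 : changedPixels / pixelCount;
}

// The visual check passes while the changed share stays within the threshold.
function matchesBaseline(
  baseline: Uint8Array,
  current: Uint8Array,
  maxDiffRatio = 0.001,
  tolerance = 0,
): boolean {
  return diffRatio(baseline, current, tolerance) <= maxDiffRatio;
}
```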

The main advantages of using Cypress are the following:

  • Easy to set up and write E2E tests
  • Test runner makes debugging straightforward
  • It has excellent documentation and support
  • Fast test execution
  • Takes a screenshot of each executed step (whether the test passes or fails)
  • Compatible with CI Integration

The main disadvantages of using Cypress are the following:

  • Some functions are still not built in and require workarounds and libraries (e.g. file upload, SSO login such as signing in with Amazon Cognito, and many others)
  • You need to pay to unlock the full version (free version supports up to 500 tests)

Puppeteer

Puppeteer is a Node.js library that runs headless by default but can also run with a visible browser (note: only with the latest versions of Chrome or Chromium). Every run creates its own browser user profile, which is cleaned up on the next run.

It shares some functionality with Cypress, such as clicking elements and filling in fields, but it can also handle popups and SSO logins, which Cypress cannot.

Puppeteer can be used in combination with other frameworks; for example, it can fill in the functionality gaps that Cypress has. A typical case is SSO login, which matters because login is usually the first step of an E2E flow and is reused in many other tests.
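
Reusing the login step can be sketched as a small session cache: the SSO login (for example, a Puppeteer script) runs once, and every test receives the same session until it expires. The login function and session shape below are hypothetical placeholders.

```typescript
// Hypothetical session produced by an SSO login script.
interface Session {
  cookies: Record<string, string>;
  expiresAt: number; // milliseconds since the Unix epoch
}

type LoginFn = () => Promise<Session>;

// Performs the SSO login once and hands the same session to every test that
// asks for it; concurrent callers share one in-flight login, and a new login
// happens only after the cached session has expired.
function createSessionCache(login: LoginFn, now: () => number = Date.now) {
  let cached: Session | undefined;
  let pending: Promise<Session> | undefined;
  return async function getSession(): Promise<Session> {
    if (cached && cached.expiresAt > now()) return cached;
    if (!pending) {
      pending = login().then(
        (session) => {
          cached = session;
          pending = undefined;
          return session;
        },
        (error) => {
          pending = undefined; // allow a later retry after a failed login
          throw error;
        },
      );
    }
    return pending;
  };
}
```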

The main advantages of using Puppeteer are the following:

  • Runs a real browser when executing the tests, which means the quality of the test is much higher
  • It also supports headless browser execution
  • Works well for visual testing
  • Supports testing in offline mode
  • Can take screenshots of webpages

The main disadvantages of using Puppeteer are:

  • Not suitable for cross-browser testing since it supports Chromium only

Protractor

Protractor is one of the earliest frameworks for automated E2E testing and is dedicated to testing AngularJS and Angular applications. It allows for cross-browser testing since it wraps Selenium, and it also has additional locators for selecting elements, such as repeater, model and binding.

It also supports the Page Object model, so the tests and locators can be organized in a desired manner.
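
A page object can be sketched in TypeScript as follows. To keep the example self-contained, it uses a hypothetical minimal Driver interface rather than Protractor's actual API; the locators are hypothetical as well.

```typescript
// Hypothetical minimal driver abstraction standing in for a framework's API.
interface Driver {
  type(locator: string, text: string): Promise<void>;
  click(locator: string): Promise<void>;
  textOf(locator: string): Promise<string>;
}

// Page object: locators live in one place, and tests speak in user actions.
class LoginPage {
  private static readonly usernameField = "#username";
  private static readonly passwordField = "#password";
  private static readonly submitButton = "button[type=submit]";
  private static readonly errorBanner = ".login-error";

  private readonly driver: Driver;

  constructor(driver: Driver) {
    this.driver = driver;
  }

  async logIn(user: string, password: string): Promise<void> {
    await this.driver.type(LoginPage.usernameField, user);
    await this.driver.type(LoginPage.passwordField, password);
    await this.driver.click(LoginPage.submitButton);
  }

  errorMessage(): Promise<string> {
    return this.driver.textOf(LoginPage.errorBanner);
  }
}
```

Because tests only call logIn and errorMessage, a changed locator needs updating in one place.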

Installation is easy, and Protractor has good documentation of the API.

A big plus of the framework is automatic waiting: Protractor executes the next step of the test as soon as the web application finishes its pending tasks.

The main advantages of using Protractor are the following:

  • Allows use of angular specific commands (identifying the elements for angular.js)
  • Runs on multiple browsers in parallel
  • A real browser is required for test execution, supporting headless browser execution and cross-browser testing (via Selenium)
  • Compatible with CI Integration

The main disadvantages of using Protractor are:

  • Hard to debug
  • Returning a value from a Protractor promise is hard
  • You cannot simulate a real user


All testing frameworks have their pros and cons. Selecting the right one depends on several factors.

It is essential that the framework you use has excellent support, and that the engineers in charge of developing the E2E tests are familiar with it. Don't use a tool just because it's popular and widely used. If the team lacks the knowledge to use it fully, they won't be able to take full advantage of it.

Depending on the software requirements, factor in whether the tests need to be executed on multiple browsers and browser versions. Conversely, if your application only supports Chrome, for example, multi-browser execution is irrelevant.

If your project has a specific CI infrastructure, you will need to consider how compatible the E2E framework is with it.

To summarise, choose the right framework based on criteria that suit your capabilities and needs.

Aleksandra Angelovska
QA Lead at IT Labs

Jovica Krstevski
Technical Lead at IT Labs