There are many different versions of the Testing Pyramid on various websites. Here is the correct one:
Each layer of the testing pyramid tests the integration of artifacts from the adjacent lower layer, except for Unit testing, which is the lowest layer. Testing at each level consists of both functional and non-functional test cases. Avoid the bare term "Integration testing", because it does not say what is being integrated!
Unit testing tests the smallest units of software. Typically, the smallest unit of software is a function. Each class is tested through its public interface (its public functions). There should be a test suite for each public function in a class, and each test suite should contain at most 4-7 test cases. If a test suite needs more test cases than that, the function is probably too big and doing too many things, and it should be split into multiple smaller functions.
In Unit testing, you should always aim for 100% test coverage. With modern tools, this should be possible. And yes, you have to test all the exceptional cases too! Unit tests should be straightforward and small, and they should run fast. Only with 100% unit test coverage can you trust that each function works flawlessly on its own. Having 100% unit test coverage also helps you design and write the next level of test cases: SW component tests. In SW component tests you don't have to test the functionality inside functions. You concentrate solely on testing the integration of these individual functions, so that they work together and each calling function has understood the interface of the called function correctly.
In addition to verifying your implementation, Unit test suites and cases act as living low-level documentation of your software. They describe how a certain class or function should work.
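To make the "one test suite per public function" idea concrete, here is a minimal sketch in Python. The function `parse_port` and its rules are hypothetical and exist only for illustration. The suite has six test cases and covers the exceptional cases as well as the typical one.

```python
import unittest


def parse_port(text: str) -> int:
    """Hypothetical unit under test: parse a TCP port number from text."""
    port = int(text)  # raises ValueError for non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class ParsePortTests(unittest.TestCase):
    """One test suite for one public function, with a handful of test cases."""

    def test_parses_typical_port(self):
        self.assertEqual(parse_port("8080"), 8080)

    def test_accepts_lowest_valid_port(self):
        self.assertEqual(parse_port("1"), 1)

    def test_accepts_highest_valid_port(self):
        self.assertEqual(parse_port("65535"), 65535)

    def test_rejects_port_zero(self):
        with self.assertRaises(ValueError):
            parse_port("0")

    def test_rejects_port_above_range(self):
        with self.assertRaises(ValueError):
            parse_port("65536")

    def test_rejects_non_numeric_text(self):
        with self.assertRaises(ValueError):
            parse_port("http")
```

The test names double as the low-level documentation of the function: reading them tells you what `parse_port` accepts and what it rejects.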
SW component testing
SW component testing is about integrating units of software into a single software component. SW component tests are more complex than unit tests, and there are clearly fewer test cases than in Unit testing. You should have all your SW component tests automated. SW component tests use mocks for the other SW components that the component under test depends on. Typically these are other microservices, such as REST APIs, or databases. If you have a React UI SW component, you can use TestCafe's feature to mock backend REST API calls. Or if you have a Java backend SW component, you can use an H2 in-memory database in place of the real database and WireMock to mock the other backend microservices that the Java backend SW component uses.
I recommend using Behavior Driven Development (BDD) with Cucumber and Gherkin syntax for defining test suites and test cases. In Gherkin, you divide your SW component into multiple features, which correspond to test suites. Each feature consists of one or more scenarios, which correspond to individual test cases. These features and scenarios act as living documentation of your SW component. Because the features and scenarios are written in plain English, they are understandable by a wide audience, including the product owner and product manager.
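The same pattern can be sketched in Python, assuming a hypothetical `OrderComponent` that depends on a database and on a pricing microservice. An in-memory SQLite database plays the role that H2 plays for Java, and `unittest.mock.Mock` stands in for the other microservice. All class, method and table names here are illustrative assumptions.

```python
import sqlite3
from unittest.mock import Mock


class OrderComponent:
    """Hypothetical SW component that uses a database and another microservice."""

    def __init__(self, connection, price_client):
        self._connection = connection
        self._price_client = price_client
        self._connection.execute(
            "CREATE TABLE IF NOT EXISTS orders (product TEXT, total REAL)"
        )

    def place_order(self, product: str, quantity: int) -> float:
        # Call to the other microservice (mocked in the component test).
        unit_price = self._price_client.get_price(product)
        total = unit_price * quantity
        self._connection.execute(
            "INSERT INTO orders (product, total) VALUES (?, ?)", (product, total)
        )
        return total

    def order_count(self) -> int:
        return self._connection.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


def test_placing_an_order_stores_it_with_the_quoted_price():
    connection = sqlite3.connect(":memory:")  # stand-in for the real database
    price_client = Mock()  # stand-in for the pricing microservice
    price_client.get_price.return_value = 2.5

    component = OrderComponent(connection, price_client)
    total = component.place_order("apple", 4)

    assert total == 10.0
    assert component.order_count() == 1
    price_client.get_price.assert_called_once_with("apple")
```

The test exercises the component through its own interface only. Neither a real database server nor the real pricing service has to be running.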
Some sources say that you should have roughly 50% code line coverage for SW component testing. However, code line coverage is not a good measure of SW component testing effectiveness. 50% code line coverage does not tell us what we have tested and what we haven't. It might be that we have just duplicated many tests that already exist in Unit testing, or that we have tested things that should have been tested in Unit testing. It's wrong to test something at SW component level that could be tested in Unit testing, because Unit tests are faster to write and execute.
SW component tests should concentrate solely on the integration of software units (functions) and test all function call graphs (from the first called function to the last called functions). We test the interfaces of functions, but not the functions themselves, which are already tested in the Unit testing phase. We would benefit from a testing tool that could measure whether we have tested all possible function call graphs. Here, too, we should aim for 100% coverage of function call graphs.
Testing function interfaces should reveal problems where the semantics of function parameters are misunderstood. The syntax of function calls should be checked by a static type checker, but the semantics cannot be checked that way. Let's assume we have a function with two parameters of a number type. It is possible that the caller misunderstands the interface and supplies these two parameters in the wrong order. The only way to spot this problem by testing is to write SW component tests. You can't find it in Unit testing. For this reason, I recommend giving all parameters of a function different types, which avoids this kind of problem. You can also reduce the risk by naming the function so that the name makes the parameter order evident.
In your IDE, you should also enable parameter hints, which also help to avoid this problem.
From the SOLID principles, remember also the Open-Closed Principle: open for extension but closed for modification. Don't modify an existing function signature. Introduce a new function instead, or add an optional parameter to the end of your function's parameter list.
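The parameter-order problem can be illustrated with a small Python sketch. All function and type names below are hypothetical. The first pair of functions shows a caller that swaps two numeric arguments without any type error. The renamed function states the order in its name. The last pair uses a distinct type per parameter, so a static type checker such as mypy would flag a swapped call.

```python
from dataclasses import dataclass


def monthly_payment(total_amount: float, months: float) -> float:
    """Callee: both parameters are plain numbers, so their order is easy to mix up."""
    return total_amount / months


def payment_plan_swapped(total_amount: float, months: float) -> float:
    """Caller that misunderstood the interface and passes the arguments in the wrong order."""
    return monthly_payment(months, total_amount)


def divide_total_by_months(total_amount: float, months: float) -> float:
    """Same callee, renamed so that the name itself states the parameter order."""
    return total_amount / months


@dataclass(frozen=True)
class Amount:
    value: float


@dataclass(frozen=True)
class Months:
    count: int


def monthly_payment_typed(total: Amount, duration: Months) -> Amount:
    """Same calculation, but each parameter has its own type."""
    return Amount(total.value / duration.count)


def payment_plan_typed(total: Amount, duration: Months) -> Amount:
    # Swapping the arguments here would be reported by a static type checker.
    return monthly_payment_typed(total, duration)
```

A unit test of `monthly_payment` alone passes, and so does a unit test of `payment_plan_swapped` with the callee mocked. Only a test that runs the caller and the callee together shows that the result is wrong.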
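As a small illustration of extending a function without breaking its callers, consider the following Python sketch. The function `format_customer` is hypothetical. Assume its first version took only `name` and `email`, and `phone` was appended later as an optional parameter.

```python
from typing import Optional


def format_customer(name: str, email: str, phone: Optional[str] = None) -> str:
    """Hypothetical function whose first version took only name and email.

    The optional phone parameter was appended to the end of the parameter
    list, so existing two-argument calls keep working unchanged.
    """
    contact = f"{name} <{email}>"
    if phone is not None:
        contact += f", tel. {phone}"
    return contact
```

An old call such as `format_customer("Ann", "ann@example.com")` still returns the same value as before, while new callers can pass the phone number as a third argument.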
Let's take a simple example of a product backlog feature: implement a way to create a new customer and to get a list of existing customers. The acceptance criteria for this feature are defined by the product manager and the product owner. Acceptance criteria:
User creates a new customer with the following info: customer name, address, email and phone number. Customer info is stored to persistent storage for future retrieval
User can request a list of existing (created) customers
User can request customer by name
We shall implement this as a REST API. We will have one class called CustomerService, which will contain three methods: one handling a request to create a new customer, a second handling a request for all existing customers, and a third handling a request for a customer by name. Then we have another class called CustomerStore that has three methods for handling database actions. Because this case is so simple, we can see that there are three possible function call graphs:
CustomerService::createCustomer => CustomerStore::storeCustomer
CustomerService::getCustomers => CustomerStore::findCustomers
CustomerService::getCustomerByName => CustomerStore::findCustomerByName
Let's make this a bit more realistic by adding a requirement that an error is displayed if a customer cannot be found by name. There will be a new function call graph where an exception is thrown and handled by an ExceptionMapper:
CustomerService::getCustomerByName => CustomerStore::findCustomerByName => CustomerNotFoundExceptionMapper::toResponse
Now we need to write the SW component level tests. Because we are only testing the REST API component, we will mock the database. Let's write these SW component level tests in Gherkin syntax:
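The same four call graphs can also be sketched as executable tests. The following is a minimal Python illustration, not Gherkin features: the class and method names follow the call graphs above, while the `Customer` fields, the in-memory store used as a database mock, and the HTTP status codes are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    name: str
    address: str
    email: str
    phone_number: str


class CustomerNotFoundError(Exception):
    pass


class InMemoryCustomerStore:
    """Test double standing in for the database-backed CustomerStore."""

    def __init__(self):
        self._customers = {}

    def store_customer(self, customer):
        self._customers[customer.name] = customer

    def find_customers(self):
        return list(self._customers.values())

    def find_customer_by_name(self, name):
        if name not in self._customers:
            raise CustomerNotFoundError(name)
        return self._customers[name]


def to_response(error):
    """Plays the role of CustomerNotFoundExceptionMapper::toResponse."""
    return 404, {"error": f"Customer '{error.args[0]}' not found"}


class CustomerService:
    def __init__(self, store):
        self._store = store

    def create_customer(self, customer):
        self._store.store_customer(customer)
        return 201, customer

    def get_customers(self):
        return 200, self._store.find_customers()

    def get_customer_by_name(self, name):
        try:
            return 200, self._store.find_customer_by_name(name)
        except CustomerNotFoundError as error:
            return to_response(error)


ALICE = Customer("Alice", "1 Main St", "alice@example.com", "555-0100")


def test_created_customer_is_stored():
    service = CustomerService(InMemoryCustomerStore())
    assert service.create_customer(ALICE) == (201, ALICE)


def test_existing_customers_are_listed():
    service = CustomerService(InMemoryCustomerStore())
    service.create_customer(ALICE)
    assert service.get_customers() == (200, [ALICE])


def test_customer_is_found_by_name():
    service = CustomerService(InMemoryCustomerStore())
    service.create_customer(ALICE)
    assert service.get_customer_by_name("Alice") == (200, ALICE)


def test_unknown_customer_yields_not_found_response():
    service = CustomerService(InMemoryCustomerStore())
    status, body = service.get_customer_by_name("Bob")
    assert status == 404
    assert "Bob" in body["error"]
```

There is one test per function call graph, and each test enters through CustomerService only. The internals of the store methods are left to Unit testing.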
Someone might think that this level of testing results in 100% code line coverage, but it is not true. In very simple cases, like this one, the code line coverage can be closer to 100% than 50%. But in bigger and more realistic cases, the code line coverage is closer to 50% than 100%. This is because there are cases where a function can call some other function with different parameters, for example in if and else branches. Usually we need to test only one of these branches. And there are many so-called leaf functions that do not call other functions of the SW component. You need to test only one case, the typical case, of each such leaf function. The leaf function can contain code in if-else-elseif and switch blocks for other special cases. These cases are tested in Unit testing.
System testing consists of two levels of testing: Subsystem tests (SW component integration tests) and End-to-End (E-2-E) tests.
Subsystem Testing (SW component integration testing)
SW component integration testing tests the integration of several SW components, but it does not integrate all SW components and is not end-to-end testing. SW component integration testing mocks the services that it does not integrate. This type of testing is typical when developing a new software system where certain SW components don't exist yet and must be mocked. When the SW system matures and all SW components are ready, it is advisable to merge the existing SW component integration tests into End-to-End tests that integrate the whole SW system without mocking any SW components.
I recommend using Behavior Driven Development (BDD) with Cucumber and Gherkin syntax for defining test suites and test cases. In Gherkin, you divide your SW into multiple features, which correspond to test suites. Each feature consists of one or more scenarios, which correspond to individual test cases. These features and scenarios act as living documentation of your SW. Because the features and scenarios are written in plain English, they are understandable by a wide audience, including the product owner and product manager.
Let's have an example using the following SW system architecture:
We have team A, which is developing Data Explorer (Service + GUI), and team B, which is developing both Data Ingester and Data Writer. Team A will start by writing a SW component integration test that integrates Database, Data Explorer Service and Data Explorer GUI together. Similarly, team B will start by writing a SW component integration test that integrates Data Ingester, Message Queue, Data Writer and Database together.
Now we have two separate System tests which are not End-to-End tests (one covers the red path and the other covers the blue path):
We have one problem here. If the blue path expects something different from the database than what the red path actually produces there, then we have a bug that neither test reveals. Let's call E-2-E testing to help!
In E-2-E testing we are testing the different full paths of dependencies between SW components. This is similar to SW component tests where we test different graphs of function calls in function call hierarchy.
When our teams have mature enough SW components, the two system test cases they have produced can be merged into one system test case that is end-to-end:
So this E-2-E test case tests that data supplied to Data Ingester is somehow available and visible in Data Explorer GUI.
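To illustrate, here is a minimal Python sketch of such an end-to-end test. The component names follow the example architecture, but everything else is an assumption: the message queue is a `deque`, the database is a plain list, the record format is invented, and the Data Explorer GUI is left out so that only the service is queried.

```python
import json
from collections import deque


class DataIngester:
    def __init__(self, queue):
        self._queue = queue

    def ingest(self, raw_line: str) -> None:
        # Parse "sensor,value" input and publish it to the message queue.
        sensor, value = raw_line.split(",")
        self._queue.append(json.dumps({"sensor": sensor, "value": float(value)}))


class DataWriter:
    def __init__(self, queue, database):
        self._queue = queue
        self._database = database

    def write_pending(self) -> None:
        # Drain the message queue into the database.
        while self._queue:
            self._database.append(json.loads(self._queue.popleft()))


class DataExplorerService:
    def __init__(self, database):
        self._database = database

    def latest_value(self, sensor: str):
        # Return the most recently written value for the sensor, if any.
        for record in reversed(self._database):
            if record["sensor"] == sensor:
                return record["value"]
        return None


def test_ingested_data_is_visible_in_explorer():
    queue, database = deque(), []  # nothing is mocked away between components
    ingester = DataIngester(queue)
    writer = DataWriter(queue, database)
    explorer = DataExplorerService(database)

    ingester.ingest("temperature,21.5")
    writer.write_pending()

    assert explorer.latest_value("temperature") == 21.5
```

If Data Writer stored the value under a different key than Data Explorer Service reads, each team's own subsystem test could still pass against its own test data, but this test would fail.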
I recommend using Behavior Driven Development (BDD) with Cucumber and Gherkin syntax for defining test suites and test cases in System testing. In Gherkin, you divide your system into multiple features, which correspond to test suites. Each feature consists of one or more scenarios, which correspond to individual test cases. These features and scenarios act as living documentation of your SW system. Because the features and scenarios are written in plain English, they are understandable by a wide audience, including the product owner and product manager.
In solution testing, multiple SW systems are integrated together. In the System testing phase, you must mock all external interfaces (southbound and northbound interfaces). In solution testing, you install multiple SW systems in the same environment. The purpose of solution testing is to check that the different SW systems work together.
Interfaces between SW systems are typically quite stable, but they do change. The focus in Solution testing is to test the compatibility of different versions of the integrated SW systems.
Non-functional testing is typically done in System testing phase and can be done in SW component testing phase or Solution testing phase. The following are the most important and typical Non-functional testing categories:
Stability testing involves testing a system with a typical production load over a continuous period of time. The time period is typically from several days to a couple of weeks. Stability testing is also sometimes called Endurance testing or Soak testing, but I like the term Stability testing best.
In stability testing, the goal is to verify that the SW system performs reliably for the whole test period. There shall be no crashes, and resource usage shall stay within expected limits, e.g. no memory leaks and no performance degradation. Performance degradation can happen, for example, when an initially empty database starts to fill up. Stability testing should verify the scalability of stateful services, i.e. databases, when relevant. Stability testing should also verify recovery from artificially introduced failures.
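A real stability test runs against a production-like environment for days, but the kind of check it automates can be shown in miniature. The following Python sketch uses the standard `tracemalloc` module to compare memory usage before and after many repeated calls. Both request handlers and the iteration count are hypothetical and chosen only to make the difference visible.

```python
import tracemalloc

_cache = []


def leaky_request_handler() -> None:
    # Keeps a reference to every payload, so memory grows with each call.
    _cache.append(bytes(1024))


def healthy_request_handler() -> int:
    # The payload is released when the function returns.
    payload = bytes(1024)
    return len(payload)


def memory_growth(handler, iterations: int) -> int:
    """Return how many bytes of traced memory remain allocated after the run."""
    tracemalloc.start()
    try:
        handler()  # warm-up call, excluded from the measurement
        before, _peak = tracemalloc.get_traced_memory()
        for _ in range(iterations):
            handler()
        after, _peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before
```

A stability test would assert that the growth stays below an agreed limit, for example that `memory_growth(healthy_request_handler, 1000)` is far smaller than the roughly one megabyte retained by the leaky handler over the same number of calls.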
Stability testing is executed in production-like environment(s) as part of System testing. Stability testing can be automated.
The goal of Performance testing is to find the upper limits of SW system capacity and compare it to expected values. Performance testing should also test the scalability of the SW system: When load is increased, the stateless services of system should scale up and when load is decreased, stateless services should scale down. Performance testing is executed in production-like environment(s) as part of System testing. Performance testing can be automated.
The role of security testing is to verify that the system is secure. Security testing is a broad topic and I won't cover it completely here. Security testing should verify that the SW system is safe from attacks such as SQL injection, Denial-of-Service and Man-in-the-Middle attacks. It should also verify that sensitive data is encrypted both at rest and in transit. Security testing is mostly done as part of System testing. At least parts of Security testing can and should be automated using appropriate security testing tools. Security tests should be written and executed by security professionals. If you lack security competence in your teams, you should use external services provided by security experts.
Usability testing tests that your SW system is as usable as possible for its end-users. This testing is performed as part of GUI SW component testing by UX/UI designers, but you can also use outside experts. Usability tests are hard to automate and they are usually done manually.
There is no Acceptance Testing or Acceptance Testing Driven Development (ATDD). ATDD can give you the false assumption that testing only the acceptance criteria is sufficient. Acceptance criteria for an epic/feature are written by product management together with the Product Owner in the development team. Acceptance criteria are reflected in some of the functional test cases, especially those focused on successful or "happy path" scenarios. Acceptance criteria are verified by product management in iteration/system demos.