
The pursuit of efficiency and quality is critical in the current economic climate, where teams are struggling with resource shortages. The need for quick releases, combined with the growing complexity of software, makes automated testing more and more essential. Fundamentally, automated testing means using software tools to run predefined tests on a software application. Unlike manual testing, which is subject to human error and time constraints, automated testing has several benefits that raise software development practices to new heights.

The main advantage of automated testing is that it speeds up the testing process. By automating time-consuming and repetitive test cases, developers can quickly verify software functionality and receive faster feedback on code changes and release cycles. Besides increasing productivity, this quick feedback loop helps teams spot and fix problems early in the development process, which lowers the cost and effort of bug fixes.

Additionally, automated testing increases test coverage by making it possible to run a wide range of test scenarios across various settings and environments. This thorough coverage ensures the software's resilience and dependability by uncovering flaws that might escape manual testing. Because tests yield consistent results each time they are run, automated testing also promotes uniformity and repeatability in the testing process.

Why Regression Testing


Regression testing, a vital component of automated testing, plays a pivotal role in maintaining software stability while facilitating rapid development cycles. This testing approach involves re-running test cases to ensure that changes made to the software, such as code modifications or bug fixes, have not adversely affected previously working functionality.

The main benefit of regression testing is that it guarantees new code changes do not reintroduce bugs or regressions into the program, preserving its stability and reliability. By methodically retesting the program after every code update, regression testing lowers the chance that problems appear in production.


Regression testing does, however, present challenges. For large and complex applications in particular, it can be time-consuming, which can delay release cycles. Keeping a complete suite of regression tests in sync with application changes also requires constant effort, which raises maintenance overhead over time.

Despite these difficulties, regression testing remains an essential part of software development, balancing efficiency and stability to deliver high-quality software to end users.




Regression testing is essential to the continuity and high quality of software releases in the context of Continuous Integration and Continuous Deployment (CI/CD). By automating the build, test, and deployment stages, CI/CD pipelines expedite the software delivery process. Under this paradigm, regression testing is essential to preserving software quality while enabling rapid iteration and deployment.

Regression tests are integrated into the automated test suites of CI/CD pipelines to guarantee that every code change is thoroughly validated against existing functionality. By automating the execution of regression tests, teams can quickly find and fix any regressions introduced by new code changes, preventing them from spreading to production.


Regression testing also functions as a deployment gate in CI/CD pipelines, ensuring that builds are promoted to production only after passing every regression test. In this way, the integrity and dependability of the software are preserved, and errors or regressions are kept out of the production environment.

Even so, integrating regression testing into CI/CD pipelines requires careful analysis of test coverage, execution time, and automation efficiency. Selective testing techniques, which prioritize important test cases according to the scope of the changes, can validate code changes more quickly without sacrificing quality.

Regression testing, then, is a crucial part of CI/CD pipelines: it guarantees the quality and continuity of software releases while allowing for rapid iteration and deployment in today's fast-paced development environment.


Test Cases


Regression testing typically involves re-running test cases to ensure that changes made to the software, such as code modifications or bug fixes, have not adversely affected previously working functionality. For this type of automated testing, you need to pre-define your test scenarios.

These scenarios can and should cover all expected issues: missing input files, changes in input file names, differences in column names, types, or structure, and unexpected special characters in the inputs. If the workflow has switches or decision points, add scenarios for the different parameter combinations.

If your workflow performs mathematical calculations, test column types and extremely small or large numbers to ensure the workflow can handle them, or to see whether it produces the expected error outputs. Also test typos and special characters in file and column names, and if your inputs come from users on a portal, test against SQL injection.

If your workflow requires an internet connection, you can also set up test cases for timeouts, no connection, and similar failures.

You can also include test cases for performance testing, where you feed in 50-100x your expected input volume to see whether your workflow runs into memory issues, or use Timer Info nodes to measure the runtime of each iteration.

Probably the best way to think about these test cases: Your testing will be exactly as accurate as your test cases are detailed and well-defined.

An include indicator is also nice to have: most probably you won't want to run every test case every time, only those covering the parts your changes could affect. This way you can filter your automation to the test cases you actually want to run.



Folder Structure


Once you have defined your test cases, which are now suitable for manual testing, you need to set up your folder structure to be able to automate.

  • In your test environment, you need to set up a test folder, specific to the version of the tested workflow, or the test period.
  • For each test case, you need to create a subfolder, which contains:
    • an Input folder
    • an Output folder
    • a To_Compare folder

Input folder

In this folder you will have the test-case-specific input files.

Output folder

Your workflow will create the actual output of the current run here.

To_Compare folder

A manually set up “Output folder” with the desired outputs. The testflow will compare the actual output with these files/folders; if they match or the differences are below thresholds, the test case will PASS.





The most efficient way to achieve this is to manually create a golden version where all of your inputs are present, as if running the workflow under optimal conditions where no error is expected. This could also be your “Test Case 1”, where everything is perfect.

At this point you have a prepared Input folder and two empty folders. The most convenient way is to replicate this test case folder via a workflow or a Python/VBA script, depending on your preferred technology. Once you have created the test folders for all of your test cases, modify the content of the Input folders according to each test case.
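As a minimal sketch, the replication step could look like the following Python script. The folder names (`Golden`, `Input`, `Output`, `To_Compare`) and the function name are illustrative assumptions; adapt them to your own structure.

```python
# Sketch: replicate a "golden" test case folder for each test case.
# Copies the golden Input folder and creates empty Output and
# To_Compare folders alongside it. Folder names are assumptions.
import shutil
from pathlib import Path


def replicate_test_cases(test_root: str, golden: str, case_names: list) -> None:
    """Copy the golden test case's Input folder into each new test case
    and create empty Output and To_Compare folders next to it."""
    root = Path(test_root)
    golden_input = root / golden / "Input"
    for name in case_names:
        case_dir = root / name
        # copy the optimal inputs; each case is then edited by hand
        shutil.copytree(golden_input, case_dir / "Input", dirs_exist_ok=True)
        (case_dir / "Output").mkdir(parents=True, exist_ok=True)
        (case_dir / "To_Compare").mkdir(parents=True, exist_ok=True)
```

After running it, each test case folder starts as an exact copy of the golden inputs, ready to be modified per scenario.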

Example Test Cases



In Test Case 1, everything follows the optimal run: all of the input files are in place, and the To_Compare folder contains the optimal desired outputs.


In this Test Case, the Input files are missing.

According to the acceptance criteria, the workflow should create a file called “Error_list” with the message: “Input_File1, …, Input_FileN is missing”.

As you have already created the files and folders, you need to delete the input file(s) from the Input folder and add the Error_list file to the To_Compare folder, in the exact format your workflow should create it, so the testflow will read this file and mark the test case as PASS.
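This preparation step can also be scripted. The sketch below assumes the folder layout from above and an `Error_list.txt` file name; both the names and the exact message format are assumptions to adapt to your workflow's actual output.

```python
# Sketch: turn a replicated golden test case into the "missing input"
# case: delete the chosen inputs and write the expected Error_list file
# into To_Compare. Names and message format are assumptions.
from pathlib import Path


def prepare_missing_input_case(case_dir: str, missing: list,
                               error_file: str = "Error_list.txt") -> None:
    case = Path(case_dir)
    for name in missing:
        f = case / "Input" / name
        if f.exists():
            f.unlink()  # remove the input the workflow should report as missing
    # write the expected output exactly as the workflow should create it
    (case / "To_Compare").mkdir(parents=True, exist_ok=True)
    msg = ", ".join(missing) + " is missing"
    (case / "To_Compare" / error_file).write_text(msg)
```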


Getting Started


Once you have made all the preparations, you have reached the part where you need a test automation workflow.

The simplest way to develop a testflow is to think about how you would manually test the outputs of your workflow:

  1. Checking if all of the required files are created, by comparing the file names in the actual output (Output folder) and the expected outputs (To_Compare folder)
  2. Separating the missing files and the files present in both folders to prepare them for further analysis
  3. Analyzing the created output files and comparing them with the expected output files
  4. Creating information from the collected data for the specific test case
  5. Repeating steps 1-4 for each test case
  6. Collecting the Results
  7. Report with Visualization, Data Apps or other action items to fix or deploy



Checking Differences In Folders Content


To be able to compare the files, you first need to extract the file names from both folders (Output, To_Compare). You will also need to clean the file names by removing timestamps, usernames, or other variables added to them.

By setting up rules for both folders (a file present only in Output -> unexpected file, a file present only in To_Compare -> missing file, etc.) you start generating the output messages. Grouping and formatting these messages will give you the final output message:




To prepare your analysis for the next step, filter the results to the files which exist in both folders; you don't want to waste resources comparing files with empty inputs.

You will also need to prepare your testflow for each scenario to avoid errors: file present in both folders, one of the folders empty, both folders empty. Otherwise, your testflow will fail.
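The folder comparison described above can be sketched in a few lines of Python. The timestamp pattern and folder layout are assumptions; using sets makes the empty-folder cases work without special handling.

```python
# Sketch: compare file names in Output vs To_Compare after stripping
# variable parts such as timestamps. The timestamp regex is an
# assumption (e.g. "report_20240131.csv" -> "report.csv").
import re
from pathlib import Path

TIMESTAMP = re.compile(r"_\d{8}(_\d{6})?")


def clean_name(name: str) -> str:
    return TIMESTAMP.sub("", name)


def diff_folders(output_dir: str, compare_dir: str) -> dict:
    actual = {clean_name(p.name) for p in Path(output_dir).iterdir() if p.is_file()}
    expected = {clean_name(p.name) for p in Path(compare_dir).iterdir() if p.is_file()}
    return {
        "missing": sorted(expected - actual),      # expected but not produced
        "unexpected": sorted(actual - expected),   # produced but not expected
        "to_compare": sorted(actual & expected),   # present in both -> compare contents
    }
```

Because set differences on empty sets are simply empty, the same function covers the "one or both folders empty" scenarios without extra branches.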


Comparing Files


When it comes to file comparison, you have two options, each with its advantages and disadvantages.

The easiest way to compare files is simply using KNIME’s Testing Framework UI nodes: the Table Difference Checker and the Table Difference Finder nodes.

Pros: no development needed; a quick and reliable solution.

Cons: they are quite basic comparisons without detailed outputs, essentially a simple yes/no answer to the question: are these tables exactly the same?

For example, when working with bigger datasets, your Joiner may mix up the row order; even if your case is not order-sensitive and the acceptance criteria are met, the test case will fail.


The other way to compare these files is to develop a custom comparator component, combining these nodes.

This way you have the freedom to set the rules for what you accept as identical, or as passing criteria.

Just a few examples of what you can achieve with a custom comparator:

  • Check column names/types
  • Set up upper/lower thresholds
  • Set up custom error messages
  • Specify the exact location of the differences
  • Reorder the rows by your key
  • Compare non-tabular data, other file types by converting them to binary format
  • Set up any rules that fits to your workflow, testing approach

With this approach you will get much more useful error messages: instead of a simple “Different” message, you will see “Different Column Names”, “Same Table Specs, but different values”, or “Same values until Row34213”.

You can also set up a matrix, getting a TRUE/FALSE value for each compared cell, marking the exact location of the differences. For bigger datasets, you can also set up rules to check only x% of the cells, similarly to the native solution.

It comes with a one-time development cost, which can pay for itself quickly by saving effort on bug fixes.
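To make the idea concrete, here is a minimal sketch of such a comparator in plain Python: order-insensitive thanks to a key column, with a numeric tolerance and human-readable messages. The CSV format, the key column, and the tolerance are assumptions; a KNIME component would express the same logic with native nodes.

```python
# Sketch of a custom comparator: reorders rows by a key column,
# applies a numeric tolerance, and reports readable differences.
# Column names, key, and tolerance are assumptions to adapt.
import csv


def compare_tables(actual_path: str, expected_path: str,
                   key: str, tol: float = 1e-9) -> list:
    """Return a list of difference messages; an empty list means PASS."""
    def load(path):
        with open(path, newline="") as f:
            return list(csv.DictReader(f))

    actual, expected = load(actual_path), load(expected_path)
    a_cols = list(actual[0]) if actual else []
    e_cols = list(expected[0]) if expected else []
    if a_cols != e_cols:
        return ["Different column names: {} vs {}".format(a_cols, e_cols)]

    # index rows by key so row order does not matter
    a_by_key = {r[key]: r for r in actual}
    e_by_key = {r[key]: r for r in expected}
    issues = []
    for k, exp_row in e_by_key.items():
        act_row = a_by_key.get(k)
        if act_row is None:
            issues.append("Missing row for key {}".format(k))
            continue
        for col in e_cols:
            a, e = act_row[col], exp_row[col]
            try:
                same = abs(float(a) - float(e)) <= tol  # numeric threshold
            except ValueError:
                same = a == e                            # plain string compare
            if not same:
                issues.append("Key {}, column {}: {!r} != {!r}".format(k, col, a, e))
    for k in a_by_key:
        if k not in e_by_key:
            issues.append("Unexpected row for key {}".format(k))
    return issues
```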



After finishing the comparison, you will need to wrap up the results, marking them with the exact file name and test case, merge them with the other issues found in the current test case, and put everything in a loop to run your workflow through each test scenario.


Collecting Results


Once your testflow has finished with all of the selected test cases, the output of your test case loop will contain a separate row for each issue in each iteration.


With some GroupBy and data manipulation nodes, you can easily pre-format your data into a more manageable form that you can show on your dashboards.


You can count the number of issues per test case, visualize the number of passed/failed cases, and highlight the most problematic output files or test cases to point out where your team could make branch-specific improvements and render the workflow less error-prone. You can also build a Data App to share your findings with your team.
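The aggregation itself is simple; as a sketch, the equivalent of the GroupBy step in Python could look like this. The field names in the issue rows are assumptions matching the structure described above.

```python
# Sketch: summarize the issue rows collected from the test-case loop.
# Row fields ("test_case", "file") are assumptions for illustration.
from collections import Counter


def summarize(issue_rows: list) -> dict:
    """issue_rows: one dict per issue, e.g.
    {"test_case": "TC2", "file": "out.csv", "message": "..."}"""
    per_case = Counter(r["test_case"] for r in issue_rows)
    per_file = Counter(r["file"] for r in issue_rows)
    return {
        "issues_per_test_case": dict(per_case),
        "most_problematic_files": per_file.most_common(3),
    }
```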




Workflow Invocation


Another key element of automated testing in KNIME is the set of workflow invocation nodes. They allow you to call and run an already existing workflow.

If your workflow is already developed in a modular manner, where a main workflow calls sub-workflows, it is already prepared for this testing approach. If your workflow uses Reader and Writer nodes for inputs and outputs, you can simply replace your input nodes with the workflow invocation input nodes (Container Input/Workflow Services Input) and your output nodes with the Container Output/Workflow Services Output nodes. After these steps, your callee workflow (the workflow you want to test) is prepared; the next step is to add the caller node (Call Workflow/Call Workflow Services) to your testflow so it can call and run the tested workflow.


In this case, you need to add the caller node inside the loop which iterates through the test cases, so the tested workflow runs against each test case's I/O folders.

This may sound confusing at first; to learn more about this topic, I recommend taking the official KNIME L3-CD course, or reading our article about the topic, which will be released in the upcoming weeks.


Triggers on KNIME HUB



At this point we are almost done: if your testflow has finished, that means you were able to run it manually through the test cases. But we are not done yet, since the KNIME Hub has a variety of further features to offer; let's focus on triggers.

What are triggers? You can define them as a set of actions that are executed automatically whenever previously set conditions are met.

From an ETL perspective, you can set up your processes to run automatically whenever a file is added or modified in a folder, keeping your database always up to date.

From a testing perspective, you can also set up triggers for components/workflows: in this case, run the testflow whenever a new version of the workflow is available, a new workflow is added to the folder, or a workflow is modified, depending on the structure or practice you use for storing your workflows.

You can also set up a monitoring workflow that collects the results of your testflow. If the results are written to files, set up triggers on the Test Results folder: whenever a new file appears, that workflow can analyze it, create reports with dashboards about the performance of the tested workflow, and send out alert emails if action is needed.

With triggers, you can also keep your test folders clean and archive your test results, moving the output files to a specific folder and keeping them for a few months, so that if a new issue is found, you can analyze when it started.



Gabor Zombory, Data Engineer, Datraction
