Data-Driven Feature Development: Unveiling the Hidden Consequences of Code Changes

Sharesquare.co engineering blog by R. Vincelli

Jan 8, 2024

Introduction

Imagine this scenario: you are excited about introducing a shiny new feature or fixing a pesky bug in your software. You roll out the update, feeling a sense of accomplishment. But suddenly, reports start pouring in from users, complaining about things breaking that were perfectly fine before. How did this happen? It’s like a magician’s trick gone wrong, leaving you scratching your head in bewilderment.

The Concept

We have all been there. It’s what we call a regression bug: a sneaky little gremlin that slips into your code, wreaking havoc on seemingly unrelated parts of your product. Now, you might be thinking, "Isn’t testing the key to catching these bugs?" Well, you are partially right, but there’s a missing puzzle piece here: data.

That’s right! When you tinker with your code, whether it’s adding a new feature or fixing a bug, you might unknowingly disrupt the delicate balance of your data model or the data itself. And let me tell you, that can have unexpected consequences for your loyal users out there in the wild.

Regression Bug: one feature forward, two bugs backward. Image credits: R. Yadav.

Scope

In this blog post, we are diving deep into the fascinating world of data-driven feature development. We are going beyond the realms of traditional testing and exploring what happens to your data when you make those code changes. We want to uncover the hidden dangers and shed light on why it’s crucial to consider the impact on production users.

So buckle up as we embark on a thrilling adventure of unraveling the secrets of data-driven feature development. By the end of this journey, you will have the tools and knowledge to ensure your software updates not only dazzle with their new capabilities but also maintain the harmony of your data ecosystem.

Ready? Let’s step into the captivating realm of code changes and their intimate dance with data. Trust me, you don’t want to miss this!

Sample Problem

In this blog we sketch a few scenarios, with a hands-on approach in the context of a Laravel-based web project.

You have been working hard on updating your User model to improve the logic for various user status values such as active, bad leaver, good leaver, etc. You have implemented new methods like isActiveNew() and isInactiveNew() to ensure the correct behavior according to the original specifications. Excitedly, you have run your tests, and they all pass with flying colors (mostly green, even if some yellow for skipped tests is allowed). Everything seems ready for deployment, but before you hit that CI/CD button, a question lingers in your mind: will these changes potentially break anything in production?

This is where a data-driven feature impact assessment comes into play. While you have taken care of updating your code and verifying its correctness, it’s essential to understand the potential consequences on real production data. You want to ensure that flipping customers from one status to another, such as from active to inactive, won’t cause unexpected issues.

Now, before we proceed, a word of caution: production data should generally be left untouched, even if you are only accessing it in a read-only manner. It’s crucial to respect data integrity and privacy. However, there are ways to assess the potential impact without directly manipulating production data. Let’s explore those options.

One approach is to analyze a representative sample of production data. By examining a subset of customer records, you can gain insights into the potential impact of your code changes. For example, you can identify which customers will be affected and to what extent if their statuses are flipped from active to inactive.
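As a minimal sketch of drawing such a sample, here is a plain-PHP helper that runs without any framework. The `sampleRecords` function and the record shape are hypothetical and not part of the project; in practice the records would more likely come from an anonymized dump or a read replica.

```php
<?php
declare(strict_types=1);

/**
 * Return up to $size records picked at random from $records.
 * Hypothetical helper: a fixed seed keeps the sample reproducible,
 * so repeated runs of the assessment look at the same records.
 */
function sampleRecords(array $records, int $size, int $seed): array
{
    $pool = array_values($records);
    mt_srand($seed);   // seed the generator that shuffle() draws from
    shuffle($pool);

    return array_slice($pool, 0, max(0, $size));
}

// Toy data standing in for exported customer records.
$records = array_map(fn (int $id): array => ['id' => $id], range(1, 100));
$sample = sampleRecords($records, 10, 42);
```

The fixed seed is a design choice: it lets you re-run the same checks on the same subset before and after a code change.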

Pub quiz: does that while loop condition ever evaluate to false? Image credits: Krishna Pandey on Unsplash.

Assessing Potential Impact

Consider the first check below, which looks for users with more than one status. It helps identify data inconsistencies and confirms that each user has the single status we expect.

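// Load every user, then keep only those with more than one status.
$users = \PlanUser::all();
$usersWithMoreThanOneStatus = $users->filter(function ($user) {
  // Count the related rows in the database instead of loading them.
  return $user->statuses()->count() > 1;
});
count($usersWithMoreThanOneStatus); // users with more than one status
count($users);                      // all users, for comparison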

This code snippet retrieves all users from the PlanUser model and keeps only those who have more than one status associated with them. It then counts both the users with more than one status and all users, so the two numbers can be compared.

Potential Impact — example one

By examining the users with more than one status, you can identify potential data inconsistencies. It helps you pinpoint situations where a user’s status might have been mistakenly assigned multiple times. This can affect how the system interprets and treats these users, leading to incorrect calculations and reporting or, even worse, a degraded user experience and serious bugs.

Below are code examples focused on comparing the old and new status values for various status types such as active, inactive, invited, good leaver, bad leaver, and draft. By filtering users with status mismatches, you can pinpoint potential discrepancies in how the system interprets and treats user statuses.

// Compare the new and the old implementation of each status check.
$users = \PlanUser::all();
$usersWithActiveStatusMismatch = $users->filter(function ($user, $_) {
  return $user->isActive() != $user->isActiveOld();
});

$usersWithInactiveStatusMismatch = $users->filter(function ($user, $_) {
  return $user->isInactive() != $user->isInactiveOld();
});

$usersWithInvitedStatusMismatch = $users->filter(function ($user, $_) {
  return $user->isInvited() != $user->isInvitedOld();
});

$usersWithGoodLeaverStatusMismatch = $users->filter(function ($user, $_) {
  return $user->isGoodLeaver() != $user->isGoodLeaverOld();
});

$usersWithBadLeaverStatusMismatch = $users->filter(function ($user, $_) {
  return $user->isBadLeaver() != $user->isBadLeaverOld();
});

$usersWithDraftStatusMismatch = $users->filter(function ($user, $_) {
  return $user->isDraft() != $user->isDraftOld();
});

count($usersWithActiveStatusMismatch);
count($usersWithInactiveStatusMismatch);
count($usersWithInvitedStatusMismatch);
count($usersWithGoodLeaverStatusMismatch);
count($usersWithBadLeaverStatusMismatch);
count($usersWithDraftStatusMismatch);


This code block compares the result of each current status check with its old counterpart. For every status type (active, inactive, invited, good leaver, bad leaver and draft), it keeps the users whose current and old results differ. Finally, it counts the users with a mismatch in each category.

Potential Impact — example two

By identifying status mismatches, you can uncover discrepancies that occurred during code changes. These mismatches can indicate inconsistencies in how the system handles user statuses, potentially leading to incorrect access permissions, notification triggers, or other critical functionalities. Addressing these mismatches ensures that the system behaves consistently and accurately, providing a reliable user experience.
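The repeated old-versus-new filters can also be folded into one helper. The following is a rough sketch in plain PHP, not the project's code: `countMismatches`, the record fields (`left_at`, `accepted`) and the two toy rules are all assumptions made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Count, per status check, how many records the old and new logic disagree on.
 *
 * $checks maps a label to a pair [oldPredicate, newPredicate]; each predicate
 * takes one record and returns a bool. All names here are illustrative.
 */
function countMismatches(array $records, array $checks): array
{
    $counts = [];
    foreach ($checks as $label => [$old, $new]) {
        $counts[$label] = 0;
        foreach ($records as $record) {
            if ($old($record) !== $new($record)) {
                $counts[$label]++;
            }
        }
    }

    return $counts;
}

// Toy data: the "old" rule treats a user as active when no leave date is set;
// the "new" rule additionally requires an accepted invitation.
$records = [
    ['id' => 1, 'left_at' => null, 'accepted' => true],
    ['id' => 2, 'left_at' => null, 'accepted' => false],
    ['id' => 3, 'left_at' => '2023-05-01', 'accepted' => true],
];

$checks = [
    'active' => [
        fn (array $r): bool => $r['left_at'] === null,
        fn (array $r): bool => $r['left_at'] === null && $r['accepted'],
    ],
];

$report = countMismatches($records, $checks);
```

With this toy data only the record with `id` 2 changes its answer, so the report holds one mismatch for the `active` check. Adding a status type then means adding one entry to `$checks` rather than another filter.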

// Collections have no distinct() method; unique() removes duplicate values.
$usersWithDraftStatusMismatch->map(function ($user, $_) {
  return $user->plan_id;
})->unique();

In this code snippet, for users with a draft status mismatch, the plan IDs associated with those users are extracted and returned as a distinct list.

Potential Impact — example three

When there are draft status mismatches, it’s crucial to analyze the plan IDs to determine if any specific plans are affected. This information helps in understanding the scope of impact and allows for targeted fixes or investigations. By isolating the plans related to the draft status mismatches, you can ensure that the data and functionality related to these plans are consistent and accurate.
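To see how the mismatches spread across plans, and not just which plans appear, you can count them per plan. This is a small sketch in plain PHP; the `mismatchesPerPlan` helper and the array-shaped records with a `plan_id` key are assumptions for illustration only.

```php
<?php
declare(strict_types=1);

/**
 * Count mismatching records per plan, most affected plan first.
 * Assumes each record is an array with an integer or string 'plan_id';
 * records with a null plan_id would need to be filtered out beforehand.
 */
function mismatchesPerPlan(array $mismatches): array
{
    $perPlan = array_count_values(array_column($mismatches, 'plan_id'));
    arsort($perPlan); // sort by count, descending, keeping plan IDs as keys

    return $perPlan;
}

// Toy data standing in for users with a draft status mismatch.
$mismatches = [
    ['id' => 11, 'plan_id' => 7],
    ['id' => 12, 'plan_id' => 7],
    ['id' => 13, 'plan_id' => 3],
];

$perPlan = mismatchesPerPlan($mismatches);
```

Here plan 7 carries two of the three mismatches, which suggests where to start investigating.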

For instance, imagine if many customers were flipped from active to inactive due to your code changes. By assessing production data, you can identify the customers who will be impacted by this transition and understand the scope of the change. This knowledge empowers you to proactively address any issues that may arise, such as ensuring access permissions, notifications, or other critical functionalities remain accurate and consistent.
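One way to quantify such flips is to tally every transition from old to new status. The sketch below is plain PHP under stated assumptions: `statusTransitions` is a hypothetical helper, and the `old` and `new` fields stand in for whatever the old and new logic would report for each record.

```php
<?php
declare(strict_types=1);

/**
 * Tally how many records move from each old status to each new status.
 * $oldStatus and $newStatus return a status label for a record.
 * Records whose status does not change are skipped.
 */
function statusTransitions(array $records, callable $oldStatus, callable $newStatus): array
{
    $tally = [];
    foreach ($records as $record) {
        $from = $oldStatus($record);
        $to = $newStatus($record);
        if ($from === $to) {
            continue; // unchanged records are not part of the impact
        }
        $key = "$from -> $to";
        $tally[$key] = ($tally[$key] ?? 0) + 1;
    }

    return $tally;
}

// Toy data: two users flip from active to inactive, one from invited to active.
$records = [
    ['id' => 1, 'old' => 'active', 'new' => 'inactive'],
    ['id' => 2, 'old' => 'active', 'new' => 'inactive'],
    ['id' => 3, 'old' => 'draft', 'new' => 'draft'],
    ['id' => 4, 'old' => 'invited', 'new' => 'active'],
];

$tally = statusTransitions(
    $records,
    fn (array $r): string => $r['old'],
    fn (array $r): string => $r['new']
);
```

A large count under a key such as `active -> inactive` is the kind of signal worth reviewing before deployment.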

PHPUnit is a robust testing framework for PHP that lets developers write automated tests for their code. By crafting tests with PHPUnit, you can systematically check various parts of your codebase and uncover issues before they manifest in production. For our case, pay particular attention to the unit tests below.

No bugs Luca, Bravo! Image credits: Luca Bravo on Unsplash.

Setting up a test suite

use Tests\TestCase;
use App\Models\PlanUser;
use Illuminate\Foundation\Testing\RefreshDatabase;
use Faker\Factory as Faker;

class FeatureImpactTest extends TestCase
{
    use RefreshDatabase;

    protected $faker;

    protected function setUp(): void
    {
        parent::setUp();
        $this->faker = Faker::create();
    }

    ...............
}

Checking for Users with More Than One Status

......

class FeatureImpactTest extends TestCase
{
    ...............

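    // PHPUnit only picks up methods whose name starts with "test"
    // (or that are explicitly marked as tests).
    public function test_it_checks_for_users_with_more_than_one_status()
    {
        // Creating 10 user records
        $users = PlanUser::factory()->count(10)->create();

        // Simulating users with more than one status
        // (assumes statuses with IDs 1 and 2 exist in the test database)
        $userWithMoreThanOneStatus = $users->random();
        $userWithMoreThanOneStatus->statuses()->attach([1, 2]);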

        // Querying for users with more than one status
        $usersWithMoreThanOneStatus = PlanUser::has('statuses', '>', 1)->get();

        // Asserting that the result is not empty
        $this->assertNotEmpty($usersWithMoreThanOneStatus);
    }

    ...............
}

In this test, we use Laravel’s built-in testing functionalities to create a set of user records. Then, we artificially introduce a situation where a user unexpectedly has more than one status. Through querying, we identify users with such discrepancies and assert that the result isn’t empty. This serves as a red flag, indicating the presence of potential data inconsistencies.

Checking for Status Mismatches

Our second PHPUnit test focuses on precisely this scenario: we scrutinize status changes to verify that old and new statuses align as intended. Here is the test code:

// PHPUnit only picks up methods whose name starts with "test"
// (or that are explicitly marked as tests).
public function test_it_checks_for_status_mismatches()
{
    // Creating 10 user records
    $users = PlanUser::factory()->count(10)->create();

    // Simulating status mismatches
    $userWithMismatchedStatus = $users->random();
    $userWithMismatchedStatus->update([
        'status' => 'active',
        'old_status' => 'inactive',
    ]);

    // Querying for users whose status differs from their old status
    $usersWithMismatchedStatus = PlanUser::whereColumn('status', '!=', 'old_status')->get();

    // Asserting that the result isn't empty
    $this->assertNotEmpty($usersWithMismatchedStatus);
}

In this test, we craft a scenario where a user’s old and new statuses don’t align as expected. We update a user’s status and its corresponding old_status field to intentionally create a mismatch. The test then queries for users with such mismatches and asserts that the result isn’t empty.

This test case serves as a safeguard against unintentional status discrepancies. By systematically assessing the alignment of old and new statuses, we’re ensuring that code changes don’t trigger unexpected shifts in user behavior.

If only I had one **La Chouffe** for every bug I fixed… Image credits: Alexander Schimmeck on Unsplash.

Conclusion

In conclusion, understanding the impact of code changes on production data is a vital aspect of software development. While we strive to improve our systems and introduce new features, we must also consider the potential consequences for our users. By conducting data-driven feature impact assessments, we can proactively identify and address any issues that may arise. This approach not only ensures data integrity and consistency but also enhances the overall user experience. So, as you embark on your development journey, remember the power of analyzing production data, respecting its sanctity, and making informed decisions to deliver reliable and seamless software to your valued customers.

Blog by Riccardo Vincelli and Rohit Yadav brought to you by the engineering team at Sharesquare.