Automatic testing (evidence issue, open)

evidence-dev commented on June 17, 2024
Automatic testing

from evidence.

Comments (6)

mcrascal commented on June 17, 2024

Yngve, yes, automated testing is a big priority for us.

Could you expand on what you would be testing in the case where you are loading an example database?

There are two tiers of testing that we're thinking about right now.

1. Do the queries run, and does everything work?
The idea here would be to never serve your end users an error message caused by a bad SQL query or any other build failure.

When you generate your site to serve it to people in a production environment, we can block it from deploying if it's throwing errors. Then, instead of serving a broken report, we'd serve the last successful build of the report plus a 'data is delayed' warning.

We'll get a lot of this 'for free' by just deploying an Evidence project on Vercel or Netlify.
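
A minimal sketch of that deploy gate, not Evidence's actual implementation: it assumes the build is an ordinary command whose non-zero exit code signals a failure, and the names `build_succeeded`, `choose_release`, `new_build` and `last_good_build` are hypothetical.

```python
import subprocess
import sys
from typing import Sequence


def build_succeeded(command: Sequence[str]) -> bool:
    """Run a build command and report whether it exited with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Show the error to whoever maintains the report, not to its readers.
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0


def choose_release(command: Sequence[str], new_build: str, last_good_build: str) -> str:
    """Return the build to serve: the new one only if the build command passed."""
    return new_build if build_succeeded(command) else last_good_build
```

A caller would pass its own build command, for example `choose_release(["npm", "run", "build"], "build-42", "build-41")`, and attach the 'data is delayed' warning whenever the older build is returned.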

2. Is the source data still reliable?
You may still be able to successfully execute a query against your DB, and make a chart, but if there is a problem in your ETL process, that query might return erroneous data. In our experience, this is a much more common failure case in reporting applications, and it is much more damaging to user trust.

Right now, we don't have plans to build our own testing suite for this type of thing. Instead, we're going to build tooling to plug into other data quality & testing tools. Support for dbt 'exposures' is our first priority here.

Again, the end result will be that updates to your reports are blocked while you have failing tests upstream.

Stale data > wrong data.

yhoiseth commented on June 17, 2024

Great points.

I would add a third question which I don’t think is covered by your tiers: Do the queries return the expected results?

The rationale is that there is usually some sort of business logic in queries, and I would sleep better at night knowing that this logic is tested. I don’t want to be in a situation where the system “somehow creates some charts and who knows if they are correct”.

For example, let’s say that I have the following users table.

id  email_confirmed_at
1   NULL
2   2021-08-06 10:56:44.244319

If I have a query called number_of_users_with_confirmed_email, then I would like to assert that the query returns 1 for this example table.
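
A minimal sketch of such an assertion, using Python's built-in `sqlite3` module as a stand-in database. The SQL text is an assumed implementation of the query, and the helper `run_on_fixture` is hypothetical; neither is part of Evidence.

```python
import sqlite3

# Assumed SQL for the query under test (hypothetical implementation).
NUMBER_OF_USERS_WITH_CONFIRMED_EMAIL = """
    SELECT COUNT(*) FROM users WHERE email_confirmed_at IS NOT NULL
"""


def run_on_fixture(query: str) -> int:
    """Run a query against the two-row example table and return its single value."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE users (id INTEGER PRIMARY KEY, email_confirmed_at TEXT)"
        )
        # The fixture rows from the example table above; None is stored as NULL.
        conn.executemany(
            "INSERT INTO users (id, email_confirmed_at) VALUES (?, ?)",
            [(1, None), (2, "2021-08-06 10:56:44.244319")],
        )
        return conn.execute(query).fetchone()[0]
    finally:
        conn.close()


assert run_on_fixture(NUMBER_OF_USERS_WITH_CONFIRMED_EMAIL) == 1
```

Because the table is created fresh in memory on every run, the expected value stays fixed no matter what happens to production data.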

mcrascal commented on June 17, 2024

@yhoiseth, this is fantastic. Thank you for taking the time to explain it to me.

> The table would be a fixture. In other words, it would be hardcoded and would not change when production data changes.

That is the piece I was missing. Of course, this makes total sense.

I suspect the right answer here is going to be building thoughtful support for a variety of existing test frameworks. The interesting challenge here is the mix of:

  • Existing test frameworks for web applications (Jest comes to mind), and
  • Unit testing your SQL queries on known, limited data sets

I don't have a clear answer on this yet, but this definitely needs to form the third (or third and fourth) track of our thinking around testing.

mcrascal commented on June 17, 2024

@yhoiseth I am struggling a bit with this example.

To stay with your table, if tomorrow user id 1 confirms their email, wouldn't you want your query's result to change to 2? If so, wouldn't your test fail?

Put another way, if you know the result of the query in advance, what is the purpose of writing the query at all?

That said, in tools like dbt you can do assertions. Usually people do these things to assert that a column is unique, or not null, but they can also be used to ensure that some historical known fact hasn't changed. E.g. 2019 revenue should always sum to $30M, and if it doesn't the test should fail.
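
To make the 'historical known fact' idea concrete, here is a rough sketch in plain Python with `sqlite3`. It is not dbt's syntax; the `orders` table, its columns, the sample rows and the helper names are all invented for illustration.

```python
import sqlite3


def revenue_for_year(conn: sqlite3.Connection, year: int) -> int:
    """Sum revenue for one year from a hypothetical `orders` table."""
    row = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE year = ?", (year,)
    ).fetchone()
    return row[0]


def check_known_fact(conn: sqlite3.Connection, year: int, expected: int) -> None:
    """Raise AssertionError if a closed year's revenue no longer matches the known total."""
    actual = revenue_for_year(conn, year)
    if actual != expected:
        raise AssertionError(f"{year} revenue is {actual}, expected {expected}")


# Invented sample rows standing in for the warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (year INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders (year, amount) VALUES (?, ?)",
    [(2019, 12_000_000), (2019, 18_000_000), (2020, 5_000_000)],
)

check_known_fact(conn, 2019, 30_000_000)  # passes while the history is intact
```

If an ETL bug duplicated or dropped a 2019 row, the check would raise, and report updates could be held back until it passes again.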

Is there a data testing tool or library that you are thinking about that I should look into?

yhoiseth commented on June 17, 2024

> I am struggling a bit with this example.

No problem :)

> To stay with your table, if tomorrow user id 1 confirms their email, wouldn't you want your query's result to change to 2? If so, wouldn't your test fail?

The table would be a fixture. In other words, it would be hardcoded and would not change when production data changes.

Let me try to illustrate with an example. If I did something similar in Django, number_of_users_with_confirmed_email might be a function:

from myapp.models import User

def number_of_users_with_confirmed_email() -> int:
    return User.objects.filter(email_confirmed_at__isnull=False).count()

A test for this function might look something like this:

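from django.test import TestCase
from myapp.models import User
# number_of_users_with_confirmed_email is the function shown above;
# import it from wherever it is defined in your project.

class UserTestCase(TestCase):
    def setUp(self):
        # Fixture: one unconfirmed user and one confirmed user.
        User.objects.create(email_confirmed_at=None)
        User.objects.create(email_confirmed_at="2021-08-06 10:56:44.244319")

    def test_confirmed_email_count(self):
        # Only the second user has a confirmation timestamp.
        self.assertEqual(number_of_users_with_confirmed_email(), 1)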

> Put another way, if you know the result of the query in advance, what is the purpose of writing the query at all?

I know the result of the query on a small, simple dataset, as I can manually check it. But I don’t know the result of the query on my production dataset, as that is too big to manually check. That is why I need the query.

> That said, in tools like dbt you can do assertions. Usually people do these things to assert that a column is unique, or not null, but they can also be used to ensure that some historical known fact hasn't changed. E.g. 2019 revenue should always sum to $30M, and if it doesn't the test should fail.

That’s good (I don’t know dbt), but it doesn’t protect my team from making mistakes in the queries we are writing in Evidence.

> Is there a data testing tool or library that you are thinking about that I should look into?

My thinking is that when I am using Evidence, I am making an application. If I want to do it properly, I should test it like I would any other application. That way, development is more enjoyable and I can be reasonably sure that things work, even if many people who don’t know the codebase are working on it at the same time, we don’t have QA people, etc.

I’m not a testing expert, so I don’t have any strong opinions on what would be a good inspiration for testing in Evidence. I can provide some pointers, though:

First, most web frameworks have something similar to Django’s testing tools.

Second, it is possible to write tests in SQL, so this might be a good approach for Evidence. See pgTAP. (I have never written tests in SQL myself.)

Third, a slick BDD implementation might be the best approach. I could, for example, write the above test in the following way.

Feature: Number of users with confirmed email

Scenario: One has confirmed, another has not
  Given the following users exist:
    | id  | email_confirmed_at         |
    | 1   | NULL                       |
    | 2   | 2021-08-06 10:56:44.244319 |
  When I check the number of users with confirmed email
  Then the result is 1

See, e.g., https://cucumber.io.

archiewood commented on June 17, 2024

Best current strategy

npm run build:strict

https://docs.evidence.dev/cli/#commands

More to come here!
