Comments (6)
Yngve, yes automated testing is a big priority for us.
Could you expand on what you would be testing in the case where you are loading an example database?
There are two tiers of testing that we're thinking about right now.
1. Do the queries run, and does everything work?
The idea here would be to never serve an error message to your end users that was the result of a bad SQL query or otherwise.
When you go to generate your site to serve it to people in a production environment, we can block it from deploying if it's throwing errors. Then, instead of serving a broken report, we'd just serve the last build of the report + a warning 'data is delayed'
We'll get a lot of this 'for free' by just deploying an evidence project on vercel or netlify.
2. Is the source data still reliable?
You may still be able to successfully execute a query against your DB, and make a chart, but if there is a problem in your ETL process, that query might return erroneous data. In our experience, this is a much more common failure case in reporting applications, and it is much more damaging to user trust.
Right now, we don't have plans to build our own testing suite for this type of thing. Instead, we're going to build tooling to plug into other data quality & testing tools. dbt 'exposures' is our first priority here.
Again, the end result will be that updates to your reports are blocked while you have failing tests up stream.
Stale data > wrong data.
from evidence.
Great points.
I would add a third question which I don’t think is covered by your tiers: Do the queries return the expected results?
The rationale is that there is usually some sort of business logic in queries, and I would sleep better at night knowing that this logic is tested. I don’t want to be in a situation where the system “somehow creates some charts and who knows if they are correct”.
For example, let’s say that I have the following users
table.
id | email_confirmed_at | … |
---|---|---|
1 | NULL |
… |
2 | 2021-08-06 10:56:44.244319 | … |
If I have a query called number_of_users_with_confirmed_email
, then I would like to assert that the query returns 1
for this example table.
from evidence.
@yhoiseth. This is fantastic, thank you for taking the time to explain it to me.
The table would be a fixture. In other words, it would be hardcoded and would not change when production data changes.
That is the piece I was missing. Of course, this makes total sense.
I suspect the right answer here is going to be building thoughtful support for a variety of existing test frameworks. The interesting challenge here is the mix of:
- Existing test frameworks for web applications (jest comes to mind), and;
- Unit testing your SQL queries on known limited data sets
I don't have a clear answer on this yet, but this definitely needs to form the third (or third and fourth) track of our thinking around testing.
from evidence.
@yhoiseth I am struggling a bit with this example.
To stay with your table, if tomorrow user id 1 confirms their email, wouldn't you want your query's result to change to 2? If so, wouldn't your test fail?
Put another way, if you know the result of the query in advance, what is the purpose of writing the query at all?
That said, in tools like dbt you can do assertions. Usually people do these things to assert that a column is unique, or not null, but they can also be used to ensure that some historical known fact hasn't changed. E.g. 2019 revenue should always sum to $30M, and if it doesn't the test should fail.
Is there a data testing tool or library that you are thinking about that I should look into?
from evidence.
I am struggling a bit with this example.
No problem :)
To stay with your table, if tomorrow user id 1 confirms their email, wouldn't you want your query's result to change to 2? If so, wouldn't your test fail?
The table would be a fixture. In other words, it would be hardcoded and would not change when production data changes.
Let me try to illustrate with an example. If I did something similar in Django, number_of_users_with_confirmed_email
might be a function:
def number_of_users_with_confirmed_email() -> int:
return self.objects.filter(email_confirmed_at__isnull=False).count()
A test for this function might look something like this:
from django.test import TestCase
from myapp.models import User
class UserTestCase(TestCase):
def setUp(self):
User.objects.create(email_confirmed_at=None)
User.objects.create(email_confirmed_at="2021-08-06 10:56:44.244319")
def test_confirmed_email_count(self):
self.assertEqual(number_of_users_with_confirmed_email(), 1)
Put another way, if you know the result of the query in advance, what is the purpose of writing the query at all?
I know the result of the query on a small, simple dataset, as I can manually check it. But I don’t know the result of the query on my production dataset, as that is too big to manually check. That is why I need the query.
That said, in tools like dbt you can do assertions. Usually people do these things to assert that a column is unique, or not null, but they can also be used to ensure that some historical known fact hasn't changed. E.g. 2019 revenue should always sum to $30M, and if it doesn't the test should fail.
That’s good (I don’t know dbt), but it doesn’t protect my team from making mistakes in the queries we are writing in Evidence.
Is there a data testing tool or library that you are thinking about that I should look into?
My thinking is that when I am using Evidence, I am making an application. If I want to do it properly, I should test it like I would any other application. That way, development is more enjoyable and I can be reasonably sure that things work, even if many people who don’t know the codebase are working on it at the same time, we don’t have QA people, etc.
I’m not a testing expert, so I don’t have any strong opinions on what would be a good inspiration for testing in Evidence. I can provide some pointers, though:
First, most web frameworks have something similar to Django’s testing tools.
Second, it is possible to write tests in SQL, so this might be a good approach for Evidence. See pgTAP. (I have never written tests in SQL myself.)
Third, a slick BDD implementation might be the best approach. I could, for example, write the above test in the following way.
Feature: Number of users with confirmed email
Scenario: One has confirmed, another has not
Given the following users exist:
| id | email_confirmed_at |
| 1 | NULL |
| 2 | 2021-08-06 10:56:44.244319 |
When I check the number of users with confirmed email
Then the result is 1
See, e.g., https://cucumber.io.
from evidence.
Best current strategy
npm run build:strict
https://docs.evidence.dev/cli/#commands
More to come here!
from evidence.
Related Issues (20)
- DropDown Not Showing Values (Possible Regression) HOT 1
- Big Fan of Your Service, But I have a Few Suggestions. HOT 1
- Statistical Testing Features
- Reference line and area components aren't reactive HOT 2
- QueryLoad not flagging empty query results, breaks emptySet HOT 1
- npm run build failed HOT 3
- Map for plotting points of interest
- Accurate line references in errors
- Add aggregation to the value component
- Respect frontmatter sidebar_link in breadcrumbs
- Native XmR Charts HOT 5
- Incorrect DateRange preset for "Last month" HOT 1
- Dropdown list shows gap when list of options is small
- Apply colorscale to subtotals and totalRow in DataTable HOT 2
- Dropdown search does not work HOT 2
- generated tabs are broken in latest update HOT 6
- BigValue downIsGood not working properly
- Add slot state for unresolved queries in `QueryLoad`
- can't install duckdb HOT 2
- Invalid date being rendered HOT 2
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from evidence.