Coder Social home page Coder Social logo

Comments (22)

okal avatar okal commented on August 31, 2024 2

I think the fix should be on the client side, but that the test cases should be updated. + to space is standard encoding. I think it could get really messy if we start adding exceptions to how those are interpreted.

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024 1

@machariamuguku For number 2, it appears its working fine with only the caveat that ?query={"_date_modified": {"$gt": "2023-07-18T19:09:52"}} will also include the record whose _date_modified is "2023-07-18T19:09:52" in addition to records whose _date_modified is greater than "2023-07-18T19:09:52". The workaround is starting from the next millisecond such "2023-07-18T19:09:53". Or what did you mean by matching the exact filter?

from onadata.

ukanga avatar ukanga commented on August 31, 2024 1

The path to resolution:

  1. Store the date and time in ISO format that includes the milliseconds and timezone time in the JSON field data for all date fields generated serverside. This is not the case at the moment.
"_date_modified": "2023-06-14T14:52:06",

To:

"_date_modified": "2023-06-14T14:52:06.940826+00:00",
  1. The defect is not in how the logical filters are applied but in the data used since it is not complete as stored in the database where the milliseconds are excluded from the API data returned but are present in the data stored.
  2. Will be investigated and resolved
{
  "detail": "invalid input syntax for type timestamp with time zone: \"2020-12-18T09:36:19.767455 00:00\"\nLINE 1: ...65119) AND deleted_at IS NULL AND date_modified > '2020-12-1...\n                                                             ^\n"
}
  1. Will be investigated and resolved.

from onadata.

okal avatar okal commented on August 31, 2024 1

@okal @ukanga I think updating the tests will not change anything. I think what we should update is the documentation.

Yeah, I think you have a point @kelvin-muchiri. It was an invalid input, and resulted in a meaningful error message to the client. I do wonder if there might be a better way to signal to clients what's going on. Might be something to talk through with @machariamuguku, to see if there might have been a way to make the issue more obvious to someone scanning the response.

from onadata.

okal avatar okal commented on August 31, 2024 1

@ukanga The dates in get_full_dict (_date_modified_, _submission_time) are always 1 step behind the model's fields.

auto_now and auto_now_add are updated when calling Model.save. That means line [DATE_MODIFIED] = self.date_modified.strftime(MONGO_STRFTIME) will always be assigned the previous value of date_modified during save while date_modified is now updated with the current time.

Therefore Instance.json.get("_date_modified") will always be referencing the previous date_modified value which is incorrect. I have manually tested this to verify

Should we be updating the json inside a post_save signal instead?

Hm. I think, in a perfect world, we shouldn't be tracking this info in two different places. Might the suggested approach result in auto_now being called yet again, since the json is a property of the model, which is being modified in its own post-save signal? And wouldn't that result in the same problem, and be a possible recipe for resource contention? Maybe instead of having auto_now on the date_modified field, the code could use the same explicitly set timestamp for both the field and the json (in an overridden save method, or something similar).

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024 1

@ukanga Thanks, I think I might need to create a new issue for this and handle it in a separate PR?

from onadata.

ukanga avatar ukanga commented on August 31, 2024

@machariamuguku

  1. Why is the _date_modified relevant? The _date_modified historically was only added if a record had been edited. If it has not been edited then there is no reason the record should have _date_modified. Note: this does not mean the submission data was edited, just the DB record was modified in in most situations it may not touch the XML and JSON submission itself.
    Why is the _submission_time patchy? Can you explain? this represents when on the server the record was received.
    The _last_edited field when present does indicate that the XML/JSON submission has been modified and most likely edited either via Enketo or the CSV upload edit option.
    How is the current implementation affecting integration with the understanding that this fields are and can be optional?
    The Reasons we may not include them always are:
    1. Minimise the data size on the JSON column by including fields that are not necessary unless that activity has been noted.
    2. The presence of the field indicates a particular action occurred, we do not have to maintain a separate field for such activities.

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

The invalid input syntax for type..... is being thrown because + in the URL is being parsed into a white space. This is what is received by the API '{"_date_modified": {"$gt": "2023-09-18T13:57:26.231579 00:00"}}' (notice the + has been replaced by whitespace)

The work around that can be applied at the moment is to encode it to %2b so 2023-07-05T16:56:32.172+02:00 becomes 2023-07-05T16:56:32.172%2b02:00 when making a request

Also the reason this was not caught by our tests is because the test cases testing this functionality use APIRequestFactory instead of APIClient. APIClient mimicks the browser. While using it, I have been able to replicate the bug in the test cases cc @ukanga @machariamuguku

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

@ukanga Should we have the client amend the URL to encode the + or should we have the API apply fix to reformat the value as desired?

from onadata.

machariamuguku avatar machariamuguku commented on August 31, 2024

@kelvin-muchiri The API client I'm using encodes the request and it still returns the error:

This request:

?query={"_date_modified": {"$gt": "2020-12-18T09:36:19.767455+00:00"}}&page_size=1

Is encoded to

?query=%7B%22_date_modified%22%3A%20%7B%22%24gt%22%3A%20%222020-12-18T09%3A36%3A19.767455%2B00%3A00%22%7D%7D&page_size=1

But it still fails with:

{
  "detail": "invalid input syntax for type timestamp with time zone: \"2020-12-18T09:36:19.767455 00:00\"\nLINE 1: ...65119) AND deleted_at IS NULL AND date_modified > '2020-12-1...\n                                                             ^\n"
}

Looks like a decoding error

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

@machariamuguku When I test on the browser without the encoding, I get the 400 error but with the encoding, I am getting a successful response

from onadata.

ukanga avatar ukanga commented on August 31, 2024

@machariamuguku Maybe you need to encode the value of the values - in this case, encoding this portion by itself before including it in the query - "2020-12-18T09:36:19.767455+00:00" to "2020-12-18T09:36:19.767455%2b00:00".

from onadata.

machariamuguku avatar machariamuguku commented on August 31, 2024

@ukanga @kelvin-muchiri My bad, turns out the request is encoded properly (%2B) by the ETL connector client but is sent unencoded (+) by API clients (postman and thunder). This works

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

@okal @ukanga I think updating the tests will not change anything. I think what we should update is the documentation.

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

@ukanga The dates in get_full_dict (_date_modified_, _submission_time) are always 1 step behind the model's fields.

auto_now and auto_now_add are updated when calling Model.save. That means line [DATE_MODIFIED] = self.date_modified.strftime(MONGO_STRFTIME) will always be assigned the previous value of date_modified during save while date_modified is now updated with the current time.

Therefore Instance.json.get("_date_modified") will always be referencing the previous date_modified value which is incorrect. I have manually tested this to verify

Should we be updating the json inside a post_save signal instead?

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

Might the suggested approach result in auto_now being called yet again, since the json is a property of the model

Thanks @okal hadn't thought of this case, leads us back to square 1

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

@okal I agree, an alternative option would be to set them explicitly and remove reliance of auto_now and auto_now_add

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

@okal I was of the opinion the metadata can be added when serializing the response, but @ukanga pointed out that the reason the metadata is stored in the DB during save is to ensure no further calculation is done when rendering the response.

However, can't we have the update happen when rendering the response then rely on caching to store the results for some time X thus reducing the frequency of recalculations?

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

But from what I see from the code is that alot is going on such as getting the attachments, notes e.t.c which are expensive operations which makes sense to perform them before hand

from onadata.

ukanga avatar ukanga commented on August 31, 2024

We may then need to switch to using the default=timezone.now instead of auto_now and auto_add_now as recommended in the docs. And then ensure we update the date_modified field in the pre_save() method. That should have the desired results of all data lining up.

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

@ukanga I have create a separate issue Instance model date_modified and date_created fields are out of sync with their aliases in the json field

from onadata.

kelvin-muchiri avatar kelvin-muchiri commented on August 31, 2024

After further analysis, it turned out that this issue and issue #2480 are mutually inclusive and have been handled by the same PR #2477

from onadata.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.