Comments (9)

imrehg commented on July 22, 2024

I also hit this issue while working on changes for #32814.

I'd like to add that this happens when parsing the response from the server (i.e. you already need to have something set up and be talking to some endpoint). When it happens, neither the request nor the response log is shown, which makes it harder to debug: the failure is in response parsing, but it's unclear which response triggers it. For example, in my case most of the tested API endpoints work just fine, but some reliably trigger this issue.

from airbyte.

tturkenitz commented on July 22, 2024

Hey @imrehg, thanks a lot for picking this up and providing a solution. I'm going to implement the MR on my end and test if it fixes my issue too. I'll report back with my results shortly.

tturkenitz commented on July 22, 2024

Hi @imrehg, your patch works fine and behaves as expected.

Looks good, and hopefully it can be merged and released soon.
Really appreciate you taking the lead on this 🙇

imrehg commented on July 22, 2024

@tturkenitz could you try the proposed changes in the above PR, if you have a chance? Edit: It's slightly different from your workaround, but the spirit is the same (i.e. setting defaults when "type" is not available). It should get to the root of the problem; see the comment below.

imrehg commented on July 22, 2024

I got my debug system working, and it's likely a more complex edge case than the linked MR accounts for. The node that triggers the error for me has this value:

{
    "anyOf": [
        {"type": ["boolean", "null", "string"]},
        {
            "type": "object",
            "properties": {
                "type": {"type": "string"},
                "interval": {"type": "string"},
                "maxValue": {"type": "number"},
                "minValue": {"type": "number"},
                "currencyCode": {"type": "string"},
            },
        },
    ]
}

Thus what happens is not that "there's no type info", but that "the type info is not correctly extracted from an anyOf".

I've updated the MR to fix the issue correctly for my case. I do wonder whether these changes would work for you too, @tturkenitz (or whether your issue is triggered by some other cleanup edge case)? Is there a chance that you can test-drive this PR?
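
To make the anyOf failure mode concrete, here is a minimal sketch. It is not the CDK's actual cleaning code: `naive_clean` is a hypothetical stand-in that, like the failing step, assumes every node carries a top-level "type" key. The node is a trimmed version of the one above.

```python
def naive_clean(node):
    """Hypothetical cleaner that assumes every node has a top-level "type"."""
    if node["type"] == "object":  # raises KeyError when the node only has "anyOf"
        for child in node.get("properties", {}).values():
            naive_clean(child)
    return node


# Trimmed version of the node quoted in this comment.
node = {
    "anyOf": [
        {"type": ["boolean", "null", "string"]},
        {
            "type": "object",
            "properties": {
                "type": {"type": "string"},
                "interval": {"type": "string"},
            },
        },
    ]
}

try:
    naive_clean(node)
    outcome = "ok"
except KeyError as exc:
    outcome = f"KeyError: {exc}"

print(outcome)
```

Every variant inside the anyOf has its own "type", so the type info is there; it just isn't at the level where the lookup happens.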

tturkenitz commented on July 22, 2024

I've tested the fix on my end, but it doesn't resolve my specific issue. The type field is missing from the document entirely, and re-adding it as STRING resolves the problem. This occurs when I interact with the Coupa ExpenseReports endpoint, and only when extracting more than one record. I suspect a schema difference between the records could be the issue, but I'm not familiar enough with the Airbyte code. My assumption is that Airbyte is able to reconcile the schemas into one master schema document when such things happen, but maybe not in this case?

I can share the schema, but it's quite large at 15k lines, and I'm not sure if it reflects the missing type accurately. It seems to be the schema Airbyte retrieved from the first document.

Adding

if node.get("type", "") == "":
    node["type"] = "string"

does solve the issue for me, but I doubt it's production-ready code 😅

imrehg commented on July 22, 2024

@tturkenitz for testing could you try a debug step?

In schema_inferrer.py:

below that line add:

            if "type" not in node:
                print(node)

which should print the node data in the logs when the cleaning step fails. This would show what your offending schema content is.
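
If editing the CDK in place is awkward, the same check can be run offline on a captured schema. The following is only a sketch: `find_untyped` is a hypothetical helper (not part of the CDK), and the example schema is made up to resemble the anyOf case discussed in this thread.

```python
def find_untyped(node, path="$"):
    """Yield (path, node) for every schema dict that lacks a "type" key."""
    if not isinstance(node, dict):
        return
    if "type" not in node:
        yield path, node
    # Descend into the places where nested schemas can live.
    for i, variant in enumerate(node.get("anyOf", [])):
        yield from find_untyped(variant, f"{path}.anyOf[{i}]")
    for name, child in node.get("properties", {}).items():
        yield from find_untyped(child, f"{path}.properties.{name}")
    yield from find_untyped(node.get("items"), f"{path}.items")


# Made-up schema: "account" was seen both as a string and as an object.
schema = {
    "type": "object",
    "properties": {
        "account": {
            "anyOf": [
                {"type": "string"},
                {"type": "object", "properties": {"parent-id": {"type": "null"}}},
            ]
        }
    },
}

untyped_paths = [path for path, _ in find_untyped(schema)]
for path in untyped_paths:
    print(path)
```

Note that a node whose type is the string "null" still has a "type" key, so it is not reported; only nodes without the key at all are.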

Let me know if any of this is unclear 😬

I would be surprised if the schema inferrer didn't have any info to go on (the type doesn't come from your source, but from the tool that looks at the source's response).

I suspect a schema difference between the records could be the issue

That's exactly the case for my breakage as well: that's when the inferrer ends up with an anyOf entry (a collection of the variations of the data it has seen), and the handling of that anyOf is the problematic bit in the code.
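
As a rough illustration of how differing records lead to an anyOf, here is a toy inferrer. It is an assumption-laden sketch, not the algorithm the CDK or its schema library uses; `infer` and `merge` are hypothetical names, and lists are left out for brevity.

```python
def infer(value):
    """Infer a tiny JSON-schema-like node from one Python value (lists omitted)."""
    if value is None:
        return {"type": "null"}
    if isinstance(value, bool):  # check bool before int: bool is a subclass of int
        return {"type": "boolean"}
    if isinstance(value, (int, float)):
        return {"type": "number"}
    if isinstance(value, str):
        return {"type": "string"}
    if isinstance(value, dict):
        return {"type": "object", "properties": {k: infer(v) for k, v in value.items()}}
    raise TypeError(f"unsupported value: {value!r}")


def merge(a, b):
    """Combine two inferred nodes; differing shapes become an anyOf of variants."""
    if a == b:
        return a
    variants = []
    for schema in (a, b):
        for variant in schema.get("anyOf", [schema]):
            if variant not in variants:
                variants.append(variant)
    return {"anyOf": variants}


# Two records where the same field is a string once and an object once.
first = infer("ACME")
second = infer({"id": 1, "parent-id": None})
merged = merge(first, second)
print(merged)
```

The merged node has no top-level "type": the type info lives inside each anyOf variant. A field that was only ever seen as null simply ends up with "type": "null", which is still a valid type entry.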

tturkenitz commented on July 22, 2024

@imrehg,

I captured the malformed schema. It seems I was actually wrong: type does exist in the schema, but it is set to null. That confuses me, because my solution assumed that type is completely missing and adds it back to the schema. But maybe type is removed by Airbyte as it processes the schema, and my solution re-adds it? Lots of assumptions, sorry!

Here is the node; you can see that there are multiple attributes, like parent-id, salesforce-id and avatar-thumb-url, where type is null.

{
    "anyOf": [
        {
            "type": "string"
        },
        {
            "type": "object",
            "properties": {
                "parent-id": {
                    "type": "null"
                },
                "lookup": {
                    "type": "object",
                    "properties": {
                        "content-groups": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "updated-by": {
                                        "type": "object",
                                        "properties": {
                                            "salesforce-id": {
                                                "type": "null"
                                            },
                                            "avatar-thumb-url": {
                                                "type": "null"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    ]
}

This is the log row header:

airbyte-connector-builder-server  | 2024-06-06 14:14:47 INFO i.a.w.i.VersionedAirbyteStreamFactory(logMalformedLogMessage):390

imrehg commented on July 22, 2024

Hey @tturkenitz thanks for passing on the node information!

I'm skeptical of the null causing any issues. I think the schema inferrer would produce null if all the examples it has seen had null as the value (and APIs can send that back: they send a field name but set it explicitly to null, rather than only sending the field when it has a non-null value).

Instead, your troublesome node seems to have the same characteristics as mine (an anyOf that isn't just a null and something else as its two entries).

I've tested your node info, and I've found that:

  1. current master branch indeed breaks with that KeyError
  2. the patch from #39146 works correctly with your example too, so seems to be addressing the issue

Sorry if this sounds basic, but could you check whether you were actually using the patched version of the CDK from that merge request?

Direct testing

For reference, since I don't have access to the source you are using, I've used a simple script for direct testing, feeding the node directly to the cleaning step. Install the version of the CDK in a Python environment and run the script. With the patch I get `Success`; with the CDK from `master` it's the usual failure.

import airbyte_cdk as cdk

node = {
    "anyOf": [
        {
            "type": "string"
        },
        {
            "type": "object",
            "properties": {
                "id": {
                    "type": "number"
                },
                "created-at": {
                    "type": "string"
                },
                "updated-at": {
                    "type": "string"
                },
                "active": {
                    "type": "boolean"
                },
                "name": {
                    "type": "string"
                },
                "description": {
                    "type": "string"
                },
                "external-ref-num": {
                    "type": "string"
                },
                "external-ref-code": {
                    "type": "string"
                },
                "parent-id": {
                    "type": "null"
                },
                "lookup-id": {
                    "type": "number"
                },
                "depth": {
                    "type": "number"
                },
                "is-default": {
                    "type": "boolean"
                },
                "approval-group-1": {
                    "type": "string"
                },
                "approval-user-1": {
                    "type": "string"
                },
                "approval-group-2": {
                    "type": "string"
                },
                "approval-user-2": {
                    "type": "string"
                },
                "custom-fields": {
                    "type": "object",
                    "properties": {
                        "watcher": {
                            "type": "string"
                        },
                        "watcher-group": {
                            "type": "string"
                        },
                        "requester-known-for-invoice": {
                            "type": "string"
                        },
                        "territory": {
                            "type": "string"
                        }
                    }
                },
                "lookup": {
                    "type": "object",
                    "properties": {
                        "id": {
                            "type": "number"
                        },
                        "created-at": {
                            "type": "string"
                        },
                        "updated-at": {
                            "type": "string"
                        },
                        "active": {
                            "type": "boolean"
                        },
                        "name": {
                            "type": "string"
                        },
                        "description": {
                            "type": "string"
                        },
                        "fixed-depth": {
                            "type": "boolean"
                        },
                        "level-1-name": {
                            "type": "string"
                        },
                        "level-2-name": {
                            "type": "string"
                        },
                        "level-3-name": {
                            "type": "string"
                        },
                        "level-4-name": {
                            "type": "string"
                        },
                        "level-5-name": {
                            "type": "string"
                        },
                        "level-6-name": {
                            "type": "string"
                        },
                        "level-7-name": {
                            "type": "string"
                        },
                        "level-8-name": {
                            "type": "string"
                        },
                        "level-9-name": {
                            "type": "string"
                        },
                        "level-10-name": {
                            "type": "string"
                        },
                        "content-groups": {
                            "type": "array",
                            "items": {
                                "type": "object",
                                "properties": {
                                    "id": {
                                        "type": "number"
                                    },
                                    "created-at": {
                                        "type": "string"
                                    },
                                    "updated-at": {
                                        "type": "string"
                                    },
                                    "name": {
                                        "type": "string"
                                    },
                                    "description": {
                                        "type": "string"
                                    },
                                    "updated-by": {
                                        "type": "object",
                                        "properties": {
                                            "id": {
                                                "type": "number"
                                            },
                                            "login": {
                                                "type": "string"
                                            },
                                            "email": {
                                                "type": "string"
                                            },
                                            "employee-number": {
                                                "type": "string"
                                            },
                                            "firstname": {
                                                "type": "string"
                                            },
                                            "lastname": {
                                                "type": "string"
                                            },
                                            "fullname": {
                                                "type": "string"
                                            },
                                            "salesforce-id": {
                                                "type": "null"
                                            },
                                            "avatar-thumb-url": {
                                                "type": "null"
                                            },
                                            "department-ucf": {
                                                "type": "string"
                                            },
                                            "role": {
                                                "type": "string"
                                            },
                                            "uaf": {
                                                "type": "string"
                                            },
                                            "custom-fields": {
                                                "type": "object",
                                                "properties": {
                                                    "test-employee-number": {
                                                        "type": "string"
                                                    },
                                                    "default-cost-center": {
                                                        "type": "string"
                                                    },
                                                    "frequent-buyer-training": {
                                                        "type": "boolean"
                                                    },
                                                    "approver-training": {
                                                        "type": "boolean"
                                                    },
                                                    "starter": {
                                                        "type": "boolean"
                                                    },
                                                    "coa-test": {
                                                        "type": "string"
                                                    }
                                                }
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                },
                "account-type": {
                    "type": "object",
                    "properties": {
                        "id": {
                            "type": "number"
                        },
                        "name": {
                            "type": "string"
                        }
                    }
                }
            }
        }
    ]
}

cdk.utils.SchemaInferrer._clean(None, node)
print("Success")
