pitchmuc / aepp
Adobe Experience Platform API for humans
License: Apache License 2.0
Support policy evaluation of usage labels for an intended marketing action so that users of the package can handle policy enforcement in their Python code.
Add methods to the Policy class in policy.py to make GET requests to the /marketingActions/{{namespace}}/{{marketing action name}}/constraints endpoint of the Policy Service API (API reference).
There are 2 policy evaluation GET requests within the Policy Service API that have not yet been implemented in aepp:
The two GET requests can be consolidated into one method by taking the namespace ("core" or "custom") as a parameter, as in the sketch below.
The POST requests for evaluating datasets against a marketing action are covered in issue #50.
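A minimal sketch of such a consolidated method, following the connector pattern used elsewhere in the library; the duleLabels query-parameter name and format are assumptions:

def evaluateMarketingActionUsageLabels(self, marketingActionName: str, duleLabels: list = None, namespace: str = "core") -> dict:
    # Consolidates the core/custom GET variants behind one namespace parameter.
    if namespace not in ("core", "custom"):
        raise ValueError('namespace must be "core" or "custom"')
    path = f"/marketingActions/{namespace}/{marketingActionName}/constraints"
    params = {"duleLabels": ",".join(duleLabels or [])}  # labels to evaluate (assumed format)
    return self.connector.getData(self.endpoint + path, params=params)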
local variable 'params' referenced before assignment
Exception has occurred: UnboundLocalError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
local variable 'params' referenced before assignment
This happens because kwargs is empty in my case, while params is only assigned inside the kwargs check:
if kwargs.get("properties", None) is not None:
    params = {"properties": kwargs.get("properties", "title,$id")}
Currently the only function to enable a dataset for profile, enableDatasetProfile, is restricted to doing upserts. The upsert tag should likely be optional so one can enable a dataset for inserts or upserts depending on the use case; see the sketch below.
The call to enableDatasetProfile seems to succeed, but it also seems to have no effect on the dataset, and I've verified on multiple occasions that the "Profile" toggle is not flipped on even after this. It seems like some kind of bug; I haven't looked at the root cause.
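A hypothetical sketch of the optional upsert parameter; the json-patch paths, tag values, and the connector's patchData call are assumptions about aepp's internals:

def enableDatasetProfile(self, datasetId: str, upsert: bool = False) -> dict:
    # Profile enablement tag is always set; the upsert tag becomes opt-in.
    patch = [{"op": "add", "path": "/tags/unifiedProfile", "value": ["enabled:true"]}]
    if upsert:
        patch.append({"op": "add", "path": "/tags/isUpsert", "value": ["true"]})
    privateHeader = dict(self.header)
    privateHeader["Content-Type"] = "application/json-patch+json"  # see the content-type issue below
    return self.connector.patchData(self.endpoint + f"/dataSets/{datasetId}", data=patch, headers=privateHeader)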
Right now if you call getSchemas but there is an authentication issue, it fails with an intermediate stack trace that obfuscates what is actually happening.
Right now when we call createSourceConnectionDataLake, it does not fetch the dataset name, so the resulting destination contains an empty dataset name.
We should modify this function so it automatically fetches the name and embeds it into the payload so it shows up nicely in the UI, along the lines of the sketch below.
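A hedged sketch of the lookup: fetch the dataset from catalog first and embed its name in the source-connection payload. The getDataSet helper name and the payload entry shape are assumptions:

from aepp import catalog

def dataset_entry(dataset_id: str) -> dict:
    # Look the dataset up in catalog so its name can be embedded in the payload.
    cat = catalog.Catalog()
    dataset = cat.getDataSet(dataset_id)
    name = dataset[dataset_id]["name"]   # catalog responses are keyed by dataset id
    return {"dataSetId": dataset_id, "name": name}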
The goal is to work with really large datasets and extract the results of large queries into a Spark dataframe. This will allow us to work with PQS and Spark to do large-scale feature transformation and processing. We'll need to do a little design around this before implementation.
Right now everything seems to be hard-coded to use the prod environment. This includes:
config.py
connector.py
We should support additional environments like stage or int. Ideally it's something that could be added in the config.json file or directly via the aepp.configure call.
We will need to make changes to config.py and connector.py to fetch the environment at runtime and modify the IMS and Experience Platform URLs accordingly, to ims-na1-stg1.adobelogin.com and experience-stage.adobe.com respectively. A sketch follows.
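A minimal sketch of environment-aware hosts, assuming a lookup table consumed by config.py and connector.py; the stage hostnames come from this issue, and the config key name is an assumption:

# Hostname lookup resolved at runtime from the configured environment.
ENDPOINTS = {
    "prod":  {"ims": "ims-na1.adobelogin.com",      "platform": "platform.adobe.io"},
    "stage": {"ims": "ims-na1-stg1.adobelogin.com", "platform": "experience-stage.adobe.com"},
}

def resolve_hosts(environment: str = "prod") -> dict:
    return ENDPOINTS[environment]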
Currently the library is very much tailored to JWT-based authentication.
However, it would be really useful to also support service tokens where you just have a client ID, client secret, and auth code. This would allow the library to be used directly in services, and some APIs are also only accessible via service tokens.
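A hedged sketch of the token exchange against the standard Adobe IMS token endpoint; wiring this into aepp's connector is the open work here, and the exact grant parameters should be treated as assumptions:

import requests

def fetch_service_token(client_id: str, client_secret: str, auth_code: str) -> str:
    res = requests.post(
        "https://ims-na1.adobelogin.com/ims/token",
        data={
            "grant_type": "authorization_code",  # assumed grant type for an auth code
            "client_id": client_id,
            "client_secret": client_secret,
            "code": auth_code,
        },
    )
    res.raise_for_status()
    return res.json()["access_token"]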
Using mocking and patching, add pytest tests for all APIs, along the lines of the pattern below.
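A minimal pattern sketch for such tests: patch the HTTP layer so API wrappers can be exercised without credentials. The patch target is an assumption about where aepp issues its requests and may need adjusting:

from unittest import mock

@mock.patch("aepp.connector.requests.request")  # assumed location of the HTTP call
def test_get_schemas_returns_results(mock_request):
    mock_request.return_value.json.return_value = {"results": []}
    # ...instantiate schema.Schema() with a stubbed config, call getSchemas(),
    # then assert on mock_request.call_args and the returned value.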
Reading a .csv file into a Python dictionary for data upload does not work at the moment. It may be due to the flat structure of CSV files.
Request to support upload of a CSV file by just taking the localFilePath as input, where reading and converting the file into a hierarchical structure is handled by the wrapper; a conversion sketch follows.
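A hedged sketch of the conversion, assuming dotted column names (e.g. "person.name.firstName") encode the hierarchy:

import csv

def csv_to_records(localFilePath: str) -> list:
    # Read a flat CSV and nest each row's dotted column names into a dictionary.
    records = []
    with open(localFilePath, newline="") as f:
        for row in csv.DictReader(f):
            record = {}
            for column, value in row.items():
                node = record
                *parents, leaf = column.split(".")
                for key in parents:
                    node = node.setdefault(key, {})
                node[leaf] = value
            records.append(record)
    return records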
Currently in the createDataSets method there is no way to pass labels and tags programmatically. You can most likely do it with the data parameter (haven't tried), but this is cumbersome as it requires passing the entire payload.
We would like to add extra parameters to this function, system_labels: list[str] and tags: dict[str, list[str]], so it's completely transparent and easy to manipulate.
Possibly we can go further and abstract that fully for some use cases; for example, if I want to create a dataset that is profile-enabled, it would be nice to just call createDataSets(..., profile_enabled=True). A signature sketch follows.
Similarly, the ability to update system labels and tags could be added in the datasets.py module.
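A hypothetical sketch of the proposed signature; the tag format comes from the dataset-enablement issue further down this page, while the labels key, schemaRef shape, and connector call are assumptions:

def createDataSets(self, name: str, schemaId: str, system_labels: list = None, tags: dict = None, profile_enabled: bool = False, **kwargs) -> dict:
    data = {
        "name": name,
        "schemaRef": {"id": schemaId, "contentType": "application/vnd.adobe.xed+json;version=1"},
        "tags": dict(tags or {}),
    }
    if profile_enabled:
        data["tags"]["unifiedProfile"] = ["enabled:true"]
    if system_labels:
        data["labels"] = list(system_labels)  # assumed payload key for system labels
    return self.connector.postData(self.endpoint + "/dataSets", data=data)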
It would be useful to have a module for miscellaneous helper functions. For example, utilities we already use:
We need a CI/CD build system that runs unit tests when code gets merged and possibly publishes a new version of the library to PyPI using poetry publish.
Expected behaviour: get the schema list successfully from a sandbox.
Actual behaviour: schemas are not returned.
We are trying to use the aepp wrapper in order to automate the provisioning of schemas, etc.
However, when we tried to use it, we noticed that the aepp wrapper always supplies a start parameter: when it is not provided in the request, the default is set to 0, so either way the start parameter is sent. In this case the result is an empty list, as the API then expects other parameters such as orderby. Shouldn't the default behavior be to not provide the start parameter at all?
To reproduce, use:
schemaConnection = schema.Schema()
schemas = schemaConnection.getSchemas()
Trying to enable a dataset for profile ingestion with enableDatasetProfile in catalog.py fails due to an invalid content type:
Out[248]: {'type': '/placeholder/type/uri', 'status': 400, 'title': 'BadRequestError', 'detail': 'Content-type does not match json-patch body format, value of Content-type should be application/json-patch+json.'}
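A standalone reproduction of the fix, sketched with requests: json-patch bodies need the application/json-patch+json content type. The header names are the standard AEP ones; the patch path and value are assumptions:

import requests

def enable_profile(dataset_id: str, token: str, api_key: str, org_id: str, sandbox: str) -> dict:
    patch = [{"op": "add", "path": "/tags/unifiedProfile", "value": ["enabled:true"]}]
    res = requests.patch(
        f"https://platform.adobe.io/data/foundation/catalog/dataSets/{dataset_id}",
        json=patch,
        headers={
            "Authorization": f"Bearer {token}",
            "x-api-key": api_key,
            "x-gw-ims-org-id": org_id,
            "x-sandbox-name": sandbox,
            "Content-Type": "application/json-patch+json",  # not plain application/json
        },
    )
    return res.json()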
In the method streamMessage inside the DataIngestion class, data is expected to be a dict. There is a check as follows:
if data is None and type(data) != dict:
    raise Exception("Require a dictionary to be send for ingestion")
The exception tells the user that it expects data to be a dictionary.
Rereading the code, though:
if data is None and type(data) != dict:
    raise Exception("Require a dictionary to be send for ingestion")
When data is not None, no matter what type data is, that condition will always evaluate to False, so no exception is raised when data is, say, a string.
One potentially common pitfall is to pass data as a string instead of a dict.
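A one-line fix sketch: combine the checks with or and an isinstance test so non-dict payloads are rejected.

if data is None or not isinstance(data, dict):
    raise Exception("Require a dictionary to be sent for ingestion")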
Currently createSchedule requires the sql param to be passed. However, that seems incorrect, because templateId can be passed instead, in which case sql is not needed.
In fact, currently specifying both sql and templateId causes an error in PQS, see below:
{'message': 'requirement failed: Only one of sql and templateId must be defined',
'statusCode': 400}
We should accept calling this function as valid if templateId is passed without sql. In the meantime we can still use the full object. A validation sketch follows.
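A hypothetical sketch of the mutual-exclusion check, mirroring the PQS error message; parameter names follow this issue and the request wiring is omitted:

def createSchedule(self, sql: str = None, templateId: str = None, **kwargs) -> dict:
    # PQS rejects requests defining both, so enforce exactly one up front.
    if (sql is None) == (templateId is None):
        raise ValueError("Exactly one of sql and templateId must be defined")
    query = {"sql": sql} if sql is not None else {"templateId": templateId}
    ...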
It would be very useful to support UPS exports that are available via the API, for profiles, events, and profile+events.
For example, to export profile data aggregated with events:
{
"filter": {
"segmentQualificationTime": {
"startTime": "2022-12-04T00:00:00Z",
"endTime": "2023-01-04T00:00:00Z"
},
"emptyProfiles": false
},
"additionalFields": {
"eventList": {
"filter": {
"fromIngestTimestamp": "2022-12-04T00:00:00Z",
"toIngestTimestamp": "2023-01-04T00:00:00Z"
}
}
},
"destination": {
"datasetId": "{{upsExportDataset}}",
"segmentPerBatch": false
},
"schema": {
"name": "_xdm.context.profile"
},
"properties": {
"checkBatchStatusForSuccess": true
}
}
I can help with that if you are open to it, as I've been intimately familiar with this API for the past few months, and there are a few caveats around best practices for what should be provided by the user.
Currently the small-file API support in ingestion seems incomplete. The method uploadSmallFile expects data of type Union[list, dict], which works fine for JSON, but for the Parquet format you would be passing in bytes. I couldn't figure out how to get this working with Parquet data, so I ended up using JSON. Either the prototype of the function needs to change, since the notion of multiline isn't really applicable to Parquet, or we need to actually change the code to fully support a Parquet binary payload. A sketch of a widened signature follows.
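A hedged sketch of the widened signature: binary payloads pass through untouched, while list/dict payloads keep the JSON path. The content types and connector wiring are assumptions:

import json
from typing import Union

def uploadSmallFile(self, batchId: str, datasetId: str, filePath: str, data: Union[list, dict, bytes] = None) -> dict:
    if isinstance(data, bytes):
        payload = data                              # e.g. Parquet; no multiline notion applies
        contentType = "application/octet-stream"    # assumed content type for binary
    else:
        payload = json.dumps(data)
        contentType = "application/json"
    ...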
Support policy evaluation of datasets against a marketing action so that users of the package can handle policy enforcement in their Python code.
Add a method to the Policy class in policy.py to make POST requests to the /marketingActions/{{namespace}}/{{marketing action name}}/constraints endpoint of the Policy Service API (API reference).
There are 2 policy evaluation requests in the Policy Service API that have not yet been implemented in aepp:
The 2 POST requests can be consolidated into 1 method by taking the namespace ("core" or "custom") as a parameter.
A method to implement the GET requests for evaluating a set of DULE labels against a marketing action is covered in Issue #51.
We should have a module to support that.
The method enableSchemaForRealTime in the schema module currently only accepts the meta:altId attribute for a schema. The method does not support the $id attribute at the moment.
It could be enhanced to accept either meta:altId or $id for enabling a schema for RT, as in the sketch below.
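A hedged sketch of accepting either identifier: the Schema Registry takes a meta:altId directly or a URL-encoded $id in the request path, so the method could detect which one it was given. Treat the detection heuristic and path as assumptions:

from urllib.parse import quote_plus

def enableSchemaForRealTime(self, schemaId: str) -> dict:
    if schemaId.startswith("https://"):   # looks like a $id rather than a meta:altId
        schemaId = quote_plus(schemaId)   # URL-encode it for the request path
    path = f"/tenant/schemas/{schemaId}"
    ...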
Thanks a lot again for your work on this!
Currently, extending a soft enum field for event types is only possible via the API, so being able to update other schema descriptors would help to do this.
I did not manage to do this with the current Python wrapper. Can you help here?
If there are no runs returned in getRuns, then it will fail because _links won't be set in the response:
File /usr/local/lib/python3.10/site-packages/aepp/flowservice.py:756, in FlowService.getRuns(self, limit, n_results, prop, **kwargs)
754 res: dict = self.connector.getData(self.endpoint + path, params=params)
755 items: list = res["items"]
--> 756 nextPage = res["_links"].get("next", {}).get("href", "")
757 while nextPage != "" and len(items) < float(n_results):
758 token: str = res["_links"]["next"].get("href", "")
KeyError: '_links'
It should just return an empty array.
see #33 (comment) for context
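A minimal fix sketch for flowservice.py: fall back to an empty dict when "_links" is absent, so pagination simply stops and the collected items (an empty list here) are returned.

nextPage = res.get("_links", {}).get("next", {}).get("href", "")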
Expected behaviour: while creating a new dataset, profile and identity are enabled.
Actual behaviour: while creating a new dataset, profile and identity are not enabled.
While calling createDataSets I'm sending profileEnabled=True, identityEnabled=True, but in the UI I see that they're not enabled:
connection.createDataSets(name=name, schemaId=schema_id, profileEnabled=True, identityEnabled=True)
In catalog.py the following part is causing the issue:
if profileEnabled:
    data['tags']["unifiedProfile"] = ["enabled: true"]
if identityEnabled:
    data['tags']["unifiedIdentity"] = ["enabled: true"]
There shouldn't be any space in enabled: true.
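The corrected assignments, directly applying the fix described above:

if profileEnabled:
    data['tags']["unifiedProfile"] = ["enabled:true"]
if identityEnabled:
    data['tags']["unifiedIdentity"] = ["enabled:true"]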
I created the following object
tags = {
"unifiedProfile": [
"enabled:true"
],
"unifiedIdentity": [
"enabled:true"
]
}
Then I passed it as a parameter to createDataSets and verified in the UI that they are enabled:
connection.createDataSets(name=name, schemaId=schema_id, tags=tags)
Current:
The uploadSmallFile method currently takes a Python dictionary as the input for data to be ingested.
Suggested:
Add an additional parameter, localFilePath, that takes the path of the JSON file and uses json.load() to read the file into a dictionary. Basically, the file-processing part would be handled by the wrapper, as sketched below.
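A minimal sketch of the suggested parameter, complementary to the Parquet sketch above; existing behaviour is kept when data is passed directly:

import json

def uploadSmallFile(self, batchId: str, datasetId: str, filePath: str, data: dict = None, localFilePath: str = None) -> dict:
    if localFilePath is not None:
        with open(localFilePath, "r") as f:
            data = json.load(f)   # the wrapper handles the file processing
    ...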
The Content-type header application/json-patch+json is missing from the request sent for enabling the dataset for identity (catalog.enableDatasetIdentity).
Expected behaviour is to get dataset labels when invoking the datasets.Datasets() method. I have added the config correctly, as the Schemas module is working fine.
I am seeing this error:
AttributeError: module 'aepp.datasets' has no attribute 'Datasets'
On invoking the help method I see the class Datasets. Not sure what I'm doing wrong.
It would be useful to support the API calls to the Data Landing Zone (DLZ) so users can retrieve their blob container and credentials.
Details on the API: https://experienceleague.adobe.com/docs/experience-platform/sources/api-tutorials/create/cloud-storage/data-landing-zone.html?lang=en
This functionality is being built internally right now; once it is available, we would like to update the destination module to trigger on-demand dataset exports.
Right now the call to create a destination just takes a raw dictionary destinationObj: dict, but it would be nice to be able to programmatically pass just a dataset_id: str and not have to deal with complicated payloads.
Currently, most functions just return a raw JSON response, and you have to manually inspect it to retrieve what you need. It would be nice to instead return a specific object from which you can directly extract known fields.
For example, when creating a dataset, to get the dataset ID you need to do something like dataset_response[0].split("/")[-1], but we would like to change it so that we can just do dataset_response.dataset_id.
Another example: in the catalog module, to get the table name for a dataset we have to do response[dataset_id]["tags"]["adobe/pqs/table"][0], but it would be so much easier to use response.table_name. A wrapper sketch follows.
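A hypothetical sketch of typed wrappers around the two raw payloads quoted above; the payload shapes are taken directly from those examples:

from dataclasses import dataclass

@dataclass
class CreateDataSetResponse:
    raw: list  # e.g. ["@/dataSets/<datasetId>"]

    @property
    def dataset_id(self) -> str:
        return self.raw[0].split("/")[-1]

@dataclass
class CatalogDataSet:
    dataset_id: str
    raw: dict  # catalog response keyed by dataset id

    @property
    def table_name(self) -> str:
        return self.raw[self.dataset_id]["tags"]["adobe/pqs/table"][0]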
Currently, to pass the sandbox, the documentation mentions using kwargs like this:
mySchemaConnection1 = schema.Schema({"x-sandbox-name":"mySandbox1"})
However, in the orgs I have tried this on, it gives an error about having an invalid source, even when just using the default prod sandbox. I can only get this to work when not passing the sandbox at all, so it defaults to the default sandbox.
We would like to add the following:
schema.Schema(sandbox_name="mySandbox1")
Currently we only fetch credentials and container for the user space, but there's a separate container and credentials for destinations in DLZ. See https://experienceleague.adobe.com/docs/experience-platform/destinations/catalog/cloud-storage/data-landing-zone.html?lang=en#connect-your-data-landing-zone-container-to-azure-storage-explorer
Hi, I'm trying to create a schema using aepp and am getting the following exception.
474 raise TypeError("Expecting a dictionary")
475 if "allOf" not in schema.keys():
--> 476 raise Exception(
477 "The schema must include an ‘allOf’ attribute (a list) referencing the $id of the base class the schema will implement."
478 )
Exception: The schema must include an ‘allOf’ attribute (a list) referencing the $id of the base class the schema will implement.
I am creating the schema by running getSchema in one sandbox and then importing it into another using createSchema. I made sure the field groups and other dependencies pre-exist in the new sandbox.
Any ideas on how I can get the allOf attribute that is needed to create the schema?
Currently, if I try to create a descriptor with createDescriptor in schema.py, it fails because of the missing xdm:property field. See the error below:
Out[262]: {'type': 'http://ns.adobe.com/aep/errors/XDM-4000-400', 'title': 'Validation error', 'status': 400, 'report': {'registryRequestId': 'd754ef24-8a43-4625-9742-3cd86156fce8', 'timestamp': '02-03-2023 08:02:43', 'detailed-message': 'An error occurred validating the schema.', 'sub-errors': [{'path': '$', 'type': 'required', 'arguments': ['xdm:property'], 'message': '$.xdm:property: is missing but it is required'}]}, 'detail': 'An error occurred validating the schema.'}
The only way to get this function working now seems to be by just passing the raw object with descriptorObj, but we should modify the function to ensure it works with the parameters.
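For reference, a minimal identity-descriptor payload shape that includes the required xdm:property field; all values below are illustrative assumptions:

descriptor = {
    "@type": "xdm:descriptorIdentity",
    "xdm:sourceSchema": "https://ns.adobe.com/{TENANT_ID}/schemas/example",  # hypothetical schema $id
    "xdm:sourceVersion": 1,
    "xdm:sourceProperty": "/personalEmail/address",
    "xdm:namespace": "Email",
    "xdm:property": "xdm:code",   # the required field currently omitted by the wrapper
    "xdm:isPrimary": False,
}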