aws-samples / amazon-personalize-samples Goto Github PK

View Code? Open in Web Editor NEW

551.0 31.0 338.0 25.51 MB

Notebooks and examples on how to onboard and use various features of Amazon Personalize

License: MIT No Attribution

Jupyter Notebook 98.78% Python 1.19% JavaScript 0.03%

amazon-personalize-samples's Introduction

Amazon Personalize Samples

Notebooks and examples on how to onboard and use various features of Amazon Personalize

Getting Started with the Amazon Personalize

The getting_started/ folder contains a CloudFormation template that will deploy all the resources you need to build your first campaign with Amazon Personalize.

The notebooks provided can also serve as a template to building your own models with your own data. This repository is cloned into the environment so you can explore the more advanced notebooks with this approach as well.

Amazon Personalize Next Steps

The next_steps/ folder contains detailed examples of the following typical next steps in your Amazon Personalize journey. This folder contains the following advanced content:

Core Use Cases
Generative AI
- Personalized marketing campaigns
- User personalized marketing messaging with Amazon Personalize and Generative AI. - Use this sample to create personalized marketing content (for instance emails) for each user using Amazon Personalize and Amazon Bedrock. In this sample you will train an Amazon Personalize 'Top picks for you' Recommender to get personalized recommendations for each user. You will then generate a prompt that includes the user's preferences, recommendations, and demographics. Finally you will use Amazon Bedrock to generate a personalized email for each user.
- Amazon Personalize Langchain extensions
Scalable Operations examples for your Amazon Personalize deployments
- Maintaining Personalized Experiences with Machine Learning
  - This AWS Solution allows you to automate the end-to-end process of importing datasets, creating solutions and solution versions, creating and updating campaigns, creating filters, and running batch inference jobs. These processes can be run on-demand or triggered based on a schedule that you define.
- MLOps Step function (legacy)
  - This is a project to showcase how to quickly deploy a Personalize Campaign in a fully automated fashion using AWS Step Functions. To get started navigate to the ml_ops folder and follow the README instructions. This example has been replaced by the Maintaining Personalized Experiences with Machine Learning solution.
- MLOps Data Science SDK
  - This is a project to showcase how to quickly deploy a Personalize Campaign in a fully automated fashion using AWS Data Science SDK. To get started navigate to the ml_ops_ds_sdk folder and follow the README instructions.
- Personalization APIs
  - Real-time low latency API framework that sits between your applications and recommender systems such as Amazon Personalize. Provides best practice implementations of response caching, API gateway configurations, A/B testing with Amazon CloudWatch Evidently, inference-time item metadata, automatic contextual recommendations, and more.
- Lambda Examples
  - This folder starts with a basic example of integrating put_events into your Personalize Campaigns by using Lambda functions processing new data from S3. To get started navigate to the lambda_examples folder and follow the README instructions.
- Personalize Monitor
  - This project adds monitoring, alerting, a dashboard, and optimization tools for running Amazon Personalize across your AWS environments.
- Streaming Events
  - This is a project to showcase how to quickly deploy an API Layer in front of your Amazon Personalize Campaign and your Event Tracker endpoint. To get started navigate to the streaming_events folder and follow the README instructions.
Workshops
- Workshops/ folder contains a list of our most current workshops:
- Partner Integrations
  - Explore workshops demonstrating how to use Personalize with partners such as Amplitude, Braze, Optimizely, and Segment.
Data Science Tools
- The data_science/ folder contains an example on how to approach visualization of the key properties of your input datasets.
  - Missing data, duplicated events, and repeated item consumptions
  - Power-law distribution of categorical fields
  - Temporal drift analysis for cold-start applicability
  - Analysis on user-session distribution
Demos/Reference Architectures
- Retail Demo Store
  - Sample retail web application and workshop platform demonstrating how to deliver omnichannel personalized customer experiences using Amazon Personalize.
- Live Event Contextualization
  - This is a sample code base to illustrate the concept of personalization and contextualization for real-time streaming events. This blog illustrates the concept

License Summary

This sample code is made available under a modified MIT license. See the LICENSE file.

amazon-personalize-samples's People

Contributors

Stargazers

Watchers

Forkers

sethiv okeee0 imyoungyang paulmgithub wk-heisenberg stevenshim vivian5096 izzetagoren aashmeet yifeim rghidella gwulfs tonybpge rikima makeane alangixxer rohithlv dobinyim kiana58 idjustid ab-group jamshedmelik sciarrilli scbronder cruble yuan00yuan realharry melwinpais favdagic jamiekang judygab salman0 andreluiz365 whn09 rmalladi1 devender-yadav almoghoro ozukaramumaya ozukaramu jwpm01 nswamy ravis22 gianrubio gregarious9612 lilysu amandabnyc dilipmaurya kenny85916 neerajmb kpmadhan joaquin6 mrvaldemar fdrennan kzheng07 kaiyue-zheng iamshivamjaiswal syntaf amit2014 paulrigor retavemyliu ajeyamk nightlyjourney harikrishnanvr omarshehe brightdu jmianok animuk ffranceschi xujunbj alangur luluob mart0703 pennyfan1031 professormartha trnwwz preesr ayeshabaig qizixialex trivedisorabh mf523 mark-igaw darkhaamn joben jihys bigdatasciencegroup raineydavid huynhnguyen patbi lkrishnamurthy lkrishna-cs fosterleejoe jasonjklim wasimbloch rabbithx chatchai-komrangded dataanalyst4lyfe rajaneesh93 dhruvpratapsingh girishgodage phnascimento

amazon-personalize-samples's Issues

diagnose.py should accept Timestamp units

When loading data from csv file, timestamps are not always in the same units (miliseconds, seconds or nanoseconds). This leads to raising errors - date is out of bounds.

To fix this, there should be an option, to specify datetime units:
diagnose.py:234
df.index = df["TIMESTAMP"].values.astype("datetime64[ms]")

My proposal is to add optional parameter to diagnose, diagnose_interactions, diagnose_items:

def diagnose(df, users=None, items=None, timestamp_units=None):
def diagnose_items(df, items, timestamp_units=None):
def diagnose_interactions(df, timestamp_units=None):

Or there should be global constant to define that.

Bringing our own models

Hello,
I attended an official aws personalize webinar yesterday and the presenter told that we can bring our own customizable models for recommondation in AWS personalize. Upon asking, he directed me to this git account where I can find a notebook regarding that.
Can anyone help me find such examples?

[Question] Can I have a user-interactions dataset that has unique item-ids with no item metadata?

I'm looking to try personalising a user experience for people buying cars that I identify with vehicle identification numbers (VIN) which are unique to each car (so cars with same etc will have different VINs)

Would it be appropriate for me to make my user-interactions dataset with USER_IDs, ITEM_IDs (VINs) and TIMESTAMPS (car purchase dates/times), or would the unique VINs make it difficult for personalised recommendations to be made in the future? Without any item-metadata, AWS Personalize wouldn't(?) have any way of telling if VIN=xxxx is a 'similar' car to VIN=yyyy that might be listed next week for example...

An alternative is to have a new id number for cars of a certain make / model / year combination, such that the new id identifies groups of cars that are similar according to this grouping.

So is using the VIN as the ITEM_ID ok in the user-interactions dataset if I include item-metadata, or should I make a new id describing groups of cars with the same specifications?

Thanks

AWS CloudBuild for CI

https://github.com/gkrizek/codebuild-ci-sample

Create Dataset

the personalize.create_dataset() method is missing the name parameter.

Use real time events

Hello there,
I am trying to use personalize with real time event. When I push a bank csv file (with no header) to S3 and tried to use amplify to connect with personalize. But I do not know when I can ready build a model because if I choose a recipe then I got an error that I need 10000 records to ready.

Could you suggest me some ways to do it?

Thanks in advance!

05_Interacting_with_Campaigns_and_Filters.ipynb - loading in .csv file with title cast as str and then reference as int

noticed I was getting key errors when running through the notebook:
05_Interacting_with_Campaigns_and_Filters.ipynb

when reading in the dataframe from the .csv, we cast title( a unique integer value) as 'str' - but all the following code references it as being cast as 'int'.
Easy fix is to change the following:

Create a dataframe for the items by reading in the correct source CSV

items_df = pd.read_csv(dataset_dir + '/movies.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': "object", 'title': "str"},index_col=0)
to

Create a dataframe for the items by reading in the correct source CSV

items_df = pd.read_csv(dataset_dir + '/movies.csv', sep=',', usecols=[0,1], encoding='latin-1', dtype={'movieId': "object", 'title': "int"},index_col=0)

roleArn

Missing roleArn in 2.View_Campaign_And_Interactions and personalize_sample_notebook.

Amazon Personalize User Segmentation - Required Updates

For the user segmentation Personalize example (see: https://github.com/aws-samples/amazon-personalize-samples/blob/master/next_steps/core_use_cases/user_segmentation/user_segmentation_example.ipynb) there are several outstanding issues preventing the notebook from being run successfully as-written.

The current Prime Pantry dataset links are dead and must be updated to the latest links provided by UCSD.
In the first cell of the "Load review data and build interaction dataset" section, reading the JSON into a DataFrame reliably causes Kernel Panic (reproduced 3/3 times on an m5d.xlarge instance), believed to be due to overwhelming the memory available. One solution is to load the dataset into a list of chunked smaller DataFrames then concatenate them once the dataset is fully loaded.
The first cell of the "Get the Personalize API model Json and Personalize Boto3 Client" section references the downloading of a private Amazon internal file external users will not have access to that manually adds Personalize as an available service in the CLI and is not required or runnable. This must be removed.
The entire historical dataset from 2000-08-09 to 2018-10-05 with approximately 1.7 million interactions is used for training the model (minus the 5% holdout for testing). Due to the size of this training dataset in number of users and interactions the resulting training of the Solution and Batch Segment jobs are more expensive than most customers will be comfortable with for a demo. The dataset size must be reduced to prevent an unpleasant billing surprise to customers.

CreateRecommender with custom domain to use Start Stop Recommender

When I'm trying to create a recommender via Command Line, I receive an raise saying "Please use a domains-specific Dataset Group to create the Recommender", but I set my dataset group domain as Custom.

I don't know if I understand, but to use the GetRecommendations we need to create a solution or create a recommender, but the option to stop the solution doesn't exists, so when I try to create a recommender was raised this error, so I'm in this loop, if I try to use Solution, I can't use the Start Stop Recommender to reduce my billing costs, and if I try to create a recommender I can't because I'm using a custom solution(I don't know if is it, but my hypothesis is, the Personalize only accept VIDEO_ON_DEMAND and ECOMMERCE)

I don't know if this helps, but when I try to list the recipes, we only have domains in ECOMMERCE recipes, and maybe this is broken

Incorrect path for user data resulting in FileNotFoundError

Hi:

I was going through the "amazon-personalize-samples/next_steps/workshops/Immersion_Day/personalize_hrnn_metadata_contextual_example.ipynb" and got a runtime error of "FileNotFoundError" because the file path was missing "airlines_data".

#users_df = pd.read_csv('a_users.csv') users_df = pd.read_csv('airlines_data/a_users.csv')

Create solution version

After the personalize.create_solution() method there is a step missing which is personalize.create_solution_version().

Looks like this was recently added.

CheatSheet Description to README

Add a blurb and link to it from the main README.md file

PoCiaB nullable 'YEAR' field not tagged as such

The MovieLens sample dataset used in PoC-in-a-Box workshop includes a few movies without years in their title (e.g. ID 162414 "Moonlight").

If I'm not mistaken (I had tweaked the extraction slightly in my fork), the YEAR field as extracted by regex has some blanks. However, the item metadata schema in notebook 2 tags:

{
    "name": "YEAR",
    "type": "int"
}

Doesn't it need to be "type": ["int", "null"] for this field to be picked up correctly by the model?

KMS Example

Provide a notebook that explains how to leverage both KMS with Personalize to secure information but to also allow Personalize to read encrypted data from an existing S3 bucket.

Dataset ignores 1st column

It appears that even when the dataset's first column is user_id, the console keeps on complaining of missing that column in csv, however if rename first column to lets say just "id" and put user_id at a different position, then the upload to personalize works. This is a really annoying issue. I guess someone assumed there'll be an index column or primary key column as first in dataset whereas all examples and demos don't reflect it.

Cannot run CloudFormation template for immersion day - access denied

S3 error: Access Denied For more information check http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

Typo error

As HRNN has been replaced by user-personalization recipe in this sample, there are some HRNN refer in current guide.

Position: next_steps/workshops/POC_in_a_box/05_Interacting_with_Campaigns_and_Filters.ipynb

You could search 'HRNN' in this notebook and replace it with ' user_personalization' as 'HRNN' is no longer used in this sample.

Metrices in AWS Personalize console and Personalize SDK API get_matrics are different

I have noticed that Solution metric displayed in AWS Personalize console and personalize SDK API get_metric function are different for percision@k.

Which is correct?

Below are the snapshots:
AWS Personalize web console:

AWS SDK API reference personalize.get_metric:

Precision at 5, 10 and 25 are having different values in console and in get_metric api.

Please let me know which one is to be considered as my deployment is dependent on the metric.

Thanks,
Rohan Hodarkar

[Feature Request] Sample for the new batch recommendations functionality

Great to see that Amazon Personalize now supports batch inference as well as real-time. It would be cool if that functionality could be integrated in to the existing or a new sample.

metrics.py file does not exist

multiple notebooks import metrics.py and the file does not exist.
For example this one:
https://github.com/aws-samples/amazon-personalize-samples/blob/master/next_steps/core_use_cases/user_personalization/personalize_hrnn_metadata_example.ipynb

Pass custom solutionConfig while creating solution via python sdk

I am trying to create a solution on AWS Personalise using custom hyperparameter config.
This is the error I am facing.

InvalidInputException                     Traceback (most recent call last)
<ipython-input-63-504a17505a0e> in <module>
      4     datasetGroupArn = dataset_group_arn,
      5     recipeArn = recipe_arn,
----> 6     solutionConfig=solutionConfig
      7 )
      8 
~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _api_call(self, *args, **kwargs)
    274                     "%s() only accepts keyword arguments." % py_operation_name)
    275             # The "self" in this scope is referring to the BaseClient.
--> 276             return self._make_api_call(operation_name, kwargs)
    277 
    278         _api_call.__name__ = str(py_operation_name)
~/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/botocore/client.py in _make_api_call(self, operation_name, api_params)
    584             error_code = parsed_response.get("Error", {}).get("Code")
    585             error_class = self.exceptions.from_code(error_code)
--> 586             raise error_class(parsed_response, operation_name)
    587         else:
    588             return parsed_response
InvalidInputException: An error occurred (InvalidInputException) when calling the CreateSolution operation: Provide a hyperparameter that is used in the algorithm: arn:aws:personalize:::algorithm/aws-contextual-bandits

This is the solution config that I am passing.

solutionConfig={
  'hpoConfig': {
      'hpoResourceConfig': {
          'maxNumberOfTrainingJobs': '16',
          'maxParallelTrainingJobs': '8'
      },
      'algorithmHyperParameterRanges': {
          'integerHyperParameterRanges': [
              {
                  'name': 'model.num_hidden',
                  'minValue': 32,
                  'maxValue': 256
              },
              {
                  'name': 'training.bptt',
                  'minValue': 2,
                  'maxValue': 32
              }
          ],
          'categoricalHyperParameterRanges': [
              {
                  'name': 'data.recency_mask',
                  'values': ['True', 'False']
              },
          ]     
      }
  },
  'featureTransformationParameters': {
      'max_hist_length_percentile': '0.99',
      'min_hist_length_percentile': '0.00'
  }
}

This is how I am invoking the call in python SDK.

create_solution_response = personalize.create_solution(
    name = "personalize-soln-user-personalization-test",
    performHPO = True,
    datasetGroupArn = dataset_group_arn,
    recipeArn = recipe_arn,
    solutionConfig=solutionConfig
)
solution_arn = create_solution_response['solutionArn']

Any leads on this will be appreciated.

Which package do I need to install to run the metrics 'mean_reciprocal_rank'

I am using SageMaker jupyterLab with Conda_python3 kernel.

I manage to pip install metrics, but it doe snot seem to be the right package.

ImportError Traceback (most recent call last)
in ()
1 from tqdm import tqdm_notebook
2 import numpy as np
----> 3 from metrics import mean_reciprocal_rank, ndcg_at_k, precision_at_k

ImportError: cannot import name 'mean_reciprocal_rank'

What is the package name that should be installed?

Create a sub page for external assets

We would like to have a curated list of other demos using Personalize, create a section of the ReadMe to link to those and provide context around their usage.

Missing KMS kms:CreateKey on resource: *

When trying to run the notebook Best_Clientside_Security_Practices in the step that creates KMS key the notebook role doesn't have the appropriate permissions.

when looking on the cloudformation at https://amazon-personalize-github-samples.s3.amazonaws.com/PersonalizeDemo.yaml it needs to have for the demo purpose a CreateKey policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "kms:CreateKey",
            "Resource": "*"
        }
    ]
}

enumerate IAM policies that are required to run jupyter notebooks

That's what worked for me:

AmazonEC2ContainerRegistryFullAccess
AmazonS3FullAccess
AmazonPersonalizeFullAccess
AmazonSageMakerFullAccess
AmazonSageMaker-ExecutionPolicy-20191017T020905
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:DeleteObject",
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::*"
]
}
]
}

Misleading sample code?

The documentation says that for HRNN models

[...] you only provide user-id at inference. While the system does not throw an error if you also provide an item-id, the system neglects the value and the value does not impact the results.

In example code https://github.com/aws-samples/amazon-personalize-samples/blob/master/personalize_sample_notebook.ipynb?short_path=f7aa1a1#L1168 itemId is provided (and presumably neglected by the inference engine):

get_recommendations_response = personalize_runtime.get_recommendations(
  campaignArn = campaign_arn,
  userId = str(user_id), 
  itemId = str(item_id)
)

One consequence is that the results often look unreasonably bad in light of the ITEM that's printed:

USER: 734
ITEM: Roman Holiday (1953)
Recommendations: [
  "Twister (1996)", 
  "Phenomenon (1996)", 
  "Birdcage, The (1996)", 
  "Mission: Impossible (1996)", 
  "Willy Wonka and the Chocolate Factory (1971)",

coldstart demo

I reran the coldstart demo and found that the hrnn-coldstart precision@5 has gone up to 0.092 from 0.040. The baseline has also gone up to 0.0076 from 0.0044. I can understand that there is a shuffle on the item-ids, which causes the baseline changes. But I was wondering if it would cause this amount of variance in the trained model, too.

Related, the demo notebook created fake_user. I wonder if that is actually necessary.

Attaching a script with the new results:
personalize_coldstart_demo.ipynb.zip
Thanks.

Event Tracker Not Showing Logs in Metrics

Event Tracker Not Showing Logs in Metrics after putting events in Event Tracker. I triggered multiple multiple put events to the event tracker. But When I check the Logs than there is only 1 event showing in the console. Not able to verify whether events are ingested by Amazon or not.

Get Metrics of Solution - ClientError: An error occurred (InternalFailure) when calling the GetMetrics operation (reached max retries: 4):

Hi,

Running the lab in SageMaker.

When running the Get Metrics of Solution step below:

get_metrics_response = personalize.get_metrics(
    solutionArn = solution_arn
)

print json.dumps(get_metrics_response, indent=2)

I receive the following error:

ClientErrorTraceback (most recent call last)
<ipython-input-20-d09f81367af1> in <module>()
      1 get_metrics_response = personalize.get_metrics(
----> 2     solutionArn = solution_arn
      3 )
      4 
      5 print json.dumps(get_metrics_response, indent=2)

/home/ec2-user/anaconda3/envs/python2/lib/python2.7/site-packages/botocore/client.pyc in _api_call(self, *args, **kwargs)
    318                     "%s() only accepts keyword arguments." % py_operation_name)
    319             # The "self" in this scope is referring to the BaseClient.
--> 320             return self._make_api_call(operation_name, kwargs)
    321 
    322         _api_call.__name__ = str(py_operation_name)

/home/ec2-user/anaconda3/envs/python2/lib/python2.7/site-packages/botocore/client.pyc in _make_api_call(self, operation_name, api_params)
    622             error_code = parsed_response.get("Error", {}).get("Code")
    623             error_class = self.exceptions.from_code(error_code)
--> 624             raise error_class(parsed_response, operation_name)
    625         else:
    626             return parsed_response

ClientError: An error occurred (InternalFailure) when calling the GetMetrics operation (reached max retries: 4):

ParamValidationError: Parameter validation failed: Missing required parameter in input: "solutionVersionArn" Unknown parameter in input: "solutionArn", must be one of: name, solutionVersionArn, updateMode

As per notification in AWS Personalize console, I have updated service JSON files.
After updating and adding to the model I am facing below error while creating campaign:

create_campaign_response = personalize.create_campaign(
    name = "d3-recommendation-v2-1",
    solutionArn = solution_arn,
    updateMode = "MANUAL"
)

campaign_arn = create_campaign_response['campaignArn']
print (json.dumps(create_campaign_response, indent=2))

ParamValidationError: Parameter validation failed:
Missing required parameter in input: "solutionVersionArn"
Unknown parameter in input: "solutionArn", must be one of: name, solutionVersionArn, updateMode

Seems like previously required parameter 'solutionArn' has been removed and 'solutionVersionArn' parameter is added. However the method which creates this solution in python SDK - 'personalize.create_solution' does not send back solution version ARN but only the Solution ARN.
See below output of 'personalize.create_solution' method.

Currently I have to manually go to the console and get the solution version ARN and use in my python code to create the campaigns.

Please add functionality to send solution version ARN from personalize.create_solution method.

Thanks,
Rohan Hodarkar

Evaluating a Model

Create a notebook that illustrates how to interpret a model using the metrics provided.

CloudFormation Deployment Script

The repository should start with a standardized deployment of an S3 bucket, IAM policy, and a SageMaker notebook with the repository cloned inside. This will give all users a standardized way of evaluating the service and a way to triage when PoCs are problematic.

Step functions fails without informations

I have been trying to deploy the personalize definition based on the params.json file but the step functions always fail after the Wait SIMS Solution Version step succeeds, the Wait Personalized Ranking Solution Version and Wait User Personalization Solution Version that are still running at this point have their statuses set to "Cancelled" and the state machine run status goes to "Failed".
I cant find the reason of the actual error. The Execution event history is full of failed executions up until the last ones but that seems to be expected behavior as the code just raises ResourcePending when waiting for various tasks.

The state machine execution ends up in failed status but all the steps are either succeeded or canceled, there is no "failed" step to examine.

I upped the "MaxAttempts": 100 to 1000 for all lambdas but that didnt change anything.

In the Personalize console, I can see that the dataset import jobs were successful and the 3 solution versions are active. The tracker and the campaigns are not created

params.json:

{
  "datasetGroup": {
    "name": "myPipeline"
  },
  "datasets": {
    "Interactions": {
      "name": "InteractionsDataset",
      "schema": {
        "type": "record",
        "name": "Interactions",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
          {
            "name": "USER_ID",
            "type": "string"
          },
          {
            "name": "ITEM_ID",
            "type": "string"
          },
          {
            "name": "TIMESTAMP",
            "type": "long"
          },
          {
            "name": "EVENT_TYPE",
            "type": "string"
          },
          {
            "name": "EVENT_VALUE",
            "type": [
              "null",
              "float"
            ]
          },
          {
            "name": "IMPRESSION",
            "type": "string"
          },
          {
            "name": "RECOMMENDATION_ID",
            "type": [
              "null",
              "string"
            ]
          }
        ],
        "version": "1.0"
      }
    },
    "Users": {
      "name": "UsersDataset",
      "schema": {
        "type": "record",
        "name": "Users",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
          {
            "name": "USER_ID",
            "type": "string"
          },
          {
            "name": "STRING1",
            "type": [
              "string",
              "null"
            ],
            "categorical": true
          },
          {
            "name": "STRING2",
            "type": [
              "string",
              "null"
            ],
            "categorical": true
          },
          {
            "name": "STRING3",
            "type": [
              "string",
              "null"
            ],
            "categorical": true
          }
        ],
        "version": "1.0"
      }
    },
    "Items": {
      "name": "ItemsDataset",
      "schema": {
        "type": "record",
        "name": "Items",
        "namespace": "com.amazonaws.personalize.schema",
        "fields": [
          {
            "name": "ITEM_ID",
            "type": "string"
          },
          {
            "name": "STRING1",
            "type": [
              "null",
              "string"
            ],
            "categorical": true
          },
          {
            "name": "STRING2",
            "type": [
              "null",
              "string"
            ],
            "categorical": true
          },
          {
            "name": "STRING3",
            "type": [
              "null",
              "string"
            ],
            "categorical": true
          },
          {
            "name": "STRING4",
            "type": [
              "null",
              "string"
            ],
            "categorical": true
          },
          {
            "name": "STRING5",
            "type": [
              "null",
              "string"
            ],
            "categorical": true
          },
          {
            "name": "STRING6",
            "type": [
              "null",
              "string"
            ],
            "categorical": true
          },
          {
            "name": "STRING7",
            "type": [
              "null",
              "string"
            ],
            "categorical": true
          }
        ],
        "version": "1.0"
      }
    }
  },
  "solutions": {
    "userPersonalization":{
      "name":"userPersonalizationSolution",
      "recipeArn":"arn:aws:personalize:::recipe/aws-user-personalization",
      "performAutoML": false
    },
    "sims":{
      "name":"simsSolution",
      "recipeArn":"arn:aws:personalize:::recipe/aws-sims",
      "performAutoML": false
    },
    "personalizedRanking":{
      "name":"personalizedRanking",
      "recipeArn":"arn:aws:personalize:::recipe/aws-personalized-ranking",
      "performAutoML": false
    }
  },
  "eventTracker":{
    "name":"myPipeline-tracker"
  },
  "campaign": {
    "userPersonalizationCampaign":{
      "name":"userPersonalizationCampaign",
      "minProvisionedTPS":1
    },
    "simsCampaign":{
      "name":"simsCampaign",
      "minProvisionedTPS":1
    },
    "personalizedRankingCampaign":{
      "name":"personalizedRankingCampaign",
      "minProvisionedTPS":1
    }
  }
}

wget ml-100k.zip is failing using HTTP protocol

In the section titled Download and Explore the Dataset , the wget command below is failing.

!wget -N http://files.grouplens.org/datasets/movielens/ml-100k.zip

The result of executing the cell is:

--2022-03-18 23:45:36--  http://files.grouplens.org/datasets/movielens/ml-100k.zip
Resolving files.grouplens.org (files.grouplens.org)... 128.101.65.152
Connecting to files.grouplens.org (files.grouplens.org)|128.101.65.152|:80... failed: No route to host.
unzip:  cannot find or open ml-100k.zip, ml-100k.zip.zip or ml-100k.zip.ZIP.
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-8-f7329d654eb9> in <module>
      1 get_ipython().system('wget -N http://files.grouplens.org/datasets/movielens/ml-100k.zip')
      2 get_ipython().system('unzip -o ml-100k.zip')
----> 3 data = pd.read_csv('./ml-100k/u.data', sep='\t', names=['USER_ID', 'ITEM_ID', 'RATING', 'TIMESTAMP'])
      4 pd.set_option('display.max_rows', 5)
      5 data

The problem is resolved if I change the protocol to HTTPS for the wget command.

I plan to run through this sample completely. I will create a PR to fix this and any other occurrences as part of my testing, if this is indeed the correct way to fix this. Please let me know.

All values in the event_type column are either reserved keywords or null

Hi,

When I want try train a model for personalize recipe, I am getting this error:

All values in the event_type column are either reserved keywords or null

In EVENT_TYPE I have like, click, view, bookmark.
I have Item file with one metadata and user file with meta data.

Missing required parameter in input for create_solution

Hello,

The ml_ops step function pipeline fails on my side because of a missing input for the create_solution here.

  "statesError": {
    "Error": "ParamValidationError",
    "Cause": "{\"errorMessage\": \"Parameter validation failed:\\nMissing required parameter in input: \\\"datasetGroupArn\\\"\", \"errorType\": \"ParamValidationError\", \"stackTrace\": [\"  File \\\"/var/task/solution.py\\\", line 42, in lambda_handler\\n    create_solution(solutionArn, event['solution']['{}'.format(event['solutionType'])])\\n\", \"  File \\\"/var/task/solution.py\\\", line 19, in create_solution\\n    LOADER.personalize_cli.create_solution(**params)\\n\", \"  File \\\"/var/runtime/botocore/client.py\\\", line 316, in _api_call\\n    return self._make_api_call(operation_name, kwargs)\\n\", \"  File \\\"/var/runtime/botocore/client.py\\\", line 607, in _make_api_call\\n    request_dict = self._convert_to_request_dict(\\n\", \"  File \\\"/var/runtime/botocore/client.py\\\", line 655, in _convert_to_request_dict\\n    request_dict = self._serializer.serialize_to_request(\\n\", \"  File \\\"/var/runtime/botocore/validate.py\\\", line 297, in serialize_to_request\\n    raise ParamValidationError(report=report.generate_report())\\n\"]}"
  }

I tried to track down the issue and found that datasetGroupArn is assigned to an object but it seems to me that is not passed on to the actual function arguments as it should. Here is the current code:

## datasetGroupArn is assigned inside of the 'datasetGroupArn' property of event['solution']
event['solution']['datasetGroupArn'] = event['datasetGroupArn'] 

## however here, only the property event['solution']['{}'.format(event['solutionType'])] is passed to create_solution
create_solution(solutionArn, event['solution']['{}'.format(event['solutionType'])])

Shouldn't it be something like this instead?

solutionParams=event['solution']['{}'.format(event['solutionType'])]
solutionParams['datasetGroupArn'] = event['datasetGroupArn'] 
create_solution(solutionArn, solutionParams)

Sorry if I'm getting this wrong, I'm quite new to both Personalize and Python 😓

Cheers
Joel

User Metadata

Create a notebook that shows the impact on a model by adding User Metadata to it.

Adding Item Metadata

Create a notebook that shows the impact on a model by adding item metadata.

Provide an example calling from a Lambda trigger for a put_event.

We should have a demo that showcases how to deploy a Lambda function that reacts to some trigger, probably S3 to then put an event into Personalize.

Batch Predictions with AWS Personalize

As per the FAQ page, AWS Personalize provides batch recommendations along with real-time recommendations.

When deployed, developers call the service from their production services to get real-time or batch recommendations, and Amazon Personalize will automatically scale to meet demand. Once Amazon Personalize begins making inferences on production traffic, it measures the lift in engagement from personalization and generates reports in the AWS Console to allow developers to evaluate the model’s success.

However I did not find any code where we can have batch recommendations.

Is it something in development?

Wrong parameter name minTPS

When trying to execute cell 15 the minTPS = should be changed to minProvisionedTPS =

[Feature request] Add example of creating and using Filters

Hello,

As Personalize user I would like to see how to create Filters, use them with Campaign and how does they affect recommendations including changes after calling PersonalizeEvents.PutEvents API call:
https://docs.aws.amazon.com/eventbridge/latest/APIReference/API_PutEvents.html

Thanks!

pd.read_csv('./ml-100k/u.item

when i run
[items = pd.read_csv('./ml-100k/u.item', sep='|', usecols=[0,1], header=None)]
i get below error any idea
"UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 3: invalid continuation byte"

Specify item metadata weight

@chrisking @james-jory

stackoverflow link

Is there any way to specify item metadata weight in AWS Personalize in the following scenarios:

Scenario 1: Weight of multiple metadata

A video might have multiple metadata like GENRES, THEME, etc. with different weights. But we might want a model that will recommend similar videos prioritising one metadata (for example: GENRES) more than another (for example: THEME). Is there any way to pass this information to the model?

Scenario 2: Weight of multiple categories in same metadata

A video might have multiple categories with different weights in the same metadata. For example: a video might be in both action and adventure category for GENRES metadata:

GENRES: Action|Adventure

But the video belongs to action category more than adventure category. Is there any way to pass this information to the model?

Scenario 3: Weight of multiple hierarchical categories in same metadata

A video might have multiple hierarchical categories with different weights in the same metadata. For example: a video might be in both action > crime > biopic hierarchy and adventure > western > biopic hierarchy for GENRES metadata:

GENRES: Action|Adventure 
GENRE_L2: Crime|Western 
GENRE_L3: biopic

But the video belongs to action > crime > biopic hierarchy more than adventure > western > biopic hierarchy, is there any way to pass this information to the model?

CloudFromation yaml is no more accessible

Hello,
When I want to deploy the Cloudformation we have the following error on the provide file https://chriskingpartnershare.s3.amazonaws.com/PersonalizePOC.yaml:
S3 error: Access Denied For more information check http://docs.aws.amazon.com/AmazonS3/latest/API/ErrorResponses.html

Refactor exploration notebook to use open dataset

The exploration notebook currently references source dataset from an S3 bucket that no longer exists. The notebook should be refactored to use a different public dataset such as MovieLens.

Recommendations Not Getting Updated

I am using realtime recommendations using Event Tracker. But My current compaign is not showing updated recommendation. Can anyone please help here.

personalize_events.put_events(
        trackingId="ea78ed5a-cff2-491d-844f-a1ee0806d3c6",
        userId= 'user1',
        sessionId = str(random.randint(10,10000)),
        eventList = [{
                        'sentAt': time.time(),
                        'eventType': 'CLICK',
                        'properties': json.dumps({
                                'itemId': 'videoId1',
                                'eventValue': 'PLAY',
                                'watchedDuration': random.randint(10,10000)
                        })
                    }
        ]
    )

Amazon Personalize User Segmentation - High Training Cost

The current Amazon Personalize User Segmentation demo notebook trains on a dataset large enough that training costs result in ~$1,000 USD in Personalize costs to train just one of the solution version.

The dataset could be reduced to a fraction of the current size if it could be shown it retains consistent performance metrics.

Search or Information Retrieval in AWS Personalize

Hi,

Is there a way where Amazon Personalize can be used for both, Recommendations and for user search in one system. Something exactly what Amazon.com store has? On the main page of Amazon.com we have an option to search by inputting our key words, but the page also gives us a set of recommendations based on our previous history of interactions.

I'd like to see if I can send my data through a batch approach to Amazon Personalize (user's metadata as well as all content) and get the model and processed data back from so that after I load this data back to my servers, when my customers go to my web site and use the search option, the results that are presented are the top notch selection of recommendations?

Thanks

aws-samples / amazon-personalize-samples Goto Github PK

amazon-personalize-samples's Introduction

Amazon Personalize Samples

Getting Started with the Amazon Personalize

Amazon Personalize Next Steps

License Summary

amazon-personalize-samples's People

Contributors

Stargazers

Watchers

Forkers

amazon-personalize-samples's Issues

Create a dataframe for the items by reading in the correct source CSV

Create a dataframe for the items by reading in the correct source CSV

ImportError: cannot import name 'mean_reciprocal_rank'

Recommend Projects

Recommend Topics

Recommend Org