
Comments (9)

npatki commented on June 18, 2024

Thanks for filing this issue @Ng-ms. I'm a bit confused at the scenario you are describing.

The InvalidDataError indicates that there is a mismatch between the data and metadata. If you convert the data column to numerical, you would also need to update the metadata for that column to be numerical. The InvalidDataError means that your synthesizer has crashed, so you are unable to fit PARSynthesizer and sample from it.

"when converting the data back to DateTime in the synthetic data it gives a range of dates like 09-08-1768 and 10-03-1644"

I am confused because this sentence implies that you already have synthetic data. How are you able to get synthetic data if the synthesizer crashed (with the InvalidDataError)? Something doesn't seem to add up.

It would be helpful if you could share the Python code that you are using to load data, modify it, create metadata, create the synthesizer, sample from it, etc. And also if you could indicate where the crash is happening.
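One possible kind of data/metadata mismatch for a datetime column is a value that does not parse with the expected format. The sketch below is only an illustration using the standard library: the helper name `unparseable` and the sample values are hypothetical, and the format string '%d/%m/%Y' is assumed from the conversion code later in this thread. It does not reproduce SDV's own validation.

```python
from datetime import datetime


def unparseable(values, fmt):
    """Return the values that cannot be parsed as datetimes with `fmt`."""
    bad = []
    for value in values:
        try:
            datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            # TypeError: value is not a string (e.g. None)
            # ValueError: string does not match the format
            bad.append(value)
    return bad


# Hypothetical sample: one ISO-formatted string and one missing value
print(unparseable(["01/03/2020", "2021-07-15", None], "%d/%m/%Y"))
# ['2021-07-15', None]
```

Running a check like this on each date column before fitting can narrow down which column and which rows disagree with the declared format.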

from sdv.

Ng-ms commented on June 18, 2024

Sorry if my earlier messages were a bit unclear. Here's more info to explain better. I have tried two cases:

1. Using datetime columns as context without alteration: This leads to an InvalidDataError due to a mismatch between the data and the defined metadata, even though these columns are specified as datetime type in the metadata. This prevents fitting the PARSynthesizer.

from sdv.metadata import SingleTableMetadata
from sdv.sequential import PARSynthesizer

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data=df)
# Mark the key column as an id before registering it as the sequence key
metadata.update_column(column_name='ID_P', sdtype='id')
metadata.set_sequence_key(column_name='ID_P')
metadata.set_sequence_index(column_name='DATE_P')

#metadata.save_to_json(filepath='my_metadata_v2.json')
#metadata = SingleTableMetadata.load_from_json(
#    filepath='my_metadata_v1.json')
print(metadata)

# Generate synthetic data
print('start')
synthesizer = PARSynthesizer(
    metadata,
    epochs=150,
    context_columns=[' data_DM', 'DATE_DIS', 'date_HIP', 'date_DCM', 'date_DID'],
    verbose=True,
    enforce_min_max_values=True,
    enforce_rounding=True,
    cuda=True,
)

2. Converting datetime to numerical for synthesis: This results in synthetic data with unrealistic dates (e.g., 09-08-1768), indicating a problem in handling these numerical values or in converting them back to datetime.


import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.sequential import PARSynthesizer

context_date_columns = ['data_DM', 'DATE_DIS', 'date_HIP', 'date_DCM', 'date_DID']

# Convert each datetime context column to integer nanoseconds since the epoch
for col in context_date_columns:
    df[col] = pd.to_datetime(df[col], format='%d/%m/%Y').astype('int64')

#metadata.save_to_json(filepath='my_metadata_v2.json')
#metadata = SingleTableMetadata.load_from_json(
#    filepath='my_metadata_v1.json')
print(metadata)

# Generate synthetic data
print('start')
synthesizer = PARSynthesizer(
    metadata,
    epochs=150,
    context_columns=[' data_DM', 'DATE_DIS', 'date_HIP', 'date_DCM', 'date_DID'],
    verbose=True,
    enforce_min_max_values=True,
    enforce_rounding=True,
    cuda=True,
)
synthesizer.fit(df)
print('end')

synthetic_data = synthesizer.sample(num_sequences=100, sequence_length=None)

# Convert the sampled integers back to dates
for col in context_date_columns:
    synthetic_data[col] = pd.to_datetime(synthetic_data[col], unit='ns').dt.date



Ng-ms commented on June 18, 2024

Hello @npatki, do you have any ideas on how to solve this ?


npatki commented on June 18, 2024

Hi @Ng-ms,

Thanks for confirming. The errors indicate that there are mismatches between how you are converting the data from datetime to numerical, and how you're converting them back from numerical to datetime. If you are doing any conversions, you also need to update your metadata as the sdtype is no longer datetime but numerical.

Here is a code snippet that may help:

import pandas as pd
from sdv.sequential import PARSynthesizer

# convert datetime columns to numerical
data[COLUMN_NAME] = pd.to_datetime(data[COLUMN_NAME], format='%d/%m/%Y').astype(int)

# update these columns to be sdtype 'numerical' in the metadata, as they are no longer datetime!
metadata.update_column(column_name=COLUMN_NAME, sdtype='numerical')

# save this version of metadata!
metadata.save_to_json(filepath='metadata_converted_context.json')

# now you can fit and sample
synthesizer = PARSynthesizer(metadata, epochs=150, context_columns=[COLUMN_NAME])
synthesizer.fit(data)
synthetic_data = synthesizer.sample(num_sequences=100)

# synthetic data will have numerical values. convert them to datetime
synthetic_data[COLUMN_NAME] = pd.to_datetime(synthetic_data[COLUMN_NAME], unit='ns').dt.date
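To sanity-check the conversion itself, independent of the synthesizer, a round trip on a few known dates can help. This is a minimal pandas-only sketch with hypothetical sample values. It assumes the integers are nanoseconds since 1970-01-01, which is why the resolution is pinned to nanoseconds before converting. It also shows that an integer far below the values seen in training decodes to a date centuries earlier, which is one way dates in the 1700s could appear even when the conversion code is correct.

```python
import pandas as pd

# Hypothetical sample column with day-first date strings
dates = pd.Series(["01/03/2020", "15/07/2021", "30/11/2022"])

# datetime -> integer nanoseconds since 1970-01-01
# (pin nanosecond resolution so the integers really are nanoseconds)
as_datetime = pd.to_datetime(dates, format="%d/%m/%Y").astype("datetime64[ns]")
as_int = as_datetime.astype("int64")

# integer nanoseconds -> datetime, using the same unit
restored = pd.to_datetime(as_int, unit="ns")
print((restored == as_datetime).all())  # True if the round trip is lossless

# Hypothetical out-of-range sample: a large negative integer
too_small = -6_000_000_000_000_000_000
decoded = pd.to_datetime(too_small, unit="ns")
print(decoded.year)  # a year in the 1700s
```

If the round trip succeeds but the synthetic dates are still implausible, the sampled integers themselves are likely outside the range of the training data.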


Ng-ms commented on June 18, 2024

Thank you @npatki.
I am actually updating the metadata. The problem is that I am getting very strange dates:
Screenshot from 2024-03-01 17-37-49


npatki commented on June 18, 2024

Hi @Ng-ms, that is unfortunate to hear. As I mentioned in my previous message, you may want to double check how you are doing your conversion from datetime --> numerical, and back from numerical --> datetime. Good practice will be to inspect your data every step of the way. What does the input data look like? What are the min/max values in the input data for fit? Etc.
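As a rough sketch of that kind of inspection, the helper below counts, per column, how many synthetic values fall outside the min/max of the real data. The function name `out_of_range_counts` and the sample frames are hypothetical, and the small integers stand in for the converted date values.

```python
import pandas as pd


def out_of_range_counts(real, synthetic, columns):
    """Count synthetic values outside the [min, max] of the real data, per column."""
    counts = {}
    for col in columns:
        low, high = real[col].min(), real[col].max()
        outside = (synthetic[col] < low) | (synthetic[col] > high)
        counts[col] = int(outside.sum())
    return counts


# Hypothetical data: one synthetic value is far below the real minimum
real = pd.DataFrame({"DATE_DIS": [1_580_000_000, 1_600_000_000, 1_620_000_000]})
synthetic = pd.DataFrame({"DATE_DIS": [1_590_000_000, -6_000_000_000, 1_610_000_000]})

print(out_of_range_counts(real, synthetic, ["DATE_DIS"]))
# {'DATE_DIS': 1}
```

Running this on the numerical columns before converting them back to datetime shows whether the odd dates come from the sampled values or from the conversion step.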

Unfortunately, there is only so much I can do with these screenshots. If you are able to provide access to your real data or metadata, as well as the full and complete code that you have currently in SDV, that will be helpful. If we are not able to replicate your issue, it is unlikely we will be able to provide any kinds of useful information. Please provide any other information you think will be helpful. Thanks.


npatki commented on June 18, 2024

Hi @Ng-ms are you still encountering this problem?

Since this issue has been inactive for a while, I'm closing it off. But please feel free to reply with any more info. We can always reopen the issue to continue investigation.


Ng-ms commented on June 18, 2024

Hi @npatki, yes, unfortunately I am still having this problem. Even though my data conversion is correct, I am still getting illogical dates (outside the min and max of the input data).


Ng-ms commented on June 18, 2024

@npatki Hello, I am still having the same error. Sometimes just one column gives these unrealistic dates, and sometimes (if I train the model longer) more than one date column has unrealistic dates, with years like 1700 or 1898.

