lab-model-generation-and-validation's Introduction

Lab | Model generation, and validation

For this lab, we continue to use the marketing_customer_analysis.csv file, which you can find in the files_for_lab folder.

Get the data

We are using the marketing_customer_analysis.csv file.
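
A minimal sketch of loading the file with pandas. The relative path is built from the folder and file names mentioned above, and the load_data helper is hypothetical. Renaming the columns to snake_case is an assumption, made so that the target can be referred to as total_claim_amount as in the steps below.

```python
from pathlib import Path

import pandas as pd

# Assumed location, built from the folder and file names given in the lab text.
DATA_PATH = Path("files_for_lab") / "marketing_customer_analysis.csv"


def load_data(path=DATA_PATH):
    """Read the CSV and rename its columns to snake_case (hypothetical helper)."""
    df = pd.read_csv(path)
    df.columns = [col.strip().lower().replace(" ", "_") for col in df.columns]
    return df


# Only load when the file is actually present in the working directory.
if DATA_PATH.exists():
    data = load_data()
    print(data.shape)
```

If your copy of the file already uses snake_case column names, the renaming step leaves them unchanged.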

Linear regression

  • Select the columns which are correlated with total_claim_amount and don't suffer from multicollinearity (see the previous lab)
  • Remove outliers
  • X-y split. (define which column you want to predict, and which ones you will use to make the prediction)
  • Use train_test_split() to create the Train and Test sets (make sure to set the random_state option to an integer of your choice).
  • Use the pd.DataFrame() function to create new Pandas DataFrames from the X_train and X_test NumPy arrays obtained in the previous step (make sure to use the columns= option to set the column names to X.columns).
  • Split the X_train Pandas DataFrame into two: numerical, and categorical using df.select_dtypes().
  • If you need to transform any column, train your transformers and/or scalers on all the numerical columns using .fit() on the Train set only (use a single transformer/scaler for all the columns), then apply them to both the Train and Test sets using .transform().
  • Save all your transformers/scalers right after the .fit() with pickle, using the code shown below:
    import os
    import pickle

    path = "transformers/"
    # Check whether the specified path exists or not
    isExist = os.path.exists(path)
    if not isExist:
        # Create a new directory because it does not exist
        os.makedirs(path)
        print("The new directory is created!")

    filename = "filename.pkl" # Use a descriptive name for your scaler/transformer but keep the ".pkl" file extension
    with open(path + filename, "wb") as file:
        pickle.dump(variable, file) # Replace "variable" with the name of the variable that contains your transformer
  • If you used a transformer/scaler in the previous step, create new Pandas DataFrames from the NumPy arrays returned by .transform() using the pd.DataFrame() function, as you did earlier with the output of train_test_split().
  • Transform the categorical columns into numbers using a:
    • OneHotEncoder for categorical nominal columns. (again only use the .fit() in the Train set, but the .transform() in the Train and the Test sets)
    • Remember to save all your encoders right after the .fit() with pickle, using the code shown below (it assumes os and pickle are already imported):
      path = "encoders/"
      # Check whether the specified path exists or not
      isExist = os.path.exists(path)
      if not isExist:
          # Create a new directory because it does not exist
          os.makedirs(path)
          print("The new directory is created!")

      filename = "filename.pkl" # Use a descriptive name for your encoder but keep the ".pkl" file extension
      with open(path + filename, "wb") as file:
          pickle.dump(variable, file) # Replace "variable" with the name of the variable that contains your encoder
    • Use .replace() to convert any categorical ordinal column into numbers, replacing each label with a number that respects both the order of the labels and the relative "distance" between them.
  • Concatenate the numerical_transformed and categorical_transformed DataFrames using pd.concat().
  • Apply another MinMaxScaler to the concatenated DataFrame.
  • Remember to save your MinMaxScaler right after the .fit() with pickle, using the code shown below:
    path = "scalers/"
    # Check whether the specified path exists or not
    isExist = os.path.exists(path)
    if not isExist:
        # Create a new directory because it does not exist
        os.makedirs(path)
        print("The new directory is created!")

    filename = "filename.pkl" # Use a descriptive name for your scaler but keep the ".pkl" file extension
    with open(path + filename, "wb") as file:
        pickle.dump(variable, file) # Replace "variable" with the name of the variable that contains your scaler
  • Apply linear regression to the Pandas DataFrame obtained in the previous step using sklearn.
  • Remember to save your linear model right after the .fit() with pickle, using the code shown below:
    path = "models/"
    # Check whether the specified path exists or not
    isExist = os.path.exists(path)
    if not isExist:
        # Create a new directory because it does not exist
        os.makedirs(path)
        print("The new directory is created!")

    filename = "filename.pkl" # Use a descriptive name for your model but keep the ".pkl" file extension
    with open(path + filename, "wb") as file:
        pickle.dump(variable, file) # Replace "variable" with the name of the variable that contains your model

Model Validation

  • Compute the following metrics for your Train and Test sets:

  • Create a Pandas DataFrame to summarize the error metrics for the Train and Test sets.
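
One possible way to build such a summary is sketched below. The list of metrics is not given in the text above, so MAE, MSE, RMSE and R2 are assumed here as common choices for regression. The regression_metrics helper and the toy data are hypothetical; in the lab you would pass your own y_train, y_test and the model's predictions.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def regression_metrics(y_true, y_pred):
    """Return a dict of error metrics (assumed set: MAE, MSE, RMSE, R2)."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        "R2": r2_score(y_true, y_pred),
    }


# Toy data standing in for the preprocessed lab data (hypothetical).
rng = np.random.default_rng(1)
X = pd.DataFrame({"feature": rng.uniform(0, 1, 100)})
y = 3 * X["feature"] + rng.normal(0, 0.1, 100)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# One column per set, one row per metric.
summary = pd.DataFrame({
    "Train": regression_metrics(y_train, model.predict(X_train)),
    "Test": regression_metrics(y_test, model.predict(X_test)),
})
print(summary)
```

Comparing the two columns side by side makes overfitting easy to spot: a Test error much larger than the Train error suggests the model does not generalize well.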
