Coder Social home page Coder Social logo

meteo-data-mining's Introduction

Meteo Data Mining Project

Overview

This project focuses on leveraging data mining techniques to analyze meteorological (meteo) data. The objective is to extract meaningful insights and patterns from the data that can aid in understanding weather phenomena and predicting future weather conditions.

Information About Dataset :

Introduction :

The "Meteo12" dataset contains the following features: ['No', 'year', 'month', 'day', 'hour', 'PM2.5', 'PM10', 'SO2', 'NO2', 'CO', 'O3', 'TEMP', 'PRES', 'DEWP', 'RAIN', 'wd', 'WSPM', 'station'], such that each column has a meaning and this dataset does not contain the Target column.

That is to say, to apply supervised machine learning algorithms we must first apply an unsupervised algorithm to create a Target column.

These columns provide various measurements and indicators of weather conditions, air quality and atmospheric properties.

dataset contains

the meaning of each column in your weather dataset:

  1. year: The year in which the weather data was recorded.
  2. month: The month in which the weather data was recorded.
  3. day: The day of the month on which the weather data was recorded.
  4. time: The time of day the weather data was recorded.
  5. PM2.5: Particulate matter (PM) with a diameter of 2.5 micrometers or less, which can penetrate deep into the respiratory system and pose health risks when present in high concentrations .
  6. PM10: Particulate matter (PM) with a diameter of 10 micrometers or less, which can also cause adverse health effects when present in high concentrations.
  7. SO2: Sulfur dioxide, a gas primarily emitted by the burning of fossil fuels such as coal and oil. It can contribute to respiratory problems and is a precursor to acid rain.
  8. NO2: Nitrogen dioxide, a reddish-brown gas that forms when nitrogen oxides react with pollutants in the air. This can irritate the airways and worsen respiratory conditions.
  9. CO: Carbon monoxide, a colorless and odorless gas produced by the incomplete combustion of fuels containing carbon. It can be harmful when inhaled in large quantities, leading to carbon monoxide poisoning.
  10. O3: Ozone, a gas composed of three oxygen atoms. While ozone in the atmosphere protects us from the sun's ultraviolet rays, ground-level ozone is a pollutant that can cause respiratory problems and exacerbate lung conditions.
  11. TEMP: Temperature, measure of how hot or cold the air is.
  12. PRES: Atmospheric pressure, the force exerted by the weight of air molecules above a particular point on the Earth's surface.
  13. DEWP: Dew point, the temperature at which the air becomes saturated with water vapor and dew begins to form.
  14. RAIN: Precipitation, amount of rain recorded during the specified period.
  15. WSPM: Wind speed, the rate at which air moves horizontally past a given point.
  16. wd: Wind direction, indicating the direction the wind is blowing from. It is usually indicated in cardinal directions (e.g., N for North, S for South, etc.) or in degrees (e.g., 0° for North, 90° for East, etc.).
  17. station: The name or identifier of the weather station where the data was recorded. This column specifies the location or source of weather observations.

This is the Correlation Matrix : correlation matrix

This is the visualization of the data (Pair-Plot): data visualization - pair plot

Methodology

  1. Data Preprocessing: The raw meteo data is preprocessed to handle missing values, outliers, and any noise present in the dataset. This step ensures the data is in a suitable format for analysis.

  2. Feature Engineering: Relevant features are selected or engineered from the raw data to improve the performance of the models. This step involves domain knowledge and understanding of meteorology to identify key variables.

  3. Clustering (K-Means): The K-Means clustering algorithm is applied to the preprocessed data to group similar instances together. This helps in identifying distinct weather patterns and clusters within the dataset.

  4. Target Generation: Based on the clusters obtained from K-Means, a target column is generated, which represents different weather patterns or conditions. This target column serves as the label for the classification models.

  5. Model Training: Three classification algorithms - Support Vector Machine (SVM), Classification and Regression Trees (CART), and k-Nearest Neighbors (KNN) - are trained on the preprocessed data with the generated target column.

  6. Model Evaluation: The trained models are evaluated using appropriate metrics such as accuracy, precision, recall, and F1-score. This step helps in assessing the performance of each model and selecting the best-performing one for deployment.

  7. Deployment: The chosen model is deployed for real-time or batch prediction of weather patterns based on new incoming data.

Repository Structure

  • DataMeteo12 Complet.csv: Contains the raw meteo data
  • 1 - KMeans for Meteo12: Jupyter notebooks detailing the data preprocessing, feature engineering, appling K-Means Model.
  • 2 - SVM for Meteo12: Jupyter notebooks for appling SVM Model.
  • 3 - KNN for Meteo12: Jupyter notebooks for appling KNN Model.
  • 4 - CART for Meteo12: Jupyter notebooks for appling CART Model.
  • Final Report: Jupyter notebooks contains the final results and decisions.

Feel free to adjust or expand upon any section to better fit your project's specifics!

meteo-data-mining's People

Contributors

drisskhattabi6 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.