vividfog / nordpool-predict-fi
A Python app and a Random Forest ML model that predicts spot prices for the Nordpool FI market.
License: MIT License
Hello!
What an interesting project! Just to let you know that Foreca's wind forecasts are available through this API, which might make this project "a bit less fragile": https://www.foreca.fi/api/wind-power?id=100658225
Mandatory disclaimer: I don't know whether it is OK to use that API directly without a contract/agreement with Foreca, and I cannot take any responsibility for any problems between you and Foreca caused by possible usage of this publicly accessible API :)
Best Regards,
T3m3z
I wonder if @pkautio could take a look at what's happening with our ENTSO-E unavailability data. We get this:
→ ENTSO-E: Combined unavailability of nuclear power plants:
   start                      end                        avail_qty  nominal_power  production_resource_name
0  2024-05-29 22:00:00+03:00  2024-06-03 11:00:00+03:00       1570         1600.0  Olkiluoto 3
1  2024-06-03 12:00:00+03:00  2025-03-01 01:00:00+02:00          0         1600.0  Olkiluoto 3
2  2024-08-04 07:00:00+03:00  2024-08-26 00:00:00+03:00          0          496.0  Loviisa 2
It looks as though OL3 will be down until March 2025, and more than 2 GW is missing from the market at the moment:
→ ENTSO-E: Avg: 2276, max: 2276, min: 2276 MW
    timestamp                  NuclearPowerMW
0   2024-08-11 01:00:00+03:00            2276
1   2024-08-11 02:00:00+03:00            2276
2   2024-08-11 03:00:00+03:00            2276
3   2024-08-11 04:00:00+03:00            2276
4   2024-08-11 05:00:00+03:00            2276
..                        ...             ...
As a temporary workaround, I've set the nuclear prediction formula to extrapolate the last-known value to the future. At the time of writing this, Fingrid reports a bit over 3 GW of production. Loviisa 2 is under maintenance, and OL3 is producing about 1 GW, according to TVO.fi.
You can run python util/entso_e.py to reproduce.
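The workaround described above (carrying the last known value forward into the forecast window) can be sketched in pandas. The column name matches the project's NuclearPowerMW feature, but the frame itself is illustrative:

```python
import pandas as pd

# Illustrative frame: known nuclear output up to "now", NaN for future hours
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-08-11", periods=6, freq="h", tz="Europe/Helsinki"),
    "NuclearPowerMW": [3050.0, 3050.0, 3040.0, None, None, None],
})

# Extrapolate: forward-fill the last observed value into the forecast window
df["NuclearPowerMW"] = df["NuclearPowerMW"].ffill()
```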
The following error appears when running python nordpool_predict_fi.py --predict --commit:
https://sahkotin.fi/prices?vat&start=2024-02-23T13:00:00.000Z&end=2024-03-06T13:00:00.000Z
Days of data coverage (should be 7 back, 5 forward for now): 12
Traceback (most recent call last):
  File "/home/teme/nordpool-predict-fi/nordpool_predict_fi.py", line 242, in <module>
    price_df = rf_model.predict(df[['day_of_week', 'hour', 'month', 'NuclearPowerMW'] + fmisid_ws + fmisid_t])
  File "/home/teme/nordpool-predict-fi/venv/lib/python3.12/site-packages/pandas/core/frame.py", line 4096, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/teme/nordpool-predict-fi/venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6199, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/teme/nordpool-predict-fi/venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6251, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['FMISID_WS not set in environment', 'FMISID_T not set in environment'] not in index"
There are a few major issues with the model, the largest being how the train/test data is split. Because this is temporal/time series data, adjacent data points are highly correlated. If you do a random train/test split, you will leak training data into the test set, and the model will likely show artificially high performance on it. This phenomenon is known as data leakage: information from outside the training dataset is used to create the model. In time series data, where measurements depend on time, each data point is usually highly correlated with its immediate predecessors and successors, so a random split that ignores the temporal structure inadvertently lets training information into the test set.
To address this issue and avoid data leakage, you should use a time-based split for the train/test division. This approach ensures that the model is trained on data from a certain period and tested on data from a subsequent period, mirroring how the model would be used in real-world forecasting or time series analysis scenarios. By doing so, the test set acts as a more accurate representation of future, unseen data, and the model's performance metrics are more reliable.
Further, running the trained model on past data that is included in the training set will likely lead to overly optimistic performance metrics. This is because the model has already "seen" this data during training, and as such, it can easily predict these outcomes, which doesn't accurately reflect its ability to predict new, unseen data.
To calculate historic performance you can employ a technique known as walk-forward validation or rolling forecast origin. This method involves incrementally moving the cut-off point between the training and test sets forward in time, training the model on a fixed or expanding window of past data and testing it on the following period. Each step forward allows the model to be tested in a manner that simulates real-world forecasting situations, where only past data is available to predict future outcomes.
By continually retraining the model on the most recent data and forecasting the next period, you can assess how well the model adapts to changes in the data over time. This method provides a more robust and realistic evaluation of the model's predictive performance and its potential effectiveness in practical applications.
This technique can help identify model overfitting early on. Overfitting occurs when a model learns the noise in the training data to the extent that it negatively impacts the performance on new data. Because walk-forward validation tests the model on multiple, consecutive future points, it offers a clearer picture of how the model generalizes to new data. If the model performs well on the training set but poorly on the test sets during the walk-forward validation, it's a strong indication that the model may be overfitting.
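The time-based split and walk-forward validation described above can be sketched with scikit-learn's TimeSeriesSplit. The data here is synthetic and the single-feature setup is illustrative, not the project's actual feature matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic hourly series standing in for the real feature matrix
rng = np.random.default_rng(42)
X = np.arange(1000, dtype=float).reshape(-1, 1)
y = np.sin(X[:, 0] / 24) + rng.normal(0, 0.1, 1000)

# Walk-forward validation: each fold trains only on the past and is
# tested on the 48 hours that follow, so no future data leaks in
tscv = TimeSeriesSplit(n_splits=5, test_size=48)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    print(f"fold {fold}: train size {len(train_idx)}, test MAE {mae:.3f}")
```

Each training window here expands over time; pass max_train_size to TimeSeriesSplit for a fixed rolling window instead.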
A couple of ideas to improve the forecast:
This data should be available from ENTSO-E as market messages. With the entsoe-py package:
from entsoe import EntsoePandasClient
import pandas as pd
client = EntsoePandasClient(api_key="")
start = pd.Timestamp('20230101', tz='Europe/Helsinki')
end = pd.Timestamp('20241231', tz='Europe/Helsinki')
country_code = 'FI' # Finland
country_code_from = 'FI' # Finland
country_code_to = 'SE_1' # Finland-Northern Sweden
transit_unavailability = client.query_unavailability_transmission(
    country_code_from, country_code_to, start=start, end=end,
    docstatus=None, periodstartupdate=None, periodendupdate=None)
Similarly, query all connections and directions.
This could be done based on UMM (REMIT) messages, which should be available from ENTSO-E.
The entsoe-py package provides a ready-made interface to ENTSO-E.
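Querying "all connections and directions" could be sketched as a loop over border pairs. The helper name fetch_all_transit_unavailability and the border list below are illustrative, not a complete list of FI interconnectors:

```python
def fetch_all_transit_unavailability(client, borders, start, end):
    """Query transmission unavailability for each (from, to) border pair."""
    frames = {}
    for c_from, c_to in borders:
        frames[(c_from, c_to)] = client.query_unavailability_transmission(
            c_from, c_to, start=start, end=end)
    return frames

# Both directions for each border (illustrative subset; check the
# full list of FI interconnectors before relying on this)
fi_borders = [('FI', 'SE_1'), ('SE_1', 'FI'), ('FI', 'EE'), ('EE', 'FI')]
```

Called with an EntsoePandasClient instance and the start/end timestamps from the snippet above, this returns one DataFrame per border and direction.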
Is there any way to get these predictions into a sensor in HA, like the normal Nordpool sensor? It would be best to update it when new prices come in, showing the confirmed prices plus predictions. I'm looking to integrate the predictions with EMHASS in HA.
This is auto-generated by GPT-4 based on a discussion of various ideas, as an outline for a future session.
Quote:
To enhance the interactivity and information richness of the Nordpool Spot Price Prediction eCharts visualization, we are implementing a feature that allows users to toggle the visibility of historical prediction lines. These lines represent past predictions and help users visualize the evolution of price forecasts. Each line's opacity will decrease as it goes further back in time, providing a clear visual distinction between recent and older predictions.
-- Table to store snapshots of predictions
CREATE TABLE prediction_snapshot (
snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_date TIMESTAMP NOT NULL
);
-- Table to store the details of each snapshot
CREATE TABLE snapshot_details (
detail_id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL,
timestamp TIMESTAMP NOT NULL,
PricePredict_cpkWh FLOAT,
FOREIGN KEY (snapshot_id) REFERENCES prediction_snapshot(snapshot_id)
);
-- Insert a new snapshot record
INSERT INTO prediction_snapshot (snapshot_date) VALUES (CURRENT_TIMESTAMP);
-- Insert the snapshot details (this would be part of a loop processing your prediction data)
INSERT INTO snapshot_details (snapshot_id, timestamp, PricePredict_cpkWh)
VALUES ((SELECT last_insert_rowid()), '2024-05-29 00:00:00', 10.5);
Implement a backend service to fetch the last 4 snapshots from the database and format them as JSON.
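The backend step above can be sketched with Python's built-in sqlite3 module. The function name fetch_recent_snapshots is hypothetical, but the table and column names follow the schema above:

```python
import json
import sqlite3

def fetch_recent_snapshots(db_path, n=4):
    """Return the last n prediction snapshots as JSON, newest first."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            """SELECT s.snapshot_id, s.snapshot_date, d.timestamp, d.PricePredict_cpkWh
               FROM prediction_snapshot s
               JOIN snapshot_details d ON d.snapshot_id = s.snapshot_id
               WHERE s.snapshot_id IN (
                   SELECT snapshot_id FROM prediction_snapshot
                   ORDER BY snapshot_date DESC LIMIT ?)
               ORDER BY s.snapshot_date DESC, d.timestamp""",
            (n,),
        ).fetchall()
    finally:
        con.close()
    # Group the flat rows into one object per snapshot
    snapshots = {}
    for sid, sdate, ts, price in rows:
        snapshots.setdefault(sid, {"snapshot_date": sdate, "points": []})
        snapshots[sid]["points"].append({"timestamp": ts, "price": price})
    return json.dumps(list(snapshots.values()))
```

A Flask or FastAPI route could return this JSON directly to the chart's fetch call.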
<label>
<input type="checkbox" id="historyToggle" checked>Show History Lines</label>
<div id="main" style="width: 600px;height:400px;"></div>
<script src="path_to_echarts_lib"></script>
var myChart = echarts.init(document.getElementById('main'));
function loadChartData(callback) {
// Simulate fetching JSON data from the backend
var simulatedData = []; // Replace with actual JSON data fetching logic
callback(simulatedData);
}
function toggleHistoryLines(showHistory) {
    loadChartData(function (latestData) {
        var option = {
            series: [{
                data: latestData,
                // Fade or hide the historical prediction lines based on the toggle
                lineStyle: { opacity: showHistory ? 1 : 0 },
                // other series options...
            }]
        };
        myChart.setOption(option);
    });
}
document.getElementById('historyToggle').addEventListener('change', function() {
toggleHistoryLines(this.checked);
});
toggleHistoryLines(true); // Initialize with history lines visible
This plan includes the details you need to enhance your chart with a feature that improves the user experience by giving the prediction data historical context. Make sure to adjust the database paths, JSON-fetching logic, and eCharts configuration to match your project's setup.
The sahkotin.fi API was quite slow today, so I thought about using the ENTSO-E integration to replace it.
Looks like seaborn is missing from requirements.txt:
(venv) teme@HooPee:~/nordpool-predict-fi$ python nordpool_predict_fi.py --foreca
Traceback (most recent call last):
  File "/home/teme/nordpool-predict-fi/nordpool_predict_fi.py", line 13, in <module>
    from util.eval import eval
  File "/home/teme/nordpool-predict-fi/util/eval.py", line 4, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
A manual installation with pip install seaborn fixed the issue.
Auto-generated for future reference and further study. If you've built one of these before, please share tips and tricks; I haven't.
Quote:
I can guide you through the process of starting this project with code snippets and structure outlines, which you can then expand upon and test in your own development environment. This guidance will cover the key components you need to create, including the manifest, a basic sensor entity, and instructions for fetching data from your provided URL.
This file describes your integration to Home Assistant. It includes metadata like the domain, name, version, and any dependencies.
{
"domain": "nordpool_fi_prices",
"name": "Nordpool FI Electricity Prices",
"version": "1.0",
"documentation": "https://example.com/documentation",
"requirements": ["requests==2.25.1"],
"dependencies": [],
"codeowners": ["@yourusername"]
}
You'll need a Python script to fetch the data. Since I can't run live external requests, here's an outline of what the fetching function could look like:
api.py
This module will handle fetching and parsing the electricity prices.
import requests
class NordpoolFIPrices:
def __init__(self, url):
self.url = url
def fetch_prices(self):
response = requests.get(self.url)
response.raise_for_status() # Raises an HTTPError if the response code was unsuccessful
data = response.json()
return data
sensor.py
This file defines the sensor entities for Home Assistant, using the data fetched from your URL.
from homeassistant.helpers.entity import Entity
from .api import NordpoolFIPrices
class NordpoolFIElectricityPriceSensor(Entity):
def __init__(self, api: NordpoolFIPrices):
self.api = api
self._state = None
self._attributes = {}
@property
def name(self):
return "Nordpool FI Electricity Price"
@property
def state(self):
return self._state
@property
def extra_state_attributes(self):
return self._attributes
def update(self):
data = self.api.fetch_prices()
# Assuming the first entry is the current price
current_price = data[0][1]
self._state = current_price
# Add more processing logic as needed
You would need to test this code in your local development environment, debug any issues, and ensure it works as expected within Home Assistant.
Once you have the code working and tested, package it into a custom component structure:
1. Create a folder named nordpool_fi_prices in your Home Assistant's custom_components directory.
2. Inside nordpool_fi_prices, place your manifest.json, sensor.py, and any other necessary files.
3. Document how to install and use your integration, including any configuration necessary in configuration.yaml or via the UI if you implemented config_flow.py.
Given the complexity and need for testing, this outline should serve as a starting point for your development. Expand upon each section, implement additional error handling, and ensure it meets your needs through testing.