vividfog / nordpool-predict-fi
A Python app and a Random Forest ML model that predicts spot prices for the Nordpool FI market.
License: MIT License
Hello!
What an interesting project! Just to let you know that Foreca's wind forecasts are available through this API, which might make this project "a bit less fragile": https://www.foreca.fi/api/wind-power?id=100658225
Mandatory disclaimer: I don't know whether it is OK to use that API directly without a contract/agreement with Foreca, and I cannot take any responsibility for any problems between you and Foreca caused by possible usage of this publicly accessible API :)
Best Regards,
T3m3z
I wonder if @pkautio could take a look at what's happening with our ENTSO-E unavailability data. We get this:
→ ENTSO-E: Combined unavailability of nuclear power plants:
   start                      end                        avail_qty  nominal_power  production_resource_name
0  2024-05-29 22:00:00+03:00  2024-06-03 11:00:00+03:00       1570         1600.0  Olkiluoto 3
1  2024-06-03 12:00:00+03:00  2025-03-01 01:00:00+02:00          0         1600.0  Olkiluoto 3
2  2024-08-04 07:00:00+03:00  2024-08-26 00:00:00+03:00          0          496.0  Loviisa 2
It looks as though OL3 will be down until March 2025, and more than 2 GW is missing from the market at the moment:
→ ENTSO-E: Avg: 2276, max: 2276, min: 2276 MW
    timestamp                  NuclearPowerMW
0   2024-08-11 01:00:00+03:00            2276
1   2024-08-11 02:00:00+03:00            2276
2   2024-08-11 03:00:00+03:00            2276
3   2024-08-11 04:00:00+03:00            2276
4   2024-08-11 05:00:00+03:00            2276
..                        ...             ...
As a temporary workaround, I've set the nuclear prediction formula to extrapolate the last-known value to the future. At the time of writing this, Fingrid reports a bit over 3 GW of production. Loviisa 2 is under maintenance, and OL3 is producing about 1 GW, according to TVO.fi.
You can run python util/entso_e.py to reproduce.
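The workaround described above (carrying the last known value forward into the forecast window) can be sketched in pandas. The column name matches the project's NuclearPowerMW feature, but the frame itself is illustrative:

```python
import pandas as pd

# Illustrative frame: known nuclear output up to "now", NaN for future hours
df = pd.DataFrame({
    "timestamp": pd.date_range("2024-08-11", periods=6, freq="h", tz="Europe/Helsinki"),
    "NuclearPowerMW": [3050.0, 3050.0, 3040.0, None, None, None],
})

# Extrapolate: forward-fill the last observed value into the forecast window
df["NuclearPowerMW"] = df["NuclearPowerMW"].ffill()
```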
The following error appears when running python nordpool_predict_fi.py --predict --commit:
https://sahkotin.fi/prices?vat&start=2024-02-23T13:00:00.000Z&end=2024-03-06T13:00:00.000Z
Days of data coverage (should be 7 back, 5 forward for now): 12
Traceback (most recent call last):
  File "/home/teme/nordpool-predict-fi/nordpool_predict_fi.py", line 242, in <module>
    price_df = rf_model.predict(df[['day_of_week', 'hour', 'month', 'NuclearPowerMW'] + fmisid_ws + fmisid_t])
  File "/home/teme/nordpool-predict-fi/venv/lib/python3.12/site-packages/pandas/core/frame.py", line 4096, in __getitem__
    indexer = self.columns._get_indexer_strict(key, "columns")[1]
  File "/home/teme/nordpool-predict-fi/venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6199, in _get_indexer_strict
    self._raise_if_missing(keyarr, indexer, axis_name)
  File "/home/teme/nordpool-predict-fi/venv/lib/python3.12/site-packages/pandas/core/indexes/base.py", line 6251, in _raise_if_missing
    raise KeyError(f"{not_found} not in index")
KeyError: "['FMISID_WS not set in environment', 'FMISID_T not set in environment'] not in index"
There are a few major issues with the model, the largest being how the train/test data is split. Because this is temporal/time series data, adjacent data points are highly correlated. If you do a random train/test split, you will leak training data into the test set, and the model will likely show artificially high performance on it. This phenomenon is known as data leakage: information from outside the training dataset is used to create the model. In time series data, where measurements depend on time, each data point is usually highly correlated with its immediate predecessors and successors, so a random split that ignores the temporal structure inadvertently lets training information into the test set.
To address this issue and avoid data leakage, you should use a time-based split for the train/test division. This approach ensures that the model is trained on data from a certain period and tested on data from a subsequent period, mirroring how the model would be used in real-world forecasting or time series analysis scenarios. By doing so, the test set acts as a more accurate representation of future, unseen data, and the model's performance metrics are more reliable.
Further, running the trained model on past data that is included in the training set will likely lead to overly optimistic performance metrics. This is because the model has already "seen" this data during training, and as such, it can easily predict these outcomes, which doesn't accurately reflect its ability to predict new, unseen data.
To calculate historic performance you can employ a technique known as walk-forward validation or rolling forecast origin. This method involves incrementally moving the cut-off point between the training and test sets forward in time, training the model on a fixed or expanding window of past data and testing it on the following period. Each step forward allows the model to be tested in a manner that simulates real-world forecasting situations, where only past data is available to predict future outcomes.
By continually retraining the model on the most recent data and forecasting the next period, you can assess how well the model adapts to changes in the data over time. This method provides a more robust and realistic evaluation of the model's predictive performance and its potential effectiveness in practical applications.
This technique can help identify model overfitting early on. Overfitting occurs when a model learns the noise in the training data to the extent that it negatively impacts the performance on new data. Because walk-forward validation tests the model on multiple, consecutive future points, it offers a clearer picture of how the model generalizes to new data. If the model performs well on the training set but poorly on the test sets during the walk-forward validation, it's a strong indication that the model may be overfitting.
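The time-based split and walk-forward validation described above can be sketched with scikit-learn's TimeSeriesSplit. The data here is synthetic and the single-feature setup is illustrative, not the project's actual feature matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic hourly series standing in for the real feature matrix
rng = np.random.default_rng(42)
X = np.arange(1000, dtype=float).reshape(-1, 1)
y = np.sin(X[:, 0] / 24) + rng.normal(0, 0.1, 1000)

# Walk-forward validation: each fold trains only on the past and is
# tested on the 48 hours that follow, so no future data leaks in
tscv = TimeSeriesSplit(n_splits=5, test_size=48)
for fold, (train_idx, test_idx) in enumerate(tscv.split(X)):
    model = RandomForestRegressor(n_estimators=50, random_state=0)
    model.fit(X[train_idx], y[train_idx])
    mae = mean_absolute_error(y[test_idx], model.predict(X[test_idx]))
    print(f"fold {fold}: train size {len(train_idx)}, test MAE {mae:.3f}")
```

Each training window here expands over time; pass max_train_size to TimeSeriesSplit for a fixed rolling window instead.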
A couple of ideas to improve the forecast:
This data should be available from ENTSO-E as market messages. With the entsoe-py package:
from entsoe import EntsoePandasClient
import pandas as pd
client = EntsoePandasClient(api_key="")
start = pd.Timestamp('20230101', tz='Europe/Helsinki')
end = pd.Timestamp('20241231', tz='Europe/Helsinki')
country_code = 'FI' # Finland
country_code_from = 'FI' # Finland
country_code_to = 'SE_1' # Finland-Northern Sweden
transit_unavailability = client.query_unavailability_transmission(
    country_code_from, country_code_to, start=start, end=end,
    docstatus=None, periodstartupdate=None, periodendupdate=None)
Similarly, query all connections and directions.
This could be done based on UMM (REMIT) messages, which should be available from ENTSO-E.
The entsoe-py package provides a ready-made interface to ENTSO-E.
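Querying "all connections and directions" could be sketched as a loop over border pairs. The helper name fetch_all_transit_unavailability and the border list below are illustrative, not a complete list of FI interconnectors:

```python
def fetch_all_transit_unavailability(client, borders, start, end):
    """Query transmission unavailability for each (from, to) border pair."""
    frames = {}
    for c_from, c_to in borders:
        frames[(c_from, c_to)] = client.query_unavailability_transmission(
            c_from, c_to, start=start, end=end)
    return frames

# Both directions for each border (illustrative subset; check the
# full list of FI interconnectors before relying on this)
fi_borders = [('FI', 'SE_1'), ('SE_1', 'FI'), ('FI', 'EE'), ('EE', 'FI')]
```

Called with an EntsoePandasClient instance and the start/end timestamps from the snippet above, this returns one DataFrame per border and direction.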
Is there any way to get these predictions into a sensor in HA, like the normal Nordpool sensor? It would be best to update it when new prices come in, showing the confirmed prices plus predictions. I'm looking to integrate the predictions with EMHASS in HA.
This is auto-generated by GPT-4 based on a discussion of various ideas, as an outline for a future session.
Quote:
To enhance the interactivity and information richness of the Nordpool Spot Price Prediction eCharts visualization, we are implementing a feature that allows users to toggle the visibility of historical prediction lines. These lines represent past predictions and help users visualize the evolution of price forecasts. Each line's opacity will decrease as it goes further back in time, providing a clear visual distinction between recent and older predictions.
-- Table to store snapshots of predictions
CREATE TABLE prediction_snapshot (
snapshot_id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_date TIMESTAMP NOT NULL
);
-- Table to store the details of each snapshot
CREATE TABLE snapshot_details (
detail_id INTEGER PRIMARY KEY AUTOINCREMENT,
snapshot_id INTEGER NOT NULL,
timestamp TIMESTAMP NOT NULL,
PricePredict_cpkWh FLOAT,
FOREIGN KEY (snapshot_id) REFERENCES prediction_snapshot(snapshot_id)
);
-- Insert a new snapshot record
INSERT INTO prediction_snapshot (snapshot_date) VALUES (CURRENT_TIMESTAMP);
-- Insert the snapshot details (this would be part of a loop processing your prediction data)
INSERT INTO snapshot_details (snapshot_id, timestamp, PricePredict_cpkWh)
VALUES ((SELECT last_insert_rowid()), '2024-05-29 00:00:00', 10.5);
Implement a backend service to fetch the last 4 snapshots from the database and format them as JSON.
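The backend step above can be sketched with Python's built-in sqlite3 module. The function name fetch_recent_snapshots is hypothetical, but the table and column names follow the schema above:

```python
import json
import sqlite3

def fetch_recent_snapshots(db_path, n=4):
    """Return the last n prediction snapshots as JSON, newest first."""
    con = sqlite3.connect(db_path)
    try:
        rows = con.execute(
            """SELECT s.snapshot_id, s.snapshot_date, d.timestamp, d.PricePredict_cpkWh
               FROM prediction_snapshot s
               JOIN snapshot_details d ON d.snapshot_id = s.snapshot_id
               WHERE s.snapshot_id IN (
                   SELECT snapshot_id FROM prediction_snapshot
                   ORDER BY snapshot_date DESC LIMIT ?)
               ORDER BY s.snapshot_date DESC, d.timestamp""",
            (n,),
        ).fetchall()
    finally:
        con.close()
    # Group the flat rows into one object per snapshot
    snapshots = {}
    for sid, sdate, ts, price in rows:
        snapshots.setdefault(sid, {"snapshot_date": sdate, "points": []})
        snapshots[sid]["points"].append({"timestamp": ts, "price": price})
    return json.dumps(list(snapshots.values()))
```

A Flask or FastAPI route could return this JSON directly to the chart's fetch call.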
<label>
<input type="checkbox" id="historyToggle" checked>Show History Lines</label>
<div id="main" style="width: 600px;height:400px;"></div>
<script src="path_to_echarts_lib"></script>
var myChart = echarts.init(document.getElementById('main'));
function loadChartData(callback) {
// Simulate fetching JSON data from the backend
var simulatedData = []; // Replace with actual JSON data fetching logic
callback(simulatedData);
}
function toggleHistoryLines(showHistory) {
    loadChartData(function (latestData) {
        var option = {
            series: [{
                data: latestData,
                // Fade or hide the historical prediction lines based on the toggle
                lineStyle: { opacity: showHistory ? 1 : 0 },
                // other series options...
            }]
        };
        myChart.setOption(option);
    });
}
document.getElementById('historyToggle').addEventListener('change', function() {
toggleHistoryLines(this.checked);
});
toggleHistoryLines(true); // Initialize with history lines visible
This plan includes the details you need to enhance your chart with a feature that improves the user experience by giving the prediction data historical context. Make sure to adjust the database paths, JSON-fetching logic, and eCharts configuration to match your project's setup.
The sahkotin.fi API was quite slow today, so I thought about using the ENTSO-E integration to replace it.
Looks like seaborn is missing from requirements.txt:
(venv) teme@HooPee:~/nordpool-predict-fi$ python nordpool_predict_fi.py --foreca
Traceback (most recent call last):
  File "/home/teme/nordpool-predict-fi/nordpool_predict_fi.py", line 13, in <module>
    from util.eval import eval
  File "/home/teme/nordpool-predict-fi/util/eval.py", line 4, in <module>
    import seaborn as sns
ModuleNotFoundError: No module named 'seaborn'
A manual installation with pip install seaborn fixed the issue.
Auto-generated for future reference and further study. If you've built one of these before, please share tips and tricks; I haven't.
Quote:
I can guide you through the process of starting this project with code snippets and structure outlines, which you can then expand upon and test in your own development environment. This guidance will cover the key components you need to create, including the manifest, a basic sensor entity, and instructions for fetching data from your provided URL.
This file describes your integration to Home Assistant. It includes metadata like the domain, name, version, and any dependencies.
{
"domain": "nordpool_fi_prices",
"name": "Nordpool FI Electricity Prices",
"version": "1.0",
"documentation": "https://example.com/documentation",
"requirements": ["requests==2.25.1"],
"dependencies": [],
"codeowners": ["@yourusername"]
}
You'll need a Python script to fetch the data. Since I can't run live external requests, here's an outline of what the fetching function could look like:
api.py
This module will handle fetching and parsing the electricity prices.
import requests
class NordpoolFIPrices:
def __init__(self, url):
self.url = url
def fetch_prices(self):
response = requests.get(self.url)
response.raise_for_status() # Raises an HTTPError if the response code was unsuccessful
data = response.json()
return data
sensor.py
This file defines the sensor entities for Home Assistant, using the data fetched from your URL.
from homeassistant.helpers.entity import Entity
from .api import NordpoolFIPrices
class NordpoolFIElectricityPriceSensor(Entity):
def __init__(self, api: NordpoolFIPrices):
self.api = api
self._state = None
self._attributes = {}
@property
def name(self):
return "Nordpool FI Electricity Price"
@property
def state(self):
return self._state
@property
def extra_state_attributes(self):
return self._attributes
def update(self):
data = self.api.fetch_prices()
# Assuming the first entry is the current price
current_price = data[0][1]
self._state = current_price
# Add more processing logic as needed
You would need to test this code in your local development environment, debug any issues, and ensure it works as expected within Home Assistant.
Once you have the code working and tested, package it into a custom component structure:
1. Create a folder named nordpool_fi_prices in your Home Assistant's custom_components directory.
2. Inside nordpool_fi_prices, place your manifest.json, sensor.py, and any other necessary files.
3. Document how to install and use your integration, including any configuration necessary in configuration.yaml or via the UI if you implemented config_flow.py.
Given the complexity and need for testing, this outline should serve as a starting point for your development. Expand upon each section, implement additional error handling, and ensure it meets your needs through testing.