Data Science Consulting for Electric Energy Consumption Analysis and Forecasting
The Customer is an international company providing managed software solutions and consulting services for businesses operating in the energy sector.
The Customer initiated the development of a cloud-based data analytics software product for electric power companies, which could facilitate electric energy consumption analysis, deliver accurate electric energy consumption forecasts (hourly, daily, and weekly), and become the basis for load forecasting and price determination.
The project stalled because the Customer needed a third-party review of the already developed part of the software to identify its strengths and weaknesses and to get detailed recommendations on enhancing its analytical capabilities and designing the required machine learning (ML) models.
ScienceSoft’s team of data scientists and data engineers started by analyzing the Customer’s business objectives and requirements for the future software product. They then reviewed the existing software architecture and proposed an enhanced architecture (Figure 1) aligned with the Customer’s strategic and tactical goals.
Figure 1. Enhanced software architecture.
ScienceSoft’s experts then reviewed the ML code and suggested creating ML models according to the following process:
- Data acquisition.
- Exploratory data analysis to investigate data for patterns and anomalies, test hypotheses, and check assumptions.
- Configuration of parameters and hyperparameters for training models.
- Data preprocessing (dealing with missing values, categorical values, etc.).
- Model exploration.
- Model training.
- Model evaluation and tuning.
- Model retraining based on new data.
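The evaluation step in this process can be illustrated with a minimal sketch. It is not the Customer's or ScienceSoft's actual code. It assumes hourly data and shows two common practices for time series: splitting the data chronologically instead of shuffling it, and measuring a seasonal-naive baseline that any candidate model should beat. The function names and the synthetic series are hypothetical and serve only as an illustration.

```python
import math


def chronological_split(series, test_size):
    """Keep time order: the last `test_size` points form the test set."""
    cut = len(series) - test_size
    return series[:cut], series[cut:]


def seasonal_naive_forecast(history, horizon, season=24):
    """Repeat the value seen one season (24 hours by default) earlier."""
    extended = list(history)
    for _ in range(horizon):
        # Reuses earlier forecasts once the horizon exceeds one season.
        extended.append(extended[-season])
    return extended[len(history):]


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Synthetic hourly load: daily cycle plus a slow upward drift (hypothetical data).
load = [100.0 + 30.0 * math.sin(2 * math.pi * (h % 24) / 24) + 0.05 * h
        for h in range(24 * 10)]

train, test = chronological_split(load, test_size=48)
baseline = seasonal_naive_forecast(train, horizon=len(test))
print(f"Seasonal-naive baseline MAPE: {mape(test, baseline):.2f}%")
```

A trained model would be scored on the same held-out window, and its error compared with this baseline before any tuning.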
To ensure the high accuracy of ML models, the consulting team recommended that the Customer:
- Use the full potential of exploratory data analysis (EDA) to examine the interrelationships between features, discover interesting subsets, and determine correlations between the predictors and the target variable.
- Employ pure time series models, including LSTM-based neural networks like Seq2Seq models.
- Use classical ML models like LightGBM or XGBoost for time series forecasting and for revealing the time-dependent nature of the data.
- Incorporate additional data (public holidays, local events, day length, geographical position, etc.) to improve forecast precision.
- Employ online machine learning to update models as new data arrives.
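Two of the recommendations above, enriching the data with calendar information and learning online from newly arriving data, can be sketched in a few lines. The example below is an assumption-laden illustration, not the delivered design. It uses only the Python standard library, a hypothetical one-day holiday calendar, and a synthetic load series pre-scaled to values near 1. As a stand-in for the recommended models, it uses a simple linear model updated with normalized least mean squares (NLMS). All names in it are hypothetical.

```python
import math
from datetime import date, datetime, timedelta

HOLIDAYS = {date(2024, 1, 3)}  # hypothetical holiday calendar, illustration only
LAGS = (1, 24)                 # previous hour and same hour on the previous day


def build_rows(timestamps, load, lags=LAGS):
    """Turn an hourly series into (features, target) pairs."""
    rows = []
    for i in range(max(lags), len(load)):
        ts = timestamps[i]
        features = [load[i - lag] for lag in lags]               # lag features
        features.append(ts.hour / 23.0)                          # hour of day in [0, 1]
        features.append(1.0 if ts.weekday() >= 5 else 0.0)       # weekend flag
        features.append(1.0 if ts.date() in HOLIDAYS else 0.0)   # holiday flag
        rows.append((features, load[i]))
    return rows


class OnlineLinearModel:
    """Linear model updated one sample at a time with normalized LMS."""

    def __init__(self, n_features, step=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.step = step  # should lie in (0, 2) for a stable update

    def predict(self, x):
        return self.bias + sum(w * v for w, v in zip(self.weights, x))

    def update(self, x, y):
        """Predict, then learn from the observed value; returns the prior error."""
        error = self.predict(x) - y
        norm = 1.0 + sum(v * v for v in x)  # the 1.0 accounts for the bias term
        scale = self.step * error / norm
        self.weights = [w - scale * v for w, v in zip(self.weights, x)]
        self.bias -= scale
        return error


# Synthetic, pre-scaled hourly load with a daily cycle (hypothetical data).
timestamps = [datetime(2024, 1, 1) + timedelta(hours=h) for h in range(24 * 14)]
load = [1.0 + 0.3 * math.sin(2 * math.pi * ts.hour / 24) for ts in timestamps]

rows = build_rows(timestamps, load)
model = OnlineLinearModel(n_features=len(rows[0][0]))
abs_errors = [abs(model.update(x, y)) for x, y in rows]

first_day = sum(abs_errors[:24]) / 24
last_day = sum(abs_errors[-24:]) / 24
print(f"Mean absolute error, first day: {first_day:.4f}, last day: {last_day:.4f}")
```

Each sample is predicted before the model learns from it, so the recorded errors reflect performance on data the model has not yet seen. The same feature rows could be fed to a gradient boosting or neural network model in a batch setting.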
The Customer obtained a high-level software architecture and detailed recommendations on how to create ML models for accurate forecasting. Once delivered, the software would enable electric power companies to get accurate short-term and mid-term electric energy consumption forecasts and to improve their load management and price determination processes.
Technologies and Tools
Google Cloud Platform, Microsoft SQL Server, Pandas, Python, Scikit-learn, TensorFlow, NumPy, Jupyter.