
Refactor the model #701

Open · 10 of 52 tasks
clizbe opened this issue Jul 29, 2024 · Discussed in #688 · 2 comments
Labels: Type: epic (Epic issues: collection of smaller tasks towards a goal)

clizbe commented Jul 29, 2024

Discussed in #688

Originally posted by datejada July 1, 2024

Overview of the changes

  • Use TulipaIO instead of directly accessing the data
    We need to generalize the data access and move the logic to TulipaIO. A large part depends on DuckDB-specific details (e.g., the connection), so maybe we'll need to create a TulipaIO wrapper structure (see the TulipaData sketch after this list).
  • Remove the dependency on data stored in the graph
    In preparation to remove this data from the graph, we can initially just use the data from some other place.
  • Remove the data from the graph
    Don't save data in the graph. After the issue "Remove the dependency on graph data" is done, this will be simple.
  • Create all data and tables necessary for the model before the function #885
    We can better control and improve both the creation of the data/tables and the create_model function if these concerns are separated.
    This includes the "expression" data, e.g., data that informs the incoming and outgoing flow.
    • This includes clustering stuff (representative periods, timeframe, years, etc.)
    • Partitions should also be created before the model, though we are still missing the "scenario specification" data as a separate entity.
  • Sets
  • Variables
    • Define variable tables #884
      To prevent sparsity issues, we control the indexing by precomputing the variable indexes; the way we do that right now is to compute tables of these indexes. Assuming that remains the case, we have to define what these tables must contain and what to name them (see the variable-table sketch after this list).
      These tables will probably also store the final results, so they are used outside of the model as well.
    • Refactor variables #923
      This involves:
      • Move the construction from construct_dataframes to compute_variables_indices
      • Change the many places that access the variables through dataframes to instead unpack x = model[:x], or to use the variables structure directly. We should try both strategies and see which makes more sense.
    • Move JuMP variables out of the DF
    • Instead of a DF, use a TulipaIO table #909
      Make sure that we differentiate these from the input tables in DuckDB/TulipaIO.
  • Constraints and expressions
    • Define (balance) constraints tables #927
      The (balance) constraints work asset by asset, and the table defines the necessary information for each of these assets. In the current implementation, one of the columns of this table is the JuMP expression with the incoming flow (and one for the outgoing flow, and possibly more for extra things).
      Constructing the incoming and outgoing flows is a huge issue: it was slow, and we had to figure out a nice way to do it.
      Furthermore, if we want to use an external table format, we can't save the expressions in the table, so the format will need to be changed.
      Whatever we do, we must be sure that it doesn't compromise performance.
      • ConstraintPartition struct (see the ConstraintPartition sketch after this list)
        • Table: asset, clustering (rp, tf, year), time blocks
        • Vectors for incoming, outgoing
  • Define partition tables #895
    Do we need something separate for partitions in general, or are the structs above sufficient?
  • Reimplement all "add_expression..." functions
    The add_expression functions receive an asset-based constraint (e.g., balance) and the flow variable, and compute all the incoming and outgoing flows for each row of the constraint.
    The current implementation stores the flow and the resulting expressions in the tables. This will not be possible with a DuckDB table, so we either store the resulting expressions in a vector alongside the constraints, or in a separate structure, or don't precompute the resulting expressions.
    Given that we want to precompute as much as possible separately from the model creation, computing the indexes required for the incoming and outgoing expressions might be desirable, at first.
  • Renaming/Style
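
A minimal sketch of the TulipaIO wrapper idea from the first item above, assuming it mainly needs to hold the DuckDB connection and mediate table access. The name TulipaData and the helper get_table are hypothetical, not existing TulipaIO API:

```julia
using DataFrames
using DBInterface
using DuckDB

# Hypothetical wrapper so the rest of the code never touches DuckDB directly.
struct TulipaData
    connection::DuckDB.DB
end

# Hypothetical accessor: fetch a whole table as a DataFrame.
function get_table(data::TulipaData, table_name::String)
    return DataFrame(DBInterface.execute(data.connection, "SELECT * FROM $table_name"))
end
```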
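For the variable tables (#884), a rough sketch of precomputing the sparse index set as a table and attaching one JuMP variable per row; the table and column names are made up for illustration, not the names to be defined in that issue:

```julia
using DataFrames, DBInterface, DuckDB, JuMP

connection = DBInterface.connect(DuckDB.DB)
# Illustrative stand-in for the real partition data loaded from the inputs.
DBInterface.execute(connection, """
    CREATE TABLE flow_partitions AS
    SELECT * FROM (VALUES
        ('wind', 'demand', 1, '1:3'),
        ('wind', 'demand', 1, '4:6')
    ) AS t(from_asset, to_asset, rep_period, time_block)
""")

# Precompute only the valid (sparse) indexes instead of a dense index space.
var_flow = DataFrame(DBInterface.execute(connection, "SELECT * FROM flow_partitions"))

model = Model()
# One anonymous JuMP variable per precomputed row.
var_flow.flow = [
    @variable(model, base_name = "flow[$(r.from_asset),$(r.to_asset),$(r.rep_period),$(r.time_block)]")
    for r in eachrow(var_flow)
]
```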
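And a sketch of the ConstraintPartition struct mentioned above: the constraint rows live in a plain table, while the JuMP expressions for incoming and outgoing flows live in vectors aligned with those rows, since expressions cannot be stored in a DuckDB table. All names are illustrative:

```julia
using DataFrames, JuMP

# Illustrative: one row per (asset, rp/tf/year, time block) balance constraint,
# with the incoming/outgoing JuMP expressions kept outside the table itself.
struct ConstraintPartition
    table::DataFrame            # asset, clustering (rp, tf, year), time blocks
    incoming::Vector{AffExpr}   # incoming-flow expression for each row
    outgoing::Vector{AffExpr}   # outgoing-flow expression for each row
end

# Usage sketch: with `cp::ConstraintPartition` filled in, a balance constraint
# could become something like
#   @constraint(model, [i in 1:nrow(cp.table)], cp.incoming[i] == cp.outgoing[i])
```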

Checklist of how we want things to look

  • Separation of concerns (data - structures - model - solution)
  • Encapsulation/wrappers to improve readability and argument passing
  • Easier identification of what things are (e.g., variable vs sets vs expression) from context or structures
  • Uniform naming and some written coding style guidelines (how and where to time, naming of functions, function argument order, etc.)

Pipeline draft (a code sketch follows the list):

  1. Read the data into a TulipaIO TulipaData structure, which includes the connection (read_csv, etc.)
  2. Create all pre-model structures (graph, clustering stuff (rp and tf), partition stuff)
  3. Create model data and structures
    3.1 Create variables
    3.2 Create constraint partitions
  4. Create model
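
A hedged sketch of this pipeline as code. Apart from DuckDB itself, every function name below is hypothetical or still-to-be-written (compute_variables_indices and create_model appear elsewhere in this issue), not a stable API:

```julia
using DBInterface, DuckDB

# 1. Read the data into a structure that wraps the connection
connection = DBInterface.connect(DuckDB.DB)
data = read_data(connection, "inputs/")                         # -> hypothetical TulipaData

# 2. Pre-model structures
graph      = create_graph(data)
clustering = compute_clustering(data)                           # rp and tf
partitions = compute_partitions(data, clustering)

# 3. Model data and structures
variables   = compute_variables_indices(data, partitions)       # 3.1
constraints = compute_constraints_partitions(data, partitions)  # 3.2

# 4. Create the model
model = create_model(graph, variables, constraints)
```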

Input data names

create_input_dataframes

  • Verify whether we can use only one df (without a dict) for assets_profiles and assets_timeframe_profiles. This is better done after having a pipeline with timeframe, so have a pipeline example before changing it.
  • table_tree stores all the data in dfs

create_internal_structures

compute_assets_partitions

  • Update compute_assets_partitions! to use DuckDB more efficiently
    • To compute the partitions of the timeframe, it should be possible to use a join or something similar (see the sketch below)
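
The kind of join we have in mind might look like the following; the table and column names are assumptions for illustration, not the actual input schema:

```julia
using DataFrames, DBInterface, DuckDB

connection = DBInterface.connect(DuckDB.DB)
# Assume the input tables were already loaded into `connection`.
# One join instead of looping over assets on the Julia side:
partitions = DataFrame(DBInterface.execute(connection, """
    SELECT a.asset, a.year, t.period
    FROM assets_timeframe_partitions AS a
    JOIN timeframe_data AS t
      ON a.year = t.year
"""))
```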

compute_constraints_partitions

compute_rp_partitions

  • This function needs a full refactor when changing to tables (DuckDB)
    • Big change

solve_model and solve_model!

create_model

  • When using tables, the filters will be applied to the tables, not to the graph
  • add_expression_terms_intra_rp_contraints will need a refactor, and it will determine the changes we need before and after it
  • Notes on code structure:
    • constraints partition: lowest/highest -> to get the correct partition (see the sketch below)
    • sum of workspace expressions: sum/unique -> to get the correct expression
    • the profile_aggregation functions are in the table in the documentation
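
To illustrate the lowest/highest note: given several time partitions of the same period range, the merged partition either takes the union of all breakpoints (highest resolution) or keeps only the breakpoints shared by all partitions (lowest resolution). A rough sketch of the idea, not the actual implementation:

```julia
# Partitions of 1:N represented as vectors of time blocks (unit ranges).
breakpoints(blocks) = sort!(unique!(vcat(0, last.(blocks))))
to_blocks(bps) = [(bps[i] + 1):bps[i + 1] for i in 1:length(bps) - 1]

# :highest -> union of all breakpoints (finest common refinement).
merge_highest(partitions) = to_blocks(sort!(union(map(breakpoints, partitions)...)))

# :lowest -> only breakpoints common to all partitions (coarsest aligned blocks).
merge_lowest(partitions) = to_blocks(sort!(intersect(map(breakpoints, partitions)...)))

p1 = [1:2, 3:4, 5:6]
p2 = [1:3, 4:6]
merge_highest([p1, p2])  # [1:2, 3:3, 4:4, 5:6]
merge_lowest([p1, p2])   # [1:6]
```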

AND THEN:

  • Rename input data to make more sense
    • How to include Multi-Year?
  • Separate the model data from the scenario data. The scenario data for multi-year is currently hard-coded in create_model!; this needs to be generalized and relocated (see Create model discount parameters #803).
  • Performance improvements related to multi-year (see Multi-year investment #462)

Maybe also:

@clizbe clizbe added the Type: epic Epic issues (collection of smaller tasks towards a goal) label Jul 29, 2024
@clizbe clizbe added this to the M3 - End Sept milestone Jul 29, 2024
@clizbe clizbe changed the title from "Results and comments on the TulipaEnergyModel code review" to "Refactor the model" Jul 29, 2024

clizbe commented Jul 31, 2024

solve_model and solve_model!

  • Store the solution in tables (i.e., DuckDB) instead of in the solution structure and the graph (a sketch of the idea follows).
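
A minimal sketch of this direction, assuming the variable table from the refactor exists as a DataFrame with a column of JuMP variables. DuckDB.register_data_frame is existing DuckDB.jl API, but var_flow and the table names are made up:

```julia
using DataFrames, DBInterface, DuckDB, JuMP
using HiGHS

# Assume `model` and the variable table `var_flow` (index columns plus a `flow`
# column of JuMP variables) were built earlier, and `connection` is the DuckDB DB.
set_optimizer(model, HiGHS.Optimizer)
optimize!(model)

# Attach the solution values to the index table...
var_flow.solution = value.(var_flow.flow)

# ...and persist it as a DuckDB table (dropping the non-storable JuMP column).
DuckDB.register_data_frame(connection, select(var_flow, Not(:flow)), "var_flow_tmp")
DBInterface.execute(connection, "CREATE TABLE var_flow_solution AS SELECT * FROM var_flow_tmp")
```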

@abelsiqueira @datejada Can I try tackling this one? Or should I wait?

abelsiqueira commented

@clizbe you can do it, it's mostly independent from the rest.
