DAGFlow is a tool to create and manage directed acyclic graphs for training and deploying machine learning models.
I noticed that I kept duplicating functionality across my ML projects. With DAGFlow, I hope to create a framework that encourages modular processing and querying, so that components can be reused easily across projects. I also wanted a low-code way for scientists without web development experience to deploy ML pipelines.
The core object in DAGFlow is the "flow". A flow defines a sequence of data transformations, each performed by a "node". Below is an example flow that takes an input dataframe with a SMILES field (a string representation of a molecule) and adds a column containing the molecule's bond adjacency matrix:
from Flows import createflow
from Node import nodify
from rdkit import Chem
import pandas as pd

@nodify(node_type='Source', fields={'SMILES': 'SMILES'})
def ChemCSVReader(inp: str) -> pd.DataFrame:
    # Read the input CSV, which must contain a SMILES column
    df = pd.read_csv(inp)
    return df

def ChemAddMol(inp: pd.DataFrame) -> pd.DataFrame:
    # Parse each SMILES string into an RDKit molecule object
    inp['MOLS'] = inp['SMILES'].apply(Chem.MolFromSmiles)
    return inp

def GetMoleculeGraph(inp: pd.DataFrame) -> pd.DataFrame:
    # Compute the bond adjacency matrix of each molecule
    inp['GRAPH'] = inp['MOLS'].apply(Chem.rdmolops.GetAdjacencyMatrix)
    return inp

# Assumption: the imported createflow decorator is what turns this
# function into a flow object with a .run() method
@createflow
def test_flow():
    df = ChemCSVReader('./solubility_data.csv')
    df = ChemAddMol(df)
    df = GetMoleculeGraph(df)
    return df

output = test_flow.run()
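
To make the idea of a flow concrete, here is a minimal, standard-library sketch of running single-input nodes in dependency order. It is not DAGFlow's implementation: `run_dag`, the `nodes` and `edges` dictionaries, and the toy node functions are all hypothetical names chosen for illustration, and it assumes Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def run_dag(nodes, edges, source_input):
    """Run a DAG of one-argument functions (hypothetical helper, not DAGFlow API).

    nodes: dict mapping a node name to a one-argument callable.
    edges: dict mapping a node name to the name of the node that feeds it;
           the source node has no entry and receives source_input.
    Returns a dict mapping each node name to its output.
    """
    # TopologicalSorter expects {node: set of predecessor nodes}.
    graph = {name: ({edges[name]} if name in edges else set()) for name in nodes}
    results = {}
    # static_order() yields every node after its predecessors and raises
    # graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(graph).static_order():
        upstream = edges.get(name)
        arg = source_input if upstream is None else results[upstream]
        results[name] = nodes[name](arg)
    return results


# Toy nodes standing in for a reader and two processing steps.
nodes = {
    "read": lambda text: text.split(","),
    "upper": lambda items: [s.upper() for s in items],
    "count": lambda items: len(items),
}
edges = {"upper": "read", "count": "read"}
results = run_dag(nodes, edges, "cco,c1ccccc1")
print(results["upper"], results["count"])
```

Keeping the node functions separate from the wiring is what lets the same node be reused in several flows, which is the motivation given above.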
No DAG library is complete without a slick GUI. DAGFlow includes DAGWeb, a Flask app for creating DAGs with a drag-and-drop UI. Here is the same flow as above represented in the GUI.
The GUI contains a basic type checker that ensures only nodes of compatible types can be linked together. Right now type annotations must be included, but they will be optional in the future.
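
As a rough illustration of how a check like this can be built on type annotations, the sketch below compares the return annotation of one function with the annotation of the next function's first parameter. It is an assumption about the general approach, not DAGWeb's actual code: `can_link` and the example functions are hypothetical, and it only handles plain classes, not generics such as `list[str]`.

```python
import inspect
from typing import get_type_hints


def can_link(upstream, downstream) -> bool:
    """Return True if upstream's output type fits downstream's first parameter.

    Hypothetical helper: unannotated functions are rejected, mirroring the
    requirement that type annotations must currently be present.
    """
    up_hints = get_type_hints(upstream)
    down_hints = get_type_hints(downstream)
    params = list(inspect.signature(downstream).parameters)
    if "return" not in up_hints or not params or params[0] not in down_hints:
        return False  # missing annotations: refuse to link
    out_type = up_hints["return"]
    in_type = down_hints[params[0]]
    # Only plain classes are compared; anything else is treated as incompatible.
    if not (isinstance(out_type, type) and isinstance(in_type, type)):
        return False
    return issubclass(out_type, in_type)


# Hypothetical node functions used only to exercise the check.
def read_lines(path: str) -> list: ...
def count_items(items: list) -> int: ...
def shout(text: str) -> str: ...
def untyped(x): ...


print(can_link(read_lines, count_items))  # list output into list input
print(can_link(read_lines, shout))        # list output into str input
```

Using `issubclass` rather than strict equality lets a node that returns a subclass feed a node that accepts the parent class.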