Question about the problem encountered in "Cubic Crystal Test.ipynb" under the "examples" folder in the original M3GNet #99
-
Could I ask a question that is outside the Matbench framework? It concerns the original M3GNet model (https://github.com/materialsvirtuallab/m3gnet). In the notebook `Cubic Crystal Test.ipynb` in the M3GNet `examples` folder, the lattice-constant data are loaded like this:

```python
import pandas as pd

data = pd.read_html("http://en.wikipedia.org/wiki/Lattice_constant")[0]
# Keep only cubic structures by dropping the non-cubic crystal types.
data = data[
    ~data["Crystal structure"].isin(
        ["Hexagonal", "Wurtzite", "Wurtzite (HCP)", "Orthorombic", "Tetragonal perovskite", "Orthorhombic perovskite"]
    )
]
data.rename(columns={"Lattice constant (Å)": "a (Å)"}, inplace=True)
data.drop(columns=["Ref."], inplace=True)
data["a (Å)"] = data["a (Å)"].map(float)
data = data[["Material", "Crystal structure", "a (Å)"]]
data = data[data["Material"] != "NC0.99"]
```

In the code above, the line `data["a (Å)"] = data["a (Å)"].map(float)` converts the strings to floats. However, the "Lattice constant (Å)" column (renamed to "a (Å)") in the first table at https://en.wikipedia.org/wiki/Lattice_constant contains several string formats, such as `3.567`, `a = 3.533 c = 5.693`, and `a = 5.27 b = 5.275 c = 7.464`. How should entries that contain more than one value be handled? Mapping them directly to float raises an error. Thank you!
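One way to keep every value, rather than only the first one, is to parse each cell into labelled lattice parameters. The sketch below is a hypothetical helper (`parse_lattice_params` is not part of M3GNet or pandas). It assumes the cells follow the three formats quoted above, i.e. either a bare number or `label = number` pairs, and it treats a bare number as the cubic constant `a`. It runs on an offline sample instead of the live Wikipedia table.

```python
import re

import pandas as pd

# Hypothetical helper. Assumes cells are either a bare number ("3.567")
# or "label = number" pairs such as "a = 3.533 c = 5.693".
_PAIR = re.compile(r"([abc])\s*=\s*([-+]?\d+(?:\.\d+)?)")


def parse_lattice_params(value) -> dict:
    """Return labelled lattice parameters, e.g. {"a": 3.533, "c": 5.693}."""
    text = str(value)
    pairs = _PAIR.findall(text)
    if pairs:
        return {label: float(number) for label, number in pairs}
    try:
        # Assumption: a bare number is the cubic lattice constant a.
        return {"a": float(text)}
    except ValueError:
        # Nothing parseable in this cell.
        return {}


# Offline sample using the formats quoted in the question.
sample = pd.Series(["3.567", "a = 3.533 c = 5.693", "a = 5.27 b = 5.275 c = 7.464"])
params = sample.map(parse_lattice_params)

# One column per lattice parameter; parameters a cell lacks become NaN.
table = pd.DataFrame(params.tolist())
print(table)
```

With one column per parameter, the cubic rows can still be selected by `a` alone, while the extra `b`/`c` values of non-cubic entries are kept instead of raising an error.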
-
In the future, such questions are better asked on Stack Overflow. You can pass any custom function to `pd.Series.map()` (a single column such as `data["a (Å)"]` is a Series). Here's an example of converting the first value in each string; you could change the function to return tuples with all floats instead.

```python
import re

import pandas as pd

data = pd.read_html("http://en.wikipedia.org/wiki/Lattice_constant")[0]
# Same renaming as in the notebook, so that the "a (Å)" column exists.
data.rename(columns={"Lattice constant (Å)": "a (Å)"}, inplace=True)


def extract_first_float(value: str) -> float:
    """Return the first number in a cell, or NaN if there is none."""
    try:
        # Plain entries such as "3.567" convert directly.
        return float(value)
    except ValueError:
        # Entries such as "a = 3.533 c = 5.693": take the first number found.
        match = re.search(r"[-+]?\d*\.\d+|\d+", str(value))
        if match:
            return float(match.group())
        return float("nan")


data["a (Å)"] = data["a (Å)"].map(extract_first_float)
```

-
Thanks for the reply; I will ask this kind of question on Stack Overflow in the future.
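The answer above suggests changing the function to return tuples with all floats instead. A minimal sketch of that variant follows. `extract_all_floats` is a hypothetical name; it reuses the answer's regular expression with `re.findall` instead of `re.search`, and it runs on an offline sample of the three formats from the question rather than on the live Wikipedia table.

```python
import re

import pandas as pd

# Same number pattern as in the answer's extract_first_float.
_NUMBER = re.compile(r"[-+]?\d*\.\d+|\d+")


def extract_all_floats(value) -> tuple:
    """Hypothetical helper: return every number in a cell as a tuple of floats."""
    try:
        # Plain numbers (or cells pandas already parsed as numbers).
        return (float(value),)
    except (TypeError, ValueError):
        # Multi-value cells: collect every number, in order of appearance.
        return tuple(float(m) for m in _NUMBER.findall(str(value)))


# Offline sample using the formats quoted in the question.
sample = pd.Series(["3.567", "a = 3.533 c = 5.693", "a = 5.27 b = 5.275 c = 7.464"])
all_values = sample.map(extract_all_floats)
print(all_values.tolist())

# Taking the first element of each tuple recovers the first-value behaviour.
first_values = all_values.map(lambda t: t[0] if t else float("nan"))
```

Returning a tuple keeps the column one value per row, so the rest of the notebook's DataFrame operations still work; the trade-off is that the tuple positions carry no labels, so `a`, `b`, and `c` must be inferred from the count of values.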