hackingmaterials / robocrystallographer
Automatic generation of crystal structure descriptions.
Home Page: https://hackingmaterials.github.io/robocrystallographer/
License: Other
E.g.
robocrys mp-7631
The structure is three-dimensional. there are three inequivalent Si4+ sites.
Hi,
When calling:
from pymatgen.core import Structure
from robocrys import StructureCondenser
from robocrys import StructureDescriber
structure = Structure.from_file("C.cif")
sc = StructureCondenser()
describer = StructureDescriber()
condensed_structure = sc.condense_structure(structure)
description = describer.describe(condensed_structure)
I was having this error:
File ".../robocrys/describe/describer.py", line 247, in get_component_makeup_summary
en.join(orientations), s_direction
File ".../pydantic/_internal/_validate_call.py", line 100, in __call__
res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
pydantic_core._pydantic_core.ValidationError: 1 validation error for join
0.0
Input should be a valid string [type=string_type, input_value=(0, 0, 1), input_type=tuple]
For further information visit https://errors.pydantic.dev/2.4/v/string_type
A fix that helped (describer.py, lines 241-249) was adding orientations_str, which converts the orientation tuples to strings before they are joined:
if component_group.dimensionality in [1, 2]:
    orientations = list(
        {c.orientation for c in component_group.components}
    )
    # en.join only accepts strings, so turn (0, 0, 1)-style tuples into text first
    orientations_str = [str(o) for o in orientations]
    s_direction = en.plural("direction", len(orientations))
    comp_desc += " oriented in the {} {}".format(
        en.join(orientations_str), s_direction
    )
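As the traceback shows, the orientations are tuples such as (0, 0, 1), while the validated join only accepts strings. The sketch below is a plain-Python version of the same phrase-building step, independent of robocrys and of its `en` object; the helper names are hypothetical, and `join_words` only approximates an English list join.

```python
def join_words(words):
    """Join words in English: "a", "a and b", "a, b, and c"."""
    words = list(words)
    if len(words) <= 1:
        return "".join(words)
    if len(words) == 2:
        return f"{words[0]} and {words[1]}"
    return ", ".join(words[:-1]) + f", and {words[-1]}"


def format_orientation(orientation):
    """Turn an orientation tuple such as (0, 0, 1) into the string "(0, 0, 1)"."""
    # int() also normalizes NumPy integer types so they print as plain numbers.
    return "(" + ", ".join(str(int(i)) for i in orientation) + ")"


def describe_orientations(orientations):
    """Build the "oriented in the ... direction(s)" phrase from orientation tuples."""
    # Deduplicate and sort so the output does not depend on set iteration order.
    labels = sorted({format_orientation(o) for o in orientations})
    noun = "direction" if len(labels) == 1 else "directions"
    return f"oriented in the {join_words(labels)} {noun}"
```

For a single ribbon this gives "oriented in the (0, 0, 1) direction", matching the wording of the Hg output further down.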
I am using Python 3.10.13 and the following package versions:
matminer 0.9.1.dev5
monty 2023.11.3
pydantic 2.4.2
pydantic_core 2.10.1
pydantic-settings 2.1.0
pymatgen 2023.3.23
robocrys 0.2.8
ruamel.yaml 0.17.40
ruamel.yaml.clib 0.2.7
And this is my C.cif file:
# generated using pymatgen
data_C
_symmetry_space_group_name_H-M P6/mmm
_cell_length_a 2.46803014
_cell_length_b 2.46803014
_cell_length_c 19.99829300
_cell_angle_alpha 90.00000000
_cell_angle_beta 90.00000000
_cell_angle_gamma 120.00000000
_symmetry_Int_Tables_number 191
_chemical_formula_structural C
_chemical_formula_sum C2
_cell_volume 105.49320255
_cell_formula_units_Z 2
loop_
_symmetry_equiv_pos_site_id
_symmetry_equiv_pos_as_xyz
1 'x, y, z'
2 '-x, -y, -z'
3 'x-y, x, z'
4 '-x+y, -x, -z'
5 '-y, x-y, z'
6 'y, -x+y, -z'
7 '-x, -y, z'
8 'x, y, -z'
9 '-x+y, -x, z'
10 'x-y, x, -z'
11 'y, -x+y, z'
12 '-y, x-y, -z'
13 '-y, -x, -z'
14 'y, x, z'
15 '-x, -x+y, -z'
16 'x, x-y, z'
17 '-x+y, y, -z'
18 'x-y, -y, z'
19 'y, x, -z'
20 '-y, -x, z'
21 'x, x-y, -z'
22 '-x, -x+y, z'
23 'x-y, -y, -z'
24 '-x+y, y, z'
loop_
_atom_site_type_symbol
_atom_site_label
_atom_site_symmetry_multiplicity
_atom_site_fract_x
_atom_site_fract_y
_atom_site_fract_z
_atom_site_occupancy
C C0 2 0.33333333 0.66666667 0.00000000 1
Could you look into this please?
Hi,
It seems that robocrys does not work with newer versions of pymatgen, from version 2020.10.9 onwards.
With pymatgen 2020.9.14, however, it works fine.
Thanks
You have done a great job!
I think this text representation could be further used in machine learning, for example with large language models. Is this representation reversible, i.e. is there a one-to-one correspondence between a crystal structure and its description?
I think this might be the wrong way around.
Example: Fe3O4 (mp-19306) with the current pip-installed version:
Fe3O4 is Spinel-derived structured and crystallizes in the trigonal R-3m space group. The structure is three-dimensional. there are three inequivalent Fe sites. In the first Fe site, Fe(1) is bonded to two equivalent O(2) and four equivalent O(1) atoms to form FeO6 octahedra that share corners with six equivalent Fe(3)O4 tetrahedra, edges with two equivalent Fe(2)O6 octahedra, and edges with four equivalent Fe(1)O6 octahedra. Both Fe(1)–O(2) bond lengths are 2.08 Å. All Fe(1)–O(1) bond lengths are 2.11 Å. In the second Fe site, Fe(2) is bonded to six equivalent O(1) atoms to form FeO6 octahedra that share corners with six equivalent Fe(3)O4 tetrahedra and edges with six equivalent Fe(1)O6 octahedra. All Fe(2)–O(1) bond lengths are 2.06 Å. In the third Fe site, Fe(3) is bonded to one O(2) and three equivalent O(1) atoms to form corner-sharing FeO4 tetrahedra. The corner-sharing octahedra tilt angles range from 54–57°. The Fe(3)–O(2) bond length is 1.92 Å. All Fe(3)–O(1) bond lengths are 1.92 Å. There are two inequivalent O sites. In the first O site, O(1) is bonded in a rectangular see-saw-like geometry to one Fe(2), one Fe(3), and two equivalent Fe(1) atoms. In the second O site, O(2) is bonded in a distorted rectangular see-saw-like geometry to one Fe(3) and three equivalent Fe(1) atoms.
Dependabot couldn't authenticate with https://pypi.python.org/simple/.
You can provide authentication details in your Dependabot dashboard by clicking into the account menu (in the top right) and selecting 'Config variables'.
(py37) computron-7:~ ajain$ robocrys mp-975272
Adding oxidation states...
Hg is Magnesium structured and crystallizes in the orthorhombic Cmcm space group. The structure is one-dimensional and consists of one Hg ribbon oriented in the (0, 0, 1) direction. Hg(1) is bonded in a distorted bent 120 degrees geometry to two equivalent Hg(1) atoms. Both Hg(1)–Hg(1) bond lengths are 3.40 Å.
I think crystallographers will take issue with simultaneously calling something "Mg-structured" and "orthorhombic".
The description is a bit clunky when sites are symmetrically inequivalent but nevertheless have the same description.
(py37) computron-7:~ ajain$ robocrys mp-7631
SiC is Moissanite-6H structured and crystallizes in the hexagonal P6_3mc space group. The structure is three-dimensional. there are three inequivalent Si4+ sites. In the first Si4+ site, Si(1)4+ is bonded to one C(2)4- and three equivalent C(1)4- atoms to form corner-sharing SiC4 tetrahedra. The Si(1)–C(2) bond length is 1.90 Å. All Si(1)–C(1) bond lengths are 1.89 Å. In the second Si4+ site, Si(2)4+ is bonded to one C(3)4- and three equivalent C(2)4- atoms to form corner-sharing SiC4 tetrahedra. The Si(2)–C(3) bond length is 1.90 Å. All Si(2)–C(2) bond lengths are 1.89 Å. In the third Si4+ site, Si(3)4+ is bonded to one C(1)4- and three equivalent C(3)4- atoms to form corner-sharing SiC4 tetrahedra. The Si(3)–C(1) bond length is 1.90 Å. All Si(3)–C(3) bond lengths are 1.90 Å. There are three inequivalent C4- sites. In the first C4- site, C(1)4- is bonded to one Si(3)4+ and three equivalent Si(1)4+ atoms to form corner-sharing CSi4 tetrahedra. In the second C4- site, C(2)4- is bonded to one Si(1)4+ and three equivalent Si(2)4+ atoms to form corner-sharing CSi4 tetrahedra. In the third C4- site, C(3)4- is bonded to one Si(2)4+ and three equivalent Si(3)4+ atoms to form corner-sharing CSi4 tetrahedra.
robocrys MyStructure.cif
returns this error:
Traceback (most recent call last):
File "D:\pymatgen\robocrys.py", line 2, in
from robocrys import StructureCondenser, StructureDescriber
File "D:\pymatgen\robocrys.py", line 2, in
from robocrys import StructureCondenser, StructureDescriber
ImportError: cannot import name 'StructureCondenser' from partially initialized module 'robocrys' (most likely due to a circular import) (D:\pymatgen\robocrys.py)
I installed this with
pip install robocrys
The same error occurs with:
from robocrys import StructureCondenser
Traceback (most recent call last):
File "", line 1, in
File "D:\Geology\pymatgen\robocrys.py", line 2, in
from robocrys import StructureCondenser, StructureDescriber
ImportError: cannot import name 'StructureCondenser' from partially initialized module 'robocrys' (most likely due to a circular import) (D:\Geology\pymatgen\robocrys.py)
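Both tracebacks point at the file being imported, D:\pymatgen\robocrys.py (and D:\Geology\pymatgen\robocrys.py), i.e. a local script with the same name as the package. A likely cause (an assumption based on those paths) is that Python imports this script instead of the installed robocrys package, so the script ends up importing from itself, which produces the "partially initialized module" error. The stdlib sketch below checks for such shadowing; both function names are hypothetical.

```python
import importlib.util
from pathlib import Path


def resolved_module_path(name):
    """Return the file Python would load for a top-level module name, or None."""
    # find_spec locates a top-level module on sys.path without executing it.
    spec = importlib.util.find_spec(name)
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin)


def shadows_package(name, script_dir):
    """True if script_dir contains <name>.py.

    The directory of the running script (or the current directory in an
    interactive session) comes first on sys.path, so such a file is imported
    instead of the installed package of the same name.
    """
    return (Path(script_dir) / f"{name}.py").is_file()
```

If `shadows_package("robocrys", r"D:\pymatgen")` is true, renaming the local script should let the installed package be imported again.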
E.g., this feels a bit too verbose:
(py37) computron-7:~ ajain$ robocrys mp-1138
Adding oxidation states...
LiF is Halite, Rock Salt structured and crystallizes in the cubic Fm-3m space group. The structure is three-dimensional. Li(1)1+ is bonded to six equivalent F(1)1- atoms to form a mixture of corner and edge-sharing LiF6 octahedra. The corner-sharing octahedral tilt angles are 0°. All Li(1)–F(1) bond lengths are 2.04 Å. F(1)1- is bonded to six equivalent Li(1)1+ atoms to form a mixture of corner and edge-sharing FLi6 octahedra. The corner-sharing octahedral tilt angles are 0°.
robocrys mp-744259
This generates what looks like a lot of junk (the text might be correct, but it is difficult for a human to read). One thing that might help is to organize the text by section (local environment, connectivity, tilt angles) rather than by site. Also, if the text is long, it could give only the basics (e.g. skip all the individual bond lengths) and ask the user whether they would like to know more. But just having things in clearer sections, perhaps in multiple paragraphs, would help.
Note that according to MP, this is garnet structured (the tag says it is), but this is not picked up by robocrystallographer.
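The organization suggested above could look roughly like the sketch below, which regroups per-site sentences into one paragraph per topic and leaves bond lengths out unless asked. The input format (site label mapped to a {section: sentence} dict) and the section names are assumptions for illustration, not robocrys's data model.

```python
# Hypothetical section keys; robocrys does not necessarily split its text this way.
SECTIONS = ("environment", "connectivity", "tilt_angles", "bond_lengths")


def group_by_section(site_descriptions, include_bond_lengths=False):
    """Regroup per-site sentences into one paragraph per topic.

    site_descriptions maps a site label to a {section: sentence} dict.
    Bond lengths are omitted unless requested, giving a shorter summary.
    """
    paragraphs = []
    for section in SECTIONS:
        if section == "bond_lengths" and not include_bond_lengths:
            continue
        sentences = [
            parts[section] for parts in site_descriptions.values() if parts.get(section)
        ]
        if sentences:
            paragraphs.append(" ".join(sentences))
    # Separate topics with blank lines so each reads as its own paragraph.
    return "\n\n".join(paragraphs)
```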
This is usually assumed, and the descriptions would be a bit better without it IMO
Hello,
I want to get all the bond angles in an octahedron, for example the O-Ni-O bond angles in an NiO6 octahedron. I believe there must be some functions in robocrystallographer to do that. Could you please tell me which functions I should use?
Thanks
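As a library-independent sketch, the O-Ni-O angles can be computed directly from Cartesian coordinates. It assumes you already have the coordinates of the Ni site and its six O neighbors with periodic images resolved (for example from a pymatgen neighbor search); the function names are hypothetical and not part of robocrys.

```python
import itertools
import math


def angle_deg(center, a, b):
    """Angle a-center-b in degrees, from Cartesian coordinates."""
    u = [ai - ci for ai, ci in zip(a, center)]
    v = [bi - ci for bi, ci in zip(b, center)]
    dot = sum(x * y for x, y in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] so rounding error cannot push acos outside its domain.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def ligand_angles(center, ligands):
    """All ligand-center-ligand angles, e.g. the 15 O-Ni-O angles of an NiO6 octahedron."""
    return [angle_deg(center, a, b) for a, b in itertools.combinations(ligands, 2)]
```

For an ideal octahedron this yields twelve 90° angles and three 180° angles; deviations from those values measure the distortion.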
I use the robocrys SiteAnalyzer to determine the coordination environment of a site in the crystal as follows:
bonded_structure = y.get_bonded_structure(structure_from_cif)
motif_type = SiteAnalyzer(bonded_structure).get_site_geometry(0)
This method captures many environment types. However, why is one of the most common environment types, e.g. the trigonal prism, not listed? The Materials Project description of MoS2 reports a pentagonal pyramid; why not a trigonal prism?
When running robocrystallographer on a disordered structure a cryptic error message is given. Robocrys should detect if the structure is disordered and fail gracefully.
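pymatgen structures expose an `is_ordered` property, so a check along these lines could run before the description is generated. The sketch below is a stdlib stand-in that works on plain {species: occupancy} mappings (a hypothetical input format) and raises a readable error instead of a cryptic one.

```python
def check_ordered(sites):
    """Raise ValueError with a readable message if any site is disordered.

    Each site is a mapping of species symbol -> occupancy, e.g. {"Na": 1.0}.
    A site counts as disordered if it holds more than one species or its
    total occupancy differs from 1.
    """
    for index, occupancies in enumerate(sites):
        total = sum(occupancies.values())
        if len(occupancies) != 1 or abs(total - 1.0) > 1e-6:
            raise ValueError(
                f"Site {index} is disordered ({occupancies}); "
                "this sketch only describes fully ordered structures."
            )
```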
LiCoO2 is described in the literature as a layered (2D) material.
See also the description of the corresponding mineral name: https://pdfs.semanticscholar.org/fc95/e3cd55b7c4760b2d1296e64bd5180e9df84f.pdf
(although I never heard of LiCoO2 referred to as Caswellsilverite, just a defect rocksalt)
There are 109 Boolean features but only 25 numerical features. Most of the discrepancy seems to come from the fact that the numerical frac_sites_<MOTIF> features do not include distorted, edge, face, or corner variants, while the contains_<MOTIF> features do.
Hi,
When I run the following command,
from robocrys import StructureCondenser, StructureDescriber
I get the following error:
File "/home/m3rg2000/miniconda3/envs/my_pymatgen/lib/python3.12/site-packages/pymatgen/core/periodic_table.py", line 526, in from_Z
raise ValueError(f"Unexpected atomic number {Z=}")
ValueError: Unexpected atomic number Z=119
I am using the following versions:
python==3.12.1
pymatgen==2024.2.8
robocrys==0.2.8
(py37) computron-7:~ ajain$ robocrys mp-1960
Adding oxidation states...
Li2O is Fluorite structured and crystallizes in the cubic Fm-3m space group. The structure is three-dimensional. Li(1)1+ is bonded to four equivalent O(1)2- atoms to form a mixture of corner and edge-sharing LiO4 tetrahedra. There are one shorter (2.01 Å) and three longer (2.02 Å) Li(1)–O(1) bond lengths. O(1)2- is bonded in a body-centered cubic geometry to eight equivalent Li(1)1+ atoms.
Here, there is only a single Li and O, so no need to label it as Li(1) or O(1).
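One way to do this is to add the (n) index only when an element has more than one inequivalent site. A minimal sketch, assuming the inequivalent sites are given as a list of element symbols (a hypothetical input; robocrys's internal data structures are not shown here):

```python
from collections import Counter


def site_labels(inequivalent_sites):
    """Label inequivalent sites, adding an index only when an element repeats.

    inequivalent_sites holds one element symbol per inequivalent site,
    e.g. ["Li", "O"] for Li2O or ["Si", "Si", "Si", "C", "C", "C"] for 6H-SiC.
    """
    totals = Counter(inequivalent_sites)
    seen = Counter()
    labels = []
    for element in inequivalent_sites:
        seen[element] += 1
        # A lone site keeps the bare symbol; repeated elements get (1), (2), ...
        labels.append(element if totals[element] == 1 else f"{element}({seen[element]})")
    return labels
```

For Li2O this gives Li and O, while the three inequivalent Si sites of mp-7631 keep Si(1), Si(2), and Si(3).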
Framework matching only cares about the positions of the atoms, not their species, so x_diff_weight and distance_cutoffs should be turned off.
Everything looks pretty good except:
(py37) computron-7:~ ajain$ robocrys mp-23116
Adding oxidation states...
BiCuSeO is Parent of FeAs superconductors structured and crystallizes in the tetragonal P4/nmm space group. The structure is two-dimensional and consists of one BiO sheet oriented in the (0, 0, 1) direction and one CuSe sheet oriented in the (0, 0, 1) direction. In the BiO sheet, Bi(1)3+ is bonded in a 4-coordinate geometry to four equivalent O(1)2- atoms. All Bi(1)–O(1) bond lengths are 2.35 Å. O(1)2- is bonded to four equivalent Bi(1)3+ atoms to form a mixture of edge and corner-sharing OBi4 tetrahedra. In the CuSe sheet, Cu(1)1+ is bonded to four equivalent Se(1)2- atoms to form a mixture of edge and corner-sharing CuSe4 tetrahedra. All Cu(1)–Se(1) bond lengths are 2.52 Å. Se(1)2- is bonded in a 4-coordinate geometry to four equivalent Cu(1)1+ atoms.
I installed robocrys and openbabel in a conda env, and after running robocrys mp-856, I get the following:
> Traceback (most recent call last):
> File "c:\users\sterg\anaconda3\envs\epdo\lib\runpy.py", line 194, in _run_module_as_main
> return _run_code(code, main_globals, None,
> File "c:\users\sterg\anaconda3\envs\epdo\lib\runpy.py", line 87, in _run_code
> exec(code, run_globals)
> File "C:\Users\sterg\anaconda3\envs\epdo\Scripts\robocrys.exe\__main__.py", line 4, in <module>
> File "c:\users\sterg\anaconda3\envs\epdo\lib\site-packages\robocrys\__init__.py", line 1, in <module>
> from robocrys.util import common_formulas
> File "c:\users\sterg\anaconda3\envs\epdo\lib\site-packages\robocrys\util.py", line 23, in <module>
> from pymatgen import Element
> ImportError: cannot import name 'Element' from 'pymatgen' (unknown location)
This probably has to do with the breaking changes in v2022. See the MatSci discussion.
My version is: pymatgen==2022.0.5
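In pymatgen v2022 the root-level re-exports were removed, so `from pymatgen import Element` fails while `from pymatgen.core import Element` works. For code that must run against both older and newer releases, a small fallback helper is one option; the sketch below (the helper name is hypothetical) tries each module in turn.

```python
import importlib


def import_first(module_names, attribute):
    """Return `attribute` from the first module in module_names that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue
        # A module can import fine yet lack the name (e.g. a bare namespace package).
        if hasattr(module, attribute):
            return getattr(module, attribute)
    raise ImportError(f"none of {module_names} provides {attribute!r}")
```

Usage would be `Element = import_first(["pymatgen.core", "pymatgen"], "Element")`.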
Pymatgen has lists of common oxidation states. It would be interesting to say, e.g., “This material has X in a Y oxidation state, which is unusual.”
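pymatgen's `Element.common_oxidation_states` could supply the reference lists for such a sentence. So that it runs without pymatgen, the sketch below uses a small hard-coded table of textbook values instead (illustrative only, not pymatgen's data); the function name and message template are hypothetical.

```python
# Illustrative textbook values; a real implementation could read
# pymatgen's Element(symbol).common_oxidation_states instead.
COMMON_OXIDATION_STATES = {
    "Li": (1,),
    "O": (-2,),
    "Fe": (2, 3),
    "Co": (2, 3),
}


def unusual_oxidation_notes(oxidation_states):
    """Return a sentence for each species whose integer oxidation state is uncommon.

    oxidation_states maps element symbol -> integer oxidation state.
    Elements missing from the reference table are skipped.
    """
    notes = []
    for element, state in oxidation_states.items():
        common = COMMON_OXIDATION_STATES.get(element)
        if common is not None and state not in common:
            notes.append(
                f"This material has {element} in a {state:+d} oxidation state, which is unusual."
            )
    return notes
```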
Can we get a release for the changes that fix pymatgen compatibility? You're more than welcome to copy all my automated workflows from maggma for automatic releases based on PR labels.
Hello,
I am using the latest versions of robocrys and spglib on macOS Catalina with Python 3.8.1.
When using the CLI, whenever I run
robocrys filename --no-makeup
on a POSCAR, .vasp, or .cif file (any format), I get the following error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/bin/robocrys", line 11, in
load_entry_point('robocrys==0.2.1', 'console_scripts', 'robocrys')()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/robocrys/cli.py", line 190, in main
robocrystallographer(structure, condenser_kwargs=condenser_kwargs,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/robocrys/cli.py", line 51, in robocrystallographer
description = describer.describe(condensed_structure)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/robocrys/describe/describer.py", line 130, in describe
return " ".join(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/robocrys/describe/describer.py", line 133, in
if description[part] is not "")
KeyError: 'component_makeup'
PS: this error shows up for the options --no-makeup, --no-mineral, and --no-components, from both the CLI and the Python API.
Am I doing something wrong?
Thanks
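The traceback shows describer.py indexing `description[part]` for every part, so when an option such as --no-makeup leaves out the 'component_makeup' entry, the lookup raises KeyError. (Separately, `is not ""` tests object identity rather than equality.) Below is a sketch of a join that tolerates missing or empty parts; only "component_makeup" comes from the traceback, and the other part names are placeholders.

```python
# Order in which description parts are joined; "mineral" and "components"
# are placeholder names, only "component_makeup" appears in the traceback.
DESCRIPTION_PARTS = ("mineral", "component_makeup", "components")


def join_description_parts(description, parts=DESCRIPTION_PARTS):
    """Join the parts that are present and non-empty, skipping switched-off ones."""
    # dict.get returns None for a missing key, and the truthiness test skips
    # both None and "" (unlike `is not ""`, which checks object identity).
    return " ".join(description[part] for part in parts if description.get(part))
```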
The only log message I see when running robocrystallographer is "Adding oxidation states"
When running a structure that takes a long time, e.g.:
https://www.materialsproject.org/materials/mp-744259/
It is unclear whether the long time is due to the oxidation states, or some other structure analysis step since the log doesn't tell me when it's moving on to the next step.
I would suggest either no logs or comprehensive logs.
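One way to get comprehensive logs is to wrap each analysis step so that it logs when it starts and how long it took. A stdlib sketch; the logger name and step names are hypothetical and do not correspond to robocrys internals.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("robocrys_sketch")  # hypothetical logger name


@contextmanager
def log_step(name):
    """Log when a named analysis step starts and how long it took."""
    logger.info("%s...", name)
    start = time.perf_counter()
    try:
        yield
    finally:
        logger.info("%s done in %.2f s", name, time.perf_counter() - start)
```

Each step would then be wrapped as `with log_step("Adding oxidation states"): ...`, so a slow structure such as mp-744259 shows which step is taking the time.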
E.g., see mp-13 (Im-3m, a single atom at the origin): isn't this exactly body-centered cubic rather than "distorted body-centered cubic"? (v0.1.3)
Hello,
I am using the latest versions of robocrys and spglib on macOS Catalina with Python 3.8.1.
When using the CLI, whenever I run
robocrys filename --symprec 0.0001
on a POSCAR, .vasp, or .cif file, with any value of --symprec,
the program terminates with the following error from spglib:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.8/bin/robocrys", line 11, in
load_entry_point('robocrys==0.2.1', 'console_scripts', 'robocrys')()
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/robocrys/cli.py", line 190, in main
robocrystallographer(structure, condenser_kwargs=condenser_kwargs,
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/robocrys/cli.py", line 50, in robocrystallographer
condensed_structure = sc.condense_structure(structure)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/robocrys/condense/condenser.py", line 103, in condense_structure
sga = SpacegroupAnalyzer(structure, symprec=self.symprec)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/pymatgen/symmetry/analyzer.py", line 90, in init
self._space_group_data = spglib.get_symmetry_dataset(
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/spglib/spglib.py", line 237, in get_symmetry_dataset
spg_ds = spg.dataset(lattice, positions, numbers, hall_number,
TypeError: must be real number, not str
PS: The same runs fine from the Python API.
Am I doing anything wrong?
Thanks
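Since the same call works from the Python API, a likely cause (an assumption; the relevant part of robocrys's cli.py is not shown here) is that the CLI passes --symprec on as the raw string, which then reaches spglib. In argparse, giving the option a numeric type avoids this. A sketch with a hypothetical parser:

```python
import argparse


def build_parser():
    """Hypothetical CLI parser showing --symprec parsed as a number."""
    parser = argparse.ArgumentParser(prog="robocrys-sketch")
    parser.add_argument("filename")
    # type=float makes argparse convert "0.0001" to 0.0001; without it the
    # raw string is passed on, consistent with the spglib TypeError above.
    # No default is set, so an omitted flag gives None and the caller can
    # fall back to the library's own default tolerance.
    parser.add_argument("--symprec", type=float)
    return parser
```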
As shown here, the fractional oxidation state of Au^{0.33-} is incorrectly (or at least confusingly) displayed in text form as Au^{+0.33-}. Admittedly, this is a rare edge case.
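A sketch of a formatter that writes the magnitude followed by a single sign, so that -1/3 becomes 0.33- rather than +0.33-. The function name is hypothetical, and rounding fractional states to two decimals is an assumption.

```python
def format_oxidation_state(oxi):
    """Write an oxidation state as magnitude followed by one sign: 3+, 2-, 0.33-."""
    if oxi == 0:
        return "0"
    sign = "+" if oxi > 0 else "-"
    # Round to two decimals, then drop trailing zeros so 3.00 -> "3", 2.50 -> "2.5".
    magnitude = f"{abs(oxi):.2f}".rstrip("0").rstrip(".")
    return magnitude + sign
```

With this, `"Au" + format_oxidation_state(-1/3)` reads Au0.33-, in the same style as Si4+ and C4- in the outputs above.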
Are the Materials Project IDs for the 114,300 (>500 meV/atom) compounds used in the robocrystallographer paper available? Alternatively, any suggestions for doing this directly with the API? (I imagine it isn't too difficult.)
e.g. for Tl (HCP)
(py37) computron-7:~ ajain$ robocrys mp-82
Adding oxidation states...
Tl is Magnesium structured and crystallizes in the hexagonal P6_3/mmc space group. The structure is three-dimensional. Tl(1) is bonded to twelve equivalent Tl(1) atoms to form a mixture of corner, edge, and face-sharing TlTl12 cuboctahedra. There are six shorter (3.53 Å) and six longer (3.55 Å) Tl(1)–Tl(1) bond lengths.
It says "Magnesium structured", but it would be nicer if it said "Magnesium structure (hexagonal close packed)".
I tried to dump textual descriptions of a few MP crystals and found that for some, such as 'VSe2', 'MoW3(SeS3)2', and 'Mo3W(Se3S)2', I get the following error:
Traceback (most recent call last):
File "/dstore/home/kdas/Crystal_Multi_Modal/src_text/script_mat_desc.py", line 41, in
description = describer.describe(condensed_structure)
File "/home/kdas/anaconda3/envs/mattext/lib/python3.10/site-packages/robocrys/describe/describer.py", line 130, in describe
description["component_makeup"] = self.get_component_makeup_summary()
File "/home/kdas/anaconda3/envs/mattext/lib/python3.10/site-packages/robocrys/describe/describer.py", line 251, in get_component_makeup_summary
en.join(orientations), s_direction
File "pydantic/decorator.py", line 40, in pydantic.decorator.validate_arguments.validate.wrapper_function
File "pydantic/decorator.py", line 133, in pydantic.decorator.ValidatedFunction.call
File "pydantic/decorator.py", line 130, in pydantic.decorator.ValidatedFunction.init_model_instance
File "pydantic/main.py", line 342, in pydantic.main.BaseModel.init
pydantic.error_wrappers.ValidationError: 1 validation error for Join
words -> 0
str type expected (type=type_error.str)
Why is this happening?
Here is a snippet you may find useful for this (implemented lazily; I piggy-backed off latexify_spacegroup):
import re

from pymatgen.util.string import latexify_spacegroup


def unicodeify_spacegroup(spacegroup_symbol):
    # TODO: move this to pymatgen.util.string ?
    subscript_unicode_map = {
        0: "₀",
        1: "₁",
        2: "₂",
        3: "₃",
        4: "₄",
        5: "₅",
        6: "₆",
        7: "₇",
        8: "₈",
        9: "₉",
    }
    symbol = latexify_spacegroup(spacegroup_symbol)
    for number, unicode_number in subscript_unicode_map.items():
        symbol = symbol.replace("$_{" + str(number) + "}$", unicode_number)
    # A combining overline decorates the character *before* it, so rewrite
    # "$\overline{3}$" as "3" + U+0305 instead of putting U+0305 before the digit.
    overline = "\u0305"  # u"\u0304" (macron) is also an option
    symbol = re.sub(r"\$\\overline\{(\d)\}\$", lambda m: m.group(1) + overline, symbol)
    symbol = symbol.replace("$", "")
    symbol = symbol.replace("{", "")
    symbol = symbol.replace("}", "")
    return symbol
See:
https://www.materialsproject.org/materials/mp-24850/
https://www.materialsproject.org/materials/mp-18820/
Note that bond lengths may not indicate a layered structure, but these are typically considered layered (e.g., according to the definition that a plane can be sliced through the structure without hitting anything).
Also check against:
http://www.chemtube3d.com/solidstate/SS-Ca4Mn3O10.htm
"There is three shorter"
(py37) computron-7:~ ajain$ robocrys mp-1143
Al2O3 is Corundum structured and crystallizes in the trigonal R-3c space group. The structure is three-dimensional. Al(1)3+ is bonded to six equivalent O(1)2- atoms to form a mixture of edge, face, and corner-sharing AlO6 octahedra. The corner-sharing octahedra tilt angles range from 48–60°. There is three shorter (1.87 Å) and three longer (1.99 Å) Al(1)–O(1) bond length. O(1)2- is bonded to four equivalent Al(1)3+ atoms to form a mixture of distorted edge and corner-sharing OAl4 trigonal pyramids.
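The sentence has a compound subject ("three shorter ... and three longer ..."), so one option is to take both the verb and the noun from the total number of bonds. A sketch; the function name and the (count, adjective, length) tuples are hypothetical.

```python
NUMBER_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five", 6: "six",
    7: "seven", 8: "eight", 9: "nine", 10: "ten", 11: "eleven", 12: "twelve",
}


def bond_length_sentence(groups, bond):
    """Describe groups of bonds, e.g. [(3, "shorter", 1.87), (3, "longer", 1.99)].

    The verb and the noun agree with the total number of bonds, which avoids
    "There is three shorter ... bond length".
    """
    total = sum(count for count, _, _ in groups)
    verb, noun = ("is", "bond length") if total == 1 else ("are", "bond lengths")
    parts = [
        f"{NUMBER_WORDS.get(count, str(count))} {adjective} ({length:.2f} Å)"
        for count, adjective, length in groups
    ]
    return f"There {verb} {' and '.join(parts)} {bond} {noun}."
```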
Any way to detect that Li forms tunnels / channels in this structure?
https://www.materialsproject.org/materials/mp-504366/
Note that this might be complicated, but it would be nice if robocrys could actually detect it.
E.g. https://materialsproject.org/materials/mp-556288/
... a faceface with one LiO6 octahedra, and a faceface with one ReO6 octahedra.
e.g. see https://materialsproject.org/materials/mp-1111410/
The structure is three-dimensional and consists of eight sodium molecules and one TiHgF6 framework.
Also, when there is only a single framework, perhaps "and one" could become "inside a" instead?
The paper discusses how robocrystallographer was applied to ~110k materials from the Materials Project. Naturally, it's also integrated directly into the Materials Project. Are the JSON files and/or feature vectors available for this large dataset? I can run this myself too, but figured I would ask in case the data can be shared. The feature vector list would be preferred. I'd also be fine grabbing it via an API, but that didn't seem to be an option.
Alternatively, does robocrys mp-856 shortcut the actual computation and pull the JSON data directly from https://materialsproject.org?
Hello! I'd like to use RoboCrystallographer with the JARVIS-DFT 3D dataset. However, I only have a .json file. Could you please guide me on how to generate the textual description of the crystal? Thank you for your kind assistance!
Here's the code to obtain the JARVIS-DFT 3D dataset:
from jarvis.db.figshare import data
dft_3d = data(dataset='dft_3d')
e.g. frac_sites_1_coordinate, frac_sites_2_coordinate, up through frac_sites_12_coordinate
What does this mean?
Why is 12 the max here?