$ whoami

I am an engineer ⚙️ who specializes in designing, developing, and deploying computational tools to solve scientific problems alongside subject matter experts in a wide range of disciplines. I use data science, AI/ML, molecular simulations, and other advanced modeling tools to make data-driven discoveries in fields like materials science, nuclear chemistry, food science, and biology. I have a PhD in Chemical Engineering with a concentration in computational thermodynamics and a certificate in Computational and Information Science. You can read more about ongoing research on ResearchGate.

Broad research areas include: 🔥 Thermodynamics, 💠 Materials science, 🍣 Food authenticity, 〽️ Machine learning

$ man -a mahynski

Developing reproducible, transparent modeling pipelines and methods requires standardized open-source tools. PyChemAuth is the main package I have developed to help chemometricians, cheminformatics professionals, and other researchers build end-to-end data science workflows from exploratory data analysis, to model optimization and comparison, to public distribution. Most data-driven projects below rely on this package. Check out the course and API Examples for more information.
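As a rough illustration of the kind of end-to-end workflow described above, here is a minimal sketch built on plain scikit-learn rather than PyChemAuth's own API; the dataset and model choices are placeholders, not the package's actual defaults:

```python
# Sketch of a chemometric workflow: preprocessing -> pipeline -> optimization -> comparison.
# Uses plain scikit-learn; PyChemAuth wraps and extends patterns like this.
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipe = Pipeline([
    ("scale", StandardScaler()),   # autoscale each feature
    ("pca", PCA()),                # compress correlated measurements
    ("clf", LogisticRegression(max_iter=5000)),
])

# Optimize the number of components and regularization strength together,
# then report held-out accuracy for an honest model comparison.
search = GridSearchCV(
    pipe,
    {"pca__n_components": [2, 5, 10], "clf__C": [0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.score(X_test, y_test))
```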

⚛️ Developing tools for advanced stable isotope and trace element metrology

tl;dr

Stable isotope ratios of light elements (e.g., H, C, O, N, S) and trace element (SITE) composition profiles are often the preferred features for modeling the geographic origin of many consumer products, including food. These profiles correlate with biogeochemical fractionation processes associated with local climate, geology, and pedology, which produce different transfer rates from natural sources (e.g., water, soil, atmosphere) to plant or animal tissues. Accurate measurements and predictive models of provenance are required to validate the origin and other characteristics (e.g., organic vs. conventional farming practices) of consumer products and secure supply chains.
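For context, isotope ratios are conventionally reported in per mil "delta" notation relative to an international reference standard. A one-line example (the numbers below are illustrative, not measurements):

```python
def delta_per_mil(r_sample: float, r_standard: float) -> float:
    """Return the delta value (per mil) of a sample's isotope ratio
    relative to a reference standard, e.g., VPDB for 13C/12C."""
    return (r_sample / r_standard - 1.0) * 1000.0

# Illustrative numbers only: a 13C/12C ratio slightly below the VPDB
# reference (~0.011180) yields a negative delta-13C value.
print(delta_per_mil(0.010900, 0.011180))  # ~ -25 per mil, typical of C3 plants
```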

Products


💧 Predicting fluid phase thermodynamic properties with deep learning and coarse-grained modeling

tl;dr

The design of next-generation functional materials, central to numerous modern technologies, relies heavily on accurate thermophysical property models of chemical mixtures. Molecular-level models are required to understand their behavior and basic physics. Developing these models is computationally expensive, so coarse-grained (simplified) force fields, and predictive models with a high degree of transferability beyond their training data, are required. "Thermodynamic extrapolation" is a method I developed at NIST to extract orders of magnitude more data and predictive capabilities from existing molecular simulations; it has since been improved and advanced by others. See NIST Accolade for details.
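The core idea can be sketched in a few lines: ensemble averages are smooth functions of the state variables, and their derivatives are fluctuation (covariance) quantities already present in a simulation's output. Below is a minimal first-order temperature extrapolation in the canonical ensemble, where synthetic data stands in for real simulation samples; higher-order terms follow the same pattern with higher moments:

```python
import numpy as np

def extrapolate_average(x, u, beta0, beta):
    """First-order thermodynamic extrapolation of <X> from beta0 to beta.

    In the canonical ensemble d<X>/d(beta) = -Cov(X, U), so fluctuations
    measured at one temperature predict averages at nearby temperatures.
    """
    x, u = np.asarray(x), np.asarray(u)
    dxdb = -(np.mean(x * u) - np.mean(x) * np.mean(u))  # -Cov(X, U)
    return np.mean(x) + dxdb * (beta - beta0)

# Synthetic stand-ins for per-configuration observables from one simulation.
rng = np.random.default_rng(0)
u = rng.normal(-100.0, 5.0, size=10_000)               # potential energies
x = 2.0 + 0.01 * u + rng.normal(0, 0.1, size=10_000)   # correlated observable

print(extrapolate_average(x, u, beta0=1.0, beta=1.05))
```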

Products

Selected Publications


🍓 Authenticating food labeling claims with machine learning and statistical modeling

tl;dr

Food fraud refers to the deliberate substitution, addition, tampering, or misrepresentation of food with the express purpose of economic gain for the seller. This has been estimated to cost the global food industry more than $10 billion per year, although expert estimates from the US FDA put the cost as high as $40 billion per year, affecting roughly 10% of all commercially sold food and creating a risk to public health and an erosion of trust. Accurate measurements and predictive models of food provenance are required to combat this. While there are many conventional chemometric tools designed for this task, the recent resurgence of interest in machine learning algorithms, which have achieved previously unparalleled accuracy on many predictive tasks, invites the question of whether similar gains can be made in this arena. Here we build and compare state-of-the-art models for food authentication to determine the impact that AI/ML algorithms can have on a field that is typically plagued by small amounts of reliable data and requires a high degree of explainability to be legally implemented.
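A toy version of such a comparison is sketched below. Synthetic data stands in for real SITE or spectral measurements, and the conventional chemometric baseline (PLS-DA) is emulated with scikit-learn's PLSRegression on one-hot labels, a standard trick:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Small, noisy dataset mimicking the low-N regime of food authenticity studies.
X, y = make_classification(n_samples=120, n_features=30, n_informative=8,
                           n_classes=3, n_clusters_per_class=1, random_state=7)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=7)

def plsda_accuracy(X, y, n_components=5):
    """Cross-validated accuracy of PLS-DA: regress one-hot labels, argmax to classify."""
    Y = np.eye(y.max() + 1)[y]  # one-hot encode classes
    scores = []
    for train, test in cv.split(X, y):
        pls = PLSRegression(n_components=n_components).fit(X[train], Y[train])
        pred = pls.predict(X[test]).argmax(axis=1)
        scores.append((pred == y[test]).mean())
    return np.mean(scores)

rf = RandomForestClassifier(n_estimators=200, random_state=7)
print("PLS-DA:", plsda_accuracy(X, y))
print("RF:    ", cross_val_score(rf, X, y, cv=cv).mean())
```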

Publications


🐦 Analyzing trends in biorepositories using explainable machine learning

tl;dr

Environmental monitoring efforts often rely on the bioaccumulation of persistent, often anthropogenic, chemical compounds in organisms to create a spatiotemporal record of ecosystems. Samples from various species are collected and cryogenically stored in biobanks to create a historical record. Compounds generally accumulate in upper trophic-level organisms due to biomagnification, reaching levels that can be detected with modern chemical instruments. However, finding proper indicators of global trends is complicated owing to the complex nature and size of many ecosystems of interest, e.g., the Pacific Ocean. Intercorrelation between compounds often results from the origin, uptake, and transport of these contaminants throughout the ecosystem and may be affected by organism-specific processes such as biotransformation. We developed explainable machine-learning models that perform nearly as well as state-of-the-art "black boxes" at making predictions about the environment and the organisms within it. The benefits of interpretability usually outweigh the marginal accuracy gains of more complex models: interpretable models reveal rational, explainable trends that engender trust and are considered more reliable.
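To illustrate the trade-off, here is a sketch with synthetic data standing in for curated biobank measurements: an interpretable model is compared against a more flexible "black box", and the interpretable model's learned structure is then inspected directly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic stand-in for intercorrelated contaminant concentrations.
X, y = make_classification(n_samples=300, n_features=10, n_informative=4,
                           random_state=1)

tree = DecisionTreeClassifier(max_depth=3, random_state=1)  # interpretable
gbm = GradientBoostingClassifier(random_state=1)            # "black box"

print("tree:", cross_val_score(tree, X, y, cv=5).mean())
print("gbm: ", cross_val_score(gbm, X, y, cv=5).mean())

# The shallow tree's decision rules can be read and sanity-checked directly.
names = [f"compound_{i}" for i in range(10)]
print(export_text(tree.fit(X, y), feature_names=names))
```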

Publications


☢️ Identifying materials using non-targeted analysis methods

tl;dr

Each year, less than 5% of the nearly 25 million containers arriving at US borders are selected for physical examination, facilitating the import of fraudulently labelled, adulterated, and illegal substances. This fraud circumvents antidumping and countervailing duties, which has cost the US government nearly $5 billion over the past 20 years, and industry much more. Automated, high-throughput, non-destructive, general-purpose scanners that can identify materials could address this gap. Prompt gamma-ray activation analysis (PGAA) is a nuclear spectroscopy technique that meets these criteria and provides a spectral fingerprint identifying the isotopic composition of a sample. We developed various statistical models, as well as CNN-based deep learning models, illustrating that many materials can be positively identified from these spectral signals under real-world, "open set" conditions.
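The "open set" aspect can be sketched simply: a closed-set classifier is forced to abstain when its confidence is low, so materials outside the training library are flagged as unknown rather than mislabeled. A toy sketch follows; real PGAA spectra are high-dimensional count histograms, and the threshold rule here is a simple stand-in for the methods actually studied:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for PGAA spectral features of known material classes.
X, y = make_classification(n_samples=500, n_features=40, n_informative=10,
                           n_classes=4, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

clf = LogisticRegression(max_iter=2000).fit(X_train, y_train)

def predict_open_set(clf, X, threshold=0.80):
    """Return class labels, or -1 ('unknown') when the maximum class
    probability falls below the rejection threshold."""
    proba = clf.predict_proba(X)
    labels = proba.argmax(axis=1)
    labels[proba.max(axis=1) < threshold] = -1
    return labels

preds = predict_open_set(clf, X_test)
print("rejected as unknown:", (preds == -1).mean())
```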

Publications


💠 Designing colloidal self-assembly by tiling Escher-like patterns

tl;dr

Colloidal films play a central role in technologies ranging from microelectronics to pharmaceutical delivery systems. The two-dimensional (2D) pattern of the film and its void fraction control material properties like catalytic activity, mass transfer resistance, optical properties, and hydrophobicity. Scalable production of these films relies on their self-assembly, rather than directed assembly, to make them economical and practical. Engineering colloidal self-assembly to achieve specific designs often involves tuning the shape of a colloid and creating enthalpically interacting "patches" on its surface; however, the precise connection between these factors and the final self-assembled structure is still an active area of research. We developed an approach, based on a technique known as "Escherization," to design colloids in a way that enables a priori control over the final structure's porosity and symmetry simultaneously. This approach is inspired by the art of the Dutch graphic artist M. C. Escher and the mathematics behind it. Our techniques can also be used to enumerate different crystal structures and design "structure directing agents" to create arbitrary 2D patterns.
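One concrete quantity this approach controls is the film's void fraction, which follows directly from the tiling's unit cell. A back-of-the-envelope sketch for discs on a periodic 2D lattice (geometry only; the actual design problem also fixes the symmetry via Escherization):

```python
import numpy as np

def void_fraction(cell_area: float, n_particles: int, radius: float) -> float:
    """Void fraction of a 2D periodic cell containing n non-overlapping discs."""
    covered = n_particles * np.pi * radius**2
    return 1.0 - covered / cell_area

# Hexagonal close packing: 1 disc per rhombic primitive cell of area
# sqrt(3)/2 * (2r)^2, giving the familiar ~90.7% coverage.
r = 0.5
cell = np.sqrt(3) / 2 * (2 * r) ** 2
print(void_fraction(cell, 1, r))  # ~0.093
```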

Publications

More Information

  • For an interactive experience, check out Craig Kaplan's online demo of the tiles this theory is built on, and modifications thereof.

💬 Extractive summarization of scientific data and documents with large language models

tl;dr

Natural language processing (NLP) tools have seen incredible advances in recent years. Modern AI tools enable text extraction, document summarization, and corpus querying using natural language, providing a new avenue for interacting with data. Retrieval-augmented generation (RAG) is particularly useful for working with data that has privacy concerns associated with it: RAG systems enable one to parse, query, and have a "conversation" with documents to retrieve information, create summaries, and extract data. RAG systems:
  • are grounded in specific document(s)
  • can cite their sources, making them more trustworthy
  • do not require retraining or fine-tuning of the underlying large language model

With the right prompt optimization and topic modeling, their performance can be increased even further for domain-specific applications; a minimal sketch follows.
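Here is a minimal RAG sketch using LlamaIndex (listed in the toolchain below). It assumes a recent `llama-index` install, an OpenAI API key in the environment, and an illustrative `./papers` directory of documents; the retrieved source nodes provide the citations mentioned above:

```python
# Minimal RAG sketch with llama-index (v0.10+ layout assumed).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./papers").load_data()  # parse local documents
index = VectorStoreIndex.from_documents(documents)         # embed and index chunks
query_engine = index.as_query_engine()

response = query_engine.query("Summarize the main findings on food provenance.")
print(response)

# Retrieved chunks double as citations back to the source documents.
for node in response.source_nodes:
    print(node.score, node.node.metadata)
```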

Products


📔 Notes and HowTo are available as Gists.

$ cat /home/mahynski/.profile | more

Firefox · Google Drive · Linux · Ubuntu · Git · GitHub · GitFlow · GitHub Actions · Visual Studio Code · Docker · C · C++ · Python · Shell Script · Stack Overflow · CMake · Markdown · LaTeX · Colab · Gradient · Anaconda · Jupyter Notebook · scikit-learn · Keras · HuggingFace · WandB · OpenAI · LlamaIndex · LlamaParse · Langfuse · NVIDIA AI Workbench · NumPy · SciPy · Pandas · Matplotlib · Plotly · Streamlit · Blender · GIMP · Inkscape · LinkedIn · Slack · Dracula · DEV

Nathan A. Mahynski's Projects

pychemauth

Chemometric analysis methods implemented in Python

tactile

A C++ library for representing, manipulating, and drawing periodic tilings of the plane.
