amkrajewski / nimcso

nim Composition Space Optimization is a high-performance tool leveraging metaprogramming to implement several methods for selecting components (data dimensions) in compositional datasets, so as to optimize data availability and density for applications such as machine learning.

Home Page: https://nimcso.phaseslab.org

License: MIT License

Nim 57.07% TeX 39.60% Python 3.19% Dockerfile 0.14%

data-analysis data-optimization data-science materials-informatics metaprogramming nim nim-lang

nimcso's Issues

JOSS Review Comments - Reviewer 1

Below are initial (itemized) JOSS review comments from @Henrium. I will progressively work on addressing them one-by-one here.

- I have tested in both GitHub Codespaces and Linux; the package is easy to install and works as claimed.
- Summary: I suggest the following to make it more accessible to a "diverse, non-specialist audience": (1) introduce the background first, then what nimCSO is and what it does; (2) elaborate on the purpose and challenges.
- State of field: What are some other approaches to compositional space optimization? Is there relevant software? References should be added if applicable. It's not necessary to compare with them, but it would make the paper more informative.
- In quickstart.ipynb: the routine mostCommon is clear at first, but becomes confusing when it comes to "removing elements". What is the optimization objective of removing elements?
- The "Algorithm-Based Search" method relies on an assumption ("elements present in already expanded ..."). Is it supported by any rationale, experiments, prior studies, etc.?
- I didn't find "community guidelines", though they don't seem strictly necessary here. Consider adding them?
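
One possible reading of the "removing elements" question above, offered as a sketch and not as nimCSO's actual implementation: if a datapoint can only be kept when none of its elements has been removed, then the objective is to pick the elements whose removal prevents the fewest datapoints. The toy dataset and the helper names `most_common` and `prevented_count` are hypothetical and only mirror that idea.

```python
from collections import Counter

# Hypothetical toy dataset: each datapoint is the set of elements it contains.
dataset = [
    {"Fe", "Cr", "Ni"},
    {"Fe", "Cr"},
    {"Fe", "Ni", "Mo"},
    {"Cr", "Ni"},
    {"Fe", "W"},
]


def most_common(data):
    """Count how many datapoints each element appears in, most frequent first."""
    counts = Counter(el for point in data for el in point)
    return counts.most_common()


def prevented_count(data, removed):
    """Number of datapoints lost when the elements in `removed` are dropped.

    Assumption: a datapoint is lost if it contains at least one removed element.
    """
    return sum(1 for point in data if point & removed)


print(most_common(dataset)[0])                 # the most frequent element
print(prevented_count(dataset, {"W", "Mo"}))   # cost of removing two rare elements
print(prevented_count(dataset, {"Fe"}))        # cost of removing one common element
```

Under this assumption, removing the two rare elements costs fewer datapoints than removing the single most common one, which is the trade-off the notebook would need to spell out.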

JOSS Review Comments

Hi @amkrajewski, I am one of the reviewers for your paper: openjournals/joss-reviews#6731

I am copying a portion of the reviewer checklist here, with some notes that I have for improvement. I've checked most of the boxes on the original review, so I will leave those out of this list.

I am also opening a pull request with some suggested changes: #4

I will add some comments on that pull request and link to the corresponding notes in this issue.

Review checklist for @bdice

Functionality

Installation: Does installation proceed as outlined in the documentation?
- The GitHub Codespace took a long time to open (>5 minutes). I would try consolidating the conda install commands. The nim packages on conda-forge depend on the conda packages of gcc, so it is possible to avoid the 'apt install gcc' steps entirely.
- apt-get install nim did not work on my Ubuntu system. Perhaps the "universe" channel is not enabled by default? I see the package here: https://packages.ubuntu.com/noble/nim but it is version 1.6, while this software claims to require Nim 2.0 in nimcso.nimble. Please verify that sufficiently recent versions of nim are available from the recommended installation paths.
Performance: If there are any performance claims of the software, have they been confirmed? (If there are no claims, please check off this item.)

I looked at the "native Python" benchmark first. It appears that the NumPy benchmark is not measuring the same amount of work as the native Python benchmark. For example, the construction of elMatrix from the input data should be part of the timing for the NumPy benchmark (and similarly for the nimCSO benchmark). It's okay to compare algorithm runtimes that exclude the CSV parsing (such as the cost to create elementalList in the native Python benchmark), but it is not a fair comparison to pre-process that parsed data, by converting it into a matrix for NumPy or into a bit-packed array, outside of the timed region. Make sure that all benchmarks measure as close to the same thing as possible when making relative claims about performance. I understand it is important to amortize the cost of creating a bit-packed array for the input data when computing across many element lists, but that makes it harder to compare to the native Python solution. I suggest trying to find a middle ground, or at least describing in more detail what is being measured.
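
To illustrate the point about benchmark scope, here is a minimal, standard-library-only sketch. It is not the repository's benchmark code: the data, the function names, and the integer bit-mask packing are all assumptions standing in for the elMatrix or bit-packed representations. It reports the packed approach both with and without the preprocessing step, so the reader can see which number is comparable to the native timing.

```python
import time

# Hypothetical parsed input: each datapoint is a list of element symbols.
ELEMENTS = ["Fe", "Cr", "Ni", "Mo", "W", "Ti"]
INDEX = {el: i for i, el in enumerate(ELEMENTS)}
elemental_list = [
    [ELEMENTS[(i + j) % len(ELEMENTS)] for j in range(1 + i % 3)]
    for i in range(3000)
]
removed = {"Mo", "W"}


def count_prevented_native(data, removed_set):
    """Work directly on the parsed lists; there is no preprocessing step."""
    return sum(1 for point in data if any(el in removed_set for el in point))


def pack(data):
    """Preprocessing: encode each datapoint as an integer bit mask."""
    packed = []
    for point in data:
        bits = 0
        for el in point:
            bits |= 1 << INDEX[el]
        packed.append(bits)
    return packed


def count_prevented_packed(packed, removed_set):
    """Query on the bit-packed representation."""
    mask = 0
    for el in removed_set:
        mask |= 1 << INDEX[el]
    return sum(1 for bits in packed if bits & mask)


def timed(fn, *args):
    """Return the function result and the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


native_result, t_native = timed(count_prevented_native, elemental_list, removed)
packed, t_pack = timed(pack, elemental_list)
packed_result, t_query = timed(count_prevented_packed, packed, removed)

print(f"native, end to end:      {t_native:.6f} s")
print(f"packed, query only:      {t_query:.6f} s")
print(f"packed, packing + query: {t_pack + t_query:.6f} s")
```

Reporting both packed figures keeps the amortization argument visible: the query-only time is the relevant one when many element lists are evaluated against the same packed data, while the combined time is the like-for-like comparison against a single native run.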

Documentation

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

The statement of need describes applications in economics and medicine as well as materials. I understand the materials science application (which is closer to my field of expertise, and is substantiated by the paper) but the economics and medicine use cases need citations.

Installation instructions: Is there a clearly-stated list of dependencies? Ideally these should be handled with an automated package management solution.

I'm new to nim and had some confusion while installing. It seems like it is best practice to use nimble install --depsOnly rather than listing the dependencies directly in the README. Also, I tried nimble install and nimble develop but was unsure how to get the nimCSO executable onto my PATH. The compile command nim c -r -f -d:release src/nimcso produces an executable in src/nimCSO.out but that doesn't seem like the proper "final destination." I am thinking of tools like pip install that can place entry points or executables directly onto your PATH.

Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support

The repository would benefit from a CONTRIBUTING.md. It does not appear that the repository has had many issues or pull requests. It would be good to explicitly instruct users to file issues and pull requests.

I strongly recommend configuring a tool like pre-commit to automatically trim trailing whitespace and format the code. Having enforceable style conventions, even a minimal set, smooths the process of contributing to an open-source project. If you'd like help here, I can explain more and file a PR.

Software paper

Summary: Has a clear description of the high-level functionality and purpose of the software for a diverse, non-specialist audience been provided?

I think the paper would benefit from a toy example, explaining what it does. It took me some time to understand precisely what is meant by "selecting components" and phrases like "least preventing". Perhaps a toy example could look something like this?

You run a pizza store and want to simplify the menu. Customers can choose from over 30 toppings. You have sales data about the toppings on every pizza purchase from the last year. Find the best combination of 15 toppings that will satisfy the most orders. How many more orders could be satisfied if you had made a menu with the best 16 toppings? The best 14 toppings?

Obviously this is very contrived, but I think the problem statement and motivation for this work could be clearer, and connect the domain field (materials science / compositionally complex materials) to the broadly-applicable selection algorithms implemented in nimCSO.
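
The pizza example can be sketched in a few lines. This is only an illustration of the problem statement, using exhaustive search over hypothetical order data; it is not a description of how nimCSO searches, and the names `satisfied` and `best_menu` are made up for the example.

```python
from itertools import combinations

# Hypothetical sales data: the set of toppings on each order.
orders = [
    {"cheese", "pepperoni"},
    {"cheese", "mushroom"},
    {"cheese", "pepperoni", "olive"},
    {"ham", "pineapple"},
    {"cheese"},
    {"mushroom", "olive"},
]
toppings = sorted(set().union(*orders))


def satisfied(menu, data):
    """An order is satisfied when every topping on it is on the menu."""
    return sum(1 for order in data if order <= menu)


def best_menu(k, data, all_toppings):
    """Exhaustively try every k-topping menu and keep one with the most satisfied orders."""
    return max(
        (frozenset(c) for c in combinations(all_toppings, k)),
        key=lambda menu: satisfied(menu, data),
    )


for k in (2, 3, 4):
    menu = best_menu(k, orders, toppings)
    print(k, sorted(menu), satisfied(menu, orders))
```

Exhaustive enumeration is only practical for small menus: choosing 15 of 30 toppings already means 155,117,520 candidate menus, which is the kind of scale that motivates faster representations and smarter search strategies.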

Second part of the review begins here

State of the field: Do the authors describe how this software compares to other commonly-used packages?

I skimmed the paper on a second read but didn't see any comparisons to other software. How do people solve this problem today? Is it all custom scripts or are there other libraries out there? (e.g. in other programming languages, with different search algorithms, etc.) Especially with the mention of other fields where this is useful (economics, medicine), I think prior art probably exists.

Quality of writing: Is the paper well written (i.e., it does not require editing for structure, language, or writing quality)?

There are a few typos that I have addressed in #4. Generally the writing quality is good.

References: Is the list of references complete, and is everything cited appropriately that should be cited (e.g., papers, datasets, software)? Do references in the text use the proper citation syntax?

Some things noted above would be better with a citation (describing examples in economics/medicine, and adding references to other software in this space). Citation style looks fine. One note: the citation for "Genetic algorithms in search, optimization and machine learning" (https://doi.org/10.5860/choice.27-0936) doesn't look like a useful link. It goes to a login page that has no information about the title. Can you verify this?

JOSS Review Comments

Overall the package is well organized with clear documentation, installation instructions, and examples. Below are a few relatively small items that were encountered:
