snakedoc / superd_legacy
A file duplication utility in pure Java.
License: Apache License 2.0
I think now would be a good time to pull thagan into unstable.
Package libglass.so with the lib directory so that the JavaFX native dependencies on Linux are met with OpenJDK.
Currently OpenJDK Linux users are unable to launch the JavaFX gui without these libs.
Do we want to move the recursive Walk() to its own class?
I might experiment with this in branch thagan
Reference #12
Need to fix opening a new db connection on every iteration of DedupeSQL.writeRecord(). Also need to move the creation of the prepared statement so that it's only created once and we save the I/O call. On the db side, the prepared statement is only really made once; all other attempts to create it get ignored since it lives in the database's "cache of recent queries"... until the db decides to get rid of it (db is closed). But we still waste CPU time on pointless I/O telling the db to create a statement that we know exists.
CheckDupes class is broken. I took a look through it and have no idea what you're doing (my SQL is rusty). Think you can handle this?
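A minimal sketch of that idea, using only standard `java.sql` interfaces: the caller opens one connection, and the statement is prepared once in the constructor and then reused for every record. The class name `RecordWriterSketch`, the table name `files`, and the `file_hash` column are assumptions made up for illustration; only `file_path` is named anywhere in these issues, and this is not the project's actual `DedupeSQL` code.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical stand-in for the write path; not the project's DedupeSQL class. */
public final class RecordWriterSketch implements AutoCloseable {

    // Assumed schema: table name and file_hash column are invented for this sketch.
    static final String INSERT_SQL =
            "INSERT INTO files (file_path, file_hash) VALUES (?, ?)";

    private final PreparedStatement insert;

    /** The caller opens the connection once and passes it in. */
    public RecordWriterSketch(Connection conn) throws SQLException {
        // Prepared exactly once, instead of once per writeRecord() call.
        this.insert = conn.prepareStatement(INSERT_SQL);
    }

    /** Binds new values to the already-prepared statement and runs it. */
    public void writeRecord(String filePath, String fileHash) throws SQLException {
        insert.setString(1, filePath);
        insert.setString(2, fileHash);
        insert.executeUpdate();
    }

    /** Closes the statement; the connection stays owned by the caller. */
    @Override
    public void close() throws SQLException {
        insert.close();
    }
}
```

Used in a try-with-resources block around the whole scan, the writer lives as long as the run does, so neither the connection nor the statement is recreated per file.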
Will increase performance and be more reliable.
Create a Contributors file and reference it in the copyright notice. As contributors work on the project they should get public credit, but listing everyone in the copyright and license notice is messy. Move all names into the Contributors file and list them by the date they joined/contributed to the project. Allow a contributor to promote something of their choice, such as a personal website, company, project, product, interesting fact, etc. Allow contributors to update their text whenever they wish and issue a pull request for that file. NO pull requests will be accepted if they change the Contributors file as well as something else (code updates). Contributors file changes will only be accepted if it's the sole file changed in a commit. This is to avoid minor whitespace changes, etc. that would clutter normal real commits.
We need to add a command line option.
How can I set up the project?
Probably lean towards the CLI for now since it makes the program more reusable in other programs and later projects. The jcurses library looks very nice and we could do something simple, with a couple of columns and selectable checkboxes or something.
Issue needs research.
This is to make it easier and more intuitive to use the program. Also, a ToolTip for every table row might be nice; this way the user can see the full cell contents without having to resize columns or the window.
Look into using:
PreparedStatement.addBatch()
PreparedStatement.executeBatch()
H2.db.getConnection().commit()
This may increase performance by reducing the constant tiny I/O that happens throughout a DedupeR.class run.
Would be cool to gather statistics on how people use the program and the data it finds. We could implement a basic web service (hosted at snakedoc.net) that listens for the program to send it data, storing it in a database and then locally for analysis later on.
Data would be anonymous: only file sizes and hashes would be transmitted. This would allow us to gather stats on the number of files people have in drives/directories, what size those files are, etc. Essentially we would be collecting data on how much duplicated data exists in the world, not only on specific drives, but for all users of the program (how many users have files with the same hash?). Completely out of scope for the core of the program, but might be interesting nonetheless.
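A rough sketch of how those three calls could fit together, assuming the caller supplies an open `java.sql.Connection`. The class name `BatchWriterSketch`, the `files` table, the `file_hash` column, and the batch size are all hypothetical; the sketch only shows the general JDBC batching pattern, not how DedupeR is actually structured.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

/** Hypothetical batching helper; table and column names are assumptions. */
public final class BatchWriterSketch {

    static final String INSERT_SQL =
            "INSERT INTO files (file_path, file_hash) VALUES (?, ?)";

    /**
     * Queues each {path, hash} pair with addBatch(), sends the queue to the
     * database every batchSize rows, and commits once at the end.
     * Returns the number of rows queued.
     */
    public static int writeAll(Connection conn, List<String[]> records, int batchSize)
            throws SQLException {
        boolean previousAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false); // otherwise every row is committed on its own
        int total = 0;
        int queued = 0;
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (String[] record : records) {
                ps.setString(1, record[0]);
                ps.setString(2, record[1]);
                ps.addBatch();
                total++;
                if (++queued == batchSize) {
                    ps.executeBatch(); // one round trip for the whole group
                    queued = 0;
                }
            }
            if (queued > 0) {
                ps.executeBatch(); // flush the final partial group
            }
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(previousAutoCommit);
        }
        return total;
    }
}
```

Whether this helps in practice depends on the driver and the workload, so it would need to be measured against the current row-at-a-time writes.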
This is for good ol' documentation and so we can generate the javadoc HTML.
We can just move the end-of-program stuff to another class and have it run its own queries to get whatever info we need out of the database. Having the checkdedupes and deduper classes run program logic and also do stats gathering and display really doesn't make them very reusable.
This could yield a massive performance increase in some circumstances (tons of small files, causing lots of small I/O to the database quickly), etc. Instead of the db being a file, it's all in memory, meaning it's lightning fast. Downside is that the database is not persistent: when the program either crashes ungracefully (not caught in a try/catch and handled) or terminates, the db is gone. We could have it "sync out" to a db backup stored in the filesystem, for persistence, then read that into memory when the program starts up again.
System needs more available memory than the expected size of the db, which could be a problem on small or embedded systems. Available memory detection could be built in to check if this requirement was met at startup...
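One possible shape for such a startup check, using only `java.lang.Runtime`. Note the assumption: this measures the heap the JVM is allowed to use, not total system memory, which seems to be the relevant limit if the in-memory db lives on the Java heap. The class name, method names, and the headroom factor are invented for this sketch.

```java
/** Hypothetical startup check; names and the headroom idea are assumptions. */
public final class MemoryCheckSketch {

    /** Heap the JVM could still claim: the configured maximum minus what is in use now. */
    public static long availableHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        long usedNow = rt.totalMemory() - rt.freeMemory();
        return rt.maxMemory() - usedNow;
    }

    /**
     * True if the expected db size, scaled by a safety factor, fits in the
     * remaining heap. A factor of 1.5 would demand 50% spare room.
     */
    public static boolean canHoldInMemory(long expectedDbBytes, double headroomFactor) {
        long required = (long) (expectedDbBytes * headroomFactor);
        return availableHeapBytes() >= required;
    }

    public static void main(String[] args) {
        long expected = 64L * 1024 * 1024; // made-up estimate: 64 MiB
        System.out.println("available heap bytes: " + availableHeapBytes());
        System.out.println("fits in memory: " + canHoldInMemory(expected, 1.5));
    }
}
```

If the check fails, the program could fall back to the file-backed db rather than refusing to start. Estimating the expected db size up front is a separate problem that this sketch does not address.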
log4j library is already bundled, we just need to use it.
I'm not sure if the Java Properties API will recognize /** */ as valid comments or if it would result in undefined behavior. The standard I've seen used is the normal # symbol, but that doesn't mean it's the only thing the Properties API will read as a comment; it may just be looking for an LF followed by property name = property value.
Needs testing and confirmation we can read and write reliably to the properties file using /** */ as comment markers.
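For what it's worth, the `java.util.Properties.load` documentation treats only lines whose first non-blank character is `#` or `!` as comments. A quick check along the following lines shows what happens to a `/** */` line: it is parsed as an ordinary entry whose key is `/**`. The class name and the `db.path` property are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

/** Small experiment: which comment styles does Properties.load() really skip? */
public final class PropertiesCommentCheck {

    /** Parses properties text from a string, the same way a file would be read. */
    public static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        String text = String.join("\n",
                "# hash comment",
                "! bang comment",
                "/** javadoc-style line */",
                "db.path = superd.db",   // hypothetical property
                "");
        Properties props = parse(text);
        // Print every entry that survived parsing, so stray keys are visible.
        for (String name : props.stringPropertyNames()) {
            System.out.println("[" + name + "] = [" + props.getProperty(name) + "]");
        }
    }
}
```

The `#` and `!` lines disappear, while the `/** ... */` line comes back as a property named `/**` with the rest of the line as its value. So reading would not fail, but the file would gain junk entries, and `Properties.store` would write them back out as real properties.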
Don't know if you want to take a crack at this one. Seems that sometimes Walker scans a file more than once... Not sure why...
The database should throw any duplicates out because we used the UNIQUE constraint on the file_path column in the table, but obviously it's best not to scan multiple times if we don't have to.
It seems to scan files more than once only occasionally... weird. See the screenshot of the console output (not included here).
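Whatever the cause turns out to be, one defensive option is to have the walker remember which files it has already handed out and skip repeats. The sketch below does that with `Files.walkFileTree` and a set of resolved real paths. It is a standalone illustration with an invented class name, not the project's Walker, and it assumes the repeats come from the same file being reached more than once (for example through overlapping start directories), which has not been confirmed.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical walker that reports each regular file at most once. */
public final class OnceOnlyWalker {

    private final Set<Path> seen = new HashSet<>();
    private final List<Path> visited = new ArrayList<>();

    /** Walks every root in turn and returns the files in first-seen order. */
    public List<Path> walk(Path... roots) throws IOException {
        for (Path root : roots) {
            Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                        throws IOException {
                    // toRealPath() normalizes the path, so two routes to the
                    // same file collapse into one entry in the set.
                    if (attrs.isRegularFile() && seen.add(file.toRealPath())) {
                        visited.add(file);
                    }
                    return FileVisitResult.CONTINUE;
                }
            });
        }
        return visited;
    }
}
```

Keeping every path in memory costs heap on very large trees, so this trades memory for fewer redundant hashes and db writes.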