ark-builders / arklib-android

Gradle wrapper for ARKLib, for usage in Android projects

License: MIT License

Rust 11.96% Kotlin 87.80% Shell 0.24%

arklib-android's Introduction

ArkLib for Android

This is a wrapper around ArkLib that lets you build Android apps powered by resource indexing, preview generation, and user metadata such as tags or scores.

⚠️ WARNING
The following information is only for developers.

Importing the library

GitHub Packages with credentials is used as a workaround because JCenter has been shut down.

Add the following script to the project's build.gradle:

allprojects {
    repositories{
        maven {
            name = "GitHubPackages"
            url = "https://maven.pkg.github.com/ARK-Builders/arklib-android"
            credentials {
                username = "token"
                password = "\u0037\u0066\u0066\u0036\u0030\u0039\u0033\u0066\u0032\u0037\u0033\u0036\u0033\u0037\u0064\u0036\u0037\u0066\u0038\u0030\u0034\u0039\u0062\u0030\u0039\u0038\u0039\u0038\u0066\u0034\u0066\u0034\u0031\u0064\u0062\u0033\u0064\u0033\u0038\u0065"
            }
        }
    }
}

Then add the arklib-android dependency to the app module's build.gradle:

implementation 'dev.arkbuilders:arklib:0.3.1'

Development of the library

Prerequisites

Rust toolchain
Kotlin toolchain
Android SDK + NDK r24 (latest)

Build Rust library

You need to have Rust targets installed:

rustup target add armv7-linux-androideabi
rustup target add aarch64-linux-android
rustup target add i686-linux-android
rustup target add x86_64-linux-android

Compile Rust (option 1)

To check that the Rust code compiles without problems, run:

./gradlew cargoBuild

The above command should generate a libarklib.so file inside the ./arklib/target/<arch>/<buildVariant> folder. If the build fails and no .so files are generated, there is an alternative build path that doesn't require you to install extra dependencies:

https://github.com/bbqsrc/cargo-ndk

Compile Rust (option 2)

Using cargo-ndk, you can generate the libarklib.so files in two steps:

- cd arklib
- cargo ndk -o ./jniLibs build

Running these two commands produces the same .so files as ./gradlew cargoBuild.

Build AAR

Before making a release build, ensure you have set profile = "release" in the cargo config.

./gradlew lib:assemble

The generated release build is lib/build/outputs/aar/lib-release.aar

Publish New Version

Ensure you have committed your changes.

./gradlew release

Then simply push to the repo.

Debug

Make sure you have switched to the debug profile in the cargo config, which can be found in lib/build.gradle.

Run the build command:

./gradlew lib:assemble

Connect a device or set up an AVD, then check the functionality:

./gradlew appmock:connectedCheck

Unit tests

Unit tests require the native ARK library file for the host machine in the project root directory:

libarklib.so for Linux
libarklib.dylib for Mac
libarklib.dll for Windows

Unit tests depend on the buildRustLibForHost Gradle task (Linux, Mac).

You can also build the library manually:

Find out the host architecture: rustc -vV | sed -n 's|host: ||p'
Change to the arklib directory and build the library: cargo build --target $host_arch
Copy the library from arklib/target/$host_arch/debug/libarklib.(so|dylib|dll) to the project root directory

Shortcut for Linux:

ARCH=$(rustc -vV | sed -n 's|host: ||p'); (cd arklib && cargo build --target "$ARCH") && cp "arklib/target/$ARCH/debug/libarklib.so" .

arklib-android's Issues

Implement LinkMetadataExtractor

Chunked resource index

It could be a good idea to store the index as a collection of files, or chunks. An update of a single resource would then affect a smaller file, so less data would need to be synced using Syncthing. This feature is debatable, though. It might be useless once we get rid of Syncthing and implement our own sync mechanism; in that case, we would just broadcast atomic changes to other devices.

Chunked storage type

We have two storage implementations:

File-based: all key-value pairs are stored in a single file.
Folder-based: each key gets its individual file in a designated folder.

File-based storages tend to fail when dealing with map sizes nearing 10,000 entries, often resulting in slow and sometimes even flawed writes. On the other hand, folder-based storages are presumably inefficient when handling smaller values.

Chunked storage blends both methods: it utilizes a folder containing files, with each file holding multiple entries. We need a strategy to identify the chunks requiring updates after a storage model modification. The use of Merkle trees might be necessary for efficient synchronization of external updates.
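
The chunk lookup described above could be sketched roughly as follows. This is a minimal illustration, assuming string keys and a fixed number of chunk files; the names chunkOf, dirtyChunks and CHUNK_COUNT are hypothetical and not part of arklib-android.

```kotlin
// Hypothetical sketch: assign each key to one of a fixed number of chunk files.
const val CHUNK_COUNT = 16

// floorMod keeps the result in 0 until `chunks`, even for negative hash codes.
fun chunkOf(key: String, chunks: Int = CHUNK_COUNT): Int =
    Math.floorMod(key.hashCode(), chunks)

// Given the keys touched by a model modification, return the chunks to rewrite.
fun dirtyChunks(changedKeys: Collection<String>, chunks: Int = CHUNK_COUNT): Set<Int> =
    changedKeys.map { chunkOf(it, chunks) }.toSet()
```

With this layout, an update to a single key rewrites one chunk file instead of the whole storage. Changing the chunk count later would move keys between chunks, so a real implementation would need a migration strategy.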

Text layer extraction and storage

For resources of kind "Document", it would be useful to extract and store text from them. E.g. for PDF resources, text layer should be similar to what is emitted by Linux utility pdftotext. The text layer can be used later for filtering resources by specified text in content, or for various text analytics (e.g. counting words).

Implement cache for generated storages

Generated storages, i.e. those which contain only generated data, like MetadataStorage and PreviewStorage, access filesystem on each locate call. This should be cached in order to provide better performance.
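
A minimal sketch of such a cache, assuming the lookup can be expressed as a function from an id to a nullable value. LocateCache and locateOnDisk are hypothetical names; the real storages may expose a different interface.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Hypothetical sketch: memoize a filesystem lookup so repeated `locate`
// calls for the same id do not touch the disk again.
class LocateCache<K : Any, V : Any>(private val locateOnDisk: (K) -> V?) {
    private val cache = ConcurrentHashMap<K, V>()

    // Returns the cached value, falling back to the disk lookup on a miss.
    // Lookups that return null are not cached, so a later call retries.
    fun locate(id: K): V? {
        cache[id]?.let { return it }
        val found = locateOnDisk(id) ?: return null
        cache[id] = found
        return found
    }

    // Should be called when the backing file is regenerated or removed.
    fun invalidate(id: K) {
        cache.remove(id)
    }
}
```

Two threads missing on the same id at the same time may both hit the disk once; for an idempotent lookup that is harmless.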

Provide Android API for indexes creation and updates

This functionality will need to be accessible from ARK Navigator:
ARK-Builders/arklib#8 — aggregated indexes (and plain indexes too).

Provide Android API for previews generation

When arklib is used to generate a preview of a resource, the resulting bitmap must be passed to the Android side, where it will be stored as necessary. Later this will change, but for now it would be a good start.

The Pdfium usage here must be replaced with arklib usage, so bitmaps would be generated by function implemented in ARK-Builders/arklib#5. Initialization of arklib must be done only once per Android app invocation, i.e. we don't want to re-initialize the lib in 2 subsequent calls to it.

Minor refactoring of Kotlin side

Move computeId from Navigator app to arklib-android
Move public functions to object ArkLib {} so we can see in apps where we use the library

Mapping `ResourceIndex` from Rust to exactly analogous structure in Kotlin

At the moment, we extract path2id map from Rust and reconstruct new ResourceIndex from this collection.

It could be more performant to map all fields of Rust structure to Kotlin analogue.

Index projection

There should be an easier and more flexible way to work with "favorites", which use the index of their parent folder but exploit optimized work with the resource collection. Right now, there is an awkward prefix: Path parameter in the methods of the ResourceIndex interface.

Support SVG resources (vector graphics)

Storage pruning

If any resources were removed, the values associated with them in storages should be cleaned up.

This should be optional and decidable by an app, probably the app would have a preference for this behavior.
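
As a rough sketch of the idea, assuming a storage can be viewed as a mutable map and the index can provide the set of known ids. The function prune and its parameters are hypothetical.

```kotlin
// Hypothetical sketch: drop storage entries whose resource no longer exists.
// `knownIds` stands in for the ids currently present in the index.
fun <K, V> prune(
    storage: MutableMap<K, V>,
    knownIds: Set<K>,
    enabled: Boolean = true // app-level preference, as suggested above
): Set<K> {
    if (!enabled) return emptySet()
    val stale = storage.keys.filter { it !in knownIds }.toSet()
    storage.keys.removeAll(stale)
    return stale // returned so the caller can log or back up what was removed
}
```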

Indexing service

We should externalize the processing stages: indexing, metadata extraction, and previews generation. By externalization, we mean an external system entity should undertake these activities for each root folder, with the results subsequently pulled by the applications (Navigator, Shelf, Memo, etc.).

Bump version in the apps

This issue is supposed to track versions of arklib-android used in the apps.

This issue is not supposed to be closed.

Extend PropertiesStorage with `date` field

We have only "title" and "description" at the moment.

It would be nice to be able to store the creation date as well, since the modification date isn't a static attribute.

Persisted/Replicated index

Index is already implemented in https://github.com/ARK-Builders/arklib as plain file stored in .ark folders. We need to throw Room away and switch to arklib's index implementation. This will solve ARK-Builders/ARK-Navigator#142 in Navigator.

The index will be persisted and replicated, meaning that it is a file synced by Syncthing or another filesystem sync mechanism. Devices sharing the same root folder will share the index as well and should benefit because indexing will happen less frequently.

FolderStorage$readFromDisk ClassCastException

java.lang.ClassCastException: java.util.LinkedHashMap$LinkedHashMapEntry cannot be cast to java.util.HashMap$TreeNode
	at java.util.HashMap$TreeNode.moveRootToFront(HashMap.java:1831)
	at java.util.HashMap$TreeNode.treeify(HashMap.java:1948)
	at java.util.HashMap.treeifyBin(HashMap.java:771)
	at java.util.HashMap.putVal(HashMap.java:643)
	at java.util.HashMap.put(HashMap.java:611)
	at space.taran.arklib.domain.storage.FolderStorage$readFromDisk$jobs$2$1.invokeSuspend(FolderStorage.kt:100)
	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
	at kotlinx.coroutines.internal.LimitedDispatcher.run(LimitedDispatcher.kt:42)
	at kotlinx.coroutines.scheduling.TaskImpl.run(Tasks.kt:95)
	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
	Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@6564017, Dispatchers.IO]

Support Gif resources

TagStorage can get corrupted during writing

Sometimes, the file serving as tag storage becomes corrupted:

...
117421-975209069:visa
92126-3414598226:hmm
60272-1305102029:music
34362-6340143
(END OF FILE)

In the example above, the resource id is incomplete and no tags follow it.

In fact, there is an even bigger problem: in such cases, half of the storage is lost. Thanks to the backup mechanism, the loss can be mitigated. However, this must not happen at all.

Atomic writing should be implemented.
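
One common way to get atomic writes is to write a temporary file and rename it over the target. The following is a minimal sketch of that technique using java.nio.file, not the project's actual implementation; writeAtomically is a hypothetical name.

```kotlin
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

// Hypothetical sketch: write to a temporary sibling file, then rename it
// over the target so readers never observe a half-written storage file.
fun writeAtomically(target: Path, lines: List<String>) {
    val dir = target.toAbsolutePath().parent
    val tmp = Files.createTempFile(dir, target.fileName.toString(), ".tmp")
    try {
        Files.write(tmp, lines)
        // ATOMIC_MOVE throws AtomicMoveNotSupportedException if the
        // filesystem cannot do this. Whether an existing target is replaced
        // is implementation specific; on POSIX systems it typically is.
        Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE)
    } finally {
        Files.deleteIfExists(tmp) // no-op after a successful move
    }
}
```

The temporary file is created in the same directory as the target so that the rename stays within one filesystem. This sketch does not force the data to disk before renaming, so it protects against interrupted writes by the app rather than against power loss.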

Storage files monitoring

Files backing storages for user data should be monitored similar to how the resources are monitored.
This would allow us to catch updates from other devices in good time.

Files backing generated storages (metadata, previews) and index file could be skipped since we always generate them from the updated resources. The idea to optimize generation out is wrong here, because even if we just copy the updates we should verify them.

Right now, the updates are handled only when we initialize presenters, i.e. we need to close folder and open again to have new values from storages.

Use IndexProjection for performant filters

Using IndexProjection we can not only open "favorite" folders, but open other folders with filter applied to resources.

These filters can be based on properties of Resource:

    val name: String,
    val extension: String,
    val modified: FileTime

or on parts of ResourceId, especially:

    val dataSize: Long
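
A filtering projection could look roughly like this. The classes below are simplified stand-ins that contain only the fields quoted above; the crc32 field of ResourceId and the shape of IndexProjection are assumptions made for illustration.

```kotlin
import java.nio.file.attribute.FileTime

// Hypothetical stand-ins for the real classes.
data class ResourceId(val dataSize: Long, val crc32: Long)
data class Resource(
    val id: ResourceId,
    val name: String,
    val extension: String,
    val modified: FileTime
)

// A projection is a predicate applied over the resources of a parent index.
class IndexProjection(
    private val parent: Collection<Resource>,
    private val filter: (Resource) -> Boolean
) {
    fun resources(): List<Resource> = parent.filter(filter)
}
```

For example, IndexProjection(all) { it.extension == "pdf" && it.id.dataSize > 1_000_000 } would expose only large PDF resources without building a second index.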

Refactor index as special kind of storage

This would mean we must represent paths and ids as a Monoid structure somehow.

Use Coil instead of Glide

We need to perform some research, but this looks like more modern, coroutine-friendly library: https://coil-kt.github.io/coil/

Also, it has SVG support (see #72).

VideoPreviewGenerator should be revised

Check the code generating previews for video resources
Move it to arklib, write bindings in this repo

Stale metadata in the cache

Right now we do not remove the metadata of lost resources, which causes an NPE:

06-09 00:56:01.152 E/ACRA    (16329): ACRA caught a NullPointerException for space.taran.arknavigator
06-09 00:56:01.152 E/ACRA    (16329): java.lang.NullPointerException
06-09 00:56:01.152 E/ACRA    (16329): 	at space.taran.arklib.domain.preview.RootPreviewProcessor.initKnownResources(RootPreviewProcessor.kt:128)
06-09 00:56:01.152 E/ACRA    (16329): 	at space.taran.arklib.domain.preview.RootPreviewProcessor.access$initKnownResources(RootPreviewProcessor.kt:15)
06-09 00:56:01.152 E/ACRA    (16329): 	at space.taran.arklib.domain.preview.RootPreviewProcessor$init$2.invokeSuspend(RootPreviewProcessor.kt:44)
06-09 00:56:01.152 E/ACRA    (16329): 	at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:33)
06-09 00:56:01.152 E/ACRA    (16329): 	at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:106)
06-09 00:56:01.152 E/ACRA    (16329): 	at kotlinx.coroutines.scheduling.CoroutineScheduler.runSafely(CoroutineScheduler.kt:570)
06-09 00:56:01.152 E/ACRA    (16329): 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.executeTask(CoroutineScheduler.kt:750)
06-09 00:56:01.152 E/ACRA    (16329): 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.runWorker(CoroutineScheduler.kt:677)
06-09 00:56:01.152 E/ACRA    (16329): 	at kotlinx.coroutines.scheduling.CoroutineScheduler$Worker.run(CoroutineScheduler.kt:664)
06-09 00:56:01.152 E/ACRA    (16329): 	Suppressed: kotlinx.coroutines.DiagnosticCoroutineContextException: [StandaloneCoroutine{Cancelling}@a414215, Dispatchers.Default]

    private suspend fun initKnownResources() {
        _busy.emit(true)
        metadata.state().forEach { (id, meta) ->
            val path = index.getPath(id)!!
            generate(id, path, meta)
        }
    ...
    }

RootPreviewProcessor takes the metadata, but the index has no such id: the resource was deleted while its metadata was not.
See ARK-Builders/arklib#40
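
Until stale metadata is removed at the source, the loop could skip missing ids instead of asserting with !!. The sketch below uses plain maps and callbacks as stand-ins for the metadata storage, the index and the generator; all parameter names are hypothetical.

```kotlin
// Hypothetical sketch with plain collections standing in for the metadata
// storage and the index; `generate` is a placeholder callback.
fun <Id, Meta> initKnownResources(
    metadata: Map<Id, Meta>,
    pathOf: (Id) -> String?,
    generate: (Id, String, Meta) -> Unit
): List<Id> {
    val stale = mutableListOf<Id>()
    for ((id, meta) in metadata) {
        val path = pathOf(id)
        if (path == null) {
            stale += id // resource deleted, metadata left behind
            continue
        }
        generate(id, path, meta)
    }
    return stale // candidates for removal from the metadata storage
}
```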

Redesign index-storage interactions

Index-related classes should not be coupled with storages too tightly.
Index-related classes should not call any factories or storages:
1. Storages should consume updates directly or via presenters.
2. Factories should be invoked by storages when they don't have data or it is outdated.

Examples of improper class interactions:

ResourceIndexRepo depends on MetadataStorage

class ResourceIndexRepo(
    private val foldersRepo: FoldersRepo,
    //todo any storage should be attached next to index, not into it
    private val metadataStorageRepo: MetadataStorageRepo,
    private val messageFlow: MutableSharedFlow<Message>,
...

PlainIndex during index updates processing invokes GeneralMetadataFactory:

    private suspend fun handleUpdate(update: UpdatedResourcesId) {
        update.deleted.forEach { ... }
        val added = ...
        ...

        //todo MetadataStorage must manage Metadata for resources
        // the same as TagStorage manages their tags
                GeneralMetadataFactory.compute(path, resource)
                    .onFailure {
                        Log.e(
                                RESOURCES_INDEX,
                                "Could not detect kind for " +
                                        path.absolutePathString()
                            )
                        messageFlow.emit(Message.KindDetectFailed(path))
                    }
                    .map { metadata ->
                        resource.metadata = metadata
                        resource
                    }
...

ResourceIndexRepo during initial index loading invokes GeneralMetadataFactory:

    internal suspend fun providePlainIndex(
        root: Path
    ): PlainIndex = provideMutex.withLock {
        ...
        //todo MetadataStorage must manage Metadata for resources
        // the same as TagStorage manages all tags
                GeneralMetadataFactory.compute(path, resource)
                    .map { metadata ->
                        resource.metadata = metadata
                        resource
                    }

Bounty #1 for Homayoun

We need to solve several issues at once:

#20
Ensure that ResourceID is taken from arklib everywhere; right now it is a CRC32 checksum plus the file size.
When we need to print a string with both values, we can just delimit them using dash (-).
- Update ARK-Shelf
- Update ARK-Shelf-Desktop
- Update ARK-Navigator
  We should also implement migration in ARK Navigator, i.e. check some value (or index version) in app data and if there is none (or version is too low) then perform storages upgrade. Index and previews could be just dropped. Storages must be backed up before performing upgrade and old resource ids must be replaced by new resource ids. E.g. 5839051: english, greek must be replaced with 88363-5839051: english, greek in .ark/tags, etc.
#21
Basically, we just need to take code from Navigator and move it into arklib-android, ensuring any app could just import couple of modules and use metadata as well as index. We don't really need indexing in the apps right now (except Navigator). Could be useful later.
ARK-Builders/ARK-Memo#8
We need to use new metadata storage and separate title into it. Also, we can introduce created-date metadata field.
This issue is kinda similar to ARK-Builders/ARK-Shelf#21
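
The storage upgrade for .ark/tags mentioned above could be sketched per line as follows. This is only an illustration: migrateLine and the sizeOf lookup are hypothetical, and the "size-crc32" order is inferred from the example in this issue.

```kotlin
// Hypothetical sketch: rewrite one line of .ark/tags from the old id
// format ("<crc32>: tags") to the new one ("<size>-<crc32>: tags").
// `sizeOf` is a placeholder for looking up the file size by old id.
fun migrateLine(line: String, sizeOf: (Long) -> Long?): String {
    val sep = line.indexOf(':')
    if (sep < 0) return line                 // not a key-value line
    val key = line.substring(0, sep).trim()
    if ('-' in key) return line              // already in the new format
    val crc = key.toLongOrNull() ?: return line
    val size = sizeOf(crc) ?: return line    // unknown resource: keep as-is
    return "$size-$crc${line.substring(sep)}"
}
```

Lines that cannot be migrated are returned unchanged, which keeps the upgrade safe to re-run on a partially migrated file.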

Restrict the frequency of storage write operations

Implement a version counter for coroutines writing to the same storage: a simple integer that increments with each write to the in-memory storage. The counters can initialize at 0 during a single app launch. If a coroutine approaches the mutex and it's occupied, we compare their versions: the more recent one gains access to the storage, while the older one is cancelled.
Restrict the creation of coroutines to a rate of 1 per second.
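
The version counter could be sketched as below. This is a simplified variant that uses threads and a lock instead of coroutines and a Mutex, and it checks the version after acquiring the lock rather than while waiting; the rate limit is not shown. VersionedWriter and persist are hypothetical names.

```kotlin
import java.util.concurrent.atomic.AtomicLong
import java.util.concurrent.locks.ReentrantLock
import kotlin.concurrent.withLock

// Hypothetical sketch: skip writes that were superseded by a newer one.
class VersionedWriter(private val persist: (String) -> Unit) {
    private val latest = AtomicLong(0) // starts at 0 on each app launch
    private val lock = ReentrantLock()

    // Called on every change to the in-memory storage; returns its version.
    fun nextVersion(): Long = latest.incrementAndGet()

    // Returns false when a newer write was requested in the meantime,
    // in which case this older write is skipped.
    fun write(version: Long, snapshot: String): Boolean = lock.withLock {
        if (version < latest.get()) {
            false
        } else {
            persist(snapshot)
            true
        }
    }
}
```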

Tracked storage and CRDT

This new kind of storage would consist of separate tracks for each device. Each track stores the history of updates from its device. The main storage is the result of consensus between tracks. This should improve conflict resolution. Storages could also generalize from a Monoid structure to any CRDT.

Enabling app developers to extend the Properties class

Currently, an app developer can extend the Properties class, however, they need to create their own version of PropertiesStorage to handle the customized structure. We should consider enhancing PropertiesStorage to support third-party extensions without the need to duplicate the entire code within the app.

A potential solution could be to make PropertiesStorage more flexible and adaptable by generalizing it to operate over a generic P type, rather than having it tightly coupled with the Properties type. This would facilitate easier adaptability and integration with various data structures.

Once we've generalized the PropertiesStorage, we can proceed to shift the existing Properties structure into the ARK Shelf repository.
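
A minimal sketch of what a storage generic over P might look like, assuming serialization is supplied by the app. GenericPropertiesStorage, encode, decode and ShelfProperties are hypothetical names used only for illustration.

```kotlin
// Hypothetical sketch: a storage generic over the properties type P.
// `encode`/`decode` are placeholders for whatever serialization is used.
class GenericPropertiesStorage<Id, P>(
    private val encode: (P) -> String,
    private val decode: (String) -> P
) {
    private val raw = mutableMapOf<Id, String>()

    fun set(id: Id, props: P) {
        raw[id] = encode(props)
    }

    fun get(id: Id): P? = raw[id]?.let(decode)
}

// An app-defined properties type with the fields that app needs.
data class ShelfProperties(val title: String, val url: String)
```

The app then only provides its own data class and the two conversion functions, without copying the storage code.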

Group generated data under `cache` folder

All output from new Processor classes should be grouped under .ark/cache folder.

E.g. .ark/previews should be moved into .ark/cache/previews.
Same with metadata and thumbnails.

Other processor classes could be added in future.

Build Rust code with both Debug and Release profiles

Right now we have hard-coded Release profile:

cargo {
    ...
    profile = 'release'
}

It would be perfect if this profile was determined by Gradle.
E.g. ./gradlew assembleDebug would enable debug profile for Rust dependency.

Right now, assembling in Debug mode produces empty AAR:
https://github.com/ARK-Builders/arklib-android/actions/runs/1712815173

Provide Android API for indexes and storages

In order to use indexes, metadata, tags and other functionality from the rest of the ARK apps, we need to move all storages as well as the index from ARK Navigator into this repo. E.g. ARK Shelf and ARK Memo will be able to store metadata separately from the resource thanks to this feature.

Also, it will be easier to provide good interface for https://github.com/ARK-Builders/arklib and port the code further into Rust side.

Efficient `combineAll` method for `Monoid` interface

Bisection instead of linear pass-through should be adopted if this method will be really used.
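
A bisecting combineAll could be sketched like this. The Monoid interface below is a hypothetical stand-in and may differ from the one in the codebase; the approach relies on combine being associative.

```kotlin
// Hypothetical Monoid interface; the real one may differ.
interface Monoid<T> {
    val neutral: T
    fun combine(a: T, b: T): T

    // Divide-and-conquer fold: combines the two halves recursively instead
    // of accumulating left to right.
    fun combineAll(values: List<T>): T = when (values.size) {
        0 -> neutral
        1 -> values[0]
        else -> {
            val mid = values.size / 2
            combine(
                combineAll(values.subList(0, mid)),
                combineAll(values.subList(mid, values.size))
            )
        }
    }
}

// Example instance: sets of tags combined by union.
object TagSetMonoid : Monoid<Set<String>> {
    override val neutral: Set<String> = emptySet()
    override fun combine(a: Set<String>, b: Set<String>): Set<String> = a + b
}
```

The number of combine calls is the same as in a linear pass, but the operands stay balanced in size, which can matter when the cost of combine grows with the size of its arguments.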

Create sample unit/instrumented tests and run them in CI

Unit tests might be difficult to establish because they need to link against an x86_64 native library (not sure if it's achievable).

Instrumented tests at CI should require this: https://github.com/marketplace/actions/android-emulator-runner

Continuous storage synchronization

Right now, internal and external changes are synchronized only when the storage is written. In this case, the model is merged with the file content. This is done because the storage file can have updates from external devices.

Sync deletes from remote devices
There is no conflict resolution though, only a plain merge. As a result, a user can't delete any value while any device keeps the app in memory. All deletes are ignored: we delete only from the local model and storage file, but the deleted value is restored into the storage file from the model on another device.
- #58
Sync storage continuously
When client requests storage values for a key (e.g. getTags(id: ResourceId)), the value is taken solely from the model. The model needs to be updated with external changes in advance. External changes are not merged "on the fly" yet.
- #53

Migrate ResourceId from crc32 to (crc32, filesize)

ArkLib defines ResourceId as 2 values of Long type:

CRC-32 checksum
Size of the file

Navigator and arklib-android define ResourceId as just the CRC-32 checksum:

typealias ResourceId = Long

We need to make Navigator use exactly the same definition as ArkLib.
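
On the Kotlin side, the two-value id might be modelled roughly as follows. This is a sketch, not the actual class: the field names and the "size-crc32" string order are assumptions based on the examples elsewhere in these issues.

```kotlin
// Hypothetical Kotlin counterpart of the two-value id.
data class ResourceId(val dataSize: Long, val crc32: Long) {
    // String form with a dash delimiter, e.g. "88363-5839051".
    override fun toString(): String = "$dataSize-$crc32"

    companion object {
        // Returns null for anything that is not "<long>-<long>".
        fun parse(s: String): ResourceId? {
            val parts = s.split('-')
            if (parts.size != 2) return null
            val size = parts[0].toLongOrNull() ?: return null
            val crc = parts[1].toLongOrNull() ?: return null
            return ResourceId(size, crc)
        }
    }
}
```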

Stats storage: labeling statistics

See ARK-Builders/ark-android#16 for the context.
This issue is about metrics 2 and 4 from the list:

How many times a resource was labeled with the tag
.ark/stats/tag-labeled-n

How recently a resource was labeled with the tag
.ark/stats/tag-labeled-ts

Storing the stats

We need to create a new section of the persisted/replicated storage (.ark folder) for tag statistics and update the corresponding files every time we label a resource with a tag:

.ark/stats/tag-labeled-n for total amount of times a tag was used, it is mapping Tag -> Int, where values cannot be negative
.ark/stats/tag-labeled-ts for the most recent timestamps tag was used, although in fact we can just maintain it without timestamps and storing the order in which tags were used: if we labeled a resource with tag T right now, that means we put T on top of the stack, so just List[Tag] is good enough
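
An in-memory model of the two files could be sketched as follows, assuming tags are plain strings. TagStats and its members are hypothetical names; persistence to the .ark folder is not shown.

```kotlin
// Hypothetical in-memory model of the two stats files described above.
class TagStats {
    // .ark/stats/tag-labeled-n: how many times each tag was used
    val labeledN = mutableMapOf<String, Int>()

    // .ark/stats/tag-labeled-ts: most recently used tag first
    val labeledOrder = ArrayDeque<String>()

    fun onLabeled(tag: String) {
        labeledN[tag] = (labeledN[tag] ?: 0) + 1
        labeledOrder.remove(tag)   // keep each tag at most once
        labeledOrder.addFirst(tag) // put it on top of the stack
    }
}
```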

Using the stats

These stats could be used in metrics for sorting tags in tag selector. Stats tag-labeled-ts are more important here, since it would be very handy to see latest used tags on top. At the same time, tag-labeled-n is less important but also could be handy because it would push those tags on top, which are used for temporary labelings, e.g. todo tag.

Handle index construction errors

In the file RootIndex.kt:

        if (!BindingIndex.load(root)) {
            Log.e(
                RESOURCES_INDEX,
                "Couldn't provide index from $path"
            )
            throw UnknownError()
        }

We should report the error in a way that dependent app (Navigator, Shelf) could report it to the user.
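
One possible shape for this, sketched with a sealed result type instead of throwing UnknownError. IndexResult and provideIndex are hypothetical names, and load stands in for BindingIndex.load.

```kotlin
// Hypothetical sketch: surface the failure as a typed result that the
// dependent app can inspect and show to the user.
sealed class IndexResult {
    data class Loaded(val root: String) : IndexResult()
    data class Failed(val root: String, val reason: String) : IndexResult()
}

fun provideIndex(root: String, load: (String) -> Boolean): IndexResult =
    if (load(root)) IndexResult.Loaded(root)
    else IndexResult.Failed(root, "Couldn't provide index from $root")
```

An app can then use a when expression over IndexResult and decide how to present the Failed case.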

Extend DocumentMetadataGenerator for ODT and ODS types

DocumentPreviewGenerator can only count pages for PDF files right now.

It would be great to also support spreadsheets and word-like documents.
What other kinds of metadata for documents we can have?

Separate generated metadata from user metadata

In our model, we deal with two distinct types of metadata:
1. Generated metadata.
This refers to metadata extracted directly from the data. For example, video dimensions and duration. It's okay if we partially or completely lose this metadata as it can always be re-generated, much like previews. It's crucial that this metadata is generated deterministically, ensuring it's created the same way on any device.

2. User-defined metadata.
This refers to metadata created by the user. This could include tags, scores, or more specific attributes such as a link or document's title. Losing this data is not an option. We must ensure the storage is secure and synchronized. Unlike generated metadata, we cannot regenerate or restore this type of data.

We need to implement a storage for the second kind of metadata.

Extend DocumentPreviewGenerator for ODS, ODT and MD types

DocumentPreviewGenerator can generate previews only for PDF documents.

It would be great to also have previews for spreadsheets and word-like documents.

Dirty writes happen on huge storage files

Each storage should have mutex/rwlock to allow only 1 writer at a time
We must ensure that coroutines writing to storages do not die, e.g. if we switched to another screen
We should ensure each file integrity ARK-Builders/arklib#48
Consider implementing #68

Also see ARK-Builders/ARK-Navigator#174 and ARK-Builders/ARK-Navigator#173

Automatic package build and pushing into GitHub Package Maven registry

The package must be rebuilt on every commit to main and the new version uploaded to the Maven registry of our own GitHub Packages (originally planned: Maven Central). It must be usable from Android/Kotlin projects just by adding its URL to the Gradle config.

HEIC format for storing previews and thumbnails

HEIC could halve the amount of binary data stored and sent over the network.

Pluggable resource kinds

Plain enum ResourceKind type should be replaced by a registry — map from some id type I, with bundles of code as values.

The bundle of code should include implementations of MetadataExtractor and PreviewGenerator interfaces.

This would allow us to move concrete resource kinds into external dependencies, defined by the apps themselves. The kind identifier type I should be something unique, like a byte vector or a string. Different apps should be able to use the same resource kind just by using the same library that defines it.
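
A registry along these lines could be sketched as follows, assuming string identifiers. The interface signatures below are placeholders, since the real MetadataExtractor and PreviewGenerator signatures are not shown in this issue; KindBundle and KindRegistry are hypothetical names.

```kotlin
// Hypothetical interfaces standing in for the real ones.
fun interface MetadataExtractor {
    fun extract(path: String): Map<String, String>
}

fun interface PreviewGenerator {
    fun generate(path: String): ByteArray
}

// The "bundle of code" for one resource kind.
class KindBundle(val metadata: MetadataExtractor, val preview: PreviewGenerator)

// Registry keyed by a string identifier, replacing a closed enum.
class KindRegistry {
    private val kinds = mutableMapOf<String, KindBundle>()

    fun register(id: String, bundle: KindBundle) {
        require(id !in kinds) { "Kind '$id' is already registered" }
        kinds[id] = bundle
    }

    fun lookup(id: String): KindBundle? = kinds[id]
}
```

A library defining a kind would call register once at startup, and any app depending on that library could look the kind up by the same identifier.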

CI & Cache for Gradle/Android workflows

We need to speed up our CI builds. Right now, it can take up to an hour.

Example provided by @mdrlzy
https://github.com/coil-kt/coil/blob/d0644b9a5a96627a09d5ac724b8a4fd6578fea6e/.github/workflows/ci.yml#L66

- uses: actions/cache@v2
  with:
    path: |
      ~/.gradle/caches
      ~/.gradle/wrapper
    key: ${{ runner.os }}-${{ hashFiles('**/*.gradle*') }}-${{ hashFiles('**/gradle/wrapper/gradle-wrapper.properties') }}-${{ hashFiles('**/buildSrc/**/*.kt') }}

Use Timber instead of Log

https://github.com/JakeWharton/timber

Pros:

No need to write the name of class and method, they will be added automatically
We can easily add collection of critical logs on backend

Encrypted storage

Binary interface of FolderStorage looks like a good fit for this.

Don't pass all resources from index to storages multiple times

Somehow, storages depend on knowing all existing resources. This should be reworked in a more performant way.

interface ResourceIndex {
    ...
    // we pass all known resource ids to a storage because
    // 1) any storage exists globally
    // 2) we maintain only 1 storage per root
    // 3) every storage is initialized with resource ids
    suspend fun allIds(): Set<ResourceId> = allIds(null)
    ...
}