
Comments (6)

ImranR98 commented on June 28, 2024

Yes, some settings can be confusing - I'm hoping to flesh out the Wiki page at some point.

Note: you should know what Regular Expressions are before using any of the filtering options.

The HTML Source works in this way:

  1. It looks for links on an HTML page.
  2. It filters out certain links. By default, this is any link that doesn't end in .apk but you can use "Custom APK Link Filter" to specify your own filter.
  3. It sorts the remaining links. This sorting is alphanumeric sorting on the whole link, but you can choose to sort by only the last segment of the link. This last segment is usually the filename but may not be if you used your own filter in step 2. It really should be called "Sort by last segment of link" or something.
  4. It applies yet another optional filter on all remaining links. I'm not sure there's a good use case for this, but this setting ("Filter APKs by Regular Expression") is a more general filter that all Sources have - it is inherited from the parent AppSource class. It's probably easier to use this in some situations rather than having a super complicated filter in step 2.
  5. Of the remaining links, it picks the first one (or the last one if you enabled the reverse option).
  6. Now that we have the final APK link, we need a unique release ID to go with it so that when the ID changes, we know the app has an update available. For other Sources, the unique release ID is the app version, but for the HTML Source, it might not be possible to extract a version string. So by default, this is just a hash of the link.
    • However, links often have version strings embedded within them. Obtainium can't know how to extract these on its own (different websites would have different ways of doing it), so the user can choose to specify a regular expression that can be applied to the link in order to extract the version - this is what the "Version Extraction" field is for.
    • But often it can be difficult to come up with a regular expression that accurately matches the version while excluding extra characters. For this, we have the "Match Group" option that lets the user specify which group in the regular expression we should use as a version.
    • The version extraction feature isn't really necessary - using link hashes is easier and more reliable. Some users might just want it (#861) as having the real version looks nicer/more accurate and it allows Obtainium, in most cases, to use version detection.
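The six steps above can be sketched in a few lines of Python. This is only an illustration of the described flow, not Obtainium's actual code: the function names, parameter names, the use of plain string sorting, and SHA-256 as the link hash are all assumptions made for the example.

```python
import hashlib
import re


def last_segment(link):
    """Return the part of the link after the final slash."""
    return link.rstrip("/").split("/")[-1]


def pick_apk_link(links, link_filter=r"\.apk$", sort_by_last_segment=False,
                  apk_filter=None, reverse=False):
    """Steps 2-5: filter, sort, filter again, then pick one link."""
    # Step 2: keep only links matching the default or custom link filter.
    kept = [l for l in links if re.search(link_filter, l)]
    # Step 3: sort on the whole link, or only on its last segment.
    kept.sort(key=last_segment if sort_by_last_segment else None)
    # Step 4: optional second, more general filter.
    if apk_filter:
        kept = [l for l in kept if re.search(apk_filter, l)]
    if not kept:
        return None
    # Step 5: first link, or the last one if the reverse option is on.
    return kept[-1] if reverse else kept[0]


def release_id(link, version_regex=None, match_group=0):
    """Step 6: extracted version if a regex is given and matches, else a link hash."""
    if version_regex:
        m = re.search(version_regex, link)
        if m:
            return m.group(match_group)
    # Assumption: any stable hash of the link serves as the release ID.
    return hashlib.sha256(link.encode("utf-8")).hexdigest()


# Hypothetical links found on a page.
links = [
    "https://example.com/files/app-1.2.0.apk",
    "https://example.com/files/app-1.3.0.apk",
    "https://example.com/docs/",
]
chosen = pick_apk_link(links, reverse=True)
print(chosen)
print(release_id(chosen, r"app-(.*)\.apk", 1))
```

Because the sort here is a plain string sort, a link containing 1.10.0 would sort before one containing 1.2.0, which is one reason a hash of the link is the safer default for the release ID.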

As for the "Intermediate Link" filter, if this is used, the HTML Source works as follows (see #820 for a situation where this is useful):

  1. It looks for links on an HTML page.
  2. It filters out any links that don't match the intermediate link filter.
  3. It grabs the first remaining link (no reverse option here), and then feeds that link as the input for the normal HTML Source process described previously.
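A minimal sketch of the intermediate step, under the same caveats: `resolve_intermediate`, `fetch_links` and the example site are hypothetical names invented for illustration, and partial (search-style) regex matching is an assumption.

```python
import re


def resolve_intermediate(page_url, intermediate_filter, fetch_links):
    """Return the URL that the normal HTML Source process should start from."""
    # Steps 1-2: collect the page's links and keep those matching the filter.
    matches = [l for l in fetch_links(page_url)
               if re.search(intermediate_filter, l)]
    # Step 3: take the first match (no reverse option); None if nothing matched.
    return matches[0] if matches else None


# Hypothetical site: the landing page links to a separate downloads page.
fake_site = {
    "https://example.com/": [
        "https://example.com/about",
        "https://example.com/downloads/",
    ],
    "https://example.com/downloads/": [
        "https://example.com/downloads/app-1.0.apk",
    ],
}

next_page = resolve_intermediate("https://example.com/", r"/downloads/$",
                                 fake_site.get)
# The normal process (default .apk filter shown here) then runs on next_page.
apk_links = [l for l in fake_site[next_page] if re.search(r"\.apk$", l)]
print(next_page, apk_links)
```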

from obtainium.

ImranR98 commented on June 28, 2024

So for Linphone, we have a bunch of different links on the page:

  • linphone-android-x.y.z.apk
  • linphone-freecodecs-video-x.y.z.apk
  • linphone-nonfreecodecs-video-x.y.z.apk
  • maven_repository/

So:

  1. We want to get rid of all links that are not of the form linphone-android-x.y.z.apk.
    • We could do all this in a single regular expression in "Custom APK Link Filter" like this: linphone-android-.*\.apk
    • But having 2 separate filters makes it easier:
      • The default APK link filter already filters out anything not ending in .apk (in this case, maven_repository/)
      • We can then use a simpler second APK filter to get rid of the APKs we don't want: linphone-android
  2. We could just stop here.
    • No need for an intermediate link filter (we are already on the page with the final APKs on it).
    • No real need for version extraction either.
  3. If you do want version extraction, you can:
    • Use this regex in the "Version Extraction" filter: android-(.*)\.apk
      • Set "Match Group" to the default of (1). In the above regex, the first group is .* (each group is enclosed in parentheses).
    • If you are sure that the version is always 3 sets of digits separated by dots, and that this pattern will never appear more than once in the link, you could avoid using groups entirely by using this regex with "Match Group" set to 0 (meaning "use the whole match"): [0-9]+\.[0-9]+\.[0-9]+
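To check these regexes outside the app, here is a quick Python sketch. The version 5.2.5 is a placeholder standing in for x.y.z, and Python's `re.search` is assumed to behave like the app's matching for these simple patterns.

```python
import re

# Placeholder file names modelled on the links listed above.
links = [
    "linphone-android-5.2.5.apk",
    "linphone-freecodecs-video-5.2.5.apk",
    "linphone-nonfreecodecs-video-5.2.5.apk",
    "maven_repository/",
]

# Option A: one combined regex in "Custom APK Link Filter".
single = [l for l in links if re.search(r"linphone-android-.*\.apk", l)]

# Option B: default .apk filter first, then a simpler second filter.
two_step = [l for l in links if re.search(r"\.apk$", l)]
two_step = [l for l in two_step if re.search(r"linphone-android", l)]

link = single[0]

# Version extraction with a capture group ("Match Group" = 1).
v_group = re.search(r"android-(.*)\.apk", link).group(1)

# Version extraction using the whole match ("Match Group" = 0).
v_whole = re.search(r"[0-9]+\.[0-9]+\.[0-9]+", link).group(0)

print(single, two_step, v_group, v_whole)
```

Both filtering options leave only the linphone-android link, and both extraction regexes yield the same version string for it.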


ImranR98 commented on June 28, 2024

I verified that the above instructions work. In the process I found a couple of bugs that will be fixed in the next release: the match group currently has to be entered explicitly by the user (the default value doesn't work), and a default of 0 would be better than 1.


DiagonalArg commented on June 28, 2024

Fantastic! Thank you so much for your thorough explanation. (And for your work on this useful piece of software.)


Mihara commented on June 28, 2024

So I have a very negligent case here: https://woad.sumusltd.com/download

Should I expect Obtainium to be able to handle this source, or is this a lost cause, and I must continue to check this manually?


DiagonalArg commented on June 28, 2024

@Mihara - I would suggest you open a new issue regarding that particular source, rather than posting to this closed one.

