Comments (8)

jarioksa commented on July 17, 2024

The error message may indeed be misleading (perhaps we should just say that the tree does not define a full-rank metric distance matrix?). We do not use the tree as such; it is translated into a phylogenetic distance matrix (function ape::vcv). However, your phylogenetic distance matrix is such that we cannot use it in Hmsc: we need to be able to invert the matrix, and that fails. I think the assumption behind the error message was that ultrametric trees give metric distances that can be inverted, and the message simply replaced an older one (Error in chol.default ... the leading minor of order 4 is not positive definite), which we thought was less useful than the one we have now. However, I can also imagine cases where this error appears with basically metric distances, for instance, if you have identical taxa. We do not have your phylogenetic tree, so we do not know which is the case here. If you send the Newick tree (privately, we do not need taxon names), I can have a look at the issue.

If you got as far as the computeDataParameters() function, the tree was technically OK. If there had been a technical problem with tree construction, you would have got the error much earlier (in the Hmsc() model definition).

from hmsc.

saraelshawa commented on July 17, 2024

Thank you Jari for the prompt response. I've emailed you my Newick tree.
I don't think it's a technical problem with the tree construction since I don't get an error earlier.
-Sara

jarioksa commented on July 17, 2024

Sara, thanks for the Newick tree. It was a large one with 1440 taxa. However, when I tried that tree, ape reported that it is not ultrametric:

> library(ape)
> tre <- read.tree("Fryxell_1440_tree.tre")
## NOT ultrametric
> is.ultrametric(tre)
[1] FALSE
## compute the phylogenetic correlations in a similar way as Hmsc does
> phylcor <- vcv(tre)
## phylcor needs to be inverted, and for that its determinant must be > 0
> det(phylcor)
[1] 0
## Finally check the rank
> attr(chol(phylcor, pivot=TRUE), "rank")
[1] 1400
Warning message:
In chol.default(phylcor, pivot = TRUE) :
  the matrix is either rank-deficient or indefinite
## dim is 1440, and rank 1400
> dim(phylcor)
[1] 1440 1440

This looks like several taxa are "identical" or near identical. I don't know how to handle this, but we cannot use that phylogeny. At least the following 40 taxa are duplicates of some previous ones:

> which(duplicated(phylcor))
'BOLD:ABZ7767' 'BOLD:AAA4393' 'BOLD:AAA3933' 'BOLD:AAB8468' 'BOLD:AAA7565' 
             2              3              4             10             25 
'BOLD:AAB4054' 'BOLD:ACE4385' 'BOLD:ACE4734' 'BOLD:ACE7664' 'BOLD:ABY4439' 
            34             63             75             84             90 
'BOLD:AAA8814' 'BOLD:AAB5993' 'BOLD:AAA8386' 'BOLD:AAB6246' 'BOLD:ACE7380' 
           111            112            151            152            153 
'BOLD:ACF1624' 'BOLD:AAB6095' 'BOLD:ACF4111' 'BOLD:AAB4640' 'BOLD:AAC5412' 
           161            188            194            196            197 
'BOLD:AAD7518' 'BOLD:AAA9420' 'BOLD:AAB0268' 'BOLD:ABY9168' 'BOLD:AAI9560' 
           216            246            259            270            277 
'BOLD:ACF0609' 'BOLD:AAB7992' 'BOLD:ABZ7431' 'BOLD:AAB2296' 'BOLD:AAB0890' 
           294            295            320            321            322 
'BOLD:AAB0754' 'BOLD:AAA7669' 'BOLD:ABY7901' 'BOLD:ACE9386' 'BOLD:AAU8534' 
           323            338            339            424            543 
'BOLD:ACF3126' 'BOLD:ADJ1669' 'BOLD:ADI7158' 'BOLD:ABA9093' 'BOLD:ACL7379' 
           775            781            798            940            975 

Without these duplicates, there are no numerical problems.
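
The check behind that statement can be sketched on a toy tree. This is only an illustration under stated assumptions: `drop_duplicate_tips()` is a hypothetical helper (not part of Hmsc or ape), the four-tip tree is made up, and `corr = TRUE` is used on the assumption that correlations rather than covariances are what get inverted. It finds duplicates by exact row equality, so near-identical taxa would not be caught.

```r
library(ape)

# Hypothetical helper: drop every tip whose row in the phylogenetic
# correlation matrix exactly duplicates an earlier row.
drop_duplicate_tips <- function(tre) {
  C <- vcv(tre, corr = TRUE)
  dups <- rownames(C)[as.vector(duplicated(C))]
  if (length(dups) > 0) tre <- drop.tip(tre, dups)
  tre
}

# Made-up tree: A and B are joined by zero-length edges, C and D are distinct.
toy <- read.tree(text = "((A:0,B:0):1,(C:0.5,D:0.5):0.5);")

pruned <- drop_duplicate_tips(toy)
C2 <- vcv(pruned, corr = TRUE)

# Rank from the pivoted Cholesky factor, as in the transcript above
rank2 <- attr(suppressWarnings(chol(C2, pivot = TRUE)), "rank")
print(pruned$tip.label)
print(c(rank = rank2, dim = nrow(C2)))
```

If tips are dropped like this, the matching species columns in the community data would also have to be dropped or merged so that the data and the tree still agree.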

saraelshawa commented on July 17, 2024

Hi Jari,
Thanks for the reply. Apologies I forgot to mention that I tried converting my tree to 'ultrametric' using these suggestions (1, 2) but still had the original error I shared (after checking that the tree is considered ultrametric).

Thanks for looking into this. I'm not sure I fully understand what you mean by "several taxa are 'identical'". Do you mean several leaf nodes have the same taxon name? I double-checked, and I do have 1,440 unique taxon names (BOLD_ids). Or did you mean that several BOLD_ids have exactly the same phylogeny as other BOLD_ids? I should note that I construct the tree from the COI sequences, and I have several BOLD_ids that belong to the same family/genus, so their phylogenies are very similar to one another since they're closely related. Would it not be possible to use HMSC if I have several taxa with closely related phylogenies?

Thanks for your help.

-Sara

jarioksa commented on July 17, 2024

Sara, they are not only "closely related": the tree implies they are identical, that is, one and the same taxon. The edges connecting these tips (taxa) have zero length (that is, no difference), and from the tree's point of view these are the same taxon with several alternative names ("synonyms"). I don't have your sequence data, but this could mean that they also have identical sequences. From the Hmsc point of view, these are then duplicated taxa, and if we have a matrix with duplicated taxa, we cannot use it because the mathematics won't work. Technically, we need to invert the phylogenetic correlation matrix; if it is based on duplicated taxa it will be rank-deficient, and rank-deficient matrices do not have a (normal) inverse.
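
A toy illustration of this point, not the data from the thread: in the made-up three-tip tree below, A and B are connected by zero-length edges, so their rows in the correlation matrix are identical and the matrix cannot be inverted. The tree and the use of `corr = TRUE` are assumptions chosen for the sketch.

```r
library(ape)

# Made-up tree: tips A and B sit on zero-length edges, C is distinct.
toy <- read.tree(text = "((A:0,B:0):1,C:1);")

# Phylogenetic correlation matrix under Brownian motion
C <- vcv(toy, corr = TRUE)
print(C)

# The row for B repeats the row for A ...
dup <- as.vector(duplicated(C))
print(rownames(C)[dup])

# ... so the matrix has rank 2 although it is 3 x 3,
mat_rank <- qr(C)$rank

# and an ordinary Cholesky factorization fails.
chol_failed <- inherits(try(chol(C), silent = TRUE), "try-error")
print(c(rank = mat_rank, dim = nrow(C)))
print(chol_failed)
```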

As a simple check, it seems that the first four entries (tips) in your tree are identical ('BOLD:AAA6619' 'BOLD:ABZ7767' 'BOLD:AAA4393' 'BOLD:AAA3933'). You may check these to see if they are different. If they have different sequences, you need a way to build a tree that shows this difference, or you need a trick that makes them different, such as replacing zero-length edges with a tiny non-zero value (several recognized taxa can share the same barcode). The value really must be tiny, because the shortest non-zero edge length is currently 4.593 × 10^-5. Alternatively, if these are the same taxon, their data should be merged.
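
The edge-length trick could look roughly like this on a toy tree. It is a sketch under assumptions, not a recommended recipe: the tree is made up, and `eps` is an illustrative value that would have to be chosen well below the shortest non-zero edge of the real tree. The resulting matrix is invertible but still close to singular, so numerical care is still needed.

```r
library(ape)

# Made-up tree: tips A and B sit on zero-length edges.
toy <- read.tree(text = "((A:0,B:0):1,C:1);")

# Assumed illustrative value; here the shortest non-zero edge is 1.
eps <- 1e-6

# Replace every zero-length edge with the tiny value
zero <- toy$edge.length == 0
toy$edge.length[zero] <- eps

C <- vcv(toy, corr = TRUE)

# No duplicated rows remain, and the Cholesky factorization now succeeds
n_dup <- sum(duplicated(C))
R <- chol(C)
print(n_dup)
print(C["A", "B"])  # very close to, but below, 1
```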

ovaskain commented on July 17, 2024

saraelshawa commented on July 17, 2024

Hi Jari, thanks for the explanation! I will look into it.
Thanks for the suggestion Otso, I'll run the scripts with a smaller number of taxa before moving on to a bigger dataset.
-Sara

danchurch commented on July 17, 2024

I see this issue was closed, but I am having the same issue with similar data, and I am fairly confident that I have no repeated sequence data in my tree.

I am happy to supply simplified data to recreate the problem. Should I open a new issue, or discuss here?
