bcthomas / pullseq
Utility program for extracting sequences from a fasta/fastq file
License: Other
Hi,
Running
https://github.com/bcthomas/pullseq.git
at the shell returns
bash: https://github.com/bcthomas/pullseq.git: No such file or directory
Additionally, after downloading the code, I don't see ./configure.
Am I doing something wrong?
Kindly help.
Hi,
I am trying to filter a MiSeq Index file based on a set of headers using the following command:
pullseq -i HallS-pool_S1_L001_I1_001.fastq -n IDs.txt -v > barcodes.fastq
This is the output while pullseq is running:
verbose flag is set
Input is HallS-pool_S1_L001_I1_001.fastq
Names in IDs.txt will be included
Output will be 50 columns long
done reading from input (19467839 entries)
Input is FASTQ format
Processed 0 entries
Pulled 0 entries
Why might this be happening? Here is an example of the index fastq file:
@M00366:86:000000000-AJP15:1:1101:18035:1000 1:N:0:1
ATAGGAATAACC
+
CCCCCGGGGGCD
@M00366:86:000000000-AJP15:1:1101:14066:1000 1:N:0:1
GTGAGGTTCGGC
+
6A--6C@,C++F
@M00366:86:000000000-AJP15:1:1101:14848:1000 1:N:0:1
ATAATTGCCGAG
+
CCCCCGGGGGGG
@M00366:86:000000000-AJP15:1:1101:18086:1000 1:N:0:1
TCTCTACAAGTA
+
8----;-,--;-
@M00366:86:000000000-AJP15:1:1101:16316:1000 1:N:0:1
Here is what the IDs.txt file looks like:
@M00366:86:000000000-AJP15:1:1101:18846:1146 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:17470:1146 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:18794:1147 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:9220:1147 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:9734:1147 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:12133:1147 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:17621:1148 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:20761:1148 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:11504:1148 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:19907:1149 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:17935:1149 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:17274:1149 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:10546:1149 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:13379:1149 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:16248:1149 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:13417:1149 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:11550:1150 1:N:0:1
@M00366:86:000000000-AJP15:1:1101:16087:1150 1:N:0:1
Is this structured okay? Why might no output be generated? I've been searching all over for a tool like this and I was hoping this would do the trick. I hope I can get this to work.
Thank you for your help,
Colleen
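The IDs.txt lines above still carry the leading '@' and the " 1:N:0:1" description. Assuming pullseq matches on the bare read name (no '@', nothing after the first whitespace), which is an assumption not confirmed in this thread, the list could be normalized before use. A minimal C sketch; normalize_id is a hypothetical helper name, not part of pullseq:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper (not pullseq code): normalize one ID-list line in
 * place, ASSUMING the matcher expects the bare read name.
 * Drops a leading FASTQ '@' or FASTA '>' marker and cuts the line at the
 * first whitespace, which also removes the description and the newline.
 * Returns a pointer into the caller's buffer. */
char *normalize_id(char *line)
{
    assert(line != NULL);
    if (line[0] == '@' || line[0] == '>')
        line++;
    line[strcspn(line, " \t\r\n")] = '\0';
    return line;
}
```

Applied to the first line of IDs.txt, this yields M00366:86:000000000-AJP15:1:1101:18846:1146.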
Hi @bcthomas
Thanks for your work on pullseq. I maintain pullseq in Debian, and we are now in the process of removing pcre, which has been unmaintained for several years. Could you please port the code to the newer pcre2?
Regards,
Nilesh
Hi,
I am trying to convert a fastq file to fasta, but some sequences are not converted; see below.
My input file: test.fastq (14 sequences)
@M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (2+)
HKQCQNYNSSVR_ACKNLLYQARQQYKTKYKYRTRASILCNRCHNRGYKTSIL_RQ_NRLE_DFTRG
+
@M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (3+)
TNNAKIIIVQLDKPVRIYCTRPGNNTRQSISIGPGRAFYVTGVITGDIRQAYCNVSRTDWNKILQE
+
@M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (3-)
LL_NLIPICSTDVTICLSYIPCYDTCYIKCSPWSYTYTLSCIVAGPGTINSYRLI_LNYYNFGIVC
+
@M00842:73:000000000-A6TKT:1:1101:17239:1665 1:N:0:0 (2+)
HKQCQNYNSTVSHPCKN_FFHARQQYKKKCYVWTRANIFCNR_HNRGYKTSTL_YLLKRLE_DFTRG
+
@M00842:73:000000000-A6TKT:1:1101:17605:1728 1:N:0:0 (2+)
HKQCQNYNSTVSHACKN_LFQAWQQYKKECKDRTRANILCNR_HNRGYKTSTL_CQ_NRLE*DFTRG
+
@M00842:73:000000000-A6TKT:1:1101:17605:1728 1:N:0:0 (3+)
TNNAKIIIVQLATPVRINCSRPGNNTRKSVRIGPGQTFYATGDIIGGIRRAHCNVSRTDWNKTLQEV
+
@M00842:73:000000000-A6TKT:1:1101:17605:1728 1:N:0:0 (2-)
YLL_SLIPICSTDITMCSSYTPYYVTCCIKCLPWSYPYTLSCIVARPGTINSYRRG_LYYYNFGIVC
+
@M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (2+)
HKQCQNYNSSVR_ACKNLLYQARQQYKTKYKYRTRASILCNRCHNRGYKTSIL_RQ_NRLE_DFTRG
+
@M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (3+)
TNNAKIIIVQLDKPVRIYCTRPGNNTRQSISIGPGRAFYVTGVITGDIRQAYCNVSRTDWNKILQE
+
@M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (3-)
LL_NLIPICSTDVTICLSYIPCYDTCYIKCSPWSYTYTLSCIVAGPGTINSYRLI_LNYYNFGIVC
+
@M00842:73:000000000-A6TKT:1:1101:15135:1817 1:N:0:0 (2+)
HKQCQNYNSTVSYACKN_LFQAWQQYKKECKDRTRANILCNR_HNRGYKTSTL_CQ_NRLE*DFTRG
+
@M00842:73:000000000-A6TKT:1:1101:15135:1817 1:N:0:0 (3+)
TNNAKIIIVQLATPVRINCSRPGNNTRKSVRIGPGQTFYATGDIIGDIRRAHCNVSRTDWNKTLQEV
+
@M00842:73:000000000-A6TKT:1:1101:15135:1817 1:N:0:0 (2-)
YLL_SLIPICSTDITMCSSYIPYYVTCCIECLPWSYPYTLSCIVARPGTINSYRRS_LYYYNFGIVC
+
@M00842:73:000000000-A6TKT:1:1101:13686:1838 1:N:0:0 (2+)
HKQCQNYNSTVR_ACKN_LYQAWQQYKTKYKYRTRASILCNR_HNRGYKTSIL_CQ_NRVE_DFTRG
+
Then I run this command: ./pullseq -i test.fastq -c -l 150
And the output:
M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (2+)
HKQCQNYNSSVR_ACKNLLYQARQQYKTKYKYRTRASILCNRCHNRGYKTSIL_RQ_NRLE_DFTRG
M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (3-)
LL_NLIPICSTDVTICLSYIPCYDTCYIKCSPWSYTYTLSCIVAGPGTINSYRLI_LNYYNFGIVC
M00842:73:000000000-A6TKT:1:1101:17605:1728 1:N:0:0 (2+)
HKQCQNYNSTVSHACKN_LFQAWQQYKKECKDRTRANILCNR_HNRGYKTSTL_CQ_NRLE_DFTRG
M00842:73:000000000-A6TKT:1:1101:17605:1728 1:N:0:0 (2-)
YLL_SLIPICSTDITMCSSYTPYYVTCCIKCLPWSYPYTLSCIVARPGTINSYRRG_LYYYNFGIVC
M00842:73:000000000-A6TKT:1:1101:18725:1757 1:N:0:0 (3+)
TNNAKIIIVQLDKPVRIYCTRPGNNTRQSISIGPGRAFYVTGVITGDIRQAYCNVSRTDWNKILQE
M00842:73:000000000-A6TKT:1:1101:15135:1817 1:N:0:0 (2+)
HKQCQNYNSTVSYACKN_LFQAWQQYKKECKDRTRANILCNR_HNRGYKTSTL_CQ_NRLE_DFTRG
M00842:73:000000000-A6TKT:1:1101:15135:1817 1:N:0:0 (2-)
YLL_SLIPICSTDITMCSSYIPYYVTCCIECLPWSYPYTLSCIVARPGTINSYRRS_LYYYNFGIVC
Only 7 sequences were converted. Why?
Thank you
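In test.fastq, every '+' line is followed directly by the next header, so the records have no quality lines, and the 7 output records are exactly the 1st, 3rd, 5th, 7th, 9th, 11th and 13th input records. One reader model consistent with that pattern (an assumption about the parser, not taken from pullseq's source) takes the line after '+' as quality whatever it contains, i.e. the next header, and then skips ahead to the next line starting with '@'. A minimal C sketch of that hypothetical model; count_records is an illustrative name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reader model, NOT pullseq's actual parser.
 * A record starts at a line beginning with '@'; the following lines are
 * taken as sequence, '+' separator and quality, in that order, without
 * checking what the "quality" line contains. Lines that do not start with
 * '@' are skipped while looking for the next header.
 * Returns the number of records such a reader would emit. */
size_t count_records(const char *const *lines, size_t n)
{
    size_t i = 0, records = 0;

    assert(lines != NULL || n == 0);
    while (i < n) {
        if (lines[i][0] != '@') {   /* not a header: skip while resyncing */
            i++;
            continue;
        }
        if (i + 3 >= n)             /* need header, sequence, '+', quality */
            break;
        records++;                  /* lines[i + 3] is consumed as quality */
        i += 4;
    }
    return records;
}
```

For 14 three-line records laid out like test.fastq, this model emits 7 records, while a well-formed four-line file keeps every record.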
Hello,
While running pullseq on aarch64, I noticed that it loops forever, reallocating memory, in the getl function (file_read.c).
It never ends because ch (line 23) is declared as char but should be int: char is unsigned on ARM, so EOF is not detected as expected.
See: https://stackoverflow.com/questions/13694394/while-c-getcfile-eof-loop-wont-stop-executing
Regards,
Eric
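The point about ch can be illustrated with a getl-style line reader. This is a minimal sketch, not the actual file_read.c code; read_line and its signature are hypothetical. Keeping ch an int lets the EOF value returned by getc stay distinct from every valid byte, whether plain char is signed or unsigned (it is unsigned on aarch64 Linux).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical getl-style reader (not pullseq's code): returns a malloc'd
 * line without its trailing newline, or NULL at end of input or on
 * allocation failure. If ch were a plain char on a platform where char is
 * unsigned, EOF (-1) would be stored as 255, never compare equal to EOF,
 * and the loop below would keep growing the buffer forever. */
char *read_line(FILE *fp)
{
    size_t cap = 64, len = 0;
    char *buf = malloc(cap);
    int ch;                          /* int, not char: must be able to hold EOF */

    assert(fp != NULL);
    if (buf == NULL)
        return NULL;
    while ((ch = getc(fp)) != EOF && ch != '\n') {
        if (len + 1 >= cap) {        /* keep room for the terminating '\0' */
            char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
        buf[len++] = (char)ch;
    }
    if (ch == EOF && len == 0) {     /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

The caller frees each returned line; a final line without a trailing newline is still returned before NULL signals the end of input.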
While making pullseq with gcc 10.2.0, I got the following error during linking:
...
gcc -g -O2 -o pullseq hash.o output.o size_filter.o search_header.o file_read.o pull_by_re.o pull_by_name.o pull_by_size.o pullseq.o -lpcre -lz
/usr/bin/ld: output.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
/usr/bin/ld: output.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: output.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: size_filter.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: size_filter.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
/usr/bin/ld: size_filter.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: search_header.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: search_header.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
/usr/bin/ld: search_header.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: file_read.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: file_read.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
/usr/bin/ld: file_read.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: pull_by_re.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: pull_by_re.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: pull_by_re.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
/usr/bin/ld: pull_by_name.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: pull_by_name.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: pull_by_name.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
/usr/bin/ld: pull_by_size.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: pull_by_size.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: pull_by_size.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
/usr/bin/ld: pullseq.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: multiple definition of `progname'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:26: first defined here
/usr/bin/ld: pullseq.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: multiple definition of `verbose_flag'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:28: first defined here
/usr/bin/ld: pullseq.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: multiple definition of `QUALITY_SCORE'; hash.o:/home/kinestetika/bin/util/pullseq/src/global.h:27: first defined here
collect2: error: ld returned 1 exit status
make[2]: *** [Makefile:376: pullseq] Error 1
make[2]: Leaving directory '/home/kinestetika/bin/util/pullseq/src'
make[1]: *** [Makefile:283: all] Error 2
make[1]: Leaving directory '/home/kinestetika/bin/util/pullseq/src'
make: *** [Makefile:344: all-recursive] Error 1
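Starting with GCC 10, -fno-common is the default, so a global variable defined (rather than only declared) in a header that several .c files include is defined again in every object file, which matches the errors above for progname, QUALITY_SCORE and verbose_flag in global.h. The usual fix is to declare the variables extern in the header and define them once in a single .c file; if the header uses uninitialized (tentative) definitions, building with CFLAGS=-fcommon restores the old merging behaviour as a stopgap. A minimal single-file sketch of the fix; the types, initial values and the set_verbose function are assumptions for illustration, since the contents of global.h are not shown here:

```c
#include <assert.h>
#include <stddef.h>

/* --- global.h (sketch): declarations only, so every file that includes it
 *     refers to the same objects instead of defining its own copy.
 *     The types are ASSUMED; the real global.h is not shown. --- */
extern char *progname;
extern int QUALITY_SCORE;
extern int verbose_flag;

/* --- exactly one .c file (e.g. pullseq.c) provides the definitions.
 *     The initial values are hypothetical. --- */
char *progname = NULL;
int QUALITY_SCORE = 0;
int verbose_flag = 0;

/* --- any other .c file just includes global.h and uses the variables;
 *     set_verbose is a hypothetical example of such a use. --- */
void set_verbose(int on)
{
    assert(on == 0 || on == 1);
    verbose_flag = on;
}
```

With the header reduced to extern declarations, each object file only references the symbols, and the linker sees a single definition of each.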
configure.ac:7: error: possibly undefined macro: AM_INIT_AUTOMAKE
If this token and others are legitimate, please use m4_pattern_allow.
See the Autoconf documentation.
When run manually, it creates the configure file as expected; however, I am writing a script to include your software in the Homebrew package management system, and this error causes the script to terminate (aborting the install).
It may be worth mentioning in the documentation that autoconf must be run first, as the 1.0.0 tarball does not include the generated configure file.
Hi Brian,
I am writing about the seqdiff command, with which I am unable to produce the expected results.
For example, I duplicated a fastq file (i.e., with cp) and ran the following command:
seqdiff -1 file1.fq -2 file1b.fq -s
and received the following summary output:
first_file_total = 4255201
first_file_uniq = 0
second_file_total = 4255201
second_file_uniq = 0
common = 2250574
I then created test fastq files with 7 reads, the only difference being a deletion of the first 4 bases in the first read of the duplicate fastq file.
I received the following output summary (expected values in parentheses):
first_file_total = 7
first_file_uniq = 7 (1)
second_file_total = 0 (7)
second_file_uniq = 0 (1)
common = 0 (6)
Any thoughts or help would be greatly appreciated.
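The expected values above (6 common, 1 unique read per file) correspond to comparing reads by their sequence (or full record), so that the read with the 4-base deletion counts as different even though its name is unchanged. A minimal sketch of that comparison, assuming reads are compared by exact sequence string and neither file contains duplicate sequences; this is not seqdiff's implementation, and diff_summary and struct summary are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary mirroring the fields seqdiff -s prints. */
struct summary {
    size_t first_total, first_uniq, second_total, second_uniq, common;
};

/* Number of sequences in a that also occur somewhere in b.
 * O(na * nb) string comparisons: fine for small test files only. */
static size_t count_shared(const char *const *a, size_t na,
                           const char *const *b, size_t nb)
{
    size_t shared = 0;
    for (size_t i = 0; i < na; i++)
        for (size_t j = 0; j < nb; j++)
            if (strcmp(a[i], b[j]) == 0) {
                shared++;
                break;
            }
    return shared;
}

/* ASSUMES no duplicate sequences within either file. */
struct summary diff_summary(const char *const *a, size_t na,
                            const char *const *b, size_t nb)
{
    struct summary s;

    assert((a != NULL || na == 0) && (b != NULL || nb == 0));
    s.first_total = na;
    s.second_total = nb;
    s.common = count_shared(a, na, b, nb);
    s.first_uniq = na - s.common;
    s.second_uniq = nb - count_shared(b, nb, a, na);
    return s;
}
```

Under these assumptions, two identical files give common equal to the total and zero unique reads, and the 7-read test gives common = 6 with one unique read per file.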
Hi,
While running ./configure, I got a message that libpcre2 is missing. I installed it with conda:
>conda install -c anaconda pcre2
And updated the CFLAGS variable as described in the README:
> pcre-config --cflags
-I/home/sochkalova/miniconda3/envs/das_tool/include
>export CFLAGS="-I/home/sochkalova/miniconda3/envs/das_tool/include"
>./configure
That gave me the same message that libpcre2 is not installed. I don't understand what to do. Can you please help?
P.S. I am working on a server where I don't have sudo access (just in case).