BufferedStreams provides buffering for IO operations. It can wrap any IO type automatically, making incremental reading and writing faster.
using Pkg
Pkg.add("BufferedStreams")
Fast composable IO streams
License: MIT License
Hello BioJulia,
I'm fighting with entropy in the Julia world these days. It's becoming very,
very hard to know where to find a recommended / authoritative package among
Julia ones, even in 2019.
Wouldn't this package be better hosted at JuliaIO, so that anyone concerned could find it easily?
Thanks
Do you have a set of benchmarks to show that it's actually faster?
using BufferedStreams
x = "id".*string.(rand(UInt16,1_000_000_000))
fn(x) = begin
    io = BufferedOutputStream(open("c:/data/bin.bin", "w"))
    write.(Ref(io), x)
    close(io)
end
gn(x) = begin
    io = open("c:/data/bin2.bin", "w")
    write.(Ref(io), x)
    close(io)
end
using BenchmarkTools
@btime fn($x)
@btime gn($x)
This doesn't indicate that it's faster on Julia 1.2.
The BufferedStreams documentation should clearly state that a source type T needs to implement a close method, particularly when the source is backed by an underlying IO or file.
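As a rough illustration of what such documentation could show, here is a minimal sketch of a source wrapper that forwards eof and close to the IO it wraps. WrappedSource is a hypothetical name used only for this example. The package-specific readbytes! method is left out on purpose, since its exact signature should be taken from the package itself rather than from this sketch.

```julia
# Hypothetical source type that owns an underlying IO (a file, a socket, ...).
struct WrappedSource{T<:IO}
    inner::T
end

# Forward end-of-file checks to the wrapped IO.
Base.eof(s::WrappedSource) = eof(s.inner)

# Forward close so that closing the source also releases the wrapped IO.
Base.close(s::WrappedSource) = close(s.inner)

src = WrappedSource(IOBuffer("abc"))
close(src)
println(isopen(src.inner))
```

Without the close method, closing a stream built on top of such a source could leave the wrapped file handle open.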
I tried the anchor example found in the documentation and it does not output anything.
julia> t = join(rand([collect('a':'z')... collect('0':'9')...],100))
"oxfj939xjpifeaa0ngk97yu6tywg3syu066ynxfsfnnsh3fhxc0osv7zih8ag3k08cp59upjxpyb8ibdpyx620wbapppmiqng9c1"
julia> stream = BufferedInputStream(IOBuffer(Vector{UInt8}(t)),6)
BufferedInputStream{IOBuffer}(<6 B buffer, 0% filled>)
julia> while !eof(stream)
b = peek(stream)
if '1' <= Char(b) <= '9'
if !isanchored(stream)
anchor!(stream)
end;
elseif isanchored(stream)
println(takeanchored!(stream))
end
read(stream, UInt8)
end
I also had to add the Char to the if statement.
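For comparison, here is a small sketch that collects runs of digits using only Base IO, with no anchors involved. It may be useful as a reference for what the anchor example is expected to print. The function name digit_runs is made up for this post, and note that it treats '0' as a digit too, whereas the loop above starts at '1'.

```julia
# Collect every maximal run of ASCII digits found in `io`.
function digit_runs(io::IO)
    runs = String[]
    current = UInt8[]              # bytes of the run being built
    while !eof(io)
        b = read(io, UInt8)
        if UInt8('0') <= b <= UInt8('9')
            push!(current, b)
        elseif !isempty(current)
            # A non-digit ends the run; copy because String() takes ownership.
            push!(runs, String(copy(current)))
            empty!(current)
        end
    end
    # Flush a run that reaches the end of the stream.
    isempty(current) || push!(runs, String(current))
    return runs
end

foreach(println, digit_runs(IOBuffer("ab12c345d6")))
```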
In addition, I noticed that peek resets the anchor.
julia> t = join(rand([collect('a':'z')... collect('0':'9')...],100))
"osiaanxkireq3mknd8gakx3g5uwnu2mkxdw6h6tyc6s5m5nhitgle6nb0iq7jyeksbj527wmp0dtlq0mj9kn3zbvlw49u92eeqhh"
julia> stream = BufferedInputStream(IOBuffer(Vector{UInt8}(t)),6)
BufferedInputStream{IOBuffer}(<6 B buffer, 0% filled>)
julia> peek(stream)
0x6f
julia> stream.buffer
6-element Vector{UInt8}:
0x6f
0x73
0x69
0x61
0x61
0x6e
julia> anchor!(stream)
1
julia> isanchored(stream)
true
julia> peek(stream)
0x6f
julia> isanchored(stream)
false
Version:
[e1450e63] BufferedStreams v1.0.0
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml
to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix
on this issue.
I'll open a PR within a few hours, please be patient!
BufferedStreams extends Base.nb_available for BufferedInputStream. In Julia 0.7, Base.nb_available is deprecated, but extending it gives no warning since the function itself still exists. However, in current master, and thus in Julia 1.0, the deprecation is removed, so the function no longer exists at all and cannot be extended, giving a load error from using BufferedStreams.
_ _ _(_)_ | A fresh approach to technical computing
(_) | (_) (_) | Documentation: https://docs.julialang.org
_ _ _| |_ __ _ | Type "?" for help, "]?" for Pkg help.
| | | | | | |/ _` | |
| | |_| | | | (_| | | Version 1.0.0-rc1.5 (2018-08-07 20:49 UTC)
_/ |\__'_|_|_|\__'_| | Commit d038f2f (0 days old master)
|__/ | x86_64-linux-gnu
julia> using BufferedStreams
[ Info: Precompiling BufferedStreams [e1450e63-4bb3-523b-b2a4-4ffa8c0fd77d]
ERROR: LoadError: LoadError: UndefVarError: nb_available not defined
Stacktrace:
[1] getproperty(::Module, ::Symbol) at ./sysimg.jl:13
[2] top-level scope at none:0
[3] include at ./boot.jl:317 [inlined]
[4] include_relative(::Module, ::String) at ./loading.jl:1038
[5] include at ./sysimg.jl:29 [inlined]
[6] include(::String) at /home/gunnar/.julia/dev/BufferedStreams/src/BufferedStreams.jl:3
[7] top-level scope at none:0
[8] include at ./boot.jl:317 [inlined]
[9] include_relative(::Module, ::String) at ./loading.jl:1038
[10] include(::Module, ::String) at ./sysimg.jl:29
[11] top-level scope at none:2
[12] eval at ./boot.jl:319 [inlined]
[13] eval(::Expr) at ./client.jl:389
[14] top-level scope at ./none:3
in expression starting at /home/gunnar/.julia/dev/BufferedStreams/src/bufferedinputstream.jl:112
in expression starting at /home/gunnar/.julia/dev/BufferedStreams/src/BufferedStreams.jl:38
ERROR: Failed to precompile BufferedStreams [e1450e63-4bb3-523b-b2a4-4ffa8c0fd77d] to /home/gunnar/.julia/compiled/v1.0/BufferedStreams/wMWKi.ji.
Stacktrace:
[1] error(::String) at ./error.jl:33
[2] macro expansion at ./logging.jl:313 [inlined]
[3] compilecache(::Base.PkgId, ::String) at ./loading.jl:1184
[4] _require(::Base.PkgId) at ./logging.jl:311
[5] require(::Base.PkgId) at ./loading.jl:852
[6] macro expansion at ./logging.jl:311 [inlined]
[7] require(::Module, ::Symbol) at ./loading.jl:834
The obvious solution would be to rename nb_available to bytesavailable like in Base. What is less obvious is whether BufferedStreams wants to provide a deprecation of its own and from which Julia version to make the name switch.
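One possible way to make the switch without a hard version cutoff is to pick the function to extend at load time. The sketch below is only an illustration under that assumption: ToyStream is a hypothetical stand-in, not the package's BufferedInputStream, and its fields are invented for the example.

```julia
# Hypothetical stand-in for a buffered stream; not the package's real type.
struct ToyStream <: IO
    buffer::Vector{UInt8}
    position::Int          # index of the next unread byte
end

# Extend whichever name this Julia version provides.
@static if isdefined(Base, :bytesavailable)
    Base.bytesavailable(s::ToyStream) = length(s.buffer) - s.position + 1
else
    Base.nb_available(s::ToyStream) = length(s.buffer) - s.position + 1
end

println(bytesavailable(ToyStream(UInt8[1, 2, 3], 2)))
```

On Julia 0.7 both names exist, so this picks bytesavailable there and avoids relying on the deprecated one.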
Hi All,
Recently, I was implementing a PDF file parser, and as part of the development I needed support for various data decoding filters while reading the data using BufferedInputStream. For example, data read from the file was in bin hexdump, Base64 encoding, or run length encoding.
I had to implement a few sources for this. However, I am wondering whether some of the filters could be developed as part of this package itself, as I am sure others would benefit from the added functionality.
regards,
Sambit
Hello!
I'm trying to parse a buffered file, but
skipchars(isspace, BufferedInputStream(IOBuffer(" test")))
yields ERROR: ArgumentError: n must be non-negative in skip(::BufferedInputStream, n)
due to
https://github.com/BioJulia/BufferedStreams.jl/blob/b85ff82efcd95d3a4d72a4021f0833b579554bec/src/bufferedinputstream.jl#L126 which is different from Julia/base/iobuffer.jl
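As far as I can tell, Base's skipchars reads one character past the run it skips and then steps back with a negative skip, which is what triggers the error. A possible workaround is to peek at each character before consuming it, so no backward skip is needed. This is only a sketch: skipchars_peek is a hypothetical helper, it assumes Julia 1.5 or later for peek(io, Char), it is shown on a plain IOBuffer, and I have not checked how well peek(io, Char) behaves on a BufferedInputStream.

```julia
# Skip leading characters for which `predicate` is true, without seeking back.
function skipchars_peek(predicate, io::IO)
    while !eof(io)
        c = peek(io, Char)       # look at the next character, do not consume it
        predicate(c) || break    # stop at the first character to keep
        read(io, Char)           # consume the character just inspected
    end
    return io
end

io = skipchars_peek(isspace, IOBuffer("   test"))
println(read(io, String))
```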
The source IO interface today is expected to implement a readbytes! interface. The readbytes! interface is a pull interface, which essentially means that the BufferedInputStream dictates how many bytes the source must provide.
Imagine a source like a codec. The source does not know the data size until the data is decoded, but the BufferedInputStream may request fewer bytes than were decoded. In this case, the source needs to maintain its own housekeeping buffer to retain the leftover bytes. While this exists in most codec implementations, ideally a buffer management system like BufferedInputStream should be flexible enough either to request a size from the source or to let the source request additional bytes. That way, sources would not have to maintain their own housekeeping buffers.
One example is the ReadFile call in the Windows API, where passing NULL for bytes read will provide the size of the buffer required.
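To make the housekeeping concrete, here is a small sketch of a pull-style source that decodes more bytes than were requested and has to hold on to the remainder. Everything in it is hypothetical: RLESource, its (count, byte) input format, and the pull! function are invented for illustration, and pull! is only loosely modelled on a "fill buffer[from:to]" style of call.

```julia
# Hypothetical run-length source: the input is a sequence of (count, byte) pairs.
mutable struct RLESource{T<:IO}
    input::T
    pending::Vector{UInt8}     # decoded bytes not yet handed to the consumer
end
RLESource(input::IO) = RLESource(input, UInt8[])

# Pull-style read: fill out[from:to] and return how many bytes were written.
function pull!(src::RLESource, out::Vector{UInt8}, from::Int, to::Int)
    written = 0
    while from + written <= to
        if isempty(src.pending)
            eof(src.input) && break
            count = read(src.input, UInt8)
            value = read(src.input, UInt8)
            # Decoding one pair may yield more bytes than the caller asked for.
            append!(src.pending, fill(value, count))
        end
        n = min(length(src.pending), to - (from + written) + 1)
        copyto!(out, from + written, src.pending, 1, n)
        deleteat!(src.pending, 1:n)   # whatever is left waits for the next call
        written += n
    end
    return written
end

src = RLESource(IOBuffer(UInt8[3, 0x61, 2, 0x62]))
out = zeros(UInt8, 4)
println(pull!(src, out, 1, 4), " bytes written, ", length(src.pending), " left over")
```

The pending field is exactly the kind of per-source buffer that a push-capable interface could make unnecessary.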
Expected it to work with ZipFile.jl which is using an io stream
Errors
Implement the seekend method
using ZipFile, CSV, DataFrames, BufferedStreams
a = DataFrame(a = 1:3)
CSV.write("c:/data/a.csv", a)
# zip the file; Windows users who do not have zip available on the PATH can manually zip the CSV
;zip c:/data/a.zip c:/data/a.csv
io = BufferedInputStream(open("c:/data/a.zip", "r"))
z = ZipFile.Reader(io)
df = CSV.read(z.files[1])
close(io)
I get this error
ERROR: MethodError: no method matching seekend(::BufferedInputStream{IOStream})
Closest candidates are:
seekend(::Base.SecretBuffer) at secretbuffer.jl:150
seekend(::Base.Filesystem.File) at filesystem.jl:227
seekend(::IOStream) at iostream.jl:141
...
Trying to use BufferedStreams in the context of reading from zip files using ZipFile.jl
Julia 1.3-rc1
Windows 10
ZipFile v0.8.3
BufferedStreams v1.0.0
[69666777] Arrow v0.2.3
[c52e3926] Atom v0.10.1
[39de3d68] AxisArrays v0.3.3
[fbb218c0] BSON v0.2.3
[6e4b80f9] BenchmarkTools v0.4.3
[9e28174c] BinDeps v0.8.10
[b99e7846] BinaryProvider v0.5.6
[163b9779] Blobs v0.3.0
[a74b3585] Blosc v0.5.1
[5f4fecfd] BrowseTables v0.3.0
[e1450e63] BufferedStreams v1.0.0
[336ed68f] CSV v0.5.11 [C:\Users\RTX2080\.julia\dev\CSV]
[324d7699] CategoricalArrays v0.5.5
[aaaa29a8] Clustering v0.13.3
[944b1d66] CodecZlib v0.6.0
[34da2185] Compat v2.1.0
[3a865a2d] CuArrays v1.2.1
[d58978e5] Dagger v0.8.0
[9a962f9c] DataAPI v1.0.1
[a93c6f00] DataFrames v0.19.4
[1313f7d8] DataFramesMeta v0.5.0
[864edb3b] DataStructures v0.17.0
[e7dc6d0d] DataValues v0.4.12
[31a5f54b] Debugger v0.6.1
[7806a523] DecisionTree v0.8.3
[b4f34e82] Distances v0.8.2
[31c24e10] Distributions v0.21.1
[becb17da] Feather v0.5.3
[5789e2e9] FileIO v1.0.7
[53afe959] FlatBuffers v0.5.3
[587475ba] Flux v0.9.0
[01a6e8c0] FstFileFormat v0.1.0
[38e38edf] GLM v1.3.1
[28b8d3ca] GR v0.41.0
[4d00f742] GeometryTypes v0.7.6
[708ec375] Gumbo v0.5.1
[cd3eb016] HTTP v0.8.6
[7073ff75] IJulia v1.20.0
[6218d12a] ImageMagick v0.7.5
[916415d5] Images v0.18.0
[5903a43b] Infiltrator v0.1.0 [C:\Users\RTX2080\.julia\dev\Infiltrator]
[7d512f48] InternedStrings v0.7.0
[41ab1584] InvertedIndices v1.0.0
[82899510] IteratorInterfaceExtensions v1.0.0
[babc3d20] JDF v0.1.0 [c:/Users/RTX2080\\git\\JDF\\]
[9da8a3cd] JLSO v1.1.0
[e5e0dc1b] Juno v0.7.2
[b964fa9f] LaTeXStrings v1.0.3
[50d2b5c4] Lazy v0.14.0
[add582a8] MLJ v0.2.3
[f28f55f0] Memento v0.12.1
[e1d29d7a] Missings v0.4.2
[9b87118b] PackageCompiler v0.6.4+ #sd-notomls (https://github.com/JuliaLang/PackageCompiler.jl.git)
[d96e819e] Parameters v0.11.0
[626c502c] Parquet v0.3.0
[91a5bcdd] Plots v0.26.3
[2dfb63ee] PooledArrays v0.5.2
[08abe8d2] PrettyTables v0.5.1
[6f49c342] RCall v0.13.4
[ce6b1742] RDatasets v0.6.1
[17b45ede] RLEVectors v0.8.1
[189a3867] Reexport v0.2.0
[295af30f] Revise v2.1.10
[6e75b9c4] ScikitLearnBase v0.5.0
[a2af1166] SortingAlgorithms v0.3.1
[2913bbd2] StatsBase v0.32.0 [c:/git/StatsBase.jl]
[f3b207a7] StatsPlots v0.12.0
[fd094767] Suppressor v0.1.1
[70df011a] TableReader v0.4.0
[3783bdb8] TableTraits v1.0.0
[40c74d1a] TableView v0.4.1
[bd369af6] Tables v0.2.11
[f269a46b] TimeZones v0.9.2
[9f7883ad] Tracker v0.2.3
[3bb67fe8] TranscodingStreams v0.9.5
[b8865327] UnicodePlots v1.1.0
[34922c18] VisualRegressionTests v0.3.1
[ea10d353] WeakRefStrings v0.6.1
[a5390f91] ZipFile v0.8.3
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[37e2e46d] LinearAlgebra
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[4ec0a83e] Unicode
I think it would make sense to move this over to JuliaIO along with TranscodingStreams and all the CodecX packages.
I like to use BufferedStreams for printing to stdout because it's much faster. However, if I forget to flush the buffer, then not all of the results get printed to stdout.
using Random, BufferedStreams
let ss = [randstring(20) for _ in 1:10^5]
    @time foreach(println, ss) # 5.463939 seconds (708.34 k allocations: 16.149 MiB)
end
# Incomplete result
let ss = [randstring(20) for _ in 1:10^5]
    io = BufferedOutputStream(stdout)
    @time foreach(ss) do s
        println(io, s)
    end # 0.554888 seconds (2.63 k allocations: 180.951 KiB, 2.34% compilation time)
    # BUT it's incomplete
end;
let ss = [randstring(20) for _ in 1:10^5]
    io = BufferedOutputStream(stdout)
    @time begin
        foreach(ss) do s
            println(io, s)
        end
        flush(io)
    end # 0.547968 seconds (2.64 k allocations: 181.054 KiB, 2.37% compilation time)
end;
What is a good solution to forgetting to flush the buffer?
One approach would be a do block that introduces the buffer at the beginning and flushes it at the end.
function buffering(f, out)
    io = BufferedOutputStream(out)
    try
        f(io)
    finally
        flush(io)
    end
end
let ss = [randstring(20) for _ in 1:10^5]
    @time buffering(stdout) do io
        foreach(ss) do s
            println(io, s)
        end
    end # 0.586527 seconds (8.66 k allocations: 712.136 KiB, 3.35% compilation time)
end;
I noticed this when rewriting peek in #77. Running peek (which calls mark and reset) sometimes seems to corrupt the buffer so that the subsequent read fails:
julia> io = BufferedInputStream(IOBuffer("α∆"), 1)
BufferedInputStream{IOBuffer}(<1 B buffer, 0% filled>)
julia> read(io, Char)
'α': Unicode U+03B1 (category Ll: Letter, lowercase)
julia> peek(io, Char)
'∆': Unicode U+2206 (category Sm: Symbol, math)
julia> read(io, Char)
ERROR: EOFError: read end of file
Stacktrace:
[1] read
@ ~/.julia/packages/BufferedStreams/0mu59/src/bufferedinputstream.jl:186 [inlined]
[2] read(io::BufferedInputStream{IOBuffer}, #unused#::Type{Char})
@ Base ./io.jl:789
[3] top-level scope
@ REPL[72]:1
PR #77 avoids this problem specifically for peek(io, Char) since it adds an optimized implementation of that function, but it would be good to identify and solve the underlying problem.
>>> 'Pkg.add("BufferedStreams")' log
INFO: Installing BufferedStreams v0.1.4
INFO: Package database updated
INFO: METADATA is out-of-date — you may not have the latest version of BufferedStreams
INFO: Use `Pkg.update()` to get the latest versions of your packages
>>> 'Pkg.test("BufferedStreams")' log
Julia Version 0.4.6
Commit 2e358ce (2016-06-19 17:16 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.3
INFO: Computing test dependencies for BufferedStreams...
INFO: Installing BaseTestNext v0.2.1
INFO: Testing BufferedStreams
seekforward: Error During Test
Test threw an exception of type ArgumentError
Expression: all(Bool[test_seekforward(stream,position,offset) for (position,offset) = zip(positions,offsets)])
ArgumentError: n must be non-negative in seekforward
in test_seekforward at /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl:239
in anonymous at /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:165
in do_test at /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:181
[inlined code] from /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl:248
in anonymous at no file:0
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:320
in process_options at ./client.jl:280
in _start at ./client.jl:378
Test Summary: | Pass Error Total
BufferedInputStream | 67 1 68
read | 8 8
peek | 4 4
peekbytes! | 10 10
readbytes! | 10 10
readuntil | 2 2
arrays | 2 2
marks | 7 7
anchors | 1 1
seek | 2 2
seekforward | 2 1 3
close | 5 5
iostream | 11 11
misc. | 3 3
ERROR: LoadError: Some tests did not pass: 67 passed, 0 failed, 1 errored.
in finish at /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:385
[inlined code] from /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:559
in anonymous at no file:0
in include at ./boot.jl:261
in include_from_node1 at ./loading.jl:320
in process_options at ./client.jl:280
in _start at ./client.jl:378
while loading /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl, in expression starting on line 18
===========================[ ERROR: BufferedStreams ]===========================
failed process: Process(`/home/vagrant/julia/bin/julia --check-bounds=yes --code-coverage=none --color=no /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl`, ProcessExited(1)) [1]
================================================================================
INFO: Removing BaseTestNext v0.2.1
ERROR: BufferedStreams had test errors
in error at ./error.jl:21
in test at pkg/entry.jl:803
in anonymous at pkg/dir.jl:31
in cd at file.jl:22
in cd at pkg/dir.jl:31
in test at pkg.jl:71
in process_options at ./client.jl:257
in _start at ./client.jl:378
>>> End of log
Will let you know if it happens often - keep an eye on http://pkg.julialang.org/detail/BufferedStreams.html
Some links broke when this package was moved from BioJulia to JuliaIO. For instance, the badges on this page: https://juliapackages.com/p/bufferedstreams
I'm reasonably certain this package is deprecated in favor of TranscodingStreams and has been for a few years. Should we just archive it and put a big, fat sign up on the README that redirects people to TranscodingStreams?