bufferedstreams.jl's Introduction

BufferedStreams

Description

BufferedStreams provides buffering for IO operations. It can wrap any IO type, automatically making incremental reading and writing faster.
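
As a rough illustration of the wrapping described above, here is a minimal sketch that uses only calls appearing elsewhere on this page (BufferedInputStream, BufferedOutputStream, eof, read, write, flush). The in-memory IOBuffer stands in for a file or socket, and the variable names are made up for the example.

```julia
using BufferedStreams

# Reading: wrap an IO (an in-memory IOBuffer here) and consume it byte by byte.
input = BufferedInputStream(IOBuffer("buffered"))
bytes = UInt8[]
while !eof(input)
    push!(bytes, read(input, UInt8))
end
close(input)

# Writing: bytes are collected in the buffer and only reach the wrapped IO
# once the buffer fills up or is flushed explicitly.
sink = IOBuffer()
output = BufferedOutputStream(sink)
write(output, "buffered")
flush(output)
written = String(take!(sink))
```

Small incremental reads and writes like these are where buffering is expected to help, since most of them touch only the in-memory buffer.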

Installation

using Pkg
Pkg.add("BufferedStreams")

bufferedstreams.jl's People

Contributors

bicycle1885, dcjones, dependabot[bot], drvi, femtocleaner[bot], gunnarfarneback, hydrotoast, jkroso, joshbode, kdm9, kescobo, kristofferc, mortenpi, musm, quinnj, ranocha, ringw, sambitdash, simonster, sjkelly, stevengj, tkelman, tkoolen

bufferedstreams.jl's Issues

Package should belong to JuliaIO

Hello BioJulia,

I'm fighting with entropy in the Julia world these days. It's becoming very,
very hard to know where to find a recommended / authoritative package among
Julia ones, even in 2019.

Wouldn't this package be better hosted at
JuliaIO, so anyone concerned would
find it easily?

Thanks

Benchmarks?

Do you have a set of benchmarks to show that it's actually faster?

using BufferedStreams

x = "id".*string.(rand(UInt16,1_000_000_000))

fn(x) = begin
	io = BufferedOutputStream(open("c:/data/bin.bin", "w"))
	write.(Ref(io), x)
	close(io)
end

gn(x) = begin
	io = open("c:/data/bin2.bin", "w")
	write.(Ref(io), x)
	close(io)
end

using BenchmarkTools

@btime fn($x)
@btime gn($x)

This doesn't indicate that it's faster on Julia 1.2.

peek resets the anchor

I tried the anchor example found in the documentation and it does not output anything.

julia> t = join(rand([collect('a':'z')... collect('0':'9')...],100))
"oxfj939xjpifeaa0ngk97yu6tywg3syu066ynxfsfnnsh3fhxc0osv7zih8ag3k08cp59upjxpyb8ibdpyx620wbapppmiqng9c1"

julia> stream = BufferedInputStream(IOBuffer(Vector{UInt8}(t)),6)
BufferedInputStream{IOBuffer}(<6 B buffer, 0% filled>)

julia> while !eof(stream)
           b = peek(stream)
           if '1' <= Char(b) <= '9'
               if !isanchored(stream)
                   anchor!(stream)
               end;
           elseif isanchored(stream)
               println(takeanchored!(stream))
           end
           read(stream, UInt8)
       end

I also had to add the Char to the if statement.

In addition, I noticed that peek resets the anchor.

julia> t = join(rand([collect('a':'z')... collect('0':'9')...],100))
"osiaanxkireq3mknd8gakx3g5uwnu2mkxdw6h6tyc6s5m5nhitgle6nb0iq7jyeksbj527wmp0dtlq0mj9kn3zbvlw49u92eeqhh"

julia> stream = BufferedInputStream(IOBuffer(Vector{UInt8}(t)),6)
BufferedInputStream{IOBuffer}(<6 B buffer, 0% filled>)

julia> peek(stream)
0x6f

julia> stream.buffer
6-element Vector{UInt8}:
 0x6f
 0x73
 0x69
 0x61
 0x61
 0x6e

julia> anchor!(stream)
1

julia> isanchored(stream)
true

julia> peek(stream)
0x6f

julia> isanchored(stream)
false

Version:
[e1450e63] BufferedStreams v1.0.0

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Base.nb_available is deprecated in Julia 0.7 and gone in 1.0

BufferedStreams extends Base.nb_available for BufferedInputStream. In Julia 0.7, Base.nb_available is deprecated, but extending it gives no warning since the function itself still exists. On current master, and thus in Julia 1.0, the deprecation has been removed, so the function no longer exists and cannot be extended. As a result, using BufferedStreams fails with a load error:

   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.0.0-rc1.5 (2018-08-07 20:49 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit d038f2f (0 days old master)
|__/                   |  x86_64-linux-gnu

julia> using BufferedStreams
[ Info: Precompiling BufferedStreams [e1450e63-4bb3-523b-b2a4-4ffa8c0fd77d]
ERROR: LoadError: LoadError: UndefVarError: nb_available not defined
Stacktrace:
 [1] getproperty(::Module, ::Symbol) at ./sysimg.jl:13
 [2] top-level scope at none:0
 [3] include at ./boot.jl:317 [inlined]
 [4] include_relative(::Module, ::String) at ./loading.jl:1038
 [5] include at ./sysimg.jl:29 [inlined]
 [6] include(::String) at /home/gunnar/.julia/dev/BufferedStreams/src/BufferedStreams.jl:3
 [7] top-level scope at none:0
 [8] include at ./boot.jl:317 [inlined]
 [9] include_relative(::Module, ::String) at ./loading.jl:1038
 [10] include(::Module, ::String) at ./sysimg.jl:29
 [11] top-level scope at none:2
 [12] eval at ./boot.jl:319 [inlined]
 [13] eval(::Expr) at ./client.jl:389
 [14] top-level scope at ./none:3
in expression starting at /home/gunnar/.julia/dev/BufferedStreams/src/bufferedinputstream.jl:112
in expression starting at /home/gunnar/.julia/dev/BufferedStreams/src/BufferedStreams.jl:38
ERROR: Failed to precompile BufferedStreams [e1450e63-4bb3-523b-b2a4-4ffa8c0fd77d] to /home/gunnar/.julia/compiled/v1.0/BufferedStreams/wMWKi.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] macro expansion at ./logging.jl:313 [inlined]
 [3] compilecache(::Base.PkgId, ::String) at ./loading.jl:1184
 [4] _require(::Base.PkgId) at ./logging.jl:311
 [5] require(::Base.PkgId) at ./loading.jl:852
 [6] macro expansion at ./logging.jl:311 [inlined]
 [7] require(::Module, ::Symbol) at ./loading.jl:834

The obvious solution would be to rename nb_available to bytesavailable like in Base. What is less obvious is whether BufferedStreams wants to provide a deprecation of its own and from which Julia version to make the name switch.
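
One possible shape for the name switch, sketched under assumptions: choose the function to extend based on whether Base defines bytesavailable, rather than on a specific version number. DemoStream and its fields are hypothetical stand-ins, not the package's actual BufferedInputStream.

```julia
# Hypothetical stream type used only for illustration.
struct DemoStream <: IO
    buffer::Vector{UInt8}
    position::Int   # index of the next unread byte
end

# Number of unread bytes left in the buffer.
unread(s::DemoStream) = length(s.buffer) - s.position + 1

# Extend whichever name this Julia version provides. @static resolves the
# branch at macro-expansion time, so the other branch is never evaluated.
@static if isdefined(Base, :bytesavailable)
    Base.bytesavailable(s::DemoStream) = unread(s)
else
    Base.nb_available(s::DemoStream) = unread(s)
end
```

This sketch leaves open the question raised above of whether the package should also ship a deprecation of its own for the old name.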

Default sources for common data encoding formats (hexdump, base64 etc.)

Hi All,

Recently, I was implementing a PDF file parser, and as part of the development I needed support for various data decoding filters while reading the data using BufferedInputStream. For example, data read from the file could be a hex dump, Base64 encoded, or run-length encoded.

I had to implement a few sources for this. However, I am wondering whether some of the filters could be developed as part of this package itself, as I am sure others would benefit from the added functionality.

regards,

Sambit

Source should be permitted to push data onto upstream BufferedInputStream

The source IO interface today is expected to implement a readbytes! method. readbytes! is a pull interface, which essentially means the BufferedInputStream demands from the source the number of bytes it should provide.

Imagine a source like a codec. The source is not aware of the data size until the data is decoded, but the BufferedInputStream may request fewer bytes than were decoded. In this case, the source needs to maintain its own housekeeping buffer to retain the remaining bytes. While most codec implementations already do this, ideally BufferedInputStream, as a buffer management system, should be flexible enough either to request a size from the source or to let the source request room for additional bytes. That way sources would not have to maintain their own housekeeping buffers.

One example is the ReadFile call in the Windows API, where passing NULL for bytes read will provide the size of the buffer required.
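
To make the housekeeping burden concrete, here is a self-contained sketch of a pull-style source. ChunkedSource and pull! are hypothetical names invented for this illustration and are not part of BufferedStreams. The pre-decoded chunks stand in for codec output whose size the source only learns after decoding.

```julia
# Hypothetical pull-style source: it "decodes" whole chunks, but must answer
# requests of arbitrary size, so it keeps undelivered bytes itself.
mutable struct ChunkedSource
    chunks::Vector{Vector{UInt8}}   # stand-in for blocks produced by a codec
    leftover::Vector{UInt8}         # housekeeping buffer for undelivered bytes
end

ChunkedSource(chunks) = ChunkedSource(chunks, UInt8[])

# Fill buffer[from:to] with as many bytes as are available and return the count.
function pull!(src::ChunkedSource, buffer::Vector{UInt8}, from::Int, to::Int)
    n = 0
    pos = from
    while pos <= to
        if isempty(src.leftover)
            isempty(src.chunks) && break          # nothing left to decode
            src.leftover = popfirst!(src.chunks)  # "decode" the next block
        end
        k = min(length(src.leftover), to - pos + 1)
        copyto!(buffer, pos, src.leftover, 1, k)
        deleteat!(src.leftover, 1:k)              # keep the rest for next time
        pos += k
        n += k
    end
    return n
end

src = ChunkedSource([UInt8[1, 2, 3, 4, 5], UInt8[6, 7, 8]])
buf = zeros(UInt8, 3)
nread = pull!(src, buf, 1, 3)   # the request is smaller than the decoded block
```

After the call, two decoded bytes remain in src.leftover. That is exactly the extra state a push-capable interface would let the source hand over to the stream's own buffer instead.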

BufferedStreams doesn't work with ZipFile.jl

Expected Behavior

I expected it to work with ZipFile.jl, which reads from an IO stream.

Current Behavior

Errors

Possible Solution / Implementation

Implement the seekend method

Steps to Reproduce (for bugs)

using ZipFile, CSV, DataFrames, BufferedStreams

a = DataFrame(a = 1:3)
CSV.write("c:/data/a.csv", a)

# zip the file; Windows users who do not have zip available on the PATH can manually zip the CSV
;zip c:/data/a.zip c:/data/a.csv

io = BufferedInputStream(open("c:/data/a.zip", "r"))
z = ZipFile.Reader(io)

df = CSV.read(z.files[1])
close(io)

I get this error:

ERROR: MethodError: no method matching seekend(::BufferedInputStream{IOStream})
Closest candidates are:
  seekend(::Base.SecretBuffer) at secretbuffer.jl:150
  seekend(::Base.Filesystem.File) at filesystem.jl:227
  seekend(::IOStream) at iostream.jl:141
  ...

Context

Trying to use BufferedStreams in the context of reading from zip files using ZipFile.jl

Your Environment

Julia 1.3-rc1
Windows 10
ZipFile v0.8.3
BufferedStreams v1.0.0

[69666777] Arrow v0.2.3
[c52e3926] Atom v0.10.1
[39de3d68] AxisArrays v0.3.3
[fbb218c0] BSON v0.2.3
[6e4b80f9] BenchmarkTools v0.4.3
[9e28174c] BinDeps v0.8.10
[b99e7846] BinaryProvider v0.5.6
[163b9779] Blobs v0.3.0
[a74b3585] Blosc v0.5.1
[5f4fecfd] BrowseTables v0.3.0
[e1450e63] BufferedStreams v1.0.0
[336ed68f] CSV v0.5.11 [C:\Users\RTX2080\.julia\dev\CSV]
[324d7699] CategoricalArrays v0.5.5
[aaaa29a8] Clustering v0.13.3
[944b1d66] CodecZlib v0.6.0
[34da2185] Compat v2.1.0
[3a865a2d] CuArrays v1.2.1
[d58978e5] Dagger v0.8.0
[9a962f9c] DataAPI v1.0.1
[a93c6f00] DataFrames v0.19.4
[1313f7d8] DataFramesMeta v0.5.0
[864edb3b] DataStructures v0.17.0
[e7dc6d0d] DataValues v0.4.12
[31a5f54b] Debugger v0.6.1
[7806a523] DecisionTree v0.8.3
[b4f34e82] Distances v0.8.2
[31c24e10] Distributions v0.21.1
[becb17da] Feather v0.5.3
[5789e2e9] FileIO v1.0.7
[53afe959] FlatBuffers v0.5.3
[587475ba] Flux v0.9.0
[01a6e8c0] FstFileFormat v0.1.0
[38e38edf] GLM v1.3.1
[28b8d3ca] GR v0.41.0
[4d00f742] GeometryTypes v0.7.6
[708ec375] Gumbo v0.5.1
[cd3eb016] HTTP v0.8.6
[7073ff75] IJulia v1.20.0
[6218d12a] ImageMagick v0.7.5
[916415d5] Images v0.18.0
[5903a43b] Infiltrator v0.1.0 [C:\Users\RTX2080\.julia\dev\Infiltrator]
[7d512f48] InternedStrings v0.7.0
[41ab1584] InvertedIndices v1.0.0
[82899510] IteratorInterfaceExtensions v1.0.0
[babc3d20] JDF v0.1.0 [c:/Users/RTX2080\\git\\JDF\\]
[9da8a3cd] JLSO v1.1.0
[e5e0dc1b] Juno v0.7.2
[b964fa9f] LaTeXStrings v1.0.3
[50d2b5c4] Lazy v0.14.0
[add582a8] MLJ v0.2.3
[f28f55f0] Memento v0.12.1
[e1d29d7a] Missings v0.4.2
[9b87118b] PackageCompiler v0.6.4+ #sd-notomls (https://github.com/JuliaLang/PackageCompiler.jl.git)
[d96e819e] Parameters v0.11.0
[626c502c] Parquet v0.3.0
[91a5bcdd] Plots v0.26.3
[2dfb63ee] PooledArrays v0.5.2
[08abe8d2] PrettyTables v0.5.1
[6f49c342] RCall v0.13.4
[ce6b1742] RDatasets v0.6.1
[17b45ede] RLEVectors v0.8.1
[189a3867] Reexport v0.2.0
[295af30f] Revise v2.1.10
[6e75b9c4] ScikitLearnBase v0.5.0
[a2af1166] SortingAlgorithms v0.3.1
[2913bbd2] StatsBase v0.32.0 [c:/git/StatsBase.jl]
[f3b207a7] StatsPlots v0.12.0
[fd094767] Suppressor v0.1.1
[70df011a] TableReader v0.4.0
[3783bdb8] TableTraits v1.0.0
[40c74d1a] TableView v0.4.1
[bd369af6] Tables v0.2.11
[f269a46b] TimeZones v0.9.2
[9f7883ad] Tracker v0.2.3
[3bb67fe8] TranscodingStreams v0.9.5
[b8865327] UnicodePlots v1.1.0
[34922c18] VisualRegressionTests v0.3.1
[ea10d353] WeakRefStrings v0.6.1
[a5390f91] ZipFile v0.8.3
[ade2ca70] Dates
[8bb1440f] DelimitedFiles
[37e2e46d] LinearAlgebra
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[4ec0a83e] Unicode

transfer to JuliaIO?

I think it would make sense to move this over to JuliaIO along with TranscodingStreams and all the CodecX packages.

UX for flushing buffer

I like to use BufferedStreams for printing to stdout because it's much faster. However, if I forget to flush the buffer, then not all of the results get printed to stdout.

using Random, BufferedStreams
let ss = [randstring(20) for _ in 1:10^5]
    @time foreach(println, ss) # 5.463939 seconds (708.34 k allocations: 16.149 MiB)
end


# Incomplete result
let ss = [randstring(20) for _ in 1:10^5]
    io = BufferedOutputStream(stdout)
    @time foreach(ss) do s
        println(io, s)
    end # 0.554888 seconds (2.63 k allocations: 180.951 KiB, 2.34% compilation time)
    # BUT it's incomplete
end;

let ss = [randstring(20) for _ in 1:10^5]
    io = BufferedOutputStream(stdout)
    @time begin
        foreach(ss) do s
            println(io, s)
        end
        flush(io)
    end # 0.547968 seconds (2.64 k allocations: 181.054 KiB, 2.37% compilation time)
end;

What is a good solution to forgetting to flush the buffer?

One approach would be a do block that introduces the buffer at the beginning and flushes it at the end.

function buffering(f, out)
    io = BufferedOutputStream(out)
    try
        f(io)
    finally
        flush(io)
    end
end

let ss = [randstring(20) for _ in 1:10^5]
    @time buffering(stdout) do io
        foreach(ss) do s
            println(io, s)
        end
    end # 0.586527 seconds (8.66 k allocations: 712.136 KiB, 3.35% compilation time)
end;

lost data from mark/reset?

I noticed this when rewriting peek in #77. Running peek (which calls mark and reset) sometimes seems to screw up the buffer so that the subsequent read fails:

julia> io = BufferedInputStream(IOBuffer("α∆"), 1)
BufferedInputStream{IOBuffer}(<1 B buffer, 0% filled>)

julia> read(io, Char)
'α': Unicode U+03B1 (category Ll: Letter, lowercase)

julia> peek(io, Char)
'∆': Unicode U+2206 (category Sm: Symbol, math)

julia> read(io, Char)
ERROR: EOFError: read end of file
Stacktrace:
 [1] read
   @ ~/.julia/packages/BufferedStreams/0mu59/src/bufferedinputstream.jl:186 [inlined]
 [2] read(io::BufferedInputStream{IOBuffer}, #unused#::Type{Char})
   @ Base ./io.jl:789
 [3] top-level scope
   @ REPL[72]:1

PR #77 avoids this problem specifically for peek(io, Char) since it adds an optimized implementation of that function, but it would be good to identify and solve the underlying problem.

Test failure on PackageEvaluator, julia nightly

>>> 'Pkg.add("BufferedStreams")' log
INFO: Installing BufferedStreams v0.1.4
INFO: Package database updated
INFO: METADATA is out-of-date — you may not have the latest version of BufferedStreams
INFO: Use `Pkg.update()` to get the latest versions of your packages

>>> 'Pkg.test("BufferedStreams")' log
Julia Version 0.4.6
Commit 2e358ce (2016-06-19 17:16 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E3-1241 v3 @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Nehalem)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.3
INFO: Computing test dependencies for BufferedStreams...
INFO: Installing BaseTestNext v0.2.1
INFO: Testing BufferedStreams
seekforward: Error During Test
  Test threw an exception of type ArgumentError
  Expression: all(Bool[test_seekforward(stream,position,offset) for (position,offset) = zip(positions,offsets)])
  ArgumentError: n must be non-negative in seekforward
   in test_seekforward at /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl:239
   in anonymous at /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:165
   in do_test at /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:181
   [inlined code] from /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl:248
   in anonymous at no file:0
   in include at ./boot.jl:261
   in include_from_node1 at ./loading.jl:320
   in process_options at ./client.jl:280
   in _start at ./client.jl:378
Test Summary:       | Pass  Error  Total
BufferedInputStream |   67      1     68
  read              |    8             8
  peek              |    4             4
  peekbytes!        |   10            10
  readbytes!        |   10            10
  readuntil         |    2             2
  arrays            |    2             2
  marks             |    7             7
  anchors           |    1             1
  seek              |    2             2
  seekforward       |    2      1      3
  close             |    5             5
  iostream          |   11            11
  misc.             |    3             3
ERROR: LoadError: Some tests did not pass: 67 passed, 0 failed, 1 errored.
 in finish at /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:385
 [inlined code] from /home/vagrant/.julia/v0.4/BaseTestNext/src/BaseTestNext.jl:559
 in anonymous at no file:0
 in include at ./boot.jl:261
 in include_from_node1 at ./loading.jl:320
 in process_options at ./client.jl:280
 in _start at ./client.jl:378
while loading /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl, in expression starting on line 18
===========================[ ERROR: BufferedStreams ]===========================

failed process: Process(`/home/vagrant/julia/bin/julia --check-bounds=yes --code-coverage=none --color=no /home/vagrant/.julia/v0.4/BufferedStreams/test/runtests.jl`, ProcessExited(1)) [1]

================================================================================
INFO: Removing BaseTestNext v0.2.1
ERROR: BufferedStreams had test errors
 in error at ./error.jl:21
 in test at pkg/entry.jl:803
 in anonymous at pkg/dir.jl:31
 in cd at file.jl:22
 in cd at pkg/dir.jl:31
 in test at pkg.jl:71
 in process_options at ./client.jl:257
 in _start at ./client.jl:378

>>> End of log

Will let you know if it happens often - keep an eye on http://pkg.julialang.org/detail/BufferedStreams.html

Archive this package in favor of TranscodingStreams?

I'm reasonably certain this package is deprecated in favor of TranscodingStreams and has been for a few years. Should we just archive it and put a big, fat sign up on the README that redirects people to TranscodingStreams?
