twoolie / nbt
Python Parser/Writer for the NBT file format, and its container, the RegionFile.
License: MIT License
How do I install? Please add instructions to the readme.
I tried pip install nbt
but that seems to have installed a different library with a different API.
AttributeError: 'module' object has no attribute 'NBTFile'
So I'm having this problem. I think the cause is on my side, but that seems so illogical, so I'll leave it here hoping someone can help me.
If I replace vill = TAG_List(type = TAG_Compound())
with another tag type, for example vill = TAG_List(type = TAG_Int()),
it works...
(Please note that I'm just a Python beginner.)
the code:
import nbt
from nbt.nbt import *
villages = NBTFile()
villages.name = "Data"
data = TAG_Compound()
data.tags.extend([
TAG_Int(name="Tick", value=14694297)
])
vill = TAG_List(type = TAG_Compound())
vill.name = "Villages"
villages.tags.append(vill)
villages.tags.append(data)
print(villages.pretty_tree())
The Error message:
Traceback (most recent call last):
File "C:\Users\Tom\Desktop\NBT-master\my files\VillageStacker_1.py", line 12, in <module>
vill = TAG_List(type = TAG_Compound())
File "C:\Python34\lib\site-packages\nbt-1.4.1-py3.4.egg\nbt\nbt.py", line 306, in __init__
raise ValueError("No type specified for list: %s" % (name))
ValueError: No type specified for list: None
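Not a definitive diagnosis, but judging from the nbt.py source quoted later in this thread, TAG_Compound subclasses dict, so an empty TAG_Compound() instance is falsy, and the `if type:` check in TAG_List.__init__ therefore never sets tagID. A minimal reproduction of that mechanism:

```python
# Why TAG_List(type=TAG_Compound()) fails while TAG_List(type=TAG_Int()) works:
# TAG_Compound subclasses dict, so an *empty* instance is falsy and the
# `if type:` guard in TAG_List.__init__ skips setting tagID.
class TAG_Compound(dict):  # simplified stand-in for nbt.nbt.TAG_Compound
    id = 10  # TAG_COMPOUND

print(bool(TAG_Compound()))  # False: an empty dict subclass is falsy
print(bool(TAG_Compound))    # True: the class object itself is truthy

# Likely fix: pass the class rather than an instance, e.g.
# vill = TAG_List(type=TAG_Compound)
```

Passing the class object sidesteps the truthiness trap, and `.id` is available on the class just as on an instance.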
Travis is currently configured to execute the tests with Python releases 2.7, 3.3, 3.4, 3.5 and 3.6. The current stable release is 3.7 and should be added to the list. Releases 3.3 and 3.4 are now marked as archived in the Python documentation and should be considered for removal.
nbt/__init__.py
contains a version string, but there are no releases associated with it.
This can easily be done on GitHub by just tagging a certain commit. E.g.
git tag release-1.2 ac2b23d3e4ef9e
git push --tags
If you like, I can look up the appropriate tags and commit hashes.
read the full spec at http://www.minecraft.net/docs/NBT.txt
A copy can be found here. The question is whether it's still valid 8 years later:
https://web.archive.org/web/20100310144708/www.minecraft.net/docs/NBT.txt
If you add enough data to a chunk so that it grows in size (needs another sector), then when you try to write it, the write function (region.py write_chunk) gets stuck in an infinite loop looking for a new place to put the chunk.
It seems like it just keeps reading the first sector of the file over and over.
My hack of a workaround was to just put the chunk at the end of the file since that is usually the first place it would fit anyway.
It seems it is pretty rare for a chunk to need more room. In testing my program that adds data (Populate Chests.py), I probably wrote at least a few thousand chunks by now and a grand total of two needed an extra sector.
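For reference, the sector-allocation scan that write_chunk needs might look like the following standalone sketch (not the library's code; sector numbering and the header layout are assumptions based on the region format):

```python
def find_free_run(free, needed):
    """Return the index of the first run of `needed` consecutive free sectors,
    or None if no run is long enough (the caller should then append the chunk
    at the end of the file, as the workaround above does).
    `free` is one boolean per sector; sectors 0-1 hold the region header
    and must be marked as not free."""
    run = 0
    for i, is_free in enumerate(free):
        run = run + 1 if is_free else 0
        if run == needed:
            return i - needed + 1
    return None

# header sectors occupied, then a 2-sector gap, then a 3-sector gap
free = [False, False, True, True, False, True, True, True]
print(find_free_run(free, 3))  # 5
print(find_free_run(free, 4))  # None -> append at end of file
```

The key property is that the scan always terminates: it either finds a run or falls off the end of the list, which avoids the infinite loop described above.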
What is the code standard for NBT on indentation? Since most files used tabs, that's what I used too. Most users seem to prefer spaces (e.g. 4b7c45e), and I'm fine with that too.
I recently noted that tests.py and some lines in the example files use spaces, while the rest of the files use tabs. Whatever the choice, I'd like to make it consistent. What do you prefer?
Hi,
I'm a noob with GitHub, so I don't know where to post this.
I'm using Python 3, and I made something like the patch below. It seems to work:
from struct import pack, unpack, calcsize, error as StructError
from gzip import GzipFile
import zlib
# Replace UserDict by dict:
# from UserDict import DictMixin
import os, io
TAG_END = 0
TAG_BYTE = 1
TAG_SHORT = 2
TAG_INT = 3
TAG_LONG = 4
TAG_FLOAT = 5
TAG_DOUBLE = 6
TAG_BYTE_ARRAY = 7
TAG_STRING = 8
TAG_LIST = 9
TAG_COMPOUND = 10
class MalformedFileError(Exception):
"""Exception raised on parse error."""
pass
class TAG(object):
"""Each Tag needs to take a file-like object for reading and writing.
The file object will be initialised by the calling code."""
id = None
def __init__(self, value=None, name=None):
self.name = name
self.value = value
#Parsers and Generators
def _parse_buffer(self, buffer):
raise NotImplementedError(self.__class__.__name__)
def _render_buffer(self, buffer):
raise NotImplementedError(self.__class__.__name__)
#Printing and Formatting of tree
def tag_info(self):
return self.__class__.__name__ + \
('("%s")'%self.name if self.name else "") + \
": " + self.__repr__()
def pretty_tree(self, indent=0):
return ("\t"*indent) + self.tag_info()
class _TAG_Numeric(TAG):
def __init__(self, value=None, name=None, buffer=None):
super(_TAG_Numeric, self).__init__(value, name)
self.size = calcsize(self.fmt)
if buffer:
self._parse_buffer(buffer)
#Parsers and Generators
def _parse_buffer(self, buffer):
self.value = unpack(self.fmt, buffer.read(self.size))[0]
def _render_buffer(self, buffer):
buffer.write(pack(self.fmt, self.value))
#Printing and Formatting of tree
def __repr__(self):
return str(self.value)
#== Value Tags ==#
class TAG_Byte(_TAG_Numeric):
id = TAG_BYTE
fmt = ">b"
class TAG_Short(_TAG_Numeric):
id = TAG_SHORT
fmt = ">h"
class TAG_Int(_TAG_Numeric):
id = TAG_INT
fmt = ">i"
class TAG_Long(_TAG_Numeric):
id = TAG_LONG
fmt = ">q"
class TAG_Float(_TAG_Numeric):
id = TAG_FLOAT
fmt = ">f"
class TAG_Double(_TAG_Numeric):
id = TAG_DOUBLE
fmt = ">d"
class TAG_Byte_Array(TAG):
id = TAG_BYTE_ARRAY
def __init__(self, name=None, buffer=None):
super(TAG_Byte_Array, self).__init__(name=name)
if buffer:
self._parse_buffer(buffer)
#Parsers and Generators
def _parse_buffer(self, buffer):
length = TAG_Int(buffer=buffer)
self.value = buffer.read(length.value)
def _render_buffer(self, buffer):
length = TAG_Int(len(self.value))
length._render_buffer(buffer)
buffer.write(self.value)
#Printing and Formatting of tree
def __repr__(self):
return "[%i bytes]" % len(self.value)
class TAG_String(TAG):
id = TAG_STRING
def __init__(self, value=None, name=None, buffer=None):
super(TAG_String, self).__init__(value, name)
if buffer:
self._parse_buffer(buffer)
#Parsers and Generators
def _parse_buffer(self, buffer):
length = TAG_Short(buffer=buffer)
read = buffer.read(length.value)
if len(read) != length.value:
raise StructError()
self.value = str(read, "utf-8")
def _render_buffer(self, buffer):
save_val = self.value.encode("utf-8")
length = TAG_Short(len(save_val))
length._render_buffer(buffer)
buffer.write(save_val)
#Printing and Formatting of tree
def __repr__(self):
return self.value
#== Collection Tags ==#
class TAG_List(TAG):
id = TAG_LIST
def __init__(self, type=None, value=None, name=None, buffer=None):
super(TAG_List, self).__init__(value, name)
if type:
self.tagID = type.id
else: self.tagID = None
self.tags = []
if buffer:
self._parse_buffer(buffer)
if not self.tagID:
raise ValueError("No type specified for list")
#Parsers and Generators
def _parse_buffer(self, buffer):
self.tagID = TAG_Byte(buffer=buffer).value
self.tags = []
length = TAG_Int(buffer=buffer)
for x in range(length.value):
self.tags.append(TAGLIST[self.tagID](buffer=buffer))
def _render_buffer(self, buffer):
TAG_Byte(self.tagID)._render_buffer(buffer)
length = TAG_Int(len(self.tags))
length._render_buffer(buffer)
for i, tag in enumerate(self.tags):
if tag.id != self.tagID:
raise ValueError("List element %d(%s) has type %d != container type %d" %
(i, tag, tag.id, self.tagID))
tag._render_buffer(buffer)
#Printing and Formatting of tree
def __repr__(self):
return "%i entries of type %s" % (len(self.tags), TAGLIST[self.tagID].__name__)
def pretty_tree(self, indent=0):
output = [super(TAG_List,self).pretty_tree(indent)]
if len(self.tags):
output.append(("\t"*indent) + "{")
output.extend([tag.pretty_tree(indent+1) for tag in self.tags])
output.append(("\t"*indent) + "}")
return '\n'.join(output)
class TAG_Compound(TAG, dict):
id = TAG_COMPOUND
def __init__(self, buffer=None):
super(TAG_Compound, self).__init__()
self.tags = []
self.name = ""
if buffer:
self._parse_buffer(buffer)
#Parsers and Generators
def _parse_buffer(self, buffer):
while True:
type = TAG_Byte(buffer=buffer)
if type.value == TAG_END:
#print "found tag_end"
break
else:
name = TAG_String(buffer=buffer).value
try:
#DEBUG print type, name
tag = TAGLIST[type.value](buffer=buffer)
tag.name = name
self.tags.append(tag)
except KeyError:
raise ValueError("Unrecognised tag type")
def _render_buffer(self, buffer):
for tag in self.tags:
TAG_Byte(tag.id)._render_buffer(buffer)
TAG_String(tag.name)._render_buffer(buffer)
tag._render_buffer(buffer)
        buffer.write(b'\x00')  # write TAG_END (must be a bytes literal on Python 3)
# Dict compatibility.
# DictMixin requires at least __getitem__, and for more functionality,
# __setitem__, __delitem__, and keys.
def __getitem__(self, key):
if isinstance(key,int):
return self.tags[key]
elif isinstance(key, str):
for tag in self.tags:
if tag.name == key:
return tag
else:
raise KeyError("A tag with this name does not exist")
else:
raise ValueError("key needs to be either name of tag, or index of tag")
def __setitem__(self, key, value):
if isinstance(key, int):
# Just try it. The proper error will be raised if it doesn't work.
self.tags[key] = value
elif isinstance(key, str):
value.name = key
for i, tag in enumerate(self.tags):
if tag.name == key:
self.tags[i] = value
return
self.tags.append(value)
    def __delitem__(self, key):
        if isinstance(key, int):
            del self.tags[key]
        elif isinstance(key, str):
            for i, tag in enumerate(self.tags):
                if tag.name == key:
                    del self.tags[i]
                    return
            raise KeyError("A tag with this name does not exist")
        else:
            raise ValueError("key needs to be either name of tag, or index of tag")
def keys(self):
return [tag.name for tag in self.tags]
#Printing and Formatting of tree
def __repr__(self):
return '%i Entries' % len(self.tags)
def pretty_tree(self, indent=0):
output = [super(TAG_Compound,self).pretty_tree(indent)]
if len(self.tags):
output.append(("\t"*indent) + "{")
output.extend([tag.pretty_tree(indent+1) for tag in self.tags])
output.append(("\t"*indent) + "}")
return '\n'.join(output)
TAGLIST = {TAG_BYTE:TAG_Byte, TAG_SHORT:TAG_Short, TAG_INT:TAG_Int, TAG_LONG:TAG_Long, TAG_FLOAT:TAG_Float, TAG_DOUBLE:TAG_Double, TAG_BYTE_ARRAY:TAG_Byte_Array, TAG_STRING:TAG_String, TAG_LIST:TAG_List, TAG_COMPOUND:TAG_Compound}
class NBTFile(TAG_Compound):
"""Represents an NBT file object"""
def __init__(self, filename=None, buffer=None, fileobj=None):
super(NBTFile,self).__init__()
self.__class__.__name__ = "TAG_Compound"
self.filename = filename
self.type = TAG_Byte(self.id)
#make a file object
if filename:
self.file = GzipFile(filename, 'rb')
elif buffer:
self.file = buffer
elif fileobj:
self.file = GzipFile(fileobj=fileobj)
else:
self.file = None
#parse the file given intitially
if self.file:
self.parse_file()
if self.filename and 'close' in dir(self.file):
self.file.close()
self.file = None
def parse_file(self, filename=None, buffer=None, fileobj=None):
if filename:
self.file = GzipFile(filename, 'rb')
elif buffer:
self.file = buffer
elif fileobj:
self.file = GzipFile(fileobj=fileobj)
if self.file:
try:
type = TAG_Byte(buffer=self.file)
if type.value == self.id:
name = TAG_String(buffer=self.file).value
self._parse_buffer(self.file)
self.name = name
self.file.close()
else:
raise MalformedFileError("First record is not a Compound Tag")
except StructError as e:
raise MalformedFileError("Partial File Parse: file possibly truncated.")
        else:
            raise ValueError("Need a file to parse")
def write_file(self, filename=None, buffer=None, fileobj=None):
if buffer:
self.filename = None
self.file = buffer
elif filename:
self.filename = filename
self.file = GzipFile(filename, "wb")
elif fileobj:
self.filename = None
self.file = GzipFile(fileobj=fileobj, mode="wb")
elif self.filename:
self.file = GzipFile(self.filename, "wb")
elif not self.file:
raise ValueError("Need to specify either a filename or a file")
#Render tree to file
TAG_Byte(self.id)._render_buffer(self.file)
TAG_String(self.name)._render_buffer(self.file)
self._render_buffer(self.file)
#make sure the file is complete
if 'flush' in dir(self.file):
self.file.flush()
if self.filename and 'close' in dir(self.file):
self.file.close()
(Sorry for my bad English.)
Most files in the doc folder are rather empty. I haven't found a big need for them, but perhaps we should either remove the empty files or fill them :)
Proposal: use NumPy if available, otherwise the native Python version.
Two uses: to speed up 4-bit array <-> byte array conversions and XZY <-> YZX array conversion.
This does require a test function before it can be written, IMHO.
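A sketch of what the proposed fallback pattern could look like for the 4-bit conversion: use NumPy when available, plain Python otherwise. Low-nibble-first ordering is assumed here (it matches Minecraft's packing); the function name is made up for the sketch.

```python
# Sketch: NumPy fast path with a native Python fallback for 4-bit unpacking.
try:
    import numpy as np
except ImportError:
    np = None

def nibbles_to_bytes(nibble_data):
    """Unpack 4-bit values (two per byte, low nibble first) into a bytearray,
    one value per byte."""
    if np is not None:
        packed = np.frombuffer(bytes(nibble_data), dtype=np.uint8)
        out = np.empty(packed.size * 2, dtype=np.uint8)
        out[0::2] = packed & 0x0F   # low nibbles
        out[1::2] = packed >> 4     # high nibbles
        return bytearray(out.tobytes())
    out = bytearray(len(nibble_data) * 2)  # native fallback, same result
    for i, b in enumerate(nibble_data):
        out[2 * i] = b & 0x0F
        out[2 * i + 1] = b >> 4
    return out

print(list(nibbles_to_bytes(bytes([0x21, 0x43]))))  # [1, 2, 3, 4]
```

Both code paths return the same bytearray, so a test function only needs to compare the two implementations against each other on random input.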
type 11 is an array of TAG_Int's, analogous to the byte array
http://minecraft.gamepedia.com/NBT_format
I had a look at the code but I'm afraid I literally can't follow it :D
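A sketch of what the new tag could look like, modeled on the TAG_Byte_Array in the patch pasted earlier in this thread: a 4-byte big-endian length prefix followed by that many signed 32-bit ints. The class shape mirrors that patch but is standalone here, not the library's code.

```python
import io
from struct import Struct

TAG_INT_ARRAY = 11  # tag type 11, per the wiki page above

class TAG_Int_Array:
    """Sketch of TAG_Int_Array, analogous to TAG_Byte_Array:
    a TAG_Int length followed by `length` big-endian signed 32-bit ints."""
    id = TAG_INT_ARRAY
    length_fmt = Struct(">i")

    def __init__(self, name=None):
        self.name = name
        self.value = []

    def _parse_buffer(self, buffer):
        (length,) = self.length_fmt.unpack(buffer.read(4))
        self.value = list(Struct(">%di" % length).unpack(buffer.read(4 * length)))

    def _render_buffer(self, buffer):
        buffer.write(self.length_fmt.pack(len(self.value)))
        buffer.write(Struct(">%di" % len(self.value)).pack(*self.value))

# round-trip check
tag = TAG_Int_Array(name="demo")
tag.value = [1, -2, 70000]
buf = io.BytesIO()
tag._render_buffer(buf)
buf.seek(0)
copy = TAG_Int_Array()
copy._parse_buffer(buf)
print(copy.value)  # [1, -2, 70000]
```

The only real differences from TAG_Byte_Array are the element width (4 bytes instead of 1) and that the value is a list of ints rather than raw bytes.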
When you try the example anvil_blockdata.py on a *.mca file with an existing data array, the following error is raised in the print_chunklayer function:
ValueError: byte must be in range(0, 256)
This is because, when a data array exists, the blocks array should hold short integers and no longer be a bytearray.
I fixed it like this (do not forget to import array):
def print_chunklayer(blocks, data, add, yoffset):
blocks = array.array('h',blocks[yoffset*256:(yoffset+1)*256])
data = array_4bit_to_byte(data[yoffset*128:(yoffset+1)*128])
if add is not None:
add = array_4bit_to_byte(add[yoffset*128:(yoffset+1)*128])
for i,v in enumerate(add):
blocks[i] += 256*v
assert len(blocks) == 256
assert len(data) == 256
for row in grouper(zip(blocks,data), 16):
print (" ".join(("%4d:%-2d" % block) for block in row))
Is anyone working already on this? If not I'll start it soon.
Not all NBT files are gzipped. Minecraft uses two uncompressed NBT files: idcounts.dat and servers.dat. More info at Nbt#Uses.
Is there currently a way to parse an NBT file without uncompressing it? I get this error trying to open servers.dat with nbt.nbt.NBTFile:
>>> os.getcwd()
'/Users/winston/Library/Application Support/minecraft'
>>> os.listdir()
['.DS_Store', 'assets', 'launcher.jar', 'launcher.pack.lzma', 'launcher_profiles.json', 'libraries', 'logs', 'options.txt', 'output-client.log', 'resourcepacks', 'saves', 'screenshots', 'servers.dat', 'stats', 'textures_0.png', 'versions']
>>> nbt.VERSION
(1, 4, 1)
>>> serversnbt = nbt.nbt.NBTFile('servers.dat', 'rb')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.4/site-packages/nbt/nbt.py", line 508, in __init__
self.parse_file()
File "/usr/local/lib/python3.4/site-packages/nbt/nbt.py", line 532, in parse_file
type = TAG_Byte(buffer=self.file)
File "/usr/local/lib/python3.4/site-packages/nbt/nbt.py", line 85, in __init__
self._parse_buffer(buffer)
File "/usr/local/lib/python3.4/site-packages/nbt/nbt.py", line 90, in _parse_buffer
self.value = self.fmt.unpack(buffer.read(self.fmt.size))[0]
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/gzip.py", line 365, in read
if not self._read(readsize):
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/gzip.py", line 433, in _read
if not self._read_gzip_header():
File "/usr/local/Cellar/python3/3.4.2_1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/gzip.py", line 297, in _read_gzip_header
raise OSError('Not a gzipped file')
OSError: Not a gzipped file
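A quick way to tell the two cases apart is the gzip magic number. The commented-out workaround below is an assumption based on the NBTFile constructor quoted earlier in this thread, where the `buffer` keyword bypasses the GzipFile wrapper entirely:

```python
def is_gzipped(data):
    """gzip streams always begin with the magic bytes 1F 8B."""
    return data[:2] == b"\x1f\x8b"

print(is_gzipped(b"\x1f\x8b\x08\x00"))  # True  (level.dat and friends)
print(is_gzipped(b"\x0a\x00\x00"))      # False (servers.dat, idcounts.dat)

# Possible workaround for uncompressed files (untested assumption, based on
# the constructor above: `buffer` is used as-is, without gzip):
# serversnbt = nbt.nbt.NBTFile(buffer=open('servers.dat', 'rb'))
```

Sniffing the first two bytes would also let NBTFile choose gzip or plain parsing automatically, which seems like the cleanest fix for this issue.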
As stated on line 178, "map still only supports McRegion maps." Since this is part of the test suite, can this be updated to support Anvil?
A small annoyance (to me): the parameters of the __init__ methods in the nbt.TAG classes are very inconsistent. May I fix this?
Are there preferences for proposal 1 or proposal 2?
Current __init__ methods:
class TAG(object):
def __init__(self, value=None, name=None):
class _TAG_Numeric(TAG):
def __init__(self, value=None, name=None, buffer=None):
class TAG_Byte_Array(TAG, MutableSequence):
def __init__(self, name=None, buffer=None):
class TAG_Int_Array(TAG, MutableSequence):
def __init__(self, name=None, buffer=None):
class TAG_String(TAG, Sequence):
def __init__(self, value=None, name=None, buffer=None):
class TAG_List(TAG, MutableSequence):
def __init__(self, type=None, value=None, name=None, buffer=None):
class TAG_Compound(TAG, MutableMapping):
def __init__(self, buffer=None):
Proposal 1 (value, name, buffer order everywhere):
class TAG(object):
def __init__(self, value=None, name=None, buffer=None):
class _TAG_Numeric(TAG):
def __init__(self, value=None, name=None, buffer=None):
class TAG_Byte_Array(TAG, MutableSequence):
def __init__(self, value=None, name=None, buffer=None):
class TAG_Int_Array(TAG, MutableSequence):
def __init__(self, value=None, name=None, buffer=None):
class TAG_String(TAG, Sequence):
def __init__(self, value=None, name=None, buffer=None):
class TAG_List(TAG, MutableSequence):
def __init__(self, type=None, value=None, name=None, buffer=None):
class TAG_Compound(TAG, MutableMapping):
def __init__(self, value=None, name=None, buffer=None):
Proposal 2 (name, value, buffer order everywhere):
class TAG(object):
def __init__(self, name=None, value=None, buffer=None):
class _TAG_Numeric(TAG):
def __init__(self, name=None, value=None, buffer=None):
class TAG_Byte_Array(TAG, MutableSequence):
def __init__(self, name=None, value=None, buffer=None):
class TAG_Int_Array(TAG, MutableSequence):
def __init__(self, name=None, value=None, buffer=None):
class TAG_String(TAG, Sequence):
def __init__(self, name=None, value=None, buffer=None):
class TAG_List(TAG, MutableSequence):
def __init__(self, name=None, type=None, value=None, buffer=None):
class TAG_Compound(TAG, MutableMapping):
def __init__(self, name=None, value=None, buffer=None):
I'm trying to write a program that iterates over all the blocks in a world and compiles some statistics. But I can't figure out how to get all the block data from a chunk. The block_analysis example appears to only work with the old map format.
Could someone give me an example or project that does something like this?
Hi twoolie, thanks for setting up the travis test service; I'm new to it, and it looks promising.
I have two questions on the test infrastructure.
/usr/share/doc
at the discretion of the package manager). I'm asking the second question because the setup now explicitly treats tests.py as a module, by including `import tests`. However, once it is moved to a subfolder, `from tests import tests` no longer works unless we turn it into a package (e.g. by creating tests/__init__.py). I'm hesitant: I'd rather treat it as a collection of test scripts, not as part of the actual module.
This brings up a third question. My current pull request #29 breaks Travis; I'm currently writing a patch for that, hence the questions above.
Sorry for all these questions; I'm trying to find the way that is easiest for you to pull. Having multiple pull requests piling up makes it harder for me to get it right for you (and to be honest, I'm not very familiar with best practices in testing and pulling, so tips are welcome!)
The current nbt module stores the value of a TAG_Byte_Array as a string. This has a few problems: it is not mutable, it is slow, it requires clumsy code using struct.unpack and struct.pack, and it may cause problems when converting to Python 3 (since all strings are Unicode in Python 3).
I propose to store it as a bytearray, which is a native Python object for, well, a byte array. It differs from a bytes object in that it is mutable.
The code change in nbt.py itself is rather simple, see commit 0a7000f.
However, it also requires changes in code that assumes it is a string, such as chunk.py. See c0aa823. Fixing chunk.py is easy enough, but this may also break other code that relies on this internal structure of TAG_Byte_Array.
What do you think? I'm very much in favour of changing it, for speed, Python 3 compatibility and ease of use.
Let me demonstrate the ease of use with an example. This is the current code required to modify an item in the HeightMap:
heightBytes = mynbt['Level']['HeightMap'].value
heightdata = list(struct.unpack("256B", heightBytes))
heightdata[3] = 63
heightBytes = array.array('B', heightdata).tostring()
mynbt['Level']['HeightMap'].value = heightBytes
If the value parameter is mutable, it greatly simplifies the above code. There is no need to pack or unpack anymore:
mynbt['Level']['HeightMap'].value[3] = 63
In fact, it is even trivial to add a __getitem__
and __setitem__
function to TAG_Byte_Array (see e8cd308), so one can simply write:
mynbt['Level']['HeightMap'][3] = 63
In my opinion, this is MUCH cleaner than the current code, and where I like NBT to head to.
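The delegation that commit e8cd308 adds is small enough to sketch in full. The class name here is hypothetical; the point is that once value is a bytearray, sequence access is a three-method wrapper:

```python
class ByteArrayTag:
    """Sketch (hypothetical name): value stored as a bytearray,
    with item access delegated to it, in the spirit of e8cd308."""
    def __init__(self, value=b""):
        self.value = bytearray(value)

    def __len__(self):
        return len(self.value)

    def __getitem__(self, index):
        return self.value[index]

    def __setitem__(self, index, val):
        self.value[index] = val

heightmap = ByteArrayTag(bytes(256))
heightmap[3] = 63      # no struct.pack/unpack dance
print(heightmap[3])    # 63
```

bytearray also enforces the 0-255 range on assignment for free, which the old string-based code could not do.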
My question to all users of NBT is: is the above advantage worth the downside that it will likely break code that still expects TAG_Byte_Array().value to be a string? struct.unpack('256b', my_tag_byte_array.value) will raise a struct.error, although array.array('B', my_tag_byte_array.value) will continue to work just fine (though setting my_tag_byte_array.value to a string using array.array().tostring() may break things in nbt.py).
As an added bonus, it is a bit faster too (the conversion itself is actually 600 times faster, but that's negated by the slow method used to parse the NBT). Here are some timing measurements:
current code: unpack (returns a tuple) and convert to list
mynbt = nbt.NBTFile(filename) 2500 µs excluding actual I/O
blocksBytes = mynbt['Level']['Blocks'].value 6.7 µs
blocktuple = struct.unpack("32768B", blocksBytes) 350 µs
blockdata = list(blocktuple) 180 µs
list: a slow alternative
mynbt = nbt.NBTFile(filename) 2500 µs excluding actual I/O
blocksBytes = mynbt['Level']['Blocks'].value 6.7 µs
blockdata = [i for i in blocksBytes] 2300 µs
bytes: an immutable byte sequence. blazingly fast.
mynbt = nbt.NBTFile(filename) 2500 µs excluding actual I/O
blocksBytes = mynbt['Level']['Blocks'].value 6.7 µs
blockdata = bytes(blocksBytes) 0.2 µs
bytearray: my proposal
mynbt = nbt.NBTFile(filename) 2500 µs excluding actual I/O
blocksBytes = mynbt['Level']['Blocks'].value 6.7 µs
blockdata = bytearray(blocksBytes) 4.2 µs
array: an equally fast alternative.
mynbt = nbt.NBTFile(filename) 2500 µs excluding actual I/O
blocksBytes = mynbt['Level']['Blocks'].value 6.7 µs
blockdata = array.array('B', blocksBytes) 4.4 µs
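To make the proposal concrete, here is the HeightMap edit from above done both ways on stand-in data, so no NBT file is needed. (Note: array.tostring() from the old code is spelled tobytes() on modern Python.)

```python
import struct
import array

height_bytes = bytes(range(256))  # stand-in for mynbt['Level']['HeightMap'].value

# current, string/bytes-valued workflow: unpack, modify, repack
heightdata = list(struct.unpack("256B", height_bytes))
heightdata[3] = 63
repacked = array.array('B', heightdata).tobytes()  # .tostring() on old Pythons

# proposed bytearray workflow: mutate in place, nothing to repack
buf = bytearray(height_bytes)
buf[3] = 63

print(bytes(buf) == repacked)  # True: same result, three lines fewer
```

Both produce identical bytes; the bytearray version just skips the two conversions entirely.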
nbt.chunk is capable of setting blocks, but does not update the light levels in the NBT file as well.
Edit: I consider this a small issue, and perhaps it should not be implemented by the NBT library, as it is very Minecraft specific, and applications that use NBT can implement this as well.
Hi everyone,
I want to create a new servers.dat file, or insert a server name and server IP into an existing servers.dat file.
I can read my servers.dat, but I can't write it.
Please help.
Most Minecraft tools need some database for block IDs/data IDs, or even Biome IDs, entity IDs or item IDs.
Would this or would this not be part of NBT?
On one hand, I'm not in favour of maintaining that list, and I'd like NBT to be somewhat more generic than Minecraft-only (though I currently see little use of NBT outside the Minecraft community).
On the other hand, there is demand for such a list, and in such cases a central repository is useful. With the easy collaboration on GitHub, I also hope to see some external input (so we, the NBT maintainers, don't need to maintain it ourselves).
I made an earlier attempt at such a names module in my experimental branch. I'm not satisfied yet. While it is extendable by external tools, it lacks any non-English support. Maintaining a non-English data list is out of scope in my opinion, but whatever is created should have hooks for others to easily plug in non-English names.
However, before we even consider that, the first question is: is a block ID database in scope of NBT or not?
Erm, I did some file renames, but never changed the Manifest.
I'm a little cautious to touch it myself because I'm uncertain about the correct syntax.
E.g. the current file contains
include *.txt
include examples
but the description at http://wiki.python.org/moin/Distutils/Tutorial seems to indicate it should be
include *.txt
recursive-include examples
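For reference, the distutils template syntax gives recursive-include a directory plus one or more filename patterns, so a working stanza would probably look like this (patterns chosen here as an example, not taken from the repository):

```
include *.txt
recursive-include examples *.py
```

Without the trailing pattern, recursive-include is a syntax error when sdist processes the manifest.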
For convenience, it should be possible to create an NBTFile using nbt.nbt.NBTFile(path), where path is a pathlib.Path object.
There is a missing parenthesis at the end of the file, and the function parse_header does not exist.
The classmethod getchunk is a stub, and would need a class as first argument.
Upon removing the parse_header call, the code continues, but NBT later reports that the specified file is not a gzipped file.
I think it is inevitable that we need to add a test world to NBT. I used an Anvil and a Region chunk in a local branch, but that does not suffice for testing nbt.region and nbt.world.
Such a test file is huge (about 6 MByte) compared to the code. It may be possible to trim it, but that requires an editor, e.g. NBT. And I'm not yet confident enough that NBT makes no mistakes during that editing (that's why we need the test file!)
The question is: is 6 MB extra download a problem, or not?
@twoolie, others: please share your opinion.
If an API change will be made for 2.0, I propose that NBTFile no longer be a subclass of TAG_Compound.
TAG_Compound already does two things: it is a named OrderedDict-like object on one hand, and has a specific serialisation on the other. NBTFile adds a third and fourth thing to it: an optional compression, and file read/write.
I suggest that the object and serialisation belong in nbt, but the compression and file read/write do not. You will notice that region creates NBTFile objects but completely bypasses the latter two functions: obviously the read/write, but also the compression, since it uses zlib instead of GZip.
Finally, the many possible invocations cry out for factory methods (@staticmethod). Or perhaps it's just me, who can't remember the difference between the filename, buffer and fileobj parameters.
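A sketch of what such factory constructors could look like. All names are hypothetical; the three constructors make explicit the gzip, zlib (region) and uncompressed cases that the current keyword arguments conflate:

```python
import gzip
import io
import zlib

class NBTFile2:
    """Sketch of 2.0-style factory constructors (hypothetical names)
    replacing the filename/buffer/fileobj keyword juggling."""
    def __init__(self, data=b""):
        self.data = data  # raw, uncompressed NBT payload

    @classmethod
    def from_gzipped_file(cls, filename):
        # the level.dat case
        with gzip.open(filename, "rb") as f:
            return cls(f.read())

    @classmethod
    def from_zlib_bytes(cls, blob):
        # the region-file case: chunks are zlib-compressed, not gzip
        return cls(zlib.decompress(blob))

    @classmethod
    def from_buffer(cls, fileobj):
        # already-uncompressed stream (e.g. servers.dat)
        return cls(fileobj.read())

print(NBTFile2.from_buffer(io.BytesIO(b"\x0a\x00\x00\x00")).data)
```

Each constructor name states its compression, so callers no longer have to remember which keyword implies gzip and which does not.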
nbt.chunk.generate_heightmap() currently generates a height map that records the highest non-air block. An examination of some Minecraft files shows that nbt['Level']['HeightMap'] contains the highest non-solid block; at least, it seems to ignore grass and flowers.
The current Travis automated tests fail for Python 2.6 and pypy, as well as Python 2.7 if you are using 2.7.8 or earlier.
The reason seems rather mundane: the tests download the file https://github.com/downloads/twoolie/NBT/Sample_World.tar.gz. This file is not included in the standard distribution due to its size.
HTTPS support in Python 2 is very bad. In this particular case, it tries to download this file using the SSLv3 protocol, which was righteously disabled by github.com after the recent SSL Poodle vulnerability.
Python up to 2.7.8 only seems to try SSLv3, and does not use TLSv1.0. So it fails with an error:
>>> import urllib2
>>> request = urllib2.Request("https://github.com/downloads/twoolie/NBT/Sample_World.tar.gz")
>>> remotefile = urllib2.urlopen(request)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:493: error:14077410:SSL routines:SSL23_GET_SERVER_HELLO:sslv3 alert handshake failure>
Now, this is a bit surprising, as Python 2.6 - 2.7.8 does support TLSv1.0 in the ssl module. However, the urllib2 module (which NBT uses) uses the httplib module, and the httplib module uses the ssl modules, but httplib does not expose this functionality to urllib2 (or to NBT). The situation is actually worse: up to Python 2.7.8, no certificate was ever checked for validity, SNI is not supported, and there are probably more issues if I would really dive into it. [Edit: up to Python 3.3, no certificate was ever checked for validity, see PEP 476].
The situation was actually so bad that the Python core team decided to do the only reasonable thing: they added new features to Python 2.7, and Python 2.7.9 was specifically released to backport the ssl, httplib, urllib2 and ensurepip modules from Python 3 to Python 2. The release notes talk about "several significant changes unprecedented in a bugfix release".
Now this is a bit of a pickle: while the NBT code itself should still work, it currently cannot be tested on Python 2.6 - 2.7.8. I propose the following:
Any feedback, positive and negative, is highly appreciated.
If the feedback is mostly in agreement, I will create a version 1.4.2 (which should include patches for #76 and #77), with a note that it is the last version that supports Python 2.6 - 2.7.8.
Running tests.py with a fresh commit changes bigtest.nbt. The file size changes from 507 bytes to 526 bytes. I don't know what's causing this, but I suggest changing tests.py to work on a copy instead.
@Fenixin Thanks for your continued bug fixes. I haven't played Minecraft in a while and hence haven't contributed much to this project. I presume the same holds for @twoolie.
I just pushed a few bug-fixes that I had lying around for an extended period of time, but just never pushed because they weren't finished at the time. It's mostly adding more self-tests, and updated documentation. At least Travis is happy again :).
I also added some comments and "TODO"s in nbt/region.py at the time. However, I'm still not very familiar with the region format. In case you want to have a look at the code I marked for further investigation, I just pushed it to a temporary branch: macfreek/NBT@4017af1. While most are probably just things that were unclear to me, some may be (potential) bugs.
I propose to update the Chunk API with the following changes:
Here is my refined proposal from an earlier attempt
class BaseChunk(object):
"""Abstract Chunk class."""
class McRegionChunk(BaseChunk):
"""Representation of a Chunk in McRegion format."""
class AnvilChunk(BaseChunk):
"""Representation of a Chunk in Anvil format."""
def __init__(self, nbt):
# self.nbt is a pointer to the NBT TAG_Compound instance
self.nbt = nbt
        # self.blocks is deprecated: it used to refer to a BlockArray instance.
        # Its functionality (and all its variables) are now present in McRegionChunk.
        # It is present for backward compatibility.
self.blocks = self
        # self.blockdata is a flat array of (block id, data) tuples in native order.
        # In McRegion the length is 32768 and the order is XZY, thus index = ((x*16)+z)*16+y
        # In Anvil the length is n*4096 and the order is YZX, thus index = ((y*16)+z)*16+x,
        # with n = 0...16 (thus length 0...65536)
        # blockdata[i] = (256*addblocks[i] + blocks[i], data[i])
self.blockdata = []
        # self.update_callbacks is a list of functions that are called just before
        # writing the NBT file. By convention, the first function updates the heightmap
        # and the second function updates the light levels.
        # A callback function should take one parameter, a BaseChunk instance.
        # These functions are called right after self.update_block_nbt()
self.update_callbacks = [ update_heightmap, update_lightlevels ]
# The following instance variables have been removed:
# self.coords
# self.blocksList
# self.dataList
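The two index conventions described in the comments above can be written out as plain functions (a standalone sketch, not part of the proposed class):

```python
def mcregion_index(x, y, z):
    """XZY order used by McRegion: index = ((x * 16) + z) * 16 + y."""
    return ((x * 16) + z) * 16 + y

def anvil_index(x, y, z):
    """YZX order used by Anvil sections: index = ((y * 16) + z) * 16 + x,
    where y is the height within the 16-block-tall section."""
    return ((y * 16) + z) * 16 + x

print(mcregion_index(1, 0, 0))  # 256
print(anvil_index(0, 1, 0))     # 256
```

Keeping these two formulas in one place is exactly what self.blockdata buys: every accessor goes through a single index computation per format.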
Methods:
#
# Metadata
#
def get_coords(self):
"""Return the x,z coordinates of the chunk. Multiply by 16 to get the global block coordinates."""
"""Unmodified."""
#
# Heightmap and Light level functions
#
def generate_heightmap(self, buffer=False, as_array=False): # McRegion
def generate_heightmap(self): # Anvil
"""McRegion: Returns a bytearray containing the highest solid block. """
"""Anvil: Returns a list containing the highest solid block. """
"""Changed: buffer and as_array boolean parameters are only present in McRegion and
removed from Anvil.
If buffer is True, the result is converted to a io.BytesIO instance.
as_array is ignored (was: result converted to a array.array instance.)
Reason for removal in Anvil is that the heightmap are now ints instead of bytes,
and these encoding conversions do no belong in chunk.py"""
def set_heightmap_callback(self, callback):
"""Set the callback function that is used to calculate the heightmap from the blockdata.
The callback function should take one parameter, a BaseChunk class.
The callback function is called right after self.update_block_nbt()"""
"""New"""
def set_lightlevel_callback(self, callback):
"""Set the callback function that is used to calculate the light levels from the blockdata.
The callback function should take one parameter, a BaseChunk class.
The callback function is called right after self.update_block_nbt()"""
"""New"""
#
# NBT functions
#
def parse_blocks(self):
"""Read NBT and fill self.blockdata, based on Blocks, Data, and AddBlocks in NBT file"""
"""Changed: now uses self.blockdata instead of self.blocksList and self.dataList"""
def update_block_nbt(self):
"""McRegion: Set self.nbt['Level']['Blocks'] and self.nbt['Level']['Data']
based on self.blockdata """
"""Ǎnvil: Set self.nbt['Level']['Sections'][i]['Blocks'],
self.nbt['Level']['Sections'][i]['AddBlocks'] (if required)
and self.nbt['Level']['Sections'][i]['Data'] based on self.blockdata"""
"""New"""
def update_nbt(self):
"""Update self.nbt based on self.blockdata (including heightmap and light levels).
This calls update_block_nbt() and the callback functions in order"""
"""New"""
def get_nbt(self):
"""Update the nbt (if block data was changed) and return it"""
"""New"""
#
# Block retrieval functions
#
def get_block(self, x, y, z):
"""Return the block id of the block at the x,y,z coordinates relative to this chunk"""
"""Changed: coord parameter removed for speed"""
def get_data(self, x, y, z):
"""Return the data id of the block at the x,y,z coordinates relative to this chunk"""
"""Changed: coord parameter removed for speed"""
def get_block_and_data(self, x, y, z):
"""Return a tuple (block id, data id) of the block at the x,y,z coordinates relative to
this chunk"""
"""Changed: coord parameter removed for speed"""
def get_all_blocks(self):
"""Iterate over all block ids, including all air blocks.
For more efficiency, use get_defined_blocks()"""
"""Unmodified"""
def get_all_blocks_and_data(self):
"""Iterate over (block id, data) tuples, including undefined (air) blocks"""
"""Unmodified"""
def get_defined_blocks(self):
"""Iterate over all defined block ids, excluding air blocks in undefined sections"""
"""New"""
def get_defined_blocks_and_data(self):
"""Iterate over all defined (block id, data) tuples, excluding air blocks in undefined sections"""
"""New"""
#
# Structured block functions and block setting functions
#
def set_block(self, x,y,z, id, data=0):
"""Set the block to specified id and data value."""
"""Unmodified"""
def set_all_blocks_and_data(self, list):
"""McRegion: Replace all blocks with the given (block id, data) tuples.
All blocks should be specified in a flat list of 32768 entries in native XZY order."""
"""Anvil: Replace all blocks with the given (block id, data) tuples.
All blocks should be specified in a flat list of a multiple of 4096 entries in native YZX order.
If the list is smaller than 65536, the remaining blocks are zeroed (set to air)"""
"""WARNING: this function behaves slightly different for McRegion and Anvil"""
"""New"""
def set_blocks(self, dict=None, fill_air=False):
"""Replace blocks with specificied (x,y,z) coordinates. Each item is a (block id,data) tuple
WARNING: the syntax of this function has changed; for lists, use set_all_blocks_and_data().
It also now requires a (block id, data) tuple"""
"""Changed"""
#
# Deprecated and Removed functions
#
def get_blocks_struct(self):
"""Return a dict with defined (x,y,z): (block id, data) tuples. air blocks in undefined
sections may be excluded."""
"""Unmodified. Deprecated. May be removed in a future version."""
def get_blocks_byte_array(self, buffer=False):
"""Return blockList as a byte array"""
"""Removed"""
raise NotImplementedError("Use get_all_blocks() instead of get_blocks_byte_array()")
def get_all_data(self):
"""Return dataList as a list"""
"""Removed"""
raise NotImplementedError("Use zip(*self.get_all_blocks_and_data())[1] instead of get_all_data()")
def get_data_byte_array(self, buffer=False):
"""Return dataList as a byte array"""
"""Removed"""
raise NotImplementedError()
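One caveat about the error message suggested for get_all_data() above: in Python 3, zip() returns an iterator, so zip(*...)[1] raises a TypeError. A version that works on both Python 2 and 3, using a plain list as a stand-in for the chunk data, would be:

```python
# Stand-in for chunk.get_all_blocks_and_data(): (block id, data) tuples.
blocks_and_data = [(1, 0), (17, 2), (35, 14)]

# zip(*blocks_and_data)[1] works in Python 2 but fails in Python 3,
# because zip() now returns an iterator. Unpack explicitly instead:
data_values = [data for _, data in blocks_and_data]
print(data_values)  # [0, 2, 14]
```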
#
# Biome functions
#
def get_biome(self, x, z):
"""Return the biome IDs at the specified x,z coordinated (relative to this chunk).
An ID of 255 means "Undetermined"."""
"""New"""
def get_biomes(self):
"""Return a list of biome IDs. The list if a flat array of integers in ZX order.
An ID of 255 means "Undetermined"."""
"""New"""
# self.biomes is a flat array of integers (0-255) in ZX order. (i = (z * 16 + x))
# Since biome IDs are not stored in the McRegion format, it is always set to 255 ("undetermined")
#
# Section functions. These methods are only available for Anvil.
#
def get_section_blocks_and_data(self, height):
"""Return a list of 4096 (block id, data) tuples in YZX order, for the given section height.
The section height is 1/16 of the block height."""
"""New. Anvil only"""
def iter_defined_sections(self):
"""Iterate over each defined section in order from lowest to highest:
Each iteration yields a TAG_Compound defining the section.
Multiply the section's ['Y'] value by 16 to get the base height of the section.
This is a reasonably fast routine. For even greater speed, don't use a Chunk class and
iterate the NBT yourself."""
"""New. Anvil only"""
def get_max_section_height(self):
"""Return the height of the highest defined section. Multiple by 16 and add 15 to get the upper
boundary for the height of the highest defined block. This method is only available for Anvil."""
"""New. Anvil only"""
#
# Height map routines
#
def get_min_floor_height(self):
"""Return the height of the lowest solid block reachable by sunlight in this chunk."""
"""New. Experimental. Function may be removed in later versions"""
def get_max_floor_height(self):
"""Return the height of the highest solid block in this chunk."""
"""New. Experimental. Function may be removed in later versions"""
def get_min_defined_block_height(self):
"""Return the height of the lowest non-air block in this chunk. Usually 0."""
"""New. Experimental. Function may be removed in later versions"""
def get_max_defined_block_height(self):
"""Return the highest non-air block. This is a slow routine. For a fast alternative,
use get_max_floor_height()
for the lower boundary and 16*get_max_section_height()+15 for the upper boundary."""
"""New. Experimental. Function may be removed in later versions"""
def update_heightmap(chunk):
"""Set nbt['Level']['HeightMap'] based on self.blockdata"""
def update_lightlevels(self):
"""McRegion: Set nbt['Level']['SkyLight'] and nbt['Level']['BlockLight'] based on
self.blockdata"""
"""Anvil: Set self.sections[i]['SkyLight'] and self.sections[i]['BlockLight'] based
on self.blockdata"""
Exactly as the title describes. I already have a test; I'll send a pull request as soon as I have a fix.
As to how I came across this... Beta's trying to grow support for saving NBT data back to disk, and there are a couple warts, such as whether there's a pre-existing file that needs to be saved over. Since NBT doesn't support mmaps or other magical writeback mechanisms, it'd be nice to at least not die on this kind of corner case.
I've been using the example scripts on save files from the latest version, 1.12.2, with mixed results. I'm not sure yet if the problems are in the main library or just the examples themselves. Summary of what I've found so far:
-biome_analysis.py seems to work on these save files, correctly enumerates biome types.
-mob_analysis.py seems to work, too, listing every mob in the world folder.
-regionfile_analysis.py doesn't work, gives errors like:
"chunk 2,1 is not a valid NBT file: outer object is not a TAG_Compound, but '\n'"
for each chunk in the region file. Digging into this indicates that it is successfully finding the chunks in the region file, but not successfully parsing them.
-block_analysis.py reports:
"0 total blocks in region, 0 are non-air (0.0000%)"
for all of the save files I've produced using version 1.12.2, which seems to indicate that it's not finding anything at all.
I'm going to look into the underlying reasons more, but it looks like the major difference between biome_analysis and regionfile_analysis is that the former uses the library heavily, while the latter duplicates a lot of the work in order to get a more detailed view. Hopefully that means the library is still fully compatible with modern saves and we just need to update some of the examples.
Line 152:
def _parse_buffer(self, buffer:
This is missing a closing parenthesis, and should be:
def _parse_buffer(self, buffer):
Twoolie recently accepted my pull request #31, which modifies the output of str()
, repr()
, pretty_tree()
and tag_info()
of NBT Tag objects, but raised some concern:
the whole point of pretty_tree is to do this sort of "meaningful" printing. I'd feel more comfortable if you kept the output of pretty_tree the same. Perhaps move the old logic over to str if you are going to override repr?
The reason to make this change was because I was confused by the following output:
>>> f = nbt.NBTFile("bigtest.nbt")
>>> print(f)
11 Entries
>>> print(repr(f))
11 Entries
It left me wondering "what type of object is f? And what are these 11 entries?"
Given the concern raised above, let me poll here what the preferred output is.
TAG objects have four methods that return a string:
__str__()
__repr__()
tag_info()
pretty_tree()
Here's a quick side-by-side comparison of the changes in output. The full output of pretty_tree() is listed at the bottom of this post.
[Side-by-side comparison tables of repr(), str(), tag_info() and pretty_tree() output before and after commit #31; the table cells were lost.]
Observe that the output of tag_info() and pretty_tree() is only slightly changed; TAG_Int_Array.pretty_tree() now mimics TAG_Byte_Array.pretty_tree(). The most important changes are to str() and repr().
The Python manual has the following requirements for the str() and repr() functions:
repr: should return a string that is acceptable to eval(). If this is not possible, return a string enclosed in angle brackets that contains the name of the type of the object and the address of the object.
str: return an "informal" string representation of an object. The return value must be a string object (a byte string for Python 2 and a Unicode string for Python 3).
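For reference, that convention can be illustrated with a toy class (illustrative only; not the nbt library's actual code):

```python
class Tag:
    """Toy tag demonstrating the __repr__/__str__ split discussed above."""
    def __init__(self, name, value):
        self.name = name
        self.value = value

    def __repr__(self):
        # Not recoverable by eval(), so use the angle-bracket form:
        # type name plus object address.
        return "<%s(%r) at 0x%x>" % (type(self).__name__, self.name, id(self))

    def __str__(self):
        # Informal representation: just the payload.
        return str(self.value)

t = Tag('intTest', 2147483647)
print(str(t))   # 2147483647
print(repr(t))  # e.g. <Tag('intTest') at 0x10e9a5fd0>
```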
1. What should be the output of __repr__()?
a. 11 Entries (previous result; goes against Python guidelines on __repr__)
b. <TAG_Compound('Level') at 0x10e9a5fd0>
(current solution)TAG_Compound(name='Level', value=[ TAG_Long(name='longTest', value=9223372036854775807), TAG_Short(name='shortTest', value=32767), TAG_String(name='stringTest', value=u'HELLO WORLD THIS IS A TEST STRING ÅÄÖ!'), TAG_Float(name='floatTest', value=0.4982314705848694), TAG_Int(name='intTest', value=2147483647), TAG_Compound(name='nested compound test', value=[ TAG_Compound(name='ham', value=[ TAG_String(name='name', value='Hampus'), TAG_Float(name='value', value=0.75) ]) TAG_Compound(name='egg',value=[ TAG_String(name='name', value='Eggbert'), TAG_Float(name='value', value=0.5) ]) ]) TAG_List(name='listTest (long)', value=[ TAG_Long(value=11), TAG_Long(value=12), TAG_Long(value=13), TAG_Long(value=14), TAG_Long(value=15) ]) TAG_List(name='listTest (compound)', value=[ TAG_Compound(value=[ TAG_String(name='name', value='Compound tag #0'), TAG_Long(name='created-on', value=1264099775885) ]) TAG_Compound(value=[ TAG_String(name='name', value='Compound tag #1'), TAG_Long(name='created-on', value=1264099775885) ]) ]) TAG_Byte('byteTest', value=127), TAG_Byte_Array(name='byteArrayTest (the first 1000 values of (n*n*255+n*7)%100, starting with n=0 (0, 62, 34, 16, 8, ...))', value=[0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 
16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 
92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48, 0, 62, 34, 16, 8, 10, 22, 44, 76, 18, 70, 32, 4, 86, 78, 80, 92, 14, 46, 88, 40, 2, 74, 56, 48, 50, 62, 84, 16, 58, 10, 72, 44, 26, 18, 20, 32, 54, 86, 28, 80, 42, 14, 96, 88, 90, 2, 24, 56, 98, 50, 12, 84, 66, 58, 60, 72, 94, 26, 68, 20, 82, 54, 36, 28, 30, 42, 64, 96, 38, 90, 52, 24, 6, 98, 0, 12, 34, 66, 8, 60, 22, 94, 76, 68, 70, 82, 4, 36, 78, 30, 92, 64, 46, 38, 40, 52, 74, 6, 48]), TAG_Double(name='doubleTest', value=0.4931287132182315) ]) ])
2. What should be the output of str()?
For single-value entities, like TAG_Numeric and TAG_String, I think it should just return str(self.value). For collection TAGs, it can be either one of:
a. 11 Entries (previous result)
b. TAG_Compound("Level"): 11 Entries (same as tag_info)
c. non-recursive nesting of all entries (current solution): {TAG_Long('longTest'): 9223372036854775807, TAG_Short('shortTest'): 32767, TAG_String('stringTest'): HELLO WORLD THIS IS A TEST STRING ÅÄÖ!, TAG_Float('floatTest'): 0.4982314705848694, TAG_Int('intTest'): 2147483647, TAG_Compound('nested compound test'): {2 Entries}, TAG_List('listTest (long)'): [5 TAG_Long(s)], TAG_List('listTest (compound)'): [2 TAG_Compound(s)], TAG_Byte('byteTest'): 127, TAG_Byte_Array('byteArrayTest (the first 1000 values of (n*n*255+n*7)%100, starting with n=0 (0, 62, 34, 16, 8, ...))'): [1000 byte(s)], TAG_Double('doubleTest'): 0.4931287132182315}
d. As c, but with infinite nesting. Kind of like solution 1c above.
3. What should be the output of collection values after tag_info() (and thus pretty_tree())?
I suspect we all agree on TAG_Long('created-on'): 1264099775885 for TAG_Long. However, for collection values in TAG_List, TAG_Compound, TAG_Byte_Array and TAG_Int_Array it is less obvious.
a. Without [] and {}, inconsistent names (previous solution, exactly as in the original NBT.txt specification):
TAG_Byte_Array('byteArrayTest'): [1000 bytes]
TAG_List("listTest (long)"): 5 entries of type TAG_Long
TAG_Compound("Level"): 11 Entries
b. With [] and {}, consistent names (current solution):
TAG_Byte_Array('byteArrayTest'): [1000 byte(s)]
TAG_List('listTest (long)'): [5 TAG_Long(s)]
TAG_Compound('Level'): {11 Entries}
c. Without [] and {}, consistent names:
TAG_Byte_Array('byteArrayTest'): 1000 byte(s)
TAG_List('listTest (long)'): 5 TAG_Long(s)
TAG_Compound('Level'): 11 Entries
d. With [] and {}, inconsistent names:
TAG_Byte_Array('byteArrayTest'): [1000 bytes]
TAG_List("listTest (long)"): [5 entries of type TAG_Long]
TAG_Compound("Level"): {11 Entries}
4. What should be the output of string values in tag_info() (and thus pretty_tree())?
a. TAG_String("stringTest"): HELLO WORLD THIS IS A TEST STRING ÅÄÖ! (current implementation)
b. TAG_String("stringTest"): u'HELLO WORLD THIS IS A TEST STRING ÅÄÖ!' (clearer what it is)
5. What should be the repr() string for an NBTFile object?
NBTFile is a subclass of TAG_Compound, and instances are presented as if they were TAG_Compounds. This may go against Python guidelines for __repr__(). (I personally don't mind the current solution, but a change is fine too, since I probably use str() instead of repr(), and str() will continue to behave as TAG_Compound.)
a. <TAG_Compound('Level') at 0x10e9a5fd0> (current solution)
b. <NBTFile('tests/bigtest.nbt') at 0x10e9a5fd0>
c. NBTFile('tests/bigtest.nbt')
6. How should str() deal with non-ASCII characters in Python 2?
str(nbt.NBTFile("tests/bigtest.nbt")) may yield a UnicodeEncodeError if a TAG_String contains non-ASCII characters, such as in the example. Python 3 handles this gracefully, but Python 2 does not. This mimics existing behaviour: in Python 2, str(u'¿whåt?') also raises a UnicodeEncodeError.
a. return str(self.value) (current solution; mimics Python behaviour, but may raise UnicodeEncodeError)
b. return unicode(self.value) (Python 3 solution, but may not be what users expect from str() in Python 2)
c. return self.value.encode('utf-8') (makes assumptions about encoding, which may be incorrect)
d. return self.value.encode(encoding) with encoding based on sys.stdout.encoding, locale.getpreferredencoding(), sys.getdefaultencoding() or some other magic (mimics the print function)
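For option d, the encoding lookup chain might look like this sketch (the exact fallback order is a design choice, not settled here):

```python
import locale
import sys

def guess_output_encoding():
    """Pick an encoding for rendering tag strings, roughly mimicking
    what the print function does: stdout's encoding if available,
    else the locale's preferred encoding, else Python's default."""
    return (getattr(sys.stdout, 'encoding', None)
            or locale.getpreferredencoding(False)
            or sys.getdefaultencoding())

# A TAG_String value could then be rendered as:
#   self.value.encode(guess_output_encoding())
print(guess_output_encoding())
```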
The PyPI package at http://pypi.python.org/pypi/NBT/ is at version 1.1, and the most recent git tag indicates that version 1.3 is available. Is this intentional? Is 1.3 stable enough for distribution?
A relatively recent addition to NBT is world.py with the WorldFolder class. The expected use is for tools that iterate through all chunks, without caring about the specific region file.
A common complaint I hear is that NBT is slow. One way to speed things up is to process each region file using a different subprocess and combine the results (this would be a Map-Reduce pattern). The best way to implement this is using a callback function.
E.g.:
def count_blocks(chunk):
    """Given a chunk, count the occurrences of each block ID in this chunk"""
    chunk_block_count = [0]*256  # array of 256 integers, one for each block ID
    for block_id in chunk.get_all_blocks():
        chunk_block_count[block_id] += 1
    return chunk_block_count

def summarize_blocks(chunk_block_counts):
    """Given multiple chunk_block_count arrays, add them together."""
    total_block_count = [0]*256  # array of 256 integers, one for each block ID
    for chunk_block_count in chunk_block_counts:
        for block_id in range(256):
            total_block_count[block_id] += chunk_block_count[block_id]
    return total_block_count

world = WorldFolder(myfolder)
block_count = world.chunk_mapreduce(count_blocks, summarize_blocks)
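For illustration, the proposed helper could be sketched as below. Everything here is an assumption about a possible API: the chunks are plain lists of block IDs so the sketch runs standalone, and the sequential map() could be swapped for multiprocessing.Pool.map() to get the per-region parallelism:

```python
def count_blocks(chunk):
    """Map step: count how often each block ID occurs in one chunk."""
    chunk_block_count = [0] * 256
    for block_id in chunk:  # stand-in for chunk.get_all_blocks()
        chunk_block_count[block_id] += 1
    return chunk_block_count

def summarize_blocks(chunk_block_counts):
    """Reduce step: element-wise sum of the per-chunk count arrays."""
    total = [0] * 256
    for counts in chunk_block_counts:
        for block_id, n in enumerate(counts):
            total[block_id] += n
    return total

def chunk_mapreduce(chunks, map_func, reduce_func):
    """Hypothetical WorldFolder.chunk_mapreduce: apply map_func to each
    chunk, then combine the results with reduce_func. With a
    multiprocessing.Pool, the map() below could become pool.map()."""
    return reduce_func(map(map_func, chunks))

toy_chunks = [[1, 1, 2], [2, 3]]  # two tiny "chunks" of block IDs
totals = chunk_mapreduce(toy_chunks, count_blocks, summarize_blocks)
print(totals[1], totals[2], totals[3])  # 2 2 1
```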
However, I fear that the term "mapreduce" is not well known to all programmers, and I'm looking for an easier name. Would the following be easier to understand?
world = WorldFolder(myfolder)
chunk_block_counts = world.process_chunks(count_blocks)
block_count = summarize_blocks(chunk_block_counts)
The advantage is that the parallelisation can happen behind the scenes (though the multiprocessing.Pool class already makes it very easy).
The disadvantage is that it adds a third method to the existing get_chunks and iter_chunks methods in the WorldFolder class. In addition, there would probably also need to be a process_nbt and process_regions next to process_chunks.
In retrospect, the difference between get_chunks (which returns a list) and iter_chunks (which returns an iterator) is so minor (iterators consume less memory, but lists can be cached) that it did not warrant the double function. I'm inclined to remove the cached get_chunks (though I liked the name better than iter_chunks).
Any opinions?
The Sample World at https://github.com/twoolie/NBT/downloads does not contain Biome data.
The world was a regular McRegion world, converted to Anvil with Mojang's converter.
It turns out that this converter does not add biome data. An alternative method is to fire up the Minecraft client to do the conversion, but that will move the mobs before it can be closed, which I consider a disadvantage (if both worlds are equal, it's easier to compare the McRegion and Anvil parsers in test scripts). Opening the file in a Minecraft server may not move the mobs if the /stop command is given immediately, but will always generate a 380x380 area around spawn, further increasing the file size.
If you have any advice on adding biome data without changing the rest of the region/chunk data, please post here.
Currently, most tests are done on "Sample World", which is a McRegion world.
Ideally, we should have a "McRegion Sample World" (this one), an "Anvil Sample World" and a "Flattened Sample World", and run the test scripts (where applicable) on all worlds.
Also, we need to check that all example scripts are tested.
I'm not sure how to convert the Sample World in a way that changes nothing beyond the format conversion (e.g. prevent the client from generating new chunks or moving entities), and if we cannot convert it that way, how to update the test suites.
@twoolie Just letting you know I accidentally uploaded a bunch of old branches. I deleted them again, but this is probably why you saw a lot of Travis error reports. My apologies.
I think either the mca file format or the chunk format changed, any chance we can get nbt working again with 1.13?
Similar to #64, __delitem__() in TAG_Int_Array has an extra parameter. This causes exceptions with pop() and remove().
Hey, it's Omeganx again. I'm having some trouble with the write_file() method; here is my code:
import nbt
from nbt.nbt import *
xmax, ymin, zmin = -77, 114, 252 ##the coordinates of the farm in my current world (for testing and debugging)
xlong, zlong = 40, 32 ##dimension of the iron farm
xmin = xmax-xlong
ts = 14694294 ## the last time a villager was near (just a random value)
villages = NBTFile() ## see villages.dat; this part tries to rebuild it with the right coordinates (restacking the villages; the coordinates can be changed later)
data = TAG_Compound()
data.name = "data"
data.tags.extend([
TAG_Int(name="Tick", value = 100000)
])
Villages = TAG_List(type=TAG_Compound)
for z in range(32):
    compound = TAG_Compound()
    liste = TAG_List(type=TAG_Compound, name="Doors")
    totalx, totaly, totalz = 0, 0, 0
    for x in range(xmin, xmin+11, 1):
        door = TAG_Compound()
        door.tags.extend([
            TAG_Int(name="X", value=x),
            TAG_Int(name="Y", value=ymin),
            TAG_Int(name="Z", value=zmin+z),
            TAG_Int(name="IDX", value=2),
            TAG_Int(name="IDZ", value=0),
            TAG_Int(name="TS", value=ts),
        ])
        totalx += x
        totaly += ymin
        totalz += zmin+z
        liste.tags.append(door)
    for x in range(xmax-11, xmax, 1):
        door = TAG_Compound()
        door.tags.extend([
            TAG_Int(name="X", value=x),
            TAG_Int(name="Y", value=ymin),
            TAG_Int(name="Z", value=zmin+z),
            TAG_Int(name="IDX", value=2),
            TAG_Int(name="IDZ", value=0),
            TAG_Int(name="TS", value=ts),
        ])
        totalx += x
        totaly += ymin
        totalz += zmin+z
        liste.tags.append(door)
    compound.tags.append(liste)
    compound.tags.extend([
        TAG_Int(name="Radius", value=32),
        TAG_Int(name="Stable", value=6605526),
        TAG_Int(name="MTick", value=0),
        TAG_Int(name="Golems", value=1),
        TAG_Int(name="CX", value=int(totalx/22)),
        TAG_Int(name="CY", value=int(totaly/22)),
        TAG_Int(name="CZ", value=int(totalz/22)),
        TAG_Int(name="ACX", value=totalx),
        TAG_Int(name="ACY", value=totaly),
        TAG_Int(name="ACZ", value=totalz),
        TAG_Int(name="PopSize", value=61),
        TAG_Int(name="Tick", value=ts)
    ])
    ## Players = TAG_List(type=TAG_End, name="Players")
    ## compound.tags.append(Players)
    Villages.tags.append(compound)
data.tags.append(Villages)
villages.tags.append(data)
print(villages.pretty_tree())
villages.write_file("villages.dat")
Everything seems to work fine until: villages.write_file("villages.dat")
Also, how do you use TAG_End?
I upgraded our server from 1.6.4 to 1.7.2 yesterday, and suddenly one of my scripts to print the death counters stopped working. After looking into it, it turns out it fails to parse a TAG_List.
Example of what I did to reproduce below. Please note that the scoreboard in minecraft is in fact filled with some objectives.
>>> from nbt import *
>>> nbtfile = nbt.NBTFile("/home/schoentoon/minecraft/survival/world/data/scoreboard.dat", 'rb')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python2.7/dist-packages/NBT-1.3-py2.7.egg/nbt/nbt.py", line 458, in __init__
self.parse_file()
File "/usr/local/lib/python2.7/dist-packages/NBT-1.3-py2.7.egg/nbt/nbt.py", line 475, in parse_file
self._parse_buffer(self.file)
File "/usr/local/lib/python2.7/dist-packages/NBT-1.3-py2.7.egg/nbt/nbt.py", line 345, in _parse_buffer
tag = TAGLIST[type.value](buffer=buffer)
File "/usr/local/lib/python2.7/dist-packages/NBT-1.3-py2.7.egg/nbt/nbt.py", line 333, in __init__
self._parse_buffer(buffer)
File "/usr/local/lib/python2.7/dist-packages/NBT-1.3-py2.7.egg/nbt/nbt.py", line 345, in _parse_buffer
tag = TAGLIST[type.value](buffer=buffer)
File "/usr/local/lib/python2.7/dist-packages/NBT-1.3-py2.7.egg/nbt/nbt.py", line 265, in __init__
raise ValueError("No type specified for list")
ValueError: No type specified for list
[edit code formatting -- MacFreek]
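A likely cause (my assumption from the traceback, not a confirmed diagnosis): Minecraft serializes an empty TAG_List with element-type byte 0 (TAG_End), and a constructor that rejects type 0 then fails on valid files such as scoreboard.dat. A minimal parser sketch that tolerates this case:

```python
import io
import struct

def parse_list_header(buffer):
    """Read the (element type, length) header of a TAG_List payload.
    Accepts element type 0 (TAG_End) for empty lists instead of
    raising "No type specified for list"."""
    tag_type = struct.unpack(">b", buffer.read(1))[0]
    length = struct.unpack(">i", buffer.read(4))[0]
    if tag_type == 0 and length > 0:
        raise ValueError("TAG_End list with nonzero length")
    return tag_type, length

# An empty list as Minecraft writes it: type byte 0, length 0.
empty_list = io.BytesIO(struct.pack(">bi", 0, 0))
print(parse_list_header(empty_list))  # (0, 0)
```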
@twoolie I accidentally pushed my branches to your repositories. I deleted them within a minute, and no bad stuff happened. In case you got some odd messages (eg Travis failing for one of these branches): that's why.
For some reason, Travis sometimes hangs on examplestests.MobAnalysisScriptTest
:
See e.g.
https://travis-ci.org/twoolie/NBT/jobs/378978732
https://travis-ci.org/twoolie/NBT/jobs/378978735
https://travis-ci.org/macfreek/NBT/jobs/378973931
testAnvilWorld (examplestests.MobAnalysisScriptTest) ...
No output has been received in the last 10m0s, this potentially indicates a stalled build or something wrong with the build itself.
In other runs, there is no problem, and I can't replicate this on my local machine either.
https://travis-ci.org/twoolie/NBT/jobs/378511443
https://travis-ci.org/twoolie/NBT/jobs/378976542
https://travis-ci.org/twoolie/NBT/jobs/378976537
With a large world, world.iter_nbt() caused a "too many open files" exception, since it never closes the region files. The simple fix is to add a few lines to close the region files in world.py at line 95, so it looks like this:
def iter_nbt(self):
    """
    Return an iterable list of all NBT. Use this function if you only
    want to loop through the chunks once, and don't need the block or data arrays.
    """
    # TODO: Implement BoundingBox
    # TODO: Implement sort order
    for region in self.iter_regions():
        for c in region.iter_chunks():
            yield c
        if hasattr(region.file, 'fileobj') and region.file.fileobj:  # <- added
            region.file.fileobj.close()                              # <- added
        region.file.close()                                          # <- added
edit: Updated the fix after encountering this again in an even larger world.
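A variant that also closes the file when the caller abandons the iterator early would wrap the inner loop in try/finally. The sketch below is self-contained, with tiny fake region objects standing in for nbt.region.RegionFile:

```python
class FakeFile:
    """Stand-in for a region's underlying file object."""
    def __init__(self):
        self.closed = False
        self.fileobj = None  # gzip wrapper, when present
    def close(self):
        self.closed = True

class FakeRegion:
    """Stand-in for nbt.region.RegionFile, just for this demo."""
    def __init__(self, chunks):
        self._chunks = chunks
        self.file = FakeFile()
    def iter_chunks(self):
        return iter(self._chunks)

def iter_nbt(regions):
    """Yield all chunks; close each region file even if iteration of a
    region is interrupted (try/finally instead of plain close calls)."""
    for region in regions:
        try:
            for c in region.iter_chunks():
                yield c
        finally:
            fileobj = getattr(region.file, 'fileobj', None)
            if fileobj:
                fileobj.close()
            region.file.close()

regions = [FakeRegion([1, 2]), FakeRegion([3])]
print(list(iter_nbt(regions)))              # [1, 2, 3]
print(all(r.file.closed for r in regions))  # True
```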
@stumpylog @twoolie Hey Trenton, I see you're active again, which is great. However, some of the changes may introduce some backward incompatibility, so we need to find a balance between fast progress (rapid prototyping) and robustness.
I propose the following:
Regular bug fixes and documentation enhancements should take place in the master branch, and should be ported to the v2.x branch (not the other way around, please).
This can work, but requires some important coding hygiene:
To ease things, I've tagged all issues with API changes, and also added a 2.0 milestone.
Some items that affect lots of smaller parts in the code, like documentation changes and fixing of trailing whitespace, are tedious with multiple branches, so I recommend making these types of changes either now or waiting until 2.0 is released.
Note that this is also the time to propose code restructuring. I personally think the code is in very good shape, but here are two suggestions in case someone wants to pick them up: a clean-up of function names (e.g. region.get_chunks does not return a chunk or NBT tag, but only the chunk coordinates; most of the function names in chunk.Chunk are still specific to block IDs, without data IDs), and changes to help speed things up (either by changing some API functions so faster numpy functions can be used, or by adding caches).