Comments (7)
I've been thinking more about this lately, and the setup I imagine is still that the external interface is the URLs-to-scan queue. The GStreamer scanning bit should be reduced in scope so that we only produce the tags and the duration. This result gets passed to the next queue without being converted to a track; the format should match whatever the tags-changed event emits.
The results in the post-processing queue should have per-URI-type annotations run on them; an example would be last-modified for file:// URIs. This would also have the potential for catching and converting playlists, meaning mopidy/mopidy#701 becomes obsolete.
At the end of the post-processing we should either have another queue, or just emit the resulting track/playlist/... From this point the consumers of the scanner data become relevant again and take over, either returning the metadata we found for a stream lookup or adding the result to the library.
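Something like this shape is what I have in mind; a minimal sketch only, where scan_tags_and_duration(), annotate() and emit() are made-up placeholders for the GStreamer stage, the per-URI-type annotations, and the hand-off to consumers:

import queue
import threading

scan_queue = queue.Queue()    # the external interface: URIs to scan
result_queue = queue.Queue()  # raw {uri, tags, duration} results

def gst_scan_worker():
    # Reduced-scope GStreamer stage: only tags and duration, no Track yet.
    while True:
        uri = scan_queue.get()
        tags, duration = scan_tags_and_duration(uri)  # placeholder
        result_queue.put({'uri': uri, 'tags': tags, 'duration': duration})
        scan_queue.task_done()

def post_process_worker():
    # Per-URI-type annotations (e.g. last-modified for file://) and
    # playlist detection happen here, then consumers take over.
    while True:
        result = result_queue.get()
        emit(annotate(result))  # placeholders
        result_queue.task_done()

for target in (gst_scan_worker, post_process_worker):
    threading.Thread(target=target, daemon=True).start()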
For the GStreamer bit we should consider getting rid of the manual bus handling and just leave that to the GObject loop. This way we can scale the number of GStreamer scanners without throwing more Python threads at the problem (GStreamer will still have its own internal threads, though).
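Roughly, letting the GLib loop own the bus could look like this; a sketch assuming GStreamer 1.x via PyGObject, with a placeholder URI:

import gi
gi.require_version('Gst', '1.0')
from gi.repository import Gst, GLib

Gst.init(None)

playbin = Gst.ElementFactory.make('playbin', None)
playbin.set_property('uri', 'file:///music/album/track01.ogg')  # placeholder
playbin.set_property('audio-sink', Gst.ElementFactory.make('fakesink', None))

def on_message(bus, message):
    # Bus messages for every scanner pipeline get dispatched on the single
    # GLib main loop, so adding scanners doesn't add Python threads.
    if message.type == Gst.MessageType.TAG:
        pass  # collect tags here
    elif message.type in (Gst.MessageType.EOS, Gst.MessageType.ERROR):
        loop.quit()

bus = playbin.get_bus()
bus.add_signal_watch()  # emit 'message' signals on the GLib main loop
bus.connect('message', on_message)

playbin.set_state(Gst.State.PLAYING)
loop = GLib.MainLoop()
loop.run()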
Note that there are, however, two or three current use cases: local scanning done "offline", the planned in-process local scanning, and finally metadata lookup for streams. This is important to note, as the turnaround time for the stream case is much tighter than for the others. As such, there is a fair chance we should make the queues priority queues, or have two of them with different service levels, ensuring that a running local scan doesn't block stream metadata lookups or otherwise consume too many resources.
For this I'm also assuming that we have a single scanner pipeline running as part of audio/core which local and others are allowed to use.
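For the two-service-level variant, a sketch with a single priority queue, where stream lookups always jump ahead of batch work (the URIs are placeholders):

import itertools
import queue

STREAM, BATCH = 0, 1        # lower number is served first
_order = itertools.count()  # tie-breaker keeps FIFO order within a level

scan_queue = queue.PriorityQueue()

def enqueue(uri, priority=BATCH):
    scan_queue.put((priority, next(_order), uri))

enqueue('file:///music/album/track01.ogg')        # batch scan work
enqueue('http://example.com/stream.mp3', STREAM)  # jumps the queue

_, _, next_uri = scan_queue.get()  # the stream URI, despite being added last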
For the batch scanning case we don't really care about when we get our answers out, while for the stream case we do. So another idea that just popped into my head while writing this is to have "scan sessions": each session has a priority, a way to add tracks, and a way to get the results out. For the stream case we simply create a session, give it the one URI, get our result, and then close the session. For batch scanning we create a session, feed it with URIs to scan as we find them (which might be slow due to the network etc.), process results as we get them, and then, when we've found the last URI we want to scan, we tell the session, at which point we can join the queues. Of course this assumes a batch-oriented mindset; for in-process scanning a streaming, continual approach would be nicer IMO.
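As a sketch of the session idea, with every name (ScanSession, add, results, close) and the scanner object's enqueue()/finish() methods made up for illustration:

import queue

class ScanSession:
    """Hypothetical handle onto the shared scanner pipeline."""

    def __init__(self, scanner, priority):
        self._scanner = scanner
        self._priority = priority
        self._results = queue.Queue()

    def add(self, uri):
        # Feed URIs as we find them; results come back tagged to this session.
        self._scanner.enqueue(uri, self._priority, self._results)

    def results(self):
        # Yield results as they arrive; a None sentinel marks the end.
        while True:
            result = self._results.get()
            if result is None:
                return
            yield result

    def close(self):
        # No more URIs are coming, so results() can finish.
        self._scanner.finish(self._priority, self._results)

# Stream lookup: one URI, one result, tight turnaround.
session = ScanSession(scanner, priority=0)  # hypothetical shared 'scanner'
session.add('http://example.com/stream.mp3')
metadata = next(session.results())
session.close()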
Hopefully some of this still makes sense, as this became a bit more of a braindump than I had planned.
from mopidy-local.
Makes sense to me :-)
from mopidy-local.
The current state has also caused out-of-memory issues for at least one user trying to scan 100k songs over SMB. In that case it was a Raspberry Pi running out of memory already at the finder stage.
from mopidy-local.
For another project I've written a scanner actor. I use Discoverer from gst.pbutils instead of the pure-Python one (I've used code from another source).
uri.py

# -*- coding: utf-8 -*-
from __future__ import division, print_function, absolute_import


def path2uri(path):
    r"""
    Return a valid uri (file scheme) from the absolute path name of a file

    >>> path2uri('/home/john/my_file.wav')
    'file:///home/john/my_file.wav'
    >>> path2uri('C:\Windows\my_file.wav')
    'file:///C%3A%5CWindows%5Cmy_file.wav'
    """
    import urlparse
    import urllib
    return urlparse.urljoin('file:', urllib.pathname2url(path))


def source_info(source):
    import os.path
    src_info = {'is_file': False,
                'uri': '',
                'pathname': ''}
    if os.path.exists(source):
        src_info['is_file'] = True
        # get the absolute path
        src_info['pathname'] = os.path.abspath(source)
        # and make a uri of it
        src_info['uri'] = path2uri(src_info['pathname'])
    return src_info


def get_uri(source):
    """
    Check a media source as a valid file or uri and return the proper uri
    """
    import gst
    src_info = source_info(source)
    if src_info['is_file']:
        # It is a local file: recurse once with the file:// uri so it goes
        # through the GStreamer validity checks below.
        return get_uri(src_info['uri'])
    elif gst.uri_is_valid(source):  # Is this a valid URI source for GStreamer?
        uri_protocol = gst.uri_get_protocol(source)
        if gst.uri_protocol_is_supported(gst.URI_SRC, uri_protocol):
            return source
        else:
            raise IOError('Invalid URI source for GStreamer')
    else:
        raise IOError('Failed getting uri for path %s: no such file' % source)


def get_media_uri_info(uri, timeout=5):
    from gst.pbutils import Discoverer
    from gst import SECOND as GST_SECOND, uri_get_protocol
    # Project-specific helper; any asynchronous hostname lookup would do.
    from itelbase.utils.ping import get_ipaddr_async

    GST_DISCOVER_TIMEOUT = timeout * GST_SECOND
    uri_discoverer = Discoverer(GST_DISCOVER_TIMEOUT)
    info = dict()
    try:
        clean_uri = get_uri(uri)
    except IOError as e:
        return {'uri': uri, 'result': 'GST_URI_ERROR', 'error-string': str(e)}
    # check that DNS works before handing the uri to the discoverer
    if uri_get_protocol(clean_uri) != 'file':
        ipaddr = get_ipaddr_async(uri, timeout=1)
        if not ipaddr:
            return {'uri': uri, 'result': 'GST_DISCOVERER_ERROR',
                    'error-string': 'DNS ERROR'}
    try:
        uri_info = uri_discoverer.discover_uri(clean_uri)
    except Exception as e:
        return {'uri': uri, 'result': 'GST_DISCOVERER_ERROR',
                'error-string': str(e)}
    info['uri'] = uri_info.get_uri()
    info['result'] = uri_info.get_result().value_name
    # info['duration'] = uri_info.get_duration() / GST_SECOND  # in seconds
    info['seekable'] = uri_info.get_seekable()
    # Other data Discoverer can provide, unused here:
    # info['stream-info'] = uri_info.get_stream_info()
    # info['container-streams'] = uri_info.get_container_streams()
    # info['streams'] = uri_info.get_streams()
    # info['misc'] = uri_info.get_misc()
    # info['tags'] = uri_info.get_tags()
    # info['video'] = uri_info.get_video_streams()
    audio_streams = uri_info.get_audio_streams()
    info['streams'] = []
    for stream in audio_streams:
        stream_info = {
            'type': stream.get_stream_type_nick(),
            'bitrate': stream.get_bitrate(),
            'channels': stream.get_channels(),
            'depth': stream.get_depth(),
            'samplerate': stream.get_sample_rate(),
            # 'max_bitrate': stream.get_max_bitrate(),
        }
        info['streams'].append(stream_info)
    return info
and a Resolver actor:
import pykka


class Resolver(pykka.ThreadingActor):

    def __init__(self, timeout, done_function):
        super(Resolver, self).__init__()
        self._done_function = done_function
        self._default_timeout = timeout

    def scan(self, element):
        # UriStatus, urifactory and uri_mod come from the surrounding project.
        def _is_good(x):
            if isinstance(x, dict) and 'result' in x and x['result'] == 'GST_DISCOVERER_OK':
                return UriStatus.GOOD
            return UriStatus.BAD

        if element is not None:
            # logger.info('Resolve element=%s', element)
            uri = element['uri']
            response = uri_mod.get_media_uri_info(uri, self._default_timeout)
            result = urifactory(index=element['index'],
                                uri=response['uri'],
                                status=_is_good(response))  # info=response)
        else:
            result = None
        # logger.info('Resolve finished result=%s', result)
        self._done_function(result)
I feed it from another actor; it tries to discover a URI and calls another function on completion. (I don't know if there's a better method.)
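Driving it looks roughly like this; a sketch, with the element dict shaped as scan() above expects and a placeholder URI:

def handle_result(result):
    # Called from the resolver's thread when a scan completes.
    print('resolved:', result)

resolver = Resolver.start(timeout=5, done_function=handle_result)
resolver.proxy().scan({'index': 0, 'uri': 'http://example.com/stream.mp3'})
# ... keep feeding elements, then shut the actor down:
resolver.stop()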
from mopidy-local.
Thanks for the suggestion. We've looked into this before, and at the time speed was the main reason for not using the built-in one. The downside, of course, is having to reinvent the wheel and rediscovering problems already solved upstream (such as sources/sinks with dynamic pads).
If this can be shown to run at an acceptable speed we should probably switch.
On a side note, we've also talked about splitting mopidy-local out into its own extension instead of bundling it. This would probably also cover killing off mopidy-local-json and merging mopidy-local-sqlite into the new mopidy-local. In that case it would be up to whoever maintains the new extension to figure out what is best, and we can keep doing our own thing in core as we see fit :-)
from mopidy-local.
This resolver class could be implemented as a pool of services if speed is crucial, but for my application it is already faster than the Mopidy default, and it doesn't block the base class waiting for the resolver to start.
The only problem would be if one checker dies and doesn't call the done function (I need a way in Pykka to run something if the thread dies...).
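Pykka does have a hook that might cover this: on_failure() is called when an actor is stopped by an unhandled exception. A minimal sketch, reusing the Resolver above:

class SafeResolver(Resolver):

    def on_failure(self, exception_type, exception_value, traceback):
        # Pykka calls this when the actor dies from an unhandled exception,
        # so we can still report back instead of leaving callers waiting.
        self._done_function(None)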
from mopidy-local.
Just to update this: if someone were to do some benchmarks comparing recent versions of GST's Discoverer to our solution, then we can make a decision on this. It may have been slower in 2015/2016 (GST 0.10?), but 3 years later that may no longer be the case.
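For anyone picking this up, a rough timing harness against the GStreamer 1.x Discoverer via PyGObject could look like this (the URI list is a placeholder; point it at a real library):

import time

import gi
gi.require_version('Gst', '1.0')
gi.require_version('GstPbutils', '1.0')
from gi.repository import Gst, GstPbutils

Gst.init(None)
discoverer = GstPbutils.Discoverer.new(5 * Gst.SECOND)

uris = ['file:///music/album/track01.ogg']  # placeholder

start = time.time()
for uri in uris:
    info = discoverer.discover_uri(uri)
    tags = info.get_tags()                       # the data the scanner needs:
    duration = info.get_duration() / Gst.SECOND  # tags, duration in seconds
print('%d uris in %.2fs' % (len(uris), time.time() - start))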
from mopidy-local.