v12mike / fetch-external-images
Scripts to catalogue and fetch phpBB in-line images that are externally hosted
The output when running the PHP download script is:
status OK 200 for 995 : http://i32.photobucket.com/albums/d24/hamiltw/1908packwood.jpg
But there is nothing in the /images/ext folder?
extract_external_links.php
Example img src (phpBB 3.2 format): https://www.bing.com/th?id=ON.39C32F43EC4B01E928A7C0637186511F&pid=News&h=367&w=700
This URL is not extracted by the regex:
"if (preg_match_all('~<img src="(http[^\/]+?//([^\/]+)?/.+?[^\.]+?.([a-z]+?))">~i', $post_text, $matches))"
I have many more of these "dynamic image" URLs. They are never cached locally, because the regex fails to match them and so they are never fetched.
I tried to modify the regex, but the script later requires tables filled with values such as link and ext. Each of these is a valid image inside src="" or an [img]src[/img] tag.
Can you improve the fetcher so that it really gets all images between the tags/srcs, ignoring the extension and URL format? That would support about 10-20% more images on my board, which currently cannot be fetched because of how their links are formed. Maybe get rid of $link and $ext completely and just match everything between src="" and [img]src[/img] tags?
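One way the request above could be approached is sketched below. This is only an illustration, not the script's actual code: the function name extract_image_urls is hypothetical, and the optional ":uid" suffix on the BBCode tag is an assumption about how older phpBB versions store posts. The idea is to capture whatever sits inside src="..." or between [img] tags without requiring a file extension, so query-string URLs like the bing.com example are matched too.

```php
<?php
// Hypothetical helper, not part of the repository's scripts.
// Returns every http(s) image URL found in a post, regardless of extension.
function extract_image_urls(string $post_text): array
{
    $patterns = [
        // HTML form: <img ... src="URL" ...> (also matches <IMG ...> because of the i flag)
        '~<img\b[^>]*?\bsrc="(https?://[^"]+)"~i',
        // BBCode form: [img]URL[/img]; the optional :uid suffix is an assumption
        '~\[img(?::[a-z0-9]+)?\](https?://[^\[\]\s]+)\[/img(?::[a-z0-9]+)?\]~i',
    ];

    $urls = [];
    foreach ($patterns as $pattern) {
        if (preg_match_all($pattern, $post_text, $matches)) {
            foreach ($matches[1] as $url) {
                // Stored post text is usually HTML-escaped, so turn &amp; back into &
                $urls[] = html_entity_decode($url, ENT_QUOTES);
            }
        }
    }

    // Drop duplicates and reindex
    return array_values(array_unique($urls));
}
```

The rest of the script would still need a host and a local file name for each URL. Those could probably be derived with parse_url() and a hash of the URL rather than from the regex captures, but that part is not shown here.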
The most recent version handles all external images, not just those hosted on PhotoBucket. Yet the description of extract_external_links.php still mentions PhotoBucket:
/**
I've had success using this script on various VPSs with phpBB installed, but sadly I can't get it to work on a shared virtual server. When run on a shared server, the download script produces the following output (with MAXIMUM_FILES_TO_FETCH limited to 1):
[phpBB Debug] PHP Warning: in file [ROOT]/scripts/download_external_images.php on line 107: curl_setopt(): CURLOPT_FOLLOWLOCATION cannot be activated when an open_basedir is set
status 0 FAIL : http://i60.photobucket.com/<rest_of_url>.jpg
I believe it's common practice to set an open_basedir on shared servers, so this probably affects more servers than just the one I'm trying to fix now. Removing line 107, or setting the option to 0 instead of 1, doesn't work either, since the script then can't fetch the image from the third-party hosting site.
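A possible workaround, sketched under assumptions rather than taken from the script: leave CURLOPT_FOLLOWLOCATION off and follow the redirects by hand, reading each target from CURLINFO_REDIRECT_URL. The function names, the 30 second timeout, and the limit of 5 redirects are all illustrative. The fetch step is passed in as a callable so the loop can be tried without network access.

```php
<?php
// Hypothetical helpers; names, timeout and redirect limit are illustrative.

/**
 * Fetch one URL without CURLOPT_FOLLOWLOCATION.
 * Returns [HTTP status, body, redirect target or null].
 */
function curl_fetch_once(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // avoids the open_basedir warning
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);
    $body = curl_exec($ch);
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    // When redirects are not followed, this holds the URL curl would have gone to next
    $next = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return [$status, $body === false ? '' : $body, $next ?: null];
}

/**
 * Follow 3xx responses manually, up to $max_redirects hops.
 * Returns ['status' => int, 'url' => final URL, 'body' => string].
 */
function fetch_following_redirects(string $url, ?callable $fetch_once = null, int $max_redirects = 5): array
{
    $fetch_once = $fetch_once ?? 'curl_fetch_once';

    for ($hops = 0; $hops <= $max_redirects; $hops++) {
        [$status, $body, $next] = $fetch_once($url);
        $is_redirect = $status >= 300 && $status < 400 && $next !== null;
        if (!$is_redirect) {
            return ['status' => $status, 'url' => $url, 'body' => $body];
        }
        $url = $next; // follow the redirect target ourselves
    }

    // Too many redirects: report a failure with status 0
    return ['status' => 0, 'url' => $url, 'body' => ''];
}
```

Calling fetch_following_redirects('http://i60.photobucket.com/<rest_of_url>.jpg') would then stand in for the single curl_exec() call with redirects enabled. Whether this fits into download_external_images.php without other changes is untested.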