Hi, developer of Bibliogram here!
For a couple of months now my Bibliogram request logs have contained listings for rather strange URLs, and I've finally figured out what's causing them: your extension. There are a few things to explain here, so I'll start at the start to make sure we're all on the same page. There's a summary at the bottom of this page if you're really cool and already know everything I'm about to say.
Usernames on Instagram are on the top level, for example, instagram.com/radionewzealand. The username here is radionewzealand. This puts them on the same level as other Instagram pages which are not usernames, like /explore, /accounts, /graphql, and /embeds.js. Therefore, if we see a URL like /privacy, we don't know if we'll visit Instagram's privacy page, or someone with the username privacy.
On Bibliogram I decided that I didn't want to deal with this problem, since I of course would like to have my own paths like /imageproxy without having to worry if there is a person on Instagram with the username imageproxy whose profile would then become inaccessible. So, I put all users onto the path /u/{username}, so that any person on Instagram can be visited without me having to worry. Posts are still on /p/, and I expect that when I implement the explore feature it will be on /explore/ too.
However, people making redirect extensions like yours do actually have to deal with this confusion when deciding whether or not to rewrite a request on instagram.com. It would be rather bad if someone was POSTing their login credentials to Instagram to try to log in there, but some extension rewrote this and sent them to a random person's Bibliogram instance.
Thus, to try to help extension authors, I have written this reference of all of Instagram's reserved URLs that I've found so far (there may be more that I don't know about). https://github.com/cloudrac3r/bibliogram/wiki/Reserved-URLs
Here are the strange requests I mentioned at the start, which you need to stop rewriting. There are very likely more being rewritten that I just haven't noticed yet:
- /embed.js (www.instagram.com) has been rewritten to /u/embed.js (Bibliogram)
- /en_US/embeds.js (platform.instagram.com) has been rewritten to /u/en_US/embeds.js (Bibliogram)
- /accounts/confirm_email/redacted/redacted (domain unknown) has been rewritten to /u/accounts/confirm_email/redacted/redacted (Bibliogram). Yes, really. The text redacted used to be a base64 string that was probably private data, but I decided not to examine it. I don't know if it was a GET or POST.
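To make the path side of this concrete, here is a rough sketch of a check that would have left all three of those requests alone. I don't know how your extension is structured, so the function name `toBibliogramProfilePath` and the constant `RESERVED_FIRST_SEGMENTS` are made up for illustration, and the reserved set only contains paths mentioned in this issue. The wiki page is the fuller list.

```typescript
// Illustrative subset only: these are the reserved paths mentioned in this
// issue. The Reserved-URLs wiki page is the fuller reference.
const RESERVED_FIRST_SEGMENTS: ReadonlySet<string> = new Set([
  "accounts",
  "embed.js",
  "embeds.js",
  "explore",
  "graphql",
  "p",
]);

// Hypothetical helper: returns the Bibliogram path for an Instagram profile
// URL, or null when the path does not look like a profile.
// Posts under /p/ keep the same path on Bibliogram, so they are not handled here.
function toBibliogramProfilePath(pathname: string): string | null {
  const segments = pathname.split("/").filter((s) => s.length > 0);
  // A profile URL has exactly one segment: /{username} or /{username}/
  if (segments.length !== 1) return null;
  const candidate = segments[0];
  if (candidate === undefined) return null;
  // Anything reserved by Instagram is not a username.
  if (RESERVED_FIRST_SEGMENTS.has(candidate.toLowerCase())) return null;
  return `/u/${candidate}`;
}
```

With this, /radionewzealand maps to /u/radionewzealand, while /embed.js, /en_US/embeds.js and /accounts/confirm_email/redacted/redacted all come back as null and would be left untouched.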
As part of this, I noticed that you were redirecting the secondary request to platform.instagram.com/en_US/embeds.js to Bibliogram. This is a script that is required on another site to display Instagram's post embeds — see here for one example. platform.instagram.com is reserved for their tracking and embedding code to be run on 3rd-party websites. It's probably best that you don't rewrite requests on this domain at all, unless you really want random instance owners to be able to XSS any website with an Instagram embed. Bibliogram does not support external embeds at the moment, and it's not a high priority feature, but I can let you know in the future if you should change this behaviour.
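One way to express this, as a sketch rather than a drop-in patch: decide up front whether a request is eligible for rewriting at all, before looking at its path. The `RequestSummary` shape and the `mayRewrite` name are my own inventions. The fields are meant to mirror the `url`, `method` and `type` values that the WebExtensions webRequest API hands to listeners, where `"main_frame"` is the type for a top-level page load. I'm assuming only instagram.com and www.instagram.com should ever be rewritten.

```typescript
// Hypothetical shape, mirroring the url/method/type fields that a
// webRequest listener receives.
interface RequestSummary {
  url: string;
  method: string;
  type: string; // e.g. "main_frame", "script", "image"
}

// Assumption: only these hosts serve pages worth redirecting.
// platform.instagram.com is deliberately absent.
const REWRITABLE_HOSTS: ReadonlySet<string> = new Set([
  "instagram.com",
  "www.instagram.com",
]);

function mayRewrite(request: RequestSummary): boolean {
  let host: string;
  try {
    host = new URL(request.url).hostname;
  } catch {
    return false; // not a parseable absolute URL, leave it alone
  }
  if (!REWRITABLE_HOSTS.has(host)) return false;
  // Never redirect form submissions such as logins to a third-party instance.
  if (request.method.toUpperCase() !== "GET") return false;
  // Scripts, images and frames embedded in other sites are not rewritten.
  return request.type === "main_frame";
}
```

Under these assumptions, the script request to platform.instagram.com/en_US/embeds.js fails both the host check and the type check, and a POST to anything on instagram.com is never redirected.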
Somewhat related, looking at the code, I see you have some sort of expression to match /imageproxy, /u, and /static. These look like Bibliogram endpoints. Can you explain what this does and why it's necessary?
Summary
- Add a better filter of requests to not rewrite, perhaps using the list on the Bibliogram wiki as a base.
- Consider not rewriting things on the domain platform.instagram.com, or things loaded as 3rd-party scripts.
- Bibliogram has no embeds feature, but I can let you know if this changes.