
ciscodevnet / webex-teams-archiver


Simple utility to archive Webex Teams rooms

License: MIT License

Python 50.95% HTML 43.48% CSS 5.56%
cisco webex webex-teams python3 cisco-spark webex-teams-sdk

webex-teams-archiver's Introduction

Webex Teams Archiver



Webex Teams Archiver extracts the messages and files out of a Webex Teams room and saves them in text, HTML, and JSON formats.

Example

from webexteamsarchiver import WebexTeamsArchiver

personal_token = "mytoken"
archiver = WebexTeamsArchiver(personal_token)

# room id from https://developer.webex.com/docs/api/v1/rooms/list-rooms
room_id = "Y2lzY29zcGFyazovL3VzL1JPT00vd2ViZXh0ZWFtc2FyY2hpdmVy"
archiver.archive_room(room_id)

Produces the following files:

$ ls 
Title_Timestamp.tgz
Title_Timestamp

$ ls Title_Timestamp/
Title_Timestamp.html
Title_Timestamp.json
Title_Timestamp.txt
attachments/
avatars/
space_details.json

Below is an example of a simple room that got archived.

[Image: example of an archived room]

Note 1: The HTML version of the archive requires Internet connectivity because of the CSS, which is not packaged with the archive because of licensing conflicts.

Note 2: Use of the Webex Teams Archiver may violate the retention policy, if any, applicable to your use of Webex Teams.

Installation

Installing and upgrading is easy:

Install via pip

$ pip install webexteamsarchiver

Upgrade to the latest version

$ pip install webexteamsarchiver --upgrade

Options

The archive_room method exposes the following options:

Argument      Default Value   Description
text_format   True            Create a text version of the archive
html_format   True            Create an HTML version of the archive
json_format   True            Create a JSON version of the archive

In addition, the options kwargs currently support the following:

Argument               Default Value        Description
compress_folder        True                 Compress archive folder
delete_folder          False                Delete the archive folder when done
reverse_order          True                 Order messages by most recent on the bottom
download_attachments   True                 Download attachments sent to the room
download_avatars       True                 Download avatar images
download_workers       15                   Number of download workers for downloading files
timestamp_format       %Y-%m-%dT%H:%M:%S    Timestamp strftime format
file_format            gztar                Archive file format
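As a rough illustration of how the options in the two tables might be combined, the sketch below collects a few of them in a dictionary and passes them to archive_room. The helper name archive_with_options and the chosen values are assumptions made for this example; only the option names and defaults come from the tables above.

```python
# Option names come from the tables above; the values are only examples.
ARCHIVE_OPTIONS = {
    "text_format": False,        # skip the plain-text version
    "html_format": True,
    "json_format": True,
    "compress_folder": True,
    "delete_folder": True,       # keep only the compressed archive
    "download_avatars": False,
    "download_workers": 5,
}


def archive_with_options(archiver, room_id, options=ARCHIVE_OPTIONS):
    """Hypothetical helper: call archive_room on an archiver-like object."""
    return archiver.archive_room(room_id, **options)
```

With a real archiver this would be called as archive_with_options(WebexTeamsArchiver(personal_token), room_id).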

Questions, Support & Discussion

webexteamsarchiver is a community developed and community supported project. Feedback, thoughts, questions, and issues can be submitted on the issues page.

Contribution

webexteamsarchiver is a community developed project. Code contributions are welcome via PRs!

Copyright (c) 2018-2021 Cisco and/or its affiliates.

webex-teams-archiver's People

Contributors

evelynbarquero, fdemello, jbogarin, turc42


webex-teams-archiver's Issues

Request time-out for a large room

I'm trying to archive a room of mine that is 2-3 years old and is HUGE.

I get a timeout error. How can I increase the timeout?

room_id = "XXX"
archiver.archive_room(room_id)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ws/shmandal-sjc/pyats_32/lib/python3.6/site-packages/webexteamsarchiver/webexteamsarchiver.py", line 158, in archive_room
    text_format, html_format, json_format, timestamp_format)
  File "/ws/shmandal-sjc/pyats_32/lib/python3.6/site-packages/webexteamsarchiver/webexteamsarchiver.py", line 236, in _archive
    file_metadata = self.file_details(url)
  File "/ws/shmandal-sjc/pyats_32/lib/python3.6/site-packages/webexteamsarchiver/webexteamsarchiver.py", line 91, in file_details
    r.raise_for_status()
  File "/ws/shmandal-sjc/pyats_32/lib/python3.6/site-packages/requests/models.py", line 940, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 408 Client Error: Request Time-out for url: https://api.ciscospark.com/v1/contents/Y2lzY29zcGFyazovL3VzL0NPTlRFTlQvNzAxYzkxODAtNTI0ZC0xMWU4LThlZmQtYzc1YzdjY2Y1Y2M2LzA
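The traceback shows the 408 being raised from raise_for_status inside file_details. One possible mitigation, which is not confirmed to be something the archiver offers, is to retry the failing call with an increasing delay. The sketch below is a generic retry wrapper; call_with_retries and its parameters are hypothetical names for this example.

```python
import time


def call_with_retries(func, attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call func(); on a listed exception, wait and retry with a doubled delay."""
    delay = base_delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
            delay *= 2
```

For the case in this issue, retry_on would presumably be set to (requests.exceptions.HTTPError,) and func would wrap the metadata request.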

Add option on how to deal with file metadata problems

When there's a problem with the file metadata, the archiver currently raises an exception. The user should be able to choose how such files are handled:

  1. Skip the file
  2. Give the file a random name
  3. Raise an exception
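The three choices listed above could be modelled as a small policy enum. This is only a sketch of the idea: OnMetadataError, resolve_filename, and the "unknown_" prefix are hypothetical and not part of the archiver's API.

```python
import uuid
from enum import Enum


class OnMetadataError(Enum):
    SKIP = "skip"
    RANDOM_NAME = "random_name"
    RAISE = "raise"


def resolve_filename(get_filename, policy=OnMetadataError.RAISE):
    """Return a file name, or None when the file should be skipped.

    get_filename is a callable that returns the name or raises on bad metadata.
    """
    try:
        return get_filename()
    except Exception:
        if policy is OnMetadataError.SKIP:
            return None
        if policy is OnMetadataError.RANDOM_NAME:
            return f"unknown_{uuid.uuid4().hex}"
        raise  # OnMetadataError.RAISE: keep the current behaviour
```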

Add timestamp to archive file name

User feedback:

I was wondering whether we could include the timestamp in the file name
so that if there are multiple archives for the same space over time
people can easily identify the individual archive files
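The file listing in the README already shows a Title_Timestamp pattern. A name of that shape could be built roughly as follows; archive_name is a hypothetical helper, and the default format uses hyphens instead of colons because colons are not allowed in Windows file names.

```python
import re
from datetime import datetime


def archive_name(title, when, fmt="%Y-%m-%dT%H-%M-%S"):
    """Build '<Title>_<Timestamp>' with unsafe title characters replaced."""
    safe_title = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return f"{safe_title}_{when.strftime(fmt)}"
```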

Handle when lastActivity is not present

Handle cases where lastActivity is not present.

 File "/webexteamsarchiver/templates/default.html", line 18, in top-level template code
    {% include "room_content.html" %}
  File "/webexteamsarchiver/templates/room_content.html", line 3, in top-level template code
    Created by <a href="mailto:{{ room_creator.emails[0] }}" alt="{{ room_creator.id }}">{{ room_creator.displayName }}</a> on {{ room.created|datetime_format(timestamp_format) }} and last had activity on {{ room.lastActivity|datetime_format(timestamp_format) }}.<br />
  File "/webexteamsarchiver/jinja_env.py", line 45, in datetime_format
    return date.strftime(format)
AttributeError: 'NoneType' object has no attribute 'strftime'
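The error comes from calling strftime on None. A None-tolerant version of the filter might look like the sketch below; the missing parameter and its "unknown" default are assumptions for illustration, not the project's actual fix.

```python
from datetime import datetime


def datetime_format(date, format="%Y-%m-%dT%H:%M:%S", missing="unknown"):
    """Format a datetime, returning a placeholder when the value is None."""
    if date is None:
        return missing
    return date.strftime(format)
```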

Inspect and clean attachment file names

For some files, Webex Teams saves the file name as the full path to the file and this breaks the archiver when it tries to save the file to local disk.

  File "/webexteamsarchiver/webexteamsarchiver.py", line 155, in archive_room
    text_format, html_format, json_format, timestamp_format)
  File "/webexteamsarchiver/webexteamsarchiver.py", line 234, in _archive
    self._download_files("attachments", attachments, download_workers)
  File "/webexteamsarchiver/webexteamsarchiver.py", line 360, in _download_files
    future.result()
 ...
  File "/webexteamsarchiver/webexteamsarchiver.py", line 372, in _download_file
    with open(os.path.join(os.getcwd(), self.archive_folder_name, folder_name, f"{filename}"), "wb") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/user/Title_Id/attachments/file:///C:\\Users\\user\\AppData\\Local\\Temp\\folder\\01\\clip_image001.png'
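One way to guard against names like the one in the traceback is to keep only the last path component and replace characters that are unsafe on common file systems. The helper below is a sketch under that assumption; clean_filename and its fallback value are hypothetical.

```python
import re


def clean_filename(raw_name, fallback="unnamed"):
    """Keep only the last path component and replace characters unsafe on disk."""
    # Split on both separator styles, since the name may come from any OS.
    last_part = re.split(r"[\\/]", raw_name)[-1]
    cleaned = re.sub(r'[<>:"|?*\x00-\x1f]', "_", last_part).strip(" .")
    return cleaned or fallback
```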

Handle scenarios where users have left Webex Teams

For users who have left Webex Teams, the API doesn't return any information, including the e-mail address. This needs to be handled better.

  File "/webexteamsarchiver/webexteamsarchiver.py", line 155, in archive_room
    text_format, html_format, json_format, timestamp_format)
  File "/webexteamsarchiver/webexteamsarchiver.py", line 215, in _archive
    "", "", "", sanitize_name(msg.personEmail), False)
  File "/webexteamsarchiver/jinja_env.py", line 49, in sanitize_name
    return re.sub('[^A-Za-z0-9]+', '_', email)
  File "/home/user/.pyenv/versions/3.6.1/lib/python3.6/re.py", line 191, in sub
    return _compile(pattern, flags).sub(repl, string, count)
TypeError: expected string or bytes-like object
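The TypeError arises because re.sub receives None instead of a string. A tolerant variant of sanitize_name could fall back to a placeholder, as sketched here; the placeholder parameter is an assumption and not the project's actual fix.

```python
import re


def sanitize_name(email, placeholder="unknown"):
    """Replace non-alphanumeric runs with '_', tolerating a missing e-mail."""
    if not email:
        return placeholder
    return re.sub("[^A-Za-z0-9]+", "_", email)
```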

Prevent HTML injection

When a message doesn't have the html field, the archiver inserts the text field in the HTML output. This can cause HTML injection if the text contains valid HTML.
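A minimal sketch of the fix described above is to escape the text field before it is placed in the HTML output. The example assumes messages are plain dictionaries and uses a hypothetical helper name, message_to_html; the archiver itself works with SDK objects and Jinja templates, so the real change may look different.

```python
from html import escape


def message_to_html(message):
    """Return the 'html' field when present, otherwise the escaped 'text' field."""
    if message.get("html"):
        return message["html"]
    # Escaping turns <, >, &, and quotes into entities so they render as text.
    return escape(message.get("text", ""))
```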

Handle messages that don't have personId/personEmail

  File "/webexteamsarchiver/webexteamsarchiver.py", line 156, in archive_room
    text_format, html_format, json_format, timestamp_format)
  File "/webexteamsarchiver/webexteamsarchiver.py", line 202, in _archive
    people[msg.personEmail] = self.sdk.people.get(msg.personId)
  File "/webexteamssdk/api/people.py", line 205, in get
    check_type(personId, basestring, may_be_none=False)
  File "/webexteamssdk/utils.py", line 165, in check_type
    raise TypeError(error_message)
TypeError: We were expecting to receive an instance of one of the following types: 'basestring'; but instead we received None which is a 'NoneType'.

Example message:

    {
      "id": "<redacted>",
      "roomId": "<redacted>",
      "roomType": "direct",
      "text": "Woah",
      "created": "2014-11-18T16:13:11.971Z"
    }

The archiver needs to handle this properly.
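One way to handle such messages is to skip the people lookup when personId is missing. The sketch below assumes messages are dictionaries and that get_person stands in for a call such as self.sdk.people.get; lookup_person and the cache argument are hypothetical.

```python
def lookup_person(message, get_person, cache):
    """Return person details for a message, or None when it has no personId."""
    person_id = message.get("personId")
    if person_id is None:
        return None  # nothing to look up; the caller can show an unknown author
    if person_id not in cache:
        cache[person_id] = get_person(person_id)
    return cache[person_id]
```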

Handle [Errno 36] File name too long

  File "/webexteamsarchiver/webexteamsarchiver.py", line 426, in _download_file
    with open(os.path.join(os.getcwd(), self.archive_folder_name, folder_name, f"{filename}"), "wb") as f:
OSError: [Errno 36] File name too long: '/path/to/attachments/[FILENAME].pdf'
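Errno 36 is raised when a single path component exceeds the file system's limit. A possible workaround is to shorten the name while keeping its extension, as in this sketch. shorten_filename is a hypothetical helper, and the 255-byte default is an assumption based on the common limit on Linux file systems.

```python
import os


def shorten_filename(filename, max_bytes=255):
    """Truncate the stem so the encoded name fits max_bytes, keeping the extension."""
    if len(filename.encode("utf-8")) <= max_bytes:
        return filename
    stem, ext = os.path.splitext(filename)
    budget = max(max_bytes - len(ext.encode("utf-8")), 0)
    # Trim by bytes, then drop any partial multi-byte character at the cut.
    short_stem = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")
    return short_stem + ext
```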

Archiver needs to have its own CSS

The archiver currently links to the Webex Teams CSS because we were not able to license the CSS to be included in this project. This creates a few problems:

  • When the Webex Teams CSS changes, and it changes quite frequently, it may break current archives.
  • It prevents the archive from being a true offline copy.
  • It may require changes to the jinja templates.
  • It requires updating teams.css, although this process is currently automated.

Developing our own CSS, even a very simple one, would solve all of these problems.

Requirements:

  • The CSS should try to mimic, as much as possible, the look and feel of Webex Teams web.
  • It should contain classes that cover special classes used in Webex Teams messages (e.g. spark-mention).
