
mnemocron / telegramchatstats


Generate some statistics and plots from your exported Telegram chat data (using Bokeh plots with python 3)

License: MIT License

Python 99.74% Shell 0.26%
telegram statistics graphing chat-statistics bokehplots

telegramchatstats's Introduction

Telegram Chat Statistics



Generate graphs and statistics from your exported Telegram messages.

Examples

[Example plots: emojis, messages per month, messages per hour, messages per weekday]


Usage

First you need to export your Telegram data to a result.json file. You can do this in the settings of the Telegram desktop client.

./telegram-statistics.py -i result.json -n "name"

The following command opens the file result_2019-05-30.json, parses the chat history with "Name Surname" from 2018-01-01 up to now, and generates the substring plot for the emojis "😘💗💙💓🧡😘💕😚😍🥰":

./telegram-statistics.py -i ../result_2019-05-30.json -n "Name Surname" -d 2018-01-01 -w "😘;💗;💙;💓;🧡;😘;💕;😚;😍;🥰"
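
Before running the script it can help to check what kind of export a file is. The following is only a sketch under an assumption this README does not confirm: that a full Telegram Desktop export nests its chats under `chats` and `list`, while a single chat export has a top-level `messages` list. The helper names `load_export` and `detect_export_kind` are hypothetical and not part of this project.

```python
import json


def load_export(path):
    """Read an exported result.json file (assumed to be UTF-8 encoded JSON)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def detect_export_kind(data):
    """Guess the export type from the top-level keys (assumed layout)."""
    chats = data.get("chats")
    if isinstance(chats, dict) and "list" in chats:
        return "full export"
    if isinstance(data.get("messages"), list):
        return "single chat export"
    return "unknown"


# Hand-made samples that mimic the two assumed layouts.
single_chat = {"name": "Name Surname", "messages": []}
full_export = {"chats": {"list": [single_chat]}}
```

With a real file, `detect_export_kind(load_export("result.json"))` would report which of the two layouts was found.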

Import Whatsapp

The convert-whatsapp.py script converts a WhatsApp export file (Whatsapp Chat with Name.txt) into a Telegram-style JSON file. To find the correct [Name Surname], take the name from the first line of the WhatsApp export txt. The WhatsApp export is less detailed than the Telegram export, so many of the metrics cannot be calculated.

./convert-whatsapp.py -i "Whatsapp Chat with Name.txt"
./telegram-statistics.py -i whatsapp-result.json -n "Name Surname"

In all of these commands, the value passed to -n is the name displayed in Telegram (usually the surname).

Generated Files

The script generates multiple files.

  • emojis.txt contains unicode encoded emojis and their count
  • raw_metrics.json raw numerical data (contains all text of both persons / large file)

HTML Files (Plots):

  • plot_hours.html bokeh plot of message frequency over the hours of one day
  • plot_month.html bokeh plot of number of messages sent per month
  • plot_month_characters.html bokeh plot of characters sent per month
  • plot_weekdays.html bokeh plot of message frequency over one week
  • plot_month_calls.html bokeh plot of number of calls per month
  • plot_month_call_time.html bokeh plot of total seconds on call per month
  • plot_month_photos.html bokeh plot of number of photos sent per month
  • plot_month_replytime.html bokeh plot of average monthly reply time (Beta)
  • plot_month_word_occurrence.html bokeh plot of combined substring occurrences over time

Raw Files (one for each person):

  • raw_months_person_Person A.csv csv values of month data
  • raw_weekdays_person_Person A.csv csv values of weekday data
  • raw_months_chars_person_Person A.csv csv values of monthly character count data
  • raw_monthly_pictures_person_Person A.csv csv values of monthly picture count data
  • raw_monthly_calls_person_Person A.csv csv values of monthly number of calls
  • raw_monthly_call_duration_person_Person A.csv csv values of monthly call duration
  • raw_monthly_time_to_reply_person_Person A.csv csv values of monthly reply time

Metrics

per chat

  • total number of messages
  • total number of words
  • total number of characters
  • count occurrence of each word
  • number of unique words

per person

  • total number of messages
  • total number of words
  • total number of characters
  • average number of words per message
  • average number of characters per message
  • count occurrence of each word
  • count occurrence of each emoji
  • number of messages formatted with markdown
  • number of messages of type [animation, audio_file, sticker, video_message, voice_message]
  • number of photos
  • number of unique words
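
As a rough illustration of how a few of the per-person metrics above could be computed, here is a minimal sketch. It is not the project's actual implementation. It assumes each message is a dict with a `from` field and a `text` field, and that `text` is either a plain string or a list mixing strings and dicts that carry their own `text`. The function names are hypothetical.

```python
from collections import Counter, defaultdict


def flatten_text(text):
    """Join a message text that may be a string or a list of strings and dicts."""
    if isinstance(text, str):
        return text
    parts = []
    for part in text or []:
        parts.append(part if isinstance(part, str) else part.get("text", ""))
    return "".join(parts)


def per_person_metrics(messages):
    """Count messages, words and characters per sender."""
    raw = defaultdict(lambda: {"messages": 0, "characters": 0, "words": Counter()})
    for message in messages:
        sender = message.get("from")
        if not sender:
            continue  # skip entries without a usable sender
        text = flatten_text(message.get("text") or "")
        entry = raw[sender]
        entry["messages"] += 1
        entry["characters"] += len(text)
        entry["words"].update(text.lower().split())

    result = {}
    for sender, entry in raw.items():
        total_words = sum(entry["words"].values())
        result[sender] = {
            "messages": entry["messages"],
            "words": total_words,
            "characters": entry["characters"],
            "avg_words_per_message": total_words / entry["messages"],
            "avg_chars_per_message": entry["characters"] / entry["messages"],
            "unique_words": len(entry["words"]),
        }
    return result


# Hand-made sample data, not taken from a real export.
sample_messages = [
    {"from": "A", "text": "hello world"},
    {"from": "A", "text": [{"type": "bold", "text": "Hello"}, " again"]},
    {"from": "B", "text": "hi"},
    {"type": "service"},
]
sample_metrics = per_person_metrics(sample_messages)
```

Words are split on whitespace and lower-cased here, which is a simplification; punctuation and emojis attached to a word are counted as part of it.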

Requirements

  • python 3
  • bokeh
  • numpy
  • pandas

Contributing

I was inspired to do this project by a post on reddit.com/r/LongDistance

I would love to hear if you have made some statistics yourself. Feel free to message me on reddit

If you want to implement new metrics, feel free to fork and send a pull request. Here are some things that I think could be improved or added:

  • normalize weekly / hourly data to "average number" per day/hour instead of "total number"
  • number of edited messages
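
One possible way to approach the first idea, sketched under assumptions: each message has a `date` field holding an ISO 8601 timestamp, and "average" means total messages on a weekday divided by how often that weekday occurs between the first and the last message. The function name `average_per_weekday` is hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def average_per_weekday(messages):
    """Return average messages per calendar day for each weekday (0 = Monday)."""
    days = [datetime.fromisoformat(m["date"]).date() for m in messages]
    if not days:
        return {}
    totals = Counter(day.weekday() for day in days)
    # Count how often each weekday occurs in the covered date range,
    # including days on which nothing was sent.
    first, last = min(days), max(days)
    occurrences = Counter(
        (first + timedelta(days=offset)).weekday()
        for offset in range((last - first).days + 1)
    )
    return {weekday: totals[weekday] / occurrences[weekday] for weekday in occurrences}


# Hand-made sample: three messages on two Mondays, one on a Tuesday.
sample_dates = [
    {"date": "2019-05-27T10:00:00"},
    {"date": "2019-05-27T18:30:00"},
    {"date": "2019-05-28T09:15:00"},
    {"date": "2019-06-03T12:00:00"},
]
weekday_averages = average_per_weekday(sample_dates)
```

The same idea would carry over to hourly data by dividing each hour's total by the number of days in the range.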

Possible Issues

  • csv separator is currently a semicolon ;
  • other country-specific errors (e.g. with dates)
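
Because of the semicolon separator, tools that expect commas need to be told about it. A small sketch using only the standard library follows. The sample rows are made up, and the real column layout of the raw_*.csv files may differ.

```python
import csv
import io


def read_semicolon_csv(handle):
    """Parse semicolon-separated rows into lists of strings."""
    return list(csv.reader(handle, delimiter=";"))


# Hypothetical content standing in for one of the generated raw files.
sample_file = io.StringIO("2018-01;120\n2018-02;95\n")
rows = read_semicolon_csv(sample_file)
```

With pandas, which is already a requirement of this project, passing `sep=";"` to `pandas.read_csv` should have the same effect.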

License

MIT License

Copyright (c) 2018 Simon Burkhardt

telegramchatstats's People

Contributors

cycatz, katzenbiber, mnemocron


telegramchatstats's Issues

Reverse conversion: from Telegram to Whatsapp

Hey I love your conversion feature from Whatsapp txt to Telegram json, do you think it could be possible to do the opposite thing? It would be very useful since we can now import Whatsapp chats into Telegram and I'd like to reimport some Telegram chats into Telegram

Please document the type of input that this script expects

Specifically, it consumes the file which is the result of a wholesale export of data from Telegram Desktop. I've fed it an export of a single chat, and it choked repeatedly, expecting a different file structure.

Of course it'd be even better if you can have the script automatically detect which export it's dealing with - the single chat export is much faster to generate.

Thanks!

KeyError: 'legend'

Hi,

I get the following error:

Traceback (most recent call last):
  File "C:\Users\yshub\Documents\GitHub\TelegramChatStats\telegram-statistics.py", line 267, in <module>
    main()
  File "C:\Users\yshub\Documents\GitHub\TelegramChatStats\telegram-statistics.py", line 211, in main
    raw = calculate_graphs(chat_data, date_filter, wordlist)
  File "C:\Users\yshub\Documents\GitHub\TelegramChatStats\telegram-statistics.py", line 149, in calculate_graphs
    return _message_graphs(chat_data, date_filter, wordlist)
  File "C:\Users\yshub\Documents\GitHub\TelegramChatStats\_message_graphs.py", line 278, in _message_graphs
    histogram_month(
  File "C:\Users\yshub\Documents\GitHub\TelegramChatStats\_message_graphs.py", line 435, in histogram_month
    fig.vbar(
  File "C:\Users\yshub\miniconda3\lib\site-packages\bokeh\plotting\_decorators.py", line 87, in wrapped
    return create_renderer(glyphclass, self.plot, **kwargs)
  File "C:\Users\yshub\miniconda3\lib\site-packages\bokeh\plotting\_renderer.py", line 133, in create_renderer
    update_legend(plot, legend_kwarg, glyph_renderer)
  File "C:\Users\yshub\miniconda3\lib\site-packages\bokeh\plotting\_legends.py", line 57, in update_legend
    _LEGEND_KWARG_HANDLERS[kwarg](value, legend, glyph_renderer)
KeyError: 'legend'

If I'm not mistaken, it might have to do with the current Bokeh version?
Any help is highly appreciated! And thanks in advance :)
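
A likely cause, as far as I know: Bokeh deprecated the `legend` keyword of glyph methods such as `vbar` in version 1.4 in favour of `legend_label`, `legend_field` and `legend_group`, and newer releases no longer accept it. The sketch below only shows the renaming and does not import Bokeh. The helper `rename_legend_kwarg` and the sample keyword values are hypothetical and not taken from this project's code. It assumes the old value was a literal label, not a column name.

```python
def rename_legend_kwarg(glyph_kwargs):
    """Return a copy of the kwargs with 'legend' renamed to 'legend_label'."""
    updated = dict(glyph_kwargs)
    if "legend" in updated:
        # legend_label expects a plain string label.
        updated["legend_label"] = str(updated.pop("legend"))
    return updated


# Made-up keyword arguments standing in for an old-style vbar call.
old_kwargs = {"x": "months", "top": "count", "width": 0.4, "legend": "Person A"}
new_kwargs = rename_legend_kwarg(old_kwargs)
# With Bokeh installed, the call would then look like:
#   fig.vbar(source=source, **new_kwargs)
```

In practice the simplest fix is probably to replace `legend=` with `legend_label=` directly in the calls inside _message_graphs.py, or to install an older Bokeh release.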

KeyError: 'from'

Hi, I've followed the instruction, but get the following error message, could you please point to possible solution?

importing raw data...
input data is a single chat export
calculating metrics...
Traceback (most recent call last):
  File "telegram-statistics.py", line 267, in <module>
    main()
  File "telegram-statistics.py", line 209, in main
    calculate_metrics(chat_data, date_filter)
  File "telegram-statistics.py", line 137, in calculate_metrics
    metrics = _message_numerics(chat_data, date_filter)
  File "/work/_message_numerics.py", line 24, in _message_numerics
    metrics["A"]["name"] = chat["messages"][1]["from"]
KeyError: 'from'
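
The failing line reads the sender from one fixed position, `chat["messages"][1]["from"]`. A plausible explanation, which is an assumption and not confirmed here, is that the message at that position is a service entry (for example a call or a pinned message) that has no `from` key. One way around it would be to collect names only from messages that have a sender. The helper name `participant_names` is hypothetical.

```python
def participant_names(messages):
    """Return sender names in order of first appearance, skipping entries without one."""
    names = []
    for message in messages:
        sender = message.get("from")
        if sender and sender not in names:
            names.append(sender)
    return names


# Hand-made sample with a service entry and a null sender.
sample_chat = [
    {"type": "service", "actor": "Alice"},
    {"from": "Alice", "text": "hi"},
    {"from": None, "text": "?"},
    {"from": "Bob", "text": "hello"},
    {"from": "Alice", "text": "bye"},
]
found_names = participant_names(sample_chat)
```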

TypeError: argument of type 'NoneType' is not iterable

python3 telegram-statistics.py -i result.json
importing raw data...
input data is a single chat export
calculating metrics...
Traceback (most recent call last):
  File "/Users/exey/Desktop/TelegramChatStats/telegram-statistics.py", line 196, in <module>
    main()
  File "/Users/exey/Desktop/TelegramChatStats/telegram-statistics.py", line 175, in main
    calculate_metrics(chat_data, date_filter)
  File "/Users/exey/Desktop/TelegramChatStats/telegram-statistics.py", line 107, in calculate_metrics
    metrics = _message_numerics(chat_data, date_filter)
  File "/Users/exey/Desktop/TelegramChatStats/_message_numerics.py", line 29, in _message_numerics
    if metrics['A']['name'] in message['from']:
TypeError: argument of type 'NoneType' is not iterable
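
The traceback suggests that `message['from']` is `None` for at least one message, so the `in` test fails. When that happens in an export is a guess (a deleted account is one possibility). A defensive version of the check could look like this. The helper name `sent_by` is hypothetical.

```python
def sent_by(name, message):
    """True if `name` occurs in the message's sender, tolerating missing or null senders."""
    sender = message.get("from")
    return isinstance(sender, str) and name in sender
```

The condition in _message_numerics.py could then be written as `if sent_by(metrics['A']['name'], message):`.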

Whatsapp date formats suck

German: dd.mm.yy vs American mm/dd/yy
The exported text files from Whatsapp are very inconsistent on which date format they use.
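
One possible workaround is to try several formats against all dates of an export and keep the first one that parses every date. This is a sketch and not what convert-whatsapp.py currently does. The list of candidate formats is an assumption, and `detect_date_format` is a hypothetical name.

```python
from datetime import datetime

# Candidate formats; which one applies depends on the phone's locale (assumption).
CANDIDATE_FORMATS = ("%d.%m.%y", "%m/%d/%y", "%d/%m/%y")


def detect_date_format(date_strings):
    """Return the first candidate format that parses every date, or None."""
    for fmt in CANDIDATE_FORMATS:
        try:
            for text in date_strings:
                datetime.strptime(text, fmt)
        except ValueError:
            continue  # at least one date did not fit, try the next format
        return fmt
    return None
```

Checking all dates at once matters because a single date such as 01/02/19 is ambiguous, while a later date such as 30/05/19 rules out the month-first reading. If every day in the export is 12 or lower, the ambiguity remains and the first matching candidate wins.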
