
Comments (13)

nmoya commented on August 17, 2024

Hello @nalbanders !

Thanks for your interest. I don't have a log file at hand right now. Could you please refresh my memory and check whether the log file contains a timestamp for when each message was sent or received?

If so, a first step would be to parse this string into a timestamp object in Python. Python provides several libraries for working with dates and times.

You can get a glimpse of what can be done in another file of mine: https://github.com/nmoya/glaucobot/blob/master/glaucobot/datelib.py

My best suggestion is that you should not perform manual calculations on timestamps. Always use a well-tested library to work with dates and times.
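As a rough illustration of that advice, here is a minimal Python 3 sketch that uses only the standard `datetime` module. The timestamp format (`8/2/14, 12:59:24 PM`) and the helper name `parse_whatsapp_timestamp` are assumptions made for the example; the actual export format depends on the phone's locale settings.

```python
from datetime import datetime

# Assumed export format, e.g. "8/2/14, 12:59:24 PM"; adjust it to your log.
TIMESTAMP_FORMAT = "%m/%d/%y, %I:%M:%S %p"


def parse_whatsapp_timestamp(text):
    """Hypothetical helper: turn a timestamp string into a datetime object."""
    return datetime.strptime(text, TIMESTAMP_FORMAT)


sent = parse_whatsapp_timestamp("8/2/14, 11:58:10 PM")
reply = parse_whatsapp_timestamp("8/3/14, 12:01:40 AM")

# Subtracting two datetimes yields a timedelta, so no manual arithmetic is
# needed even though the reply falls on the next calendar day.
elapsed = reply - sent
print(elapsed.total_seconds())  # 210.0
```

Letting the library handle the midnight rollover is the main point: hand-written hour and minute arithmetic tends to break in exactly this kind of case.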

I am interested in working together to add this feature if you like.

Cheers,

PS. Also, if you are getting started with computing, check this video: https://www.youtube.com/watch?v=-5wpm-gesOY

from whatsapp-parser.

nalbanders commented on August 17, 2024

Hi Nikolas,

Thanks for the reply. I will try to work it in as you suggest.

It's actually a fun project I am working on, and your script is helping a
lot. I am a student at MIT. I'd be happy to have a call to discuss the
project and see whether you'd be interested in working together. I am doing
two different studies interpreting relationships based on conversation data.
I am always looking to connect with people who are interested and
skilled like yourself.

Here is my LinkedIn profile
https://www.linkedin.com/pub/armen-nalband/13/65/656


nmoya commented on August 17, 2024

Hello @nalbanders ,

Sure! Let's schedule a call and discuss more about the project. Are you available on Wednesday? My Skype/Hangout is nikolasmoya.

I also sent you a connection invitation on LinkedIn.


nalbanders commented on August 17, 2024

Great, how about Wednesday at 5:30 EST? I am in Boston. What city are you in?

nmoya commented on August 17, 2024

I am in Curitiba (BRT). Let's try a little later, after work. How about some time between 17:00 and 21:00 EST?


nalbanders commented on August 17, 2024

Sorry, I meant 5:30PM (17:30). I believe you are one hour ahead so that
would be 18:30 your time. Does that work?

If not we can do 6:30PM EST (18:30)


nmoya commented on August 17, 2024

Oh, alright then. 5:30 PM EST is great for me.


nalbanders commented on August 17, 2024

Ok, I will call you then tomorrow on Skype. I just sent you a contact
request.

Looking forward to it,
Armen


nalbanders commented on August 17, 2024

Wednesday*


nalbanders commented on August 17, 2024

Hey, some context for our call tomorrow

Attached the main.py file that I modified

My goal is to understand the user's relationships based on
communication data (to be able to predict who they care about most/least).
Here are some graphs I generated with the script. Attached is an output CSV
I am building that I will use for regression analysis (logistic, CART, Random
Forest) in R.
[image: Inline image 3]
[image: Inline image 1]
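A rough sketch of how one row of such an output CSV could be assembled, written for Python 3 with the standard `csv` module. The column names and the `ROOT`/`CONTACT` sender labels are taken from the script attached in this thread; the helper name `proportions_to_row` and the sample counts are made up for illustration.

```python
import csv
import io

# Column names as listed in the attached script.
FIELDNAMES = ["msgs_root", "msgs_contact", "chars_root", "chars_contact",
              "qmarks_root", "qmarks_contact"]


def proportions_to_row(proportions, root="ROOT", contact="CONTACT"):
    """Hypothetical helper: flatten per-sender counters into one CSV row."""
    row = {}
    for prefix, key in (("msgs", "messages"), ("chars", "chars"),
                        ("qmarks", "qmarks")):
        row[prefix + "_root"] = proportions[key].get(root, 0)
        row[prefix + "_contact"] = proportions[key].get(contact, 0)
    return row


# Made-up counters in the shape returned by message_proportions().
example = {
    "messages": {"ROOT": 120, "CONTACT": 95},
    "chars": {"ROOT": 4300, "CONTACT": 3900},
    "qmarks": {"ROOT": 14, "CONTACT": 22},
}

buffer = io.StringIO()  # stands in for open("names.csv", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
writer.writeheader()
writer.writerow(proportions_to_row(example))
csv_text = buffer.getvalue()
print(csv_text)
```

One row per chat keeps the file easy to load as a data frame in R, with each column usable directly as a predictor.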

On a separate note, I have other development projects going on, always open
for skilled people like yourself to get involved if you find yourself
interested.

Here is a wireframe of an app I am creating. We can chat about it
separately.
https://www.justinmind.com/usernote/tests/14265484/14740364/14740366/index.html


nalbanders commented on August 17, 2024

Attachment


from __future__ import division
from datetime import datetime
import codecs
import date
import re
import operator
import sys
import json
import csv
#import numpy
from pprint import pprint

class Chat():
    def __init__(self, filename):
        self.filename = filename
        self.raw_messages = []

        self.datelist = []
        self.timelist = []
        self.senderlist = []
        self.messagelist = []
        self.chatTimeList = []
        self.rootResponseTimeList = []
        self.contactResponseTimeList = []
        self.rootBurstList = []
        self.contactBurstList = []
        #self.responseTimeList.append(0)

    def open_file(self):
        arq = codecs.open(self.filename, "r", "utf-8-sig")
        content = arq.read()
        arq.close()
        lines = content.split("\n")
        lines = [l for l in lines if len(l) != 1]
        for l in lines:
            self.raw_messages.append(l.encode("utf-8"))

    def feed_lists(self):
        for l in self.raw_messages:
            msg_date, sep, msg = l.partition(": ")
            raw_date, sep, time = msg_date.partition(" ")
            sender, sep, message = msg.partition(": ")
            raw_date = raw_date.replace(",", "")
            if message:
                self.datelist.append(raw_date)
                self.timelist.append(time)  # raw time string, e.g. "12:59:24 PM"
                # The timestamp is everything before the third colon,
                # e.g. "8/2/14, 12:59:24 PM"
                colonIndex = [x.start() for x in re.finditer(':', l)]
                chatTimeString = l[0:colonIndex[2]]
                chatTime = datetime.strptime(chatTimeString, "%m/%d/%y, %I:%M:%S %p")
                self.chatTimeList.append(chatTime)
                self.senderlist.append(sender)
                self.messagelist.append(message)
            else:
                self.messagelist.append(l)
        if not self.chatTimeList:
            return
        t0 = self.chatTimeList[0]
        senderIndex = 1  # index of the message whose timestamp is t1
        burstCount = 1   # number of consecutive messages by the same sender

        rootName = "ROOT"
        contactName = "CONTACT"

        # Operations that depend on consecutive messages (response time, bursts)
        for t1 in self.chatTimeList[1:]:
            dt = t1 - t0
            sender = self.senderlist[senderIndex]
            if sender != self.senderlist[senderIndex-1]:
                # Sender changed: dt is the new sender's response time, and
                # burstCount is the burst the previous sender just finished
                responseTime = int(dt.total_seconds())  # dt.seconds would drop whole days
                print "sender changed: %s" % sender
                print "response time: %d\n" % responseTime
                if sender == rootName:
                    self.contactBurstList.append(burstCount)
                    self.rootResponseTimeList.append(responseTime)
                elif sender == contactName:
                    self.rootBurstList.append(burstCount)
                    self.contactResponseTimeList.append(responseTime)
                else:
                    sys.exit("ERROR CHANGE NAMES IN CHAT TO ROOT AND CONTACT\n")
                burstCount = 1
            else:
                burstCount += 1  # accumulate the number of messages sent in a row
                print "repeat sender: %d %s\n" % (burstCount, sender)

            t0 = t1
            senderIndex += 1


    def print_history(self, end=0):
        if end == 0:
            end = len(self.messagelist)
        for i in range(len(self.messagelist[:end])):
            print self.datelist[i], self.timelist[i],\
                self.senderlist[i], self.messagelist[i]

    def get_senders(self):
        senders_set = set(self.senderlist)
        return [e for e in senders_set]

    def count_messages_per_weekday(self):
        counter = dict()
        for i in range(len(self.datelist)):
            month, day, year = self.datelist[i].split("/") #AN edited date order
            parsed_date = "%s-%s-%s" % (year, month, day)
            #print ("DATE: ")
            #print (parsed_date)
            #print ("\n\n")
            weekday = date.date_to_weekday(parsed_date)
            if weekday not in counter:
                counter[weekday] = 1
            else:
                counter[weekday] += 1
        return counter

    def count_messages_per_shift(self):
        shifts = {
            "latenight": 0,
            "morning": 0,
            "afternoon": 0,
            "evening": 0
        }
        # Use the parsed datetimes: the raw time strings are 12-hour
        # ("1:05:00 PM"), so int(time.split(":")[0]) would count 1 PM as hour 1
        for chatTime in self.chatTimeList:
            hour = chatTime.hour
            if hour <= 6:
                shifts["latenight"] += 1
            elif hour <= 11:
                shifts["morning"] += 1
            elif hour <= 17:
                shifts["afternoon"] += 1
            else:
                shifts["evening"] += 1
        return shifts

    def count_messages_pattern(self, patternlist):
        counters = dict()
        pattern_dict = dict()
        senders = self.get_senders()
        for pattern in patternlist:
            counters[pattern] = dict()
            for s in senders:
                counters[pattern][s] = 0
            # re = regular expression, re.I = ignore case, compile = convert to object
            pattern_dict[pattern] = re.compile(re.escape(pattern), re.I)
        for i in range(len(self.messagelist)):
            for pattern in patternlist:
                search_result = pattern_dict[pattern].\
                    findall(self.messagelist[i])
                length = len(search_result)
                if length > 0:
                    if pattern not in counters:
                        counters[pattern][self.senderlist[i]] = length
                    else:
                        counters[pattern][self.senderlist[i]] += length
        return counters

    def print_patterns_dict(self, pattern_dict):
        for pattern in pattern_dict:
            print pattern
            for s in pattern_dict[pattern]:
                print s, ": ", pattern_dict[pattern][s]
            print ""

    def message_proportions(self):
        senders = self.get_senders()
        counter = dict()
        total = 0
        for i in ["messages", "words", "chars", "qmarks", "media"]:
            counter[i] = dict()
            for s in senders:
                counter[i][s] = 0
        for i in range(len(self.senderlist)):
            counter["messages"][self.senderlist[i]] += 1
            counter["words"][self.senderlist[i]] += \
                len(self.messagelist[i].split(" "))
            counter["chars"][self.senderlist[i]] += len(self.messagelist[i])
            counter["qmarks"][self.senderlist[i]] += self.messagelist[i].count('?')
            counter["media"][self.senderlist[i]] += (
                self.messagelist[i].count('<media omitted>') +
                self.messagelist[i].count('<image omitted>') +
                self.messagelist[i].count('<audio omitted>'))
            total += 1
        counter["total_messages"] = 0
        counter["total_words"] = 0
        counter["total_chars"] = 0
        counter["total_qmarks"] = 0
        counter["total_media"] = 0

        for s in senders:
            counter["total_messages"] += counter["messages"][s]
            counter["total_words"] += counter["words"][s]
            counter["total_chars"] += counter["chars"][s]
            counter["total_qmarks"] += counter["qmarks"][s]
            counter["total_media"] += counter["media"][s]
        return counter

    def average_message_length(self):
        msg_prop = self.message_proportions()
        counter = dict()
        for s in self.get_senders():
            counter[s] = msg_prop["words"][s] / msg_prop["messages"][s]
        return counter

    def most_used_words(self, top=10, threshold=3):
        words = dict()
        for i in range(len(self.messagelist)):
            message_word = self.messagelist[i].split(" ")
            for w in message_word:
                if len(w) > threshold:
                    w = w.decode("utf8")
                    w = w.replace("\r", "")
                    w = w.lower()
                    if w not in words:
                        words[w] = 1
                    else:
                        words[w] += 1
        sorted_words = sorted(words.iteritems(), key=operator.itemgetter(1),
                              reverse=True)
        counter = 0
        output = sorted_words[:top]
        return output

def printDict(dic, parent, depth):
    tup = sorted(dic.iteritems(), key=operator.itemgetter(1))
    isLeaf = True
    for key in tup:
        if isinstance(dic[key[0]], dict):
            isLeaf = False
    if isLeaf and depth != 0:
        print " "*(depth-1)*2, parent
    for key in tup:
        if isinstance(dic[key[0]], dict):
            printDict(dic[key[0]], key[0], depth+1)
        else:
            print " "*depth*2, str(key[0]), "->", dic[key[0]]

def main():
    if len(sys.argv) < 2:
        print "Run: python main.py [regex. patterns]"
        sys.exit(1)
    c = Chat(sys.argv[1])
    c.open_file()
    c.feed_lists()
    output = dict()

    print "\n--PROPORTIONS"
    output["proportions"] = c.message_proportions()
    printDict(output["proportions"], "proportions", 0)

    print "\n--SHIFTS"
    output["shifts"] = c.count_messages_per_shift()
    printDict(output["shifts"], "shifts", 0)

    print "\n--WEEKDAY"
    output["weekdays"] = c.count_messages_per_weekday()
    printDict(output["weekdays"], "weekday", 0)

    print "\n--AVERAGE MESSAGE LENGTH"
    output["lengths"] = c.average_message_length()
    printDict(output["lengths"], "lengths", 0)

    print "\n--PATTERNS"
    output["patterns"] = c.count_messages_pattern(sys.argv[2:])
    printDict(output["patterns"], "patterns", 0)

    # most_used_words keeps words with len(w) > threshold
    print "\n--TOP 15 MOST USED WORDS (length > 3)"
    output["most_used_words"] = c.most_used_words(top=15, threshold=3)
    output["most_used_words"] = sorted(output["most_used_words"], key=operator.itemgetter(1), reverse=True)
    #print output["most_used_words"]
    #for muw in output["most_used_words"]:
    #    print muw[0]

    print "TIMESTAMPS\n %s\n\n" % c.chatTimeList[0:4]
    print "Root Response time sample \n %s...\n" % c.rootResponseTimeList[0:4]
    print "Contact Response time sample \n %s...\n" % c.contactResponseTimeList[0:4]
    print "Root bursts \n %s\n" % c.rootBurstList
    print "Contact bursts \n %s\n" % c.contactBurstList

    # responseTimeList is never filled (and numpy is not imported), so take
    # the median over both response-time lists directly
    responseTimes = sorted(c.rootResponseTimeList + c.contactResponseTimeList)
    if responseTimes:
        mid = len(responseTimes) // 2
        median = (responseTimes[mid] + responseTimes[-mid - 1]) / 2.0
        print "Median response time =%s\n\n" % median

    output["senders"] = c.get_senders()
    #filename = sys.argv[1].split("/")[-1]
    #arq = open("./logs/"+filename+".json", "w")
    #arq = open("filename.json", "w")
    nameTest = sys.argv[1]
    arq = open("C:/Python27/"+nameTest+".json", "w")
    arq.write(json.dumps(output))
    pprint(output)
    arq.close()

    # Python 2's csv module expects binary mode; "w" adds blank rows on Windows
    with open('names.csv', 'wb') as csvfile:
        fieldnames = ['msgs_root', 'msgs_contact', 'chars_root', 'chars_contact', 'qmarks_root', 'qmarks_contact']
        writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
        writer.writeheader()
        #writer.writerow({'msgs_root': c.message_proportions , 'last_name': 'Beans'})

main()
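
The response-time and burst bookkeeping in `feed_lists` can also be expressed as a small standalone function. The following is only a sketch in Python 3 (the script above targets Python 2.7), and the function name `response_times_and_bursts` and the sample messages are hypothetical. It assumes each message is already available as a `(datetime, sender)` pair in chronological order.

```python
from datetime import datetime


def response_times_and_bursts(messages):
    """Hypothetical helper; `messages` is a chronological list of
    (datetime, sender) pairs.

    Returns two dicts keyed by sender: the seconds each sender took to reply
    after the other party's last message, and the lengths of each sender's
    runs of consecutive messages ("bursts").
    """
    response_times = {}
    bursts = {}
    if not messages:
        return response_times, bursts

    prev_time, prev_sender = messages[0]
    burst = 1
    for when, sender in messages[1:]:
        if sender != prev_sender:
            # The finished burst belongs to the previous sender; the gap is
            # how long the new sender took to reply.
            bursts.setdefault(prev_sender, []).append(burst)
            gap = (when - prev_time).total_seconds()
            response_times.setdefault(sender, []).append(gap)
            burst = 1
        else:
            burst += 1
        prev_time, prev_sender = when, sender
    # Record the burst still in progress when the log ends.
    bursts.setdefault(prev_sender, []).append(burst)
    return response_times, bursts


fmt = "%m/%d/%y, %I:%M:%S %p"
chat = [
    (datetime.strptime("8/2/14, 12:59:24 PM", fmt), "ROOT"),
    (datetime.strptime("8/2/14, 12:59:50 PM", fmt), "ROOT"),
    (datetime.strptime("8/2/14, 1:01:50 PM", fmt), "CONTACT"),
    (datetime.strptime("8/3/14, 1:01:50 PM", fmt), "ROOT"),
]
times, bursts = response_times_and_bursts(chat)
print(times)   # {'CONTACT': [120.0], 'ROOT': [86400.0]}
print(bursts)  # {'ROOT': [2, 1], 'CONTACT': [1]}
```

The last sample message arrives exactly one day later. `timedelta.total_seconds()` reports that gap as 86400 seconds, whereas the `.seconds` attribute alone would report 0 because it excludes whole days.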


nmoya commented on August 17, 2024

Hello @nalbanders !

I will add you as a contributor to the repository so that you have write access. Could you please commit your adapted main file with a different name?

Also, something came up tomorrow at 6:30 PM EST. Do you mind moving our call to 4:30 PM EST or 5 PM EST? If that is not possible, it's alright, but I will need to leave at 6:15 PM EST, and we can schedule another call if 45 minutes are not enough. I will be on Skype tomorrow afternoon, so if you arrive earlier we can start earlier; otherwise we keep the original schedule :-)

Also, your graphs did not show up. I was looking forward to seeing them! :(
Great job on the modifications to the main file!


nalbanders commented on August 17, 2024

Ok, will try to call at 5 instead.

Will push to git tomorrow.

Thanks,
A
