freq's People

Contributors

dmium, markbaggett, pcoccoli, philhagen

freq's Issues

Contributing

I'm not sure what your process for contributing to this project is. There are a few things I would like to tackle to help out. Is there a specific process in place, or is it simply fork, make the changes, and open a PR?

pip package

Did you want to create this as a pip installable package?

package reorg

Hey Mark,

Over the next couple of weeks I wanted to work on this project. One of the things I'd like to do is reorg the structure of the code here to something like this:

/root (freq)
- freq (source)
- bin (.exe, systemd, upstart, etc)
- test
- setup.py
- README.md

This would organize the code a little better, allow for package installation and upload to PyPI (so it can be installed with pip), and make it easier to create installers for local systems. What are your thoughts on this?

Combining freq score with Alexa top 1M

Hi,

Is it possible to add a new feature that checks a domain's rank in the Alexa top 1M list? That way, we could compare the freq score with the Alexa rank and reduce false positives. Ideally, we could query for just the freq score, just the Alexa rank, or both.
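
A rough sketch of what such a combined lookup could look like, assuming the ranking list is a headerless CSV of `rank,domain` rows. The names `load_ranks`, `lookup`, and `score_fn` are hypothetical and not part of freq; `score_fn` stands in for whatever function produces the freq score.

```python
import csv
import io


def load_ranks(csv_file):
    """Read 'rank,domain' rows into a {domain: rank} dict (assumed layout)."""
    ranks = {}
    for row in csv.reader(csv_file):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = row
        ranks[domain.strip().lower()] = int(rank)
    return ranks


def lookup(domain, ranks, score_fn=None):
    """Return (freq_score, rank); either value is None when unavailable."""
    score = score_fn(domain) if score_fn else None
    return score, ranks.get(domain.lower())


# Tiny in-memory sample standing in for the real ranking file.
sample = io.StringIO("1,google.com\n2,youtube.com\n")
ranks = load_ranks(sample)
print(lookup("google.com", ranks))      # rank only, no scorer supplied
print(lookup("xkqzjv.example", ranks))  # unranked domain
```

Passing `score_fn=None` returns only the rank, while supplying a scorer returns both values, which covers the "freq score, Alexa rank, or both" request in one call.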

duplicate words can be tallied

If a new word is added to the table using tally_str and that word has already been processed, it is tallied again, which essentially creates duplicate data. The tool should keep a set of already processed words and check that no duplicates are created.

A simple solution could be to keep a set() internal to FreqCounter and check that the word hasn't already been processed.
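
A minimal sketch of that idea. `DedupPairCounter` is a hypothetical stand-in written for illustration, not the project's actual FreqCounter; only the method name `tally_str` is borrowed from it.

```python
from collections import defaultdict


class DedupPairCounter:
    """Hypothetical stand-in for FreqCounter that skips repeated words."""

    def __init__(self):
        # counts[first][second] = times `second` directly followed `first`
        self.counts = defaultdict(lambda: defaultdict(int))
        # Words that have already been tallied.
        self._seen = set()

    def tally_str(self, word):
        """Tally adjacent character pairs; return False if the word was skipped."""
        word = word.strip().lower()
        if not word or word in self._seen:
            return False
        self._seen.add(word)
        for first, second in zip(word, word[1:]):
            self.counts[first][second] += 1
        return True


counter = DedupPairCounter()
counter.tally_str("google")
counter.tally_str("google")      # duplicate, ignored
print(counter.counts["o"]["o"])  # 1, not 2
```

The trade-off is memory: the set grows with the number of distinct words seen, and it would have to be saved alongside the table for the check to survive between runs.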

No license

This repo has no license applied. I believe it is intended to be publicly available, in which case it is probably important to clarify that intent (and explicitly limit author liability) with a common OSS license.

proxy

Hi,
I was wondering if you could help me use an HTTP proxy inside your freq_server.py code.
Thanks

upstart config file

One of the CentOS 6 servers uses upstart. Here's the conf file I put in /etc/init/ to run this as a service:


description     "Freq Server"
start on filesystem or runlevel [2345]
stop on runlevel [!2345]

respawn

script
      export HOME="/usr/bin"
      echo $$ > /var/run/freqserver.pid
      exec python /opt/freqserver/freq_server.py -ip <IP> <port> /opt/freqserver/english_lowercase.freq
end script

pre-start script
      echo "[`date`] Freq Server starting" >> /var/log/messages
end script

pre-stop script
      echo "[`date`] Freq Server stopping" >> /var/log/messages
end script

Test Cases

Is there a set of test cases that can be run to ensure the lib is working properly?

Bulk Mode Missing

One of my favorite features of the older version was passing it bulk data. The following patch adds a bulk option:

diff --git a/freq.py b/freq.py
index 6466ee0..f01a460 100755
--- a/freq.py
+++ b/freq.py
@@ -160,6 +160,7 @@ if __name__ == "__main__":
     parser.add_argument('-m','--measure',required=False,help='Measure likelihood of a given string',dest='measure')
     parser.add_argument('-n','--normal',required=False,help='Update the table based on the following normal string',dest='normal')
     parser.add_argument('-f','--normalfile',required=False,help='Update the table based on the contents of the normal file',dest='normalfile')
+    parser.add_argument('-b','--bulk',required=False,help='File containing a list of strings to test',dest='bulkfile')
     parser.add_argument('-p','--print',action='store_true',required=False,help='Print a table of the most likely letters in order',dest='printtable')
     parser.add_argument('-c','--create',action='store_true',required=False,help='Create a new empty frequency table',dest='create')
     parser.add_argument('-v','--verbose',action='store_true',required=False,help='show calculation process',dest='verbose')
@@ -199,4 +200,12 @@ if __name__ == "__main__":
             for eachline in filehandle:
                 fc.tally_str(eachline.decode("latin1"))
     if args.measure: print(fc.probability(args.measure))
+    if args.bulkfile:
+        try:
+            with open(args.bulkfile, 'r') as bulkFile:
+                for line in bulkFile:
+                    line = line.strip()
+                    print("%-30s %s" %(line, fc.probability(line)))
+        except FileNotFoundError:
+            sys.stderr.write("Failed to find bulk file: %s\n" %(args.bulkfile))
     fc.save(args.freqtable)

I wrote a Haskell implementation

I work at a company called Layer 3 Communications in Atlanta, we use Haskell. I wrote a Haskell implementation of freq, which you can see here: https://github.com/chessai/freq

We have a datatype called FreqTrain, which is essentially a nested Map based on the original code located here. The problem with it is that lookups are slow: each lookup on a Map is O(log n), the Map is nested, so scoring one character pair takes two such lookups, and we score one pair per adjacent character pair, i.e. (k - 1) of them for a string of length k. However, we get O(1) lookups by turning a 'FreqTrain' into a 'Freq' and inlining the results of what each lookup would be, which gives even better constant factors. Benchmarks show that 'Freq's are about 100 times faster to read from than 'FreqTrain's. Also, training from files takes about 2 seconds.
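
The "train in a nested map, then freeze into a flat precomputed table" idea can be sketched in Python, the language of the original freq.py. This is not the Haskell library's code: `freeze` and `measure` are hypothetical names, the input is assumed to be ASCII only, and the averaging of pair probabilities is an assumption made for illustration. Python dict lookups are already hash-based, so the sketch shows the data layout rather than the speedup.

```python
def freeze(nested_counts, size=128):
    """Precompute P(second | first) for every ASCII pair into one flat list."""
    table = [0.0] * (size * size)
    for first, followers in nested_counts.items():
        total = sum(followers.values())
        if total == 0:
            continue
        for second, count in followers.items():
            # One flat index per pair replaces the two nested lookups.
            table[ord(first) * size + ord(second)] = count / total
    return table


def measure(word, table, size=128):
    """Average the precomputed probabilities of adjacent character pairs."""
    pairs = list(zip(word, word[1:]))
    if not pairs:
        return 0.0
    return sum(table[ord(a) * size + ord(b)] for a, b in pairs) / len(pairs)


# Nested training counts: counts[first][second]
training = {"a": {"b": 3, "c": 1}, "b": {"a": 2}}
frozen = freeze(training)
print(measure("aba", frozen))  # (0.75 + 1.0) / 2 = 0.875
```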

Error on raise exception

There seems to be a typo on the following line, but this line would only cause an error if the specified frequency table could not be found.

Technically it is valid syntax, but the object does not exist.

raiseeerver.fcs[eachtable]

get 2 scores returned

Hi Mark,

I just pulled so-freqserver (which runs the same command line as yours):

docker pull securityonionsolutions/so-freqserver
docker run -p 10004:10004 securityonionsolutions/so-freqserver

How come it returns 2 scores when I query it? When I installed freq_server from scratch I never had that.

root@ubuntu:/opt# curl http://localhost:10004/measure/crimsoncore.be
(10.0239, 8.6545)

Cheers,
Luk

Loading a table is slow

I created a freqtable from the top 1 million domain names but loading it takes 5 minutes:

$ time freq.py -v -m 'google' test.freq
Ignoring Case: True
Ignoring Characters: 
	~`!@#$%^&*()_+-
All pairs: ['go', 'og', 'le', 'oo', 'gl']
Probability of go: [0.19683947542892255]
Probability of og: [0.19683947542892255, 0.07902049839017421]
Probability of le: [0.19683947542892255, 0.07902049839017421, 0.23942299582507484]
Probability of oo: [0.19683947542892255, 0.07902049839017421, 0.23942299582507484, 0.10635868072983433]
Probability of gl: [0.19683947542892255, 0.07902049839017421, 0.23942299582507484, 0.10635868072983433, 0.1602173315437978]
Average Probability: 15.637179638356075% 


Letter1:248460000 Letter2:48691000  - This pair g:248460000 o:48691000
Letter1:822381000 Letter2:93875000  - This pair o:573921000 g:45184000
Letter1:1203860000 Letter2:184714000  - This pair l:381479000 e:90839000
Letter1:1777781000 Letter2:245530000  - This pair o:573921000 o:60816000
Letter1:2026241000 Letter2:285162000  - This pair g:248460000 l:39632000
Total Word Probability: 285162000/2026241000 = 14.073449308349797
(15.6372, 14.0734)

real	5m6.152s
user	5m2.743s
sys	0m3.197s
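
The verbose output above also shows where the two numbers in the returned tuple come from. The sketch below only redoes the final arithmetic using values copied from that output. Reading the "This pair" columns as (first-letter total, pair count) is an assumption, and how freq.py derives each per-pair probability is not reproduced here.

```python
# Per-pair probabilities copied from the verbose output above.
pair_probabilities = [
    0.19683947542892255,
    0.07902049839017421,
    0.23942299582507484,
    0.10635868072983433,
    0.1602173315437978,
]

# (first-letter total, pair count) copied from the "This pair" columns.
pair_counts = [
    (248460000, 48691000),  # g, o
    (573921000, 45184000),  # o, g
    (381479000, 90839000),  # l, e
    (573921000, 60816000),  # o, o
    (248460000, 39632000),  # g, l
]

# First score: mean of the per-pair probabilities, as a percentage.
average_probability = 100 * sum(pair_probabilities) / len(pair_probabilities)

# Second score: summed pair counts over summed first-letter totals.
letter1_total = sum(first for first, _ in pair_counts)
letter2_total = sum(second for _, second in pair_counts)
word_probability = 100 * letter2_total / letter1_total

print((round(average_probability, 4), round(word_probability, 4)))
# prints (15.6372, 14.0734)
```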
