Thank you for publishing your code. I have some questions to DeepExp

feature vector of states about machine_learning_security HOT 5 CLOSED

MasanoriYamada commented on July 1, 2024 1

feature vector of states

from machine_learning_security.

Comments (5)

13o-bbr-bbq commented on July 1, 2024 1

@MasanoriYamada
Thanks for your advice.
I now want to improve the accuracy of exploitation.

1. The current version of DeepExploit uses only normalization.
However, the accuracy is not as good as I expected, so I'll try one-hot encoding instead of normalization.
2. I want to improve how the state after performing an action is defined.
If DeepExploit succeeds in exploitation, the status of Metasploit's console changes (for example: msf -> meterpreter), so I want to use the console's status to define the state after an action.
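
The console-status idea could be sketched roughly as follows. This is only an illustration, assuming the console prompt begins with `msf` or `meterpreter`; the names `ConsoleState` and `classify_prompt` are hypothetical and not taken from DeepExploit.

```python
from enum import Enum


class ConsoleState(Enum):
    MSF = "msf"                  # framework prompt: no session was opened
    METERPRETER = "meterpreter"  # session prompt appeared after the action
    UNKNOWN = "unknown"          # anything else (empty or unexpected output)


def classify_prompt(console_text: str) -> ConsoleState:
    """Classify the last non-empty line of console output by its prompt prefix."""
    lines = [line.strip() for line in console_text.splitlines() if line.strip()]
    if not lines:
        return ConsoleState.UNKNOWN
    last = lines[-1].lower()
    if last.startswith("meterpreter"):
        return ConsoleState.METERPRETER
    if last.startswith("msf"):
        return ConsoleState.MSF
    return ConsoleState.UNKNOWN
```

The returned label could then be used as one component of the state observed after an action, instead of relying on a normalized value alone.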

I'd be pleased if you could give me any other advice.

MasanoriYamada commented on July 1, 2024 1

@13o-bbr-bbq
Thanks for the quick reply.

> 1. The current version of DeepExploit uses only normalization.
> However, the accuracy is not as good as I expected, so I'll try one-hot encoding instead of normalization.

OK.
Since the space of ports and versions is high-dimensional, you may have to use domain knowledge,
e.g. handling some ports specially.
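
As a rough sketch of what "handling some ports specially" might look like: the port list below is a hypothetical choice for illustration, not something DeepExploit defines. Selected ports get their own one-hot slot and every other port shares a single "other" slot, which keeps the vector small.

```python
from typing import List

# Hypothetical set of ports treated specially; all other ports share one slot.
SPECIAL_PORTS = [21, 22, 25, 80, 443, 445]


def encode_port(port: int) -> List[float]:
    """One-hot encode a port: one slot per special port plus a final 'other' slot."""
    vec = [0.0] * (len(SPECIAL_PORTS) + 1)
    if port in SPECIAL_PORTS:
        vec[SPECIAL_PORTS.index(port)] = 1.0
    else:
        vec[-1] = 1.0
    return vec


def normalize_port(port: int) -> float:
    """The normalization alternative: a single scalar in [0, 1]."""
    return port / 65535.0
```

With normalization, ports 80 and 81 look almost identical to the model even though they usually host unrelated services; the one-hot form removes that artificial ordering at the cost of more dimensions.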

> 2. I want to improve how the state after performing an action is defined.
> If DeepExploit succeeds in exploitation, the status of Metasploit's console changes (for example: msf -> meterpreter), so I want to use the console's status to define the state after an action.

OK! I understand.
I am not an expert on penetration testing, so please teach me.
How many post-exploitation steps do you perform, in your experience?

(Because the action space is large, I am concerned about the depth of exploration.)

MasanoriYamada commented on July 1, 2024 1

> I have a question. Is domain knowledge a technique used in transfer learning?

No, by domain knowledge I mean knowledge of networks and security.

> start --[exploitation (using RL)]--> first server --[exploitation (using RL)]--> second server --[exploitation (using RL)]--> third server --> ...

OK! I understand.
If you are penetrating under the assumption that an attack should not be detected,
post-exploitation is nice, but it may also be good to give a negative reward if an attack is detected,
because in this situation a delayed reward is more efficient for reinforcement learning.
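
A minimal sketch of such a reward, with assumed values: the `+1.0` and `-1.0` magnitudes, the discount factor, and the function names are all hypothetical and would need tuning.

```python
from typing import Sequence


def step_reward(exploit_succeeded: bool, detected: bool,
                success_reward: float = 1.0,
                detection_penalty: float = -1.0) -> float:
    """Reward for one action: a bonus for success, a penalty if detected."""
    reward = 0.0
    if exploit_succeeded:
        reward += success_reward
    if detected:
        reward += detection_penalty  # hypothetical penalty magnitude
    return reward


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of rewards discounted by gamma per step, so late rewards still count."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total
```

With this shape, an episode that succeeds but is detected along the way earns less than one that succeeds quietly, which is the behavior the negative reward is meant to encourage.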

13o-bbr-bbq commented on July 1, 2024 1

> No, by domain knowledge I mean knowledge of networks and security.

Oh, I misunderstood.
I'll improve the state representation of ports and versions using my security knowledge.

> but it may be good to give a negative reward if an attack is detected.

You're completely right. In a penetration test, it is bad for an attack to be detected.
So, I'll consider a way to give a negative reward when an attack is detected (by antivirus software, etc.).

Thanks for your advice!

13o-bbr-bbq commented on July 1, 2024

@MasanoriYamada
Thanks for the reply!

> Since the space of ports and versions is high-dimensional, you may have to use domain knowledge.

Thank you very much for your advice.
I have a question. Is domain knowledge a technique used in transfer learning?

> How many post-exploitation steps do you perform, in your experience?
> (Because the action space is large, I am concerned about the depth of exploration.)

Actually, there are various post-exploitation patterns (penetrating internal servers via a compromised server, extracting credential information from a compromised server, etc.). However, the purpose of the current DeepExploit is to penetrate internal servers.
Therefore, if DeepExploit succeeds in exploiting the first server, it tries to penetrate an internal server via the first server. If it then succeeds in exploiting that internal server (= the second server) via the first server, it likewise tries to penetrate other internal servers via the second server. This repeats for the number of servers.

My assumption is as follows.

start --[exploitation (using RL)]--> first server --[exploitation (using RL)]--> second server --[exploitation (using RL)]--> third server --> ...

However, it is not realistic to repeat penetration indefinitely, so in practice I define a maximum number of target servers.
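
The bounded chain can be sketched abstractly as below. This is an illustration only: `next_server` is a hypothetical callback standing in for one RL-driven step that returns the next reachable server or `None` on failure, and the topology in the usage example is made up.

```python
from typing import Callable, List, Optional


def pivot_chain(start: str,
                next_server: Callable[[str], Optional[str]],
                max_servers: int) -> List[str]:
    """Follow start -> first -> second -> ... and stop after max_servers hops."""
    reached: List[str] = []
    current = start
    while len(reached) < max_servers:
        target = next_server(current)  # hypothetical: None means the step failed
        if target is None:
            break
        reached.append(target)
        current = target
    return reached


# Usage with a made-up topology in place of real targets.
topology = {"start": "first", "first": "second", "second": "third"}
print(pivot_chain("start", topology.get, max_servers=2))
```

The `max_servers` bound is what keeps the search depth finite; without it the loop would only stop when a step fails.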
