Comments (8)

ubersandro commented on July 28, 2024

Hello, I am writing here because I was about to open a new issue, but we may have the same problem. I am experiencing domain freezes while running Codemon to monitor the whole userspace on Windows 10 20H1, and my output looks very similar to the one above. It happens only sometimes and, at the moment, I cannot reproduce the error reliably. I suspect there is some trouble in event management. My guess is that some event is not handled correctly because of a lack of atomicity when removing/adding events, and the domain stays suspended during singlestepping, but I have no idea how to verify that this is the case.
Thanks in advance for the help,
Alessandro

from drakvuf.

tklengyel commented on July 28, 2024

Debugging that type of error is really difficult. What may help is to verify whether this is a new issue or whether you had the same problem with older versions. If it's an issue that only happens with a newer version, then some recent change might have broken the logic, which should be easier to fix. If it's happening with older versions as well, then the logic was already broken and it's much harder to figure out why.

ubersandro commented on July 28, 2024

OK, I would like to try to debug it, but I am not yet very proficient with Xen. As I was saying, my suspicion is that event management is somehow broken. Maybe by going through the vm_event interface directly I could figure out what makes my domU hang, dumping events and checking which one is not handled by the libvmi+drakvuf+codemon stack. Alternatively, I could write a more concise stress test for memaccess events to try to understand what is wrong. Do you have any advice for me, Tamas?
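
One way to approach the "which event is not handled" question is to keep a small per-vCPU ledger of requests received and responses sent, and to dump whatever is still outstanding when the domain hangs. The sketch below models only that bookkeeping; it is not LibVMI or DRAKVUF code. `event_ledger`, `ledger_on_request`, `ledger_on_response` and `ledger_oldest_stuck` are hypothetical names, the reason code is whatever the caller chooses to store, and the model assumes each vCPU has at most one unanswered request at a time.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LEDGER_MAX_VCPUS 64

/* Hypothetical bookkeeping structure: one slot per vCPU. */
struct event_ledger {
    uint8_t  pending[LEDGER_MAX_VCPUS]; /* 1 while a request awaits its response */
    uint32_t reason[LEDGER_MAX_VCPUS];  /* caller-defined reason code of that request */
    uint64_t seq[LEDGER_MAX_VCPUS];     /* arrival order of that request */
    uint64_t next_seq;                  /* next sequence number to hand out */
};

static void ledger_init(struct event_ledger *l)
{
    memset(l, 0, sizeof(*l));
}

/* Record an incoming request. Returns -1 if the vCPU index is out of range
 * or the vCPU already has an unanswered request (which the model forbids). */
static int ledger_on_request(struct event_ledger *l, unsigned vcpu, uint32_t reason)
{
    if (vcpu >= LEDGER_MAX_VCPUS || l->pending[vcpu])
        return -1;
    l->pending[vcpu] = 1;
    l->reason[vcpu] = reason;
    l->seq[vcpu] = l->next_seq++;
    return 0;
}

/* Record that a response was sent. Returns -1 if nothing was pending. */
static int ledger_on_response(struct event_ledger *l, unsigned vcpu)
{
    if (vcpu >= LEDGER_MAX_VCPUS || !l->pending[vcpu])
        return -1;
    l->pending[vcpu] = 0;
    return 0;
}

/* Return the vCPU holding the oldest unanswered request, or -1 if none. */
static int ledger_oldest_stuck(const struct event_ledger *l)
{
    int found = -1;
    for (unsigned i = 0; i < LEDGER_MAX_VCPUS; i++) {
        if (l->pending[i] && (found < 0 || l->seq[i] < l->seq[found]))
            found = (int)i;
    }
    return found;
}
```

The idea would be to call `ledger_on_request` at the top of the event callback and `ledger_on_response` once the reply has been written back, then print `ledger_oldest_stuck` and its stored reason from a watchdog or signal handler when the guest stops making progress.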

carttam commented on July 28, 2024

Debugging that type of error is really difficult. What may help is to verify whether this is a new issue or whether you had the same problem with older versions. If it's an issue that only happens with a newer version, then some recent change might have broken the logic, which should be easier to fix. If it's happening with older versions as well, then the logic was already broken and it's much harder to figure out why.

I tested version 1.0 and the problem was present there as well. With PRINT_DEBUG enabled, the output for the callback event and the event struct looked similar to the previous times the callback was invoked. Despite many tests, I could not find any condition under which this error reliably occurs.
I also just noticed the following: if, for example, earlier runs froze after a ReturnHook in the chrome.exe process, and I then filter on that process only (running Drakvuf with -C --context-process chrome.exe), Xen does not freeze.
I tried to provoke the same state by deliberately breaking the code, for example by changing the event's output value, event->interrupt_event.reinject, or drakvuf->in_callback, but each of these only crashed Drakvuf itself; Xen did not freeze.
Has a problem like this happened before? Or do you know what could cause it?
Thank you for your great project. I hope this can be solved.

carttam commented on July 28, 2024

At last, I was able to make Xen freeze at the very beginning of execution by commenting out these parts of the code.

drakvuf/src/libdrakvuf/vmi.c

Lines 1144 to 1145 in 67477d0

remove_trap(drakvuf, &container->breakpoint.guard);
remove_trap(drakvuf, &container->breakpoint.guard2);

drakvuf/src/libdrakvuf/vmi.c

Lines 1515 to 1525 in 67477d0

if ( !inject_trap_mem(drakvuf, &container->breakpoint.guard, 0) )
{
    PRINT_DEBUG("[IDX] Failed to create guard trap for the breakpoint!\n");
    goto err_exit;
}
if ( !inject_trap_mem(drakvuf, &container->breakpoint.guard2, 1) )
{
    PRINT_DEBUG("[IDX] Failed to create guard2 trap for the breakpoint!\n");
    goto err_exit;
}

Amnpardaz-Hypervisor commented on July 28, 2024

Hello,
After many tests, I found that the problem comes from the vmi_slat_change_gfn calls that reset the GFN mapping (they pass ~(addr_t)0 as the new GFN). I still don't know why this happens.
Anyway, using vmi_set_mem_event to change the access level to VMI_MEMACCESS_N instead solved the problem.

drakvuf/src/libdrakvuf/vmi.c

Lines 1184 to 1198 in 1859dc9

if ( !traps_on_gfn )
{
    if ( VMI_FAILURE == vmi_slat_change_gfn(vmi, drakvuf->altp2m_idrx, container->breakpoint.guard3.memaccess.gfn, ~(addr_t)0))
    {
        fprintf(stderr, "Critical error in removing int3, guard3 wasn't removed\n");
        drakvuf->interrupted = -1;
        break;
    }
    if ( VMI_FAILURE == vmi_slat_change_gfn(vmi, drakvuf->altp2m_idrx, container->breakpoint.guard4.memaccess.gfn, ~(addr_t)0))
    {
        fprintf(stderr, "Critical error in removing int3, guard4 wasn't removed\n");
        drakvuf->interrupted = -1;
        break;
    }
}

drakvuf/src/libdrakvuf/vmi.c

Lines 1216 to 1229 in 1859dc9

if ( VMI_SUCCESS == vmi_slat_change_gfn(vmi, drakvuf->altp2m_idx, container->memaccess.gfn, ~(addr_t)0))
{
    PRINT_DEBUG("Removed memtrap for GFN 0x%lx in altp2m view %u\n",
        container->memaccess.gfn, drakvuf->altp2m_idx);
    struct remapped_gfn* remapped_gfn = (struct remapped_gfn*)g_hash_table_lookup(drakvuf->remapped_gfns,
            GSIZE_TO_POINTER(container->memaccess.gfn));
    if ( remapped_gfn )
        remapped_gfn->active = 0;
    g_hash_table_remove(drakvuf->memaccess_lookup_trap, trap);
    g_hash_table_remove(drakvuf->memaccess_lookup_gfn, GSIZE_TO_POINTER(container->memaccess.gfn));
}

For example: vmi_set_mem_event(vmi, container->memaccess.gfn, VMI_MEMACCESS_N, drakvuf->altp2m_idx)

tklengyel commented on July 28, 2024

Yea, don't do that. That disables the core functionality of DRAKVUF and it makes the breakpoints detectable by the guest.
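
For readers following the snippets quoted in this thread: the last argument in the vmi_slat_change_gfn calls is `~(addr_t)0`, an all-ones value rather than GFN 0. As far as I understand it, that value acts as an "invalid GFN" sentinel that drops the remapping in the given altp2m view, so lookups in that view fall back to the original page. The sketch below is a toy model of that idea only. `toy_view`, `toy_view_init`, `toy_change_gfn` and `toy_translate` are hypothetical names, `addr_t` is assumed to be a 64-bit unsigned integer, and none of this is the LibVMI or Xen implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint64_t addr_t;                 /* assumption: 64-bit unsigned */
#define TOY_INVALID_GFN (~(addr_t)0)     /* all ones, not zero */
#define TOY_VIEW_SLOTS  8

/* Hypothetical per-view remap table: old GFN -> new GFN. */
struct toy_remap { addr_t old_gfn; addr_t new_gfn; int used; };
struct toy_view  { struct toy_remap slots[TOY_VIEW_SLOTS]; };

static void toy_view_init(struct toy_view *v)
{
    memset(v, 0, sizeof(*v));
}

/* Install or update a remap; passing TOY_INVALID_GFN as new_gfn removes it.
 * Returns 0 on success, -1 if the table is full. */
static int toy_change_gfn(struct toy_view *v, addr_t old_gfn, addr_t new_gfn)
{
    struct toy_remap *free_slot = NULL;

    for (size_t i = 0; i < TOY_VIEW_SLOTS; i++) {
        struct toy_remap *s = &v->slots[i];
        if (s->used && s->old_gfn == old_gfn) {
            if (new_gfn == TOY_INVALID_GFN)
                s->used = 0;             /* drop the remap */
            else
                s->new_gfn = new_gfn;    /* retarget the remap */
            return 0;
        }
        if (!s->used && !free_slot)
            free_slot = s;
    }
    if (new_gfn == TOY_INVALID_GFN)
        return 0;                        /* nothing to remove */
    if (!free_slot)
        return -1;
    free_slot->old_gfn = old_gfn;
    free_slot->new_gfn = new_gfn;
    free_slot->used = 1;
    return 0;
}

/* Translate a GFN through the view; unmapped GFNs map to themselves. */
static addr_t toy_translate(const struct toy_view *v, addr_t gfn)
{
    for (size_t i = 0; i < TOY_VIEW_SLOTS; i++)
        if (v->slots[i].used && v->slots[i].old_gfn == gfn)
            return v->slots[i].new_gfn;
    return gfn;
}
```

In this model, changing only the access permissions on a page would leave the entry in the remap table untouched, which is a different operation from removing the remap.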

yuno-x commented on July 28, 2024

I always encounter the same problem when I use DRAKVUF's apimon plugin.
The qemu-xen log shows the following memory-related error, and qemu-xen hangs.
This happens with any recent version.

$ cat /var/log/xen/qemu-dm-*.log
VNC server running on :::5900
Locked DMA mapping while invalidating mapcache! 0000000000000eff -> 0x7f42f34f72e0 is present
qemu-system-i386: terminating on signal 1 from pid 24521 (xl)
