
Obtaining Data Delays Severe · moosefs · 9 comments · CLOSED

moosefs commented on May 21, 2024
Obtaining Data Delays Severe

from moosefs.

Comments (9)

xandrus commented on May 21, 2024

Hi,
First of all, I would suggest watching the operations log inside the MooseFS client.
Simply execute:

cat /home/data/logs/nginx/.oplog

You will then see what is really going on inside mfsmount and which operations are being executed.
Please send us the results from the .oplog MooseFS object.

It looks like your Nginx is using a cache and executing a flush operation from time to time.
Please check the "flush" parameter in your Nginx configuration:
http://nginx.org/en/docs/http/ngx_http_log_module.html
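
For context, this buffering is controlled by nginx's access_log directive itself. A minimal sketch of the two variants, based on the module documentation linked above (the paths here are illustrative, not taken from this setup):

```nginx
# Buffered logging: nginx batches log lines in a 32k buffer and flushes
# it at most every 5 seconds -- on a network filesystem this shows up
# as log data appearing with a delay.
access_log /var/log/nginx/access.log main buffer=32k flush=5s;

# Unbuffered logging: one write per request, no flush parameter involved.
access_log /var/log/nginx/access.log main;
```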


OraCheung commented on May 21, 2024

Hi,

The oplog looks like this:

02.17 09:23:17.971750: uid:0 gid:0 pid:24716 cmd:write (292010,201,569577): OK (201)
02.17 09:23:17.999473: uid:0 gid:0 pid:24716 cmd:write (292010,288,569778): OK (288)
02.17 09:23:18.014322: uid:0 gid:0 pid:24716 cmd:write (292010,240,570066): OK (240)
02.17 09:23:18.031706: uid:0 gid:0 pid:24716 cmd:write (292010,241,570306): OK (241)
02.17 09:23:18.124160: uid:0 gid:0 pid:24716 cmd:write (292010,355,570547): OK (355)
02.17 09:23:18.129366: uid:0 gid:0 pid:24716 cmd:write (292010,201,570902): OK (201)
02.17 09:23:18.178810: uid:0 gid:0 pid:24716 cmd:write (292010,313,571103): OK (313)
02.17 09:23:18.203245: uid:0 gid:0 pid:24716 cmd:write (292010,345,571416): OK (345)
02.17 09:23:18.219016: uid:0 gid:0 pid:24716 cmd:write (292010,199,571761): OK (199)
02.17 09:23:18.264838: uid:0 gid:0 pid:24716 cmd:write (292010,243,571960): OK (243)
02.17 09:23:18.299924: uid:0 gid:0 pid:24716 cmd:write (292010,218,572203): OK (218)
02.17 09:23:18.323139: uid:0 gid:0 pid:24716 cmd:write (292010,250,572421): OK (250)
02.17 09:23:18.331646: uid:0 gid:0 pid:24716 cmd:write (292010,249,572671): OK (249)
02.17 09:23:18.347683: uid:0 gid:0 pid:24716 cmd:write (292010,273,572920): OK (273)
02.17 09:23:18.359451: uid:0 gid:0 pid:24716 cmd:write (292010,256,573193): OK (256)
02.17 09:23:18.364929: uid:0 gid:0 pid:24716 cmd:write (292010,271,573449): OK (271)
02.17 09:23:18.368938: uid:0 gid:0 pid:24716 cmd:write (292010,268,573720): OK (268)
02.17 09:23:18.375024: uid:0 gid:0 pid:24716 cmd:write (292010,250,573988): OK (250)
02.17 09:23:18.378074: uid:0 gid:0 pid:24716 cmd:write (292010,249,574238): OK (249)
02.17 09:23:18.391258: uid:0 gid:0 pid:24716 cmd:write (292010,269,574487): OK (269)
02.17 09:23:18.441630: uid:0 gid:0 pid:24716 cmd:write (292010,284,574756): OK (284)
02.17 09:23:18.466238: uid:0 gid:0 pid:24716 cmd:write (292010,331,575040): OK (331)
02.17 09:23:18.466359: uid:0 gid:0 pid:24716 cmd:write (292010,352,575371): OK (352)
02.17 09:23:18.483821: uid:0 gid:0 pid:24716 cmd:write (292010,276,575723): OK (276)
02.17 09:23:18.498730: uid:0 gid:0 pid:24716 cmd:write (292010,374,575999): OK (374)
02.17 09:23:18.504726: uid:0 gid:0 pid:24716 cmd:write (292010,277,576373): OK (277)
02.17 09:23:18.517666: uid:0 gid:0 pid:24716 cmd:write (292010,276,576650): OK (276)
02.17 09:23:18.521152: uid:0 gid:0 pid:24716 cmd:write (292010,276,576926): OK (276)
02.17 09:23:18.522902: uid:0 gid:0 pid:24716 cmd:write (292010,206,577202): OK (206)
02.17 09:23:18.529218: uid:0 gid:0 pid:24716 cmd:write (292010,387,577408): OK (387)
02.17 09:23:18.530717: uid:0 gid:0 pid:24716 cmd:write (292010,298,577795): OK (298)
02.17 09:23:18.544620: uid:0 gid:0 pid:24716 cmd:write (292010,323,578093): OK (323)
02.17 09:23:18.581265: uid:0 gid:0 pid:24716 cmd:write (292010,236,578416): OK (236)
02.17 09:23:18.606531: uid:0 gid:0 pid:24716 cmd:write (292010,201,578652): OK (201)
02.17 09:23:18.671402: uid:0 gid:0 pid:24716 cmd:write (292010,308,578853): OK (308)
02.17 09:23:18.703426: uid:0 gid:0 pid:24716 cmd:write (292010,198,579161): OK (198)
02.17 09:23:18.776515: uid:0 gid:0 pid:24716 cmd:write (292010,199,579359): OK (199)
02.17 09:23:18.803434: uid:0 gid:0 pid:24716 cmd:write (292010,348,579558): OK (348)
02.17 09:23:18.847969: uid:0 gid:0 pid:24716 cmd:write (292010,199,579906): OK (199)
02.17 09:23:18.903167: uid:0 gid:0 pid:24716 cmd:write (292010,308,580105): OK (308)
02.17 09:23:18.982354: uid:0 gid:0 pid:24716 cmd:write (292010,425,580413): OK (425)
02.17 09:23:18.990796: uid:0 gid:0 pid:24716 cmd:write (292010,200,580838): OK (200)
02.17 09:23:18.992178: uid:0 gid:0 pid:24716 cmd:write (292010,292,581038): OK (292)
02.17 09:23:19.017782: uid:0 gid:0 pid:24716 cmd:write (292010,271,581330): OK (271)
02.17 09:23:19.024706: uid:0 gid:0 pid:24716 cmd:write (292010,297,581601): OK (297)
02.17 09:23:19.049587: uid:0 gid:0 pid:24716 cmd:write (292010,295,581898): OK (295)
02.17 09:23:19.203131: uid:0 gid:0 pid:24716 cmd:write (292010,209,582193): OK (209)
02.17 09:23:19.251906: uid:0 gid:0 pid:24716 cmd:write (292010,305,582402): OK (305)
02.17 09:23:19.276591: uid:0 gid:0 pid:24716 cmd:write (292010,324,582707): OK (324)
02.17 09:23:19.284785: uid:0 gid:0 pid:24716 cmd:write (292010,199,583031): OK (199)
02.17 09:23:19.377134: uid:0 gid:0 pid:24716 cmd:write (292010,296,583230): OK (296)
02.17 09:23:19.382819: uid:0 gid:0 pid:24716 cmd:write (292010,197,583526): OK (197)
02.17 09:23:19.382937: uid:0 gid:0 pid:24716 cmd:write (292010,289,583723): OK (289)
02.17 09:23:19.403451: uid:0 gid:0 pid:24716 cmd:write (292010,313,584012): OK (313)
02.17 09:23:19.460539: uid:0 gid:0 pid:24716 cmd:write (292010,210,584325): OK (210)
02.17 09:23:19.486978: uid:0 gid:0 pid:24716 cmd:write (292010,298,584535): OK (298)
02.17 09:23:19.547490: uid:0 gid:0 pid:24716 cmd:write (292010,306,584833): OK (306)
02.17 09:23:19.604439: uid:0 gid:0 pid:24716 cmd:write (292010,316,585139): OK (316)
02.17 09:23:19.621328: uid:0 gid:0 pid:24716 cmd:write (292010,314,585455): OK (314)
02.17 09:23:19.648566: uid:0 gid:0 pid:24716 cmd:write (292010,201,585769): OK (201)
02.17 09:23:19.764895: uid:0 gid:0 pid:24716 cmd:write (292010,329,585970): OK (329)
02.17 09:23:19.828972: uid:0 gid:0 pid:24716 cmd:write (292010,323,586299): OK (323)
02.17 09:23:19.834529: uid:0 gid:0 pid:24716 cmd:write (292010,197,586622): OK (197)
02.17 09:23:19.870466: uid:0 gid:0 pid:24716 cmd:write (292010,319,586819): OK (319)
02.17 09:23:19.954825: uid:0 gid:0 pid:24716 cmd:write (292010,403,587138): OK (403)
02.17 09:23:19.965577: uid:0 gid:0 pid:24716 cmd:write (292010,404,587541): OK (404)
02.17 09:23:19.981905: uid:0 gid:0 pid:24716 cmd:write (292010,401,587945): OK (401)
...

And when I cat the nginx logs from another machine, it sometimes returns:

cat: 114.log: Input/output error

The corresponding oplog entries are as follows:

02.17 09:36:46.840322: uid:0 gid:0 pid:0 cmd:invalidate cache (292010:0:67108864): ok
02.17 09:36:46.840817: uid:0 gid:0 pid:0 cmd:invalidate cache (292010:0:67108864): ok
02.17 09:36:47.437919: uid:0 gid:0 pid:8181 cmd:opendir (1): OK [handle:00000001]
02.17 09:36:47.438748: uid:0 gid:0 pid:8181 cmd:readdir (1,4096,270): OK (200)
02.17 09:36:47.438825: uid:0 gid:0 pid:8181 cmd:readdir (1,4096,270): OK (no data)
02.17 09:36:47.438880: uid:0 gid:0 pid:0 cmd:releasedir (1): OK
02.17 09:36:47.439335: uid:0 gid:0 pid:8181 cmd:getattr (1): OK (1.0,[drwxr-xr-x:0040755,2,65534,65534,1487295407,1487294549,1487294549,2002734])
02.17 09:36:47.439913: uid:0 gid:0 pid:8181 cmd:lookup (1,114.log): OK (0.0,292010,1.0,[-rw-r--r--:0100644,1,65534,65534,1487294770,1487295407,1487295407,10051099])
02.17 09:36:47.440560: uid:0 gid:0 pid:8181 cmd:lookup (1,114.log): OK (0.0,292010,1.0,[-rw-r--r--:0100644,1,65534,65534,1487294770,1487295407,1487295407,10051099])
02.17 09:36:47.994046: uid:0 gid:0 pid:8195 cmd:lookup (1,114.log): OK (0.0,292010,1.0,[-rw-r--r--:0100644,1,65534,65534,1487294770,1487295407,1487295407,10051099])
02.17 09:36:47.994152: uid:0 gid:0 pid:8195 cmd:open (292010) (using cached data from lookup): OK (direct_io:0,keep_cache:0) [handle:02000001]
02.17 09:36:52.994360: uid:0 gid:0 pid:8195 cmd:read (292010,131072,0): EIO (Input/output error)
02.17 09:36:52.994431: uid:0 gid:0 pid:8195 cmd:read (292010,131072,131072): EIO (Input/output error)
02.17 09:36:52.994572: uid:0 gid:0 pid:8195 cmd:read (292010,4096,0): EIO (Input/output error)
02.17 09:36:52.994694: uid:0 gid:0 pid:8195 cmd:flush (292010): OK
02.17 09:36:52.994731: uid:0 gid:0 pid:0 cmd:release (292010): OK

Sometimes it returns correctly, but the data has a delay of nearly 20 seconds.

I do not use a cache (or buffer) for the access log.

My access log config is:

access_log /home/data/logs/access.log main;

Thanks for your help!


xandrus commented on May 21, 2024

Hi,
I would suggest updating MooseFS to version 3.0.88.

Please also check whether there are any errors on your NIC or on the switch port.
It is also a good idea to check the system log on the MooseFS client machine.


xandrus commented on May 21, 2024

Hi,
Do you have any updates regarding the EIO problem?


OraCheung commented on May 21, 2024

Hi,
Yes, I updated, but the problem still exists. I suspect it is caused by heavy write activity, so I moved the nginx log directory elsewhere.


xandrus commented on May 21, 2024

OK.
Thank you for the information.
Personally, I believe the problem is connected with other factors.

I would also like to add that you can try mounting the MooseFS client in DIRECT mode, like:

mfsmount -o mfscachemode=DIRECT -H master.host.name /mnt/mfs

or set specific extra attributes on the log folder, like:

mfsseteattr -r -f nodatacache /mnt/mfs/log_folder

This option does not require remounting mfsmount.
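
A short sketch of applying and then verifying that attribute (the folder path is illustrative; mfsgeteattr is the standard companion tool for reading extra attributes back):

```shell
# Recursively force the nodatacache extra attribute on the log folder,
# so clients bypass the data cache for files under it.
mfsseteattr -r -f nodatacache /mnt/mfs/log_folder

# Read the extra attributes back to confirm nodatacache is now set.
mfsgeteattr /mnt/mfs/log_folder
```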

Would you be so kind as to tell us your current hardware configuration: the number of chunkservers, HDDs, and the LAN? Also, is your MooseFS cluster installed on bare metal or on VMs?

By the way, we have production MooseFS clients handling over 28,000 write operations per minute.


OraCheung commented on May 21, 2024

Hi,
Thanks for your help; I will try again with "mfscachemode=DIRECT".
My system configuration is as follows:
1 master (64 GB RAM)
3 chunkservers (250 GB, 250 GB, 1.8 TB), RAID1
1 Gbit network bandwidth
The MooseFS cluster is installed on bare metal; the MFS client runs in a Docker container.


xandrus commented on May 21, 2024

Thanks,
I will check the scenario with mfsmount and Docker.


OraCheung commented on May 21, 2024

Hi,
DIRECT mode solves the problem!
After switching to DIRECT mode, the delay dropped from nearly 20 seconds to less than 1 second.
Thanks for your help!

