• Mystic Ubuntu 16.04 LTS x64 Hang/crash

    From bcw142@21:1/145.3 to G00r00 on Friday, October 06, 2017 07:13:52
    Mystic was hung this morning under Ubuntu 16.04 LTS x64, uname -a shows:
    Linux Repair2 4.10.0-32-generic #36~16.04.1-Ubuntu SMP Wed Aug 9 09:19:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux as the specific setup.
    Mystic had been running since Monday, hung on Friday. What I did see was:

    top - 06:31:04 up 10 days, 21 min, 1 user, load average: 1.39, 1.29, 1.22 Tasks: 207 total, 3 running, 203 sleeping, 1 stopped, 0 zombie
    %Cpu(s): 15.9 us, 42.7 sy, 0.0 ni, 41.2 id, 0.2 wa, 0.0 hi, 0.0 si, 0.0
    st
    KiB Mem : 2035412 total, 499016 free, 557408 used, 978988 buff/cache
    KiB Swap: 4202492 total, 3633012 free, 569480 used. 1096860 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12856 brianw 20 0 24180 1416 1416 R 100.0 0.1 2696:35 mystic
    2359 brianw 20 0 1246480 58000 23724 S 7.3 2.8 1336:32 compiz
    867 root 20 0 471916 40064 21336 S 5.0 2.0 371:00.91 Xorg
    2402 brianw 20 0 1681956 66916 16824 S 2.3 3.3 230:53.00 audacious

    sudo killall mis mis2
    #Neither shows running but:

    top - 06:38:27 up 10 days, 28 min, 1 user, load average: 1.22, 1.19, 1.18 Tasks: 206 total, 3 running, 202 sleeping, 1 stopped, 0 zombie
    %Cpu(s): 17.7 us, 43.9 sy, 38.3 ni, 0.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0
    st
    KiB Mem : 2035412 total, 431032 free, 605336 used, 999044 buff/cache
    KiB Swap: 4202492 total, 3635568 free, 566924 used. 1047976 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 12856 brianw 20 0 24180 1416 1416 R 99.7 0.1 2703:57 mystic 25619 brianw 39 19 224124 154712 106500 R 78.5 7.6 0:03.21 apt-check
    2359 brianw 20 0 1246480 58216 23932 S 12.3 2.9 1337:06 com

    brianw@Repair2:/mystic$ ps a | grep mystic
    25625 pts/5 S+ 0:00 grep --color=auto mystic
    brianw@Repair2:/mystic$ sudo kill -9 12856
    brianw@Repair2:/mystic$ top

    top - 06:39:39 up 10 days, 29 min, 1 user, load average: 0.93, 1.14, 1.16 Tasks: 205 total, 3 running, 201 sleeping, 1 stopped, 0 zombie
    %Cpu(s): 8.6 us, 2.3 sy, 0.0 ni, 88.4 id, 0.7 wa, 0.0 hi, 0.0 si, 0.0
    st
    KiB Mem : 2035412 total, 482420 free, 557048 used, 995944 buff/cache
    KiB Swap: 4202492 total, 3637116 free, 565376 used. 1102032 avail Mem
    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    2359 brianw 20 0 1246480 58216 23932 R 8.6 2.9 1337:13 compiz
    867 root 20 0 472516 40600 21868 S 5.0 2.0 371:22.52 Xorg
    2402 brianw 20 0 1681956 66916 16824 S 3.3 3.3 231:06.42 audacious

    restarting mis and mis2 did nothing that I can see and it won't come up, had
    to reboot. Came right up with the reboot. Something is hanging it at 100% CPU time and shows only as 'mystic' in top. What's the next step in trying to
    find the real source of this hang? It is rare, but has happened twice on two different Ubuntu 16.04 LTS x64 systems since A35 was released.

    --- Mystic BBS v1.12 A35 (Linux/64)
    * Origin: Workpoint (21:1/145.3)
  • From it@21:1/142 to bcw142 on Tuesday, October 17, 2017 10:18:10
    I noticed this as well and the only thing that would end the process was
    kill -9 <pid> and all three of my nodes were in use with no active user.

    One thing I noticed was that they were listed as "unknown" connections (as shown in nodespy) and were stuck at the login screen, likely bots probing ports, but Mystic never timed out on them. On snooping I saw the username
    was a string of text containing "busybox" in it, and it hung there.

    --- Mystic BBS v1.12 A35 (Linux/32)
    * Origin: The Vista BBS - Vallejo, CA USA (21:1/142)
  • From g00r00@21:1/108 to bcw142 on Sunday, October 22, 2017 15:59:47
    Mystic was hung this morning under Ubuntu 16.04 LTS x64, uname -a shows: Linux Repair2 4.10.0-32-generic #36~16.04.1-Ubuntu SMP Wed Aug 9
    09:19:02 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux as the specific setup. Mystic had been running since Monday, hung on Friday. What I did see was:

    What would help me figure out where it is locking up is figuring out what the user was last doing when they disconnected on that node that gets "stuck".

    --- Mystic BBS v1.12 A36 (Windows/32)
    * Origin: Sector 7 [Mystic BBS WHQ] (21:1/108)
  • From bcw142@21:1/145 to g00r00 on Sunday, October 22, 2017 18:04:47
    On 10/22/17, g00r00 said the following...
    What would help me figure out where it is locking up is figuring out
    what the user was last doing when they disconnected on that node that
    gets "stuck".

    I'm having a look through the logs now, but I suspect I get no data when it really hangs as it's hung. Still I did run in to one thing I've seen and had cause me problems before:
    ------------------- Node 1 (Mystic v1.12 A35)
    2017.10.18 23:59:37 Connect from 187.54.53.107 (187-54-53-107.mganm702.dsl.brasiltelecom.net.br)
    2017.10.19 00:00:00 User ran out of time

    because the day changed it ran out of time in less then 30 seconds. That's
    been a problem I've run in to. I think it also pretty bans (wrong term
    likely) you for a while (a day or two) when it happens. I have to go in to
    the configuration and fix the timeouts for that user when it happens.
    Another thing I'm noticing is a lot of Connect with no close or logout or anything:
    ------------------ Node 1 (Mystic v1.12 A35)
    2017.10.18 18:51:57 Connect from 103.100.222.197 (Unknown)

    ------------------- Node 1 (Mystic v1.12 A35)
    2017.10.18 18:52:34 Connect from 103.100.222.197 (Unknown)

    ------------------- Node 1 (Mystic v1.12 A35)
    2017.10.18 18:53:08 Connect from 103.100.222.197 (Unknown)

    ------------------- Node 1 (Mystic v1.12 A35)
    2017.10.18 20:25:17 Connect from 125.99.81.58 (Unknown)

    ------------------- Node 1 (Mystic v1.12 A35)
    2017.10.18 20:25:52 Connect from 125.99.81.58 (Unknown)

    ------------------- Node 1 (Mystic v1.12 A35)
    2017.10.18 20:26:27 Connect from 125.99.81.58 (Unknown)

    I'm guessing it's hammering on the one port, dropping and trying again and again. One thing I was thinking is adding a setting so (Unknown) gets dropped quickly. Pretty much every real login has a name of some type and it not an Unknown.

    Looks like Node1.log is the best to look at for what's going on: ------------------- Node 1 (Mystic v1.12 A35)
    2017.10.08 14:23:35 Connect from 127.0.0.1 (localhost)
    2017.10.08 14:23:42 bcw142 logged in
    2017.10.08 14:24:44 Executed Door: /bin/bash -ls -i
    2017.10.08 14:54:00 Posted #1004: "Invalid data frame sometimes means
    BLOCKED" to Mystic BBS Support/Dev
    2017.10.08 14:58:59 Executed Door: /mystic/poll
    2017.10.08 14:59:11 User logged off

    So hopefully I'll find a hang in here sometime tonight.

    --- Mystic BBS v1.12 A35 (Raspberry Pi/32)
    * Origin: Mystic Pi BBS bcw142.zapto.org (21:1/145)
  • From bcw142@21:1/145.5 to g00r00 on Tuesday, October 24, 2017 17:38:19
    On 10/22/17, g00r00 said the following...

    What would help me figure out where it is locking up is figuring out
    what the user was last doing when they disconnected on that node that
    gets "stuck".

    Yes, but what I have is a hole in the logs. I've looked at 8 or more logs so far and nothing but a hole where it hung. It was Sept 8 between 12:30pm and 1:17pm when I took down the system for reboot. Even after that I had to fight with it to get it working, it stayed hung. Finally found the tasks and did
    the kill -9 and then all the mis and mis2 tasks finally stopped, then I found
    I had to kill a mystic task as well. Then restart mystic and it was up
    finally. There was no activity in nodes1-4, mis, mis2, Looking at
    fidopoll.log now for Sep 08 12:30:00 hmm E486 pattern not found.
    Sep 08 13:33:50 Polled 5 nodes
    Sep 08 14:09:25 FIDOPOLL Version 1.12 A35
    Sep 08 14:09:25 Scanning 21:1/100
    ...
    Sep 08 13:33:50 Polled 5 nodes
    Sep 08 14:09:25 FIDOPOLL Version 1.12 A35
    Sep 08 14:09:25 Scanning 21:1/100
    The above may appear normal, but it's messing a poll between 12:28 and
    13:33, there should have been one about 1pm that didn't happen as the system was hung. You can see that after 13:33 the next is 14:09, they happen at 40 minute intervals I think it was set for, but there's the gap. Those are the three lines from the log in order:
    Sep 08 12:28:26 Polled 5 nodes
    Sep 08 13:33:41 FIDOPOLL Version 1.12 A35
    Sep 08 13:33:41 Scanning 21:1/145.2
    Showing the gap in time of more than an hour instead of 40 minutes or so. That's what I've been seeing, just a lack of data for that time.
    mis log shows:
    Sep 08 11:07:05 MIS received SIGTERM; shutting down
    Sep 08 13:17:28 MIS received SIGTERM; shutting down
    mis2 log shows only hours so I don't know then 09.08 was in it. It goes from:
    + 23:57:07 TELNET 1-Closed terminal process
    + 23:57:08 TELNET 3-Closed terminal process
    + 00:00:00 TELNET 2-Closed terminal process
    + 00:06:01 TELNET Blocked connection: 51.15.91.167
    + 00:07:07 TELNET Connect on slot 1/8
    + 00:07:07 TELNET 1-Addr: 45.119.208.34
    + 00:07:07 TELNET 1-Host: Unknown
    so unknown dates I could try to track back, but I suspect a gap like all the other logs. Ah, looked only have about 2 days (so the last two 10/23-10/24)
    of log there. Out of luck on that one. Mutil has the gap:
    + Sep 08 12:29:37 Results: Linked 0 msgs in 70.72s
    + Sep 08 12:29:37 Shutdown Normal (0)

    ----------------- MUTIL v1.12 A35 Fri, Sep 08 2017 (loglevel 3)
    + Sep 08 13:33:50 Startup using import.ini
    - Sep 08 13:33:50 SKIP Import_FIDONET.NA
    node5 looks a little better:
    ------------------- Node 5 (Mystic v1.12 A35)
    2017.09.08 12:38:44 Connect from 177.238.98.192 (177.238.98.192.cable.dyn.cableonline.com.mx)

    ------------------- Node 5 (Mystic v1.12 A35)
    2017.09.08 13:11:32 Connect from 177.64.117.211 (b14075d3.virtua.com.br) Narrows the time frame better than the rest so far 12:38-13:11 so that node
    did something (logged a connection) before I took it down at 13:17.
    Node6:
    2017.09.08 12:31:11 Connect from 123.145.31.217 (Unknown)

    ------------------- Node 6 (Mystic v1.12 A35)
    2017.09.11 07:13:27 Connect from 179.232.241.2 (b3e8f102.virtua.com.br)
    Node7 not in the logs till:
    ------------------- Node 7 (Mystic v1.12 A35)
    2017.09.12 12:34:53 Connect from 81.215.206.231 (81.215.206.231.dynamic.ttnet.com.tr)
    Node8 starts with:
    ------------------- Node 8 (Mystic v1.12 A35)
    2017.09.13 03:27:33 Connect from 144.178.139.203 (static.masmovil.com) binkp.log shows normal operations at 12:45 and 13:20.
    Sep 08 12:45:55 1 Session complete (0 sent, 0 rcvd, 0 skip)
    Sep 08 13:20:03 0 Connection from United States
    Events shows the gap:
    Sep 08 12:28:26 Command line: ./mutil import.ini
    Sep 08 12:29:37 Process result 0
    Sep 08 13:33:41 Executing "Poll echomail"
    Sep 08 13:33:41 Command line: ./fidopoll forced
    Sep 08 13:33:50 Process result 0
    server_ftp.log
    Sep 08 11:37:40 1 S:220 Mystic FTP server ready
    Sep 08 11:37:50 1 C:QUIT Data:
    Sep 08 11:37:50 1 S:221 Goodbye
    Sep 08 16:07:52 0 Connection from Austria
    Sep 08 16:07:52 0 Connect: 80.110.13.205 (mail.beaufort.at)
    Sep 08 16:07:52 1 S:220 Mystic FTP server ready
    Sep 08 16:07:52 1 C:USER Data: anonymous
    Sep 08 16:07:52 1 C:PASS
    Sep 08 16:07:52 1 S:530 No account
    A bigger gap.
    server_nntp.log
    Sep 10 02:39:02 0 Connection from Germany
    Sep 10 02:39:02 0 Connect: 138.201.169.234 (static.234.169.201.138.clients.your-server.de)
    Sep 10 02:39:02 1 S:200 Mystic BBS NNTP server ready
    Sep 10 02:39:02 1 C:^C^@^@+&à^@^@^@^@^@Cookie: mstshash=hello
    Sep 10 02:39:02 1 S:500 Unknown command
    Sep 11 14:21:25 0 Connection from Germany
    Sep 11 14:21:25 0 Connect: 138.201.169.234 (static.234.169.201.138.clients.your-server.de)
    Sep 11 14:21:25 1 S:200 Mystic BBS NNTP server ready
    Sep 11 14:21:25 1 C:^C^@^@+&à^@^@^@^@^@Cookie: mstshash=hello
    Sep 11 14:21:25 1 S:500 Unknown command
    Sep 12 10:01:37 Shutting down
    (then back up again and such)
    server_pop3.log
    Sep 07 22:41:18 0 Opening socket on 0.0.0.0:110
    Sep 08 04:23:36 Shutting down
    Sep 08 04:17:27 0 Opening socket on 0.0.0.0:110
    Sep 08 09:13:10 Shutting down
    Sep 08 09:13:15 0 Opening socket on 0.0.0.0:110
    Sep 09 08:07:07 Shutting down
    Doesn't seem to show an issue.
    server_smtp.log the same as pop3 of course (save port 25)
    server_telnet.log
    Sep 08 12:59:53 0 Connection from India
    Sep 08 12:59:53 0 Connect: 122.162.97.209 (abts-north-dynamic-209.97.162.122.airtelbroadband.in)
    Sep 08 13:11:19 0 Connection from Brazil
    Sep 08 13:11:19 0 Connect: 177.64.117.211 (b14075d3.virtua.com.br)
    Sep 08 13:11:32 0 Connection from Brazil
    Sep 08 13:11:32 0 Connect: 177.64.117.211 (b14075d3.virtua.com.br)
    Sep 08 13:11:59 0 Connection from Brazil
    Sep 08 13:11:59 0 Connect: 177.64.117.211 (b14075d3.virtua.com.br)
    Sep 08 13:12:11 0 Connection from Brazil
    Sep 08 13:12:11 0 Auto banning IP 177.64.117.211
    Gap 12:59-13:11
    That's all the logs. Gaps and lack of data in that hang.

    --- Mystic BBS v1.12 A35 (Linux/32)
    * Origin: BCW Livingroom (21:1/145.5)