• HPT Dupes

    From Avon@21:1/101 to Al on Sunday, December 01, 2019 14:56:37
    Al I'm looking at some of the settings

    There's

    -dupecheck del -dupehistory 90

    But I can also see in the specific keywords for HPT there's

    AreasMaxDupeAge: max age for dupes in CommonDupeBase
    DupeBaseType: type of dupe base
    DupeHistoryDir: path for dupe files

    I am unsure which of these should be set, how long dupe history can be set for (e.g. 500 days?), and the best type of dupe base to set. I'm picking
    commondupebase?

    The goal being to ensure max dupe data is stored/retained for robust
    checking..

    --- Mystic BBS v1.12 A43 2019/03/03 (Windows/32)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)
  • From Black Panther@21:1/186 to Avon on Saturday, November 30, 2019 19:33:48
    On 01 Dec 2019, Avon said the following...

    -dupecheck del -dupehistory 90

    AreasMaxDupeAge: max age for dupes in CommonDupeBase
    DupeBaseType: type of dupe base
    DupeHistoryDir: path for dupe files

    I am unsure which of these should be set, how long dupe history can be set for (e.g. 500 days?), and the best type of dupe base to set. I'm picking commondupebase?

    Sorry for jumping in here. :)

    I actually use all of these. I keep the dupe messages in a Jam message base, just so I can see what's coming in, and from where.

    DupeArea DupeArea ~/mystic/msgs/dupe -b Jam
    DupeBaseType HashDupesWMsgId
    AreasMaxDupeAge 1100
    DupeHistoryDir ~/mystic/msgs/dupehist

    In my message echo defaults I have:

    -dupecheck move -dupehistory 1100

    The DupeHistoryDir should be an empty directory, used only by HPT. It will create a .dpd file for each echo you carry. When HPT tosses the incoming messages, it checks the corresponding .dpd file to determine
    whether the message is a dupe or not.

    The DupeBaseType determines what information from the messages it will use
    for dupe detection. HashDupesWMsgId tells HPT to save the CRC32 of from, to, subject and MSGID, along with the actual MSGID.
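
    As a rough illustration of that idea (not HPT's actual code or file format), a dupe record can be pictured as a CRC32 over the from, to, subject and MSGID fields, stored together with the MSGID text. The field separator, the encoding and the sample MSGID below are assumptions made up for this sketch.

```python
import zlib

def dupe_entry(from_name, to_name, subject, msgid):
    """Build an illustrative dupe record: (crc32 of header fields, MSGID text)."""
    # Joining with NUL is an arbitrary choice for this sketch, not HPT's layout.
    blob = "\0".join((from_name, to_name, subject, msgid)).encode("utf-8")
    return zlib.crc32(blob), msgid

def is_dupe(seen, entry):
    """Return True if entry was already recorded; otherwise record it."""
    if entry in seen:
        return True
    seen.add(entry)
    return False

seen = set()
first = dupe_entry("Avon", "Al", "HPT Dupes", "21:1/101 0123abcd")  # made-up MSGID
print(is_dupe(seen, first))   # first sighting of this message
print(is_dupe(seen, first))   # same header and MSGID arriving again
```

    Keeping the MSGID text next to the hash means a CRC32 collision between two different messages does not by itself flag one of them as a dupe.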

    I'm not sure what the maximum number of days you can keep the dupe history,
    but I haven't had any issues with it being 1100... ;)

    Hope this helps. I'm sure Al will add some insight as well.


    ---

    Black Panther(RCS)
    Castle Rock BBS

    --- Mystic BBS v1.12 A43 2019/03/02 (Linux/64)
    * Origin: Castle Rock BBS - bbs.castlerockbbs.com (21:1/186)
  • From Al@21:4/106 to Avon on Saturday, November 30, 2019 18:33:56
    Al I'm looking at some of the settings

    There's

    -dupecheck del -dupehistory 90

    I use -dupecheck del -dupehistory 90 here. It'll purge dupebase entries
    older than 90 days. You might also want "-dupecheck move" if you want dupes
    moved to the dupes message base for inspection. You could change 90 to
    another setting if you like, 360 or 720 or so??
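
    To picture what an age limit like this does, here is a minimal sketch of age-based purging. It is only an illustration of the concept: the function name and the in-memory dictionary are hypothetical and say nothing about how HPT stores or expires its entries internally.

```python
from datetime import datetime, timedelta

def purge_old(entries, now, history_days):
    """Keep only entries first seen within the last history_days days."""
    cutoff = now - timedelta(days=history_days)
    return {key: seen_at for key, seen_at in entries.items() if seen_at >= cutoff}

now = datetime(2019, 12, 1)
entries = {
    "recent": datetime(2019, 11, 1),   # 30 days old
    "stale": datetime(2019, 6, 1),     # about six months old
}
print(sorted(purge_old(entries, now, 90)))    # 90-day history
print(sorted(purge_old(entries, now, 720)))   # 720-day history
```

    The trade-off is the one discussed here: a longer history keeps more entries around, so a message that turns up again many months later can still be recognised as a dupe.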

    But I can also see in the specific keywords for HPT there's

    AreasMaxDupeAge: max age for dupes in CommonDupeBase
    DupeBaseType: type of dupe base
    DupeHistoryDir: path for dupe files

    I am unsure which of these should be set, how long dupe history can be
    set for (e.g. 500 days?), and the best type of dupe base to set. I'm
    picking commondupebase?

    Let me have a fresh look at the docs for these settings. Currently I have
    a dupebase directory full of areaname.dup files.

    The goal being to ensure max dupe data is stored/retained for
    robust checking..

    Yep, an important detail for sure..

    Ttyl :-),
    Al

    --- MagickaBBS v0.13alpha (Linux/x86_64)
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)
  • From Avon@21:1/101 to Black Panther on Sunday, December 01, 2019 16:00:26
    On 30 Nov 2019 at 07:33p, Black Panther pondered and said...

    Sorry for jumping in here. :)

    Not at all, jump in, the water's fine :)

    DupeHistoryDir ~/mystic/msgs/dupehist
    The DupeHistoryDir should be an empty directory, used only by HPT. It will create a .dpd file for each echo you carry. When HPT tosses the incoming messages, it checks the corresponding .dpd file to determine whether the message is a dupe or not.

    OK thanks. But if I am correct it seems like if you set

    dupeBaseType CommonDupeBase

    then this may stop that behaviour, as my hunch is that it dumps all dupes into one big database file instead of different ones.

    I *think* it may also negate the echomail group settings you have mentioned

    In my message echo defaults I have:

    -dupecheck move -dupehistory 1100

    The docs talk about

    areasMaxDupeAge <integer>

    Set maximum days for storing your hashes in CommonDupeBase. Default value is 5. For any other dupe base type please use -DupeHistory option in EchoArea or Echo

    So if I go with the master dupe database I *think* I don't need lines like -dupecheck and -dupehistory for each echomail group. I may be wrong.

    --- Mystic BBS v1.12 A43 2019/03/03 (Windows/32)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)
  • From Al@21:4/106 to Black Panther on Saturday, November 30, 2019 19:01:06
    Hope this helps. I'm sure Al will add some insight as well.

    Not in this case. I looked at the dupe settings some years back and just
    went with the default. HashDupesWMsgId is used here without any keywords
    in the config except what I use in the EchoAreaDefaults line.

    The CommonDupeBase works the same but your dupe base is all stored in one
    file. Good for speed but bad if anything happens to that file.

    Another setting I would like is a max age for importing echomail. I'd set
    mine to 90 days or so. That would save us importing old messages that get
    sent out for whatever reason.

    I don't see that in hpt but sbbsecho has a setting like that. With hpt
    you could use perl to do that I think, although I don't have anything
    like that.
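
    The decision such a filter would make is simple enough to sketch. The snippet below is not an hpt hook (hpt's hooks are written in Perl) and not an existing setting; it only shows the age test, with a hypothetical function name and the 90-day limit mentioned above as the default.

```python
from datetime import datetime, timedelta

def too_old_to_import(written, now, max_age_days=90):
    """Return True when a message's date is more than max_age_days before now."""
    return now - written > timedelta(days=max_age_days)

now = datetime(2019, 12, 1)
print(too_old_to_import(datetime(2019, 11, 30), now))  # written the day before
print(too_old_to_import(datetime(2018, 1, 1), now))    # written almost two years earlier
```

    A real filter would also have to decide what to do with messages whose date field is missing or unparseable, which this sketch ignores.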

    Ttyl :-),
    Al

    --- MagickaBBS v0.13alpha (Linux/x86_64)
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)
  • From Avon@21:1/101 to Al on Sunday, December 01, 2019 16:03:38
    On 30 Nov 2019 at 06:33p, Al pondered and said...

    I use -dupecheck del -dupehistory 90 here. It'll purge dupebase entries older than 90 days. You might also want "-dupecheck move" if you want dupes moved to the dupes message base for inspection. You could change 90 to another setting if you like, 360 or 720 or so??

    That's if you're not using the following switch - right?

    dupeBaseType CommonDupeBase

    Let me have a fresh look at the docs for these settings. Currently I have a dupebase directory full of areaname.dup files.

    Thanks :)

    Also for items like

    areafixFromPkt <bool>

    what is the correct syntax? Is it 'true', or is 'yes' OK also?
    Do we know whether true/false or yes/no must be stated explicitly, with the
    keyword ignored if nothing is set? There seems to be no default for some of these.

    --- Mystic BBS v1.12 A43 2019/03/03 (Windows/32)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)
  • From Al@21:4/106 to Avon on Saturday, November 30, 2019 19:10:00
    I am unsure which of these should be set, how long dupe history can be
    set for (e.g. 500 days?), and the best type of dupe base to set. I'm
    picking commondupebase?

    I would go with HashDupesWMsgId. CommonDupeBase is good too, except that
    if that file is ever lost or damaged you lose all your dupe data.

    The default HashDupesWMsgId will store your dupe hashes in a separate
    file per area.

    Ttyl :-),
    Al

    --- MagickaBBS v0.13alpha (Linux/x86_64)
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)
  • From Avon@21:1/101 to Al on Sunday, December 01, 2019 16:11:35
    On 30 Nov 2019 at 07:01p, Al pondered and said...

    The CommonDupeBase works the same but your dupe base is all stored in one file. Good for speed but bad if anything happens to that file.

    Yes I am not sure which is the better way to go, dupe bases per base or a master dupe base.

    --- Mystic BBS v1.12 A43 2019/03/03 (Windows/32)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)
  • From Avon@21:1/101 to Al on Sunday, December 01, 2019 16:15:18
    On 30 Nov 2019 at 07:10p, Al pondered and said...

    I would go with HashDupesWMsgId. CommonDupeBase is good too, except that if that file is ever lost or damaged you lose all your dupe data.

    The default HashDupesWMsgId will store your dupe hashes in a separate
    file per area.

    Interestingly Mystic uses just one dupe database and as far as I can tell
    it's been fine doing so. But yeah, I take your point. Aside from having dupe data per base, and less single-point dependency on one file, I'm not sure there is much benefit in the multiple dupe file option.

    It sounds like you can inspect the database files and see hash info and a
    text message ID in the system you use, but I suspect that would not be possible with the commondupebase?

    --- Mystic BBS v1.12 A43 2019/03/03 (Windows/32)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)
  • From Al@21:4/106 to Avon on Saturday, November 30, 2019 19:20:04
    That's if you're not using the following switch - right?

    dupeBaseType CommonDupeBase

    From what I understand yes. I haven't used that option so I may be wrong.

    Let me have a fresh look at the docs for these settings.
    Currently I have a dupebase directory full of areaname.dup
    files.

    Actually, areaname.dpd files..

    Also for items like

    areafixFromPkt <bool>

    what is the correct syntax? Is it 'true', or is 'yes' OK also?
    Do we know whether true/false or yes/no must be stated explicitly,
    with the keyword ignored if nothing is set? There seems to be no
    default for some of these.

    For boolean settings false/true, no/yes and 0/1 should work. I have always
    used no/yes myself.. :)

    Ttyl :-),
    Al

    --- MagickaBBS v0.13alpha (Linux/x86_64)
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)
  • From Al@21:4/106 to Avon on Saturday, November 30, 2019 19:30:24
    Interestingly Mystic uses just one dupe database and as far as I
    can tell it's been fine doing so. But yeah, I take your point. Aside
    from having dupe data per base, and less single-point dependency on
    one file, I'm not sure there is much benefit in the multiple dupe
    file option.

    The only benefit is that you don't have all your eggs in one basket. The
    cost is that it'll need to open and read multiple files when running. I
    find no disadvantage there. Most times when I am tossing only a few areas
    are opened.
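
    A small sketch of why the per-area layout costs little in practice: if each area's hashes are only loaded the first time a message for that area is tossed, a run that touches a few areas opens a few files. The class below is hypothetical and keeps everything in memory; a dictionary stands in for the per-area files, and OTHER_AREA is a made-up area name.

```python
class PerAreaDupeBase:
    """One hash set per area, loaded lazily (stands in for one file per area)."""

    def __init__(self, stored):
        self._stored = stored   # area name -> set of keys "on disk"
        self.opened = {}        # area name -> set of keys loaded during this run

    def check_and_add(self, area, key):
        """Return True if key is already known for area; otherwise record it."""
        if area not in self.opened:
            # First message for this area in this run: "open" its dupe file.
            self.opened[area] = set(self._stored.get(area, ()))
        keys = self.opened[area]
        if key in keys:
            return True
        keys.add(key)
        return False

base = PerAreaDupeBase({"FSX_GEN": {101, 102}, "OTHER_AREA": {201}})
print(base.check_and_add("FSX_GEN", 101))   # key already stored for FSX_GEN
print(base.check_and_add("FSX_GEN", 103))   # new key for FSX_GEN
print(sorted(base.opened))                  # areas actually opened this run
```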

    It sounds like you can inspect the database files and see hash info
    and a text message ID in the system you use, but I suspect that
    would not be possible with the commondupebase?

    There are only hashes stored in those files. Not much for a human to see.
    If you want to look over dupes use "-dupecheck move" to store them in the
    dupe base. Then you can see all the paths and times. If you use
    "-dupecheck del" they will be deleted.

    Ttyl :-),
    Al

    --- MagickaBBS v0.13alpha (Linux/x86_64)
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)
  • From Avon@21:1/101 to Al on Sunday, December 01, 2019 16:41:28
    On 30 Nov 2019 at 07:30p, Al pondered and said...

    The only benefit is that you don't have all your eggs in one basket. The cost is that it'll need to open and read multiple files when running. I find no disadvantage there. Most times when I am tossing only a few areas are opened.

    Yep agreed, I will run with what both you and Dan are doing.

    If you want to look over dupes use "-dupecheck move" to store them in the dupe base. Then you can see all the paths and times. If you use

    Yep doing this also, thanks!

    --- Mystic BBS v1.12 A43 2019/03/03 (Windows/32)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)
  • From Avon@21:1/101 to Al on Sunday, December 01, 2019 21:47:13
    On 30 Nov 2019 at 07:20p, Al pondered and said...

    Also for items like
    areafixFromPkt <bool>

    what is the correct syntax? Is it 'true', or is 'yes' OK also?
    Do we know whether true/false or yes/no must be stated explicitly,
    with the keyword ignored if nothing is set? There seems to be no default for some of these.

    For boolean settings false/true, no/yes and 0/1 should work. I have always used no/yes myself.. :)

    True failed the TParser checker but Yes was OK to use - FYI.

    --- Mystic BBS v1.12 A43 2019/03/03 (Windows/32)
    * Origin: Agency BBS | Dunedin, New Zealand | agency.bbs.nz (21:1/101)
  • From Oli@21:1/151 to Avon on Sunday, December 01, 2019 09:56:22
    On Sun, 1 Dec 2019 16:11:35 +1300
    "Avon -> Al" <0@101.1.21> wrote:

    The CommonDupeBase works the same but your dupe base is all
    stored in one file. Good for speed but bad if anything happens
    to that file.

    Yes I am not sure which is the better way to go, dupe bases per base
    or a master dupe base.

    I don't have any experience with hpt's dupe detection, but Squish also has dupe-ID files per message base (and crashmail has only a master dupe base). I
    doubt that you can measure any significant performance difference. Having separate dupe bases is helpful if you want to restore a single message base or want to rescan a single message base.
    Let's say you have to restore FSX_GEN from yesterday's backup. You can restore the JAM message base along with the area's dupe base and then rescan mail from another node. If you have a master dupe base, the dupe base and the message base are out of sync.
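
    The restore scenario can be played through in a few lines. This is a toy model under stated assumptions: message IDs stand in for full dupe records, the lists and sets stand in for the message base and dupe base, and the function name is hypothetical.

```python
def rescan(message_base, dupe_keys, incoming):
    """Import incoming message IDs not yet in dupe_keys; return those imported."""
    imported = []
    for msgid in incoming:
        if msgid in dupe_keys:
            continue                 # treated as a dupe and skipped
        dupe_keys.add(msgid)
        message_base.append(msgid)
        imported.append(msgid)
    return imported

# Yesterday's backup holds m1 and m2; m3 arrived today and was lost with the base.
backup_messages = ["m1", "m2"]
backup_dupes = {"m1", "m2"}          # per-area dupe base restored with the backup
current_dupes = {"m1", "m2", "m3"}   # master dupe base that was never rolled back
feed = ["m1", "m2", "m3"]            # what the other node sends on rescan

in_sync = rescan(list(backup_messages), set(backup_dupes), feed)
out_of_sync = rescan(list(backup_messages), set(current_dupes), feed)
print(in_sync)       # messages recovered when the dupe base matches the backup
print(out_of_sync)   # messages recovered when the dupe base is newer than the backup
```

    With the restored per-area dupe base the lost message is imported again; with the newer master dupe base it is skipped as a dupe, which is the out-of-sync problem described above.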

    ---
    * Origin: (21:1/151)
  • From Al@21:4/106 to Avon on Sunday, December 01, 2019 01:59:34
    For boolean settings false/true, no/yes and 0/1 should work. I
    have always used no/yes myself.. :)

    True failed the TParser checker but Yes was OK to use - FYI.

    Oh, good to know.

    Ttyl :-),
    Al

    --- MagickaBBS v0.13alpha (Linux/x86_64)
    * Origin: The Rusty MailBox - Penticton, BC Canada (21:4/106)