
memory leak storing message on self.fp #35

Open
joaoreis81 opened this issue Dec 27, 2019 · 27 comments

Comments

@joaoreis81

Hi,

I use milter-template.py to process messages and sign them with DKIM. I can store the message in memory, on disk, or wherever using "self.fp", and it works nicely with either str or bytes.
The problem I'm having is that there seems to be a memory leak when doing "self.fp.write(chunk)" in the "body" callback, i.e. when storing the body chunks in "self.fp". Memory grows steadily while the milter processes email messages.
If I comment out "self.fp.write(chunk)" in the body callback, memory usage stays stable.

I've tried del-ing the variable, explicitly closing or flushing the file object, saving to disk, using plain variables, and forcing the garbage collector, but nothing seems to work. I need the body part to sign with DKIM.

any ideas?
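
For reference, here is roughly what my callbacks look like (a simplified sketch; the selector, domain and key path below are placeholders, not my real configuration):

  from io import BytesIO
  import dkim   # dkimpy / pydkim

  def envfrom(self, mailfrom, *params):
    self.fp = BytesIO()                    # fresh buffer per message
    return Milter.CONTINUE

  def header(self, name, val):
    self.fp.write(('%s: %s\n' % (name, val)).encode())
    return Milter.CONTINUE

  def eoh(self):
    self.fp.write(b'\n')                   # blank line between headers and body
    return Milter.CONTINUE

  def body(self, chunk):
    self.fp.write(chunk)                   # memory grows while this line is active
    return Milter.CONTINUE

  def eom(self):
    msg = self.fp.getvalue()
    # sign with dkimpy; selector/domain/key path are placeholders
    sig = dkim.sign(msg, b'sel', b'example.com', open('/path/to/key.pem', 'rb').read())
    self.addheader('DKIM-Signature', sig.decode().split(':', 1)[1].strip())
    return Milter.CONTINUE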

@sdgathman
Owner

sdgathman commented Dec 31, 2019

While milter-template.py is just a documentation example, it should work for the trivial things it does. Are you using py3? My production milters that DKIM-sign messages do not have a memory leak. Technically, "memory leaks" are not possible in pure Python code - only what I call "data cancer": ever-growing, permanently reachable data structures. C modules can have memory leaks. Does your leak happen with the original milter-template.py, or with your modified version that uses pydkim?

The StringIO (or BytesIO) object should be collected when libmilter closes the connection. The first test would be to log something in the close() callback to see whether it ever gets called. You can also try explicitly closing self.fp:

   def close(self):
     # always called, even when abort is called.  Clean up
     # any external resources here.
+    if self.fp:
+      self.fp.close()
+      self.fp = None
     return Milter.CONTINUE

But that should not be needed.
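
For the logging check, a minimal sketch using the template's own log() helper and id attribute:

   def close(self):
     # diagnostic only: confirm close() is reached for every connection
     self.log('close called, id=%d' % self.id)
     return Milter.CONTINUE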

Have you tried using dkim-milter.py ? https://github.com/sdgathman/milter/blob/master/dkim-milter.py

Also, maybe this should be discussed on the mailing list.

@joaoreis81
Author

Yes, I'm using python3, however I noticed the same problem with pymilter running on python2.
A minor adaptation of milter-template follows (https://github.com/joaoreis81/milter-template), with a few modifications that reproduce the problem. On a reasonably loaded postfix server, the process memory grew to 500MB after 30 minutes of traffic.
(screenshot: Screenshot_2020-01-03_11-59-33)

In my latest tests I am running milter-template as a container, and the memory increase occurs both when running directly on the operating system and in the container environment, so we can exclude containerization as a cause of the problem.

As stated earlier, all of the file-descriptor cleanup attempts were unsuccessful at releasing memory; it really does look like "data cancer".

I have been using Red Hat-based distributions (Fedora, CentOS) and the problem always shows up. I will run some tests on a different distribution to compare results.

I am taking this thread to the mailing list as suggested.

Thanks in advance
João Reis.

@sdgathman
Owner

After a number of experiments on the mailing list, it is pretty clear that the issue is not related to pymilter (manually deleting self.fp does not change the outcome). However, this should become a test case for the test frameworks; this kind of problem ought to be reproducible there. I will leave this open until we can confirm that.

@joaoreis81 How are you measuring memory consumption? This needs to be part of a test case.
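
For a test case, something along these lines could sample the process RSS between batches of messages (a sketch; Linux-only, not part of pymilter):

  import resource

  def peak_rss_kb():
    # peak resident set size of this process (kB on Linux)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss

  def current_rss_kb():
    # current RSS, read from /proc (Linux only)
    with open('/proc/self/status') as f:
      for line in f:
        if line.startswith('VmRSS:'):
          return int(line.split()[1])
    return 0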

@arpitest

arpitest commented Apr 7, 2020

I'm also seeing serious memory leaks on various systems using pymilter (Python 3.6). My application uses Keras/TensorFlow for neural-network-based spam filtering, and for a long time I thought the leak came from the TensorFlow code...
Unfortunately it's very difficult to reproduce in a test environment, as there are no milter clients in Python that could connect to the pymilter-based filter and simulate mail flow.
In the production environment I see memory usage grow until it reaches the limits and the OOM killer strikes:
(graph: memory-pinpoint=1551725740,1586285740)
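
A crude substitute is to drive a test postfix (with the milter configured) over plain SMTP and let the MTA exercise the milter; a minimal sketch, where host, port and addresses are placeholders:

  import smtplib
  from email.message import EmailMessage

  def send_batch(n, host='localhost', port=25):
    # push n synthetic messages through the test MTA so it calls the milter
    for i in range(n):
      msg = EmailMessage()
      msg['From'] = 'load@example.com'
      msg['To'] = 'test@example.com'
      msg['Subject'] = 'load test %d' % i
      msg.set_content('x' * 10000)       # vary size/charset for realism
      with smtplib.SMTP(host, port) as s:
        s.send_message(msg)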

@sdgathman
Owner

@arpitest There are 2 test frameworks (one with a test MilterBase and the newer with a test milterContext). These should be able to reproduce problems with milters themselves - both are used in the test suite for the python modules.

@sdgathman
Owner

sdgathman commented Apr 8, 2020

@joaoreis81 Working on getting this up in podman. I can't pass up such a detailed reproducer!

Yes, I'm using python3, however I noticed the same problem with pymilter running on python2.
A minor adaptation of milter-template follows (https://github.com/joaoreis81/milter-template), with a few modifications that reproduce the problem. On a reasonably loaded postfix server, the process memory grew to 500MB after 30 minutes of traffic.

@sdgathman
Owner

@joaoreis81 The example container is up, but doesn't include any test data. I'll try copying to an active mail server. My production machines use sendmail. I wonder if postfix is the problem? Maybe it doesn't call the close callback or something.

@kitterma
Collaborator

kitterma commented Apr 8, 2020 via email

@sdgathman
Owner

Running the milter-template.py under a testctx with 1000 iterations of test2 (a large message), there is no memory growth whatsoever.

@arpitest

arpitest commented Apr 8, 2020 via email

@sdgathman
Owner

I am running the milter container connected to my home mail server. It has received 74 messages so far. No obvious memory growth.

@sdgathman
Owner

@arpitest Pure Python can easily (and often does) have "data cancer", where a data structure grows without limit. That is not technically a "leak" (where memory becomes unreachable from running code), but it has the same effect on memory consumption!
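
A contrived sketch of what I mean (nothing to do with pymilter itself):

  import collections

  # module-level state shared by every connection: always reachable, never
  # collected, grows forever -> "data cancer", not a C-level leak
  seen_queue_ids = collections.defaultdict(int)

  def eom(self):
    seen_queue_ids[self.getsymval('i')] += 1   # one entry per message, never pruned
    return Milter.CONTINUE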

@arpitest

arpitest commented Apr 9, 2020

Running the milter-template.py under a testctx with 1000 iterations of test2 (a large message), there is no memory growth whatsoever.

Our production server processes 200,000-500,000 emails/day (big university), more than 10GB/day of volume, but the memory leak is no more than a few hundred MB/day. So it can only be a leak of a few bytes per email, which is not easily visible when testing only 1000 iterations.

@sdgathman
Owner

I've run enough messages through the container to bound the memory loss to less than 32 bytes per message (so it is clearly unrelated to self.fp). Since @arpitest is seeing a problem at 200,000+ messages a day, it is possible that a small object is leaking. The obvious candidate is milterContext, which is about the right size. I have counters that track alloc/delete for this object, so I will print the debug counters and see whether they drift apart. I had asked the original reporter to do that, but they didn't get to it.
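
Apart from the C-level counters, one way to watch for drift from Python is a periodic census of live objects (a sketch; objects from C extensions are not always tracked by gc, so this may undercount):

  import gc, collections

  def object_census(substring='milter'):
    # count live objects whose type name contains the substring
    counts = collections.Counter(type(o).__name__ for o in gc.get_objects())
    return {name: n for name, n in counts.items() if substring in name.lower()}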

@joaoreis81
Author

Sorry for the delay; I have not been able to give this issue the necessary attention in recent months, and I haven't been able to continue with the tests either. The problem occurs at high message volumes; I have pymilter deployments that have shown the same problem since 2016. The postfix versions used were the default CentOS 7 packages (postfix 2.10) and later versions (postfix 3.2.x). I will try to reproduce the problem on a lab server and see how I can give you access.

@joaoreis81
Author

I have modified the container to integrate with postfix and I am now running tests against some samples that I can make available. Once I can reproduce the problem with these samples, I will post instructions on how to run it.

@joaoreis81
Author

The test environment is ready and the memory growth has been successfully reproduced; it is available at https://github.com/joaoreis81/milter-template. I noticed a few points:

  • samples of messages with different sizes / charsets / MIME types are required;
  • it needs to run for 12 to 24 hours, depending on the variety and quantity of message samples;
  • several messages must be sent simultaneously (I used 10 processes).

I can provide message samples; let me know and I will send them privately by email.

@joaoreis81
Author

joaoreis81 commented May 21, 2020

@arpitest @kitterma @sdgathman
I can successfully reproduce the memory growth in my test environment with postfix and pymilter. Some tests were done, but with no apparent conclusion. Do you have any suggestions for further tests so that we can make progress on a resolution?

One important thing: if I comment out the line "self.fp.write(chunk)" in the body stage, the memory growth does not occur.

  def body(self, chunk):
    #self.fp.write(chunk)
    return Milter.CONTINUE
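
Another variant worth testing is keeping the body out of the Python heap entirely, e.g. a spooled temporary file instead of BytesIO (just a sketch):

  import tempfile

  def envfrom(self, mailfrom, *params):
    # spills to a real temp file above ~1MB instead of holding everything in RAM
    self.fp = tempfile.SpooledTemporaryFile(max_size=1024*1024)
    return Milter.CONTINUE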

@sdgathman
Owner

I haven't forgotten this. Almost done with porting the rest of the pymilter suite to py3 and installing on a live (but non-critical) mail server.

You get quite a lot of growth from multiple threads with simultaneous messages. However, it should stabilize for a given number of threads.

@sdgathman
Owner

Testing a py3 upgrade of a critical mail server in a test environment. Should I release 1.0.5 without a fix for this (if it is indeed pymilter and not postfix)? I think so.

@branjala

branjala commented Jan 14, 2022

Hello,

Did you find the root cause of this issue?

We have the same issue: our performance tests show that resident memory usage increases linearly during the test run.
We tried releasing the self.fp resource, but it doesn't fix it.
We noticed that some memory leaks were fixed in libmilter, as listed here: https://fossies.org/linux/sendmail/RELEASE_NOTES
But we have not been able to test that yet.

Thanks in advance.

Our environment:
Centos 7
python-pymilter-common 1.0.3
python2-pymilter 1.0.3

@arpitest

Testing a py3 upgrade of a critical mail server in a test environment. Should I release 1.0.5 without a fix for this (if it is indeed pymilter and not postfix)? I think so.

IMHO postfix isn't the cause. I've switched from your pymilter to a pure-Python milter implementation (using asyncio) and the memory leaks are gone; it has been running for more than a year and still uses less than 200MB, while with pymilter the leak was +1GB/day :(
So either pymilter or libmilter causes it, probably the latter.

@sunil-saini

Has anyone been able to find the root cause of this issue?

I wanted to use this in production for an ARC implementation using dkimpy.

But memory keeps increasing monotonically, indicating a leak.

(screenshot: Screenshot 2024-01-21 at 1.31.22 AM)

The environment is:

AWS Linux AMI2
Python 3.7.16
Milter-0.8.18
pymilter-1.0.5
postfix3-3.5.3-1

Postfix doesn't seem to be the problem, because other milters we run (opendkim/opendmarc and some written in Go) work without any such issue.

Milter code:

#!/usr/bin/env python3


import sys
from io import BytesIO
from multiprocessing import Process as Thread
from multiprocessing import Queue
from time import time_ns

import Milter

# queue for logging
logq = Queue()

class arcMilter(Milter.Base):
	"Milter to check and sign ARC. Each connection gets its own instance."
	def __init__(self):
		# Integer incremented with each call
		self.id = Milter.uniqueID()
		self.init_time = time_ns()
		self.fp = None

	# each connection runs in its own thread and has its own arcMilter
	# instance. Python code must be thread safe. This is trivial if only
	# state on arcMilter instances is referenced
	@Milter.noreply
	def connect(self, IPname, family, hostaddr):
		self.log("connect from {} at {}".format(IPname, hostaddr))
		try:
			self.H = None
			self.fp = None
		except Exception as e:
			self.log("exception in connect : ", str(e))

		return Milter.CONTINUE

	def hello(self, heloname):
		self.log("helo : ", heloname)
		self.H = heloname
		return Milter.CONTINUE

	# multiple messages can be received on a single connection
	# envfrom (MAIL FROM in the SMTP protocol) mark the start
	# of each message.
	def envfrom(self, mailfrom, *params):
		self.log("mail from : ", mailfrom)
		try:
			if self.init_time is None:
				self.init_time = time_ns()

			self.F = mailfrom.strip('<').strip('>')
			self.R = []
			self.fp = BytesIO()
		except Exception as e:
			self.log("exception in envfrom : ", str(e))

		return Milter.CONTINUE

	@Milter.noreply
	def envrcpt(self, to, *params):
		self.log("rcpt to : ", to)
		try:
			rcpt = to.strip('<').strip('>')
			self.R.append(rcpt)
		except Exception as e:
			self.log("exception in envrcpt : ", str(e))

		return Milter.CONTINUE

	@Milter.noreply
	def header(self, name, val):
		# self.log("header : ", name, val)
		try:
			# add header to buffer
			if self.fp:
				self.fp.write(("%s: %s\n" % (name, val)).encode())
		except Exception as e:
			self.log("exception in header processing : ", str(e))

		return Milter.CONTINUE

	@Milter.noreply
	def eoh(self):
		self.log("eoh")
		# terminate headers
		if self.fp:
			self.fp.write(b"\n")
		return Milter.CONTINUE

	@Milter.noreply
	def body(self, chunk):
		self.log("body")
		try:
			if self.fp:
				self.fp.write(chunk)
		except Exception as e:
			self.log("exception in body processing : " + str(e))

		return Milter.CONTINUE

	def eom(self):
		self.log("eom")
		if not self.fp:
			return Milter.CONTINUE

		# move pointer in beginning of buffer to read it all
		self.fp.seek(0)
		txt = self.fp.read()

		# close buffer now to avoid leak
		self.fp.close()
		self.log("message", txt)

		##
		# add ARC functionality here
		##

		return Milter.CONTINUE

	def close(self):
		end_time = time_ns()
		latency = (end_time-self.init_time) / 1_000_000_000
		self.log("close with duration {}".format(latency))
		# always called, even when abort is called.  Clean up
		# any external resources here
		self.init_time = None

		# close fp if not already closed
		if self.fp and not self.fp.closed:
			self.fp.close()

		return Milter.CONTINUE

	def abort(self):
		self.log("abort")

		# close fp if not already closed
		if self.fp and not self.fp.closed:
			self.fp.close()

		# client disconnected prematurely
		return Milter.CONTINUE

	## === Support Functions ===

	def log(self, *msg):
		logq.put((self.id, msg))


def background():
	while True:
		t = logq.get()
		if not t:
			break
		id, msg = t
		print('[%d] %s' % (id, ' '.join([str(m) for m in msg])))


def main():
	try:
		bt = Thread(target=background)
		bt.start()
		Milter.factory = arcMilter
		Milter.set_flags(Milter.CHGHDRS + Milter.ADDHDRS)
		sys.stdout.flush()
		print("ARC Milter starting")
		Milter.runmilter("arcMilter", "inet:[email protected]", 30)
		logq.put(None)
		bt.join()
		print("ARC Milter stopped")
	except Exception as e:
		print('exception in main : ', str(e))


if __name__ == "__main__":
	main()

I thought io.BytesIO might somehow be creating the leak, so I switched to a plain self.fp = b'' and kept appending headers and body to it, but the leak was still there.

The milter implementation is very straightforward; no extra functionality has been added yet.

Another thing I noticed: when larger mails (>1MB) are processed, memory spikes and is then never released.

@sdgathman
Owner

Previous experiments found that commenting out the write to self.fp stops the "leak". Looking at your curve (kudos!), it does not look monotonic at all, but logarithmic. Since a large email causes more memory use, I wonder if CPython simply lets memory grow more than you'd like before garbage collecting? While not recommended for production, try a gc in close?

Previously, I suspected that the milterContext objects were not getting collected. That would not explain the larger "leak" from large emails (it would only explain a small amount per email), because self.fp is explicitly closed, so it's not a matter of milterContext hanging around and keeping the self.fp contents alive.
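
Concretely, by "a gc in close" I mean something like this (a sketch, not for production, since a full collection per connection is expensive):

  import gc

  def close(self):
    self.fp = None       # drop the last reference to the buffer
    gc.collect()         # force a full collection at the end of each connection
    return Milter.CONTINUE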

@sunil-saini

@sdgathman

"monotonically" i meant memory was never released, it was ever increasing only

Yes, i tried doing manual gc.collect() from eom and close both but there seems to be no impact of that
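
For anyone else digging into this, tracemalloc can show where Python-level allocations grow between two points (a sketch; it will not see anything allocated inside libmilter itself):

  import tracemalloc

  tracemalloc.start(25)                        # keep 25 frames per allocation
  snap1 = tracemalloc.take_snapshot()
  # ... let the milter process a few thousand messages ...
  snap2 = tracemalloc.take_snapshot()
  for stat in snap2.compare_to(snap1, 'lineno')[:10]:
    print(stat)                                # top growers by source line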

@delmezov

delmezov commented May 7, 2024

Hi, instead of self.fp.close(), use self.fp = io.BytesIO() in close() and abort(), and possibly also in eom():

def close(self):
    ...
    self.fp = io.BytesIO()
    ...
    return Milter.CONTINUE

def abort(self):
    ...
    self.fp = io.BytesIO()
    ...
    return Milter.CONTINUE
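
The point is just to drop the old buffer and start from an empty one; reusing a single buffer per connection is a variant of the same idea (a sketch, assuming the buffer is normally created in envfrom()):

  import io

  def envfrom(self, mailfrom, *params):
    # reuse one buffer per connection instead of allocating a new one per message
    if self.fp is None:
      self.fp = io.BytesIO()
    else:
      self.fp.seek(0)
      self.fp.truncate()
    return Milter.CONTINUE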

@sunil-saini

Hey @delmezov

Using self.fp = io.BytesIO() really fixed the memory leak, thanks!

I deployed this in production 3 days ago and memory usage has stayed constant at around 1GB.

(screenshot: Screenshot 2024-05-24 at 1.18.29 PM)

Do I need to use this in abort() as well? abort() calls close() anyway.
