path: root/lib/prserv
Age Commit message (Author)
2017-08-31 prserv/serv: Gracefully handle the PR server exiting quickly (Richard Purdie)
If the server exits quickly its PID may no longer exist. Handle this gracefully. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2017-08-31 prserv/serv: Rename self.quit -> self.quitflag (Richard Purdie)
self has a quit function and a variable. Separate this to two different things as the current setup is prone to breakage. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2017-08-31 prserv/serv: Send sentinel to stop handler thread (Richard Purdie)
Shutdown from SIGTERM currently has to wait for the handler thread to time out. Add a sentinel value which triggers it to loop and allows for a quick exit. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
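The sentinel idea can be sketched as follows. This is only an illustration under stated assumptions, not the code from the patch: the class name `HandlerThread`, the use of `None` as the sentinel, and the `handled` list are all hypothetical; only the `quitflag` name comes from the log above.

```python
import queue
import threading


class HandlerThread:
    """Hypothetical stand-in for a request handler thread."""

    def __init__(self, timeout=30.0):
        self.requests = queue.Queue()
        self.quitflag = False
        self.handled = []
        self.timeout = timeout
        self.thread = threading.Thread(target=self._run)

    def _run(self):
        while not self.quitflag:
            try:
                # Without a sentinel, shutdown would have to wait here
                # for up to `timeout` seconds.
                item = self.requests.get(timeout=self.timeout)
            except queue.Empty:
                continue
            if item is None:
                # Sentinel (assumed value): do no work, just loop so
                # that quitflag is re-checked immediately.
                continue
            self.handled.append(item)

    def stop(self):
        self.quitflag = True
        self.requests.put(None)  # wake the blocked get() right away
        self.thread.join()
```

Calling `stop()` returns as soon as the thread sees the sentinel, rather than after the queue timeout expires.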
2017-08-31 prserv/serv: Shut down any existing server before restarting (Richard Purdie)
This allows for cleaner code in cooker as any existing server is dealt with before a new one is started. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2017-08-31 prserv/cooker: Drop unused param (Richard Purdie)
Drop pointless unused function parameter. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2017-07-24 prserv/serv: Improve process exit handling (Richard Purdie)
The server shutdown is currently laggy and race prone. This patch:
* adds a waitpid so that no zombie server is left around if it's not running in daemon mode.
* adds a quit "sentinel" using a pipe so that we're not sitting in a socket poll() until timeout in order just to quit.
* uses a select() call to poll the socket and the pipe for a quit signal.
The net result of this change is that the prserv exits with the cooker server and it does so immediately and doesn't wait for the select/poll calls to time out. This makes bitbake a lot more responsive for startup/shutdown and doesn't cause UI timeout errors as often when prserv is used.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
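The pipe-plus-select() pattern described in this commit might look roughly like the sketch below. The function name `serve_until_quit` and its return value are assumptions made for illustration; the real server does far more per connection. Selecting on a pipe works on POSIX systems only.

```python
import os
import select
import socket


def serve_until_quit(listener, quit_fd, timeout=30.0):
    """Accept connections until a byte arrives on quit_fd.

    Hypothetical helper: returns the number of connections accepted.
    """
    accepted = 0
    while True:
        # Wait on both the listening socket and the read end of the
        # quit pipe, so a quit request is seen without waiting for
        # the timeout to expire.
        ready, _, _ = select.select([listener, quit_fd], [], [], timeout)
        if quit_fd in ready:
            os.read(quit_fd, 1)  # drain the sentinel byte
            return accepted
        if listener in ready:
            conn, _addr = listener.accept()
            conn.close()
            accepted += 1
```

Whoever wants the loop to end writes a single byte to the write end of the pipe, for example from a signal handler.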
2017-07-21 server: Rework the server API so process and xmlrpc servers coexist (Richard Purdie)
This changes the way bitbake server works quite radically. Now, the server is always a process based server with the option of starting an XMLRPC listener on a specific interface/port. Behind the scenes this is done with a "bitbake.sock" file alongside the bitbake.lock file. If we can obtain the lock, we know we need to start a server. The server always listens on the socket and UIs can then connect to this. UIs connect by sending a set of three file descriptors over the domain socket, one for sending commands, one for receiving command results and the other for receiving events.
These changes mean we can throw away all the horrid server abstraction code and the pluggable transport option to bitbake, and the code becomes much more readable and debuggable. It also likely removes a ton of ways you could hang the UI/cooker in weird ways due to all the race conditions that existed with previous processes.
Changes:
* The foreground option for bitbake-server was dropped. Just tail the log if you really want this, the codepaths were complicated enough without adding one for this.
* BBSERVER="autodetect" was dropped. The server will autostart and autoconnect in process mode. You have to specify an xmlrpc server address since that can't be autodetected. I can't see a use case for autodetect now.
* The transport/servetype option to bitbake was dropped.
* A BB_SERVER_TIMEOUT variable is added which allows the server to stay resident for a period of time after the last client disconnects before unloading. This is used if the -T/--idle-timeout option is not passed to bitbake.
This change is invasive and may well introduce new issues; however, I believe the codebase is in a much better position for further development and debugging.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2017-01-06 prserv/persist_data/utils: Drop obsolete python2 imports (Richard Purdie)
These imports were from python 2.6 and earlier, 2.4 in some cases. Drop them since we're all python3 now. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2017-01-05 prserv/serv: Tweak stdout manipulation to be stream safe (Richard Purdie)
We've been seeing oe-selftest failures under puzzling circumstances. It turns out if you run oe-selftest on a machine with xmlrunner installed and have the recent tinfoil2 changes, the launching of PR server crashes leading to selftest hanging if using an autoloaded PR server. The reason is that xmlrunner uses an io.StringIO object as stdout/stderr instead of the usual io.TextIOWrapper and StringIO lacks a fileno() method. We have to deal with both cases and in the python way, we try and then seek forgiveness if we see an AttributeError or UnsupportedOperation exception. Unfortunately we have to deal with both cases as we may be performing a traditional double fork() from the commandline, or a larger python program. [YOCTO #10866] Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
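A minimal sketch of the "try, then seek forgiveness" check, assuming a hypothetical helper named `has_real_fd`. In Python 3, `io.StringIO.fileno()` raises `io.UnsupportedOperation`, while an object with no `fileno` attribute at all raises `AttributeError`, so both are caught.

```python
import io
import os


def has_real_fd(stream):
    """Return True if `stream` is backed by an OS file descriptor."""
    try:
        stream.fileno()
    except (AttributeError, io.UnsupportedOperation):
        # In-memory streams such as io.StringIO end up here.
        return False
    return True


def describe(stream):
    """Say how a daemonizing caller could redirect this stream."""
    if has_real_fd(stream):
        return "redirect with os.dup2() on fd %d" % stream.fileno()
    return "replace the Python-level stream object instead"
```

A caller can use the result to choose between descriptor-level redirection and simply rebinding `sys.stdout` or `sys.stderr`.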
2016-11-30 bitbake: remove True option to getVar calls (Joshua Lock)
getVar() now expands by default, thus remove the True option from getVar() calls with a regex search and replace. Search made with the following regex: getVar ?\(( ?[^,()]*), True\) Signed-off-by: Joshua Lock <joshua.g.lock@intel.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
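To see what that search pattern matches, it can be tried in Python. The search regex is the one quoted in the commit message; the replacement string `getVar(\1)` and the helper name `drop_true` are assumptions, since the message does not state what replacement was used.

```python
import re

# Search pattern quoted in the commit message.
PATTERN = re.compile(r"getVar ?\(( ?[^,()]*), True\)")


def drop_true(source):
    """Remove a trailing ', True' argument from simple getVar() calls.

    The replacement text is an assumption for illustration.
    """
    return PATTERN.sub(r"getVar(\1)", source)
```

Because the capture group excludes commas and parentheses, calls whose first argument is itself a function call or contains a comma are left untouched by this pattern.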
2016-06-01 bitbake: Convert to python 3 (Richard Purdie)
Various misc changes to convert bitbake to python3 which don't warrant separation into separate commits. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2016-05-12 daemonize/prserv/tests/fetch: Convert file() -> open() (Richard Purdie)
Use python3 compatible functions. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2016-02-26 prserv: Add dump_db() (Haris Okanovic)
Returns a script (string) that reconstructs the state of the entire database at the time this function is called. The script language is defined by the backing database engine, which is a function of server configuration. Returns None if the database engine does not support dumping to script or if some other error is encountered in processing. The SQLite3 implementation in db.py calls iterdump() [1] to generate a script. iterdump() is the library equivalent of the `sqlite3 .dump` shell command, and the scripts are compatible. Execute the script in an empty SQLite3 database using the sqlite3 utility to restore a backup of prserv. Use case: Backup a live PR server database in a non-racy way, such that one could snapshot the entire database after a set of bitbake builds all using a shared server. I.e. All changes made prior to the start of a dump_db() operation should be committed and captured in the script. Subsequent changes made during the backup process are not guaranteed to be captured. Testing: ~7MB database backs up in ~1s while PR server is under load from 32 thread bitbake builds on two separate machines. [1] https://docs.python.org/2/library/sqlite3.html#sqlite3.Connection.iterdump Signed-off-by: Haris Okanovic <haris.okanovic@ni.com> Reviewed-by: Ken Sharp <ken.sharp@ni.com> Reviewed-by: Bill Pittman <bill.pittman@ni.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
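The dump-and-restore round trip described here can be sketched with the standard `sqlite3` module. This is not the prserv implementation: the function bodies, the `restore` helper and any table used with them are illustrative assumptions; only `iterdump()` and the "returns None on error" behaviour come from the message.

```python
import sqlite3


def dump_db(conn):
    """Return an SQL script that recreates the database, or None on error."""
    try:
        return "\n".join(conn.iterdump())
    except sqlite3.Error:
        return None


def restore(script):
    """Replay a dump script into a fresh in-memory database (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(script)
    return conn
```

The script produced this way is plain SQL, so it can also be fed to the `sqlite3` command-line utility against an empty database file.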
2016-02-04 bitbake: prserv: do not clear umask when daemonizing (Diego Santa Cruz)
Clearing the umask when daemonizing is not the correct thing to do, as it will create files writable by anyone by default. For instance the pid file was being created with mode 777. This could also potentially affect the sqlite database. Better let the calling process decide on the umask. [YOCTO #9036] Signed-off-by: Diego Santa Cruz <Diego.SantaCruz@spinetix.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2016-02-04 bitbake: prserv: SIGTERM handling hung process (Diego Santa Cruz)
The current SIGTERM handler hangs the process instead of making it exit. The problem seems to be that the handler thread is not signaled to quit, so it stays there doing its work, as it is not a daemon thread. Setting the quit variable fixes this. While at it, do not use the SystemExit exception to terminate upon SIGTERM but instead let the quit flag do its job. This way the PID file is properly removed. [YOCTO #9035] Signed-off-by: Diego Santa Cruz <Diego.SantaCruz@spinetix.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2016-02-04 bitbake: prserv: -wal and -shm sqlite lost when daemonizing (Diego Santa Cruz)
When daemonizing the PR service the -wal and -shm sqlite files were being deleted, although they should never be. While the daemonized process keeps the file descriptors open and thus a clean exit does not lose any data, a power outage would lose all data in the WAL. Removing these files also breaks sqlite collaboration between processes and furthermore prevents taking proper backups without stopping the PR service. The reason this happens is that the DB connection is opened in the initial process, before forking for daemonization. When the DB connection is closed by the exiting parent processes it can delete the -wal and -shm files if it considers itself to be the last connection to the DB. This is easily fixed by opening the DB connection after all forking. [YOCTO #9034] Signed-off-by: Diego Santa Cruz <Diego.SantaCruz@spinetix.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2016-01-29 bitbake: Set process names to be meaningful (Richard Purdie)
This means that when you view the process tree, the processes have meaningful names, aiding debugging:
    $ pstree -p 30021
    bash(30021)───KnottyUI(115579)───Cooker(115590)─┬─PRServ(115592)───{PRServ Handler}(115593)
                                                    ├─Worker(115630)───bash:sleep(115631)───run.do_sleep.11(115633)───sleep(115634)
                                                    └─{ProcessEQueue}(115591)
    $ pstree -p 30021
    bash(30021)───KnottyUI(117319)───Cooker(117330)─┬─Cooker(117335)
                                                    ├─PRServ(117332)───{PRServ Handler}(117333)
                                                    ├─Parser-1:2(117336)
                                                    └─{ProcessEQueue}(117331)
Applies to parse threads, PR Server, cooker, the workers and execution threads, working within the 16 character limit as best we can. Needed to tweak the bitbake-worker magic values to tell the workers apart.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2015-09-26 prserv/serv.py: Better messaging when starting/stopping the server with port=0 (Leonardo Sandoval)
When starting the server using port=0, the server actually starts with a different port, so print a message with this new value. When stopping the server with port=0, tell the user which ports the server is listening on, so that the next time they try to stop it they can pick the correct one. [YOCTO #8560] Signed-off-by: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2015-09-26 prserv/serv: Close the DB connection out of class destructor (Leonardo Sandoval)
When launching the PR server daemon, the PRData __del__ function was being called (no reason found yet), which closed the DB connection, so subsequent PR updates were not getting into the DB. This patch closes the connection explicitly, not relying on the __del__ function execution. Closing the connection in turn causes all WAL file transactions to be moved into the database (checkpoint), thus effectively updating the database. [YOCTO #8215] Signed-off-by: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com> Signed-off-by: Ross Burton <ross.burton@intel.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2015-09-23 prserv/serv: Start/Stop daemon using ip instead of host (Leonardo Sandoval)
In cases where hostname is given instead of an IP (i.e. localhost instead of 127.0.0.1) when stopping the server with bitbake-prserv --stop, the server shows a misleading message indicating that the daemon was not found, where it is actually stopped. This patch converts host to IP values before starting/stopping the daemon, so it will always work on IP, not on hostnames, avoiding problems like the latter. [YOCTO #8258] Signed-off-by: Leonardo Sandoval <leonardo.sandoval.gonzalez@linux.intel.com> Signed-off-by: Ross Burton <ross.burton@intel.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
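One way to picture the fix: if the daemon's bookkeeping is keyed on the host string, "localhost" and "127.0.0.1" look like two different servers unless both are normalised first. The sketch below assumes that bookkeeping is a pid file whose name embeds host and port; the template path and the name `pidfile_for` are hypothetical and not taken from the patch.

```python
import socket

# Hypothetical pid file naming scheme, for illustration only.
PIDFILE_TEMPLATE = "/tmp/prserv-sketch_%s_%s.pid"


def pidfile_for(host, port):
    """Build the pid file path from the resolved IP, not the raw host."""
    ip = socket.gethostbyname(host)  # IP literals are returned unchanged
    return PIDFILE_TEMPLATE % (ip, port)
```

With this normalisation, starting with a hostname and stopping with the matching IP address (or the other way round) refer to the same file.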
2015-09-09 prserv: SIGTERM and daemonization fixes (Richard Purdie)
Worryingly, if you SIGKILL the bitbake cooker, an autostarted PR server will remain behind. It turns out there are a few things we should do:
* The PR service doesn't need to daemonize when started from cooker, it just complicates the process lifecycle. Add a fork() method to handle this and use the non-daemon mode for the singleton.
* Reset the sigterm and sigint handlers. Bitbake cooker installs its own which we inherit, meaning PR server was ignoring SIGTERM. Installing our own handlers which include a sync makes most sense here. Since we're in the code, make it sync the database on SIGINT.
* Use the new bb.utils.signal_on_parent_exit() call so that we get a SIGTERM when the parent (usually cooker) exits and we can shut down too. Alternatives would be having an open pipe or polling os.getppid() for changes but this seems more effective.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2015-07-12 prserv/db: Document history modes (Richard Purdie)
I keep having to dig into the archives to remember this information. Add it as a comment to the file instead. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2015-05-08 prserv: serv.py: remove unused and duplicate imports (Maxin B. John)
Remove unused xmlrpclib, atexit and duplicated threading module imports Signed-off-by: Maxin B. John <maxin.john@enea.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2015-01-21 prserv/serv: Improve exit handling (Richard Purdie)
Currently, I'm not sure how the prserver managed to shut down cleanly. These issues may explain some of the hangs people have reported. This change:
* Ensures the connection acceptance thread monitors self.quit
* Waits for the thread to exit before exiting
* Syncs the database when the thread exits
* Does what the comment mentions: times out after 30s and syncs the database if needed. Previously, there was no timeout (the 0.5 applies to sockets, not the Queue object)
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2014-11-06 prserv: Use WAL mode (Richard Purdie)
Ideally, we want the PR service to have minimal influence from queued disk IO. sqlite tends to be paranoid about data loss and locks/fsync calls. There is a "WAL mode" which changes the journalling mechanism and would appear much better suited to our use case. This patch therefore switches the database to use WAL mode. With this change, write overhead appears significantly reduced. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
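For reference, switching an SQLite database to WAL mode from Python comes down to one pragma. The sketch below is an assumption about how that could be done, not the code from the patch; `open_wal`, `demo` and the file name are hypothetical. The pragma returns the journal mode that is actually in effect, which is why the result is read back.

```python
import os
import sqlite3
import tempfile


def open_wal(path):
    """Open (or create) a database file and request WAL journalling."""
    conn = sqlite3.connect(path)
    mode = conn.execute("PRAGMA journal_mode = WAL;").fetchone()[0]
    return conn, mode


def demo():
    """Create a throwaway database and report its journal mode."""
    with tempfile.TemporaryDirectory() as tmp:
        conn, mode = open_wal(os.path.join(tmp, "sketch.sqlite3"))
        conn.close()
        return mode
```

In-memory databases cannot use WAL, so the demo uses a temporary file on disk.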
2014-11-04 prserv/serv: Ensure sync happens in the correct thread (Richard Purdie)
The sync/commit calls are happening in the submission thread which can race against the handler. The handler may start new transactions which then causes the submission thread to error with "cannot start a transaction within a transaction". The fix is to move the calls to the correct thread. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2014-10-28 prserv: don't wait until exit to sync (Ben Shelton)
In the commit 'prserv: Ensure data is committed', the PR server moved to only committing transactions to the database when the PR server is stopped. This improves performance, but it means that if the machine running the PR server loses power unexpectedly or if the PR server process gets SIGKILL, the uncommitted package revision data is lost. To fix this issue, sync the database periodically, once per 30 seconds by default, if it has been marked as dirty. To be safe, continue to sync the database at exit regardless of its status. Signed-off-by: Ben Shelton <ben.shelton@ni.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
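The "sync periodically, but only if dirty, and always at exit" policy can be sketched independently of the database. Everything named here (`DirtySync`, `mark_dirty`, `maybe_sync`, the injected `commit` callable and `clock`) is a hypothetical illustration; only the 30 second default comes from the message.

```python
import time


class DirtySync:
    """Call `commit` at most once per interval, and only when dirty."""

    def __init__(self, commit, interval=30.0, clock=time.monotonic):
        self.commit = commit
        self.interval = interval
        self.clock = clock
        self.dirty = False
        self.last_sync = clock()

    def mark_dirty(self):
        self.dirty = True

    def maybe_sync(self):
        """Sync if dirty and the interval has elapsed; return True if synced."""
        now = self.clock()
        if self.dirty and now - self.last_sync >= self.interval:
            self.commit()
            self.dirty = False
            self.last_sync = now
            return True
        return False

    def close(self):
        # To be safe, sync at exit regardless of the dirty flag.
        self.commit()
        self.dirty = False
```

Injecting the clock keeps the policy testable without sleeping for real.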
2014-09-29 prserv/serv: Improve error message when prserver cannot bind to supplied host address (Konrad Scherer)
If localhost resolves to a remote address (due to a misconfigured network), starting the pr server will fail without useful information. To reproduce, add '<bogus ip> localhost' to /etc/hosts and run 'bitbake -p'. The error message will be:
    ERROR: Timeout while attempting to communicate with bitbake server
    ERROR: Could not connect to server False:
Running 'bitbake-prserv --host=localhost --port=0 --start' will fail with:
    error: [Errno 99] Cannot assign requested address
Since these errors do not show the IP address of the attempted socket binding, this results in a lot of wasted time looking at firewall rules, etc. This patch results in the following error message if the socket binding fails:
    PR Server unable to bind to <bogus ip>:0
Signed-off-by: Konrad Scherer <Konrad.Scherer@windriver.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2014-07-03 lib: fix no newline at end of file (Robert Yang)
Add a '\n' to the last line of the file to fix: No newline at end of file Signed-off-by: Robert Yang <liezhi.yang@windriver.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2014-05-03 prserv/db: Avoid fsync() calls (Richard Purdie)
If the power were to fail, it doesn't matter to us much if the data makes it to disk or not, we'd have other problems. However an fsync() call on a multi build autobuilder is painful so let's avoid them. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2014-03-28 prserv: Fix exit race issues (Richard Purdie)
We shouldn't immediately remove the pid file when stopping the server, if we do, this causes a traceback within the server itself which can then hang. Fix this by removing the stale pid file as the last thing we do. Also:
* don't print a new "waiting" line every 0.5 seconds.
* make the loop more granular since the user can 'feel' the 0.5 seconds
[YOCTO #5984] Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-11-18 serv.py: Give pr-server up to 5 seconds to commit data (Konrad Scherer)
The default value of 0.5 seconds before sending the pr-server a SIGTERM is not enough to guarantee that sqlite has committed all the pr data to the database. By polling the pid to see if it is still running, this allows the pr-server process to shut down cleanly and finish the final pr data commit. Signed-off-by: Konrad Scherer <Konrad.Scherer@windriver.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
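Polling a pid for liveness is commonly done with signal 0, which performs the permission and existence checks without delivering a signal. The helpers below are a hedged sketch of that approach on POSIX systems; `is_running`, `wait_for_exit` and the polling step are assumptions, with only the 5 second budget taken from the commit title.

```python
import os
import time


def is_running(pid):
    """Return True if a process with this pid currently exists."""
    try:
        os.kill(pid, 0)  # signal 0: existence check only, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def wait_for_exit(pid, timeout=5.0, step=0.1):
    """Poll until the process is gone; return False if it outlives timeout."""
    deadline = time.monotonic() + timeout
    while is_running(pid):
        if time.monotonic() >= deadline:
            return False
        time.sleep(step)
    return True
```

A caller would typically ask the server to stop first, then call `wait_for_exit()` and only escalate if it returns False.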
2013-09-08 prserv: Ensure data is committed (Richard Purdie)
In exclusive mode, we need to complete the transaction for writes to make it to the database. Therefore add sync calls to ensure this happens. Autocommit mode is significantly (100 times) slower so caching the data is of significant benefit. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-09-01 prserv/serv: Settle on two threads for optimal performance (Richard Purdie)
Using the threading mixin class resulted in large amounts of memory being used by the PR server for no good reason. Using a receiver thread and a thread to do the actual database operations on a single connection gives the same performance with a much saner memory overhead so switch to this. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-09-01 serv/db: Don't use BEGIN/COMMIT (Richard Purdie)
Since we don't support using multiple servers on the same database file, don't use the BEGIN/COMMIT syntax and allow writes to the database to work ~100 times faster with no transaction locking. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-09-01 serv/db: Take an exclusive lock on the database (Richard Purdie)
We only support one server using the database at a time so take an exclusive lock and avoid later lock overhead. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-09-01 serv/db: Fix looping upon database locked issues (Richard Purdie)
If the database is locked we will get an immediate error indicating so, there is no retry timeout. The looping code is therefore useless, the loop count is near instantly exceeded. Using a time based retry means we can wait a sensible time, then gracefully exit. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
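A time-based retry, as opposed to a fixed loop count, might look like the sketch below. The function name, the default timeout and the step are assumptions; matching on the substring "is locked" covers both "database is locked" and "database table is locked" errors.

```python
import sqlite3
import time


def execute_with_retry(conn, sql, params=(), timeout=5.0, step=0.1):
    """Run a statement, retrying lock errors until `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            return conn.execute(sql, params)
        except sqlite3.OperationalError as exc:
            # Retry only on lock errors, and only while time remains;
            # anything else is re-raised to the caller unchanged.
            if "is locked" not in str(exc) or time.monotonic() >= deadline:
                raise
            time.sleep(step)
```

Because lock errors come back almost instantly, counting attempts alone exhausts the budget in a fraction of a second, whereas a deadline gives the other writer real time to finish.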
2013-08-30 prserv: Allow 'table is locked' matching for retry loop (Richard Purdie)
Try and avoid errors like "ERROR: database table is locked: PRMAIN_nohist" by retrying if we see the string "is locked". Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-08-29 prserv/serv: Multithread the server (Richard Purdie)
This makes the PR server multithreaded and able to handle multiple connections at once, which means it's no longer the build bottleneck it was when serving one connection at a time. I've experimented and a database connection for each thread seems to cause the least issues, pushing the contention for sqlite to handle itself. This means moving the db/table connection code into the actual function methods. It doesn't abstract well as a function since we need the db object around for the lifetime of the function as well as the table else we lose the connection. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-08-29 prserv/db: Threading fixes (Richard Purdie)
Enabling threading for the PRServer causes a number of issues. Firstly there is the obtuse error:
    sqlite3.InterfaceError: Error binding parameter 0 - probably unsupported type
which is due to the class not being derived from object. See: http://docs.python.org/2/library/sqlite3.html#registering-an-adapter-callable
Secondly, we want to enable multithreaded access to the database so we do this when we open it. This opens the way up to multithreading the PR server.
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-08-28 serv.py: Fix regression from 972bc43e6d5b (Jason Wessel)
commit 972bc43e6d5b1207b944b3baa8f9805adb35dda7 (serv.py: Fix hang when spawned dynamically with bitbake) introduced a regression, because the wrong patch was submitted. The syntax was incorrect in the original patch. The logger iterator must be used with a call to getLogger(). [YOCTO #5059] Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-08-28 serv.py: Fix hang when spawned dynamically with bitbake (Jason Wessel)
The PRServer has the possibility to hang indefinitely blocking on a semaphore processing a xmlrpc request to send an event back to the main bitbake instance. This was observed during a "bitbake -e" on a heavily loaded machine and the main bitbake instance and cooker exited before the PRServer emitted its first log. The stack trace is provided below as to show what happens every time a logger.info() is executed in the PRServer. Not only does it write to the stream handler but it also tries to send the event to the main event processor.
    self._notempty.acquire()
    self.queue.put(event)
    _ui_handlers[h].event.send(event)
    fire_ui_handlers(event, d)
    fire(record, None)
    self.emit(record)
    hdlr.handle(record)
    self.callHandlers(record)
    self.handle(record)
    self._log(INFO, msg, args, **kwargs)
    (self.dbfile, self.host, self.port, str(os.getpid())))
    self.work_forever()
    pid = self.daemonize()
    self.prserv.start()
    singleton.start()
    self.prhost = prserv.serv.auto_start(self.data)
    cooker.pre_serve()
    bb.cooker.server_main(self.cooker, self.main)
    self.run()
    code = process_obj._bootstrap()
    self._popen = Popen(self)
    self.serverImpl.start()
    server.detach()
    server = start_server(servermodule, configParams, configuration)
    ret = main()
It was never intended for the PRServer to send its logs anywhere but its own log file. The event processing is an artifact of how the PRServer was forked and it inherits the event log handlers. The simple fix is to clean up and purge all the log handlers after the fork() but before doing any of the typical PRServer work or logging.
Signed-off-by: Jason Wessel <jason.wessel@windriver.com> Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
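Purging inherited log handlers after a fork can be sketched as below. This is an illustration rather than the patch itself: `purge_handlers` is a hypothetical name, and the sketch walks `logging.Logger.manager.loggerDict` (the logging module's registry of named loggers), resolving each name through `getLogger()` so that placeholder entries are handled too.

```python
import logging


def purge_handlers():
    """Detach every handler from the root logger and all named loggers."""
    # None selects the root logger; the rest are names known to logging.
    names = [None] + list(logging.Logger.manager.loggerDict)
    for name in names:
        logger = logging.getLogger(name)
        # Copy the list, since removeHandler() mutates logger.handlers.
        for handler in list(logger.handlers):
            logger.removeHandler(handler)
```

A freshly forked child would call this once, then attach only its own file handler before emitting any log records.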
2013-08-26 server/xmlrpc/prserv: Add sane timeout to default xmlrpc server (Richard Purdie)
The standard python socket connect has long timeouts which make sense for remote connections but not local things like the PR Service. This adds a timeout parameter to the common xmlrpc server creation function and sets it to a more reasonable 5 seconds. Making the PR server instantly exit is a good way to test the effect of this on bitbake. We can remove the bodged timeout in the PRServer terminate function which has the side effect of affecting global scope. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-08-23 prserv/serv: Fix pid file removal (Richard Purdie)
Mark Hatle spotted there were pid files being left around. This patch fixes things so the removal function is called correctly, the code contained a typo. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-06-12 prserv: Adapt autostart to bitbake-worker (Richard Purdie)
With the change to bitbake-worker we need to ensure the workers know how to contact the PR service, the magic 0 port and singleton is no longer enough. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-05-30 prserv: Unbreak after bb.server changes (Richard Purdie)
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-05-09 prserv: Drop StandardError usage (Richard Purdie)
StandardError doesn't exist in python 3, use Exception instead. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-05-03 prserv/cooker: Handle PRService errors cleanly (Richard Purdie)
Currently, if the PR Service fails to start, bitbake carries on regardless or hangs with no error message. This adds an exception and then handles it correctly so the UIs correctly handle the error and exit cleanly. [YOCTO #4010] Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-02-06 prserv/serv.py: Fix logging in daemon mode (Richard Purdie)
In daemon mode we need to ensure the logging module is sending log data to the log file. These changes ensure this happens correctly. Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
2013-02-06 bitbake: Always use separate process for PR Service (Richard Purdie)
Using the threading module interacts badly with multiprocessing used elsewhere in bitbake under certain machine loads. This was leading to bitbake hanging on Ctrl+C when the PR Server was being used. This patch converts it to always use the daemonize code which then means the threading code isn't required. [YOCTO #3742] Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>