Preview: PySpark Remote Configuration


Initializing the Kensu (Py)Spark collector

Initializing the Kensu (Py)Spark collector this way offers several advantages:

Seamless integration: there is no need to alter the customer's existing Spark job code. This keeps the process straightforward, reduces the risk of errors, and means minimal manual integration effort.

Comprehensive coverage: this approach can be applied to all Spark jobs, giving a complete view of the Spark job landscape. It avoids the risk of forgetting to enable tracking for some jobs, which is easy to do when managing many Spark jobs.

Dynamic configuration via the Kensu UI: any change to the Kensu agent settings can be made directly in the Integrations tab of the Kensu UI, so the customer's Spark job does not need to be redeployed every time the agent settings change.

Installation and configuration

1. Download and install the public preview JAR

The public preview JAR for Spark 2.4.0 is available here:
https://public.usnek.com/n/repository/kensu-public/releases/kensu-spark-collector/alpha/kensu-spark-collector-1.4.0-alpha231018-0926-20-0241d63_spark-2.4.0.jar

Please follow the PySpark instructions for the installation.

2. Add the mandatory config options to enable remote config

Set these options, e.g. via spark-submit --conf properties or spark-defaults.conf; the rest of the config will be fetched from the Kensu UI. They are required to:
- load the Kensu Spark listeners
- configure the Kensu host & token

P.S. Installing the kensu-pyspark / kensu-py Python library is optional (not required) if you are using this method.

Enable for a single job (for testing)

To test a single spark-submit job without affecting others, you may provide the config via --conf arguments:

    jar="kensu-spark-collector-1.4.0-alpha231018-0926-20-0241d63_spark-2.4.0.jar"
    PYSPARK_PYTHON=/home/py35kensu/bin/python3.5 spark-submit \
      --verbose \
      --conf spark.sql.queryExecutionListeners=org.apache.spark.sql.kensu.KensuZeroCodeListener \
      --conf spark.extraListeners=org.apache.spark.sql.kensu.KensuZeroCodeListener \
      --conf spark.kensu.agentApiHost=https://playground.kensuapp.com \
      --conf spark.kensu.agentToken=PAT_TOKEN \
      --jars "$jar" --master "local[2]" main.py

Mandatory config options

spark.sql.queryExecutionListeners and spark.extraListeners must be set via Spark conf (either spark-defaults.conf or --conf params to spark-submit). The other mandatory params, the Kensu API host and token, can be passed via Spark conf, an environment variable, or a Java system property:

Description                                | Property name                     | Env var
Enable Kensu query execution listener      | spark.sql.queryExecutionListeners | (Spark conf only)
Enable Spark listener                      | spark.extraListeners              | (Spark conf only)
Kensu API host                             | spark.kensu.agentApiHost          | KSU_AGENT_API_HOST
Kensu API external application token (PAT) | spark.kensu.agentToken            | KSU_AGENT_AGENT_TOKEN

Verify installation

After running the job, go to the Kensu UI: you should see your application in the Integrations tab. From there, you may fine-tune the Kensu agent configuration for that application by clicking "Configure":
- By setting the application group & token, you control who sees the ingested data from this application.
- Parameters configure the default Spark agent behaviour, e.g. whether to compute statistics and which ones.

Sharing config for all jobs

One way to automatically add the config to all Spark jobs is via spark-defaults.conf. Because Apache Spark loads this file automatically, there is no need to modify the spark-submit command for each job. E.g. in the Cloudera VM, I had to modify the /opt/cloudera/parcels/CDH-6.3.0-1.cdh6.3.0.p0.1279813/etc/spark/conf.dist/spark-defaults.conf file:

spark-defaults.conf

    spark.sql.queryExecutionListeners  org.apache.spark.sql.kensu.KensuZeroCodeListener
    spark.extraListeners               org.apache.spark.sql.kensu.KensuZeroCodeListener
    spark.kensu.agentApiHost           https://playground.kensuapp.com
    spark.kensu.agentToken             PAT_TOKEN

Optional config properties

These are optional, but can be used to provide extra info:

Spark property            | Environment variable | Default                                                | Description
spark.kensu.application_id | KSU_APPLICATION_ID  | PySpark file (/full/path/script_name.py)               | Each application must have a unique and stable application id. If not provided, one will be inferred by parsing the spark-submit command and trying to extract the PySpark .py script name.
spark.kensu.process_name   | KSU_PROCESS_NAME    | PySpark file (/full/path/script_name.py)               | Similar to the application id, but uniqueness matters less: it is only used to display the name and does not affect the logic.
spark.kensu.project_name   | KSU_PROJECT_NAME    |                                                        |
spark.kensu.code_location  | KSU_CODE_LOCATION   |                                                        |
spark.kensu.code_version   | KSU_CODE_VERSION    | Current datetime, e.g. Thu Jun 01 17:08:00 EEST 2023   | If not provided explicitly in the Kensu UI, the current datetime will be used.

Example (via environment variables):

    export KSU_APPLICATION_ID="spark-2.4.0-zerocode-1/main.py"
    export KSU_PROCESS_NAME="spark-2.4.0/main.py"
    export KSU_PROJECT_NAME="some project"
    export KSU_CODE_LOCATION="some repository info"

Example (via Spark properties):

    --conf spark.kensu.application_id="" \
    --conf spark.kensu.process_name="spark-2.4.0/main.py" \
    --conf spark.kensu.project_name="spark-2.4.0-zerocode" \
    --conf spark.kensu.code_location=$(pwd) \
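If you launch jobs from a Python wrapper script, a minimal sketch like the one below can keep the single-job --conf arguments and the spark-defaults.conf lines in sync, since both carry the same properties. It is only an illustration under stated assumptions: the helper names (kensu_spark_conf, spark_submit_args, spark_defaults_text) are hypothetical and not part of any Kensu library, the optional keys are assumed to follow the spark.kensu.<name> pattern listed on this page (e.g. spark.kensu.process_name), and the JAR path in the demo is a placeholder.

```python
"""Sketch: emit the Kensu remote-config properties for spark-submit or spark-defaults.conf."""
import shlex
from typing import Dict, List, Optional

# Listener class name as used in the examples on this page.
KENSU_LISTENER = "org.apache.spark.sql.kensu.KensuZeroCodeListener"


def kensu_spark_conf(api_host: str, token: str,
                     optional: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return the mandatory Kensu properties plus any optional spark.kensu.* ones."""
    conf = {
        # Both listeners must come from Spark conf (not env vars or system properties).
        "spark.sql.queryExecutionListeners": KENSU_LISTENER,
        "spark.extraListeners": KENSU_LISTENER,
        "spark.kensu.agentApiHost": api_host,
        "spark.kensu.agentToken": token,
    }
    # Optional keys such as "process_name" get the "spark.kensu." prefix (assumed naming).
    for key, value in (optional or {}).items():
        conf[f"spark.kensu.{key}"] = value
    return conf


def spark_submit_args(conf: Dict[str, str], jar: str, script: str,
                      master: str = "local[2]") -> List[str]:
    """Build a spark-submit argument list that enables Kensu for a single job."""
    args = ["spark-submit", "--verbose"]
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args += ["--jars", jar, "--master", master, script]
    return args


def spark_defaults_text(conf: Dict[str, str]) -> str:
    """Render the properties as spark-defaults.conf lines: key, whitespace, value."""
    width = max((len(key) for key in conf), default=0)
    return "".join(f"{key.ljust(width)}  {value}\n" for key, value in conf.items())


if __name__ == "__main__":
    conf = kensu_spark_conf("https://playground.kensuapp.com", "PAT_TOKEN",
                            optional={"process_name": "spark-2.4.0/main.py"})
    # Print a shell-quoted command line for a single test job.
    print(shlex.join(spark_submit_args(conf, "kensu-spark-collector.jar", "main.py")))
    # Print lines suitable for appending to spark-defaults.conf (applies to all jobs).
    print(spark_defaults_text(conf), end="")
```

To mirror the PYSPARK_PYTHON setting from the single-job example, the argument list could be passed to subprocess.run(args, env={**os.environ, "PYSPARK_PYTHON": "/home/py35kensu/bin/python3.5"}, check=True). Keeping one dict as the source of truth is a design choice to avoid the two configuration paths drifting apart; shlex.join requires Python 3.8 or later.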
