How to set spark.sql.files conf in pyspark

乔哈马

I'm trying to run Hail ( https://hail.is/ ) on a Spark cluster. When I try to create a HailContext, I get an error saying that I must set two configuration parameters: spark.sql.files.openCostInBytes and spark.sql.files.maxPartitionBytes

$ pyspark --jars s3://<bucket_name>/hail-all-spark.jar --conf spark.driver.memory=4g --conf spark.executor.memory=4g 
Python 2.7.13 (default, Jan 31 2018, 00:17:36) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Downloading s3://<bucket_name>/hail-all-spark.jar to /tmp/tmp2718520966373391304/hail/hail-all-spark.jar.
18/03/01 10:19:27 INFO S3NativeFileSystem: Opening 's3://<bucket_name>/hail-all-spark.jar' for reading
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/01 10:19:32 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/03/01 10:20:06 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.2.1
      /_/

Using Python version 2.7.13 (default, Jan 31 2018 00:17:36)
SparkSession available as 'spark'.
>>> sc.addPyFile('s3://<bucket_name>/hail-python.zip')
>>> from hail import *
>>> hc = HailContext(sc)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<decorator-gen-478>", line 2, in __init__
  File "/mnt/tmp/spark-eebd27bf-b387-4717-9ae5-e94f81438aee/userFiles-fb511f51-35b3-436a-aa5d-d0d84de40851/hail-python.zip/hail/typecheck/check.py", line 245, in _typecheck
  File "/mnt/tmp/spark-eebd27bf-b387-4717-9ae5-e94f81438aee/userFiles-fb511f51-35b3-436a-aa5d-d0d84de40851/hail-python.zip/hail/context.py", line 88, in __init__
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 63, in deco
    return f(*a, **kw)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling z:is.hail.HailContext.apply.
: is.hail.utils.HailException: Found problems with SparkContext configuration:
  Invalid config parameter 'spark.sql.files.openCostInBytes': too small. Found 0, require at least 50G
  Invalid config parameter 'spark.sql.files.maxPartitionBytes': too small. Found 0, require at least 50G
    at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:6)
    at is.hail.utils.package$.fatal(package.scala:27)
    at is.hail.HailContext$.checkSparkConfiguration(HailContext.scala:116)
    at is.hail.HailContext$.apply(HailContext.scala:169)
    at is.hail.HailContext.apply(HailContext.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)

How should I set these parameters correctly? Using --conf spark.sql.files.openCostInBytes=60g produces an IllegalArgumentException:

$ pyspark --jars s3://<bucket_name>/hail-all-spark.jar --conf spark.driver.memory=4g --conf spark.executor.memory=4g --conf spark.sql.files.openCostInBytes=60g
Python 2.7.13 (default, Jan 31 2018, 00:17:36) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-11)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Downloading s3://<bucket_name>/hail-all-spark.jar to /tmp/tmp4400881534115197439/hail/hail-all-spark.jar.
18/03/01 10:26:32 INFO S3NativeFileSystem: Opening 's3://<bucket_name>/hail-all-spark.jar' for reading
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
18/03/01 10:26:38 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
Traceback (most recent call last):
  File "/usr/lib/spark/python/pyspark/shell.py", line 45, in <module>
    spark = SparkSession.builder\
  File "/usr/lib/spark/python/pyspark/sql/session.py", line 183, in getOrCreate
    session._jsparkSession.sessionState().conf().setConfString(key, value)
  File "/usr/lib/spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/lib/spark/python/pyspark/sql/utils.py", line 79, in deco
    raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.IllegalArgumentException: u"Error while instantiating 'org.apache.spark.sql.hive.HiveSessionStateBuilder':"
乔哈马

The solution is to set spark.sql.files.openCostInBytes and spark.sql.files.maxPartitionBytes to 60000000000 instead of '60g':

$ pyspark --conf spark.sql.files.openCostInBytes=60000000000 --conf spark.sql.files.maxPartitionBytes=60000000000
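The key point is that these two `spark.sql.files.*` properties only accept a plain byte count, not a size suffix like `60g` (which works for `spark.driver.memory` and `spark.executor.memory`). As a sketch, a small hypothetical helper can do the conversion, assuming decimal units to match the `60000000000` value above:

```python
def size_to_bytes(size):
    """Convert a size string such as '60g' to a raw byte count.

    Hypothetical helper for illustration only: spark.sql.files.openCostInBytes
    and spark.sql.files.maxPartitionBytes must be passed as plain numbers.
    Assumes decimal units (1g = 10**9), consistent with the 60000000000
    used in the answer above.
    """
    units = {'k': 10**3, 'm': 10**6, 'g': 10**9, 't': 10**12}
    size = size.strip().lower()
    if size[-1] in units:
        return int(float(size[:-1]) * units[size[-1]])
    return int(size)

print(size_to_bytes('60g'))  # 60000000000
```

The same numeric values could then be passed on the command line as shown above, or set programmatically on a `SparkConf` before the SparkContext is created.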
