[ADAP-873] [Regression] `1.6` does not work with `method: thrift` due to `pyhive`'s lack of `Cursor.fetchmany()` method #885

dataders · 2023-09-06T20:06:09Z

Is this a regression in a recent version of dbt-spark?

I believe this is a regression in dbt-spark functionality
I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

reports & discussion

@sid-deshmukh originally opened dbt-labs/dbt-external-tables#234, but I believe this issue to be with dbt-spark, not dbt-external-tables.

@timvw and @jelstongreen also reported in a #db-databricks-and-spark thread in Community Slack that they were experiencing similar issues

for reference, here's our internal dbt Labs Slack thread

stacktrace

Compiling fails with the following stacktrace. dbt calls .get_result_from_cursor(), which calls cursor.fetchall(), which in PyHive ends up in its Cursor._fetch_more() (pyhive/hive.py#L507), where it fails.

columns = [_unwrap_column(col, col_schema[1]) for col, col_schema in
           zip(response.results.columns, schema)]
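To make the failure mode concrete, here is a minimal sketch of how a comprehension shaped like the one above raises a `TypeError` when one of the zipped arguments is `None`. It assumes (not confirmed in this thread) that `response.results.columns` is the `None` value; `unwrap_columns`, `empty_response`, and the sample `schema` are hypothetical stand-ins, not PyHive code.

```python
from types import SimpleNamespace


def unwrap_columns(response, schema):
    """Simplified stand-in for the comprehension quoted above (not PyHive's code)."""
    return [
        (col, col_schema[1])
        for col, col_schema in zip(response.results.columns, schema)
    ]


# Hypothetical response for a statement that produced no column data.
empty_response = SimpleNamespace(results=SimpleNamespace(columns=None))
schema = [("col_name", "STRING_TYPE")]

try:
    unwrap_columns(empty_response, schema)
    outcome = "no error"
except TypeError as exc:
    # The exact message text depends on the Python version.
    outcome = f"TypeError: {exc}"

print(outcome)
```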

full stacktrace

  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/clients/jinja.py", line 302, in exception_handler
    yield
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/clients/jinja.py", line 257, in call_macro
    return macro(*args, **kwargs)
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/runtime.py", line 763, in __call__
    return self._invoke(arguments, autoescape)
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/runtime.py", line 777, in _invoke
    rv = self._func(*arguments)
  File "<template>", line 52, in macro
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/sandbox.py", line 393, in call
    return __context.call(__obj, *args, **kwargs)
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/jinja2/runtime.py", line 298, in call
    return __obj(*args, **kwargs)
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/base/impl.py", line 290, in execute
    return self.connections.execute(sql=sql, auto_begin=auto_begin, fetch=fetch, limit=limit)
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/sql/connections.py", line 149, in execute
    table = self.get_result_from_cursor(cursor, limit)
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/sql/connections.py", line 129, in get_result_from_cursor
    rows = cursor.fetchall()
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/dbt/adapters/spark/connections.py", line 197, in fetchall
    return self._cursor.fetchall()
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/common.py", line 137, in fetchall
    return list(iter(self.fetchone, None))
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/common.py", line 106, in fetchone
    self._fetch_while(lambda: not self._data and self._state !=
                      self._STATE_FINISHED)
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/common.py", line 46, in _fetch_while
    self._fetch_more()
  File "/Users/user/PycharmProjects/dbt-data-pipeline/venv/lib/python3.8/site-packages/pyhive/hive.py", line 481, in _fetch_more
    zip(response.results.columns, schema)]
TypeError: 'NoneType' object is not iterable

Expected/Previous Behavior

Things work (ostensibly because pyhive's cursor.fetch() does not invoke ._fetch_more() the way .fetchmany() does).

Steps To Reproduce

dbt-spark 1.6.0
using method: thrift
doing any sort of jinja compilation (which is almost anything)

Relevant log output

No response

Environment

- OS:
- Python:
- dbt-core (working version):
- dbt-spark (working version):
- dbt-core (regression version):
- dbt-spark (regression version):

Additional Context

A recurrence of this problem could be prevented by dbt-labs/dbt-core#8471


timvw · 2023-09-19T10:02:17Z

I have seen this happen with sparksession as well when using the "show" command...

lmarcondes · 2023-11-02T01:31:51Z

Not sure if there's still interest in this, but looking into the PyHive code, it doesn't seem to handle queries with empty result sets correctly. I've forked it and opened a PR here, but it seems the library has been pretty much unsupported for a few years now.

With the changes, Jinja is able to compile and results are correctly received:

❯ dbt run-operation stage_external_sources --log-level debug --print
01:23:03  Running with dbt=1.6.7
01:23:03  running dbt with arguments {'printer_width': '80', 'indirect_selection': 'eager', 'write_json': 'True', 'log_cache_events': 'False', 'partial_parse': 'True', 'cache_selected_only': 'False', 'profiles_dir': '/home/lmarcondes/.dbt', 'fail_fast': 'True', 'warn_error': 'True', 'log_path': '/home/lmarcondes/Documents/projects/votacao-2022/src/capivara-etl-models/capivara/logs', 'debug': 'False', 'version_check': 'True', 'use_colors': 'True', 'use_experimental_parser': 'False', 'no_print': 'None', 'quiet': 'False', 'log_format': 'default', 'static_parser': 'True', 'warn_error_options': 'WarnErrorOptions(include=[], exclude=[])', 'introspect': 'True', 'target_path': 'None', 'invocation_command': 'dbt run-operation stage_external_sources --log-level debug --print', 'send_anonymous_usage_stats': 'False'}
01:23:03  Registered adapter: spark=1.6.0
01:23:03  checksum: a051d2bc88277f3be74306f0393e0e8e6f29724fe11a36c13ebfccd4b87560d8, vars: {}, profile: , target: , version: 1.6.7
01:23:03  Partial parsing enabled: 0 files deleted, 0 files added, 0 files changed.
01:23:03  Partial parsing enabled, no changes found, skipping parsing
01:23:03  Found 1 model, 5 sources, 0 exposures, 0 metrics, 557 macros, 0 groups, 0 semantic models
01:23:03  Acquiring new spark connection 'macro_stage_external_sources'
01:23:03  Spark adapter: NotImplemented: add_begin_query
01:23:03  Spark adapter: NotImplemented: commit
01:23:03  1 of 5 START external source default.caged_for
01:23:03  On "macro_stage_external_sources": cache miss for schema ".default", this is inefficient
01:23:03  Using spark connection "macro_stage_external_sources"
01:23:03  On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.6.7", "profile_name": "capivara", "target_name": "local", "connection_name": "macro_stage_external_sources"} */
show table extended in default like '*'
  
01:23:03  Opening a new connection, currently in state init
01:23:03  Spark adapter: Poll status: 2, query complete
01:23:03  SQL status: OK in 0.0 seconds
01:23:03  While listing relations in database=, schema=default, found: caged_exc, caged_for, caged_mov, links_2o_turno
01:23:03  1 of 5 (1) refresh table default.caged_for
01:23:03  Using spark connection "macro_stage_external_sources"
01:23:03  On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.6.7", "profile_name": "capivara", "target_name": "local", "connection_name": "macro_stage_external_sources"} */

                 
        refresh table default.caged_for
    
            
01:23:08  Spark adapter: Poll status: 1, sleeping
01:23:13  Spark adapter: Poll status: 1, sleeping
01:23:18  Spark adapter: Poll status: 1, sleeping
01:23:23  Spark adapter: Poll status: 1, sleeping
01:23:28  Spark adapter: Poll status: 1, sleeping
01:23:33  Spark adapter: Poll status: 1, sleeping
01:23:38  Spark adapter: Poll status: 1, sleeping
01:23:43  Spark adapter: Poll status: 1, sleeping
01:23:48  Spark adapter: Poll status: 1, sleeping
01:23:53  Spark adapter: Poll status: 1, sleeping
01:23:58  Spark adapter: Poll status: 1, sleeping
01:24:03  Spark adapter: Poll status: 1, sleeping
01:24:08  Spark adapter: Poll status: 1, sleeping
01:24:12  Spark adapter: Poll status: 2, query complete
01:24:12  SQL status: OK in 69.0 seconds
01:24:12  1 of 5 (1) OK
01:24:12  2 of 5 START external source default.caged_mov
01:24:12  2 of 5 (1) refresh table default.caged_mov
01:24:12  Using spark connection "macro_stage_external_sources"
01:24:12  On macro_stage_external_sources: /* {"app": "dbt", "dbt_version": "1.6.7", "profile_name": "capivara", "target_name": "local", "connection_name": "macro_stage_external_sources"} */
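
For readers wondering what handling an empty result set could look like, here is a minimal sketch of a defensive guard. It is not the code from the fork or PR mentioned above; `unwrap_columns_safely` and the response shape are hypothetical, and it assumes the failure comes from a missing (`None`) columns list or schema.

```python
from types import SimpleNamespace


def unwrap_columns_safely(response, schema):
    """Return (column_data, type_name) pairs, or [] when there is nothing to unwrap."""
    results = getattr(response, "results", None)
    columns = getattr(results, "columns", None)
    if columns is None or schema is None:
        # Treat a missing result set as "zero columns" instead of raising.
        return []
    return [(col, col_schema[1]) for col, col_schema in zip(columns, schema)]


# Hypothetical responses: one without column data, one with a single column.
empty_response = SimpleNamespace(results=SimpleNamespace(columns=None))
full_response = SimpleNamespace(results=SimpleNamespace(columns=[["a", "b"]]))
schema = [("name", "STRING_TYPE")]

print(unwrap_columns_safely(empty_response, schema))
print(unwrap_columns_safely(full_response, schema))
```

Returning an empty list keeps callers such as `fetchall()` on the "no rows" path, which seems to be what statements like `refresh table` need.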

github-actions · 2024-04-30T01:44:16Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2024-05-07T01:45:00Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

dataders added the bug, triage, and regression labels Sep 6, 2023

github-actions bot changed the title ~~[Regression] 1.6does not work with method: thrift due to pyhive's lack of Cursor.fetchmany() method~~ [ADAP-873] [Regression] 1.6does not work with method: thrift due to pyhive's lack of Cursor.fetchmany() method Sep 6, 2023

dataders mentioned this issue Sep 6, 2023

Doesn't work with spark 3.4.0 dbt-labs/dbt-external-tables#234

Closed

dataders removed the triage label Sep 7, 2023

github-actions bot added the Stale label Apr 30, 2024

github-actions bot closed this as not planned May 7, 2024
