Installing and Configuring the Hive Client for PXF

Hive: download a file

Download and unpack the file. There are five files inside it; we'll use the comedies file today.

To save the result of a Hive query to a CSV file, run the query with hive -e and use sed to replace the tab separators with commas:

hive -e 'select * from your_Table' | sed 's/[\t]/,/g' > /home/yourfile.csv

To include the column names as the first line of the file, set the property hive.cli.print.header=true before the query, for example hive -e 'set hive.cli.print.header=true; select * from your_Table'.
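The sed substitution turns every tab into a comma but does no quoting, so a value that itself contains a comma or a double quote produces a malformed CSV row. A minimal sketch of a safer conversion using Python's standard csv module; the function name tsv_to_csv is hypothetical, and it assumes the Hive CLI output is plain tab-separated text with no tabs inside values.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-separated Hive CLI output into properly quoted CSV.

    Hypothetical helper: assumes fields are separated by tabs and that
    no field contains a tab itself.
    """
    # QUOTE_NONE: treat any double quotes in the Hive output as literal data.
    reader = csv.reader(io.StringIO(tsv_text), delimiter='\t', quoting=csv.QUOTE_NONE)
    out = io.StringIO()
    # The writer quotes fields that contain commas or quotes (QUOTE_MINIMAL).
    writer = csv.writer(out, lineterminator='\n')
    writer.writerows(reader)
    return out.getvalue()
```

In the pipeline above, this could replace the sed step by wrapping it in a small script that reads the query output from sys.stdin and writes the result to sys.stdout.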


Usage

DB-API

from pyhive import presto  # or: from pyhive import hive

cursor = presto.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10')
print(cursor.fetchone())
print(cursor.fetchall())
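The DB-API example leaves the cursor and connection open. Because PyHive follows PEP 249, both objects have a close() method, so contextlib.closing can release them once the rows are fetched. A minimal sketch; run_query is a hypothetical helper, and connect stands for any zero-argument callable that returns a DB-API connection, such as lambda: presto.connect('localhost').

```python
from contextlib import closing


def run_query(connect, query):
    """Run `query` and return all rows, closing the cursor and connection afterwards.

    Hypothetical helper: `connect` is any zero-argument callable returning
    a PEP 249 connection (e.g. lambda: hive.connect('localhost')).
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(query)
            return cursor.fetchall()
```

Any PEP 249 driver works with this pattern. Python's built-in sqlite3 module, for example, can stand in for Hive or Presto when you want to try the helper without a server.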

DB-API (asynchronous)

from pyhive import hive
from TCLIService.ttypes import TOperationState

cursor = hive.connect('localhost').cursor()
cursor.execute('SELECT * FROM my_awesome_data LIMIT 10', async_=True)

status = cursor.poll().operationState
while status in (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE):
    logs = cursor.fetch_logs()
    for message in logs:
        print(message)

    # If needed, an asynchronous query can be cancelled at any time with:
    # cursor.cancel()

    status = cursor.poll().operationState

print(cursor.fetchall())

In Python 3.7 async became a keyword; you can use async_ instead:

cursor.execute('SELECT * FROM my_awesome_data LIMIT 10',async_=True)
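The asynchronous loop polls the server again as soon as it has printed the logs, without pausing. A minimal sketch of a reusable helper that waits between polls, assuming a PyHive-style cursor that exposes poll().operationState and fetch_logs() as in the example; wait_for_query, poll_interval and on_log are hypothetical names, not part of PyHive.

```python
import time


def wait_for_query(cursor, running_states, poll_interval=1.0, on_log=print):
    """Poll an asynchronous query until it leaves `running_states`.

    Hypothetical helper. For PyHive, `running_states` would be
    (TOperationState.INITIALIZED_STATE, TOperationState.RUNNING_STATE).
    Returns the final operation state.
    """
    status = cursor.poll().operationState
    while status in running_states:
        # Forward any new server-side log lines to the caller.
        for message in cursor.fetch_logs():
            on_log(message)
        # Pause so the client does not poll the server in a tight loop.
        time.sleep(poll_interval)
        status = cursor.poll().operationState
    return status
```

After the helper returns, the results can be read with cursor.fetchall() as before. A caller may also compare the returned state with TOperationState.FINISHED_STATE to detect a failed or cancelled query.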

SQLAlchemy

First install this package to register it with SQLAlchemy (see ).

from sqlalchemy import MetaData, Table, func, select
from sqlalchemy.engine import create_engine

# Presto
engine = create_engine('presto://localhost:8080/hive/default')
# Hive (use one engine or the other)
engine = create_engine('hive://localhost:10000/default')

# Reflect the table definition from the server
logs = Table('my_awesome_data', MetaData(), autoload_with=engine)
with engine.connect() as conn:
    print(conn.execute(select(func.count()).select_from(logs)).scalar())

Note: query generation functionality is not exhaustive or fully tested, but there should be no problem with raw SQL.

Passing session configuration

from pyhive import hive, presto
from sqlalchemy import create_engine

# DB-API
hive.connect('localhost', configuration={'hive.exec.reducers.max': '123'})
presto.connect('localhost', session_props={'query_max_run_time': '1234m'})

# SQLAlchemy
create_engine(
    'presto://user@host:443/hive',
    connect_args={'protocol': 'https',
                  'session_props': {'query_max_run_time': '1234m'}},
)
create_engine(
    'hive://user@host:10000/database',
    connect_args={'configuration': {'hive.exec.reducers.max': '123'}},
)

# SQLAlchemy with LDAP
create_engine(
    'hive://user:password@host:10000/database',
    connect_args={'auth': 'LDAP'},
)
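The LDAP example puts the password directly into the connection URL. If the password contains characters such as @, / or :, SQLAlchemy's URL parser will misread it unless they are percent-encoded, which the SQLAlchemy documentation suggests doing with urllib.parse.quote_plus. A minimal sketch; hive_url is a hypothetical helper, not part of PyHive or SQLAlchemy.

```python
from urllib.parse import quote_plus


def hive_url(user, password, host, port, database):
    """Build a hive:// SQLAlchemy URL with percent-encoded credentials.

    Hypothetical helper: quote_plus escapes characters such as '@' and '/'
    so they are not mistaken for URL delimiters.
    """
    return 'hive://{}:{}@{}:{}/{}'.format(
        quote_plus(user), quote_plus(password), host, port, database)
```

The result can then be passed in place of the literal URL, for example create_engine(hive_url('user', 'p@ss/w', 'host', 10000, 'database'), connect_args={'auth': 'LDAP'}).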

Testing

Run the following in an environment with Hive/Presto:

./scripts/make_test_tables.sh
virtualenv env
source env/bin/activate
pip install -e .
pip install -r dev_requirements.txt
py.test

WARNING: This drops/creates tables named , , and , plus a database called .

Updating TCLIService

The TCLIService module is autogenerated using a file. To update it, the file can be used: . When left blank, the version for Hive 2.3 will be downloaded.

Source: [https://torrent-igruha.org/3551-portal.html]
