Performance issue when using "hdfs setfacl -R"?

+1 vote
202 views

We use "hdfs setfacl -R" for file ACL control. The data directory is large, with 60,000+ sub-directories and files, so the command is very time-consuming. It does not seem to finish within hours, and we did not expect it to take several days.
Are there any settings that can help improve this?

posted Jan 17, 2018 by anonymous


1 Answer

0 votes

Try increasing the heap size of the client via HADOOP_CLIENT_OPTS. The default is 128M, IIRC. This might improve the performance. You can bump it up to 1G.
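
For example, a minimal sketch of what that could look like (the path and the ACL entry below are just placeholders, not from the original question):

  # give the setfacl client a larger heap (1G here) before running the recursive command
  export HADOOP_CLIENT_OPTS="-Xmx1g"
  # placeholder path and ACL entry; substitute your own
  hdfs dfs -setfacl -R -m user:alice:r-x /data/warehouse

Since HADOOP_CLIENT_OPTS only affects the client JVM, this is safe to experiment with per-shell without touching cluster-wide settings.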

answer Jan 17, 2018 by Anderson
Similar Questions
0 votes

I have a basic question regarding HDFS file reads. I want to know what happens when the following steps are followed:

  1. A client opens the file for reading and starts reading it.
  2. In the meantime, someone deletes the file and it moves to the trash folder.

Will Step 1 succeed? I feel that, since the client has already opened the file and the file still exists in .Trash, the client should continue to read the file.
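
A rough way to try this out from the shell (the path below is a placeholder, and this assumes trash is enabled via fs.trash.interval):

  # start a long-running read in the background
  hdfs dfs -cat /tmp/readtest.txt > /dev/null &
  # delete the file while the read is in progress; with trash enabled this is a rename into .Trash
  hdfs dfs -rm /tmp/readtest.txt
  # if the asker's expectation holds, the background cat keeps reading from the already-open stream
  wait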

0 votes

I was trying to implement a Hadoop/Spark audit tool, but I ran into a problem: I can't get the input file location and file name. I can get the username, IP address, time, and user command from hdfs-audit.log. But when I submit a MapReduce job, I can't see the input file location either in the Hadoop logs or in the Hadoop ResourceManager.

Does Hadoop have an API or log that contains this info through some configuration? If it does, what should I configure?

...